Alexander Helmut Jung

Alexander received a master's degree and a PhD degree in electrical engineering and signal processing from the Technical University of Vienna in 2008 and 2012, respectively. He is currently an Assistant Professor of Machine Learning at Aalto University and serves as an associate editor for IEEE Signal Processing Letters and as Chapter Chair of the IEEE Finland Joint Chapter SP/CAS. He received a Best Student Paper Award at IEEE ICASSP 2011 and an Amazon Web Services Machine Learning Award in 2018, and was chosen as Teacher of the Year 2018 by the Department of Computer Science at Aalto University.

 

Networked Federated Multitask Learning

12 November 2021  Seminario PISIS-UANL 2021      Attendance: 17

Introduction

Federated learning (FL) is a recent major branch of machine learning that studies collaborative learning from heterogeneous collections of local datasets. In this talk, the speaker discusses current methods for the distributed learning of personalized ("tailored") models for local datasets. These methods are based on regularized empirical risk minimization with total variation as the regularizer. A scalable distributed learning algorithm for this type of problem can be obtained via primal-dual methods for total variation minimization. Furthermore, the statistical properties of these methods can be analyzed conveniently via a duality between total variation minimization and network flow optimization. This work uses this duality to derive sufficient conditions under which total variation minimization learns accurate local models.
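Concretely, the regularized empirical risk minimization problem described above can be written in one common form (a sketch consistent with the description in the talk; the exact norm and weighting used by the speaker may differ) as

\[
\min_{\{\mathbf{w}_i\}_{i \in \mathcal{V}}} \; \sum_{i \in \mathcal{V}} L_i(\mathbf{w}_i) \;+\; \lambda \sum_{\{i,j\} \in \mathcal{E}} A_{ij}\, \bigl\| \mathbf{w}_i - \mathbf{w}_j \bigr\|,
\]

where the sums run over the nodes \(\mathcal{V}\) and edges \(\mathcal{E}\) of the graph connecting the local datasets, \(L_i\) is the empirical risk of the local model \(\mathbf{w}_i\) on the dataset at node \(i\), \(A_{ij}\) is the edge weight quantifying the similarity of the connected local datasets, and \(\lambda\) controls how strongly the total variation term couples the local models.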

Summary

In this talk, the speaker explains that he has two main focus topics: networked federated multitask learning and explainable machine learning. For the second topic, it is very important to have a comprehensive view of how a machine learning model reaches its predictions, especially in fields that require a level of confidence in the results these models produce.

Explainable machine learning looks into methods that compute explanations for the predictions a model produces. These explanations can take different forms: a relevant example from the training set, a subset of features, counterfactuals, a free-text explanation, or even a court sentence. From the perspective of information theory, the goal is to construct explanations that reduce the uncertainty (entropy) of the predictions the model generates. This makes it possible to cast the search for an optimal explanation as a well-defined mathematical (for example, linear) optimization problem.
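One plausible way to formalize this information-theoretic view (a sketch of the idea, not necessarily the exact quantity used by the speaker) is to prefer an explanation \(e\) that makes the model's prediction \(\hat{y}\) as predictable as possible:

\[
e^{\star} \in \arg\min_{e} H(\hat{y} \mid e)
\quad\Longleftrightarrow\quad
e^{\star} \in \arg\max_{e} I(\hat{y}; e),
\]

where \(H(\hat{y} \mid e)\) is the conditional entropy of the prediction given the explanation and \(I(\hat{y}; e)\) is their mutual information. Since \(I(\hat{y}; e) = H(\hat{y}) - H(\hat{y} \mid e)\) and \(H(\hat{y})\) does not depend on \(e\), minimizing the conditional entropy and maximizing the mutual information select the same explanation.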

The speaker uses an undirected graph, called the empirical graph, to model network data. Each node in the empirical graph represents a local dataset, which can come in different forms. A local dataset is related to other local datasets with "similar" features, and this relation is encoded in the edges or links of the empirical graph. Each edge also carries a quantitative similarity value: a weight indicating how strong the similarity is between the two connected local datasets.
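As a concrete, purely illustrative sketch of this structure, the following toy example builds an empirical graph with networkx, attaching a synthetic local dataset to every node and a similarity weight to every edge; the data and weights are invented for illustration and are not the speaker's.

```python
import numpy as np
import networkx as nx

# Toy empirical graph (illustrative only): each node carries a local dataset
# (features X, labels y), and each edge carries a weight quantifying how
# similar the two connected local datasets are believed to be.
rng = np.random.default_rng(0)

G = nx.Graph()
for i in range(4):
    X = rng.normal(size=(20, 3))                       # local feature matrix
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
    G.add_node(i, X=X, y=y)                             # local dataset attached to node i

# Edges encode similarity between local datasets; larger weight = stronger similarity
G.add_edge(0, 1, weight=1.0)
G.add_edge(1, 2, weight=0.8)
G.add_edge(2, 3, weight=0.3)

for i, j, w in G.edges(data="weight"):
    print(f"nodes {i}-{j}: similarity weight {w}")
```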

The main goal of this work is to fit a model to the data at each node and learn something from it, using the aforementioned network structure. Each node carries a parameter (weight) vector of a parametric model, such as linear regression, a generalized linear model, a Gaussian-Markov random field, or any model that can be parameterized by a finite set of numbers. A loss function is also needed to find a good choice for these weights or parameters. The loss function is used during training: it measures the error between the model's output and the desired output, and the weights are adjusted to reduce this error.
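As a minimal sketch, with a hypothetical linear-regression choice for the local model at one node, the following shows a per-node loss and the gradient step that adjusts the weights to reduce the error on that node's local dataset.

```python
import numpy as np

# Illustrative assumption: node i holds a linear model with parameter vector w
# and uses the mean squared error on its local data (X, y) as its loss.
def local_loss(w, X, y):
    """Mean squared error of the linear model X @ w on the local dataset (X, y)."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

def local_gradient(w, X, y):
    """Gradient of the local loss; a gradient step nudges w toward smaller error."""
    return X.T @ (X @ w - y) / len(y)

# One gradient step on a toy local dataset
X = np.random.default_rng(1).normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
w = w - 0.1 * local_gradient(w, X, y)   # weights move to reduce the local loss
```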

A measure of the variation of the weight vectors over the edges, the generalized total variation (GTV), is defined to help form clusters: minimizing this value forces the weight vectors to be approximately constant over well-connected subsets (clusters) of nodes. To minimize the GTV-regularized objective, dual problems are used.
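A small illustration of how such a measure could be computed for a given assignment of weight vectors (toy numbers, and the Euclidean norm is only one possible choice):

```python
import numpy as np

# Illustrative computation of a generalized total variation (GTV) value for a set
# of node weight vectors over an empirical graph given as an edge list with weights.
def gtv(w, edges, edge_weights):
    """Sum of edge-weighted differences between the weight vectors of connected nodes."""
    return sum(A_ij * np.linalg.norm(w[i] - w[j])
               for (i, j), A_ij in zip(edges, edge_weights))

w = {0: np.array([1.0, 0.0]), 1: np.array([1.0, 0.1]), 2: np.array([3.0, -1.0])}
edges, edge_weights = [(0, 1), (1, 2)], [1.0, 0.2]
print(gtv(w, edges, edge_weights))   # small contribution from the similar pair (0, 1)
```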

Working with the dual problem makes it possible to solve the primal problem, whose objective is non-differentiable, more efficiently. The study ends with a proximal point algorithm, resulting in a scalable message-passing algorithm that is robust against various failures, admits stochastic versions, exchanges no sensitive raw data, and copes with imperfect communication links.
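To make the message-passing flavour of such primal-dual schemes concrete, here is a deliberately simplified toy instance (scalar node values and a quadratic local loss, not the speaker's actual algorithm): a standard primal-dual iteration for total variation minimization in which each edge updates a dual "message" and each node updates its value using only its own observation and the messages on its incident edges.

```python
import numpy as np

# Toy instance: primal-dual (Chambolle-Pock style) updates for TV-regularized
# denoising of a scalar signal on an empirical graph:
#   min_x  0.5 * sum_i (x_i - a_i)^2  +  lam * sum_{(i,j)} A_ij * |x_i - x_j|
edges = [(0, 1), (2, 3), (1, 2), (0, 3)]   # edge list of the empirical graph
weights = np.array([1.0, 1.0, 0.1, 0.1])    # edge weights A_ij (similarities)
a = np.array([1.0, 1.1, 5.0, 4.9])          # noisy local observations
lam = 0.5                                   # GTV regularization strength

n_nodes, n_edges = len(a), len(edges)

# Weighted incidence operator D: (D x)_e = A_ij * (x_i - x_j)
D = np.zeros((n_edges, n_nodes))
for e, (i, j) in enumerate(edges):
    D[e, i], D[e, j] = weights[e], -weights[e]

# Step sizes chosen so that tau * sigma * ||D||^2 <= 1 (standard primal-dual condition)
op_norm = np.linalg.norm(D, 2)
tau = sigma = 0.9 / op_norm

x = a.copy()                # primal variables (one per node)
x_bar = x.copy()
u = np.zeros(n_edges)       # dual variables (one message per edge)

for _ in range(500):
    # Dual update: each edge clips its message to the interval [-lam, lam]
    u = np.clip(u + sigma * D @ x_bar, -lam, lam)
    # Primal update: each node combines its observation with its incident-edge messages
    x_new = (x - tau * D.T @ u + tau * a) / (1.0 + tau)
    x_bar = 2.0 * x_new - x
    x = x_new

print(np.round(x, 3))   # nodes {0,1} and {2,3} are pooled into two near-constant clusters
```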

Conclusion

This quick overview of the entire work helps us understand the formulation of federated learning from network data as a GTV minimization problem; this formulation helps unify several existing approaches to federated learning. The GTV problem can be solved with an established primal-dual method for convex optimization, and stochastic variants of these methods make the resulting implementations more scalable and robust.
