First set of FL pocket references #153
Merged
5 commits:

- e45c285: Checking in WIP of the first set of FL pocket references (emersodb)
- 6fbe568: Addressing Veronica's comments and fixing other issues after a proofr… (emersodb)
- 84da59c: Addressing some comments from Andrei (emersodb)
- e530be4: More PR comment changes (emersodb)
- 84080ee: host images (nerdai)
# Core Concepts in Federated Learning

{{ #aipr_header }}
In this chapter, we'll introduce several of the fundamental concepts for
understanding federated learning (FL). We begin by discussing some of the
different [flavors of FL](fl_flavors.md) and why they constitute distinct
subdomains, each with its own applications, challenges, and research
literature. Next, we briefly discuss three of the most important building
blocks of FL pipelines: [Clients](client.md), [Servers](server.md), and
[Aggregation](aggregation.md).

{{#author emersodb}}
<!-- markdownlint-disable-file MD033 MD013 -->

# Aggregation Strategies

{{ #aipr_header }}
In FL workflows, servers are responsible for a number of crucial components,
as discussed in [Servers and FL Orchestration](server.md). One of these roles
is the aggregation and synchronization of the results of distributed client
training processes. This is most prominent in Horizontal FL, where the server
is responsible for executing, among other things, an aggregation strategy.

In most Horizontal FL algorithms, there is a concept of a _server round_,
wherein each decentralized client trains a model (or models) using local
training data. After local training has concluded, each client sends its
model weights back to the server. These model weights are combined into a
single set of weights using an aggregation strategy. One of the earliest
forms of such a strategy, and still one of the most widely used, is
FedAvg.[^1] In FedAvg, client model weights are combined using a weighted
averaging scheme. More details on this strategy can be found in
[FedAvg](../horizontal/vanilla_fl/fedavg.md).
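As a concrete illustration, the weighted-averaging step of FedAvg can be
sketched in a few lines of NumPy. This is a minimal sketch in which the
function and variable names are our own, not the API of any particular FL
framework; each client's contribution is weighted by its local sample count.

```python
import numpy as np

def fedavg(client_weights, client_num_samples):
    """Combine client model weights via sample-count-weighted averaging.

    client_weights: one list of per-layer arrays per client.
    client_num_samples: number of local training examples per client.
    """
    total = sum(client_num_samples)
    coeffs = [n / total for n in client_num_samples]
    num_layers = len(client_weights[0])
    # Weighted sum of each layer's weights across clients.
    return [
        sum(c * layers[i] for c, layers in zip(coeffs, client_weights))
        for i in range(num_layers)
    ]

# Two clients with a single-layer "model"; the second client holds
# three times as much data, so its weights count three times as much.
w = fedavg(
    [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]],
    [1, 3],
)
# w[0] is 0.25 * [1, 1] + 0.75 * [3, 3] = [2.5, 2.5]
```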
Other forms of FL, beyond Horizontal, incorporate aggregation strategies in
various forms. For example, in Vertical FL, clients must synthesize partial
gradient information received from other clients in the system in order to
properly perform gradient descent for their local model split in SplitNN
algorithms.[^2] This process, however, isn't necessarily the responsibility
of an FL server. Nevertheless, aggregation strategies are most prominently
featured, and the subject of significant research, in Horizontal FL
frameworks. As seen in the sections of
[Horizontal Federated Learning](../horizontal/index.md), many variations and
extensions of FedAvg have been proposed to improve convergence, deal with
data heterogeneity challenges, stabilize training dynamics, and produce
better models. We'll dive into many of these advances in subsequent chapters.
#### References & Useful Links <!-- markdownlint-disable-line MD001 -->

[^1]:
    [H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas.
    Communication-efficient learning of deep networks from decentralized data.
    Proceedings of the 20th AISTATS, 2017.](https://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf)

[^2]:
    [O. Gupta and R. Raskar. Distributed learning of deep neural network over
    multiple agents. Journal of Network and Computer Applications, 116:1–8,
    2018.](https://arxiv.org/pdf/1810.06060)

{{#author emersodb}}
<!-- markdownlint-disable-file MD033 -->

# The Role of Clients in Federated Learning

{{ #aipr_header }}
As discussed in [The Different Flavors of Federated Learning](fl_flavors.md),
FL is a collection of methods that aim to facilitate training ML models on
decentralized training datasets. The entities that house these datasets are
often referred to as clients. Any procedures that involve working directly
with raw data are typically the responsibility of the clients participating
in the FL system. In addition, clients are only privy to their own local
datasets and generally receive no raw data from other participants.

Some FL methods consider the use of related public or synthetic data,
potentially modeled after local client data. However, there are often caveats
to each of these settings. The former setting is restricted by the assumed
existence of relevant public data, and the level of "relatedness" can have
notable implications for the FL process. In the latter setting, data
synthesis has privacy implications that might undermine the goal of keeping
data separate in the first place.

Because each client is canonically the only participant with access to its
local dataset, it is predominantly responsible for model training, through
some mechanism, on that data. In Horizontal FL, this often manifests as
performing some form of gradient-based optimization targeting a local loss
function incorporating local data. In Vertical FL, partial forward passes and
gradients are constructed based on information from the partial (local)
features held by each client.
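To make the Horizontal FL case concrete, a client's local update might look
like the sketch below: a few gradient steps on a local loss before the
resulting weights are returned to the server. This uses a linear model with
squared-error loss purely as a stand-in for the client's local objective; the
names and data are illustrative, not the interface of any specific FL
framework.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=5):
    """Run a few gradient-descent steps on this client's local data only."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * MSE
        w -= lr * grad
    return w  # sent back to the server; raw (X, y) never leave the client

# A client's private data and server-provided starting weights (synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w0 = np.zeros(3)
w1 = local_update(w0, X, y)
```

Only `w1` (or the weight delta) is communicated onward, which is precisely
what makes aggregation strategies like FedAvg possible on the server side.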
<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/ClientDiagram.svg" alt="Client" width="350"> <!-- markdownlint-disable-line MD013 -->
<figcaption>Visualization of some assets for FL clients.</figcaption>
</center>
</figure>

The figure above is a simplified illustration of the various resources housed
within an FL client. Each of these components needs to be considered to
ensure that federated training proceeds smoothly. For example, given the size
of the model to be trained and the desired training settings, like batch
size, will the client have enough memory to perform backpropagation? Will the
training iterations complete in a reasonable amount of time? Is the network
bandwidth going to be sufficient to facilitate efficient communication with
other components of the FL system?

In subsequent chapters, we'll discuss the exact role clients play in FL and
how they interact with other components of the FL system.

{{#author emersodb}}
<!-- markdownlint-disable-file MD033 MD013 -->

# The Different Flavors of Federated Learning

{{ #aipr_header }}
Machine learning (ML) models are most commonly trained on a centralized pool
of data, meaning that all training data is accessible to a single training
process. Federated learning (FL) is used to train ML models on decentralized
data, such that data remains compartmentalized. The sites at which the data
is held, and where training occurs, are typically referred to as clients.
Training data is most often decentralized when it cannot or should not be
moved from its location. This might be the case for various reasons,
including privacy regulations, security concerns, or resource constraints.
Many industries are subject to strict privacy laws, compliance requirements,
or data handling requirements, among other important considerations. As such,
data centralization is often infeasible or ill-advised. On the other hand, it
is well known that access to larger quantities of representative training
data often leads to better ML models.[^1] Thus, in spite of the potential
challenges associated with decentralized training, there is significant
incentive to facilitate distributed model training.

There are many different flavors of FL. Covering the full set of variations
is beyond the scope of these references. However, this reference will cover a
few of the major types considered in practice.
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/Distributed Data Diagram.svg" alt="Decentralized Datasets" width="400">
</center>
## Horizontal Vs. Vertical FL

One of the primary distinctions in FL methodologies is whether one is aiming
to perform Horizontal or Vertical FL. The choice of methodological framework
here is primarily driven by the kind of training data that exists and why you
are doing FL in the first place.
### Horizontal FL: More Data, Same Features

In Horizontal FL, it is assumed that models will be trained on a **unified**
set of features and targets. That is, across the distributed datasets, each
training point has the same set of features, with the same interpretations,
pre-processing steps, and ranges of potential values, for example. The goal
in Horizontal FL is to facilitate access to **additional data points** during
the training of a model. For more details, see
[Horizontal FL](../horizontal/index.md).
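As a small illustration (the data and names here are made up for exposition),
a horizontal partition splits the rows of a table whose schema is shared
across clients:

```python
import numpy as np

# One shared feature schema across clients: columns are [age, income].
client_a = np.array([[34.0, 52_000.0], [51.0, 88_000.0]])
client_b = np.array([[27.0, 41_000.0]])

assert client_a.shape[1] == client_b.shape[1]  # identical feature sets

# Conceptually, Horizontal FL trains on the row-wise union below, but this
# pooled array is never actually materialized in one place during FL.
virtual_dataset = np.vstack([client_a, client_b])
```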
<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/horizontal_fl.svg" alt="Horizontal FL" width="500">
<figcaption>Feature spaces are shared between clients, enabling access to more unique training data points.</figcaption>
</center>
</figure>
### Vertical FL: More Features, Same Generators

While Horizontal FL is concerned with accessing more data points during
training, Vertical FL aims to add additional predictive features to improve
model predictions. In Vertical FL, there is a shared target or set of targets
to be predicted across distributed datasets, and it is assumed that all
datasets share a non-empty intersection of "data generators" that can be
"linked" in some way. For example, the "data generators" might be individual
customers of different retailers. Two retailers might want to collaboratively
train a customer segmentation model to improve predictions for their shared
customer base. Each retailer has unique information about the customers from
their interactions that, when combined, might improve prediction performance.
<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/vertical_fl.svg" alt="Vertical FL" width="500">
<figcaption>"Data generators" are shared between clients with unique features.</figcaption>
</center>
</figure>
To produce a useful distributed training dataset in Vertical FL, datasets are
privately "aligned" such that only the intersection of "data generators" is
considered in training. In most cases, the datasets are ordered to ensure
that disparate features are meaningfully aligned by the underlying generator.
Depending on the properties of the datasets, fewer individual data points may
be available for training, but hopefully they have been enriched with
additional important features. For more details, see
[Vertical FL](../vertical/index.md).
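The alignment step can be illustrated with plain set operations. Note that
real systems typically perform this step with privacy-preserving protocols
(such as private set intersection) rather than comparing raw identifiers
directly; the IDs and feature values below are invented for illustration.

```python
# Records keyed by a shared identifier; the features differ per party.
retailer_a = {"c01": [34.0], "c02": [51.0], "c03": [27.0]}  # e.g., age
retailer_b = {"c02": [9.0], "c03": [2.0], "c04": [5.0]}     # e.g., visits

# Keep only the shared "data generators", in a canonical order.
shared_ids = sorted(retailer_a.keys() & retailer_b.keys())

# Row i of each party's aligned matrix now refers to the same customer,
# so disparate features are meaningfully linked by the underlying generator.
aligned_a = [retailer_a[i] for i in shared_ids]
aligned_b = [retailer_b[i] for i in shared_ids]
```

Note that only two of the four customers survive alignment: fewer rows, but
each row now carries both parties' features.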
## Cross-Device Vs. Cross-Silo FL

An important distinction between standard ML training and decentralized model
training is the presence of multiple, and potentially diverse, compute
environments. Leaving aside settings with the possibility of significant
resource disparities across data hosting environments, there are still many
things to consider that influence the kinds of FL techniques to use. There
are two main categories with general, but not firm, separating
characteristics: Cross-Silo FL and Cross-Device FL. The table below
summarizes key distinctions between the two types of FL.

| Type                  | Cross-Silo                             | Cross-Device                        |
| --------------------- | -------------------------------------- | ----------------------------------- |
| **# of Participants** | Small- to medium-sized pool of clients | Large pool of participants          |
| **Compute**           | Moderate to large compute              | Limited compute resources           |
| **Dataset Size**      | Moderate to large datasets             | Typically small datasets            |
| **Reliability**       | Stable connection and participation    | Potentially unreliable participants |
A quintessential example of a cross-device setting is training a model using
data housed on different cell phones. There are potentially millions of
devices participating in training, each with limited computing resources. At
any given time, a phone may be switched off or disconnected from the
internet. Alternatively, cross-silo settings might arise when training a
model between companies or institutions, such as banks or hospitals. These
likely have larger datasets at each site and access to more computational
resources. There will be fewer participants in training, but they are more
likely to reliably contribute to the training system.
Knowing which category of FL one is operating in helps inform design
decisions and FL component choices. For example, the model being trained may
need to be below a certain size, or the memory/compute needs of an FL
technique might be prohibitive. A good example of the latter is
[Ditto](../horizontal/personalized/ditto.md), which requires larger compute
resources than many other methods.
## One Model or a Model Zoo

The final distinction highlighted here is whether the model architecture to
be trained is the same (homogeneous) across disparate sites or differs
(heterogeneous). In many settings, the goal is to train a homogeneous model
architecture across FL participants. In the context of Horizontal FL, this
implies that each client has an identical copy of the architecture, with
shared feature and label dimensions, as in the figure below.
<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/shared_labels.svg" alt="Homogeneous Architectures">
<figcaption>Each client participating in Horizontal FL typically trains the same architecture.</figcaption>
</center>
</figure>
Alternatively, there are FL techniques which aim to federally train
collections of heterogeneous architectures across clients.[^2] That is, each
participant in the FL system might be training a **different** model
architecture. Such a setting may arise, for example, if participants would
like to benefit from the expanded training data pool offered through
Horizontal FL but want to train their own proprietary model architectures
rather than a shared model design across all clients. As another example,
perhaps certain participants, facing compute constraints, aim to train a
model of more manageable size given the resources at their disposal.
<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/heterogeneous_architectures.svg" alt="Heterogeneous Architectures">
<figcaption>Model heterogeneous FL attempts to wrangle a zoo of model architectures across participants.</figcaption>
</center>
</figure>
The primary focus of the current pocket references is the homogeneous
architecture setting. However, there is significant research across each of
the different flavors of FL discussed above.
#### References & Useful Links <!-- markdownlint-disable-line MD001 -->

[^1]:
    [C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revisiting unreasonable
    effectiveness of data in deep learning era. In ICCV 2017, pages 843–852,
    2017. doi: 10.1109/ICCV.2017.97.](https://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Revisiting_Unreasonable_Effectiveness_ICCV_2017_paper.pdf)

[^2]:
    [M. Ye, X. Fang, B. Du, P. C. Yuen, and D. Tao. Heterogeneous federated
    learning: State-of-the-art and research challenges. ACM Computing Surveys,
    56(3), Article 79, 2024. doi: 10.1145/3625558.](https://arxiv.org/pdf/2307.10616)

{{#author emersodb}}
# Core Concepts