2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -39,7 +39,7 @@ repos:
static|
args:
[
"--skip=*/algolia.js",
"--skip=*/algolia.js,**/*.svg",
"--ignore-words-list",
"rouge,crate",
]
16 changes: 10 additions & 6 deletions CONTRIBUTING.md
@@ -76,12 +76,6 @@ You can contribute to the AI Pocket Reference project in several ways:

3. **Install `pre-commit`**

```bash
# Install and set up pre-commit hooks
pip install pre-commit
pre-commit install
```

The pre-commit hooks help maintain consistent code quality across contributions
by:

@@ -91,6 +85,16 @@ You can contribute to the AI Pocket Reference project in several ways:
These checks run automatically before each commit to maintain the quality of
our pocket references.

pre-commit requires a valid Python and pip installation. We recommend creating
a virtual environment associated with this repository and using the commands
below to install pre-commit into it:

```bash
# Install and set up pre-commit hooks
pip install pre-commit
pre-commit install
```
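
You can also run the hooks on demand across the whole repository with
`pre-commit run --all-files`.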

4. **Make Your Changes**

The AI Pocket Reference is organized as a collection of mdBooks, with separate
14 changes: 14 additions & 0 deletions books/_common/mdbook-ai-pocket-reference.css
@@ -19,3 +19,17 @@
.coal .vector-logo .dark-logo,
.navy .vector-logo .dark-logo,
.ayu .vector-logo .dark-logo { display: block; }

/* Only invert algorithm SVGs in dark themes */
.coal img[src*="algorithm-"][src$=".svg"],
.navy img[src*="algorithm-"][src$=".svg"],
.ayu img[src*="algorithm-"][src$=".svg"] {
filter: invert(1) hue-rotate(180deg);
}

/* But don't invert logos or other specific SVGs */
.coal .vector-logo img,
.navy .vector-logo img,
.ayu .vector-logo img {
filter: none;
}
3 changes: 2 additions & 1 deletion books/fl/src/README.md
@@ -4,4 +4,5 @@ Welcome to AI Pocket References: Federated Learning (FL) Collection. This compil
encapsulates core concepts as well as advanced methods for implementing FL — one
of the main techniques for building AI models in a decentralized setting.

Be sure to check out our other collections of [AI Pocket References!](https://vectorinstitute.github.io/ai-pocket-reference/)
Be sure to check out our other collections of
[AI Pocket References!](https://vectorinstitute.github.io/ai-pocket-reference/)
31 changes: 16 additions & 15 deletions books/fl/src/SUMMARY.md
@@ -11,22 +11,23 @@
# Federated Learning

- [Core Concepts](core/README.md)

- [Flavors of FL](core/fl_flavors.md)
- [Client](core/client.md)
- [Server](core/server.md)
- [Aggregation Strategy](core/strategy.md)

- [Aggregation](core/aggregation.md)
- [Horizontal FL](horizontal/README.md)

- [Aggregation Strategies](horizontal/aggregation/README.md)
- [FedAdam](horizontal/personalized/fedadam.md)
- [FedProx](horizontal/personalized/fedprox.md)
- [MOON](horizontal/personalized/moon.md)
- [Personalized FL](horizontal/personalized/README.md)
- [Ditto](horizontal/personalized/ditto.md)
- [FedPer](horizontal/personalized/fedper.md)
- [Fenda](horizontal/personalized/fenda.md)

- [Vertical FL](vertical/README.md)

- [Advanced Strategies](vertical/advanced/README.md)
- [Vanilla FL](horizontal/vanilla_fl/README.md)
- [FedSGD](horizontal/vanilla_fl/fedsgd.md)
- [FedAvg](horizontal/vanilla_fl/fedavg.md)
- [Robust Global FL]() <!-- (horizontal/robust_global_fl/README.md) -->
- [FedAdam]() <!-- (horizontal/robust_global_fl/fedadam.md) -->
- [FedProx]() <!-- (horizontal/robust_global_fl/fedprox.md) -->
- [MOON]() <!-- (horizontal/robust_global_fl/moon.md) -->
- [Personalized FL]() <!-- (horizontal/personalized/README.md) -->
- [FedPer]() <!-- (horizontal/personalized/fedper.md) -->
- [FENDA-FL]() <!-- (horizontal/personalized/fenda.md) -->
- [Ditto]() <!-- (horizontal/personalized/ditto.md) -->

- [Vertical FL]() <!-- (vertical/README.md) -->
- [Advanced Strategies]() <!-- (vertical/advanced/README.md) -->
15 changes: 14 additions & 1 deletion books/fl/src/core/README.md
@@ -1 +1,14 @@
# Core Concepts

# Core Concepts in Federated Learning

{{ #aipr_header }}

In this chapter, we'll introduce several of the fundamental concepts for
understanding federated learning (FL). We begin by discussing some of the
different [flavors of FL](fl_flavors.md) and why they constitute
distinct subdomains, each with its own applications, challenges, and research
literature. Next, we briefly discuss three of the most important building
blocks associated with FL pipelines: [Clients](client.md),
[Servers](server.md), and [Aggregation](aggregation.md).

{{#author emersodb}}
47 changes: 47 additions & 0 deletions books/fl/src/core/aggregation.md
@@ -0,0 +1,47 @@
<!-- markdownlint-disable-file MD033 MD013 -->

# Aggregation Strategies

{{ #aipr_header }}

In FL workflows, servers are responsible for a number of crucial functions, as
discussed in [Servers and FL Orchestration](server.md). One of these roles is
that of aggregation and synchronization of the results of distributed client
training processes. This is most prominent in Horizontal FL, where the server
is responsible for executing, among other things, an aggregation strategy.

In most Horizontal FL algorithms, there is a concept of a _server round_
wherein each decentralized client trains a model (or models) using local
training data. After local training has concluded, each client sends the model
weights back to the server. These model weights are combined into a single
set of weights using an aggregation strategy. One of the earliest forms of
such a strategy, and still one of the most widely used, is FedAvg.[^1]
In FedAvg, client model weights are combined using a weighted averaging scheme.
More details on this strategy can be found in [FedAvg](../horizontal/vanilla_fl/fedavg.md).
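
As a concrete illustration, here is a minimal, framework-agnostic sketch of
FedAvg-style aggregation in Python. The `fedavg` function and its interface are
hypothetical (not the API of any particular FL library); client weights are
assumed to arrive as lists of NumPy arrays alongside local dataset sizes.

```python
import numpy as np

def fedavg(
    client_weights: list[list[np.ndarray]], num_samples: list[int]
) -> list[np.ndarray]:
    """Combine per-client model weights via a weighted average (FedAvg).

    Each client's contribution is weighted by the fraction of the total
    training data that it holds.
    """
    total = sum(num_samples)
    coefficients = [n / total for n in num_samples]
    # Average each layer across clients, weighting by dataset size.
    return [
        sum(c * w[layer] for c, w in zip(coefficients, client_weights))
        for layer in range(len(client_weights[0]))
    ]

# Two clients with a one-layer "model" and unequal dataset sizes.
w_a = [np.array([1.0, 2.0])]
w_b = [np.array([3.0, 4.0])]
print(fedavg([w_a, w_b], num_samples=[100, 300]))  # [array([2.5, 3.5])]
```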

Other forms of FL, beyond Horizontal, incorporate aggregation strategies in
various forms. For example, in Vertical FL, the clients must synthesize
partial gradient information received from other clients in the system in order
to properly perform gradient descent for their local model split in SplitNN
algorithms.[^2] This process, however, isn't necessarily the responsibility of
an FL server. Nevertheless, aggregation strategies are most prominently
featured, and most actively researched, in Horizontal FL frameworks.
As seen in the sections of [Horizontal Federated Learning](../horizontal/index.md),
many variations and extensions of FedAvg have been proposed to improve
convergence, deal with data heterogeneity challenges, stabilize training
dynamics, and produce better models. We'll dive into many of these advances
in subsequent chapters.

#### References & Useful Links <!-- markdownlint-disable-line MD001 -->

[^1]:
[H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas.
Communication-efficient learning of deep networks from decentralized data.
Proceedings of the 20th AISTATS, 2017.](https://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf)

[^2]:
[Gupta, Otkrist and Raskar, Ramesh, Distributed learning of deep neural
network over multiple agents, Journal of Network and Computer Applications,
Vol.116, pp.1–8, 2018.](https://arxiv.org/pdf/1810.06060)

{{#author emersodb}}
51 changes: 50 additions & 1 deletion books/fl/src/core/client.md
@@ -1 +1,50 @@
# Client
<!-- markdownlint-disable-file MD033 -->

# The Role of Clients in Federated Learning

{{ #aipr_header }}

As discussed in [The Different Flavors of Federated Learning](fl_flavors.md),
FL is a collection of methods that aim to facilitate training ML models on
decentralized training datasets. The entities that house these datasets are
often referred to as clients. Any procedures that involve working directly
with raw data are typically the responsibility of the clients participating in
the FL system. In addition, clients are only privy to their own local datasets
and generally receive no raw data from other participants.

Some FL methods consider the use of related public or synthetic data,
potentially modeled after local client data. However, there are often caveats
to each of these settings. The former setting is restricted by the assumed
existence of relevant public data, and the level of "relatedness" can have
notable implications for the FL process. In the latter setting, data synthesis
has privacy implications that might undermine the goal of keeping data separate
in the first place.

Because each client is canonically the only participant with access to the data
stored in its dataset, it is predominantly responsible for model training, through
some mechanism, on its local data. In Horizontal FL, this often manifests as
performing some form of gradient-based optimization targeting a local loss
function incorporating local data. In Vertical FL, partial forward passes
and gradients are constructed based on information from the partial (local)
features in each client.
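
To make this concrete for the Horizontal FL case, below is a minimal sketch of
one round of local training on a client, using plain NumPy with a linear model
and a mean squared error loss. The interface is hypothetical and not tied to
any particular FL framework.

```python
import numpy as np

def local_training_round(
    global_weights: np.ndarray,
    features: np.ndarray,
    targets: np.ndarray,
    lr: float = 0.01,
    epochs: int = 5,
) -> np.ndarray:
    """Fit a linear model locally, starting from the server's global weights.

    Only the updated weights leave the client; the raw data never does.
    """
    w = global_weights.copy()
    for _ in range(epochs):
        preds = features @ w
        # Gradient of the mean squared error with respect to the weights.
        grad = 2.0 * features.T @ (preds - targets) / len(targets)
        w -= lr * grad
    return w

# Hypothetical private dataset held by this client.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 3)), rng.normal(size=32)
updated_weights = local_training_round(np.zeros(3), X, y)
```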

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/ClientDiagram.svg" alt="Client ", width="350"> <!-- markdownlint-disable-line MD013 -->
<figcaption>Visualization of some assets for FL clients.</figcaption>
</center>
</figure>

The figure above is a simplified illustration of the various resources housed
within an FL client. Each of these components needs to be considered to ensure
that federated training proceeds smoothly. For example, given the size of the
model to be trained and the desired training settings like batch size, will
the client have enough memory to perform backpropagation? Will the training
iterations complete in a reasonable amount of time? Is the network bandwidth
going to be sufficient to facilitate efficient communication with other
components of the FL system?

In subsequent chapters, we'll discuss the exact role clients play in FL, and
how they interact with other components of the FL system.

{{#author emersodb}}
165 changes: 165 additions & 0 deletions books/fl/src/core/fl_flavors.md
@@ -0,0 +1,165 @@
<!-- markdownlint-disable-file MD033 MD013 -->

# The Different Flavors of Federated Learning

{{ #aipr_header }}

Machine learning (ML) models are most commonly trained on a centralized pool of
data, meaning that all training data is accessible to a single training
process. Federated learning (FL) is used to train ML models on decentralized
data, such that data is compartmentalized. The sites at which data is held
and training occurs are typically referred to as clients. Training data is most often
decentralized when it cannot or should not be moved from its location. This
might be the case for various reasons, including privacy regulations, security
concerns, or resource constraints. Many industries are subject to strict
privacy laws, compliance obligations, or data handling requirements, among
other important considerations. As such, data centralization is often
infeasible or ill-advised. On the other hand, it is well known that access to
larger quantities of representative training data often leads to better ML
models.[^1] Thus, in spite of the potential challenges associated
with decentralized training, there is significant incentive to facilitate
distributed model training.

There are many different flavors of FL. Covering the full set of variations is
beyond the scope of these references. However, this reference will cover a few
of the major types considered in practice.

<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/Distributed Data Diagram.svg" alt="Decentralized Datasets", width="400">
</center>

## Horizontal Vs. Vertical FL

One of the primary distinctions in FL methodologies is whether one is aiming to
perform Horizontal or Vertical FL. The choice of methodological framework here
is primarily driven by the kind of training data that exists and why you are
doing FL in the first place.

### Horizontal FL: More Data, Same Features

In Horizontal FL, it is assumed that models will be trained on a **unified**
set of features and targets. That is, across the distributed datasets, each
training point has the same set of features with, for example, the same
interpretations, pre-processing steps, and ranges of potential values.
The goal in Horizontal FL is to facilitate access to
**additional data points** during the training of a model. For more details, see
[Horizontal FL](../horizontal/index.md).

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/horizontal_fl.svg" alt="Horizontal FL", width="500">
<figcaption>Feature spaces are shared between clients, enabling access to more unique training data points.</figcaption>
</center>
</figure>

### Vertical FL: More Features, Same Generators

While Horizontal FL is concerned with accessing more data points during training,
Vertical FL aims to incorporate additional predictive features to improve model
predictions. In Vertical FL, there is a shared target or set of targets to be
predicted across distributed datasets, and it is assumed that all datasets share
a non-empty intersection of "data generators" that can be "linked" in some way.
For example, the "data generators" might be individual customers of different
retailers. Two retailers might want to collaboratively train a customer
segmentation model to improve predictions for their shared customer base. Each
retailer has unique information about the customer from their interactions
that, when combined, might improve prediction performance.

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/vertical_fl.svg" alt="Vertical FL", width="500">
<figcaption>"Data generators" are shared between clients with unique features.</figcaption>
</center>
</figure>

To produce a useful distributed training dataset in Vertical FL, datasets are
privately "aligned" such that only the intersection of "data generators" are
considered in training. In most cases, the datasets are ordered to ensure that
disparate features are meaningfully aligned by the underlying generator.
Depending on the properties of the datasets, fewer individual data points may
be available for training, but hopefully they have been enriched with
additional important features. For more details, see [Vertical FL](../vertical/index.md).
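
As a simplified, non-private illustration of this alignment step, the sketch
below intersects two clients' record IDs and imposes a common ordering. The
tables are hypothetical, and real systems typically rely on private set
intersection so that clients never see each other's full ID lists.

```python
# Hypothetical per-client tables mapping a shared ID to local features.
client_a = {"alice": [0.2, 1.1], "bob": [0.5, 0.3], "carol": [0.9, 0.7]}
client_b = {"bob": [42.0], "carol": [7.0], "dave": [13.0]}

# Keep only the shared "data generators" and order them consistently,
# so that row i on each client refers to the same underlying entity.
shared_ids = sorted(client_a.keys() & client_b.keys())
aligned_a = [client_a[i] for i in shared_ids]
aligned_b = [client_b[i] for i in shared_ids]
print(shared_ids)  # ['bob', 'carol'] -- fewer rows, richer joint features
```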

## Cross-Device Vs. Cross-Silo FL

An important distinction between standard ML training and decentralized model
training is the presence of multiple, and potentially diverse, compute
environments. Leaving aside settings with the possibility of significant
resource disparities across data hosting environments, there are still many
things to consider that influence the kinds of FL techniques to use. There are
two main categories with general, but not firm, separating characteristics:
Cross-Silo FL and Cross-Device FL. In the table below, key distinctions between
the two types of FL are summarized.

| Type | Cross-Silo | Cross-Device |
| --------------------- | -------------------------------------- | ----------------------------------- |
| **# of Participants** | Small- to medium-sized pool of clients | Large pool of participants |
| **Compute** | Moderate to large compute | Limited compute resources |
| **Dataset Size** | Moderate to large datasets | Typically small datasets |
| **Reliability** | Stable connection and participation | Potentially unreliable participants |

A quintessential example of a cross-device setting is training a model using
data housed on different cell phones. There are potentially millions of devices
participating in training, each with limited computing resources. At any given
time, a phone may be switched off or disconnected from the internet.
Alternatively, cross-silo settings might arise in training a model between
companies or institutions, such as banks or hospitals. They likely have larger
datasets at each site and access to more computational resources. There will
be fewer participants in training, but they are more likely to reliably
contribute to the training system.

Knowing which category of FL one is operating in helps inform design decisions
and FL component choices. For example, the model being trained may need to be
below a certain size or the memory/compute needs of an FL technique might be
prohibitive. A good example of the latter is [Ditto](../horizontal/personalized/ditto.md),
which requires larger compute resources than many other methods.

## One Model or a Model Zoo

The final distinction that is highlighted here is whether the model architecture
to be trained is the same (homogeneous) across disparate sites or if it differs
(heterogeneous). In many settings, the goal is to train a homogeneous model
architecture across FL participants. In the context of Horizontal FL, this
implies that each client has an identical copy of the architecture with shared
feature and label dimensions, as in the figure below.

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/shared_labels.svg" alt="Homogeneous Architectures">
<figcaption>Each client participating in Horizontal FL typically trains the same architecture.</figcaption>
</center>
</figure>

Alternatively, there are FL techniques which aim to federally train collections
of heterogeneous architectures across clients.[^2] That is, each
participant in the FL system might be training a **different** model
architecture. Such a setting may arise, for example, if participants would
like to benefit from the expanded training data pool offered through Horizontal
FL, but want to train their own, proprietary model architecture, rather than a
shared model design across all clients. As another example, perhaps certain
participants, facing compute constraints, aim to train a model of more
manageable size given the resources at their disposal.

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/heterogeneous_architectures.svg" alt="Homogeneous Architectures">
<figcaption>Model heterogeneous FL attempts to wrangle a zoo of model architectures across participants.</figcaption>
</center>
</figure>

The current pocket references will focus primarily on the homogeneous
architecture setting. However, there is significant research across
each of the different flavors of FL discussed above.

#### References & Useful Links <!-- markdownlint-disable-line MD001 -->

[^1]:
[C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revisiting unreasonable
effectiveness of data in deep learning era. In ICCV 2017, pages 843–852, 2017. doi: 10.1109/ICCV.2017.97.](https://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Revisiting_Unreasonable_Effectiveness_ICCV_2017_paper.pdf)

[^2]:
[Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, and Dacheng Tao. 2023.
Heterogeneous Federated Learning: State-of-the-art and Research Challenges.
ACM Comput. Surv. 56, 3, Article 79 (March 2024), 44 pages. https://doi.org/10.1145/3625558](https://arxiv.org/pdf/2307.10616)

{{#author emersodb}}
1 change: 1 addition & 0 deletions books/fl/src/core/index.md
@@ -0,0 +1 @@
# Core Concepts