2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -39,7 +39,7 @@ repos:
static|
args:
[
"--skip=*/algolia.js",
"--skip=*/algolia.js,**/*.svg",
"--ignore-words-list",
"rouge,crate",
]
16 changes: 10 additions & 6 deletions CONTRIBUTING.md
@@ -76,12 +76,6 @@ You can contribute to the AI Pocket Reference project in several ways:

3. **Install `pre-commit`**

```bash
# Install and set up pre-commit hooks
pip install pre-commit
pre-commit install
```

The pre-commit hooks help maintain consistent code quality across contributions
by:

@@ -91,6 +85,16 @@ You can contribute to the AI Pocket Reference project in several ways:
These checks run automatically before each commit to maintain the quality of
our pocket references.

pre-commit requires a valid Python and pip installation. We recommend creating
a virtual environment associated with this repository and using the commands
below to install pre-commit into it:

```bash
# Install and set up pre-commit hooks
pip install pre-commit
pre-commit install
```
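
You can also run the hooks on demand across the whole repository with
`pre-commit run --all-files`.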

4. **Make Your Changes**

The AI Pocket Reference is organized as a collection of mdBooks, with separate
14 changes: 14 additions & 0 deletions books/_common/mdbook-ai-pocket-reference.css
@@ -19,3 +19,17 @@
.coal .vector-logo .dark-logo,
.navy .vector-logo .dark-logo,
.ayu .vector-logo .dark-logo { display: block; }

/* Only invert algorithm SVGs in dark themes */
.coal img[src*="algorithm-"][src$=".svg"],
.navy img[src*="algorithm-"][src$=".svg"],
.ayu img[src*="algorithm-"][src$=".svg"] {
filter: invert(1) hue-rotate(180deg);
}

/* But don't invert logos or other specific SVGs */
.coal .vector-logo img,
.navy .vector-logo img,
.ayu .vector-logo img {
filter: none;
}
3 changes: 2 additions & 1 deletion books/fl/src/README.md
@@ -4,4 +4,5 @@ Welcome to AI Pocket References: Federated Learning (FL) Collection. This compil
encapsulates core concepts as well as advanced methods for implementing FL — one
of the main techniques for building AI models in a decentralized setting.

Be sure to check out our other collections of [AI Pocket References!](https://vectorinstitute.github.io/ai-pocket-reference/)
Be sure to check out our other collections of
[AI Pocket References!](https://vectorinstitute.github.io/ai-pocket-reference/)
31 changes: 16 additions & 15 deletions books/fl/src/SUMMARY.md
@@ -11,22 +11,23 @@
# Federated Learning

- [Core Concepts](core/README.md)

- [Flavors of FL](core/fl_flavors.md)
- [Client](core/client.md)
- [Server](core/server.md)
- [Aggregation Strategy](core/strategy.md)

- [Aggregation](core/aggregation.md)
- [Horizontal FL](horizontal/README.md)

- [Aggregation Strategies](horizontal/aggregation/README.md)
- [FedAdam](horizontal/personalized/fedadam.md)
- [FedProx](horizontal/personalized/fedprox.md)
- [MOON](horizontal/personalized/moon.md)
- [Personalized FL](horizontal/personalized/README.md)
- [Ditto](horizontal/personalized/ditto.md)
- [FedPer](horizontal/personalized/fedper.md)
- [Fenda](horizontal/personalized/fenda.md)

- [Vertical FL](vertical/README.md)

- [Advanced Strategies](vertical/advanced/README.md)
- [Vanilla FL](horizontal/vanilla_fl/README.md)
- [FedSGD](horizontal/vanilla_fl/fedsgd.md)
- [FedAvg](horizontal/vanilla_fl/fedavg.md)
- [Robust Global FL]() <!-- (horizontal/robust_global_fl/README.md) -->
- [FedAdam]() <!-- (horizontal/robust_global_fl/fedadam.md) -->
- [FedProx]() <!-- (horizontal/robust_global_fl/fedprox.md) -->
- [MOON]() <!-- (horizontal/robust_global_fl/moon.md) -->
- [Personalized FL]() <!-- (horizontal/personalized/README.md) -->
- [FedPer]() <!-- (horizontal/personalized/fedper.md) -->
- [FENDA-FL]() <!-- (horizontal/personalized/fenda.md) -->
- [Ditto]() <!-- (horizontal/personalized/ditto.md) -->

- [Vertical FL]() <!-- (vertical/README.md) -->
- [Advanced Strategies]() <!-- (vertical/advanced/README.md) -->
15 changes: 14 additions & 1 deletion books/fl/src/core/README.md
@@ -1 +1,14 @@
# Core Concepts

# Core Concepts in Federated Learning

{{ #aipr_header }}

In this chapter, we'll introduce several of the fundamental concepts for
understanding federated learning (FL). We begin by discussing some of the
different [flavors of FL](fl_flavors.md) and why they constitute
distinct subdomains, each with its own applications, challenges, and research
literature. Next, we briefly discuss three of the most important building
blocks associated with FL pipelines: [Clients](client.md),
[Servers](server.md), and [Aggregation](aggregation.md).

{{#author emersodb}}
47 changes: 47 additions & 0 deletions books/fl/src/core/aggregation.md
@@ -0,0 +1,47 @@
<!-- markdownlint-disable-file MD033 MD013 -->

# Aggregation Strategies

{{ #aipr_header }}

In FL workflows, servers are responsible for a number of crucial functions, as
discussed in [Servers and FL Orchestration](server.md). One of these roles is
that of aggregation and synchronization of the results of distributed client
training processes. This is most prominent in Horizontal FL, where the server
is responsible for executing, among other things, an aggregation strategy.

In most Horizontal FL algorithms, there is a concept of a _server round_
wherein each decentralized client trains a model (or models) using local
training data. After local training has concluded, each client sends the model
weights back to the server. These model weights are combined into a single
set of weights using an aggregation strategy. One of the earliest forms of
such a strategy, and still one of the most widely used, is FedAvg.[^1]
In FedAvg, client model weights are combined using a weighted averaging scheme.
More details on this strategy can be found in [FedAvg](../horizontal/vanilla_fl/fedavg.md).
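
As a concrete illustration, here is a minimal, framework-agnostic sketch of
FedAvg-style aggregation in Python. The `fedavg` function and its interface are
hypothetical (not the API of any particular FL library); client weights are
assumed to arrive as lists of NumPy arrays alongside local dataset sizes.

```python
import numpy as np

def fedavg(
    client_weights: list[list[np.ndarray]], num_samples: list[int]
) -> list[np.ndarray]:
    """Combine per-client model weights via a weighted average (FedAvg).

    Each client's contribution is weighted by the fraction of the total
    training data that it holds.
    """
    total = sum(num_samples)
    coefficients = [n / total for n in num_samples]
    # Average each layer across clients, weighting by dataset size.
    return [
        sum(c * w[layer] for c, w in zip(coefficients, client_weights))
        for layer in range(len(client_weights[0]))
    ]

# Two clients with a one-layer "model" and unequal dataset sizes.
w_a = [np.array([1.0, 2.0])]
w_b = [np.array([3.0, 4.0])]
print(fedavg([w_a, w_b], num_samples=[100, 300]))  # [array([2.5, 3.5])]
```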

Other forms of FL, beyond Horizontal, incorporate aggregation strategies in
various forms. For example, in Vertical FL, the clients must synthesize
partial gradient information received from other clients in the system in order
to properly perform gradient descent for their local model split in SplitNN
algorithms.[^2] This process, however, isn't necessarily the responsibility of
an FL server. Nevertheless, aggregation strategies are most prominently
featured, and most actively researched, in Horizontal FL frameworks.
As seen in the sections of [Horizontal Federated Learning](../horizontal/index.md),
many variations and extensions of FedAvg have been proposed to improve
convergence, deal with data heterogeneity challenges, stabilize training
dynamics, and produce better models. We'll dive into many of these advances
in subsequent chapters.

#### References & Useful Links <!-- markdownlint-disable-line MD001 -->

[^1]:
[H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas.
Communication-efficient learning of deep networks from decentralized data.
Proceedings of the 20th AISTATS, 2017.](https://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf)

[^2]:
[Gupta, Otkrist and Raskar, Ramesh, Distributed learning of deep neural
network over multiple agents, Journal of Network and Computer Applications,
Vol.116, pp.1–8, 2018.](https://arxiv.org/pdf/1810.06060)

{{#author emersodb}}
51 changes: 50 additions & 1 deletion books/fl/src/core/client.md
@@ -1 +1,50 @@
# Client
<!-- markdownlint-disable-file MD033 -->

# The Role of Clients in Federated Learning

{{ #aipr_header }}

As discussed in [The Different Flavors of Federated Learning](fl_flavors.md),
FL is a collection of methods that aim to facilitate training ML models on
decentralized training datasets. The entities that house these datasets are
often referred to as clients. Any procedures that involve working directly
with raw data are typically the responsibility of the clients participating in
the FL system. In addition, clients are only privy to their own local datasets
and generally receive no raw data from other participants.

Some FL methods consider the use of related public or synthetic data,
potentially modeled after local client data. However, there are often caveats
to each of these settings. The former setting is restricted by the assumed
existence of relevant public data, and the level of "relatedness" can have
notable implications for the FL process. In the latter setting, data synthesis
has privacy implications that might undermine the goal of keeping data separate
in the first place.

Because each client is canonically the only participant with access to the data
stored in its dataset, it is predominantly responsible for model training, through
some mechanism, on its local data. In Horizontal FL, this often manifests as
performing some form of gradient-based optimization targeting a local loss
function incorporating local data. In Vertical FL, partial forward passes
and gradients are constructed based on information from the partial (local)
features in each client.
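
To make this concrete for the Horizontal FL case, below is a minimal sketch of
one round of local training on a client, using plain NumPy with a linear model
and a mean squared error loss. The interface is hypothetical and not tied to
any particular FL framework.

```python
import numpy as np

def local_training_round(
    global_weights: np.ndarray,
    features: np.ndarray,
    targets: np.ndarray,
    lr: float = 0.01,
    epochs: int = 5,
) -> np.ndarray:
    """Fit a linear model locally, starting from the server's global weights.

    Only the updated weights leave the client; the raw data never does.
    """
    w = global_weights.copy()
    for _ in range(epochs):
        preds = features @ w
        # Gradient of the mean squared error with respect to the weights.
        grad = 2.0 * features.T @ (preds - targets) / len(targets)
        w -= lr * grad
    return w

# Hypothetical private dataset held by this client.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 3)), rng.normal(size=32)
updated_weights = local_training_round(np.zeros(3), X, y)
```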

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/ClientDiagram.svg" alt="Client ", width="350"> <!-- markdownlint-disable-line MD013 -->
<figcaption>Visualization of some assets for FL clients.</figcaption>
</center>
</figure>

The figure above is a simplified illustration of the various resources housed
within an FL client. Each of these components needs to be considered to ensure
that federated training proceeds smoothly. For example, given the size of the
model to be trained and the desired training settings like batch size, will
the client have enough memory to perform backpropagation? Will the training
iterations complete in a reasonable amount of time? Is the network bandwidth
going to be sufficient to facilitate efficient communication with other
components of the FL system?

In subsequent chapters, we'll discuss the exact role clients play in FL, and
how they interact with other components of the FL system.

{{#author emersodb}}
165 changes: 165 additions & 0 deletions books/fl/src/core/fl_flavors.md
@@ -0,0 +1,165 @@
<!-- markdownlint-disable-file MD033 MD013 -->

# The Different Flavors of Federated Learning

{{ #aipr_header }}

Machine learning (ML) models are most commonly trained on a centralized pool of
data, meaning that all training data is accessible to a single training
process. Federated learning (FL) is used to train ML models on decentralized
data, such that data is compartmentalized. The sites at which data is held
and training occurs are typically referred to as clients. Training data is most often
decentralized when it cannot or should not be moved from its location. This
might be the case for various reasons, including privacy regulations, security
concerns, or resource constraints. Many industries are subject to strict
privacy laws, compliance obligations, or data handling requirements, among
other important considerations. As such, data centralization is often
infeasible or ill-advised. On the other hand, it is well known that access to
larger quantities of representative training data often leads to better ML
models.[^1] Thus, in spite of the potential challenges associated
with decentralized training, there is significant incentive to facilitate
distributed model training.

There are many different flavors of FL. Covering the full set of variations is
beyond the scope of these references. However, this reference will cover a few
of the major types considered in practice.

<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/Distributed Data Diagram.svg" alt="Decentralized Datasets", width="400">
</center>

## Horizontal Vs. Vertical FL

One of the primary distinctions in FL methodologies is whether one is aiming to
perform Horizontal or Vertical FL. The choice of methodological framework here
is primarily driven by the kind of training data that exists and why you are
doing FL in the first place.

### Horizontal FL: More Data, Same Features

In Horizontal FL, it is assumed that models will be trained on a **unified**
set of features and targets. That is, across the distributed datasets, each
training point has the same set of features with, for example, the same
interpretations, pre-processing steps, and ranges of potential values.
The goal in Horizontal FL is to facilitate access to
**additional data points** during the training of a model. For more details, see
[Horizontal FL](../horizontal/index.md).

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/horizontal_fl.svg" alt="Horizontal FL", width="500">
<figcaption>Feature spaces are shared between clients, enabling access to more unique training data points.</figcaption>
</center>
</figure>

### Vertical FL: More Features, Same Generators

While Horizontal FL is concerned with accessing more data points during training,
Vertical FL aims to incorporate additional predictive features to improve model
predictions. In Vertical FL, there is a shared target or set of targets to be
predicted across distributed datasets, and it is assumed that all datasets share
a non-empty intersection of "data generators" that can be "linked" in some way.
For example, the "data generators" might be individual customers of different
retailers. Two retailers might want to collaboratively train a customer
segmentation model to improve predictions for their shared customer base. Each
retailer has unique information about the customer from their interactions
that, when combined, might improve prediction performance.

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/vertical_fl.svg" alt="Vertical FL", width="500">
<figcaption>"Data generators" are shared between clients with unique features.</figcaption>
</center>
</figure>

To produce a useful distributed training dataset in Vertical FL, datasets are
privately "aligned" such that only the intersection of "data generators" are
considered in training. In most cases, the datasets are ordered to ensure that
disparate features are meaningfully aligned by the underlying generator.
Depending on the properties of the datasets, fewer individual data points may
be available for training, but hopefully they have been enriched with
additional important features. For more details, see [Vertical FL](../vertical/index.md).
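
As a simplified, non-private illustration of this alignment step, the sketch
below intersects two clients' record IDs and imposes a common ordering. The
tables are hypothetical, and real systems typically rely on private set
intersection so that clients never see each other's full ID lists.

```python
# Hypothetical per-client tables mapping a shared ID to local features.
client_a = {"alice": [0.2, 1.1], "bob": [0.5, 0.3], "carol": [0.9, 0.7]}
client_b = {"bob": [42.0], "carol": [7.0], "dave": [13.0]}

# Keep only the shared "data generators" and order them consistently,
# so that row i on each client refers to the same underlying entity.
shared_ids = sorted(client_a.keys() & client_b.keys())
aligned_a = [client_a[i] for i in shared_ids]
aligned_b = [client_b[i] for i in shared_ids]
print(shared_ids)  # ['bob', 'carol'] -- fewer rows, richer joint features
```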

## Cross-Device Vs. Cross-Silo FL

An important distinction between standard ML training and decentralized model
training is the presence of multiple, and potentially diverse, compute
environments. Leaving aside settings with the possibility of significant
resource disparities across data hosting environments, there are still many
things to consider that influence the kinds of FL techniques to use. There are
two main categories with general, but not firm, separating characteristics:
Cross-Silo FL and Cross-Device FL. In the table below, key distinctions between
the two types of FL are summarized.

| Type | Cross-Silo | Cross-Device |
| --------------------- | -------------------------------------- | ----------------------------------- |
| **# of Participants** | Small- to medium-sized pool of clients | Large pool of participants |
| **Compute** | Moderate to large compute | Limited compute resources |
| **Dataset Size** | Moderate to large datasets | Typically small datasets |
| **Reliability** | Stable connection and participation | Potentially unreliable participants |

A quintessential example of a cross-device setting is training a model using
data housed on different cell phones. There are potentially millions of devices
participating in training, each with limited computing resources. At any given
time, a phone may be switched off or disconnected from the internet.
Alternatively, cross-silo settings might arise in training a model between
companies or institutions, such as banks or hospitals. They likely have larger
datasets at each site and access to more computational resources. There will
be fewer participants in training, but they are more likely to reliably
contribute to the training system.

Knowing which category of FL one is operating in helps inform design decisions
and FL component choices. For example, the model being trained may need to be
below a certain size or the memory/compute needs of an FL technique might be
prohibitive. A good example of the latter is [Ditto](../horizontal/personalized/ditto.md),
which requires larger compute resources than many other methods.

## One Model or a Model Zoo

The final distinction that is highlighted here is whether the model architecture
to be trained is the same (homogeneous) across disparate sites or if it differs
(heterogeneous). In many settings, the goal is to train a homogeneous model
architecture across FL participants. In the context of Horizontal FL, this
implies that each client has an identical copy of the architecture with shared
feature and label dimensions, as in the figure below.

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/shared_labels.svg" alt="Homogeneous Architectures">
<figcaption>Each client participating in Horizontal FL typically trains the same architecture.</figcaption>
</center>
</figure>

Alternatively, there are FL techniques which aim to federally train collections
of heterogeneous architectures across clients.[^2] That is, each
participant in the FL system might be training a **different** model
architecture. Such a setting may arise, for example, if participants would
like to benefit from the expanded training data pool offered through Horizontal
FL, but want to train their own, proprietary model architecture, rather than a
shared model design across all clients. As another example, perhaps certain
participants, facing compute constraints, aim to train a model of more
manageable size given the resources at their disposal.

<figure>
<center>
<img src="https://d3ddy8balm3goa.cloudfront.net/vector-ai-pocket-refs/fl/heterogeneous_architectures.svg" alt="Homogeneous Architectures">
<figcaption>Model heterogeneous FL attempts to wrangle a zoo of model architectures across participants.</figcaption>
</center>
</figure>

The current pocket references will focus primarily on the homogeneous
architecture setting. However, there is significant research across
each of the different flavors of FL discussed above.

#### References & Useful Links <!-- markdownlint-disable-line MD001 -->

[^1]:
[C. Sun, A. Shrivastava, S. Singh, and A. Gupta. Revisiting unreasonable
effectiveness of data in deep learning era. In ICCV 2017, pages 843–852, 2017. doi: 10.1109/ICCV.2017.97.](https://openaccess.thecvf.com/content_ICCV_2017/papers/Sun_Revisiting_Unreasonable_Effectiveness_ICCV_2017_paper.pdf)

[^2]:
[Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, and Dacheng Tao. 2023.
Heterogeneous Federated Learning: State-of-the-art and Research Challenges.
ACM Comput. Surv. 56, 3, Article 79 (March 2024), 44 pages. https://doi.org/10.1145/3625558](https://arxiv.org/pdf/2307.10616)

{{#author emersodb}}
1 change: 1 addition & 0 deletions books/fl/src/core/index.md
@@ -0,0 +1 @@
# Core Concepts