Liberating Federated Learning for the Everyday Developer

Written by aminnejad | Published 2022/09/13

As a machine learning engineer, I often find myself trying to figure out how to get access to the best datasets to inform my model development and evaluation. Researchers and other ML engineers like me have the expertise required to put this data to use in solving important and previously intractable problems, but the required datasets are often locked away in silos due to perceived regulatory, technical, or privacy risk barriers to data collaboration. To help resolve this gridlock, we’ve been working on a new data science paradigm called federated learning, which allows model developers to send algorithms to data, without requiring access to raw data or data transfer.

While federated learning has made a splash in the popular press through the introduction of Google’s Federated Learning of Cohorts (FLoCs) and Apple’s use of the technique to power Siri, it has yet to emerge as a standard commercial technique for data collaboration across sectors at scale. In this article, I hope to explore why that might be the case and make developers aware of emerging frameworks you can use to easily get up to speed on federated learning and safely access the data you need!

What is Federated Learning?

Before we dive into why federated learning is not yet commonplace, it’s important to understand it from a foundational perspective. In its broadest, most vanilla definition, Federated Learning (FL) is simply a Machine Learning (ML) setting where many clients collaboratively train a model under the orchestration of a central server while keeping the training data decentralised. That’s all!

The term was introduced in a paper by McMahan et al. (2016) [1] from Google and has taken off since then as a way to facilitate collaboration whilst ensuring privacy. The technique was initially introduced for mobile and edge device applications, where there are millions or potentially even billions of unreliable clients (unreliable in the sense that any client may drop out at any time due to network issues, battery status, compute limitations, etc.) and we want to train a global model that we ship to all devices.

For example, let’s say you want to predict the next word a user is going to type into their mobile phone. Previously, in order to do this, each mobile phone would have to upload its data to the cloud where a central orchestrator (e.g. Google) could train a model on all users’ typing data in one place. In practice, this would likely be across numerous data-centres and compute nodes, but the key is that all the data in this historical scenario is visible in its entirety to the orchestrator.

What FL does is turn this idea on its head.

Instead of sending the data to the model, we send the model to the data.

How does it work?

The algorithm works as follows:

  1. Client Selection: The central server orchestrating the training process samples from a set of clients meeting eligibility requirements.
  2. Broadcast: The selected clients download the current model weights and training program (assuming training is not being done from scratch).
  3. Client computation: Each selected device locally computes an update to the model parameters.
  4. Aggregation: The central server collects all the model updates from the devices and aggregates them.
  5. Model update: The server locally updates the shared model based on the aggregated update.

Steps 2-5 are repeated until our model has converged.

[Figure: one round of federated learning. The server broadcasts the model, selected clients train locally, and the server aggregates their updates into a new global model.]
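
To make the loop concrete, here is a minimal, framework-free Python sketch of federated averaging (FedAvg), the aggregation scheme proposed in McMahan et al.’s paper [1]: each client fits a toy linear model locally, and the server averages the returned weights, weighted by local dataset size. The data, model, and hyperparameters are all invented for illustration.

```python
# A minimal sketch of the FedAvg loop described above. The client data,
# the linear model, and the round/epoch counts are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Step 0: fake local datasets; in real FL these never leave each client.
clients = [
    {"X": rng.normal(size=(n, 3)), "y": rng.normal(size=n)}
    for n in (50, 80, 120)
]

global_weights = np.zeros(3)  # the shared global model

def local_update(weights, X, y, lr=0.01, epochs=5):
    """Step 3: each client runs a few steps of gradient descent locally."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of local MSE loss
        w -= lr * grad
    return w

for round_num in range(10):
    # Step 1: sample a subset of eligible clients (here, simply all of them).
    selected = clients
    # Steps 2-3: broadcast the current weights; clients compute updates locally.
    updates = [local_update(global_weights, c["X"], c["y"]) for c in selected]
    sizes = [len(c["y"]) for c in selected]
    # Steps 4-5: aggregate the updates, weighted by local dataset size,
    # and replace the global model with the average.
    global_weights = np.average(updates, axis=0, weights=sizes)

print("global weights after 10 rounds:", global_weights)
```

Weighting by dataset size is what makes this federated averaging: clients holding more data pull the global model further towards their local solution.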

Instead of the orchestrator being able to see everyone’s data, it just receives each user’s update to the model parameters. The user data never leaves the device, yet the orchestrator can still arrive at the same trained model. Typically, FL is also combined with one or more other Privacy Enhancing Technologies (PETs) to ensure that the underlying data cannot be extracted from the model updates themselves.
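
As one illustration of what such a PET layer can look like, below is a hedged sketch in the spirit of differential privacy: each client clips its update and adds Gaussian noise before sending it to the server. The clip norm and noise scale are arbitrary placeholders, not recommended settings; real deployments calibrate them to a privacy budget, and secure aggregation is another common complement.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip and noise a client's model update before it leaves the device."""
    rng = rng or np.random.default_rng()
    # Bound any single client's influence by clipping the update's L2 norm.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Add noise so the exact raw update cannot be recovered by the server.
    return clipped + rng.normal(scale=noise_std, size=update.shape)

noisy = privatize_update(np.array([0.4, -1.3, 0.7]))
```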

This method is now in use by both Google and Apple for a variety of different use cases such as speech recognition and language modelling and is termed Cross-device FL. Applying this same protocol to a smaller number of more reliable clients/organisations is termed Cross-silo FL and is starting to get some traction across industries.

Why aren’t we all using FL yet?

There are quite a few reasons, aside from a lack of regulatory requirements, for why federated learning has not been widely adopted by developers, including:

  • Education: Concepts emerging from research or research arms of tech organisations often take time to permeate to the rest of the market. There is still some educational work that needs to be done to convince data owners of the efficacy of FL as a technique to help with data protection requirements.

  • Point-Solution Development: Several of the existing platforms for federated learning were developed to be fit-for-purpose or for a specific Big Tech use case rather than general purpose use at scale. Frameworks for general purpose use are just emerging in the market.

  • Ease-of-Adoption: Getting up and running has historically required developers to deeply understand federated learning and other privacy-preserving techniques, and to retro-fit their data operations accordingly. A lack of commoditised federated learning platforms also meant that developers needed intrinsic motivation to research and understand the concept before connecting to and using data.

The good news is we are making headway on attacking these challenges as an industry. One of the key steps is the availability of open frameworks for federated learning use cases. Here are a few of the common available platforms and the sectors they focus on supporting:

Framework                   | Highlighted Sectors/Use Cases
----------------------------|--------------------------------------------------------------
OpenMined - PySyft + PyGrid | Academic Research; Public Sector Research; Online Safety & Transparency; Telco Collaboration
Flower                      | Healthcare; Speech Models; Cross-Device
Bitfount                    | Healthcare & Life Sciences; Public Sector applications; Financial Services; Academic Research; Data consortia

[Disclaimer]: I am currently an ML Engineer @ Bitfount.

Each of these frameworks has its pros and cons; however, the most exciting thing about them is that developers can easily adapt them, with the help of accompanying tutorials, to the use case they are trying to build for. For the first time, easy-to-deploy federated learning is at our fingertips, liberating us from architectural woes so we can focus on developing best-in-class models!
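
To give a flavour of how approachable these frameworks have become, here is a rough sketch of a client in Flower, based on its NumPyClient interface around version 1.0. The method signatures have shifted between releases and the model here is a stand-in, so treat this as an outline to check against the current documentation rather than a drop-in implementation.

```python
import flwr as fl
import numpy as np

class SketchClient(fl.client.NumPyClient):
    """A stand-in client; swap the zeros for a real model's weights."""

    def get_parameters(self, config):
        # Return the current local model weights as a list of NumPy arrays.
        return [np.zeros(3)]

    def fit(self, parameters, config):
        # Train locally starting from the broadcast parameters, then return
        # (updated weights, number of local training examples, metrics).
        return parameters, 1, {}

    def evaluate(self, parameters, config):
        # Return (loss, number of local evaluation examples, metrics).
        return 0.0, 1, {}

# Connect to a Flower server assumed to be running locally on port 8080.
fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=SketchClient())
```

A matching orchestrator can be launched with fl.server.start_server, and everything else, from client selection to aggregation, is handled by the framework.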

Additional Resources

If you are interested in a more comprehensive assessment of FL, take a look at Advances and Open Problems in Federated Learning [2]. Or, to get up to date with the most recent research, follow Awesome-Federated-Machine-Learning on GitHub.

References

  1. McMahan, H. B., Moore, E., Ramage, D., Hampson, S., & Agüera y Arcas, B. (2016). Communication-Efficient Learning of Deep Networks from Decentralized Data.
  2. Kairouz, P., et al. (2019). Advances and Open Problems in Federated Learning.

