As of July 2020, 4.78 billion people in the world own smartphones. The number of IoT devices on the planet reached 26.66 billion at the end of 2019, and every second 127 new IoT devices connect to the web. These smartphones and devices produce trillions of bytes of data each day, and much of it is sensitive.
Can this data live on the edge and be kept completely private, while also bringing value to machine learning?
Before answering that question, let’s review some basics.
What is Federated Learning?
The central idea of federated learning (FL) is that data is born at the edge. An individual device's data is never fully transmitted to the service provider that orchestrates communication between devices. Instead, the service provider briefly captures only minimized, focused updates, from which most of the sensitive information has been stripped, and ML models are trained over decentralized, distributed datasets.
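Training over decentralized datasets by exchanging only focused model updates is the idea behind federated averaging (FedAvg). Below is a minimal single-machine simulation of that idea on a toy linear-regression task; all names, the number of devices, and the hyperparameters are illustrative assumptions, not any production system's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: each "device" holds its own private data shard that never
# leaves it. The true weights we hope to recover are [2.0, -1.0, 0.5].
def make_device_data(n):
    x = rng.normal(size=(n, 3))
    y = x @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)
    return x, y

devices = [make_device_data(int(rng.integers(20, 50))) for _ in range(10)]

def local_update(w, x, y, lr=0.1, epochs=5):
    """Run a few gradient steps on-device; only the updated weights leave."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

w_global = np.zeros(3)
for _ in range(20):  # communication rounds
    # Each device trains locally; the server sees weights, never raw data.
    updates = [(local_update(w_global, x, y), len(y)) for x, y in devices]
    total = sum(n for _, n in updates)
    # Federated averaging: weight each update by its local dataset size.
    w_global = sum(w * n for w, n in updates) / total

print(w_global)  # close to [2.0, -1.0, 0.5]
```

A real deployment would add device sampling, secure aggregation, and compression of the updates on top of this loop.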
FL technology comes in two main settings: cross-device federated learning and cross-silo federated learning:
- Cross-device federated learning environments consist of models trained from decentralized data on people’s mobile devices. You can think of this cross-device setting as consisting of millions, even billions, of devices. Each person has data on their own devices, and you’re trying to collaboratively and collectively train a machine learning model and deploy it back to the devices.
- In cross-silo federated learning environments, instead of mobile devices, we have institutions. These can be banks or hospitals, maybe colleges or schools. Rather than having millions or billions of people, we have tens, hundreds or thousands of institutions.
Federated Learning vs. Traditional Distributed Machine Learning Environments
People often ask how federated learning environments differ from classic machine learning. Because the two environments look similar, people sometimes conflate them and assume that federated learning is nothing but distributed machine learning. But there are important nuances between them, and these root differences are the defining features of federated learning.
Shuffling of Data
In distributed machine learning, the data is centralized. We shuffle the data, partition it however we want, and send different jobs to different workers or machines. The shuffling matters because it ensures that every worker sees a slice of data drawn from the same population-level distribution.
In federated learning, however, we don’t have the luxury of combining the data or shuffling it and then redistributing it among the workers. That’s a critical distinction between federated learning technology and distributed machine learning.
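The contrast fits in a few lines of Python: a "natural" federated partition is highly non-IID because each device reflects one user's behavior, while pooling and shuffling, as in distributed ML, gives every worker the same label mix. The user data here is a deliberately extreme toy assumption:

```python
import random
from collections import Counter

random.seed(1)

# Pretend each of 4 users generates only one class of example,
# so each device's shard is maximally skewed (non-IID).
natural_shards = [[user_label] * 100 for user_label in "abcd"]

# Distributed ML setting: pool everything centrally, shuffle, repartition.
pooled = [label for shard in natural_shards for label in shard]
random.shuffle(pooled)
shuffled_shards = [pooled[i::4] for i in range(4)]

print("natural: ", [dict(Counter(s)) for s in natural_shards])
print("shuffled:", [dict(Counter(s)) for s in shuffled_shards])
```

After shuffling, every shard contains roughly 25 examples of each class; the natural shards each contain only one class. Federated learning must cope with the first situation directly.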
Scale of Paradigms
Another difference stems from the sheer scale of the machine learning paradigm. In distributed machine learning, we can kick off hundreds of jobs, more or less. In federated learning, we are talking about millions or tens of millions of devices, which puts heavy constraints on bandwidth, communication, and statefulness. For instance, in distributed learning a worker can keep state: if we sample that worker again after an hour or a day, we can fetch its state and use it to continue training. With cross-device federated learning, we cannot afford to assume that, because a device sampled once may never be sampled again.
These are the defining characteristics that make the two technologies quite different.
Federated Learning vs. Learning in Decentralized Environments
In fully decentralized learning, there’s no notion of a central service provider or an orchestrator. You have a graph describing relationships or connections between different devices or institutions and they’re working together towards learning a statistic or training a machine learning model. They may talk directly to each other via some communication link.
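A classic primitive in this fully decentralized setting is gossip averaging: each node repeatedly averages its value with its neighbors', and every node converges to the global mean with no coordinator at all. Here is a minimal sketch on a ring graph; the topology, step rule, and node count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# A ring of 6 nodes, each holding one private value; there is no server.
values = rng.uniform(0, 10, size=6)
true_mean = values.mean()

x = values.copy()
for _ in range(200):
    # Gossip step: every node replaces its value with the average of
    # itself and its two ring neighbors (a doubly stochastic mixing step).
    x = (np.roll(x, 1) + x + np.roll(x, -1)) / 3

print(x)  # every entry converges toward the global mean
```

Because the mixing matrix is doubly stochastic, the sum is preserved at every step, so the only fixed point all nodes agree on is the true mean.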
Google was one of the first innovators of this technology, and is already using FL in production. Let’s see how they do that.
From Federated Learning to Federated Analytics
Google uses Federated Analytics, which provides deeper insights into on-device data while protecting user anonymity. This is done by applying data science methods to the analysis of raw data that stays local to users' devices. These techniques can, for example, find an average value over a population of devices, compute a histogram over a closed set, or discover the most frequent items (the head of the distribution) over an open set.
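The first two statistics can be computed without any device revealing its raw values: each device reports only a tiny aggregate-friendly summary. The simulation below sketches a federated mean and a federated histogram over a closed set of bins; it omits the secure aggregation and differential privacy layers a real deployment would add, and all names and distributions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# 1000 simulated devices, each holding a handful of private measurements
# drawn (for this toy example) from a normal distribution around 5.0.
device_values = [
    rng.normal(loc=5.0, scale=2.0, size=int(rng.integers(5, 15)))
    for _ in range(1000)
]

# Federated mean: each device sends only (sum, count); the server
# divides the totals and never sees an individual measurement.
total = sum(v.sum() for v in device_values)
count = sum(v.size for v in device_values)
fed_mean = total / count

# Federated histogram over a closed set of bins [0, 10): each device
# sends its local bin counts, and the server just adds them up.
bins = np.linspace(0, 10, 11)
hist = sum(np.histogram(v, bins=bins)[0] for v in device_values)

print(fed_mean)  # close to 5.0
print(hist)
```

The open-set "head of distribution" case is harder, since the server does not know the candidate items in advance, and is typically solved with dedicated heavy-hitter discovery protocols.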
Federated Learning Applications at Google
Although Federated Learning is a relatively new concept, with the term first appearing in a Google paper written in 2016, Google is already using it in production at a large scale today. The company runs Federated Analytics on approximately half a billion downloads.
More than one billion people use the Google keyboard, Gboard, which has many machine learning models under the hood, especially language models. One such model is the next-word prediction model, which predicts what the next typed word will be. Using federated learning, the Google team working on this model improved its accuracy. Besides the Google keyboard, FL is used in production by the Pixel and Android Messages teams.
In our next article, we will cover how Google deploys FL in production. Meanwhile, if you want to dig into FL research and get a better understanding of this technology, check out this paper: Advances and Open Problems in Federated Learning.
This article covers part of a talk delivered at Stanford University by Peter Kairouz, Research Scientist at Google. The author thanks Peter for the inspiration.