NewtonX has previously written about the problematic dichotomy between data privacy and the utility of that data for AI applications. This tension is particularly apparent in the healthcare sector, where protecting patient data is paramount, but AI in the sector has the potential to find new cures, improve diagnosis, prevent illness, and even improve care for elderly and at-risk populations. Now, however, according to multiple AI in healthcare researchers interviewed by NewtonX, a new kind of machine learning could erase the gap between privacy and training data.
Deep learning algorithms used for drug discovery and diagnosis require an immense amount of data to learn, and the algorithms become more and more accurate the more data they receive. Giving the algorithms enough data (with enough diversity) would require numerous health organizations to partner up and share data. However, in both the US and the UK, citizens have been intensely critical of the idea of hospitals and research institutions centralizing incredibly sensitive data — and then handing it off to tech companies that don’t have a particularly good track record with data privacy.
Because of this, tech giants have been looking for foolproof ways to anonymize training data, so they can work with sensitive datasets without the risk of breaches or de-anonymization.
Enter: Federated Learning
Federated machine learning is a type of distributed ML in which separate models are trained on-site and then combined. The approach first emerged in 2017, when Google used it to train its predictive-text model on text typed by Android users, without ever reading the messages or exporting the data from users’ phones.
For healthcare, federated learning would consist of training algorithms on the data stored on-site at hospitals and research institutions. After the local models are trained, they are sent to a central server (probably owned by a tech company), where they are combined into one master algorithm. The master is then sent back to each hospital or research center, updated with new data acquired over time, and returned to the central server. Because training happens at the hospital, local patient data never touches a tech server: only model parameters, not patient records, leave the building, which makes it far harder for the tech company or a malicious third party to access the raw data.
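The round-trip described above can be sketched with federated averaging, the aggregation scheme Google introduced with federated learning. This is a minimal illustration, not any company's actual pipeline: it assumes each "hospital" fits a simple linear model on its own data, and the server only ever sees model weights, never patient records. The function names and the simulated data are hypothetical.

```python
import numpy as np

def train_local(X, y):
    """Fit a least-squares linear model on one hospital's local data.
    The raw data never leaves this function -- only the weights do."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def federated_average(local_weights, sample_counts):
    """Combine per-site models on the central server by weighted
    averaging (FedAvg): sites with more data count proportionally more."""
    counts = np.asarray(sample_counts, dtype=float)
    stacked = np.stack(local_weights)
    return (stacked * (counts / counts.sum())[:, None]).sum(axis=0)

# Simulate three hospitals whose data share the same underlying signal.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
hospitals = []
for n in (100, 250, 400):          # different amounts of local data
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    hospitals.append((X, y))

# One federated round: local training, then server-side aggregation.
weights = [train_local(X, y) for X, y in hospitals]
counts = [len(y) for _, y in hospitals]
master = federated_average(weights, counts)
print(master)  # close to the true weights [2.0, -1.0]
```

In a real deployment this round would repeat many times, with the master model redistributed to each site for further local training, but the core privacy property is visible even in one round: the server aggregates weights, not data.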
This new approach is promising: IBM Research is using federated learning to advance healthcare applications, while a Google-backed startup called OWKIN is also using it to predict patient survival rates and reactions to certain drugs and treatments. In the Netherlands, the Personal Health Train (PHT) initiative works to connect distributed health data through federated learning by bringing algorithms to hospitals, instead of bringing data to tech companies.
However, the technology is not without its drawbacks.
Why Aren’t All Hospitals and Tech Companies Using Federated Learning All the Time?
Federated learning emerged relatively recently, and it didn’t catch on in the industry until companies started looking for ways to protect the privacy of healthcare data.
Tech companies were reluctant to implement it because federated learning requires standardizing data collection across every participating entity (hospitals and research centers), and it also requires each of these entities to have the on-site infrastructure and personnel to train a machine learning algorithm. Additionally, there’s the risk that combining separate models could produce a master algorithm that performs worse than any of its parts.
Despite these challenges, the expected payoff from AI in healthcare — lower costs, more accurate diagnoses, and predictive medicine — is so great that tech companies, hospitals, and research institutions have strong incentives to overcome them. According to the NewtonX survey, while some hospitals and institutions have already begun to build the necessary infrastructure, there will be a heavy push toward adopting federated learning over the next two years. By 2025 we will see massive advances and applications for AI in healthcare.