Robin Röhm discusses the application of federated machine learning in pharmaceutical research, focusing on model development across distributed, proprietary datasets.

Apheris and Federated Data Networks
Apheris develops infrastructure for federated data networks in life sciences, enabling models to be trained across datasets that remain physically separate. This is particularly relevant in drug discovery, where data are siloed and highly sensitive. Federated learning lets a model learn from a more diverse pool of data while each owner retains control of its own dataset.
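To make the idea of training across physically separate datasets concrete, the following is a minimal sketch of federated averaging, a widely used federated learning scheme. It is an illustration under stated assumptions, not a description of the Apheris platform: the tiny linear model, the function names (local_update, federated_average, run_rounds) and the two example sites are all hypothetical.

```python
from typing import List, Tuple

Weights = List[float]            # [slope, intercept] of a 1-D linear model
Dataset = List[Tuple[float, float]]


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Train on one site's data with gradient descent on mean squared error.

    Only the resulting weights leave this function; the data never does.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Combine site weights, weighting each site by its number of samples."""
    total = sum(sizes)
    dims = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dims)]


def run_rounds(sites: List[Dataset], rounds: int = 200) -> Weights:
    """Repeat: broadcast global weights, train locally, average the results."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


# Two hypothetical sites whose private data both follow y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0)]
site_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
final_weights = run_rounds([site_a, site_b])
```

In this toy setting the coordinator only ever sees weight vectors, yet the averaged model approaches the relationship shared by both sites. Real deployments add secure transport, access control and much larger models.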

Limitations of Current AI Models
Recent machine learning models post strong benchmark results, but many are trained on public datasets that do not reflect the complexity of proprietary pharmaceutical data, which limits how well they generalise to industrial use cases. Increasing data diversity, particularly through access to private datasets, is therefore critical for improving model performance.

System Architecture and Deployment
The platform is designed for local deployment within pharmaceutical IT environments. Models developed through federated networks can be integrated there and fine-tuned on organisation-specific data, which helps satisfy regulatory and operational constraints.

Data Privacy and Security
Data governance is central in pharmaceutical AI. Federated learning mitigates risk by sharing model updates rather than raw data, with safeguards intended to prevent sensitive information from being inferred from those updates. Maintaining strict control over intellectual property remains essential.
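The summary does not say which safeguards are used. As one illustration only, a common approach borrowed from differentially private training is to clip each model update to a maximum norm and add random noise before it leaves the site. The sketch below assumes that approach; the function names and default parameters are hypothetical, and the code on its own gives no formal privacy guarantee.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm == 0.0 or norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def add_gaussian_noise(update: List[float], stddev: float,
                       rng: random.Random) -> List[float]:
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [v + rng.gauss(0.0, stddev) for v in update]


def protect_update(update: List[float], max_norm: float = 1.0,
                   noise_stddev: float = 0.1,
                   seed: Optional[int] = None) -> List[float]:
    """Clip, then add noise; the result is what a site would share."""
    rng = random.Random(seed)
    clipped = clip_update(update, max_norm)
    return add_gaussian_noise(clipped, noise_stddev, rng)


# A hypothetical raw update with L2 norm 5.0 is bounded to norm 1.0, then noised.
shared = protect_update([3.0, 4.0], max_norm=1.0, noise_stddev=0.1, seed=0)
```

Clipping bounds how much any single site can influence the shared model, and the noise makes it harder to reconstruct individual records from an update. Choosing the norm bound and noise level is a trade-off between protection and model quality.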

Impact on Collaboration
Data-intensive models are reshaping collaboration. Federated approaches enable joint model development without direct data exchange, supporting multi-party participation while preserving competitive boundaries, particularly in pre-competitive research areas.

Near-Term Developments
Over the next 12–18 months, federated networks are expected to expand across new domains, including structural biology and antibody modelling. Integration of newly generated experimental data into iterative training cycles will further support continuous model improvement.

Future Directions
The field is likely to shift toward domain-specific models trained on more diverse datasets, alongside increased use of active learning strategies. Federated learning is expected to play a key role in enabling broader data access while maintaining governance constraints.
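Active learning, mentioned above, means letting the model help decide which data to collect next. A minimal sketch of one common strategy, selecting the candidates on which an ensemble of models disagrees most, is shown below. The function name, the toy ensemble and the candidate values are hypothetical and chosen only for illustration; the source does not specify which strategy is meant.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def select_most_uncertain(candidates: Sequence[float],
                          ensemble: Sequence[Model],
                          batch_size: int) -> List[float]:
    """Return the candidates with the highest prediction variance.

    High variance across ensemble members is used as a proxy for model
    uncertainty; ties are broken by original candidate order.
    """
    scored = []
    for index, candidate in enumerate(candidates):
        predictions = [model(candidate) for model in ensemble]
        scored.append((pvariance(predictions), index))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [candidates[index] for _, index in scored[:batch_size]]


# Toy ensemble: three models that disagree more as the input grows.
ensemble = [lambda x: 0.9 * x, lambda x: 1.0 * x, lambda x: 1.1 * x]
candidates = [0.5, 3.0, 1.0, 2.0]
next_experiments = select_most_uncertain(candidates, ensemble, batch_size=2)
```

In an iterative cycle, the selected candidates would be measured experimentally, added to the training data, and the models retrained before the next selection round.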