Data is the new oil, but how we refine data matters more than the data itself. CK Ong, Director of Data Product at GSK, discussed his company’s developmental data initiative, which aims to accelerate drug discovery by bringing together and standardising data across platforms. Ultimately, this approach upholds the FAIR data principles: findable, accessible, interoperable, and reusable.
Ong suggested that data plays an interconnected role in mitigating attrition rates in drug discovery, and he argued that there is a need for better target identification through data-driven approaches. The data landscape is constantly changing: new data, new modalities, new data sources, new data production, and new prediction models all add to the challenges.
So how can we achieve FAIR data? Making data findable requires publishing data or datasets in a central repository that is easily accessible to scientists. It is also important for clinical datasets to be reused in different contexts, so that value can be harvested time and time again. Ensuring data integration across formats and domains is also key.
Ong also advocated for a shift from technology- or platform-centric thinking to treating data as products. He suggested that this change in perspective enables modular, standardised, and reusable data blocks for diverse use cases. Furthermore, data generation must be intentional; scientists should ask themselves why they want to generate data and what it will be used for. Ong emphasised that more data beats clever algorithms, but better data beats more data.
GSK’s data fabric initiative aims to unify and streamline access to data across GSK’s development landscape. It sits between data sources and consumers, handling transformation, security, and privacy. The data fabric uses data mesh principles, including decentralised architecture, domain autonomy, and federated governance.
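To make the idea of a layer sitting between sources and consumers more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of GSK's implementation: the names `DataProduct`, `DataFabric`, `register`, and `read`, the role-based access check, and the sample record are all hypothetical. Each domain registers its own data product with its own standardisation step (domain autonomy), while the fabric applies the same access and privacy rules to every read (federated governance).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, object]


@dataclass
class DataProduct:
    """A domain-owned dataset exposed through the fabric (hypothetical shape)."""
    name: str
    domain: str
    source: Callable[[], Iterable[Record]]  # how the owning domain supplies raw records
    transform: Callable[[Record], Record]   # the domain's standardisation step
    private_fields: Set[str]                # fields withheld from consumers
    allowed_roles: Set[str]                 # roles permitted to read this product


class DataFabric:
    """Sits between data sources and consumers; applies shared rules on every read."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def read(self, name: str, role: str) -> List[Record]:
        product = self._products[name]
        # Security: reject consumers whose role is not allowed.
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        result = []
        for raw in product.source():
            record = product.transform(raw)  # Transformation: standardise the record.
            # Privacy: drop fields marked private before handing data to the consumer.
            result.append(
                {k: v for k, v in record.items() if k not in product.private_fields}
            )
        return result


# Hypothetical example data and standardisation logic.
def assay_source() -> List[Record]:
    return [{"Tissue": "LIVER ", "patient_id": "P001", "value": 1.5}]


def standardise(record: Record) -> Record:
    return {
        "tissue": str(record["Tissue"]).strip().lower(),
        "patient_id": record["patient_id"],
        "value": record["value"],
    }


fabric = DataFabric()
fabric.register(
    DataProduct(
        name="assay_results",
        domain="biology",
        source=assay_source,
        transform=standardise,
        private_fields={"patient_id"},
        allowed_roles={"scientist"},
    )
)
rows = fabric.read("assay_results", role="scientist")
print(rows)
```

In this sketch the consumer never touches the raw source: it asks the fabric for a named product and receives standardised records with private fields removed. A real deployment would replace the in-memory registry and callables with actual storage, identity, and policy systems.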
To finish off his presentation, Ong pointed to the importance of governance and implementation. Governance ensures interoperability, compliance, and quality across different data domains. Master reference data, such as tissue types and milestone types, are central to consistency. However, success ultimately depends more on stakeholder engagement and communication than on technology alone.