Patrik Kagelid, a data engineer at AstraZeneca, presented on using AI chatbots and graph databases to streamline scientists' access to safety data. His work focused on making data querying more efficient by building on the Standard for Exchange of Nonclinical Data (SEND) and integrating several data sources into a unified knowledge graph that supports rapid, evidence-backed insights.

The project arose from the need to answer regulatory questions about safety data quickly. Previously this took a long time because systems were fragmented and search tools were poor. Adoption of the SEND standard over the past seven years had made data formatting consistent.

A chatbot interface, named ChatEpisteme, was developed as a proof of concept to allow scientists to query 20 years of safety data directly, significantly reducing the time to extract knowledge and linking answers to final reports for validation. The chatbot could quickly answer compound-related questions, identify studies using compounds, quantify findings, and specify affected tissues, tasks that previously required hours of manual report review. 
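The question types listed above (which studies used a compound, how many findings were recorded, which tissues were affected) can be illustrated with a toy in-memory sketch in which every answer carries the identifiers of the reports it came from. This is not ChatEpisteme's implementation: the `Finding` record, its field names, the `summarise_compound` helper, and all sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One recorded finding. All fields are hypothetical, not SEND variables."""
    study_id: str
    compound: str
    tissue: str
    report_id: str  # identifier of the final report used to verify the answer


# Invented sample records, for illustration only.
FINDINGS = [
    Finding("S-001", "CMPD-A", "liver", "RPT-001"),
    Finding("S-001", "CMPD-A", "kidney", "RPT-001"),
    Finding("S-002", "CMPD-A", "liver", "RPT-002"),
    Finding("S-003", "CMPD-B", "heart", "RPT-003"),
]


def summarise_compound(compound, findings=FINDINGS):
    """Answer the compound-related questions and keep the supporting reports."""
    hits = [f for f in findings if f.compound == compound]
    return {
        "studies": sorted({f.study_id for f in hits}),    # studies using the compound
        "finding_count": len(hits),                       # number of findings
        "tissues": sorted({f.tissue for f in hits}),      # affected tissues
        "reports": sorted({f.report_id for f in hits}),   # evidence for validation
    }


if __name__ == "__main__":
    print(summarise_compound("CMPD-A"))
```

Returning the report identifiers alongside the answer is what allows a scientist to check each claim against the source document instead of taking the summary on trust.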

The system drew on more than 3,500 high-quality SEND datasets. Expert users were involved in testing and in training the model to generate Cypher queries for the knowledge graph, so that scientists could ask questions in natural language without needing SEND terminology. Data from SEND, efficacy, project, and compound sources were stored in Snowflake and integrated into a Neo4j knowledge graph. An OpenAI model, used on-premises, translated natural language questions into Cypher queries and returned human-readable answers without data leaving the premises.
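The translation step described above can be sketched roughly as follows. The talk did not give the graph schema, the prompt, or the model interface, so everything here is an assumption: the `SCHEMA` text, the node labels and relationship names, the `fake_llm` stand-in for the on-premises model, and the helper names are all hypothetical. The read-only check is an extra safeguard added for illustration, not something the talk described, and it is deliberately conservative rather than complete (it does not inspect procedure calls, for example).

```python
import re

# Hypothetical schema description passed to the model; the real labels,
# properties, and relationships were not given in the talk.
SCHEMA = """
(:Compound {name})-[:TESTED_IN]->(:Study {study_id, report_id})
(:Study)-[:HAS_FINDING]->(:Finding {description})-[:AFFECTS]->(:Tissue {name})
"""

# Cypher clauses that modify the graph. Matching is conservative: a query
# that merely mentions one of these words is rejected as well.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def build_prompt(question: str) -> str:
    """Combine the schema and the user's question into a single prompt."""
    return (
        "Translate the question into a single read-only Cypher query.\n"
        f"Graph schema:\n{SCHEMA}\n"
        f"Question: {question}\nCypher:"
    )


def is_read_only(cypher: str) -> bool:
    """Return True if the query contains none of the listed write clauses."""
    return WRITE_CLAUSES.search(cypher) is None


def fake_llm(prompt: str) -> str:
    """Stand-in for the on-premises model; always returns a canned query."""
    return (
        "MATCH (c:Compound {name: $compound})-[:TESTED_IN]->(s:Study) "
        "RETURN s.study_id AS study, s.report_id AS report"
    )


def question_to_cypher(question: str, llm=fake_llm) -> str:
    """Translate a question to Cypher and refuse anything that is not read-only."""
    cypher = llm(build_prompt(question)).strip()
    if not is_read_only(cypher):
        raise ValueError("generated query is not read-only")
    return cypher


if __name__ == "__main__":
    print(question_to_cypher("Which studies used compound X?"))
```

In a real deployment the returned string would be passed to the graph database with the compound name supplied as a query parameter, and the rows, including the report identifiers, would be turned back into a human-readable answer.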

Users were encouraged to treat the chatbot as a search engine: to verify its answers against the reports and to keep in mind that the underlying data is raw. Evidence of the tool's reliability was provided, and user education was emphasised to set proper expectations. The tool greatly reduced query time, relieved the data engineering workload, and supported searches across systems. Future plans included expanding it into a broader research assistant platform that integrates public and internal data, including images, to enhance R&D capabilities.

Implementation advice included starting small with a limited set of datasets and involving users early for training and feedback. The interface was kept simple and evidence-based to build trust, and the system was designed to scale and to integrate into the enterprise architecture for future expansion.