ChatEpisteme: Querying Complex Data with Natural Language Using AI

Patrik Kagelid

Data Engineer

AstraZeneca

19 August, 2025

Watch time: 5 Minutes

Patrik Kagelid, a data engineer at AstraZeneca, presented on using AI chatbots and graph databases to streamline scientists' access to safety data. His work focused on making data queries more efficient by building on the Standard for Exchange of Nonclinical Data (SEND) and integrating several data sources into a unified knowledge graph that supports rapid, evidence-backed insights.

The project arose from the need to answer regulatory questions about safety data quickly. This had previously been slow because of fragmented systems and poor search tools. Adoption of the SEND standard over the past seven years had made data formatting consistent.

A chatbot interface, named ChatEpisteme, was developed as a proof of concept to allow scientists to query 20 years of safety data directly, significantly reducing the time to extract knowledge and linking answers to final reports for validation. The chatbot could quickly answer compound-related questions, identify studies using compounds, quantify findings, and specify affected tissues, tasks that previously required hours of manual report review.

The system drew on more than 3,500 high-quality SEND datasets. Expert users helped test and train the model to generate Cypher queries for the knowledge graph, so scientists could ask questions in natural language without knowing SEND terminology. Data from SEND, efficacy, project, and compound sources was stored in Snowflake and integrated into a Neo4j knowledge graph. OpenAI was used on-premises to translate natural language questions into Cypher queries and return human-readable answers, so no data left the premises.
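The question-to-Cypher flow described above can be illustrated with a minimal Python sketch. This is not the ChatEpisteme implementation: the graph schema, the prompt wording, and the names `build_prompt`, `is_read_only`, and `answer_question` are all hypothetical, and the language model and graph database are passed in as plain callables so that no particular client library is assumed.

```python
import re
from typing import Callable, Dict, List

# Hypothetical schema summary; the real graph model is not described in the talk.
SCHEMA_HINT = (
    "(:Compound {name})-[:TESTED_IN]->(:Study {study_id})"
    "-[:HAS_FINDING]->(:Finding {tissue, severity})"
)

# Clauses that modify a graph. This is a simple keyword check, not a Cypher parser.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def build_prompt(question: str, schema: str = SCHEMA_HINT) -> str:
    """Combine the schema hint and the user's question into one prompt."""
    return (
        "Translate the question into a single read-only Cypher query.\n"
        f"Graph schema: {schema}\n"
        f"Question: {question}\n"
        "Return only the Cypher query."
    )


def is_read_only(cypher: str) -> bool:
    """Return True if the query contains none of the listed write clauses."""
    return _WRITE_CLAUSES.search(cypher) is None


def answer_question(
    question: str,
    translate: Callable[[str], str],
    run_query: Callable[[str], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    """Translate a question to Cypher, guard it, then run it against the graph.

    `translate` stands in for the language model call and `run_query` for the
    graph database session; both are assumptions of this sketch.
    """
    cypher = translate(build_prompt(question)).strip()
    if not is_read_only(cypher):
        raise ValueError("Generated query is not read-only; refusing to run it.")
    return run_query(cypher)
```

Keeping the model and the database behind callables means the flow can be exercised with stubs before any real service is connected, which fits the advice to start small with limited datasets.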

Users were encouraged to treat the chatbot as a search engine: to verify its answers against the reports and to keep in mind that the underlying data is raw. Evidence of reliability was provided, and user education was emphasised to set proper expectations. The tool greatly reduced query time, relieved the data engineering workload, and supported cross-system searches. Future plans included expanding it into a broader research assistant platform that integrates public and internal data, including images, to enhance R&D capabilities.
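The idea of pairing each answer with a pointer to its source report can be sketched as follows. This is an illustrative assumption rather than the actual ChatEpisteme output format: the function name `format_with_evidence` and the row fields (`finding`, `tissue`, `study_id`, `report_ref`) are hypothetical.

```python
from typing import Dict, List


def format_with_evidence(rows: List[Dict[str, str]]) -> str:
    """Render query results as text, one line per finding with its report reference.

    Every line names the study and the report so a scientist can check the
    answer against the source document. Field names are hypothetical.
    """
    if not rows:
        return "No matching findings. Try rephrasing the question."
    lines = [
        f"{row['finding']} in {row['tissue']} "
        f"(study {row['study_id']}, see report {row['report_ref']})"
        for row in rows
    ]
    return "\n".join(lines)
```

Called with a single placeholder row such as `{"finding": "hypertrophy", "tissue": "liver", "study_id": "S-001", "report_ref": "R-001"}`, the function returns one line that states the finding and names the study and report to consult.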

Implementation advice included starting small with limited datasets and involving users early for training and feedback. The interface was kept simple and evidence-based to build trust, and the system was designed to scale and to integrate into the enterprise architecture for future expansion.
