0:00 

The title contains the words "team sport", and my colleagues asked me how many sports metaphors I'm going to use. 

 
0:10 
I promise it's going to be probably the last one. 

 
0:13 
But first I'd like to discuss a little bit the players in this whole setup. 

 
0:19 
Then, yeah, we're going to discuss what we play with: the data. And then at the end we're going to discuss how we play with the data and how we make sure that the rules are actually being kept and managed properly. You might recognise the picture up there. 

 
0:42 
It's from Nature Reviews Drug Discovery, that nice huge map of the drug discovery process. 

 
0:49 
And well, the take home message here is that it's complicated. 

 
0:54 
Even if you look at the top green and blue parts, even those parts are complicated and those parts are in focus in this presentation. 

 
1:05 
But even in those parts, dealing only with lead identification and lead optimisation, every single box up there represents essentially the collaboration of a bunch of different people from different departments. 

 
1:19 
And every arrow means a data transfer from one stage to the other one. 

 
1:25 
So it's fun, because the more arrows and boxes you have, the more issues you face when it comes to transferring the data and facilitating the collaboration. 

 

1:40 
Starting with the players, what does a discovery team look like? 

 
1:43 
Many of you probably know this much better than me, but let's try to structure it in the typical data-information-knowledge-wisdom pyramid. The bottom part is related to all the raw data that you collect: all the compound data, all the assay results, all the predictions that you end up with using your different models and calculation methods. 

 
2:11 
Yes, essentially lots of spreadsheets and lots of well-structured data fields. 

 
2:19 
Then the layer above is related to information. 

 
2:24 
You can think about it as, for example, an SAR analysis, which is fine, but the important part is the top half of this pyramid, where the new ideas, the new design ideas, the new concepts are being fed continuously. 

 
2:43 
And these ideas are being fed by an interdisciplinary team usually. 

 
2:49 
So it's not only one medicinal chemist who's working in the project. 

 
2:53 
It's a collaboration of a medicinal chemist and the assay scientist and whoever else is involved in that project. 

 
3:01 
And it wouldn't move forward without their active collaboration. 

 
3:08 
To go a little bit beyond that, it's not only a collaboration within a single company. 

 
3:12 
Many of us experience in our daily lives that outsourcing is there, and it's not only there, it's going to be there even more. 

 
3:29 
Whichever review article you end up with, you might find a different figure for the size of the CRO market, but it's increasing. 

 
3:36 
It's there. 

 
3:36 
And we got used to the fact that in some cases even the design part is outsourced to someone else. 

 
3:46 
But in the most typical setup, even if design and analysis are kept in house, the make and test parts of the DMTA cycle are typically outsourced to one or two CROs, sometimes multiple CROs. 

 
4:05 
And that immediately creates the problem of how to transfer the data between the different companies and how to make sure that the data is being, well, properly guarded. 

 
4:18 
There's another player in the team. 

 
4:20 
We had a whole session yesterday about AI and how AI participates in drug discovery. 

 
4:30 
Yes, "new player" might be a little bit harsh here. 

 
4:35 
You can think about it as a tool, of course, but that tool needs the very same data sources that any other team member would need in order to be actively involved in the research process. 

 
4:51 
So that tool essentially acts as a player because it would need the data shared with that tool. 

 
4:59 
And it would of course create some results that need to be shared with all the other players in the team. 

 
5:07 
One great example of such data sharing problems is when you think about all the federated learning approaches. 

 
5:16 
Here you have the European IMI MELLODDY project, which wouldn't be feasible without coming up with a proper way to share data between, I think, around a dozen different participants and partner companies who are willing to build a joint model, who are willing to be part of a federated learning experience, but without really compromising proprietary information. 

 
5:47 
So yes, it's not clear text IP of compounds running around in the world to feed that learning experience. 

 
5:58 
So what about data? 

 
6:01 
What kind of data do we share? The usual suspects. 

 
6:05 
If you think about the design part, that is the top half: yeah, we might share targets and certain hypotheses or compound designs. But most frequently it's the bottom parts: compound data, assay results, predictions, whatever. 

 
6:20 
And yes, when we talk about compound data, you share structures, you share IDs, you share the status of those compounds, and in an ideal case you want to see real-time tracking of those compounds and not just be notified by your next e-mail that, by the way, the next batch is ready. 

 
6:42 
When you share ideas then you might want to share a more complex data structure which incorporates parts of publications and 3D visualisations of the target and in some cases docking poses in a way that you can investigate them at the end of the day in 3D when you want to check the results. 

 
7:10 
So it creates a pretty complex set of data types and some of them do not really have any predefined standards when it comes to sharing them. 

 
7:21 
Speaking of standards, yes, it's one thing that we do not have the file and data formats for all these, but every single company also decides on its own normalisation-related business rules, which might or might not match the other companies', the collaborators' business rules. 

 
7:40 
So there is a transformation that has to happen at some point. 
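
As a rough illustration of that transformation step, here is a minimal Python sketch that maps one partner's export conventions onto an internal schema. The field names, the unit conversion and the `PARTNER_TO_INTERNAL` table are all assumptions made up for this example; they are not taken from the talk or from any real system.

```python
# Hypothetical example: convert a partner's row into internal conventions.

def um_to_nm(value):
    """Convert a micromolar value (possibly given as text) to nanomolar."""
    return float(value) * 1000.0

# Partner field name -> (internal field name, conversion function).
# Both sides of this mapping are invented for illustration.
PARTNER_TO_INTERNAL = {
    "Cmpd ID": ("compound_id", str.strip),
    "IC50 (uM)": ("ic50_nm", um_to_nm),
}

def to_internal(row):
    """Apply the mapping rules; fail loudly on fields without a rule."""
    converted = {}
    for key, value in row.items():
        if key not in PARTNER_TO_INTERNAL:
            raise ValueError(f"No mapping rule for partner field {key!r}")
        name, convert = PARTNER_TO_INTERNAL[key]
        converted[name] = convert(value)
    return converted

partner_row = {"Cmpd ID": " ABC-001 ", "IC50 (uM)": "0.25"}
internal_row = to_internal(partner_row)
```

Rejecting unmapped fields, rather than passing them through silently, is one possible design choice here: it makes a mismatch between the two companies' rules visible at the moment of transfer.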

 
7:45 
And, well, yesterday there was a talk about SiLA and how that would be an open format for APIs and how that would enhance automation. 

 
8:00 
But even if we do not have an open format, in the best case we have some predefined proprietary APIs and protocols that are secure enough to transfer that data. 

 
8:14 
And we do not end up with the situation that we see up there, where we have analogue and handwritten data running around, and some proprietary formats that you might not be able to read with another vendor's software, or non-machine-readable, text-based documents. 

 
8:31 
So these are not FAIR. 

 
8:33 
And we've been talking about FAIR for like 15 years now, maybe more. 

 
8:40 
We managed to reach the F and then the FAIR-ish part, which is good enough for these applications, but we need to make sure that it's as good as the requirements of that collaboration. 

 
8:56 
And speaking of collaboration, PwC had a survey among larger European companies, I think 200 of them, about who they share data with, what they share and when. 

 
9:11 
Well, the obvious candidates: they share data with customers and suppliers, they share status data and compounds, and they share it in order to improve customer relations and to accelerate projects. 

 
9:25 
That's fine, that's perfect. 

 
9:27 
But still, even when they do it, data sharing keeps being difficult, even if the intent is there. 

 
9:35 
It requires a complex architecture from the IT department to set governance policies, set guidelines, set guardrails, set everything in place so that people do not get anxious about their data reaching eyes that it should never have reached, or about not being able to guarantee who will have visibility of that data. 

 
10:06 
And yes, security. 

 
10:08 
I intentionally did not bring examples from the pharma industry, but even looking at the past half a year, there were huge headlines with words like "the biggest cyber attack of 2023", leaking a few million users' data out there in the world. 

 
10:29 
Yes, it is a concern. 

 
10:30 
And apparently in the very same survey, it turned out that almost 60% of the companies are heavily investing in cybersecurity. 

 
10:40 
Of course they are, but investing in cybersecurity means that you also need to have applications that can comply with the requirements of that cybersecurity setup. 

 
10:50 
If we look at a little bit more specific examples, let's imagine that you have colleagues working on a project and some med chem partners, some consultants and an external synthesis collaborator. 

 
11:06 
Those colourful parts indicate that you do not want to share all data with everyone. 

 
11:13 
You will now cherry-pick which projects are being shared with which partner and how that is being handled. 

 
11:19 
And at the end of the day, you end up with a matrix. 

 
11:22 
Look at that. 

 
11:23 
So at the end of the day it becomes a nightmare, even when not going into the IT details, into the tiny little field-based access control. 

 
11:36 
Even with that high-level concept, you end up with a massive table where you have to define who has access to what. 

 
11:45 
The admin has to be able to actually manage that table, and we have to ensure that the CRO has visibility only of the assigned compounds and not of everything else within the same project. 

 
11:58 
And the internal collaborators have access only to those projects that they are a member of. 
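
The two rules just described (internal members see their own projects, a CRO sees only the compounds assigned to it) can be sketched in a few lines of Python. This is only an illustration under assumed names: the `User` and `Compound` classes, the role labels and the `can_view` function are hypothetical and do not describe how any particular product implements it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    role: str                 # assumed labels: "internal" or "cro"
    projects: frozenset       # projects the user is a member of

@dataclass(frozen=True)
class Compound:
    compound_id: str
    project: str
    assigned_to: frozenset = frozenset()   # names of CRO users assigned to it

def can_view(user, compound):
    """Deny by default; grant access only when a rule explicitly allows it."""
    if compound.project not in user.projects:
        return False
    if user.role == "internal":
        return True
    if user.role == "cro":
        return user.name in compound.assigned_to
    return False

def visible_compounds(user, compounds):
    """Return the IDs of the compounds this user is allowed to see."""
    return [c.compound_id for c in compounds if can_view(user, c)]

chemist = User("alice", "internal", frozenset({"P1"}))
partner = User("cro_chem", "cro", frozenset({"P1"}))
compounds = [
    Compound("C-1", "P1", frozenset({"cro_chem"})),
    Compound("C-2", "P1"),
    Compound("C-3", "P2"),
]
```

With this data, `visible_compounds(chemist, compounds)` gives `C-1` and `C-2`, while `visible_compounds(partner, compounds)` gives only `C-1`. The big table from the slide becomes a handful of rules evaluated per request instead of something a person has to read.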

 
12:04 
So, yeah, this spreadsheet is not something that you can expose to an end user. 

 
12:11 
Absolutely not. 

 
12:12 
And this spreadsheet is not something that you can easily force on an end user by saying: oh, by the way, please keep in mind that according to that spreadsheet, you shouldn't send that Excel file to someone else. 

 
12:24 
That's never going to be restrictive enough. 

 
12:28 
This spreadsheet has to be implemented in a way that the end users will never see it. 

 
12:35 
They will only see the results. 

 
12:37 
You can see it if you look at two random screenshots of our designer product. 

 
12:44 
They will see a clean interface where they can interact with their design. 

 
12:50 
Here the light window represents the internal members of the project. 

 
12:56 
They have essentially visibility of everything, and the dark window represents the external collaborators, who have limited visibility of limited projects. 

 
13:09 
And within that limited project, a very limited set of data. 

 
13:14 
How limited is that? Certainly the purple and blue parts are being transferred, which contain descriptions and data fields that have to be captured on the CRO side. 

 
13:26 
But none of the orange coloured parts are actually visible outside the internal project team. 
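
A field-level version of the same idea might look like the following Python sketch. The field names and the "shared" / "internal" labels are invented for illustration (they stand in for the purple/blue and orange parts of the screenshots), and any field without a label is treated as internal.

```python
# Hypothetical visibility labels per field; not taken from a real schema.
FIELD_VISIBILITY = {
    "compound_id": "shared",
    "synthesis_description": "shared",
    "measured_yield": "shared",        # to be captured on the CRO side
    "design_hypothesis": "internal",
    "predicted_activity": "internal",
}

def view_for(record, is_internal):
    """Return the part of a record that a given kind of user may see."""
    if is_internal:
        return dict(record)
    # Unknown fields default to "internal", so nothing leaks by omission.
    return {
        key: value
        for key, value in record.items()
        if FIELD_VISIBILITY.get(key, "internal") == "shared"
    }

record = {
    "compound_id": "C-1",
    "synthesis_description": "Two-step route",
    "design_hypothesis": "Fills the back pocket",
    "new_unlabelled_field": "anything",
}
external_view = view_for(record, is_internal=False)
```

The external collaborator gets only `compound_id` and `synthesis_description` here; the filtering happens before anything reaches the interface, so the user never has to think about the rules.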

 
13:34 
And that's the point that all the rules in the spreadsheet on the previous slide are implemented in a way that is essentially invisible to the end user. 

 
13:47 
But still we can guarantee that data is not leaked anywhere. 

 
13:51 
So to summarise in three bullet points: first of all, yes, I started with the words "team sport", but it only means that we need to make sure that we can facilitate a heavy collaboration of both internal and external partners in the project. 

 
14:15 
That is how we can achieve what was called "acceleration" in yesterday's panel discussion: accelerating the DMTA cycle. 

 
14:23 
Yes, that acceleration only works if that heavy team collaboration is not in the way of the people working in that team, but actually helps them to end up with better quality results. 

 
14:40 
The second point is related to the, well, realistic situation that the data standards have to be improved continuously because, well, they're far from ideal at the moment. 

 
14:55 
And by improving on those data standards and on how we share those data fields in a secure way, we can make sure that the fear of sharing sensitive data goes away eventually. 

 
15:15 
And implementing IT data access best practices, well, it should remove those obstacles, and that's the point of it: to implement connected systems in a way that they facilitate increased efficiency. 

 
15:30 
And with that, I'd like to wrap up my presentation and thanks again for your attention. 

 
15:37 
And you are more than welcome to see us at our booth if you have any further questions on the topic. 

 
15:46 
Thank you.