0:00 

So just as a reminder, let's be on time. 

 
0:04 
Questions are the best thing. 

 
0:05 
So let's take advantage of that. 

 
0:08 
I'm very happy to introduce you to Lykke Pedersen from Copenhagen. 

 
0:14 
And Lykke was awarded her PhD in Biophysics from the University of Copenhagen and conducted research at the Niels Bohr Institute. 

 
0:23 
Her experience as a bioinformatician coalesced at Roche, where Lykke worked as a lead drug designer across multiple projects, with drug candidates entering clinical trials. 

 
0:33 
She's an inventor on more than 40 patents in the field of RNA therapeutics. 

 
0:38 
We definitely got to talk today. 

 
0:42 
She works as head of RNA therapeutics at Abzu. 

 
0:46 
Head or CEO? 

 
0:47 
Sorry, head or CEO? 

 
0:48 
Yeah, she found out. 

 
0:51 
Chief Pharma Officer. 

 
0:52 
Apologies. 

 
0:53 
Very good. 

 
0:55 
She's still working in the emerging field of RNA therapeutics. 

 
0:58 
Today she helps discover medicines for previously untreatable diseases like Alzheimer's. 

 
1:03 
Thank you. 

 
1:05 
Various cancers and neurodegenerative diseases. 

 
1:07 
She's also passionate about women in tech, women in business and et cetera. 

 
1:12 
I guess her philosophy centres around encouraging women to be their absolute best. 

 
1:18 
I hope I can be encouraged as well, even being a man. 

 
1:23 
Thank you. 

 
1:23 
OK, so you've got until 12. 

 
1:28 
So we're very much looking forward to the talk. 

 
1:31 
Please. 

 
1:32 
Thank you and thank you for joining this talk. 

 
1:35 
I can see some familiar faces, which is nice. 

 
1:38 
It's always nice. 

 
1:40 
Yeah. 

 
1:40 
I represent Abzu. 

 
1:41 
We are an AI and machine learning company. 

 
1:46 
So first and foremost, I'll try to explain what explainable AI is, just to get us all on the same level, so you know what it is. 

 
1:57 
Then I'll have an example of explainable AI applied to siRNA. 

 
2:02 
And lastly, a bit about disease understanding. 

 
2:05 
So let's see if this works here. 

 
2:12 
We as humans, we need to understand what the decisions we make are based on. 

 
2:18 
That's a big thing to say, but let's scale it back a bit. 

 
2:25 
For example, if I sit around the campfire with my kids and I tell them do not touch the fire, I think all of you know that feeling. 

 
2:34 
I just need to see, can I put my fingers through without getting burned? 

 
2:38 
Could I? 

 
2:39 
How far can I go? 

 
2:41 
With all of this, you might actually end up being burned. 

 
2:45 
You might also end up not being burned, because I told you not to go too far, and my kids would take that as guidelines. 

 
2:55 
The same would go for scientists. 

 
2:58 
We get guidelines, we learn from each other, but we also test that. 

 
3:05 
We don't take everything for granted. 

 
3:07 
We want to test it. I actually used ChatGPT for this, typing in "what is science?", and one of the things it put out is that scientists want to falsify. 

 
3:20 
So going back to we don't trust everything. 

 
3:24 
So of course, I could give you a very simple thing around RNA therapeutics. 

 
3:28 
Well, you just need Watson-Crick base pairing. 

 
3:32 
Everything will work. 

 
3:33 
Yes, it needs to have Watson-Crick base pairing, but that's not the whole truth here, because there are so many different things that we don't know. 

 
3:44 
And there's also things that we do know. 

 
3:46 
And first and foremost, we could test this, but we also want to know why. Going back to the campfire: why? 

 
3:55 
Why shouldn't I go close to the fire? 

 
3:57 
Otherwise you will get burned. 

 
4:00 
So that's actually the essence of explainable AI: you get an explanation, and you can have more trust in what comes out of your models. 

 
4:12 
So again, why the why in this? 

 
4:17 
For example, if I'm a patient and I'm told, well, you have a 10% risk of developing this disease, I'd think: great. 

 
4:24 
Only 10%. 

 
4:25 
But if I was told, yeah, that's because you're blonde and 175 cm tall, I would say: no, I don't trust that model. 

 
4:33 
So we need to have that why question to build the trust, but also to know why. 

 
4:39 
So for example, why is a drug being toxic to the liver? 

 
4:42 
If we know the why, we might also be able to mitigate that risk. 

 
4:47 
Or: why does Alzheimer's develop? 

 
4:50 
If we know why, we might also be able to actually find a treatment for it. 

 
4:56 
And as scientists, we use data to answer these why questions. 

 
5:01 
And you can say the answers are the relationships within the data. 

 
5:05 
And let me see if this plays. 

 
5:07 
Ah, it does. 

 
5:09 
So data, when we talk about machine learning at Abzu, is tabular data. 

 
5:14 
So for example, you can imagine you could have patients in the rows, and you could have measured various things for these patients. 

 
5:22 
It could be RNA expression, it could be more real-world data, and so on. 

 
5:29 
If we talk about drugs, what's measured could be activity, and then we could describe these drugs in various ways, and then there are the relationships that we go about finding in these data. 

 
5:44 
That is what we call the models, and what we want to predict is the outcome. 

 
5:49 
So for a patient, that would be: am I sick or am I healthy? 

 
5:54 
For a drug, it could be: is it active or is it inactive? 
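This tabular picture can be sketched in plain Python: rows are samples (patients or drugs), columns are measured features, and one column is the outcome to predict. All names and values below are invented for illustration, not Abzu's actual schema.

```python
# Rows = samples (patients here), columns = features, plus the outcome
# we want to predict. Everything below is made-up illustrative data.
rows = [
    {"RNA_expr": 5.2, "age": 61, "mutation": 1, "outcome": "sick"},
    {"RNA_expr": 1.1, "age": 34, "mutation": 0, "outcome": "healthy"},
    {"RNA_expr": 4.8, "age": 58, "mutation": 1, "outcome": "sick"},
]

# Split into the feature table (the model's input) and the outcome labels.
features = [{k: v for k, v in r.items() if k != "outcome"} for r in rows]
labels = [r["outcome"] for r in rows]
print(labels)  # ['sick', 'healthy', 'sick']
```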

 
5:57 
And with regular, or I would rather call them default, machine learning methods. 

 
6:04 
It's often a black box. 

 
6:05 
What's behind the models? 

 
6:06 
It can be very tricky to figure out what these underlying relationships actually are. 

 
6:13 
So it is really a black box. 

 
6:16 
You get this: there might be a 10% risk of developing a disease. 

 
6:20 
There might be a 90% risk that this drug is toxic. 

 
6:24 
But why is it so? 

 
6:27 
It could be that there's one feature that is important. 

 
6:30 
We don't know. But with explainable AI and the algorithm that we have developed at Abzu, 

 
6:38 
We can actually look at all these features and see which ones might be explanatory and actually important for what we're predicting. 

 
6:47 
One thing could be age; there could be several features. 

 
6:52 
There could be thousands, there could be millions. 

 
6:54 
But what our algorithm can do is that it can select maybe only a handful, actually down to one, if there's only a single feature that is actually descriptive. 

 
7:04 
And then it can combine them to predict a certain, in this case, vital status for a patient. 
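The point that an uninformative feature "will simply not be picked up" can be illustrated with a toy selection sketch. Plain Pearson correlation stands in here as the informativeness score; Abzu's actual algorithm is of course different and far more capable, and all data below is invented.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation, used here as a crude informativeness score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: one informative feature, one pure noise.
age     = [34, 61, 58, 45, 70, 29]
shoe_sz = [42, 41, 43, 40, 42, 44]
outcome = [0, 1, 1, 0, 1, 0]   # vital status

scores = {name: abs(pearson(vals, outcome))
          for name, vals in [("age", age), ("shoe_size", shoe_sz)]}
best = max(scores, key=scores.get)
print(best)  # 'age' -- the uninformative feature is simply not picked up
```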

 
7:12 
What you can also see here is it can handle various data. 

 
7:16 
It could be RNA, it could be protein, it could be age, and it could be mutation. 

 
7:21 
It doesn't matter. 

 
7:22 
And when we come to, for example, sequences, it can also handle As, Cs, Ts, and Gs. 

 
7:27 
It doesn't need to be a number. 

 
7:30 
So this was for a patient. 

 
7:32 
Let's maybe go down to the therapeutics and the actual drugs. 

 
7:37 
So as I mentioned before, the data could be that you have the drugs in the rows, and in the columns you have various descriptors. 

 
7:45 
So we don't put the actual sequence in here. 

 
7:48 
Then you would need a million compounds or more. 

 
7:51 
And at least back in my days at Roche, if I told the chemists to go and synthesise and screen a million compounds, 

 
7:58 
They would run away screaming. 

 
8:01 
So you don't need that. 

 
8:03 
You only need maybe a few hundred, down to 50 if you are very lucky, using our algorithm. 

 
8:10 
So you need to describe these molecules. 

 
8:12 
It could for example be the GC percentage, it could be the duplex energy, it could be anything you could imagine. 

 
8:19 
Because the neat thing about this algorithm is that if a descriptor, a way of describing a molecule, doesn't have any information about the property that you try to predict, it will simply not be picked up. 

 
8:33 
So it's very easy to test hypotheses in that sense. 

 
8:36 
So if you have any kind of idea, like: I think this certain feature, this way of describing the molecules, is important. 

 
8:44 
If you're wrong, the algorithm will tell you. 

 
8:47 
The other thing is this. 

 
8:50 
Yeah, don't look too much at this; it's just our way of writing up a model. 

 
8:56 
But again, in this case it has taken all these descriptors and figured out that the duplex energy and a match in mice are very important. 

 
9:04 
I need to say here, this is a very thought up example. 

 
9:07 
Don't take pictures and say this is the holy grail for the compounds. 

 
9:11 
It's a thought up example. 

 
9:13 
But what you can do from this is that you can luckily enough translate this model into something more human readable and human understandable. 

 
9:23 
From this model we would deduce that compounds are active if they have an optimal duplex energy and have a match in mice. 

 
9:29 
What you can also do is, for example with the duplex energy: in this picture over here, I have plotted it against the probability of activity. 

 
9:39 
So down here, you would have inactive compounds and active compounds. 

 
9:43 
And then as you increase the duplex energy, if you only had compounds like the red dots here, the model, the blue line, would tell you: just increase the duplex energy and you're fine. 

 
9:54 
But you could also go and say, ah, there's not that much of information up here. 

 
10:00 
Let's try and test these ones. 

 
10:02 
And suddenly the model will tell you, no, it's not just about increasing. 

 
10:06 
It's actually something about that you have an optimal duplex energy. 

 
10:10 
There's a certain range of duplex energy that's good. 

 
10:13 
That's also part of the explainability here that you can get that kind of information out of the models. 

 
10:19 
So all of this was thought-up examples, a bit of an introduction to explainable AI. 

 
10:25 
So how do we then actually use this? 

 
10:27 
How can we apply our algorithm to real data? 

 
10:33 
And for this I have taken with me an example of siRNA. 

 
10:37 
So we actually don't care about the modality per se: whether it is an ASO, a gapmer, a steric blocker, or an siRNA, as long as it's short and either single- or double-stranded, we can handle it. 

 
10:52 
So in this case, again, it's a very old data set from 2005, the Huesken dataset, some people might know it by that name. 

 
11:02 
They measured 2,431 siRNA is targeting very different targets. 

 
11:07 
And I know it's a bit of an artificial data set, but it can to show you how we can use it. 

 
11:13 
And I then I have a real world example at the end. 

 
11:18 
So here we got the antisense base sequence. 

 
11:25 
And we got the sense here and it's pure RNA from the paper. 

 
11:29 
We get the inhibition, we then decide to classify. 

 
11:32 
So if the percent inhibition is below 50, we say that the siRNA is inactive; if it's above 80, we say it's active. 
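That labelling scheme, below 50% inhibition is inactive, above 80% is active, and the ambiguous middle band is left out, can be written directly:

```python
def classify_inhibition(pct):
    """Label scheme from the talk: <50% inhibition = inactive,
    >80% = active; anything in between is excluded from the labels."""
    if pct < 50:
        return "inactive"
    if pct > 80:
        return "active"
    return None  # ambiguous middle band, left out

print([classify_inhibition(p) for p in (30, 65, 92)])
# ['inactive', None, 'active']
```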

 
11:40 
And then back to how can we then describe these molecules? 

 
11:44 
As I said before, we can handle letters. 

 
11:47 
So it could be the first dimer, it could be the number of TC dimers, and many more features. 
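A few of the letter-based descriptors mentioned, the first dimer, a dimer count, plus the GC percentage from earlier, are easy to sketch. These are simple illustrative implementations, not Abzu's actual descriptor code:

```python
def first_dimer(seq):
    """First two bases of the sequence."""
    return seq[:2].upper()

def dimer_count(seq, dimer):
    """How many times a given dimer occurs (counting overlaps)."""
    seq, dimer = seq.upper(), dimer.upper()
    return sum(seq[i:i + 2] == dimer for i in range(len(seq) - 1))

def gc_percent(seq):
    """GC content in percent, another classic descriptor."""
    seq = seq.upper()
    return 100.0 * sum(b in "GC" for b in seq) / len(seq)

s = "UCGGAUCGUC"  # made-up toy sequence
print(first_dimer(s), dimer_count(s, "UC"), gc_percent(s))  # UC 3 60.0
```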

 
11:53 
And we keep on developing and learning more about describing these molecules. 

 
11:58 
Right now we have more than 200. 

 
12:00 
I think we are actually up at 400 now. 

 
12:03 
The slide just hasn't been updated with all the various descriptors. 

 
12:08 
So we're going from all of these different ways of describing the molecules. 

 
12:12 
And it's not just on the molecules. 

 
12:15 
I think it was mentioned in an earlier talk this morning: it's also on the target itself. 

 
12:21 
For example, something about how the RNA is folded up. 

 
12:25 
So whether the site on the RNA is accessible or inaccessible, that's also considered. 

 
12:30 
And we also, for example, consider is it very well conserved across many species or not. 

 
12:37 
So it's not just the molecules, it's also the target. 

 
12:42 
So we put in all these data and then what we get out is that the algorithm picks up three different features that are informative here. 

 
12:52 
And over here you can see that we have 90% accuracy. 

 
12:56 
That means in 90% of the cases we are right. 

 
13:00 
So that's super nice. 

 
13:03 
But again, it's only pure, unmodified RNA. 

 
13:05 
And then you can say to me, well, yeah, now you're just showing a model. 

 
13:08 
You're doing exactly like the black box. 

 
13:10 
You give me a number, but I also give you a bit of an insight here. 

 
13:15 
But wait. 

 
13:16 
Because what the algorithm does, it assigns the weights to each of the dimers. 

 
13:20 
So it picked up on the last and the first dimer in the siRNA duplex. 

 
13:25 
And it assigns a high weight to AU-rich dimers at the beginning and a high weight to GC-rich dimers towards the end. 

 
13:34 
And if we just think about binding that kind of corresponds to siRNAs that are loosely binding in the beginning and tightly binding in the end, that would be active. 

 
13:44 
And vice versa: if you're tightly bound at the beginning and loosely bound at the end, you would be inactive. 
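The weight pattern just described, AU-rich (loosely binding) at the beginning and GC-rich (tightly binding) at the end scoring as active-like, can be caricatured as a simple score. The weights here are invented; the model's actual coefficients are not shown in the talk.

```python
# Toy illustration of the dimer-weight pattern: count AU-type bases in the
# first dimer and GC-type bases in the last dimer. Higher = more "active-like".
AU = set("AU")
GC = set("GC")

def asymmetry_score(antisense):
    first, last = antisense[:2], antisense[-2:]
    score = sum(b in AU for b in first)   # loose binding at the start
    score += sum(b in GC for b in last)   # tight binding at the end
    return score  # 0..4

print(asymmetry_score("AUGGCAUCGAUGGC"))  # 4 (active-like pattern)
print(asymmetry_score("GCAUCGAUGGCAAU"))  # 0 (the reversed, inactive-like pattern)
```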

 
13:49 
And that has also been published. 

 
13:50 
So for us that was nice: our algorithm also found what has been published. 

 
13:57 
Then there was the third feature being picked up, which is the target binding energy. 

 
14:02 
So that's the predicted binding energy between the antisense strand and the RNA target. 

 
14:09 
And here the axis goes from very tight to loose. 

 
14:10 
And over here we have the probability of the siRNA being active. 

 
14:13 
So very likely to be active up here. 

 
14:19 
And then this is what the model would predict for different dimers. 

 
14:23 
In this case, I've zoomed in on an siRNA that actually has this reversed pattern. 

 
14:28 
So we would say from a binding perspective that it shouldn't work, because it's tight at the beginning and loose towards the end. 

 
14:36 
But what we can find from our model is that it actually depends on what your last dimer is. 

 
14:41 
If you have CU or UG, you actually lie up here, where you would be assigned as active. 

 
14:48 
So it's not just binding energy, it's also the dimers. 

 
14:52 
And then from a data science perspective, we always get the question. 

 
14:56 
Yeah, that's really nice. 

 
14:57 
You made the model. 

 
14:57 
Can you validate it? 

 
14:58 
Can you show me that it also works on data that has not been seen by your algorithm? 

 
15:04 
And yes, it works. 

 
15:06 
We scraped the data from Alnylam patents, and they had 344 siRNAs. 

 
15:14 
Sorry for that. 

 
15:15 
And this is a ROC curve for those that are not data science savvy. 

 
15:21 
I can tell you that if you have a perfect model, you will have a line here. 

 
15:27 
If you have a flip of a coin, that is the dotted line, and the area, if I shade all of this, is what we call the area under the curve. 

 
15:36 
So a perfect model would have an area under the curve, an AUC, of one. 

 
15:40 
Our model lies at 0.95 on the Huesken data, and on the Alnylam data it lies at 0.91, which is very similar. 

 
15:49 
So it performs actually quite well. 
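For the data-science-curious, the AUC being quoted can be computed without any plotting, via the rank identity: it is the probability that a randomly chosen active compound gets a higher score than a randomly chosen inactive one. A small self-contained sketch with made-up scores:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity: the probability
    that a random positive outscores a random negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect model scores every active above every inactive -> AUC 1.0;
# constant (coin-flip) scores give 0.5, the diagonal line on the ROC plot.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
print(auc([1, 0, 1, 0], [0.2, 0.2, 0.2, 0.2]))  # 0.5
```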

 
15:52 
Then you can say: that's nice, but those were unmodified RNAs. 

 
15:55 
We applied this to modified siRNAs, because Alnylam also has modified RNAs in their patents. 

 
16:00 
It works like crap, doesn't work at all. 

 
16:03 
And no surprise either. 

 
16:06 
So we actually went about and did exactly the same exercise for the modified siRNAs. 

 
16:11 
And we get the same performance. I haven't shown the model here because that's a company secret, so I won't tell you. 

 
16:17 
But what I can show, and this is the real example that I was telling you about: one thing is that you can classify an siRNA. 

 
16:28 
Is it active or inactive? 

 
16:29 
But you would also very much like to know how active is my siRNA? 

 
16:34 
Because you might say: I just want the 10 best. 

 
16:37 
I don't want 10 that lie between 0 and 50%. 

 
16:42 
I want the 10 best. 

 
16:44 
So we worked with a company that had modifications on their siRNAs. 

 
16:49 
They had like 4 different modification patterns. 

 
16:52 
They were very similar. 

 
16:55 
They had 600 siRNAs, and we were doing regression. 

 
16:58 
So now we are actually predicting the target knockdown, not just classifying, and it was across six different targets. 

 
17:10 
We also needed to account for the fact that, as I think you all know, one target might be easy to knock down and another one not as easy. 

 
17:18 
So you can say the plateau that you reach is different. 

 
17:22 
So what we did is normalise within a target, saying: OK, the plateau, we set that to zero, and then you can normalise. 

 
17:29 
So we have a window. 
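The per-target normalisation might look roughly like this: within each target, shift the plateau (the best achievable knockdown) to zero and rescale, so knockdowns become comparable across easy and hard targets. The exact scheme used is not specified in the talk, so treat this as a guess:

```python
def normalise_within_target(knockdowns_by_target):
    """Per-target rescaling: treat each value as remaining target expression,
    set the target's plateau (lowest remaining expression) to 0 and the worst
    value to 1, so different targets share one comparable window."""
    out = {}
    for target, values in knockdowns_by_target.items():
        plateau, worst = min(values), max(values)
        span = (worst - plateau) or 1.0  # guard against a flat target
        out[target] = [(v - plateau) / span for v in values]
    return out

# Made-up remaining-expression values for two hypothetical targets.
data = {"T1": [0.1, 0.4, 0.7], "T2": [0.3, 0.5, 0.9]}
print(normalise_within_target(data))
```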

 
17:32 
So while we were doing that, we developed the model and nicely not there was a component that kept on screening the oligonucleotides and also for new targets. 

 
17:42 
So what you see here is completely, again, it's a validation data set for two targets that hasn't been used during development of the model. 

 
17:54 
And we get a correlation of 0.73, and we capture more than 50% of the variation. 
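Those two numbers are consistent: for a linear relationship, the fraction of variance captured is the squared correlation coefficient,

```latex
R^2 = r^2 = 0.73^2 \approx 0.53,
```

so a correlation of 0.73 does indeed correspond to just over 50% of the variation.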

 
18:01 
And then they were asking: OK, had we used your model when we were doing this, would we actually have captured the 10 most active siRNAs, as I said in the beginning? We would have captured 4 out of the 10, which for them was good enough, because there's also biological noise and so on. 

 
18:25 
And for the 10 that they had said were the best, it's not like we were way off. 

 
18:33 
They might have been in the 20 best from our predictions. 

 
18:38 
So now we can actually do regression. 

 
18:42 
And then I have a few cases here where I can't show you data, but we have had a 20% increase using modelling in the same kind of setting, here increasing safety. 

 
18:56 
We had a company that had both in vitro and in vivo data. 

 
19:00 
And as with every single company, they wanted to see how can we translate in vitro to in vivo. 

 
19:07 
The biggest question, well, not the biggest, that's delivery, delivery, delivery, I think. 

 
19:12 
But the second most important one is in vitro to in vivo translation. 

 
19:18 
And not surprisingly, from our models we could say: these are the features that drive in vitro activity. 

 
19:26 
They are not the same as the ones that are driving in vivo. 

 
19:29 
So when you go and optimise in vitro using your really nice model, you're actually not doing anything. 

 
19:35 
Like you get good compounds, but they don't work in vivo because the in vivo model tells you something different. 

 
19:41 
And again, we had AUCs about 0.8 in this case. 

 
19:45 
And the last thing is actually one of those that I'm most proud of: for a company, using their data, we developed a model together with them. 

 
19:55 
They now use it whenever they get a new target and want to design their drugs. 

 
20:02 
They had an increase by a factor of four, so 400% more hits. 

 
20:07 
So now they were coming back and asking: 

 
20:09 
Can we do that regression? 

 
20:11 
Because before, 

 
20:12 
It was just: yeah, we got a hit. 

 
20:15 
Now we've got too many. 

 
20:17 
We need to actually be better at predicting which ones are the best and not just good. 

 
20:24 
So that's the last one. 

 
20:26 
And then, how could you work with us, and how do we work? We can design compounds using these in-house models that I told you about. 

 
20:41 
You don't need to provide us with any data or anything. 

 
20:44 
We can design the molecules. 

 
20:47 
Then we can also develop models that are specific to your setting. 

 
20:52 
That was the regression model I showed before. 

 
20:54 
And the thing is we don't care about the models. 

 
20:57 
They don't mean anything for us. 

 
20:59 
So they are yours, well, not for free, but they're yours to use. 

 
21:07 
And of course, as everyone knows, this goes in cycles: you learn something, you optimise, you learn something, you optimise, the well-known DMTA cycle, and we step completely into that. 

 
21:19 
And then delivery, delivery, delivery. 

 
21:22 
We did a small project, because everyone is talking about delivery, and we are a short-oligonucleotide-focused company, but we had a company coming to us saying: OK, you work with oligonucleotides, what about LNPs? 

 
21:36 
We would very much like you to be able to predict pKa, because then we could screen these 400 to 500 LNPs in silico; they would take years to synthesise. 

 
21:47 
So we were like, yeah, that's a challenge. 

 
21:49 
Let's see, we are not experts in LNPs, but we actually built a model for that company and they are now trying to understand how they can use it and are using it. 

 
22:00 
And mostly, again, they learned something about what it is that drives pKa. 

 
22:05 
So they are now going to try and optimise their LNPs from that. 

 
22:11 
Then the last bit here: we are not just short oligos. 

 
22:15 
I keep on talking about that. 

 
22:16 
That's because that's my background. 

 
22:18 
But we are also working with disease understanding back to the patients that I talked about in the beginning. 

 
22:24 
And here we have a very neat collaboration with a Spanish startup company called Peptomyc. 

 
22:30 
And here you might notice this number over here: they had 16 patients in a clinical trial, and they came to us and said: we're going for the next clinical trial. 

 
22:40 
Could you please help us choose the right patients to include? 

 
22:45 
We want to have success in that. 

 
22:48 
So they had measured 20 soluble factors and wanted to see if we could predict whether the patient was in a progressive state or in a stable state. 

 
22:59 
I have to say, this is cancer and this is MYC inhibition. 

 
23:01 
I didn't say that, jumping straight into machine learning and data. Yeah. 

 
23:04 
So it's an early predictive response for first in class MYC inhibitors. 

 
23:10 
So their question was: could we find a combination of these soluble factors that could stratify patients and distinguish which ones would actually respond to their drug? And it worked out very nicely. 

 
23:23 
It actually all came down to two soluble factors that you just have to add. 

 
23:29 
That's the model. 

 
23:30 
It's very simple, and the lines here, that's what you could call the decision boundary. 

 
23:38 
So the green dots, those are patients with progressive disease; the pink dots are stable disease. 

 
23:43 
Again, it was an ongoing clinical trial, so new data came in, and the new stable and progressive patients all lie on the correct side of the boundary. 
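The model really is that simple: add two soluble factors and compare against a boundary. A sketch with invented factor values and threshold (the real factors and cut-off are in the patent):

```python
def classify_patient(factor_a, factor_b, threshold=10.0):
    """Sketch of the idea: the model is just the SUM of two soluble
    factors compared against a decision boundary. The factor names and
    the threshold value here are invented for illustration."""
    return "progressive" if factor_a + factor_b > threshold else "stable"

print(classify_patient(7.2, 4.1))  # 'progressive'
print(classify_patient(3.0, 2.5))  # 'stable'
```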

 
23:54 
So now those models are actually patented together with the molecule itself, because that is actually a way to figure out whether patients get the treatment or not. 

 
24:06 
Lastly: I'm a physicist by training, and I was talking about why. 

 
24:11 
And this quote by Feynman just resonates so well with me: 

 
24:15 
Like I would rather have questions that can't be answered than answers that can't be questioned. 

 
24:20 
So I want to end here and take any questions.