1:14 
Thank you for the warm welcome, and thank you to the audience for your interest in this topic. 

 
1:21 
To start off, let's have a look at the regulatory landscape in the last three decades and at the registration of subcutaneous versus intravenous products. 

 
1:34 
So especially in the last 10 years we've seen a significant increase in subcutaneous products which coincides with an increase in the respective concentration of the drug products. 

 
1:47 
The highest concentration among currently marketed products is around 200 milligrams per millilitre. 

 
1:54 
But if we have a look at products currently in clinical development, companies strive for even higher concentrations. 

 
2:01 
The majority of subcutaneous drug products are liquid dosage forms in vials, prefilled syringes and autoinjectors. 

 
2:11 
Freeze-dried products are rather rare in this space, but they exist, and there are also products which have been converted from a lyo into a liquid configuration. 

 
2:23 
Why are we seeing this trend? 

 
2:25 
I'll keep this rather brief because I can piggyback on what was presented yesterday by Emmanuel Sanchez Felix. 

 
2:33 
So the drivers are mainly the enabling of self-administration for the sake of increasing patient convenience. 

 
2:42 
So this is also a hallmark of increased patient centricity in biopharmaceutical development; especially patients with chronic diseases will benefit from this. 

 
2:51 
Compliance might increase due to the ease of use, but subcutaneous delivery is also a powerful tool to decrease costs in the healthcare sector, because we can eliminate clinical dose preparation and also administration by healthcare professionals, as in the IV setting. 

 
3:13 
But unfortunately, there is a downside. 

 
3:15 
It comes also along with challenges, right. 

 
3:17 
So first of all, bioavailability is decreased in the subcutaneous administration. 

 
3:23 
So mAbs range roughly between 50 and 85% in bioavailability. 

 
3:29 
If we have a look at a simple bolus SC administration without any additional enabling means, we can consider 3 mL as a kind of threshold and volume limitation, which drives the need for higher drug product concentrations; these come along with their own challenges, like viscosity and also increased aggregation propensity. 

 
3:57 
And if companies strive to convert their IV product into a subQ product, for sure this comes along with an additional burden in terms of additional nonclinical and clinical studies. 

 
4:08 
But we have strategies in place to overcome those challenges. 

 
4:11 
Some of them have been mentioned yesterday already. 

 
4:14 
So we have the possibility to co-administer hyaluronidase to overcome dose volume limitations, or to use large-volume devices. 

 
4:24 
But this talk will be focusing on the development of highly concentrated protein formulations. 

 
4:31 
Before we start off into the actual workflows and where data science has a footprint in our setting, I would like to share with you a little bit of our philosophy at Leukocare. 

 
4:41 
We believe that data science can add value in the formulation development process. 

 
4:47 
I mean, when we start off, we have the molecule, QbD paradigms have been enforced, and also target product profile information is available. 

 
4:58 
But how is this information utilised? 

 
5:01 
When I started off 20 years ago with formulation development, it was a very empirical process. 

 
5:05 
People were using prior knowledge, yes, but often in a very arbitrary and biased way. 

 
5:11 
And we believe that the combination of our expertise as formulation scientists plus data science can rationalise especially the pre-selection of excipients which are forwarded into formulation screening and optimisation studies, for the sake of saving resources in terms of time, material and human resources, but also lowering the risks of a programme. 

 
5:38 
And last but not least, data science also allows us to increase product understanding and to build quality into new products. 

 
5:47 
On our side, formulation development 

 
5:49 
Projects are divided into 3 consecutive phases, starting off with basic characterization, where we try to increase our product understanding and molecule understanding. 

 
6:01 
Then we move into formulation, screening and optimization phases. 

 
6:05 
And last but not least, the findings will be validated by confirmatory stability studies. 

 
6:12 
And where does data science have a footprint in these phases? 

 
6:16 
We will have a look at each and every tool in more detail during the talk. 

 
6:23 
But in a nutshell, in the basic characterization we start off with the amino acid sequence and the target product profile, and subject this information to several processing steps like sequence alignments, molecular modelling, data mining in our internal databases and also machine learning applications. 

 
6:44 
And this ends in a rational pre-selection of excipients to be brought into the formulation screening and optimisation phase. 

 
6:53 
Here we heavily rely on design of experiments (DoE), and what this phase delivers is a qualitative and quantitative justification of the use of excipients and of formulation design spaces. 

 
7:06 
And these design spaces are then validated in the confirmatory stability studies. 

 
7:13 
This is a rate-limiting step, and therefore it's very useful to leverage shelf life prediction methodologies. 

 
7:20 
And we will discuss a case study in this talk as well, where we systematically compare the performance of these shelf life prediction methods for monoclonal antibodies. 

 
7:33 
And at the end of the day, we have the final formulation selection and also at least a certain likelihood with which the product can satisfy the end of shelf life specifications. 

 
7:51 
During the BMS talk yesterday, we learnt also how important structured data are for every data science application you can imagine. 

 
8:00 
We have been structuring our internally generated data for years. 

 
8:06 
But we also thought that all the publicly available information on already registered products could be useful for us. 

 
8:14 
Because what's the benefit of these products? 

 
8:16 
They are approved, they are stable, they are manufacturable. 

 
8:20 
But the problem statement here is really that the information about the drug product, the target product profile is very dispersed between different publicly available databases. 

 
8:32 
And this was the reason that we strove to unify all this information by programming dedicated web scrapers, which collect this information and unify it in an internal database. 

 
8:46 
We might also expand this in the future to peer reviewed papers, for example, maybe also to patents, which could be a little bit more complicated. 

 
8:54 
And on the other hand, there is a second problem statement: as formulators, we are also interested in quantitative information, right? 

 
9:02 
So at which concentrations are excipients used in the registered products? 

 
9:07 
And therefore we systematically extracted this information from the labels of registered products by leveraging language recognition models, which structure the unstructured label text by clearly identifying [unclear], units and substances. 
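As an illustration of the amount / unit / substance structuring described here: the actual pipeline uses language recognition models, but a toy regular-expression stand-in on an invented label text shows what the structured output looks like.

```python
import re

# Toy pattern: a number, a unit, and the substance name that follows it.
# A trained token-classification model would be far more robust; this
# regex only illustrates the amount / unit / substance structure.
PATTERN = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|mM|%)\s+"
    r"(?P<substance>[A-Za-z][A-Za-z0-9\- ]*?)(?=,|\.|$)"
)

def parse_label(text: str) -> list[dict]:
    """Extract (amount, unit, substance) triples from free label text."""
    return [m.groupdict() for m in PATTERN.finditer(text)]

# Invented example label, not taken from a real product.
label = "Each vial contains 150 mg trastuzumab, 20 mM histidine, 0.02 % polysorbate 80."
for entry in parse_label(label):
    print(entry["amount"], entry["unit"], entry["substance"])
```

The named groups give each hit directly as a small record that can be loaded into the unified database.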

 
9:26 
And this allowed us to combine this quantitative information with the qualitative information we gathered from the different databases in order to create a unified database. 

 
9:38 
And now you might ask how we use this unified database. 

 
9:42 
So we have come up with two different paths, let's say, to process the very basic amino acid sequence information. 

 
9:50 
We feed the sequence information into a BLAST sequence alignment. 

 
9:58 
The BLAST sequence alignment provides us with a similarity tree, which allows us to identify nearest neighbours, i.e. molecules very similar to the one we want to formulate. 

 
10:12 
It also extracts the chemical liabilities in the aligned sequences, and in addition it provides us with statistics about excipient use in the very similar products as opposed to their use in the entire data set, which can also be very useful information. 
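The nearest-neighbour idea can be sketched with a small stand-in. Real work would use BLAST with statistically scored local alignments; here a generic similarity ratio over invented toy sequences merely illustrates the ranking principle.

```python
from difflib import SequenceMatcher

# Toy library; real entries would be full antibody sequences pulled from
# the unified database of registered products.
LIBRARY = {
    "mAb-A": "QVQLVQSGAEVKKPGASVKVSCKAS",
    "mAb-B": "QVQLVESGGGLVQPGGSLRLSCAAS",
    "mAb-C": "DIQMTQSPSSLSASVGDRVTITCRA",
}

def nearest_neighbours(query, library, top=2):
    """Rank library sequences by similarity to the query.

    BLAST uses statistically scored local alignments; SequenceMatcher's
    ratio is only a stand-in to illustrate the nearest-neighbour ranking.
    """
    scored = [(name, SequenceMatcher(None, query, seq).ratio())
              for name, seq in library.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top]

query = "QVQLVQSGAEVKKPGSSVKVSCKAS"  # one substitution away from mAb-A
print(nearest_neighbours(query, LIBRARY))
```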

 
10:31 
A second path, you can see here. 

 
10:34 
So here the amino acid sequence is fed into our molecular modelling pipeline, and this pipeline delivers, in an automated way, more than 100 molecular features, including the number and size of hydrophobic patches, both globally and near the CDRs, for example; the same for charge patches, but also dipole moments and many other features. 

 
10:58 
But the question is really what's the value of all these numbers, right. 

 
11:04 
And therefore we decided to come up with another tool where we modelled more than 1000 molecules which are either registered or in advanced clinical testing and unified all these features in another database that we call Flash. 

 
11:23 
And this allows us to instantly compare each and every feature of a new molecule against the same features of the background data set of these more than 1000 molecules. 

 
11:36 
And the benefit of this is that we can really highlight outlier properties, which then require special attention during formulation development. 
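Flagging an outlier property against a background set is, at its core, a z-score comparison. A minimal sketch with invented values for one hypothetical descriptor:

```python
from statistics import mean, stdev

# Hypothetical background: one value per modelled molecule for a single
# descriptor, e.g. hydrophobic patch area near the CDRs (invented units).
background = [210, 230, 195, 240, 220, 205, 235, 215, 225, 200]

def flag_outlier(value, background, z_cut=3.0):
    """Return (z_score, is_outlier) for a new molecule's feature value."""
    mu, sigma = mean(background), stdev(background)
    z = (value - mu) / sigma
    return z, abs(z) >= z_cut

z, outlier = flag_outlier(310, background)
print(f"z = {z:.1f}, outlier: {outlier}")
```

In practice this comparison runs over each of the 100+ features at once, against the 1000+ molecules in the background set.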

 
11:47 
But we also wanted to go a step further in rationalising our excipient preselection. 

 
11:53 
And what we did was we took all the sequence and also structure related features plus elements of the target product profile as input features for a random forest type of classifier algorithm. 

 
12:06 
And the output of this algorithm is the probability with which a certain excipient should be present in a stable drug product. 

 
12:15 
And this adds a lot of productivity in the formulation development process because we can really focus on the promising excipients during our formulation development. 

 
12:27 
Plus, there's another advantage of using algorithm types like the random forest: they are interpretable to a high degree. As formulators, we have the intuition that we would like to understand what a prediction model does and why it ends up with a certain prediction, right? 

 
12:48 
And this can be done by having a closer look at feature importances. 

 
12:53 
So we use here the ROC AUC as a standard measure for evaluating prediction performance. 

 
13:00 
So it compares false positive rates against true positive rates. 

 
13:05 
And what we found out, by stepwise reducing the model and eliminating different feature categories like molecular modelling features, was that different excipients are affected to a different extent by removing these features. 
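The workflow described here (train a random forest, read presence probabilities from predict_proba, inspect feature importances, then retrain without a feature category and compare ROC AUC) can be sketched on synthetic data. The features, labels and numbers below are invented purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: rows = molecules, columns = three descriptor
# "categories" (think hydrophobic patch area, net charge, dipole moment);
# the label is "excipient X present in a stable formulation" (1) or not.
n = 400
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Probability with which the excipient should be present, per molecule.
auc_full = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print("ROC AUC, full model:", round(auc_full, 3))

# Interpretability: which descriptor drives the prediction.
print("feature importances:", clf.feature_importances_)

# Stepwise reduction: retrain without descriptor 0 and watch the AUC drop.
clf_red = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[:, 1:], y_tr)
auc_red = roc_auc_score(y_te, clf_red.predict_proba(X_te[:, 1:])[:, 1])
print("ROC AUC, reduced model:", round(auc_red, 3))
```

In this toy setup the dropped descriptor carries most of the signal, so the reduced model loses AUC, mirroring the acetic-acid case in the talk; an excipient whose use does not depend on that category would behave like PS-80.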

 
13:25 
And if we have a look, for example, at PS-80, which is shown here in brown, it's less affected by removing the molecular modelling features because, as we heard, its mode of action is on the one hand really to work at interfaces, where no direct interaction with the protein occurs, and maybe only to a minor part by direct interactions with the molecule. 

 
13:51 
And here more in a nonspecific way. 

 
13:53 
And we have other excipients that are highly affected by eliminating molecular modelling features like acetic acid, for example, because it is used in a certain pH range. 

 
14:03 
And the pH range that I target highly depends on the molecular modelling features, because I would like, for example, to increase net charge to decrease viscosity, or things like that. 

 
14:14 
So this makes absolute sense. 

 
14:18 
Now let's step away a little bit from our in silico applications, because we complement the in silico applications with wet lab data in the basic characterization phase as well. 

 
14:29 
And it's always good to have methods which are predictive for certain key quality attributes. 

 
14:35 
And for highly concentrated liquid formulations, these are viscosity and aggregation propensity. 

 
14:42 
And there was a nice publication by Kingsbury et al. several years ago who studied a very broad range of proteins at a single concentration. 

 
14:53 
But they could nicely show that the kD correlates very well with low viscosities and low opalescence. 

 
15:02 
High kDs are indicative of repulsive interactions. 

 
15:06 
And here we have to underline: the diversity in this data set comes from the different proteins, but it has no diversity in terms of formulations. 

 
15:18 
And it is very important to keep in mind that formulation is a very powerful tool to fine-tune the kD, as we see here with these three example mAbs. 

 
15:31 
So we can really go from very negative kDs to very positive kDs by simply changing the buffer type and the pH value. 
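For context, kD is extracted from the linear concentration dependence of the DLS diffusion coefficient, D(c) = D0·(1 + kD·c): a positive slope-to-intercept ratio indicates net repulsion. A minimal fit on synthetic, illustrative numbers (not measured data):

```python
# Synthetic DLS data: diffusion coefficient vs concentration (mg/mL),
# following D(c) = D0 * (1 + kD * c). All values are illustrative only.
concentrations = [1, 5, 10, 15, 20]          # mg/mL
D0_true, kD_true = 4.0e-7, 0.015             # positive kD -> repulsive
diffusion = [D0_true * (1 + kD_true * c) for c in concentrations]

def fit_kD(conc, D):
    """Ordinary least-squares line D = a + b*c; kD = b / a (mL/mg)."""
    n = len(conc)
    cm = sum(conc) / n
    Dm = sum(D) / n
    b = (sum((c - cm) * (d - Dm) for c, d in zip(conc, D))
         / sum((c - cm) ** 2 for c in conc))
    a = Dm - b * cm
    return a, b / a

D0_fit, kD_fit = fit_kD(concentrations, diffusion)
print(f"D0 = {D0_fit:.2e}, kD = {kD_fit:.4f} mL/mg")
```

Repeating this fit per formulation is what lets buffer type and pH shift a molecule from negative to positive kD.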

 
15:40 
And as we see here, histidine and acetate lead to significantly higher kD values, 

 
15:44 
and so enforce repulsive interactions, which leads to better solution properties. 

 
15:51 
And this data also nicely correlates with, for example, nano differential scanning fluorimetry (nanoDSF) data. 

 
16:00 
The method allows us not only to study conformational stability, but also aggregation propensity via the backscatter signal. 

 
16:09 
And these two biophysical methods are very much in line and confirm each other in many cases. 

 
16:19 
Now let's come from the basic characterization to the actual formulation screening. 

 
16:24 
So as a reminder, we have our in silico tools and the inputs from there. 

 
16:31 
We also have our wet lab data from buffer and pH screening. 

 
16:34 
So DLS and kD analysis is not the only method. 

 
16:38 
We also have chromatographic methods we can apply here. 

 
16:42 
We also do concentration studies, where we use small-scale TFF devices in order to have a representative method for large-scale production processes. 

 
16:54 
And all this serves as input for our DOE designs actually. 

 
16:58 
So type of excipients, pH value, and also drug concentration are factors in the DoEs, but we can also consider other covariates and design constraints, like if a client wants to target a certain osmolarity. 

 
17:12 
All this can be built in the design. 
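Building such constraints into a design can be as simple as generating candidate runs and filtering them. Real screening designs are usually fractional rather than full factorial, and the factors, levels and osmolality bookkeeping below are invented; the sketch only shows the constraint-handling idea.

```python
from itertools import product

# Illustrative screening factors; the levels are invented for this sketch.
factors = {
    "buffer": ["histidine", "acetate"],
    "pH": [5.0, 5.5, 6.0],
    "sucrose_mM": [0, 150, 280],
    "protein_mg_ml": [100, 150],
}

def osmolality(run):
    """Rough osmolality estimate (mOsm/kg); the coefficients are made up."""
    return 20 + run["sucrose_mM"] + 0.2 * run["protein_mg_ml"]

# Full-factorial candidate set (real screenings often use fractional designs).
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

# Design constraint: a client-requested ceiling on osmolality.
feasible = [r for r in runs if osmolality(r) <= 300]
print(len(runs), "candidate runs,", len(feasible), "satisfy the constraint")
```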

 
17:15 
We also have to align on the critical quality attributes that we would like to follow and on the test conditions which allow us to discriminate between stabilising and destabilising conditions. 

 
17:26 
Then we perform the analytics. 

 
17:28 
We also have a data pipeline for nearly all of our analytics, which, for example, includes inline quality checks. 


 
17:41 
And upon the availability of QC analytical data, we perform regression analysis and data visualisation. 

 
17:45 
The screening itself strives for the identification of linear effects only, and there is a nice way to really visualise effect sizes and effect directions. 

 
17:57 
So whether I have a positive or negative effect, and also whether we see curvature in the relationship between formulation parameters and key quality attributes, can be quantified; similar things are done for the optimisation. 

 
18:12 
The optimization instead is really striving for the identification not only of linear effects but also of non-linear, quadratic effects, and also two-way interactions between formulation parameters. 

 
18:27 
So we are not only feeding information from the DoE screening into the optimisation, but also some data mining efforts, where we look into our unified databases and see how certain specific excipients, like L-proline here for example, are used in different routes of administration. 

 
18:47 
It delivers us the products using this excipient, which allows us also to back-calculate the daily exposure to a certain excipient, and also the excipients which are more or less frequently combined with it. 

 
19:02 
And the approach is relatively comparable to the screening. 

 
19:06 
However, we are using different designs for the optimisation. 

 
19:12 
So it could be composite designs or things like that. 

 
19:15 
And also the visualisation methods are different. 

 
19:18 
So we are performing contour plotting in order to describe the relationships between formulation parameters, if there are interactions, or the relationships between formulation parameters and critical quality attributes. 

 
19:31 
We also perform response surface modelling to identify local optima in these relationships. 
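Response surface modelling in its simplest form fits a quadratic to the measured responses and reads off the stationary point. A one-factor sketch on synthetic, noise-free data (real cases are multi-factor with interactions; the numbers are invented):

```python
import numpy as np

# One-factor illustration: monomer loss (%) as a function of pH, with a
# synthetic quadratic response whose true minimum sits at pH 5.6.
pH = np.array([5.0, 5.25, 5.5, 5.75, 6.0])
loss = 0.8 + 2.0 * (pH - 5.6) ** 2

# Quadratic response-surface model: loss = c2*pH^2 + c1*pH + c0
c2, c1, c0 = np.polyfit(pH, loss, 2)

# Stationary point of the fitted parabola = predicted local optimum.
pH_opt = -c1 / (2 * c2)
print(f"predicted optimum at pH {pH_opt:.2f}")
```

With two or more factors the same idea yields the contour plots mentioned above, and the optimum becomes a stationary point of a multivariate quadratic.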

 
19:38 
And this information then goes into the stability studies. 

 
19:45 
These studies are very important because they are also rate limiting, right? 

 
19:49 
Because I need the formulation information for very important activities downstream of formulation, like GLP tox studies and GMP campaigns. 

 
19:58 
So there is really a need to see how a decision point can be expedited in this phase. 

 
20:04 
This is the reason why we performed a stability study at 3 different temperatures with two mAbs. 

 
20:12 
We took the CEX main peak and the SEC monomer peak as key quality attributes, and we compared three different prediction methods against each other: linear regression, multiple regression and advanced kinetic modelling. 

 
20:26 
And we did these predictions based on three different data sets. 

 
20:30 
So based on one month, two months and three months of data, and also on two reduced data sets which omitted some of the 40 °C time points. 

 
20:40 
And what we saw is that advanced kinetic modelling is performing either comparably to the mathematical methods or even slightly better. 
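The simplest kinetic-modelling flavour is an Arrhenius extrapolation: fit first-order rate constants at the accelerated temperatures, extrapolate the rate to the storage temperature, and project the attribute to end of shelf life. Commercial advanced kinetic modelling fits richer kinetic models than this; all numbers below are invented for the sketch.

```python
import math

# Synthetic first-order degradation, C(t) = C0 * exp(-k(T) * t), with an
# Arrhenius temperature dependence k(T) = A * exp(-Ea / (R * T)).
# A and Ea are invented so the 5 degC rate is ~0.1 %/month.
R = 8.314              # J/(mol*K)
A, Ea = 8.0e13, 9.0e4  # 1/month, J/mol

def k_true(T_celsius):
    return A * math.exp(-Ea / (R * (T_celsius + 273.15)))

# "Measured" rate constants at the accelerated stability temperatures.
temps = [25.0, 32.5, 40.0]
ks = [k_true(T) for T in temps]

# Arrhenius fit: ln k = ln A - (Ea/R) * (1/T) is linear in 1/T.
x = [1 / (T + 273.15) for T in temps]
y = [math.log(k) for k in ks]
n = len(x)
slope = (n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)) / (
    n * sum(a * a for a in x) - sum(x) ** 2)
intercept = (sum(y) - slope * sum(x)) / n

# Extrapolate the rate to 2-8 degC storage and project 24 months.
k5 = math.exp(intercept + slope / (5 + 273.15))
monomer_24m = 100 * math.exp(-k5 * 24)
print(f"predicted monomer after 24 months at 5 degC: {monomer_24m:.2f}%")
```

Dropping 40 °C time points, as in the reduced data sets mentioned above, simply shortens the Arrhenius fit and widens the extrapolation uncertainty.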

 
20:47 
The difference in performance is even higher for other modalities like viral vectors, for example, from our experience. 

 
20:53 
And what we also saw is that we can do reliable predictions already after two months. 

 
20:59 
So you see the predictions of AKM here in green and the actual measured values in the dashed line. 

 
21:07 
So, and this is a significant acceleration of decision making here enabled by stability predictions. 

 
21:17 
You might remember our philosophy: the combination of formulation expertise and data science. This is a very nice example to underline it. 

 
21:26 
So this case study deals with an IgG 4. 

 
21:28 
The modelling showed us that there was a property, a higher [UNCLEAR] near the CDRs, which was a kind of outlier property compared to known molecules which are commercialised or in advanced clinical testing. 

 
21:44 
Why was this the case? 

 
21:46 
We had a closer look at the structure, and we could identify three tyrosine residues which were very exposed, and we asked ourselves: can we mask these residues? Because they are deemed to really mediate bad solution behaviour like high viscosity or low solubility. 

 
22:04 
And we knew that there is an excipient approved for parenteral use, Kleptose actually, which allows the formation of inclusion complexes between the aromatic side chains and the core of the cyclodextrin. 

 
22:24 
This was the theory, but it actually turned out to work. 

 
22:29 
So we see here concentration dependent increase of the solubility of the IgG 4 and also a reduction in viscosity as compared to other known viscosity reducers. 

 
22:45 
I would like to conclude with two very short case studies. 

 
22:50 
Here we have Herceptin, which we tried to up-concentrate; you see in red the original IV formulation versus an in-house developed formulation, and we could reduce the viscosity by 40% and also the aggregation propensity significantly. 

 
23:10 
This data is also published. 

 
23:13 
We also have, I think, six months of stability data even at room temperature, so a very successful project. 

 
23:23 
And last but not least, I would like to share this IgG 4 example. 

 
23:26 
This was a biosimilar developer approaching us who was striving to convert the originator product, which was lyophilized, administered in an IV setting, and only stable as a lyophilisate at 2 to 8 °C. 

 
23:42 
And we delivered a formulation enabling a drug product at more than 150 milligrams per mL, eligible for subcutaneous administration and stable as a liquid at 2 to 8 °C. 

 
23:55 
And as you see here, the two key quality attributes, high molecular weight species and viscosity, both stay below the thresholds defined by the target product profile. 

 
24:06 
So, lessons learned: there is a clear trend towards highly concentrated liquid formulations, for known reasons. 

 
24:14 
Formulation development is a very important enabler for HCLFs and data science is very instrumental in driving the productivity of the formulation development process. 

 
24:25 
If you would like to learn more, please stop by. 

 
24:28 
We have booth number 7 in the exhibition hall. 

 
24:31 
And I hope you enjoyed the talk and would be happy if we have the chance to chat during the morning hours or even this afternoon. 

 
24:41 
And yeah, thank you for your interest.