The aging of the world’s population, the growing burden of chronic and infectious diseases, and the emergence of new pathogens have made the need for new therapies more urgent than ever. However, discovering a new drug and bringing it to market is a long, arduous and expensive journey marked by many failures and few successes.
AI has long been seen as a way to overcome some of these hurdles because of its ability to analyze massive amounts of data, detect patterns and relationships, and predict effects.
Now a multi-institutional team led by Marinka Zitnik, a biomedical informatics specialist at Harvard Medical School, has launched a platform that aims to improve AI-driven drug discovery by developing more realistic datasets and higher-fidelity algorithms.
The Therapeutics Data Commons, described in a recent commentary in Nature Chemical Biology, is an open-access platform that acts as a bridge between computer scientists and machine learning researchers on the one hand and biomedical researchers, biochemists, clinical researchers, and drug designers on the other, communities that have traditionally worked in isolation from each other.
The platform supports dataset design, algorithm design, and performance evaluation for multiple treatment modalities, including small-molecule drugs, antibodies, and cell and gene therapies, at all stages of drug development, from chemical compound identification to drug performance in clinical trials.
Zitnik, assistant professor of biomedical informatics in the Blavatnik Institute at HMS, conceived of the platform and now leads the work in collaboration with researchers at MIT, Stanford University, Carnegie Mellon University, Georgia Tech, the University of Illinois Urbana-Champaign, and Cornell University.
Zitnik recently discussed the Therapeutics Data Commons with Harvard Medicine News.
HMNews: What are the central challenges in drug discovery and how can AI help solve them?
Zitnik: Developing a safe and effective drug from scratch is a huge challenge. On average, it takes between 11 and 16 years and between $1 billion and $2 billion to do it. Why is that?
It is very difficult to know whether an initially promising chemical compound will produce results in human patients that are consistent with early in vitro results. The number of possible small-molecule compounds is estimated at 10 to the power of 60, yet only a tiny fraction of this astronomically large chemical space has been explored in the search for molecules with medicinal properties. Even so, the impact of existing therapies on treating disease has been remarkable. We believe that new algorithms, combined with automation and new datasets, can find many more molecules that can be translated into improved human health.
AI algorithms can help us determine which of these molecules are most likely to be safe and effective human treatments. This is the fundamental problem plaguing drug discovery and development. Our vision is that machine learning models can help sift through and integrate vast amounts of biochemical data that we can relate directly to molecular and genetic information, and ultimately to individual patient outcomes.
HMNews: How close is artificial intelligence to making that promise a reality?
Zitnik: We’re not there yet. There are a number of challenges, but I’d say the biggest is understanding how well our existing algorithms work and whether their performance translates to real-world problems.
When we evaluate new AI models through computer modeling, we test them on standard datasets. Increasingly, we see in publications that these models achieve near-perfect accuracy. If so, why aren’t we seeing widespread adoption of machine learning in drug discovery?
This is because there is a significant gap between performing well on a standardized dataset and being ready for real-world use in a biomedical or clinical setting. The data on which these models are trained and tested are not indicative of the kinds of challenges the models face when used in real practice, so closing this gap is really important.
HMNews: Where does the Therapeutics Data Commons platform come in?
Zitnik: The goal of Therapeutics Data Commons is to precisely address such challenges. It serves as a meeting point between the machine learning community at one end and the biomedical community at the other. The machine learning community can help with algorithmic innovation and make these models more translatable to real-world scenarios.
HMNews: Can you explain how it actually works?
Zitnik: First of all, keep in mind that the drug discovery process runs the gamut from initial drug design based on data from chemistry and chemical biology, through preclinical research based on data from animal studies, all the way to clinical research in human patients. The machine learning models that we train and evaluate as part of the platform use different types of data to support the development process at all of these different stages.
For example, the machine learning models that support the design of small-molecule drugs typically rely on large datasets of molecular graphs, that is, the structures of chemical compounds and their molecular properties. These models find patterns in known chemical space that associate parts of a chemical structure with chemical properties that are critical to drug safety and efficacy.
Once the AI model has been trained to identify these tell-tale patterns in the known subset of chemicals, it can be deployed and can search for the same patterns in huge data sets of untested chemicals and predict how these chemicals will perform.
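The train-then-screen workflow described above can be illustrated with a deliberately tiny sketch. It is not the platform's method: the compounds, substructure tags, and labels are all made up, and the "model" is just an average of per-fragment scores rather than a real graph-based learner.

```python
from collections import defaultdict

# Hypothetical training data: each known compound is a set of substructure
# tags plus a measured label (1 = desirable property observed, 0 = not).
KNOWN = [
    ({"amide", "phenyl"}, 1),
    ({"amide", "hydroxyl"}, 1),
    ({"nitro", "phenyl"}, 0),
    ({"nitro", "halide"}, 0),
]

# Hypothetical library of untested compounds to be screened.
UNTESTED = {
    "cmpd_A": {"amide", "halide"},
    "cmpd_B": {"nitro", "hydroxyl"},
    "cmpd_C": {"amide", "hydroxyl"},
    "cmpd_D": {"sulfone"},
}


def train(examples):
    """Learn, for each fragment, the mean label of the compounds containing it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fragments, label in examples:
        for frag in fragments:
            totals[frag] += label
            counts[frag] += 1
    fragment_score = {frag: totals[frag] / counts[frag] for frag in totals}
    baseline = sum(label for _, label in examples) / len(examples)
    return fragment_score, baseline


def predict(model, fragments):
    """Average the learned scores of the fragments seen during training.

    Falls back to the overall training mean when no fragment is recognized.
    """
    fragment_score, baseline = model
    seen = [fragment_score[f] for f in fragments if f in fragment_score]
    return sum(seen) / len(seen) if seen else baseline


def screen(model, library):
    """Rank untested compounds from most to least promising."""
    return sorted(library, key=lambda name: predict(model, library[name]),
                  reverse=True)


model = train(KNOWN)
ranking = screen(model, UNTESTED)
print(ranking[0])  # the compound whose fragments were all associated with label 1
```

In this toy setup, a compound made entirely of fragments never seen in training (here `cmpd_D`) simply receives the baseline score, which hints at why predictions for truly novel chemistry are hard to trust.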
To design models that can aid in late-stage drug discovery, we train them on data from animal studies. These models are trained to look for patterns that correlate biological data with potential clinical outcomes in humans.
We could also ask if the model could search for molecular signatures in chemical compounds that correlate with patient information to identify the subset of patients most likely to respond to a chemical compound.
HMNews: Who are the contributors and end users of this platform?
Zitnik: We have a team of students, scientists, and expert volunteers who come from partner universities and industry, including small startups in the Boston area as well as some large pharmaceutical companies in the US and Europe. Computer scientists and biomedical researchers contribute their expertise in the form of state-of-the-art machine learning models and preprocessed, formatted datasets, standardized in such a way that they can be versioned and are ready for use by others.
So the platform has both analysis-ready datasets and machine learning algorithms, along with robust measures that tell us how well a machine learning model is performing on a given dataset.
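To make the idea of pairing a dataset, a model, and a performance measure concrete, here is a minimal sketch of a benchmark loop. Everything in it is an assumption for illustration: the synthetic data, the two toy models, and the function names do not come from the Therapeutics Data Commons itself.

```python
import random

# Hypothetical dataset: (feature, measured value) pairs following y = 2x + 1.
DATA = [(float(x), 2.0 * x + 1.0) for x in range(12)]


def split(records, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and hold out a fraction of records for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean(train):
    """Baseline model: always predict the mean training value."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def fit_line(train):
    """One-variable least-squares line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def benchmark(records, fitters):
    """Score every model on the same held-out split with the same metric."""
    train, test = split(records)
    y_true = [y for _, y in test]
    scores = {}
    for name, fit in fitters.items():
        model = fit(train)
        scores[name] = mean_absolute_error(y_true, [model(x) for x, _ in test])
    return scores


scores = benchmark(DATA, {"mean": fit_mean, "line": fit_line})
print(scores)
```

Holding the split and the metric fixed is what makes the two scores comparable. A random split like this one is also the kind of standardized test that can look better than real-world performance, since held-out records resemble the training records closely.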
Our end users are researchers from all over the world. We organize webinars to introduce new features, gather feedback, and answer questions. We offer tutorials. This continuous training and feedback is really crucial.
We have 4,000 to 5,000 active users every month, most of them from the US, Europe, and Asia. Overall, we have seen over 65,000 downloads of our machine learning algorithm/dataset package and over 160,000 downloads of our standardized and formatted datasets. The numbers are growing, and we hope they will continue to grow.
HMNews: What are the long-term goals of Therapeutics Data Commons?
Zitnik: Our mission is to support drug discovery with AI on two fronts. First, in designing and testing machine learning methods across all phases of drug discovery and development, from chemical compound identification and drug design to clinical research.
Second, to support the design and validation of machine learning algorithms across multiple therapeutic modalities, particularly newer ones, including biological products, vaccines, antibodies, mRNA drugs, protein therapies, and gene therapies.
There is a huge opportunity for machine learning to contribute to those new therapies, and we have not yet seen AI used in those areas to the extent that we have seen in small molecule research, which is where a lot of the focus is today. This gap is mostly due to the paucity of AI-ready benchmark datasets for those new therapeutic approaches, which we hope to address with Therapeutics Data Commons.
HMNews: What sparked your interest in this work?
Zitnik: I’ve always been interested in understanding and modeling interactions across complex systems, which are systems with multiple components that interact with each other in nontrivial ways. As it turns out, many of the problems in therapeutic science are, by definition, just such complex systems.
We have a protein target, which is a complex 3D structure; we have a small-molecule compound, which is a complex graph of atoms and the bonds between those atoms; and then we have a patient, whose description and state of health are captured in a multiscale representation. This is a classic complex-system problem, and I really like researching and finding ways to unify and “tame” those complex interactions.
Therapeutic science is full of these kinds of problems, and they are ripe for machine learning. This is what we are chasing, this is what we are striving for.
Kexin Huang et al, Artificial intelligence foundation for therapeutic science, Nature Chemical Biology (2022). DOI: 10.1038/s41589-022-01131-2
Harvard Medical School
Citation: Can AI change the way we discover new drugs? (2022, November 16) Retrieved November 16, 2022, from https://phys.org/news/2022-11-ai-drugs.html