NVIDIA Clara Discovery aims to give researchers the tools they need to accelerate drug discovery
NVIDIA has teamed up with biopharmaceutical company AstraZeneca and the University of Florida’s academic health center, UF Health, on new AI research projects that use groundbreaking Transformer neural networks.
Transformer-based neural network architectures, introduced only in the last several years, allow researchers to pre-train models on very large datasets with self-supervised methods, without manually labeling the data. These models can learn the syntactic rules that describe chemistry just as they learn the grammar of a language, and they can be applied across research fields and modalities.
NVIDIA is working with AstraZeneca to develop a Transformer-based generative AI model for chemical structure generation in drug discovery. It will be the first project to run on Cambridge-1, which is set to become the UK’s most powerful supercomputer. The model will be open-sourced, available to researchers and developers in the NVIDIA NGC software catalog, and deployable on the NVIDIA Clara Discovery computational drug discovery platform.
Separately, UF Health is using NVIDIA’s latest Megatron framework and the BioMegatron pretrained models on NGC to develop GatorTron, the largest clinical language model to date.
New NGC applications include AtacWorks, a deep learning model for identifying accessible regions of DNA, and MELD, a tool for inferring biomolecular structure from sparse, ambiguous or noisy data.
Megatron Models for Molecular Analysis
The drug discovery model MegaMolBART, developed by NVIDIA and AstraZeneca, is intended for reaction prediction, molecular optimization, and molecule generation. The model is based on AstraZeneca’s MolBART Transformer model and is trained on the ZINC compound database using NVIDIA’s Megatron framework, which enables massively scaled training on supercomputing infrastructure.
The large ZINC database allows researchers to pretrain models to understand chemical structures without manually labeling the data. Armed with a statistical understanding of chemistry, the model will be used for a range of downstream tasks, including predicting interactions between chemicals and generating new molecular structures.
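To illustrate how pretraining can work without manual labels, here is a minimal sketch of one common self-supervised setup: hide some tokens of a molecule's text representation and ask the model to reconstruct the original. It assumes molecules are written as SMILES strings and uses a deliberately simplified tokenizer. The function names, the `<mask>` token, and the 15% masking rate are illustrative assumptions, not details of MegaMolBART or MolBART.

```python
import random
import re

# Simplified SMILES tokenizer (an assumption for illustration): bracket atoms,
# the two-letter halogens Br/Cl, two-digit ring closures, then single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, mask_rate=0.15, mask_token="<mask>", rng=None):
    """Return (corrupted_input, target) for one denoising training example.

    Each token is independently replaced by mask_token with probability
    mask_rate. The target is the untouched sequence, so the "label" comes
    from the data itself and no manual annotation is needed.
    """
    rng = rng or random.Random()
    corrupted = [mask_token if rng.random() < mask_rate else t for t in tokens]
    return corrupted, list(tokens)


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    corrupted, target = mask_tokens(tokenize(aspirin), rng=random.Random(0))
    print("input :", " ".join(corrupted))
    print("target:", " ".join(target))
```

A sequence-to-sequence model trained to map the corrupted input back to the target has to pick up regularities such as ring closures and valid branching, which is the kind of statistical understanding of chemistry described above. Production systems typically use more elaborate corruption schemes than this per-token masking.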
Ola Engkvist, Head of Molecular AI, Discovery Sciences, R&D at AstraZeneca, said: “Just as AI language models can learn the relationships between words in sentences, our goal is that neural networks trained on molecular structure data will be able to learn the relationships between atoms in real-world molecules. Once developed, this NLP model will be open source, providing the scientific community with a powerful tool to accelerate drug discovery.”
The model, trained using an NVIDIA DGX SuperPOD, gives researchers ideas for molecules that are not in the database but could be potential drug candidates. Computational methods, known as in-silico techniques, allow drug developers to search more of the vast chemical space and optimize pharmacological properties before moving on to expensive and time-consuming laboratory tests.
The collaboration will use NVIDIA DGX A100-powered Cambridge-1 and Selene supercomputers to run large workloads at scale. Cambridge-1 is the UK’s largest supercomputer, ranked third on the Green500 and 29th on the TOP500 list of the world’s most powerful systems. NVIDIA’s Selene supercomputer tops the latest Green500 list and ranks fifth on the TOP500.
Language Models Accelerate Healthcare Innovation
UF Health’s GatorTron model was trained on records of more than 50 million interactions with 2 million patients. It is a breakthrough that could help identify patients for clinical trials, predict life-threatening conditions and alert health teams to them, and provide clinical decision support to physicians.
“GatorTron leverages more than a decade of electronic medical records to develop state-of-the-art models,” said University of Florida Provost Joseph Glover. “The university recently upgraded its supercomputing facility with NVIDIA DGX SuperPOD. A tool at this scale can help healthcare researchers gain insights and identify previously unavailable medical trends based on clinical notes.”
Beyond clinical medicine, the model can also accelerate drug discovery by making it easier to rapidly create patient cohorts for clinical trials and to study the effects of specific drugs, treatments or vaccines.
The model was built using BioMegatron, the largest biomedical Transformer model ever trained, which was developed by the NVIDIA Applied Deep Learning research team using PubMed corpus data. BioMegatron is available on NGC through Clara NLP, a collection of NVIDIA Clara Discovery models pre-trained on biomedical and clinical text.
“The GatorTron project is an outstanding example of how academic and industry experts are collaborating using cutting-edge artificial intelligence and world-class computing resources,” said Dr. David R. Nelson, senior vice president for health affairs at the University of Florida and president of UF Health. “Our partnership with NVIDIA is critical for the University of Florida to become a hub for AI expertise and development.”
Empowering Drug Discovery Platforms
Computational drug discovery platforms are also using the NVIDIA Clara Discovery libraries and NVIDIA DGX systems to advance drug research.
Schrödinger, a leader in chemical simulation software development, today announced a strategic partnership with NVIDIA that covers research in scientific computing and machine learning, optimization of Schrödinger applications on NVIDIA platforms, and a joint solution built around NVIDIA DGX SuperPOD to evaluate billions of potential drug compounds within minutes.
Biotechnology company Recursion has installed BioHive-1, a supercomputer based on the NVIDIA DGX SuperPOD reference architecture that, as of January, ranked 58th on the TOP500 list of the world’s top computer systems. With BioHive-1, Recursion can run in a day a deep learning project that previously took a week on its existing cluster.
Insilico Medicine, a partner in the NVIDIA Inception Startup Accelerator Program, today announced the discovery of a new preclinical drug candidate for the treatment of idiopathic pulmonary fibrosis, the first example of an AI-designed molecule for a new disease target nominated for clinical trials. The compounds were generated on systems powered by NVIDIA Tensor Core GPUs, and the path from target hypothesis to preclinical candidate selection took less than 18 months and cost less than $2 million.
As part of the NVIDIA Inception Startup Accelerator Program, Vyasa Analytics uses Clara NLP and NVIDIA DGX systems to give users access to pretrained models for biomedical research. The company’s GPU-accelerated Vyasa Layar Data Fabric powers multi-institutional cancer research, clinical trial analysis and biomedical data orchestration solutions.
Register for free to watch the keynote by NVIDIA founder and CEO Jensen Huang. Learn more about NVIDIA’s progress in the healthcare industry at this week’s GTC, which includes 16 webinars, 18 special events, and more than 100 presentations.