Accelerating Molecular Design with AI: BioNeMo at the Frontier of Biotech

December 3rd, 2025

Santiago Ferreiros

Tech Lead

Delfina Montilla

Sr. Machine Learning Engineer

Introduction

In the age of AI-powered biology, NVIDIA's BioNeMo platform has emerged as a foundational tool for modern biotech and pharma R&D teams. It isn't just a framework for AI models; it's a full ecosystem for molecular discovery.

From Language Models to Molecules

BioNeMo extends NVIDIA's NeMo framework into life sciences, supporting a range of generative AI models for proteins, DNA, RNA, and small molecules. From folding proteins using ESMFold, to generating drug-like compounds via MegaMolBART, BioNeMo is deeply rooted in transformer architectures and optimized for GPU acceleration.

What sets BioNeMo apart, beyond the models themselves, is how those models are packaged and delivered. With BioNeMo NIM microservices, models are deployed as APIs, so researchers can run predictions with a simple REST call and embed AI directly into production workflows. BioNeMo bridges the gap between advanced AI and real-world molecular tasks, from DiffDock-powered docking to ProtT5-driven protein design.

BioNeMo includes a suite of foundation models specialized in different molecular tasks, from structure prediction to generative design:

ESMFold: Fast 3D protein structure prediction directly from amino acid sequence.
MegaMolBART: Generative model for small molecules in SMILES format, trained with rich chemical context.
DiffDock: Ligand–protein docking prediction using diffusion-based generative modeling.
ProtT5: Protein language model for embedding sequences and generating novel variants.

For those looking to explore or integrate BioNeMo models into custom workflows, NVIDIA provides open-source access to the full framework and underlying components:

NeMo GitHub Repository: The core framework for training and serving large language models.
BioNeMo GitHub Repository: A domain-specific extension of NeMo tailored for life sciences, offering tools, pretrained checkpoints, and examples for generative biology.

These repositories enable teams to experiment, fine-tune, and deploy models in flexible environments, from notebooks to enterprise-scale clusters.

End-to-End Workflows and Blueprints

One of the most valuable offerings in BioNeMo is its set of domain-specific blueprints. These are complete, multi-model workflows that reflect real R&D use cases. Want to design a protein binder? BioNeMo provides a blueprint combining AlphaFold, RFdiffusion, ProteinMPNN, and AlphaFold-Multimer for iterative generation and validation.

This modularity allows teams to customize workflows to their data and discovery pipelines. The design-first approach, with open-source containers and reproducible pipelines, means biotech teams can rapidly prototype, test, and scale generative biology applications.

Real World Use Cases in Industry

BioNeMo has rapidly evolved from an experimental AI toolkit into a robust engine for enterprise-grade life sciences R&D. Across the biotech and pharmaceutical landscape, it is being used to accelerate multiple phases of the discovery pipeline, from early target identification to lead optimization and candidate validation.

Pharmaceutical companies leverage BioNeMo to reduce time to discovery through generative chemistry and predictive modeling. Instead of relying solely on traditional high-throughput screening or wet-lab experiments, teams can generate, dock, and evaluate thousands of compounds computationally in a matter of hours. For biologics, researchers are using BioNeMo to design novel protein binders, engineer enzymes, and model complex protein-protein interactions.

In contract research organizations (CROs) and biotech startups, BioNeMo enables teams to scale with limited infrastructure by using containerized microservices and cloud deployment. Its API-first approach allows seamless integration into existing platforms such as LIMS¹, ELN, or automated lab systems.

Academic and research institutions also benefit from BioNeMo’s open-source flexibility and high-performance workflows. Projects like full-proteome folding, large-scale ligand screening, or structure-informed variant annotation are becoming more accessible thanks to BioNeMo’s model efficiency and compute optimization.

Whether supporting computational chemists, molecular biologists, or bioinformaticians, BioNeMo is quickly becoming a foundational layer in the digital transformation of molecular discovery.

Hands-On: DiffDock

In this demo, we’ll use NVIDIA BioNeMo’s DiffDock web interface to predict how the antiviral molecule Nirmatrelvir (the active ingredient in Paxlovid) binds to the main protease (Mpro) of SARS-CoV-2, a real-world target with a known crystal structure. This will allow you to validate the model’s predictions against experimental data.

Step 1: Access the DiffDock Web UI

Go to: https://build.nvidia.com/mit/diffdock. You’ll see the interface split into two sides:

Input (left): to upload your molecule and target protein
Output (right): to view predicted docking poses and scores.

Step 2: Prepare Input Files

Download the molecule: Nirmatrelvir is available from PubChem. Download it in 3D SDF format and save it as nirmatrelvir.sdf.

Download the protein: Get the SARS-CoV-2 Main Protease (Mpro) structure 7VH8 in legacy PDB format and save it as 7VH8.pdb.

After downloading the .pdb file of the target protein, it’s important to clean the structure before using it in DiffDock. Most crystal structures include extra elements such as water molecules, cofactors, or crystallographic ligands that can interfere with docking predictions. To ensure reliable results, we need to isolate just the relevant protein chain, typically chain A, and remove any non-standard residues.

This step can be done using PyMOL, but if you don’t have it installed, you can achieve the same result using Biopython directly in a Jupyter notebook. By running a short script, you’ll generate a clean mpro_chain_cleaned.pdb file that contains only the standard amino acid residues of chain A, which is what DiffDock needs as its protein input. This ensures that the input contains only the protein itself and avoids misleading predictions caused by leftover atoms in the structure.

!pip install biopython


from Bio.PDB import PDBParser, PDBIO, Select


class CleanChainA(Select):
    """Keep only the standard residues of chain A in the first model."""

    def accept_model(self, model):
        return model.id == 0  # First model only

    def accept_chain(self, chain):
        return chain.id == "A"  # Select chain A only

    def accept_residue(self, residue):
        # A blank hetero flag marks a standard residue;
        # waters and other HETATM entries (ligands, cofactors) are excluded.
        return residue.id[0] == " "


# Parse the downloaded crystal structure
parser = PDBParser(QUIET=True)
structure = parser.get_structure("Mpro", "7VH8.pdb")


# Write out only what the selector accepts
io = PDBIO()
io.set_structure(structure)
io.save("mpro_chain_cleaned.pdb", select=CleanChainA())
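
If Biopython is not available either, roughly the same filtering can be sketched with the Python standard library alone. This is a minimal sketch, not an equivalent of the script above: it relies on the fixed-column PDB layout (record name in columns 1 to 6, chain identifier in column 22), keeps only ATOM records of one chain from the first model, and ignores details such as alternate locations. The function names are illustrative.

```python
def clean_pdb_lines(lines, chain_id="A"):
    """Keep ATOM records of one chain from the first model of a PDB file."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # Stop after the first model (matters for multi-model files)
        # Column 22 (index 21) holds the chain identifier.
        # HETATM records (waters, ligands, cofactors) are skipped.
        if record == "ATOM" and len(line) > 21 and line[21] == chain_id:
            kept.append(line.rstrip("\n"))
    kept.append("END")
    return kept


def clean_pdb_file(src_path, dst_path, chain_id="A"):
    """Filter src_path into dst_path and return the number of atoms written."""
    with open(src_path) as src:
        kept = clean_pdb_lines(src, chain_id)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(kept) + "\n")
    return len(kept) - 1  # Exclude the trailing END line
```

Calling clean_pdb_file("7VH8.pdb", "mpro_chain_cleaned.pdb") would then play the same role as the Biopython script.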

Step 3: Upload Files to the DiffDock Interface

Before running DiffDock, it's a good idea to validate your input files to ensure they're correctly formatted. You can quickly check this by uploading your .sdf (ligand) and .pdb (protein) files to Mol* Viewer, an open-source 3D molecular structure viewer. If the files open and render without errors, they contain valid atomic coordinates and can be safely used as input for docking. This step helps catch issues like empty chains, unsupported atom types, or misformatted molecules before they trigger server-side errors.
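
As a complement to the visual check, a quick local sanity check can catch empty or truncated files before upload. The sketch below assumes the ligand file is a V2000 SDF, where the fourth line of each record is the counts line (atom count in the first three characters, bond count in the next three) and each atom line starts with three 10-character coordinate fields. The function names are illustrative, and this is not a full format validator.

```python
def check_sdf(text):
    """Return atom, bond, and record counts for the first V2000 record."""
    lines = text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 counts line on line 4")
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])
    atom_lines = lines[4:4 + n_atoms]
    if len(atom_lines) != n_atoms:
        raise ValueError("atom block is shorter than the counts line claims")
    for line in atom_lines:
        # x, y, z occupy three 10-character fields at the start of the line
        for start in (0, 10, 20):
            float(line[start:start + 10])  # Raises ValueError if garbled
    records = sum(1 for line in lines if line.strip() == "$$$$")
    return {"atoms": n_atoms, "bonds": n_bonds, "records": records}


def count_pdb_atoms(text):
    """Count ATOM records; zero means the cleaned protein file is empty."""
    return sum(1 for line in text.splitlines() if line.startswith("ATOM"))
```

A ligand with zero atoms, or a cleaned protein file with zero ATOM records, is a sign that something went wrong in Step 2.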

On the left (Input) panel:

Under Molecule, click “Upload New File” and select nirmatrelvir.sdf
Under Target Protein, click “Upload New File” and select mpro_chain_cleaned.pdb
Then adjust the parameters:
- Generated Poses → 20 (recommended default)
- Diffusion Steps → 1
- Diffusion Time Divisions → 20
Finally, click Run.
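
Because the same model is also exposed as an API, the settings above could in principle be sent programmatically instead of through the web UI. The sketch below only builds and serializes a request body and does not send anything. The field names (ligand, ligand_file_type, protein, num_poses, steps, time_divisions) are assumptions about the request schema, and the endpoint URL and authentication are deliberately left out; check all of them against the current DiffDock API reference before use.

```python
import json


def build_diffdock_request(ligand_sdf, protein_pdb,
                           num_poses=20, steps=1, time_divisions=20):
    """Build a request body mirroring the web UI settings used above.

    NOTE: the key names are assumed, not taken from this article.
    """
    if not ligand_sdf.strip() or not protein_pdb.strip():
        raise ValueError("ligand and protein contents must not be empty")
    if min(num_poses, steps, time_divisions) < 1:
        raise ValueError("num_poses, steps and time_divisions must be >= 1")
    return {
        "ligand": ligand_sdf,            # Contents of nirmatrelvir.sdf
        "ligand_file_type": "sdf",
        "protein": protein_pdb,          # Contents of mpro_chain_cleaned.pdb
        "num_poses": num_poses,          # "Generated Poses" in the UI
        "steps": steps,                  # "Diffusion Steps" in the UI
        "time_divisions": time_divisions,  # "Diffusion Time Divisions"
    }


def to_json_body(payload):
    """Serialize the payload as the JSON body of an HTTP POST."""
    return json.dumps(payload)
```

Keeping the payload construction separate from the HTTP call makes it easy to log or version the exact settings used for each docking run.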

Step 4: Review and Interpret the Results

Once the model finishes running, the output panel will display a 3D visualization of the predicted binding poses between Nirmatrelvir and the Mpro protein. You can explore each pose, ranked by predicted binding quality. In this example, the top-ranked poses fall in the catalytic cleft, indicating that the model has identified a biologically meaningful binding pocket.

To help interpret the docking results, DiffDock provides a ranked list of predicted binding poses, each scored by the model's confidence in that pose. The score is a model-inferred confidence estimate, not a binding energy or affinity: higher (less negative) scores indicate poses the model considers more likely to be correct, while lower (more negative) scores indicate less reliable ones. Users can visually explore how the poses are distributed in the binding site and focus on the top-ranked predictions. When multiple poses cluster around the known catalytic cleft, as they do in the SARS-CoV-2 Mpro and Nirmatrelvir example, it suggests that the model has identified a biologically relevant binding mode.
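
The clustering of poses around the catalytic cleft can also be quantified rather than judged by eye. The sketch below assumes each pose is available as a list of ligand atom coordinates in the same frame as the protein, and that a pocket center has been chosen beforehand, for example the centroid of the catalytic dyad residues His41 and Cys145 of Mpro. The 5 Å cutoff and the function names are illustrative choices, not values from DiffDock.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def fraction_in_pocket(poses, pocket_center, cutoff=5.0):
    """Fraction of poses whose ligand centroid lies within cutoff (in Å)
    of the chosen pocket center."""
    if not poses:
        raise ValueError("no poses given")
    hits = sum(
        1 for pose in poses
        if math.dist(centroid(pose), pocket_center) <= cutoff
    )
    return hits / len(poses)
```

A fraction close to 1 means nearly all generated poses agree on the pocket, while a low fraction suggests the poses are scattered across the protein surface.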

This result is especially compelling when compared with the experimentally determined structure (PDB: 7VH8), which shows Nirmatrelvir occupying the same catalytic cleft. Although DiffDock makes its predictions without being given the bound pose, it is often able to reproduce it to within a few angstroms of RMSD (root-mean-square deviation) of the crystallographic ligand position. In docking benchmarks, a pose within 2 Å RMSD of the crystal pose is conventionally counted as a success.
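
The RMSD comparison against the crystal ligand can be sketched in a few lines. This minimal version assumes the predicted and crystallographic ligand coordinates list the same atoms in the same order and share the protein's coordinate frame, so no superposition is applied. It does not account for molecular symmetry, which dedicated tools handle.

```python
import math


def ligand_rmsd(predicted, reference):
    """RMSD in Å between two equally ordered lists of (x, y, z) coordinates."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p[i] - r[i]) ** 2
        for p, r in zip(predicted, reference)
        for i in range(3)
    )
    return math.sqrt(squared / len(predicted))
```

Since the cleaned protein keeps the coordinates of 7VH8, the reference coordinates can be taken from the ligand's HETATM records in the original 7VH8.pdb file.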

This example illustrates how DiffDock can be used to simulate drug-target interactions, opening the door to rapid hypothesis testing, virtual screening, and early-stage validation without requiring traditional docking software or expert intervention. It's a glimpse into how AI can speed up tasks that previously required days or weeks of computational chemistry or crystallographic modeling.

Looking Ahead

BioNeMo is continuously evolving. Expect more multimodal foundation models, larger protein LLMs, and deeper integration with lab automation and real-time discovery systems. As enterprise biotech embraces AI-native pipelines, BioNeMo is positioned to become the computational backbone of modern R&D, bridging molecular design, biological insight, and scalable innovation.

If you're building the future of molecular discovery, BioNeMo offers a powerful, flexible, and production-ready platform for applying generative AI in biology. Now is the time to explore what’s possible, and scale what works.

At Marvik, we work with biotech companies and life sciences organizations to operationalize AI, from first experiments to fully integrated discovery platforms. Our team combines deep AI engineering expertise with real-world biotech experience, helping companies deploy solutions built on technologies like BioNeMo, AlphaFold, and large molecular models.

References

¹LIMS (Laboratory Information Management System) and ELN (Electronic Lab Notebook) are widely used platforms in biotech and pharma. LIMS is focused on managing samples, workflows, and compliance across laboratory operations, while ELN is designed to document experimental procedures, results, and scientific insights in a searchable, digital format. Together, they enable structured, traceable, and reproducible research, making them ideal environments to integrate AI-powered tools like BioNeMo.
