New Giant Protein Language Model

EvolutionaryScale, a frontier AI research lab, has unveiled ESM3, a groundbreaking generative AI model for protein design that simulates millions of years of evolution to create novel proteins, potentially revolutionizing fields from drug discovery to environmental sustainability.

ESM3 Key Features

ESM3 is a significant step up in scale for protein language models: its largest version has 98 billion parameters, and it was trained on 2.78 billion protein sequences. This third-generation model takes sequence, structure, and function annotations as inputs, an integration intended to improve accuracy in protein identification and validation. Training consumed about 1 trillion teraFLOPs of compute (roughly 10^24 floating-point operations in total, not a per-second rate), more than any other known model in biology, and ran on NVIDIA H100 GPUs and the Andromeda cluster.

  • First generative model to simultaneously reason over protein sequence, structure, and function
  • Enables interactive prompting for protein creation
  • Trained on diverse organisms and biomes for comprehensive protein understanding
  • Capable of self-improvement based on feedback and designing proteins with specified functionalities
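
The compute figure quoted above is easier to interpret as a total operation count than as a rate. The following minimal Python sketch does the unit conversion and shows how such a total could translate into wall-clock time. The GPU count and sustained per-GPU throughput are hypothetical placeholders chosen for illustration, not figures reported for ESM3.

```python
# Convert "1 trillion teraFLOPs" into a total count of floating-point operations.
TRILLION = 10**12
TERA = 10**12
total_flops = TRILLION * TERA  # 10**24 operations in total


def training_days(total_ops: float, n_gpus: int, sustained_flops_per_gpu: float) -> float:
    """Idealized wall-clock days needed to execute total_ops on n_gpus.

    Assumes perfect scaling and a constant sustained rate per GPU,
    which real training runs do not achieve.
    """
    if n_gpus <= 0 or sustained_flops_per_gpu <= 0:
        raise ValueError("n_gpus and sustained_flops_per_gpu must be positive")
    seconds = total_ops / (n_gpus * sustained_flops_per_gpu)
    return seconds / 86_400  # seconds per day


# Hypothetical inputs (assumptions, not reported values):
# 1,000 GPUs, each sustaining 4e14 operations per second.
example_days = training_days(total_flops, n_gpus=1_000, sustained_flops_per_gpu=4e14)
print(f"total operations: {total_flops:.2e}")
print(f"idealized duration under the assumed cluster: {example_days:.1f} days")
```

Changing either placeholder scales the estimate proportionally, which is why a total operation count is the more useful way to compare training runs.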

Green Fluorescent Protein Achievement

In a demonstration of its capabilities, ESM3 generated a novel Green Fluorescent Protein (GFP) variant, named esmGFP, whose distance from known proteins the researchers estimate is equivalent to more than 500 million years of natural evolution. The new GFP shares only 58% of its sequence with the nearest known fluorescent protein, yet it reproduces the fluorescence seen in the natural proteins found in jellyfish and coral. The result underscores ESM3’s potential to accelerate scientific research by exploring, in a fraction of the time, regions of protein space that evolution would take far longer to reach.
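
Comparisons like the 58% figure above rest on pairwise sequence identity between aligned proteins. The sketch below shows one common way to compute it in Python, assuming the two sequences have already been aligned to equal length with "-" marking gaps. The function name, the gap convention, and the toy sequences are hypothetical illustrations; this is not the procedure used in the ESM3 work.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumption: columns in which both sequences have a gap are ignored, and
    columns with a gap in only one sequence count as mismatches. Other
    conventions exist and give slightly different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for either sequence
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


# Toy, made-up aligned fragments (not real GFP sequences).
toy_a = "MSKGEELFTG"
toy_b = "MSKGAELF-G"
identity = percent_identity(toy_a, toy_b)
print(f"identity: {identity:.1f}%, dissimilarity: {100.0 - identity:.1f}%")
```

Identity and dissimilarity sum to 100%, so a protein with 58% identity to its nearest relative differs from it at 42% of the compared positions.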

Applications in Science

The potential applications of ESM3 span a wide range of scientific fields, from drug discovery to materials science and environmental sustainability. In drug development, the model’s ability to generate novel proteins could accelerate the creation of new therapeutics and antibodies. In environmental work, it could be used to design proteins for carbon capture and enzymes that break down harmful plastics. Because the model is interactive, researchers can prompt it for specific protein designs and use it as a virtual collaborator on complex biological problems. This versatility could change how scientists approach protein engineering and biological research.

Collaboration and Funding

Partnerships with industry leaders are key to ESM3’s accessibility and future development. Amazon Web Services (AWS) is working with EvolutionaryScale to make the full ESM3 model family available to hundreds of thousands of researchers worldwide, among them scientists at nine of the top ten global pharmaceutical companies. NVIDIA is optimizing ESM3 for training and inference performance through NVIDIA BioNeMo NIMs, supported by the NVIDIA AI Enterprise software license. To fund its goals, EvolutionaryScale secured over $142 million in seed funding from investors including Lux Capital, Amazon, NVentures (NVIDIA’s venture capital arm), Nat Friedman, and Daniel Gross. The funding is intended to drive the expansion and application of ESM3 across scientific domains.

Source: Perplexity