Metagenomi Turns to AI and AWS to Speed Its Gene Editing Research

November 4, 2025 by Jaime Hampton
Biotech startup Metagenomi is harnessing artificial intelligence models to expand its gene editing toolkit beyond the limits of first-generation CRISPR systems. The company applies machine learning to metagenomic data, or DNA collected from environmental samples, to uncover and engineer new enzymes that could expand the reach of genetic medicine.
To support this work, Metagenomi has built its discovery platform entirely on AWS. The collaboration has evolved from basic data processing to large-scale AI experimentation, with cloud infrastructure that allows the company to search, compare, and generate billions of protein sequences. By training AI models on natural diversity and running them on AWS Inferentia2 accelerators, Metagenomi can now design compact, precise gene editors that were once too expensive or computationally intensive to explore.
For Chris Brown, Head of Discovery at Metagenomi, the partnership has blurred the line between compute and discovery. What began as a way to manage data has become a core part of the company’s scientific process. AIwire recently sat down with him to learn more.
AIwire: What is this story about? Is it about compute or discovery? How did this solve some of the pain points in your research?
Brown: What Metagenomi is doing is developing new gene editing tools that we can develop into curative genetic medicines. And for us, that means innovating beyond the first-generation CRISPR gene editing systems by taking a metagenomics approach.
AIwire: What does that involve?
Brown: What that means is we go out to the natural environment, and we collect really small samples to get DNA sequences. By reconstructing these DNA sequences and genomes from the environment, we get access to new biology. That is the starting point for developing these gene editing tools, and Metagenomi has been doing this for the past seven years. What sets us apart in the gene editing field is that, while a lot of companies focus on one type of technology or one type of gene edit, we can source so many of these diverse systems from nature that we can create a really diverse gene editing toolbox, one we believe will be important for treating a diverse set of genetic diseases.
The simple explanation is that there are a lot of ways in which mutations can cause disease, and you need a lot of different specialized tools to address those variations. Our process then involves going to nature and finding these systems that already exist, but also doing some engineering and development. More recently, we're using both traditional protein engineering and AI to learn from the sequences and proteins that exist in nature and help us generate new tools.
AIwire: How does AWS fit into the story?
Brown: The AWS part of it has three different pillars. One is that Metagenomi has been around for about seven years. We've built our metagenomics discovery platform on the AWS cloud, and we've been using Amazon since the very beginning to power our metagenomics workloads. But what we're highlighting more recently are advancements on top of that, where we're now able to take the data we've amassed over many, many years and begin to apply AI to it in different ways. There are two projects we recently did with Amazon. In the first, we created new infrastructure that allows us to search through our database very quickly, in ways we couldn't previously: we use AI to generate embeddings from protein structural information, which are vector representations of those proteins, and those embeddings can be searched fast. The second is generating entirely new sequences. That's the theme I mentioned before, where we can train AI on the vast amounts of data we have about natural proteins to create new ones.
And that's how we generated over a million of these compact gene editing systems, and that was very much enabled by access to Amazon chips. This was a project we probably would have done anyway, or at least done once, but it would have been fairly cost-prohibitive to do at scale. Access to the Amazon chips has allowed us to take something that would be a big research undertaking and turn it into something my team can do on a regular basis. They can do it multiple times a day or week. And that's really driving innovation on the science side of our team.
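In concrete terms, the "train on natural diversity, then generate" loop Brown describes can be sketched with a public generative protein language model. Everything below is an illustrative assumption rather than Metagenomi's actual stack: ProtGPT2 simply stands in for an in-house model, and the sampling settings follow that model's published usage notes.

```python
# A minimal sketch of sampling novel protein sequences from a generative
# protein language model. ProtGPT2 is a public stand-in; the article does
# not disclose which models Metagenomi trains or deploys.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "nferruz/ProtGPT2"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

# Seed with the start token and sample broadly; at scale, batches like this
# would run on accelerators and the outputs filtered for editing activity.
inputs = tokenizer("<|endoftext|>", return_tensors="pt")
with torch.no_grad():
    samples = model.generate(
        inputs["input_ids"],
        do_sample=True,
        top_k=950,                # wide sampling, per the model's usage notes
        repetition_penalty=1.2,
        max_length=200,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id,
    )
for seq in tokenizer.batch_decode(samples, skip_special_tokens=True):
    print(seq.replace("\n", ""))
```

The sampling step is cheap to repeat, which is the property Brown emphasizes: widening the net is mostly a matter of running more batches on more accelerators.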
Brown went on to explain how the scale of Metagenomi’s discovery work has created new technical demands. The company’s protein database now holds roughly 15 billion entries, a number too large to search or analyze efficiently without custom infrastructure. Working with AWS engineers, Metagenomi developed a system to embed each protein into a searchable vector format using AI models, then store and retrieve those representations through LanceDB on Amazon S3. The collaboration helped the team identify the most cost-effective way to batch these jobs across AWS instances, turning what would have been a one-off, resource-heavy task into a repeatable part of daily research. This marked the point where compute stopped being a background utility and became an active enabler of discovery.
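A rough sketch of that embed-and-search pattern appears below. The LanceDB-on-S3 arrangement comes from Brown's description; the embedding model, bucket name, and schema are illustrative assumptions, and a sequence-based model like ESM-2 only approximates the structure-aware embeddings the team actually computes.

```python
# A minimal sketch of embedding proteins and searching them as vectors.
# Model, bucket name, and schema are hypothetical; only the LanceDB-on-S3
# pattern is taken from the article.
import torch
import lancedb
from transformers import AutoTokenizer, AutoModel

MODEL = "facebook/esm2_t12_35M_UR50D"  # assumed protein language model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def embed(sequence: str) -> list[float]:
    """Mean-pool the final hidden states into one fixed-length vector."""
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, tokens, dim)
    return hidden.mean(dim=1).squeeze(0).tolist()

# LanceDB tables can live directly on S3; the bucket is a placeholder.
db = lancedb.connect("s3://example-protein-db/lance")
table = db.create_table(
    "proteins",
    data=[{"id": "p1", "vector": embed("MKTAYIAKQRQISFVKSHFSRQ")}],
    mode="overwrite",
)

# Nearest-neighbor search: proteins whose embeddings resemble the query.
hits = table.search(embed("MKVAYIAKQRQISFVQSHFSRQ")).limit(10).to_list()
for hit in hits:
    print(hit["id"], hit["_distance"])
```

At 15 billion entries, a real deployment would also need an approximate-nearest-neighbor index and batched embedding jobs, which is where the cost-optimized batching Metagenomi worked out with AWS comes in.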
AIwire: Let's talk about the instances that you chose, the EC2 instances. What makes these the best option for these projects?
Brown: EC2 is very flexible. It allows us to spin up computers with exactly the specifications we need for the kind of research we do. That's important because we're always testing new algorithms on new datasets. If we invested in that infrastructure internally, we would quickly outgrow it or want to do something else. So from a research perspective, it allows us to be really flexible. In addition, we run a lot of our processes on Batch and utilize Spot Instances. We can schedule around the timing and availability of inexpensive Spot Instances and still get the data we need in a timely fashion. We saw real advantages from putting thousands of jobs on Batch, scheduling them with Spot Instances, and making use of the cheaper times and instance types to get our analysis done.
Brown’s description of the team’s workflow shows how flexibility and cost control have become central to scientific computing. By running experiments on AWS Batch and EC2 Spot Instances, Metagenomi can schedule thousands of jobs at a time, making use of low-cost windows without interrupting research.
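A minimal sketch of that fan-out, using boto3 to submit sharded analysis jobs to a hypothetical Spot-backed Batch queue (the queue, job definition, and script names are placeholders, not Metagenomi's), might look like this:

```python
# A minimal sketch of fanning analysis out across AWS Batch. Assumes a job
# queue whose compute environment requests Spot capacity; all names here
# are hypothetical placeholders.
import boto3

batch = boto3.client("batch")

def submit_shard(shard_id: int) -> str:
    """Submit one shard of the analysis as an independent Batch job."""
    response = batch.submit_job(
        jobName=f"protein-scan-shard-{shard_id}",
        jobQueue="spot-analysis-queue",     # hypothetical Spot-backed queue
        jobDefinition="protein-scan:3",     # hypothetical job definition
        containerOverrides={
            "command": ["python", "scan.py", "--shard", str(shard_id)],
        },
    )
    return response["jobId"]

# Thousands of independent shards become thousands of queued jobs that
# Batch places onto Spot instances as cheap capacity becomes available.
job_ids = [submit_shard(i) for i in range(1000)]
print(f"submitted {len(job_ids)} jobs")
```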
At this point, Kamran Khan, head of business development at AWS, joined the conversation to explain how AWS’s newer accelerator chips fit into that model.
Khan: One of the things that is very exciting about this partnership is not only their team's ability to utilize AI to accelerate discovery and exploration on the genome and protein sequencing side of things, but also the access and the economics we're able to provide with Inferentia. Metagenomi can process and analyze millions to billions of these proteins at a fraction of the time and cost. And that opens the door to new possibilities.
He added that AWS introduced its first Inferentia chips in 2019 and that Metagenomi is now using the second generation, Inferentia2, which powers its EC2 instances. Khan said the goal of these accelerators is to make AI workloads faster and more affordable across research and industry. By reducing the time and cost of large-scale protein analysis, he said, AWS enables researchers to test more hypotheses and explore new directions that once would have been too expensive or time-consuming to pursue. This, he noted, is part of AWS's larger effort to democratize AI by making the same infrastructure available to research labs and smaller companies as it is to major technology customers.
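For PyTorch users, targeting Inferentia2 typically goes through the AWS Neuron SDK, which traces a model and compiles it ahead of time for the instance's NeuronCores. The sketch below uses a public protein language model as a stand-in, since the article does not specify what Metagenomi deploys:

```python
# A minimal sketch of compiling a PyTorch model for Inferentia2 with the
# AWS Neuron SDK (run on an inf2 instance). The model is an assumed
# stand-in, not Metagenomi's production network.
import torch
import torch_neuronx
from transformers import AutoTokenizer, AutoModel

MODEL = "facebook/esm2_t12_35M_UR50D"  # assumed protein language model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

# Trace with a fixed-shape example; Neuron compiles the graph ahead of
# time for the NeuronCores rather than executing eagerly on CPU.
example = tokenizer("MKTAYIAKQRQISFVKSHFSRQ", return_tensors="pt")
neuron_model = torch_neuronx.trace(
    model, (example["input_ids"], example["attention_mask"])
)
torch.jit.save(neuron_model, "esm2.neuron.pt")  # reload via torch.jit.load

# Subsequent calls run on the accelerator, not the host CPU.
with torch.no_grad():
    embeddings = neuron_model(example["input_ids"], example["attention_mask"])
```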
Khan: We work with a lot of very large customers spending billions on their infrastructure. However, I think these projects are equally exciting to us, because they are going to have real impacts on human lives all over the world. We want to make it accessible to everyone, and that's one of the goals we set out to achieve with Inferentia. It has to be easily accessible, it has to be inexpensive, and it has to meet the needs of researchers and developers around the world.
While Khan emphasized accessibility at the infrastructure level, Brown said the effects are clearest inside the lab, where these AI and compute resources have reshaped how his team designs and tests new enzymes.
AIwire: Chris, how has access to faster, more affordable compute changed your workflow, and what kinds of discoveries has it enabled?
Brown: As we're discussing, our process of finding the right system from nature, or generating a new enzyme using the generative AI models, involves a lot of iteration and optimization, and that's why being able to do these types of operations more quickly matters. We take a project we would do once, turn it into something we can do routinely, and that allows us to explore the design space a lot more thoroughly. And if we can cut the cost by 50%, we can double the size of the net we cast over the protein design space, potentially getting us to the right system. What we've found over the years in doing these types of research projects is that it's not always just about incremental gains from doing something over and over again. It's actually about looking at a broad enough landscape, a broad enough set of designs, that you get to the right system, the one that's going to have the biggest impact on the technology and on patients.
Brown said the impact of these AI and compute capabilities extends across Metagenomi’s entire gene editing platform rather than any single discovery. While the company’s lead program still uses natural CRISPR nucleases identified several years ago, newer projects are advancing more complex forms of gene editing, such as precise genomic corrections and large DNA integrations, using AI-driven search and design tools. These methods, he explained, are helping Metagenomi develop compact gene editing systems that work with existing delivery technologies, potentially expanding treatment to more tissues and more patients. In the long term, he said, the goal is to enable safe replacement of entire genes rather than targeting thousands of individual mutations, a shift that could make gene editing therapies viable for larger patient populations.
Brown: There might be a thousand different mutations in a single gene that cause one disease across a patient population, and no one can afford to make a thousand different drugs, but if you can make one that integrates that large gene safely, it can have an impact on more patients. So we're using these tools to optimize those types of technology.
The collaboration between Metagenomi and AWS has been a two-way exchange. While access to AI accelerators has advanced Metagenomi’s research, it has also given AWS insight into how scientific workloads differ from the LLMs that dominate much of today’s AI infrastructure discussion.
AIwire: Kamran, this sounds like an iterative process between the both of you. What have you learned from working with companies like Metagenomi to provide the best tools for scientists and innovators?
Khan: This is one of the most exciting parts of these collaborations. Our largest customers today are deploying LLMs. As an example, Anthropic is utilizing Trainium2 to deploy a lot of their Claude models. These LLMs have a particular structure, and they have a lot of similarities. But working with Chris and Metagenomi, we want our Neuron software stack to be robust enough to serve all of these different workloads, in an environment where researchers and developers can come together. That's one of the things we can take away from this relationship: an understanding of workloads that are different from the large-scale LLMs, and of where they leverage a lot of those same capabilities. Now our team can work with Chris and his team to optimize these libraries as well, to make them as efficient for protein synthesis as they are for deploying LLMs.
Khan added that AWS is continuing to broaden support for emerging AI architectures, including non-transformer models like Mamba and other state-space designs. He said collaborations with research partners help the company tune its Neuron software stack to handle these varied workloads efficiently on both Inferentia and Trainium chips. To give developers deeper control, AWS has introduced the Neuron Kernel Interface (NKI), which allows direct access to the hardware for customized optimization. According to Khan, these advances are aimed at expanding the flexibility and scientific reach of AWS's accelerator ecosystem.
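NKI kernels are written in Python and compiled down to the NeuronCores. The toy kernel below, adapted from the documented elementwise-addition pattern, only shows the shape of the interface; real kernels would implement heavier fused operations:

```python
# A toy NKI kernel (elementwise add), adapted from the documented
# "tensor addition" example; it illustrates the interface, not a
# production workload. Requires a Neuron device such as inf2 or trn1.
import numpy as np
from neuronxcc import nki
import neuronxcc.nki.language as nl

@nki.jit
def add_kernel(a_input, b_input):
    # Allocate the output in device HBM, load inputs into on-chip tiles,
    # compute on the NeuronCore, and store the result back.
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype,
                          buffer=nl.shared_hbm)
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    nl.store(c_output, value=a_tile + b_tile)
    return c_output

a = np.random.rand(128, 512).astype(np.float32)
b = np.random.rand(128, 512).astype(np.float32)
print(add_kernel(a, b)[0, :4])
```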
As the conversation turned toward the future, Brown reflected on how AI is reshaping the practice of science itself.
AIwire: Do you think AI is going to completely change the way science is conducted? Will it be as much of a boon to science as some are currently predicting?
Brown: What we've seen at Metagenomi, and what other biotechs highlight, is that [AI] has an impact at essentially every stage in development. So that could mean that it's a research assistant that we can use to quickly come up with a search strategy to get to the right kind of enzyme, because it's able to look through the literature and propose different options for doing that, very much in a copilot type of capacity.
Brown went on to say that in this copilot model, AI functions as a trusted assistant to scientists, helping them search, design, and iterate more effectively rather than replacing them. He added that the long-term goal is to reach a point where AI systems can predict entire drug products with confidence, identifying the right protein or molecule and anticipating how it will perform in safety studies. That level of predictive accuracy, he said, will take time to achieve. But for now, AI is making each stage of discovery faster and more reliable, multiplying the impact of human expertise.
“It doesn't mean that drug development looks totally different,” he said. “It's not turning it upside down, but it is improving all of the stages.”
Editor's note: This interview has been lightly edited and condensed for clarity and flow.

