World's first genetic search engine, world record in Zurich, Switzerland

Zurich, Switzerland--A new tool called MetaGraph, developed by scientists at ETH Zurich, allows rapid searches across vast genetic databases by compressing and indexing genetic data, setting the world record for the world's first genetic search engine, according to the WORLD RECORD ACADEMY.

"The world's first and fastest genetic search engine is a new tool called MetaGraph, developed by scientists at ETH Zurich.
"It allows rapid searches across vast genetic databases by compressing and indexing genetic data, and is nicknamed the 'Google for DNA' because of its ability to search through millions of DNA and RNA sequences in seconds." (AI Overview)

How it works
- Full-text search: Instead of relying on slow downloads or incomplete metadata, MetaGraph performs a full-text search directly on raw DNA and RNA sequences.
- Data compression: It compresses enormous volumes of genetic data (nearly 100 petabytes) into a manageable index using complex mathematical graphs, so that a representation of the entire dataset can fit on a few computer hard drives.
- Speed: Thanks to this compression, searches that once took vast amounts of computing time and resources can be completed in seconds.
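The idea of a full-text search over raw sequences can be sketched in a few lines. The example below assumes an index keyed on fixed-length substrings (k-mers), the general principle behind graph-based sequence indexes such as MetaGraph's; the function names, the sample data and the tiny value of k are hypothetical and chosen for readability, and real tools compress the index far more aggressively.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (a 'k-mer') of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples: dict, k: int) -> dict:
    """Map each k-mer to the set of sample IDs whose sequence contains it."""
    index = defaultdict(set)
    for sample_id, seq in samples.items():
        for km in kmers(seq, k):
            index[km].add(sample_id)
    return dict(index)


def exact_search(index: dict, query: str, k: int) -> set:
    """Return the samples that contain every k-mer of the query."""
    hits = None
    for km in kmers(query, k):
        found = index.get(km, set())
        # Intersect: a sample must contain all query k-mers seen so far.
        hits = set(found) if hits is None else hits & found
        if not hits:
            return set()
    # A query shorter than k has no k-mers, so nothing can match.
    return hits if hits is not None else set()


# Hypothetical toy data: three short "samples" of DNA.
samples = {"s1": "ACGTACGGA", "s2": "TTGACGTAC", "s3": "GGGCCCTTA"}
index = build_index(samples, k=4)
print(sorted(exact_search(index, "ACGTAC", k=4)))  # ['s1', 's2']
```

Containing every k-mer of a query is necessary but not sufficient for containing the query as one contiguous string; k-mer indexes accept that approximation in exchange for speed.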
What it means for research
- Accelerated discovery: The tool can significantly speed up research in areas like disease, antibiotic resistance, and rare genetic conditions.
- New possibilities: It could make it possible for individuals to search genetic databases in the future, as Nature notes.

"A team of researchers at ETH Zurich has developed MetaGraph, a groundbreaking tool that allows scientists to search through vast public DNA and RNA databases in seconds, earning it the nickname 'Google for DNA'," The Times of India reports.
"With global repositories now holding nearly 100 petabytes of genetic data, equivalent to the total text on the internet, traditional methods of downloading and analysing sequences have become slow and resource-intensive.
"MetaGraph compresses this enormous volume of data into a searchable, full-text index, enabling rapid identification of sequences across millions of datasets. This innovation could accelerate research into pathogens, antibiotic resistance and rare genetic conditions."

"MetaGraph indexes and compresses vast collections of DNA, RNA, and protein sequences, representing petabases of data, and lets you query them instantly to turn big data into biological insight," MetaGraph's official website says.
"MetaGraph unlocks the world’s sequencing archives, transforming data from sources like SRA and ENA into a unified, searchable landscape. Identify whether a sequence has been observed before and trace its biological context.
"MetaGraph supports both exact and inexact sequence searches, pairing precision with flexibility. Each result is annotated with sample and metadata context, and the open-source framework is ready to run on your own data."
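The difference between exact and inexact searches can be illustrated with a common k-mer-based approach: rather than requiring every k-mer of the query to be present, each sample is scored by the fraction of query k-mers it contains, and samples above a threshold are reported, so a query carrying a sequencing error or a mutation can still match. This is a minimal sketch under that assumption; the function names and the 0.75 default threshold are hypothetical, and MetaGraph's own alignment and scoring methods are not reproduced here.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> list:
    """Return every overlapping substring of length k (a 'k-mer') of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(samples: dict, k: int) -> dict:
    """Map each k-mer to the set of sample IDs whose sequence contains it."""
    index = defaultdict(set)
    for sample_id, seq in samples.items():
        for km in kmers(seq, k):
            index[km].add(sample_id)
    return dict(index)


def inexact_search(index: dict, query: str, k: int, min_fraction: float = 0.75) -> dict:
    """Score samples by the fraction of query k-mers they contain.

    Returns {sample_id: fraction} for samples at or above min_fraction.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return {}
    counts = defaultdict(int)
    for km in query_kmers:
        # Each sample is counted at most once per query k-mer.
        for sample_id in index.get(km, ()):
            counts[sample_id] += 1
    scores = {sid: n / len(query_kmers) for sid, n in counts.items()}
    return {sid: s for sid, s in scores.items() if s >= min_fraction}


samples = {"s1": "ACGTACGGA", "s2": "TTGACGTAC", "s3": "GGGCCCTTA"}
index = build_index(samples, k=3)
# "GACGTAA" differs from s2's "GACGTAC" in its last letter, so an exact
# k-mer search would find nothing, but s2 still shares 4 of 5 k-mers.
print(inexact_search(index, "GACGTAA", k=3))  # {'s2': 0.8}
```

Lowering `min_fraction` trades precision for sensitivity: at 0.5 the same query also reports `s1`, which shares 3 of the 5 k-mers.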

"Freedom and individual responsibility, entrepreneurial spirit and open-mindedness: ETH Zurich stands on a bedrock of true Swiss values. Our university for science and technology dates back to the year 1855, when the founders of modern-day Switzerland created it as a centre of innovation and knowledge," ETH Zürich says.
"At ETH Zurich, students discover an ideal environment for independent thinking, researchers a climate which inspires top performance. Situated in the heart of Europe, yet forging connections all over the world, ETH Zurich is pioneering effective solutions to the global challenges of today and tomorrow.
"Basic research is the foundation for the success of ETH as a whole and particularly for mission-oriented research targeted at solving global challenges. Interdisciplinary collaboration will be strengthened in all strategic action areas, thereby promoting the translation of research results into real-world applications and teaching."

"The MetaGraph tool can search through millions of published DNA, RNA and protein records in a matter of seconds. Developed by SIB scientists at ETH Zurich, the tool overcomes current limitations in analyzing vast volumes of biological sequencing data – which will significantly accelerate life-science research and biomedical innovation. This important milestone in computational genomics was published in Nature," the SIB Swiss Institute of Bioinformatics reports.
"Over 100 million gigabytes (100 petabytes) of DNA, RNA and protein sequences are stored in public databases around the world – about as much as all the text on the internet.
"This vast collection of data is a treasure trove for research into disease treatments, ecology, new biotechnologies, and more. However, accessing and analysing data at this scale poses a major challenge. Current methods are often slow, require massive computing power and other resources, and lack scalability for high-throughput searches."
"'It’s a kind of Google for DNA,' as Professor Gunnar Rätsch, data scientist at the Department of Computer Science at ETH Zurich, summarizes," SciTechDaily.com reports.
"Until now, researchers had to search the databases for descriptive metadata. In order to access the raw data, they had to download the respective data sets. These searches were incomplete, time-consuming and expensive.
"MetaGraph is comparatively favorable in terms of costs, as the researchers state in their study. The representation of all public biological sequences would fit on a few computer hard drives, while larger queries should cost no more than 0.74 dollars per megabase."
"The Internet has Google. Now biology has MetaGraph. Detailed today in Nature, the search engine can quickly sift through the staggering volumes of biological data housed in public repositories," Nature reports.
"'It’s a huge achievement,' says Rayan Chikhi, a biocomputing researcher at the Pasteur Institute in Paris. 'They set a new standard' for analysing raw biological data, including DNA, RNA and protein sequences, from databases that can contain millions of billions of DNA letters, amounting to 'petabases' of information, more entries than all the webpages in Google’s vast index."
"Researchers at ETH Zurich have developed a revolutionary digital tool that allows scientists to search through the world’s genetic data archives as easily as a web search on Google," ICT&health reports.
"The system, called MetaGraph, enables rapid full-text searches of global DNA and RNA databases. It's an innovation that could accelerate biomedical research and the development of new treatments.
"For decades, DNA sequencing has driven major advances in medicine, from identifying rare hereditary diseases to decoding the SARS-CoV-2 genome. Yet the enormous volume of genomic data, over 100 petabytes stored in international archives such as the Sequence Read Archive (SRA) and European Nucleotide Archive (ENA), has made efficient searching virtually impossible. Until now, researchers needed vast computing power and time-consuming downloads to analyze data."
In brief
- MetaGraph, a new ETH tool, enables fast, accurate and low-cost searching of DNA sequences.
- To achieve this, the researchers use indices that structure large data volumes so that they can be searched easily.
- As an open-source tool, MetaGraph is freely accessible, offering a wide range of potential applications.
"In the study published on 8 October in the journal Nature, the ETH researchers demonstrate how MetaGraph works: the tool indexes the data and presents it in compressed form," the Eidgenössische Technische Hochschule Zürich says.
"This is achieved by way of complex mathematical graphs that improve the structure of the data, similar to spreadsheet programmes such as Excel. 'Mathematically speaking, it is a huge matrix with millions of columns and trillions of rows,' as Rätsch states.
"The idea of rendering large amounts of data searchable with the help of indexes is standard practice in computer science research. What is new about the work of the ETH researchers, however, is the complex linking of raw data and metadata and the compression by a factor of about 300, similar to a book summary: it no longer contains every word, but all the main storylines and connections remain intact – more compact, yet without any relevant loss of information."
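Rätsch's description of a huge matrix with millions of columns and trillions of rows can be made concrete with a toy version. This sketch assumes that rows stand for distinct k-mers and columns for individual samples, with a 1 wherever a k-mer occurs in a sample; the function names and data are hypothetical, and each column is kept as a Python integer used as a bitset rather than in any of the compressed formats a production tool would use.

```python
def annotation_matrix(samples: dict, k: int):
    """Build a toy k-mer x sample presence matrix.

    Rows are distinct k-mers (numbered in order of first appearance);
    columns are samples. Each column is stored as a Python int used as a
    bitset: bit r is set when the k-mer in row r occurs in that sample.
    """
    row_of = {}
    for seq in samples.values():
        for i in range(len(seq) - k + 1):
            row_of.setdefault(seq[i:i + k], len(row_of))
    columns = {}
    for sample_id, seq in samples.items():
        bits = 0
        for i in range(len(seq) - k + 1):
            bits |= 1 << row_of[seq[i:i + k]]
        columns[sample_id] = bits
    return row_of, columns


def samples_with(row_of: dict, columns: dict, kmer: str) -> set:
    """Read one row of the matrix: which samples contain this k-mer?"""
    row = row_of.get(kmer)
    if row is None:
        return set()
    return {sid for sid, bits in columns.items() if (bits >> row) & 1}


samples = {"s1": "ACGTACGGA", "s2": "TTGACGTAC", "s3": "GGGCCCTTA"}
row_of, columns = annotation_matrix(samples, k=3)
print(len(row_of), "rows x", len(columns), "columns")  # 16 rows x 3 columns
print(sorted(samples_with(row_of, columns, "GTA")))    # ['s1', 's2']
```

In a real archive almost every cell of such a matrix is typically 0, because a given k-mer usually occurs in only a small share of the samples; that sparsity is the kind of property compressed matrix encodings exploit, although the specific schemes MetaGraph uses are not shown here.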