Viruses are closely linked to human health and play a critical role in ecosystems. Although expanded sampling and improved sequencing technologies have greatly advanced global RNA virus discovery, the identification of RNA viruses still relies heavily on sequence homology to known viruses.
A study published in Cell describes a deep learning model that integrates sequence and structural information and expands the number of known RNA virus species nearly 30-fold, including viral "dark matter" that has eluded traditional research methods.
Fig. 1. Identification of highly divergent RNA viral dark matter using LucaProt modeling. (Hou, X.; et al. 2024)
Traditional RNA virus identification methods rely heavily on sequence homology, i.e., comparing the sequences of unknown viruses with those of known viruses. However, RNA viruses are so diverse and highly divergent that homology searches struggle to capture "dark matter" viruses with little or no detectable similarity to known sequences, which keeps the rate of new virus discovery low. New strategies are therefore needed to capture the diversity of RNA viruses efficiently.
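To make the limitation concrete, here is a minimal Python sketch of a homology-style check. It is a toy illustration only, not the method used in the study or by any real search tool: the function names, the 30% identity threshold, and the short peptide strings are all hypothetical, and the sequences are assumed to be pre-aligned to equal length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where both sequences contain a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def is_homology_hit(query: str, references: list[str], threshold: float = 30.0) -> bool:
    """Return True if the query reaches the (hypothetical) identity threshold
    against at least one reference sequence."""
    return any(percent_identity(query, ref) >= threshold for ref in references)


# Hypothetical toy data: one "known" sequence and two queries.
known_sequences = ["MKTAYIAKQR"]
close_relative = "MKTAYIAKQK"   # differs from the known sequence at one position
divergent_query = "GWLPDNSCHF"  # shares no residues with the known sequence

print(is_homology_hit(close_relative, known_sequences))
print(is_homology_hit(divergent_query, known_sequences))
```

The close relative passes the threshold, while the divergent query is rejected even if it were a genuine virus. This is the gap that methods drawing on more than raw sequence similarity aim to close.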
In recent years, deep learning algorithms have shown great potential in bioinformatics: they can improve accuracy and performance, reduce the reliance on feature engineering, and offer flexible model architectures and self-learning capabilities. Building on this, the research team proposed using artificial intelligence (AI)-based metagenomics to detect RNA viruses accurately and efficiently.
Current metagenomic tools struggle to identify highly divergent RNA viruses. The study explores the global RNA virosphere in depth by applying AI techniques, in particular deep learning algorithms.
The researchers collected 10,437 samples from the NCBI SRA database and combined them with 50 samples collected for the study, covering a wide range of ecosystems. They developed a deep learning model called LucaProt, which combines sequence and predicted structural information to identify highly divergent RNA-dependent RNA polymerase (RdRP) sequences in metagenomic samples from diverse ecosystems around the world. They evaluated the sensitivity and specificity of LucaProt by comparing it with four other virus discovery tools. Some of the newly identified RNA viruses were validated by RT-PCR and RNA/DNA sequencing, and the newly discovered viral supergroups were validated by DNA and RNA sequencing of the 50 collected samples.
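For readers less familiar with the two evaluation metrics mentioned above, the following Python sketch shows how sensitivity and specificity are computed from a tool's predictions on a labeled benchmark. The function name and the labels are hypothetical and purely illustrative; they are not data or code from the study.

```python
def sensitivity_specificity(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute (sensitivity, specificity) for binary predictions.

    sensitivity = TP / (TP + FN): share of true viral sequences that were detected
    specificity = TN / (TN + FP): share of non-viral sequences correctly rejected
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    # Guard against benchmarks with no positives or no negatives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical benchmark: 4 true viral sequences followed by 6 non-viral ones.
actual_labels = [True, True, True, True, False, False, False, False, False, False]
tool_calls = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(tool_calls, actual_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A tool that calls everything viral scores perfect sensitivity but poor specificity, which is why the two metrics are reported together when comparing discovery tools.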
By developing and applying the deep learning model LucaProt, this study not only greatly expands our understanding of global RNA virus diversity but also provides new methods for future virus discovery and ecological research. It marks a milestone for deep learning in virus discovery, underscores the scale of the virosphere, and provides computational tools to better document global RNA virus diversity.
Our company provides advanced AI platforms and collaborates with research organizations and pharmaceutical companies worldwide to contribute to global public health security and disease control. If you are interested in our services or have a question, please feel free to contact us for more details.
Original Article:
Hou, X.; et al. (2024). Using artificial intelligence to document the hidden RNA virosphere. Cell. 187(4), 1-14.
Related Services:
Target Identification
AI-powered Drug Discovery and Design