Scientists have developed a machine learning algorithm that may help find the original hosts of viruses. It is hoped that the new tool could help inform preventive measures against deadly diseases.
The research, led by the University of Glasgow, introduces an algorithm that uses viral genome sequences to predict the likely natural host for a broad spectrum of RNA viruses – the viral group that most often jumps from animals to humans.
The paper, ‘Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes’, co-authored by Simon Babayan, Richard Orton and Daniel Streicker, is published in Science.
Accelerating virus analysis
Finding the source of viruses from genome sequences can take years of intensive field research and laboratory work. These delays can make it difficult to implement preventive measures such as vaccinating the animal sources of disease or preventing dangerous contact between species.
Researchers studied the genomes of over 500 viruses to train machine learning algorithms to match patterns embedded in the viral genomes to their animal origins.
These models were able to accurately predict which animal reservoir host each virus came from, whether the virus required the bite of a blood-feeding vector and, if so, whether the vector is a tick, mosquito, midge, or sandfly.
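To give a rough sense of what matching genome patterns to hosts can look like in practice, here is a minimal Python sketch. It is not the researchers' method or code: it assumes a much simpler stand-in technique (a nearest-centroid classifier over dinucleotide frequencies), and the sequences, host labels and function names are all made up for illustration.

```python
from collections import Counter
from itertools import product
import math

# The 16 possible RNA dinucleotides, in a fixed order (AA, AC, ..., UU).
DINUCS = ["".join(p) for p in product("ACGU", repeat=2)]

def dinucleotide_freqs(seq):
    """Return the 16 dinucleotide frequencies of a sequence (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS) or 1  # avoid division by zero
    return [counts[d] / total for d in DINUCS]

def train_centroids(labelled):
    """Average the feature vectors of all training sequences for each host."""
    sums, n = {}, Counter()
    for seq, host in labelled:
        feats = dinucleotide_freqs(seq)
        prev = sums.get(host, [0.0] * len(DINUCS))
        sums[host] = [a + b for a, b in zip(prev, feats)]
        n[host] += 1
    return {host: [v / n[host] for v in s] for host, s in sums.items()}

def predict_host(centroids, seq):
    """Pick the host whose average profile is closest to the query sequence."""
    feats = dinucleotide_freqs(seq)
    return min(centroids, key=lambda host: math.dist(feats, centroids[host]))

# Hypothetical toy data: invented sequences and labels, not real viruses.
training = [
    ("GCGCGCGGCCGCGC", "bat"),
    ("CGCGGGCCGCGCGG", "bat"),
    ("AUAUAUUAAUAUAU", "rodent"),
    ("UAUAAUAUUAUAUA", "rodent"),
]
centroids = train_centroids(training)
print(predict_host(centroids, "GCGCCGCG"))
```

A real model would need hundreds of complete genomes, richer features and a stronger learning method, but the basic shape is the same: turn each genome into numbers, learn what those numbers look like for each known host, then compare a new virus against those profiles.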
Next, the researchers applied the models to viruses for which the hosts and vectors are not yet known, such as Crimean-Congo hemorrhagic fever, Zika and MERS. The model-predicted hosts often confirmed the current best guesses in each field.
The research found that two of the four species of Ebola that were presumed to have a bat reservoir actually had equal or stronger support as primate viruses, which could point to a non-human primate, rather than a bat, as the source of some Ebola outbreaks.
Earlier interventions
Dr Daniel Streicker, the senior author of the study from the MRC-University of Glasgow Centre for Virus Research and Institute of Biodiversity, Animal Health and Comparative Medicine, said:
“Genome sequences are just about the first piece of information available when viruses emerge, but until now they have mostly been used to identify viruses and study their spread.
“Being able to use those genomes to predict the natural ecology of viruses means we can rapidly narrow the search for their animal reservoirs and vectors, which ultimately means earlier interventions that might prevent viruses from emerging altogether or stop their early spread.”
The researchers are now developing a web application that will allow scientists from anywhere in the world to submit their virus sequences and get rapid predictions for reservoir hosts, vectors and transmission routes.
Internet of Business says
AI is fast gaining traction in healthcare, particularly in diagnostics, where the technology is able to spot patterns and anomalies in medical scans that a human healthcare professional might miss.
Yet using machine learning to predict virus hosts from evolutionary signatures in their genomes is also showing clear potential, taking healthcare AI use cases outside the hospital.
The prospect of identifying a virus's source more quickly could play a key part in combating the rapid spread of viruses, through a better understanding of their origins.
The vast quantities of data associated with DNA and RNA make them candidates for AI training and analysis. We expect to see further advances in the use of AI in DNA research in the near future, and with them the ethical questions that greater insight into DNA will undoubtedly raise.