Project 461550

Adapting genetic clustering techniques to SARS-CoV-2

$596,700
Project Information
Study Type: Unclear
Research Theme: Not applicable / Specified
Institution & Funding
Principal Investigator(s): Poon, Arthur F
Institution: University of Western Ontario
CIHR Institute: Infection and Immunity
Program: Project Grant
Peer Review Committee: Genomics: Systems and computational biology
Competition Year: 2022
Term: 5 yrs 0 mth
Abstract Summary

One of the positive outcomes of the ongoing SARS-CoV-2 pandemic has been the rapid collection and sharing of virus genomic data. Today, there are over 9 million SARS-CoV-2 genomes from around the world in public databases. This abundance of data has also created tremendous new challenges for genomic epidemiology - the use of genetic sequences to reconstruct the spread and adaptation of an infectious disease. The purpose of this project is to contribute to the global effort to update the computational toolkit for genomic epidemiology for the SARS-CoV-2 pandemic by focusing on clustering methods. Genetic clustering is a fundamental category of methods for analyzing sequences where we collect similar observations into groups. Clusters are intuitive and have a broad range of applications. For the study and management of infectious diseases, for example, we use clusters to detect outbreaks, to find associations between risk factors and the spread of disease, and to reconstruct how different infections are related back in time. Clusters are also a useful device for reducing large data sets while preserving the essential information. Many of the standard clustering methods used for infectious disease were developed and honed on HIV-1 sequences, not only because of the enormous global health burden of this disease, but also because these data are abundant around the world. Our specific objectives are to: (1) adapt methods from network science to partition large databases of SARS-CoV-2 genomes into clusters that are calibrated to measure the impact of age, location and other risk factors on transmission rates; (2) develop fast, approximate methods to extract epidemiological information, such as the number of unsampled infections, from cluster-based trees updated in real time; and (3) adapt a method from dynamic social network analysis to reconstruct the role of recombination (the exchange of fragments between genomes) in the evolutionary history of coronaviruses.
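
As an illustration of what "collecting similar observations into groups" can mean for sequences, the following is a minimal sketch of distance-threshold clustering, a general approach of the kind used in HIV-1 work. It is not the method proposed in this project. It assumes the sequences are already aligned to equal length, uses a simple mismatch count (Hamming distance) in place of an evolutionary distance model, and the names `hamming`, `threshold_clusters`, `max_dist` and the toy sequences are all hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(seqs, max_dist):
    """Return clusters as connected components of a distance-threshold graph.

    seqs maps a label to an aligned sequence. Two labels are linked when
    their sequences differ at no more than max_dist positions.
    """
    # Union-find: every label starts as its own cluster root.
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compare every pair; merge the clusters of any pair within the threshold.
    for a, b in combinations(seqs, 2):
        if hamming(seqs[a], seqs[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Sort for a stable, readable ordering of clusters.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy data (invented for illustration only).
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGTACAA",
    "s4": "TTTTGGGG",
}
clusters = threshold_clusters(toy, max_dist=1)
print(clusters)
```

With a threshold of one mismatch, `s1` and `s3` land in the same cluster even though they differ at two positions, because both are within one mismatch of `s2`. The all-pairs comparison grows quadratically with the number of sequences, so a sketch like this would not scale to millions of genomes, which is the kind of limitation that motivates adapting these methods.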

Keywords
Bioinformatics; Data Visualization; Molecular Epidemiology; Phylodynamics; SARS-CoV-2; Unsupervised Clustering; Virus Evolution