Coronaviruses are not new. They are a large family of viruses that can cause illness in both humans and other animals, such as bats, camels, and civets. Occasionally, coronaviruses that infect animals evolve to infect humans, which was the case with the novel coronavirus that causes Covid-19.
This virus had never been identified in humans before. The first human case was seen in Wuhan, Hubei Province, China. Since then, it has spread from China to at least 170 countries around the world. Scientists speculate that the virus can be traced back to the seafood and live animal markets in Wuhan and the direct consumption of a bat or an animal that was infected by a bat. But how can scientists work out where the virus originated and which animal likely spread it to humans?
Much like how we can create a family tree using our own DNA, scientists can create family trees for viruses using RNA. Sequencing the genome of a virus is the best and quickest way to track how quickly it is evolving and where it originated. The Covid-19 genome was sequenced and published by Christian Drosten on the 28th of February. Since then, mutations in the virus have been documented in real time using viral samples from people who have tested positive. Of course, it’s important to note that with limited testing available and the possibility of infected people showing no symptoms, not everyone with the virus has been tested, and therefore not every variant of the virus has been sequenced. This missing information means that the data is incomplete and tricky to analyze, but the information that is available is still extremely valuable.
For example, through genome sequencing, scientists have been able to determine that the virus, with a genome of roughly 30,000 bases, acquires only around two mutations per month. This is a relatively low number of mutations, which suggests that the virus is unlikely to mutate too quickly for an effective vaccine to be created or to mutate rapidly into a deadlier version. From this mutation rate, it has also been determined that the virus came from a single source that was infected around mid-November or early December. Using genome sequencing like this to estimate the start date of an outbreak also helps scientists and virologists better understand its infection rate and incubation period.
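The reasoning behind that date estimate can be sketched as simple arithmetic: if mutations build up at a steady pace, the number of mutations separating a sample from the common ancestor tells you roughly how much time has passed. The snippet below is a minimal illustration, not the method researchers actually use (real analyses rely on statistical models across many genomes). It uses the rate of two mutations per month quoted above; the sample date, the mutation count, the average month length, and the function name `estimate_origin_date` are all assumptions made up for this example.

```python
from datetime import date, timedelta

MUTATIONS_PER_MONTH = 2.0  # rate quoted in the article
DAYS_PER_MONTH = 30.44     # average month length (an assumption for this sketch)


def estimate_origin_date(sample_date, mutations_from_ancestor,
                         rate_per_month=MUTATIONS_PER_MONTH):
    """Estimate when the common ancestor existed, assuming a constant mutation rate."""
    months_elapsed = mutations_from_ancestor / rate_per_month
    days_elapsed = round(months_elapsed * DAYS_PER_MONTH)
    return sample_date - timedelta(days=days_elapsed)


# Hypothetical sample: collected on 1 March 2020, six mutations away
# from the inferred common ancestor.
origin = estimate_origin_date(date(2020, 3, 1), 6)
print(origin)  # 2019-12-01
```

Six mutations at two per month means about three months of evolution, which puts this made-up sample's ancestor at the start of December.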
With enough data, it’s even possible to determine how the virus spreads. When Covid-19 was first identified in Seattle, it was unclear where the virus came from. Then, five weeks later, a second case was identified, and it was unclear whether those testing positive were sick because they had traveled overseas or because someone in their own community had infected them. But because the viral genome sequenced from the second positive case in Washington state was so similar to that of the first case, the most likely explanation was that the virus had been spreading through the community rather than arriving from an outside location multiple times. Knowing this type of information helps inform policy and action and is part of the reason why so many of us are currently sheltering in place.
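The comparison at the heart of that conclusion can be sketched in a few lines: line up two genomes and count the positions where they differ. Fewer differences suggest a closer relationship. This is only a toy illustration. The sequences below are short, made-up strings (real genomes are roughly 30,000 bases long and must be aligned first), and the function name `count_differences` is an assumption for this example.

```python
def count_differences(seq_a, seq_b):
    """Count positions where two aligned sequences of equal length differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Toy, made-up RNA fragments for illustration only.
first_case = "AUGGCUACGUUAGC"
second_case = "AUGGCUACGAUAGC"   # differs from first_case at one position
unrelated = "AUGCCUAGGUUCGA"     # differs from first_case at four positions

print(count_differences(first_case, second_case))  # 1
print(count_differences(first_case, unrelated))    # 4
```

In this toy setup, the second case sits much closer to the first than the unrelated sample does, which is the kind of pattern that points toward local transmission rather than a separate introduction.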
The mapping of viruses is not a new science. In fact, this technique has been used for over a decade now and has tracked the spread of everything from Ebola to SARS. If you’re interested in learning more about the family tree and tracking of the Covid-19 virus, you can watch the data in real time via NextStrain.org. It’s a great way to spend your time while stuck indoors! And of course, please continue to wash your hands, practice social distancing, and follow the guidelines set in place by your local officials. We hope you’re staying as safe and healthy as possible in these unprecedented times.