
Bytes and Bio: Big Data in Biomedical Research


[Music] Hi, I’m Adam, a graduate student in the University of Rochester School of Medicine and Dentistry. I’m going to talk a bit about the relationship between biomedical research and something you’ve probably been hearing a lot about lately: Big Data. How much data is enough to count as big data? I’d say at least enough to have trouble figuring out what to do with it all!

Biologists have been dealing with big data since before it was cool. The Human Genome Project, which started in 1990 and was completed in 2003, generated an enormous amount of sequence information for the time. That meant specialized software and systems to process and store it all. That synergy between computer science and biological research led to the development of the fields of computational biology and bioinformatics.

The cost of completing the first human genome was about $1 per base. (The bases are the As, Ts, Cs, and Gs that make up DNA, and the human genome has about 3 billion of them.) In 2017, that cost per base is less than one millionth of a cent. This means that sequencing has become routine for more than just genomes, and in more than just humans. We can now use sequencers to look at patterns of gene expression and regulation, microbiomes, and much more, to get a better understanding of the fundamental biology and mechanisms of disease in nearly any organism.

At current trends, by the year 2020, the total annual output of all the sequencers in the world might reach 1 exabase. That’s a one with eighteen zeroes. That’s not just big data, that’s huge data! Luckily, researchers are already at work to make sure we’ll be able to handle it all. Biology is enormously complex, and although it might take a while to figure out what all that data means, the more information we have, the clearer a picture we’ll be able to draw.

At universities and research centers in the US and around the world, easier access to resources such as high-throughput sequencers, high-performance computer clusters, and repositories to share and access large datasets means we can work on solving bigger biology problems faster. [Music]
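To put the numbers quoted in the video side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the figures mentioned above (about $1 per base for the first genome, under one millionth of a cent per base in 2017, about 3 billion bases per human genome, and 1 exabase as 10^18 bases). The variable and function names are made up for illustration, and the 2017 figure is treated as an upper bound. The calculation also ignores the fact that real sequencing reads each position many times over, so it is not an estimate of what a finished genome actually costs.

```python
from fractions import Fraction

# Figures as quoted in the video; all are approximate.
GENOME_BASES = 3_000_000_000             # about 3 billion bases per human genome
HGP_COST_PER_BASE_USD = Fraction(1)      # about $1 per base for the first genome
# "Less than one millionth of a cent" per base, used here as an upper bound.
COST_2017_PER_BASE_USD = Fraction(1, 100) * Fraction(1, 1_000_000)
EXABASE = 10**18                         # a one with eighteen zeroes


def sequencing_cost_usd(bases, cost_per_base_usd):
    """Cost in dollars of reading `bases` bases once at a flat per-base rate."""
    return bases * cost_per_base_usd


hgp_cost = sequencing_cost_usd(GENOME_BASES, HGP_COST_PER_BASE_USD)
cost_2017_upper = sequencing_cost_usd(GENOME_BASES, COST_2017_PER_BASE_USD)
genome_lengths_per_exabase = EXABASE // GENOME_BASES

print(f"First genome, at $1 per base: about ${int(hgp_cost):,}")
print(f"Same number of bases at the 2017 rate: under ${float(cost_2017_upper):,.2f}")
print(f"One exabase is about {genome_lengths_per_exabase:,} human-genome lengths")
```

Exact fractions are used instead of floating-point numbers so that the tiny per-base price does not pick up rounding error when multiplied by billions of bases.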

Glenn Chapman
