Accessing and using ENCODE data

Published on November 5, 2013 Archived on April 13, 2022   44 min

Other Talks in the Series: Epigenetics, Chromatin, Transcription and Cancer

Hello. I am Peggy Farnham, the William M. Keck Professor of Biochemistry and the Associate Dean of Graduate Studies at the Keck School of Medicine at the University of Southern California. I previously held professorships at the University of Wisconsin in Madison, Wisconsin, and at the University of California in Davis, California, where I was also the Associate Director of the UC Davis Genome Center. I'm a member of the ENCODE Consortium. I would like to tell you how the data produced by this consortium can aid in understanding gene regulation.
The Human Genome Project was launched in 1990 with the specific goal of identifying all the human genes. This led to a push for sequencing the entire genome, which began in 1996, with a draft version of the human genome published in 2001. One of the first big surprises that came from this era of human genomics had to do with how many genes we have. Genomic sequencing of a worm, which has only 959 cells and 1 x 10^8 nucleotides of DNA, had identified approximately 20,000 genes. On that basis, it was assumed that humans would have approximately 150,000 genes. After all, we are much more complicated than a microscopic worm. However, after the human genome was sequenced, the results suggested that we might have at most 30,000 genes. Refinement of the analyses has since reduced this number to 20,000. In other words, we have about the same number of genes as a worm. In fact, only five percent of our genome is covered by exons, that is, DNA segments that encode proteins. This realization led to the question: if 95 percent of the genome is not involved in encoding proteins, then what does it do? To address this question,