ENCODE: Our Guide to the Human Genome

Wednesday, September 26th, 2012

Early in 2001, the Human Genome Project gave us the first draft readout of our DNA. Elucidating the human DNA sequence was important because it gave us the instructions to make a human being; however, we are just starting to realize how incomplete those instructions are. Researchers uncovered the roughly 3 billion letters of the DNA molecule, but only about 2% of them (around 25,000 protein-coding genes) corresponded to proteins, the building blocks of cells. Based on that, many biologists suspected that the information responsible for the complexity of human cells could lie somewhere in the “deserts” between the protein-coding genes.

ENCODE, which stands for Encyclopedia of DNA Elements (for more information see the article “The Human Encyclopaedia” in Nature), is a project that started in 2003 as a massive data-collection effort uniting several laboratories all over the world. Its main objective was to explore the deserts between protein-coding genes, catalogue the “functional” parts of the DNA sequence and understand how they are regulated. In short, the ENCODE Project set out to show whether the rest of the genome, specifically the non-coding regions, was doing something important inside the cells.

An interesting fact about ENCODE is that it was a consortium of groups that are generally competitors. Together, these groups generated an incredible amount of information that had to be collected, stored, transferred and analyzed. Indeed, science today is increasingly “social”, especially in fields such as genomics, where huge amounts of data are generated. The project was only possible because of collaboration between groups, and it was also good training for researchers in the kind of big scientific projects that will become more common from now on.

After almost 10 years of intensive data analysis, the researchers involved in the project published their results in 30 papers across three different journals. According to ENCODE’s main conclusions, more than 80 percent of our genome has a “biochemical function”. Much of the non-coding genome was classified as “junk” for a long time, but ENCODE suggests that it is the opposite. Tom Gingeras, one of the study’s lead scientists, declared that “Almost every single nucleotide is associated with a function of some sort or another in the genome” (see more in DISCOVER Magazine), reinforcing the idea that most of the genome is functional.

And what about the remaining 20 percent of human DNA? Researchers believe that these regions are not “junk” either. ENCODE looked at hundreds of cell types, but the human body has thousands. A given part of the genome might control a functional element in one cell type but not in others, so the complexity of the information could be even higher.

The researchers also claim that ENCODE has one important implication, which is to redefine what a “gene” is. The study has changed our view of the genome, since the functional elements overlap extensively. Given how complex human biology is, it is not surprising that the results are complex too. The new definition suggests that a “gene” is a collection of transcripts, united by a common factor, with a function that could be either in the genome itself or in biochemical reactions within a cell.

Human genome research is far from finished, and it could go on for decades (if not forever…). Those who thought that elucidating the sequence of human DNA would be enough to understand a human being have learned a big lesson. The information from ENCODE is so complex that it will probably take another decade to be fully understood. In fact, ENCODE is just the start of a long journey inside the DNA of human cells.

We are just beginning to build a guide for our genome (Image Source: Nature).