

Encyclopedia of DNA Elements (ENCODE) Released
SciMed - Genetics & Genome
TS-Si News Service   
Monday, 25 April 2011 09:00
Bethesda, MD, USA. The Encyclopedia Of DNA Elements (ENCODE), an extensive catalog of the functional elements within the human genome — genes, RNA transcripts, and other products — is now available as an open resource to the scientific community, classrooms, and the public.

The ENCODE project has published a paper in PLoS Biology that provides an overview of ongoing efforts to interpret the human genome sequence, as well as a guide for using the vast amounts of data and resources produced so far.


Ross Hardison, the T. Ming Chu Professor of Biochemistry and Molecular Biology at Penn State University and one of the principal investigators of the ENCODE Project team, explained that the philosophy behind the project is one of scientific openness, transparency, and collaboration across sub-disciplines.



Encyclopedia of DNA Elements (ENCODE). ENCODE is a massive database cataloging many of the functional elements of the entire collection of human genes — the human genome.

ENCODE follows on from the now-complete Human Genome Project (HGP) — a 13-year effort aimed at identifying all the approximately 20,000 to 25,000 genes in human DNA — which also was based on the belief in open-source data sharing to further scientific discovery and public understanding of science.

The ENCODE Project continues that commitment to open data by publishing the ENCODE database and by posting the project's tools to facilitate use of the data.

"ENCODE resources are already being used by scientists for discovery," Hardison said. "But what's kind of revolutionary is that they also are being used in classes to train students in all areas of biology. Our classes here at Penn State are using real data on genomic variation and function in classroom problem sets, shortly after the labs have generated them."

There are about 3 billion base pairs in the human genome, making the cataloging and interpretation of the information a monumental task. The project has a lofty goal: to identify the function of every nucleotide of the human genome. Hardison says, "Not only are we discovering the genes that give information to cells and make proteins, but we also want to know what determines that the proteins are made in the right cells, and at the appropriate time. Finding the DNA elements that govern this regulated expression of genes is a major goal of ENCODE."

ENCODE's job is to identify the human genome's functional regions

The human DNA sequence often is described as a kind of language, but without a full understanding of the grammar, there is no key for interpretation.

ENCODE supplies data such as where proteins bind to DNA and where parts of DNA are augmented by additional chemical markers.

These proteins and chemical additions are keys to understanding how different cells within the human body interpret the DNA language.

The PLoS Biology paper shows how to use ENCODE data for interpreting associations between disease and DNA sequences that can vary from person to person — single nucleotide polymorphisms (SNPs).
For example, scientists know that DNA variants located upstream of a gene called MYC are associated with multiple cancers, but until recently the mechanism behind this association was a mystery. ENCODE data already have been used to confirm that the variants can change binding of certain proteins, leading to enhanced expression of the MYC gene and, therefore, to the development of cancer.
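The basic lookup behind this kind of analysis, asking whether a variant falls inside a region where a protein is known to bind, can be sketched in a few lines of Python. This is only an illustration: the function name `variants_in_sites`, the site labels, and every coordinate below are hypothetical and invented for the example. They are not ENCODE data, and the sketch is not the method used in the paper.

```python
from bisect import bisect_right

def variants_in_sites(variants, sites):
    """Map each variant name to the label of the binding site containing it.

    variants: list of (name, position) pairs
    sites:    list of (start, end, label) tuples, half-open [start, end),
              assumed not to overlap one another
    """
    sites = sorted(sites)
    starts = [s[0] for s in sites]
    hits = {}
    for name, pos in variants:
        # Index of the last site whose start is <= pos (or -1 if none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < sites[i][1]:
            hits[name] = sites[i][2]
    return hits

# Hypothetical coordinates, invented for illustration only.
binding_sites = [(100, 150, "TF_A"), (400, 460, "TF_B")]
snps = [("snp1", 120), ("snp2", 300), ("snp3", 459), ("snp4", 460)]

hits = variants_in_sites(snps, binding_sites)
print(hits)
```

A variant that lands inside a binding site is only a candidate for changing gene regulation. Confirming an effect such as the one described for MYC still requires experimental evidence.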

ENCODE also has made similar studies possible for thousands of other DNA variants that may be associated with a variety of birth conditions and susceptibility to human diseases.

Another of the principal investigators of the project, Richard Myers, president and director of the HudsonAlpha Institute for Biotechnology, explained that the ENCODE Project is unique because it requires collaboration from multiple people all over the world at the cutting edge of their fields. "People are working in a coordinated manner to figure out the function of our human genome," he said. "The importance of the project extends beyond basic knowledge of who and what we are as humans, and into an understanding of human health and disease."

Scientists with the ENCODE Project also are applying up to 20 different tests in 108 commonly used cell lines to compile important data. John Stamatoyannopoulos, an assistant professor of genome sciences and medicine at the University of Washington and another principal investigator, explained that the ENCODE Project has been responsible for producing many assays — molecular-biology procedures for measuring the activity of biochemical agents — that are now fundamental to biology.

"Widely used computational tools for processing and interpreting large-scale functional genomic data also have been developed by the project," Stamatoyannopoulos added. "The depth, quality, and diversity of the ENCODE data are unprecedented."

Hardison said that the portion of the human genome that actually codes for protein is about 1.1 percent. "That's still a lot of data," he said. To complicate matters even more, most mechanisms for gene expression and regulation lie outside the DNA coding region. Scientists have a limited number of tools with which to explore the genome, and one that has been used widely is inter-species comparison.
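To put the 1.1 percent figure in perspective, the short calculation below applies it to the round figure of about 3 billion base pairs quoted earlier in the article. Both inputs are approximations taken from the text, so the result is an order-of-magnitude estimate rather than a measured count.

```python
# Round figures quoted in the article; both are approximations.
GENOME_BP = 3_000_000_000   # "about 3 billion base pairs"
CODING_FRACTION = 0.011     # "about 1.1 percent" codes for protein

coding_bp = round(GENOME_BP * CODING_FRACTION)
noncoding_bp = GENOME_BP - coding_bp

print(f"{coding_bp:,} protein-coding base pairs")
print(f"{noncoding_bp:,} base pairs outside protein-coding sequence")
```

Under these assumptions, roughly 33 million base pairs code for protein, leaving close to 3 billion in which regulatory elements must be found.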

"For example," says Hardison, "we can compare humans and chimpanzees and glean some fascinating information, but very few proteins and other DNA products differ in any fundamental way between humans and chimps. The important difference between us and our close cousins lies in gene expression, the basic level at which genes give rise to traits such as eye color, height, and susceptibility to a particular disease."

ENCODE is helping to map the very proteins involved in gene regulation and gene expression. The PLoS Biology paper not only explains how to find the data, but it also explains how to apply the data to interpret the human genome.

Funding

The ENCODE Project is primarily funded by the National Human Genome Research Institute (NHGRI) of the U.S. National Institutes of Health (NIH).

Citation

A User's Guide to the Encyclopedia of DNA Elements (ENCODE). The ENCODE Project Consortium. PLoS Biology 2011; 9(4): e1001046. doi:10.1371/journal.pbio.1001046
Abstract

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

Author Summary

The Encyclopedia of DNA Elements (ENCODE) Project was created to enable the scientific and medical communities to interpret the human genome sequence and to use it to understand human biology and improve health. The ENCODE Consortium, a large group of scientists from around the world, uses a variety of experimental methods to identify and describe the regions of the 3 billion base-pair human genome that are important for function. Using experimental, computational, and statistical analyses, we aimed to discover and describe genes, transcripts, and transcriptional regulatory regions, as well as DNA binding proteins that interact with regulatory regions in the genome, including transcription factors, different versions of histones and other markers, and DNA methylation patterns that define states of the genome in various cell types. The ENCODE Project has developed standards for each experiment type to ensure high-quality, reproducible data and novel algorithms to facilitate analysis. All data and derived results are made available through a freely accessible database. This article provides an overview of the complete project and the resources it is generating, as well as examples to illustrate the application of ENCODE data as a user's guide to facilitate the interpretation of the human genome.

Abbreviations: 3C, Chromosome Conformation Capture; API, application programming interface; CAGE, Cap-Analysis of Gene Expression; ChIP, chromatin immunoprecipitation; DCC, Data Coordination Center; DHS, DNaseI hypersensitive site; ENCODE, Encyclopedia of DNA Elements; EPO, Enredo, Pecan, Ortheus approach; FDR, false discovery rate; GEO, Gene Expression Omnibus; GWAS, genome-wide association studies; IDR, Irreproducible Discovery Rate; Methyl-seq, sequencing-based methylation determination assay; NHGRI, National Human Genome Research Institute; PASRs, promoter-associated short RNAs; PET, Paired-End diTag; RACE, Rapid Amplification of cDNA Ends; RNA Pol2, RNA polymerase 2; RBP, RNA-binding protein; RRBS, Reduced Representation Bisulfite Sequencing; SRA, Sequence Read Archive; TAS, trait/disease-associated SNP; TF, transcription factor; TSS, transcription start site.

TS-Si News Service

The TS-Si News Service is a collaborative effort by TS-Si.org editors, contributors, and corresponding institutions. The sources can include the cited individuals and organizations, as well as TS-Si.org staff contributions. Articles and news reports do not necessarily convey official positions of TS-Si, its partners, or affiliates.



TS-Si is dedicated to the acceptance, medical treatment, and legal protection of individuals correcting the misalignment of their brains and their anatomical sex, while supporting their transition into society as hormonally reconstituted and surgically corrected citizens.


Last Updated on Monday, 25 April 2011 13:40