UCSC Genome Browser: Custom Annotation Tracks


Custom Annotation Tracks

This page contains links to custom annotation tracks contributed by the UCSC Genome Bioinformatics group and by the research community. Click on a track to display it in the UCSC Genome Browser. Please check the Genome Browser standard track set for additional contributed annotation tracks.

Human Annotations
Mouse Annotations
Rat Annotations
Tetraodon Annotations
Zebrafish Annotations
Yeast Annotations
Multiple-Species Annotations

For information on how to create a custom annotation track, see Displaying Your Own Annotations in the Genome Browser. If you would like to submit your own custom tracks to this list, contact genome@soe.ucsc.edu.
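As a rough illustration of what a simple custom annotation track looks like, the Python sketch below writes a minimal BED-format track as text: a `track` line followed by one feature per line. The track name, description, coordinates, and feature labels are made up for illustration, and the helper function `make_custom_track` is hypothetical; refer to the Genome Browser help pages for the full set of supported formats and track attributes.

```python
def make_custom_track(name, description, features):
    """Return the text of a minimal BED custom track.

    features: iterable of (chrom, start, end, label) tuples using
    BED coordinates (0-based start, end-exclusive).
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        # BED columns: chrom, chromStart, chromEnd, name
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical example data, not taken from any track listed on this page.
example = make_custom_track(
    "myTrack",
    "Example regions",
    [("chr1", 1000, 2000, "regionA"), ("chr1", 5000, 5600, "regionB")],
)
print(example, end="")
```

The resulting text can be saved to a file and uploaded or pasted as a custom track.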

Human Genome

Phased haplotypes of 'Max Planck One' (MP1) genome in hg18 as described in Suk et al. A comprehensively molecular haplotype-resolved genome of a European individual Genome Res 2011. RefSeq genes are shown in the first track for reference purposes. The second track shows the extent of each molecularly phased segment within the genome of MP1. The two haplotypes of MP1 are shown in two separate tracks (MP1_haplotype_1 and MP1_haplotype_2) and are colored by base. Phased indels are also included in these haplotypes. All SNPs from MP1 are shown in the fifth track (MP1_all_SNPs). These SNPs are annotated with their dbSNP rs numbers (or are annotated as novel). Non-synonymous SNPs are colored bright pink if they cause a potentially damaging mutation and dark pink if they are not predicted to be damaging. Thanks to the Max Planck Institute for Molecular Genetics for contributing these data.

DNA binding sites in hg18 for nuclear receptor HNF4alpha (NR2A1). The PBM track shows in vitro validated sites as determined by protein binding microarrays (PBMs) (number after sequence indicates relative binding score). The SVM track shows predicted sites by support vector machine (SVM) analysis (number after sequence indicates predicted relative binding score). For more information, see Bolotin E et al. in Integrated approach for the identification of human hepatocyte nuclear factor 4α target genes using protein binding microarrays. Hepatology. 2010 Feb;51(2):642-653. Thanks to the Sladek lab, University of California Riverside for contributing these data.

Transcribed ultraconserved regions (T-UCRs) reblatted to hg18. The first track shows intragenic T-UCRs (red); the second one displays intergenic T-UCRs (blue) (intragenic and intergenic relative to the RefSeq Genes track). For more information, see Mestdagh, P. et al. An integrative genomics screen uncovers ncRNA T-UCR functions in neuroblastoma tumours. Oncogene. 2010 Apr 12. [Epub ahead of print] Thanks to Erik Fredlund, Pieter Mestdagh, Filip Pattyn and Jo Vandesompele of the Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium for contributing these tracks.

Vervet monkey gene expression data (hg18) providing mean expression differences for 8-16 samples per tissue type (publication pending). See the UCLA vervet gene expression atlas project website for more information. Thanks to Dmitriy Skvortsov from the laboratory of Stanley F. Nelson in the Department of Human Genetics and Psychiatry at the David Geffen School of Medicine, UCLA, for contributing this track. The work is a collaboration with Zugen Chen, Barry Merriman, Lynn Fairbanks, Roger Woods, and Nelson Freimer.

Nucleosome Exclusion Prediction data sets (hg18) accompanying the paper Radwan A et al. Prediction and analysis of nucleosome exclusion regions in the human genome. BMC Genomics. 2008 Apr 22;9(1):186. View the Nucleosome Regions tracks to see the whole genome annotation for nucleosome exclusion regions. View the Nucleosome Scores tracks to see the nucleosome exclusion scores which were calculated individually for each nucleotide. This annotation was contributed by Ahmed Radwan, Akmal Younis, Peter Luykx, and Sawsan Khuri, at the University of Miami, Miami, FL, USA. Contact Sawsan Khuri at skhuri@med.miami.edu. Click on the chromosome you wish to display. Nucleosome Regions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M. Nucleosome Scores: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M.

Results of a genome-wide association study of bipolar disorder (hg17) published in Baum AE et al. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Mol Psychiatry. 2007 May 8; [Epub ahead of print]. The track shows the results of a two-stage study performed using the Illumina HumanHap 550K chip. SNPs that replicated in both of two independent case-control samples are shown, filtered for p-value and odds ratio. Many thanks for this contribution to Amber Baum, Francis McMahon, and the Unit on the Genetic Basis of Mood and Anxiety Disorders, Mood and Anxiety Disorders Program, U.S. Department of Health and Human Services, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA, and the Central Institute for Mental Health, Mannheim, Germany.

Compare data from locus-specific databases with the genotypic and functional data in the Genome Browser using PhenCode, which consolidates variants from many curated locus-specific databases and one genome-wide database. Click here to access the PhenCode query page that lets you select and display a filtered set of locus variants data in the Genome Browser. Thanks to Belinda Giardine, Ross Hardison, Webb Miller, and Cathy Riemer at the Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, PA, USA, for contributing these data. DISCLAIMER: PhenCode is intended for research purposes only. Although the data are freely available to all, users should treat the reported mutations with extreme caution in clinical settings or for any diagnostic or population screening purpose. This information requires expertise to interpret properly; clinical diagnosis and/or treatment recommendations should be made only by medical professionals.

HOX microarray expression data from John Rinn et al. (hg18 and hg17), as described in the publication Rinn JL, et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell. 2007 Jun 29;129(7):1311-23. See the track description page for more information about the data methods and verification techniques used. Thanks to John Rinn in the Chang Lab at Stanford University for contributing these data.

Tracks showing increases and decreases in copy number variants across five hominoid species (human, bonobo, chimpanzee, gorilla, and orangutan) (hg17, hg16). We would like to thank the University of Colorado Health and Sciences Center, Dr. Jim Sikela, and Michael Cox for their contributions. For more information, see Fortna, A., Kim, Y., MacLaren, E., Marshall, K., Hahn, G., Meltesen, L., Brenton, M., Hink, R., Burgers, S., Hernandez-Boussard, T., Karimpour-Fard, A., Glueck, D., McGavran, L., Berry, R., Pollack, J.R. and Sikela, J.M. Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biology Jul 2(7):E207, 2004.

The Intronic EST hotspots track (hg17) highlights non-coding genomic regions that have a high degree of EST coverage (EST hotspot) within "consensus intronic regions", i.e. regions that are intronic in all RefSeq transcript variants of a given gene. They are an invaluable tool in identifying novel coding and non-coding elements within the genome. This annotation was contributed by Xitong Li and Christina Zheng at Genomic Health, Inc.

Tracks providing CpG island strength predictions and mapping of bona fide CpG islands for the human genome (hg17/hg18). The tracks are based on large-scale epigenome predictions, which give rise to an improved and quantitative annotation of CpG islands. Additional information on these tracks is available from the supplementary website and from the corresponding paper Bock, C. et al. CpG island mapping by epigenome prediction to appear in PLoS Comput Biol. For prioritization of candidate regions, the quantitative CpG island strength predictions are recommended (hg17/hg18). For genome annotation, three maps of bona fide CpG islands are provided: (i) a highly specific map (hg17/hg18), (ii) a balanced map recommended for most applications (hg17/hg18) and (iii) a highly sensitive map (hg17/hg18). Finally, all tracks can be viewed simultaneously (hg17/hg18), which may take longer to load.

Three tracks (hg17) accompanying the paper Nakaya HI et al. Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol. 2007 Mar 26;8(3):R43. The TIN_RNAs track shows the genomic mapping coordinates of all 55,139 Totally Intronic Noncoding RNA (TIN RNA) transcripts identified in the human genome. The PIN_RNAs track shows the mapping coordinates of all 12,592 Partially Intronic Noncoding RNA (PIN RNA) transcripts. The TIN_PIN_probes track shows the genomic coordinates of all TIN and PIN sense and antisense intronic probes plus the exonic probes in a custom-designed 44K intron-exon oligoarray. This array was used for gene expression experiments with human prostate, kidney and liver tissues. Thanks to Sergio Verjovski-Almeida, Eduardo M. Reis, and Helder I. Nakaya from Instituto de Quimica - Universidade de Sao Paulo for contributing these data sets.

Copy-Number Variants (hg17) accompanying the paper Wong, K. et al. A Comprehensive Analysis of Common Copy-Number Variations in the Human Genome. American Journal of Human Genetics 80:91-104 (2007). The following color scheme is used to indicate the frequency with which clones were seen: blue (1 or 2), red (3), green (4 or 5), black (6 or more). Thanks to Kendy Wong and Ronald deLeeuw for contributing this data.
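The color scheme described above amounts to a simple lookup from clone frequency to display color. The following Python sketch restates that mapping; the function name `clone_frequency_color` is hypothetical and is not part of the contributed track.

```python
def clone_frequency_color(count):
    """Map the number of times a clone was seen to its display color."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count <= 2:
        return "blue"    # seen 1 or 2 times
    if count == 3:
        return "red"     # seen 3 times
    if count <= 5:
        return "green"   # seen 4 or 5 times
    return "black"       # seen 6 or more times


for n in (1, 3, 5, 8):
    print(n, clone_frequency_color(n))
```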

Data sets (hg17) accompanying the paper Carroll, J.S. et al. Genome-wide analysis of estrogen receptor binding sites. Nature Genet. 38(10) 2006. The set of six custom tracks shows ER and RNA Pol2 ChIP-chip data at two cutoffs (low and high), upregulated genes, and downregulated genes. Thanks to the Myles Brown lab at the Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA for contributing these data.

Sliding window analysis of Tajima's D across the human genome (hg17) and (hg16). This track identifies regions putatively subject to strong, recent, selective sweeps and identifies Contiguous Regions of Tajima's D Reduction (CRTRs) in each of three populations. For details, see the Tajima's D SNPs track on the hg17 and hg16 Genome Browsers, as well as Christopher S. Carlson et al. Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res. 15:1553-1565 (2005).

Structural RNAs predicted by RNAz (hg17). This track displays putative functional RNA elements with exceptionally stable and/or evolutionarily conserved secondary structure. For a description of the RNAz program, see Washietl, S., Hofacker, I.L. and Stadler, P.F. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. USA 102(7), 2454-2459 (2005). Additional information on how this track has been generated can be found here. Thanks to Stefan Washietl, Ivo Hofacker and Peter F. Stadler for contributing this annotation.

Alternative conserved exons predicted by ACEScan (hg17). This track displays human exons (from Known Genes with an exonic alignment to mouse) that have a positive ACEScan score. For a description of the methods used to generate this annotation, see Yeo, G.W. et al. Identification and analysis of alternative splicing events conserved in human and mouse. Proc. Natl. Acad. Sci. USA 102(8), 2850-2855 (2005). The ACEScan online webtool is available at http://genes.mit.edu/acescan. Thanks to Gene Yeo and Chris Burge at MIT for contributing this annotation.

Perfect LINEs identified by GPS (hg17). This track displays regions in the chromosome in which all the components have at least 10% identity to the query (Retroid Agent) and no frame shifts or stop codons in the gene coding regions. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 22, X, Y. Thanks to Dr. Marcella McClure and Vijay A. Raghavan at Montana State University for providing this annotation.

Database of Transcribed Sequences (DoTS) Genes (hg17) generated using BLAT alignments of DoTS RNAs. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, M. Thanks to Y. Thomas Gan for creating this track.

DoTS Genes (hg16). Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, M. Thanks to Y. Thomas Gan for creating this track.

Isochore track (hg17) generated using IsoFinder, a segmentation algorithm developed by Grupo de Bioinformatica, Universidad de Granada, Spain. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y. Data on older human assemblies are also available (hg16, hg15, hg13, hg12). Thanks to Dr. Jose L. Oliver for contributing this track.

Stanford Human Promoters (hg16, hg15, hg13). The Stanford Human Promoters data sets were generated by the Richard M. Myers lab at Stanford University and are described in Trinklein, N., Force Aldred, S., Saldanha, A., and Myers, R.M. (2003). Identification and Functional Analysis of Human Transcriptional Promoters. Genome Res., 13:308-312. Thanks to Nathan Trinklein at Stanford School of Medicine for contributing this track, and to Daryl Thomas of UCSC for lifting the hg15 data to the hg16 assembly.

Mouse Ortholog (hg12). Human and Mouse gene predictions based on fgenesh++ clustered using a BLAT protein alignment and the reciprocal best matches retained. Thanks to Robert Baertsch for creating this track.

Penn State University Known Regulatory Regions Set 1 (hg12). This set contains a collection of known regulatory regions gathered from the literature. Set 1 is limited to the smallest recognized segment containing full function, and was used as the data set for Elnitski L, Hardison RC, Li J, Yang S, Kolbe D, Eswara P, O'Connor MJ, Schwartz S, Miller W, and Chiaromonte F. (2003). Distinguishing Regulatory DNA From Neutral Sites. Genome Res., 13:64-72. For more information, see http://bio.cse.psu.edu/mousegroup/Reg_annotations/. Thanks to Robert Baertsch for creating this track.

Penn State University Known Regulatory Regions Set 2 (hg12). This set of functional regions contains names and coordinates of an additional set of regulatory regions that were not trimmed (as in Set 1) to show the smallest possible functional element with maximum activity. The regions range in size from 300-4000 bp. For more information, see http://bio.cse.psu.edu/mousegroup/Reg_annotations/. Thanks to Robert Baertsch for creating this track.

Mouse Genome

Transcriptome-wide monoallelic expression in CNS-derived stem cells for four clonal hybrid (B6 x JF1) cell lines is displayed in mm9. The track shows the allelic preference for cell lines 2A1, 2A5, 3A1 and 4A5 at JF1 cSNP locations. The allelic preference is denoted by the proportion of the B6 allele vs JF1 allele. For more information, see: Li SM et al. Transcriptome-wide survey of CNS-derived cells reveals monoallelic expression within novel gene families. PLoS ONE. 2012 Feb;7(2):e31751.

Genome-wide DNase hypersensitivity in male and female mouse liver mapped by DNase I treatment of pooled livers from male and female mice coupled with high-throughput sequencing (DNase-seq). The tracks here are BED files representing (1) Liver_DHS_peaks: peaks identified using PeakSeq (Rozowsky et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009;27(1):66-75), and (2) Liver_DHS_regions: broader regions of hypersensitivity identified using SICER (Zang et al. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics. 2009;25(15):1952-8), that are sex-independent (gray) and sex-specific (blue for male-specific, pink for female-specific; darker shade for higher stringency for sex-specificity). For more information, see Ling G, Sugathan A, Mazor T, Fraenkel E, Waxman DJ. Unbiased, genome-wide in vivo mapping of transcriptional regulatory elements reveals sex differences in chromatin structure associated with sex-specific liver gene expression. Mol Cell Biol. 2010 Dec;30(23):5531-44.

DMRT1 is a transcription factor that is expressed in germ cells and Sertoli cells and plays multiple roles in testis development. This study analyzed DMRT1 genome wide promoter occupancy in the mouse testis at postnatal day 9 as determined by ChIP-chip on Nimblegen mouse promoter arrays. The three WIG traces [1] [2] [3] are from three independent biological replicates and displayed on mouse genome assembly mm8. The WIG traces represent the enrichment for each probe on the array calculated as the log-ratio of the intensities of the DMRT1 ChIP product (Cy5) to control input chromatin (Cy3). More details and gene expression analysis can be found on the associated interactive web site: www.dmrt1.umn.edu and in the publication: Murphy MW, Sarver AL, Rice D, Hatzi K, Ye K, Melnick A, Heckert LL, Zarkower D, Bardwell VJ. Genome wide analysis of DNA binding and transcriptional regulation by the mammalian Doublesex homolog DMRT1 in the juvenile testis. PNAS. 2010 July 2. [Epub ahead of print] PNAS:1006243107.
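The per-probe enrichment described above is a log-ratio of two channel intensities. The Python sketch below shows that calculation for a single probe. The description does not state the logarithm base, so base 2 is an assumption here, and the function name and intensity values are hypothetical.

```python
import math


def probe_log_ratio(chip_cy5, input_cy3):
    """Log-ratio of ChIP (Cy5) to input (Cy3) intensity for one probe.

    Base 2 is assumed; the track description does not specify the base.
    """
    if chip_cy5 <= 0 or input_cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(chip_cy5 / input_cy3)


# Hypothetical intensities: a four-fold enrichment gives a log2 ratio of 2.
print(probe_log_ratio(400.0, 100.0))
```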

Farnesoid X receptor (FXR) is a bile acid-activated transcription factor belonging to the nuclear receptor superfamily. FXR is highly expressed in liver and intestine, and crosstalk mediated by FXR in these two organs is critical in maintaining bile acid homeostasis. This study analyzed genome-wide FXR binding in liver and intestine of mice treated with a synthetic FXR ligand (GW4064) by chromatin immunoprecipitation coupled to massively parallel sequencing (ChIP-seq). The Fxr Liver and Fxr Intestine tracks shown here are WIG files that represent the number of times a particular 35bp fragment of DNA was sequenced in the reaction. More details can be found in the publication Thomas AM, Hart SN, Kong B, Fang J, Zhong XB, Guo GL. Genome-wide tissue-specific farnesoid X receptor binding in mouse liver and intestine. Hepatology. 2009 Nov 30. [Epub ahead of print] PMID: 20091679. Thanks to Ann Thomas and Steven Hart in the Department of Pharmacology, Toxicology, and Therapeutics at the University of Kansas Medical Center, Kansas City, KS, for contributing these tracks.

An experiment looking at four different ages of mouse liver to observe how different histone modifications (DNA methylation, H3K4me2, and H3K27) change across postnatal development (mm9). A ChIP-on-chip tiling array for three mouse chromosomes (chr5, chr12, chr15) was used. The tracks show three types of data: 1) a genomic region with a sequence of >800 bp and an average signal increase greater than the threshold, defined as an interval, 2) a genomic region with one or more enriched intervals in close proximity to each other (at least one base overlap) at any given age, defined as an active region, and 3) peak values for each interval. For more information, see Li Y, Cui Y, Hart SN, Klaassen CD, Zhong X. Dynamic patterns of histone methylation are associated with ontogenic expression of the Cyp3a genes during mouse liver maturation. Mol Pharmacol. 2009 May;75(5):1171-1179. Thanks to Steven Hart in the Department of Pharmacology, Toxicology, and Therapeutics at the University of Kansas Medical Center, Kansas City, KS, for contributing these tracks.
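The relationship between intervals and active regions described above can be sketched as an interval merge: intervals that share at least one base are combined into a single region. This is only an illustration of the idea, not the authors' actual procedure; the function name, the half-open coordinate convention, and the example coordinates are assumptions.

```python
def merge_into_active_regions(intervals):
    """Merge (start, end) intervals that overlap by at least one base.

    Coordinates are assumed half-open (start inclusive, end exclusive),
    so intervals that merely touch, such as (0, 10) and (10, 20), do not
    share a base and stay separate.
    """
    regions = []
    for start, end in sorted(intervals):
        if regions and start < regions[-1][1]:
            # Overlaps the current region: extend it if needed.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


# Hypothetical enriched intervals pooled across ages on one chromosome.
pooled = [(100, 1000), (900, 2000), (2000, 3000), (5000, 6000)]
print(merge_into_active_regions(pooled))
# prints [(100, 2000), (2000, 3000), (5000, 6000)]
```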

BayGenomics mouse knockout gene tags (mm7), generated using BLAT alignments of sequences derived from gene-trap vector insertions into thousands of genes in mouse embryonic stem cells. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X, Y, M. Thanks to the BayGenomics bioinformatics group at UCSF for providing this track.

BayGenomics mouse knockout gene tags (mm6), generated using BLAT alignments of sequences derived from gene-trap vector insertions into thousands of genes in mouse embryonic stem cells. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X, Y, M. Thanks to the BayGenomics bioinformatics group at UCSF for providing this track.

Locations of known, suspected, and imputed SNPs generated by BLAT alignment of 3 million Celera associated sequences to the May 2004 mouse genome assembly (mm5), provided by The GeneNetwork and WebQTL. Only those SNPs that distinguish strains C57BL/6J from DBA/2J (1.75 million) or that distinguish C57BL/6J from A/J (1.80 million) are displayed in the custom track. Due to the proprietary nature of these data, only low resolution position data (SNP density per 100,000 to 300,000 bp) are currently provided. This custom track is available on any PHYSICAL and GENETIC maps in WebQTL for the BXD and AXB/BXA genetic reference panels simply by clicking on interval maps. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X. Thanks to Celera Genomics (Richard Mural and Paul Thomas) for this level of access to CDS data and to Christopher Vincent (Georgia Tech), Alex G. Williams (UCSC), Robert Crowell (UTHSC and MIT), Gary Churchill and Natalie Blades (The Jackson Laboratory), and the WebQTL group at UTHSC (Jintao Wang, Yanhua Qu, Yan Cui, Robert Williams, and Kenneth Manly) for contributing this track.

Isochore track (mm5) generated using IsoFinder, a segmentation algorithm developed by Grupo de Bioinformatica, Universidad de Granada, Spain. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X. Data are also available for the mm3 assembly. Click here for more information about this annotation. Thanks to Dr. Jose L. Oliver for contributing this track.

Rat Genome

Isochore track (rn3) generated using IsoFinder, a segmentation algorithm developed by Grupo de Bioinformatica, Universidad de Granada, Spain. Click on the chromosome you wish to display: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, X. Data are also available for the rn2 and rn1 assemblies. Click here for more information about this annotation. Thanks to Dr. Jose L. Oliver for contributing this track.

Tetraodon Genome

CAGE transcription start sites for the tetraodon (tetNig2) genome. Thank you to Chirag Nepal for creating these custom annotation tracks. For a full list of those individuals and institutions involved in the creation of the data included in these tracks, please refer to the following paper, Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis. This paper also includes information on the methods used to generate the data included in these tracks. These tracks can be viewed in the following Browser session.

Zebrafish Genome

A suite of tracks for the zebrafish (danRer7) genome that include CAGE transcription start sites, plus H3K4me3 and RNAseq coverage. Thank you to Chirag Nepal for creating these custom annotation tracks. For a full list of those individuals and institutions involved in the creation of the data included in these tracks, please refer to the following paper, Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis. This paper also includes information on the methods used to generate the data included in these tracks. These tracks can be viewed in the following Browser session.

Yeast Genome

A compiled and systematic reference map of nucleosome positions across the Saccharomyces genome. Thanks to Cizhong Jiang and B. Franklin Pugh of the Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA for contributing this annotation. This work was supported by a grant from NIH (HG004160). The contributors would like to thank members of the Pugh lab for their numerous helpful comments.

Multi-Species Annotations

A Hidden Markov Model (HMM) based method was used to look for CpG islands (CGIs) in DNA sequences. Two HMMs were fitted, one for GC content and one for observed-to-expected ratios of CpG counts. The CGIs were detected by jointly thresholding the resulting posterior probabilities. Unlike the current CGI definition, which was derived from studying promoters of known human genes, this method is data-driven and can be applied to species with different sequence compositions. For details, please see Wu H, Caffo B, Jaffee HA, Feinberg AP, Irizarry RA. Redefining CpG Islands Using Hidden Markov Models. Biostatistics 2010 July 3;11(3):499-514.

H. sapiens (human) hg19
H. sapiens (human) hg18
P. troglodytes (chimpanzee) panTro2
P. abelii (orangutan) ponAbe2
R. macaque (monkey) rheMac2
M. musculus (mouse) mm9
M. musculus (mouse) mm8
C. familiaris (dog) canFam2
E. caballus (horse) equCab2
D. melanogaster (fruit fly) dm3
C. elegans (worm) ce2

Lists of CpG islands and an R software package can be downloaded from http://rafalab.jhsph.edu/CGI/.
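The two sequence features used by the HMM-based method above, GC content and the observed-to-expected CpG ratio, can be illustrated for a single sequence window with the short Python sketch below. It uses the commonly quoted definition of the ratio (number of CpG dinucleotides divided by the product of C and G counts over the window length), which is an assumption here: the cited paper and its R package may estimate these quantities differently. The function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed-to-expected CpG ratio for one sequence window.

    observed = number of CG dinucleotides
    expected = (#C * #G) / window length
    Returns 0.0 when the window contains no C or no G.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")
    expected = c_count * g_count / len(seq)
    return observed / expected


window = "ACGT"  # toy example; real windows would be far longer
print(gc_content(window), cpg_obs_exp(window))
```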