HAVANA

The HAVANA group provides the manual annotation of human, mouse, zebrafish and other vertebrate genomes that appears in the Vega browser.

A genome sequence is only as valuable as its annotation. To create a gold-standard reference annotation, the Human and Vertebrate Analysis and Annotation (HAVANA) team uses tools developed in-house to manually annotate the human, mouse and zebrafish genomes.

The team aims to produce accurate and comprehensive annotation that represents the full complexity of gene loci and their features. Manual annotation is especially important in areas that automated annotation systems handle poorly, such as splice variation, pseudogenes, conserved gene families, duplications and non-coding genes. The HAVANA team continually updates its methods, incorporating new data sources as new technologies emerge. HAVANA annotation is freely available through genome browsers, including VEGA, Ensembl and UCSC.

If you have any queries regarding our annotation, please contact us at vega-helpdesk@sanger.ac.uk.

[Genome Research Limited]

  • Background
  • Collaborations
  • Annotation
  • Publications

Background

The Sanger Institute has made major contributions to the sequencing of many vertebrate genomes, including all or part of human chromosomes 1, 6, 9, 10, 13, 20, 22 and X, mouse chromosomes 2, 4, 11 and X, and the complete Danio rerio (zebrafish) genome sequence. The Institute has also sequenced, or continues to sequence, selected regions of other vertebrate genomes, including candidate diabetes gene regions (in reference and non-obese diabetic (NOD) mouse strains) and MHC regions (in wallaby, Tasmanian devil, gorilla, dog, pig, human haplotypes and mouse strains). The HAVANA group provides the manual annotation for these and other genome sequences.

Collaborations

The HAVANA group collaborates with others on projects both small and large. The largest of these aim to annotate the entire human genome and the majority of coding genes in mouse. The main HAVANA collaborations relating to these projects are:

ENCODE (Encyclopedia of DNA Elements) and GENCODE

The ENCODE and GENCODE projects provide in-depth, coordinated analysis of the entire human genome using experimental, computational and manual techniques. HAVANA manual annotation serves as the reference annotation underlying this global effort. Continuous feedback between the collaborators working on these three approaches helps refine all of the techniques involved.
ENCODE website

CCDS (Consensus Coding Sequence)

CCDS is a collaboration between the Sanger Institute (Ensembl, VEGA, HAVANA), UCSC (Genome Bioinformatics Group) and NCBI (RefSeq). CCDS aims to provide a comprehensive database of high-quality coding regions from the human and mouse genomes, agreed on by all collaborators. Annotation from the Sanger Institute and from RefSeq, which is created using different techniques, is compared, and a CCDS entry is created when the two agree on the coding sequence structure of a given transcript or locus. Conflicts are discussed among all three parties and, where consensus can be reached, a CCDS entry is created.
CCDS website
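The comparison step can be pictured as a set operation over coding structures. The Python sketch below is a simplified illustration, not the CCDS pipeline: it assumes each coding structure is reduced to a chromosome, a strand and its ordered coding-exon coordinates, treats exact equality as "agreement", and flags overlapping but non-identical structures as conflicts for discussion. The names `CodingStructure`, `overlaps` and `compare_sets` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) of one coding exon (coordinate convention assumed)


@dataclass(frozen=True)
class CodingStructure:
    """Hypothetical, simplified coding structure of one transcript."""
    chrom: str
    strand: str                  # "+" or "-"
    cds_exons: Tuple[Exon, ...]  # assumed sorted by ascending genomic start

    @property
    def span(self) -> Exon:
        # Genomic extent of the CDS, relying on cds_exons being sorted.
        return self.cds_exons[0][0], self.cds_exons[-1][1]


def overlaps(a: CodingStructure, b: CodingStructure) -> bool:
    """True if both lie on the same chromosome and strand and their CDS spans overlap."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    a_start, a_end = a.span
    b_start, b_end = b.span
    return a_start <= b_end and b_start <= a_end


def compare_sets(
    sanger: Iterable[CodingStructure], refseq: Iterable[CodingStructure]
) -> Tuple[Set[CodingStructure], List[Tuple[CodingStructure, CodingStructure]]]:
    sanger_set = set(sanger)
    refseq_set = set(refseq)
    # Identical coding structures from both groups are consensus candidates.
    agreed = sanger_set & refseq_set
    # Overlapping but non-identical structures would need discussion between the groups.
    conflicts = [
        (s, r)
        for s in sanger_set - agreed
        for r in refseq_set - agreed
        if overlaps(s, r)
    ]
    return agreed, conflicts


# Usage: one agreed structure and one conflicting pair (coordinates are made up).
shared = CodingStructure("chr1", "+", ((100, 200), (300, 400)))
sanger_only = CodingStructure("chr1", "+", ((1000, 1100), (1200, 1300)))
refseq_only = CodingStructure("chr1", "+", ((1000, 1100), (1200, 1350)))
agreed, conflicts = compare_sets([shared, sanger_only], [shared, refseq_only])
print(len(agreed), len(conflicts))  # prints "1 1"
```

Exact equality is a deliberately strict notion of agreement; the manual discussion of conflicts described above is not modelled here.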

IKMC (International Knockout Mouse Consortium)

IKMC is a collaboration between the three main mouse knockout projects: EUCOMM (European Conditional Mouse Mutagenesis), KOMP (Knockout Mouse Project) and NorCOMM (North American Conditional Mouse Mutagenesis). Manual annotation by the HAVANA group and collaborators at Washington University, St Louis, and the University of Manitoba, Winnipeg, serves as the foundation for constructing knockout mouse cell lines for every coding gene.
IKMC website

GRC (Genome Reference Consortium)

The GRC is a collaboration between the Wellcome Trust Sanger Institute, the Genome Center at WashU, the EBI and the NCBI that aims to provide the best possible genome assemblies for human, mouse and zebrafish. It does so by investigating potential variation, errors, conflicts and sequence gaps, with a view to choosing the best representation (or multiple representations) of variant sequence, correcting errors, resolving conflicts and filling in gaps. HAVANA's role is to report and feed back any of these issues that affect genes in the three species.
GRC website


Flow of information between HAVANA (blue and red shapes), collaborators and databases. Thick arrows show direct collaborations; thin arrows show indirect feeding of HAVANA annotation back into the analysis pipeline.


Annotation

HAVANA annotation is publicly available from the following websites:

  • VEGA
  • Ensembl
  • UCSC

The HAVANA group puts special emphasis on splice variants and pseudogenes, two areas still underdeveloped in automated annotation systems, and also annotates polyadenylation features. In addition, whereas other systems concentrate on, or are limited to, protein-coding genes, many HAVANA transcripts are annotated without a protein-coding region. These transcripts may function as non-coding RNAs, or they may be incomplete gene fragments whose coding sequence cannot yet be determined.

The HAVANA group requires every annotated gene structure (transcript) to be supported by transcriptional evidence from cDNA, EST or protein sequences. Because a model extends only as far as its evidence allows, not all annotated transcripts are necessarily complete. Support need not come from locus-specific evidence; it can also come from homologous (paralogous or orthologous) sequences.
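The support rule can be expressed as a simple check on each transcript model. This minimal Python sketch assumes a hypothetical data model in which every evidence item records its kind and its relationship to the locus; only cDNA, EST and protein evidence counts as support, while other data the group consults (such as gene predictions or CpG islands) is ignored for this purpose. It is an illustration, not HAVANA's software, and all names are made up.

```python
from dataclasses import dataclass, field
from typing import List

# Evidence kinds that count as support under the rule above.
SUPPORTING_KINDS = {"cDNA", "EST", "protein"}
# Hypothetical labels for how an evidence sequence relates to the annotated locus.
RELATIONS = {"locus-specific", "paralogous", "orthologous"}


@dataclass
class Evidence:
    accession: str  # illustrative identifier of the supporting sequence
    kind: str       # e.g. "cDNA", "EST", "protein", "gene_prediction", "CpG_island"
    relation: str   # one of RELATIONS


@dataclass
class TranscriptModel:
    name: str
    evidence: List[Evidence] = field(default_factory=list)


def supporting_evidence(model: TranscriptModel) -> List[Evidence]:
    """Return only the evidence items that satisfy the support rule."""
    return [e for e in model.evidence
            if e.kind in SUPPORTING_KINDS and e.relation in RELATIONS]


def is_supported(model: TranscriptModel) -> bool:
    """A model is acceptable only if at least one supporting item exists."""
    return bool(supporting_evidence(model))


def homology_only(model: TranscriptModel) -> bool:
    """True if the model is supported, but only by paralogous or orthologous evidence."""
    support = supporting_evidence(model)
    return bool(support) and all(e.relation != "locus-specific" for e in support)


# Usage with made-up accessions.
prediction_only = TranscriptModel("t1", [Evidence("pred1", "gene_prediction", "locus-specific")])
ortholog_protein = TranscriptModel("t2", [Evidence("prot1", "protein", "orthologous")])
print(is_supported(prediction_only), is_supported(ortholog_protein))  # prints "False True"
```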

While transcript and protein sequences are the most important pieces of information, HAVANA annotation also takes into account other data, such as CpG islands, gene predictions, repeats and genome signatures. Because the annotation software is DAS (Distributed Annotation System) aware, the HAVANA team can link to external data sources; Ensembl gene models and data from GENCODE collaborators are among the DAS sources the group uses. HAVANA's data sources are under constant review and subject to change. For example, the group recently started to use data from new technologies such as RNA-seq and protein mass spectrometry in its annotation.
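As an illustration of what DAS awareness involves, the Python sketch below builds a request for the DAS `features` command and parses the XML (DASGFF) document a server returns, following our reading of the DAS 1.x specification. The server URL, source name and sample response are hypothetical, and this is not the Otterlace client implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class DasFeature:
    feature_id: str
    type_id: str
    start: int
    end: int
    orientation: str  # "+", "-" or "0" in DAS responses


def features_url(server: str, source: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS 'features' request for one segment (server and source are placeholders)."""
    return f"{server.rstrip('/')}/das/{source}/features?segment={ref}:{start},{stop}"


def parse_dasgff(xml_text: str) -> List[DasFeature]:
    """Extract a few basic fields from every FEATURE element in a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        type_el = feat.find("TYPE")
        features.append(DasFeature(
            feature_id=feat.get("id", ""),
            type_id=type_el.get("id", "") if type_el is not None else "",
            start=int(feat.findtext("START", "0").strip()),
            end=int(feat.findtext("END", "0").strip()),
            orientation=feat.findtext("ORIENTATION", "0").strip(),
        ))
    return features


# Hand-written sample response (not from a real server).
SAMPLE = """<DASGFF>
  <GFF href="http://das.example.org/das/example_source/features">
    <SEGMENT id="1" start="1000" stop="5000">
      <FEATURE id="feat1" label="model1">
        <TYPE id="exon">exon</TYPE>
        <START>1200</START>
        <END>1350</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

print(features_url("http://das.example.org/", "example_source", "1", 1000, 5000))
# prints "http://das.example.org/das/example_source/features?segment=1:1000,5000"
for f in parse_dasgff(SAMPLE):
    print(f.feature_id, f.type_id, f.start, f.end, f.orientation)  # prints "feat1 exon 1200 1350 +"
```

A real client would fetch the URL over HTTP (for example with `urllib.request.urlopen`) and would also need to handle error statuses and the other fields a DASGFF response can carry, which this sketch leaves out.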

Annotation guidelines

Like its data sources, HAVANA's annotation guidelines are under constant review and are routinely updated to take into account feedback from collaborators, incorporate new data sources and reflect new trends in genetics, transcriptomics, proteomics and genomics.

The HAVANA annotation guidelines detail our annotation standards.

Otterlace

We use the Otterlace annotation suite, developed and maintained in-house, for manual annotation. The suite comprises an automated analysis pipeline based on the Ensembl pipeline, graphical interfaces for viewing the pipeline results, and interfaces for creating and modifying transcript models. The figure shows a selection of Otterlace user interfaces.


Annotation interfaces in Otterlace


The Otterlace user manual gives guidance on how to use the Otterlace interfaces.

Nomenclature

As well as building accurate transcript models, it is important to use correct gene nomenclature. To maintain consistency in the annotation database, which is especially important when working with syntenic regions across species or with haplotypes within a single species, the HAVANA annotation group works closely with the nomenclature committees for the human, mouse and zebrafish genomes.

  • Human genome nomenclature
  • Mouse genome nomenclature
  • Zebrafish genome nomenclature

Publications

• Journal papers


Citations per annum of HAVANA (co-)authored publications



  • Fine mapping of type 1 diabetes regions Idd9.1 and Idd9.2 reveals genetic complexity.

    Hamilton-Williams EE, Rainbow DB, Cheung J, Christensen M, Lyons PA, Peterson LB, Steward CA, Sherman LA and Wicker LS

    Mammalian genome : official journal of the International Mammalian Genome Society 2013;24;9-10;358-75

    PUBMED: 23934554; PMC: 3824839; DOI: 10.1007/s00335-013-9466-y

  • The zebrafish reference genome sequence and its relationship to the human genome.

    Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, McLaren S, Sealy I, Caccamo M, Churcher C, Scott C, Barrett JC, Koch R, Rauch GJ, White S, Chow W, Kilian B, Quintais LT, Guerra-Assunção JA, Zhou Y, Gu Y, Yen J, Vogel JH, Eyre T, Redmond S, Banerjee R, Chi J, Fu B, Langley E, Maguire SF, Laird GK, Lloyd D, Kenyon E, Donaldson S, Sehra H, Almeida-King J, Loveland J, Trevanion S, Jones M, Quail M, Willey D, Hunt A, Burton J, Sims S, McLay K, Plumb B, Davis J, Clee C, Oliver K, Clark R, Riddle C, Elliot D, Eliott D, Threadgold G, Harden G, Ware D, Begum S, Mortimore B, Mortimer B, Kerry G, Heath P, Phillimore B, Tracey A, Corby N, Dunn M, Johnson C, Wood J, Clark S, Pelan S, Griffiths G, Smith M, Glithero R, Howden P, Barker N, Lloyd C, Stevens C, Harley J, Holt K, Panagiotidis G, Lovell J, Beasley H, Henderson C, Gordon D, Auger K, Wright D, Collins J, Raisen C, Dyer L, Leung K, Robertson L, Ambridge K, Leongamornlert D, McGuire S, Gilderthorp R, Griffiths C, Manthravadi D, Nichol S, Barker G, Whitehead S, Kay M, Brown J, Murnane C, Gray E, Humphries M, Sycamore N, Barker D, Saunders D, Wallis J, Babbage A, Hammond S, Mashreghi-Mohammadi M, Barr L, Martin S, Wray P, Ellington A, Matthews N, Ellwood M, Woodmansey R, Clark G, Cooper J, Cooper J, Tromans A, Grafham D, Skuce C, Pandian R, Andrews R, Harrison E, Kimberley A, Garnett J, Fosker N, Hall R, Garner P, Kelly D, Bird C, Palmer S, Gehring I, Berger A, Dooley CM, Ersan-Ürün Z, Eser C, Geiger H, Geisler M, Karotki L, Kirn A, Konantz J, Konantz M, Oberländer M, Rudolph-Geiger S, Teucke M, Lanz C, Raddatz G, Osoegawa K, Zhu B, Rapp A, Widaa S, Langford C, Yang F, Schuster SC, Carter NP, Harrow J, Ning Z, Herrero J, Searle SM, Enright A, Geisler R, Plasterk RH, Lee C, Westerfield M, de Jong PJ, Zon LI, Postlethwait JH, Nüsslein-Volhard C, Hubbard TJ, Roest Crollius H, Rogers J and Stemple DL

    Nature 2013;496;7446;498-503

    PUBMED: 23594743; PMC: 3703927; DOI: 10.1038/nature12111

  • Ensembl 2013.

    Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, García-Girón C, Gordon L, Hourlier T, Hunt S, Juettemann T, Kähäri AK, Keenan S, Komorowska M, Kulesha E, Longden I, Maurel T, McLaren WM, Muffato M, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, Riat HS, Ritchie GR, Ruffier M, Schuster M, Sheppard D, Sobral D, Taylor K, Thormann A, Trevanion S, White S, Wilder SP, Aken BL, Birney E, Cunningham F, Dunham I, Harrow J, Herrero J, Hubbard TJ, Johnson N, Kinsella R, Parker A, Spudich G, Yates A, Zadissa A and Searle SM

    Nucleic acids research 2013;41;Database issue;D48-55

    PUBMED: 23203987; PMC: 3531136; DOI: 10.1093/nar/gks1236

  • The non-obese diabetic mouse sequence, annotation and variation resource: an aid for investigating type 1 diabetes.

    Steward CA, Gonzalez JM, Trevanion S, Sheppard D, Kerry G, Gilbert JG, Wicker LS, Rogers J and Harrow JL

    Database : the journal of biological databases and curation 2013;2013;bat032

    PUBMED: 23729657; PMC: 3668384; DOI: 10.1093/database/bat032

  • Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Wilming LG, Hart EA, Coggill PC, Horton R, Gilbert JG, Clee C, Jones M, Lloyd C, Palmer S, Sims S, Whitehead S, Wiley D, Beck S and Harrow JL

    Database : the journal of biological databases and curation 2013;2013;bat011

    PUBMED: 23589541; PMC: 3626023; DOI: 10.1093/database/bat011

  • Structural and functional annotation of the porcine immunome.

    Dawson HD, Loveland JE, Pascal G, Gilbert JG, Uenishi H, Mann KM, Sang Y, Zhang J, Carvalho-Silva D, Hunt T, Hardy M, Hu Z, Zhao SH, Anselmo A, Shinkai H, Chen C, Badaoui B, Berman D, Amid C, Kay M, Lloyd D, Snow C, Morozumi T, Cheng RP, Bystrom M, Kapetanovic R, Schwartz JC, Kataria R, Astley M, Fritz E, Steward C, Thomas M, Wilming L, Toki D, Archibald AL, Bed'Hom B, Beraldi D, Huang TH, Ait-Ali T, Blecha F, Botti S, Freeman TC, Giuffra E, Hume DA, Lunney JK, Murtaugh MP, Reecy JM, Harrow JL, Rogel-Gaillard C and Tuggle CK

    BMC genomics 2013;14;332

    PUBMED: 23676093; PMC: 3658956; DOI: 10.1186/1471-2164-14-332

  • The B10 Idd9.3 locus mediates accumulation of functionally superior CD137(+) regulatory T cells in the nonobese diabetic type 1 diabetes model.

    Kachapati K, Adams DE, Wu Y, Steward CA, Rainbow DB, Wicker LS, Mittler RS and Ridgway WM

    Journal of immunology (Baltimore, Md. : 1950) 2012;189;10;5001-15

    PUBMED: 23066155; PMC: 3505683; DOI: 10.4049/jimmunol.1101013

  • Analyses of pig genomes provide insight into porcine demography and evolution.

    Groenen MA, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ, Li S, Larkin DM, Kim H, Frantz LA, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, Bystrom M, Capitanu B, Carvalho-Silva D, Chardon P, Chen C, Cheng R, Choi SH, Chow W, Clark RC, Clee C, Crooijmans RP, Dawson HD, Dehais P, De Sapio F, Dibbits B, Drou N, Du ZQ, Eversole K, Fadista J, Fairley S, Faraut T, Faulkner GJ, Fowler KE, Fredholm M, Fritz E, Gilbert JG, Giuffra E, Gorodkin J, Griffin DK, Harrow JL, Hayward A, Howe K, Hu ZL, Humphray SJ, Hunt T, Hornshøj H, Jeon JT, Jern P, Jones M, Jurka J, Kanamori H, Kapetanovic R, Kim J, Kim JH, Kim KW, Kim TH, Larson G, Lee K, Lee KT, Leggett R, Lewin HA, Li Y, Liu W, Loveland JE, Lu Y, Lunney JK, Ma J, Madsen O, Mann K, Matthews L, McLaren S, Morozumi T, Murtaugh MP, Narayan J, Nguyen DT, Ni P, Oh SJ, Onteru S, Panitz F, Park EW, Park HS, Pascal G, Paudel Y, Perez-Enciso M, Ramirez-Gonzalez R, Reecy JM, Rodriguez-Zas S, Rohrer GA, Rund L, Sang Y, Schachtschneider K, Schraiber JG, Schwartz J, Scobie L, Scott C, Searle S, Servin B, Southey BR, Sperber G, Stadler P, Sweedler JV, Tafer H, Thomsen B, Wali R, Wang J, Wang J, White S, Xu X, Yerle M, Zhang G, Zhang J, Zhang J, Zhao S, Rogers J, Churcher C and Schook LB

    Nature 2012;491;7424;393-8

    PUBMED: 23151582; PMC: 3566564; DOI: 10.1038/nature11622

  • An integrated encyclopedia of DNA elements in the human genome.

    ENCODE Project Consortium

    Nature 2012;489;7414;57-74

    PUBMED: 22955616; PMC: 3439153; DOI: 10.1038/nature11247

  • Landscape of transcription in human cells.

    Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Falconnet E, Fastuca M, Fejes-Toth K, Ferreira P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena H, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Luo OJ, Park E, Persaud K, Preall JB, Ribeca P, Risk B, Robyr D, Sammeth M, Schaffer L, See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu Y, Ruan X, Hayashizaki Y, Harrow J, Gerstein M, Hubbard T, Reymond A, Antonarakis SE, Hannon G, Giddings MC, Ruan Y, Wold B, Carninci P, Guigó R and Gingeras TR

    Nature 2012;489;7414;101-8

    PUBMED: 22955620; PMC: 3684276; DOI: 10.1038/nature11233

  • The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.

    Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J and Guigó R

    Genome research 2012;22;9;1775-89

    PUBMED: 22955988; PMC: 3431493; DOI: 10.1101/gr.132159.111

  • Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

    Howald C, Tanzer A, Chrast J, Kokocinski F, Derrien T, Walters N, Gonzalez JM, Frankish A, Aken BL, Hourlier T, Vogel JH, White S, Searle S, Harrow J, Hubbard TJ, Guigó R and Reymond A

    Genome research 2012;22;9;1698-710

    PUBMED: 22955982; PMC: 3431487; DOI: 10.1101/gr.134478.111

  • Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function.

    Ezkurdia I, del Pozo A, Frankish A, Rodriguez JM, Harrow J, Ashman K, Valencia A and Tress ML

    Molecular biology and evolution 2012;29;9;2265-83

    PUBMED: 22446687; PMC: 3424414; DOI: 10.1093/molbev/mss100

  • GENCODE: the reference human genome annotation for The ENCODE Project.

    Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M,
