The DDBJ/ENA/GenBank Feature Table: Definition

Version 10.5 November 2015



DNA Data Bank of Japan, Mishima, Japan.
EMBL-EBI, European Nucleotide Archive, Cambridge, UK.
GenBank, NCBI, Bethesda, MD, USA.


1 Introduction
2 Overview of the Feature Table format
2.1 Format Design
2.2 Key aspects of this feature table design
2.3 Feature Table Terminology
3 Feature table components and format
3.1 Naming conventions
3.2 Feature keys
3.2.1 Purpose
3.2.2 Format and conventions
3.2.3 Key groups and hierarchy
3.2.4 Feature key examples
3.3 Qualifiers
3.3.1 Purpose
3.3.2 Format and conventions
3.3.3 Qualifier values
3.3.4 Qualifier examples
3.4 Location
3.4.1 Purpose
3.4.2 Format and conventions
3.4.3 Location examples
4 Feature table Format
4.1 Format examples
4.2 Definition of line types
4.3 Data item positions
4.4 Use of blanks
5 Examples of sequence annotation
5.1 Eukaryotic gene
5.2 Bacterial operon
5.3 Artificial cloning vector (circular)
5.4 Plasmid
5.5 Repeat element
5.6 Immunoglobulin heavy chain
5.7 T-cell receptor
5.8 Transfer RNA
6 Limitations of this feature table design
7 Appendices
7.1 Appendix I EMBL, GenBank and DDBJ entries
7.1.1 EMBL Format
7.1.2 GenBank Format
7.1.3 DDBJ Format
7.2 Appendix II: Feature keys reference
7.3 Appendix III: Summary of qualifiers for feature keys
7.3.1 Qualifier List
7.4 Appendix IV: Controlled vocabularies
7.4.1 Nucleotide base codes (IUPAC)
7.4.2 Modified base abbreviations
7.4.3 Amino acid abbreviations
7.4.4 Modified and unusual Amino Acids
7.4.5 Genetic Code Tables
7.4.6 Country Names
7.4.7 Announces

1 Introduction

Nucleic acid sequences provide the fundamental starting point for describing 
and understanding the structure, function, and development of genetically 
diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data 
banks have from their inception used tables of sites and features to describe 
the roles and locations of higher order sequence domains and elements within 
the genome of an organism. 
In February, 1986, GenBank and EMBL began a collaborative effort (joined by 
DDBJ in 1987) to devise a common feature table format and common standards for 
annotation practice.

2 Overview of the Feature Table format

The overall goal of the feature table design is to provide an extensive 
vocabulary for describing features in a flexible framework for manipulating 
them. The Feature Table documentation represents the shared rules that allow 
the three databases to exchange data on a daily basis. 
The range of features to be represented is diverse, including regions which: 
* perform a biological function, 
* affect or are the result of the expression of a biological function, 
* interact with other molecules, 
* affect replication of a sequence, 
* affect or are the result of recombination of different sequences, 
* are a recognizable repeated unit, 
* have secondary or tertiary structure,
* exhibit variation, or have been revised or corrected.

2.1 Format Design

 
The format design is based on a tabular approach and consists of the following 
items: 

Feature key - a single word or abbreviation indicating functional group  
Location - instructions for finding the feature 
Qualifiers - auxiliary information about a feature

2.2 Key aspects of this feature table design

* Feature keys allow specific annotation of important sequence features.

* Related features can be easily specified and retrieved.
Feature keys are arranged hierarchically, allowing complex and compound 
features to be expressed. Both location operators and the feature keys show 
feature relationships even when the features are not contiguous. The hierarchy 
of feature keys allows broad categories of biological functionality, such as 
rRNAs, to be easily retrieved.

* Generic feature keys provide a means for entering new or undefined features.
A number of "generic" or miscellaneous feature keys have been added to permit 
annotation of features that cannot be adequately described by existing feature 
keys. These generic feature keys will serve as an intermediate step in the 
identification and addition of new feature keys. The syntax has been designed 
to allow the addition of new feature keys as they are required. 

* More complex locations (fuzzy and alternate ends, for example) can be specified.
Each end point of a feature may be specified as a single point, an alternate 
set of possible end points, a base number beyond which the end point lies, or 
a region which contains the end point. 

* Features can be combined and manipulated in many different ways.
The location field can contain operators or functional descriptors specifying 
what must be done to the sequence to reproduce the feature. For example, a 
series of exons may be "join"ed into a full coding sequence. 

* Standardized qualifiers provide precision and parsability of descriptive details.
A combination of standardized qualifiers and their controlled-vocabulary 
values enables free-text descriptions to be avoided.
 
* The nature of supporting evidence for a feature can be explicitly indicated.
Features, such as open reading frames or sequences showing sequence similarity 
to consensus sequences, for which there is no direct experimental evidence can 
be annotated. Therefore, the feature table can incorporate contributions from 
researchers doing computational analysis of the sequence databases. However, 
all features that are supported by experimental data will be clearly marked as 
such. 

* The table syntax has been designed to be machine parsable.
A consistent syntax allows machine extraction and manipulation of sequences 
coding for all features in the table.

2.3 Feature Table Terminology

The format and wording in the feature table use common biological research 
terminology whenever possible. For example, an item in the feature table such as: 

Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase" 
                /gene="adhI"
 
might be read as: 
The feature CDS is a coding sequence beginning at base 23 and ending at base 
400, has a product called "alcohol dehydrogenase" and is coded for by a gene 
called "adhI".

A more complex description:
Key             Location/Qualifiers
CDS             join(544..589,688..>1032)
                /product="T-cell receptor beta-chain"

which might be read as: 
This feature, which is a partial coding sequence, is formed by joining the 
indicated elements to form one contiguous sequence encoding a product called 
T-cell receptor beta-chain.

The following sections contain detailed explanations of the feature table 
design showing conventions for each component of the feature table, examples 
of how the format might be implemented, a description of the exact column 
placement of all the data items and examples of complete sequence entries that 
have been annotated using the new format. The last section of this document 
describes known limitations of the current feature table design. 

Appendix I gives an example database entry for the DDBJ, GenBank and EMBL  
formats. 

Appendices II and III provide reference manuals for the feature table keys and 
qualifiers, respectively. 

Appendix IV includes controlled vocabularies such as nucleotide base codes, 
modified base abbreviations, genetic code tables etc.

This document defines the syntax and vocabulary of the feature table. The 
syntax is sufficiently flexible to allow expression of a single biological 
entity in numerous ways. In such cases, the annotation staffs at the databases 
will propose conventions for standard means of denoting the entities. 
This feature table format is shared by GenBank, EMBL and DDBJ. Comments, 
corrections, and suggestions may be submitted to any of the database staffs. 
New format specifications will be added as needed.

3 Feature table components and format

3.1 Naming conventions

Feature table components, including feature keys, qualifiers, accession 
numbers, database name abbreviations, and location operators, are all named 
following the same conventions. Component names may be no more than 20 
characters long (15 for feature keys, 20 for qualifiers) and must 
contain at least one letter. The following characters are permitted to 
occur in feature table component names: 

* Uppercase letters (A-Z) 
* Lowercase letters (a-z) 
* Numbers (0-9) 
* Underscore (_) 
* Hyphen (-) 
* Single quotation mark or apostrophe (') 
* Asterisk (*)
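
As a rough illustration of these rules, the following Python sketch checks a 
candidate component name. The function name is_valid_component_name and its 
max_len parameter are assumptions made for this example; they are not part of 
the feature table definition.

```python
import re

# Characters permitted in component names (Section 3.1).
_ALLOWED = re.compile(r"[A-Za-z0-9_\-'*]+")


def is_valid_component_name(name, max_len=20):
    """Return True if `name` satisfies the naming conventions.

    max_len is assumed to be 20 for qualifiers and 15 for feature keys.
    """
    if not name or len(name) > max_len:
        return False
    if _ALLOWED.fullmatch(name) is None:
        return False
    # At least one letter is required.
    return any(c.isalpha() for c in name)
```

For instance, is_valid_component_name("3'UTR", max_len=15) is accepted, while 
a purely numeric name or a name containing a blank is rejected.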

3.2 Feature keys

3.2.1 Purpose

Feature keys indicate 
(1) the biological nature of the annotated feature or 
(2) information about changes to or other versions of the sequence. 
The feature key permits a user to quickly find or retrieve similar features or 
features with related functions.

3.2.2 Format and conventions

There is a defined list of allowable feature keys, which is shown in Appendix 
II. Each feature must contain a feature key.

3.2.3 Key groups and hierarchy

The feature keys fall into families which are in some sense similar in 
function and which are annotated in a similar manner. A functional family may 
have a "generic" or miscellaneous key, which can be recognized by the 'misc_' 
prefix, that can be used for instances not covered by the other defined keys 
of that group. 

The feature key groups are listed below with a short definition and an 
annotation example: 

1. Difference and change features

Indicate ways in which a sequence should be changed to produce a different 
"version": 
misc_difference location
                /replace="change_location"

2. Transcript features

Indicate products made by a region: 
misc_RNA        location


3. Binding features

Indicate that a sequence or nucleotide is covalently, non-covalently, or 
otherwise bound to something else: 
misc_binding    location
                /bound_moiety="bound molecule" 

4. Repeat features

Indicate repetitive sequence elements: 
repeat_region   location


5. Recombination features

Indicate regions that have been either inserted or deleted by recombination: 
misc_recomb     location


6. Structure features

Indicate sequence for which there is secondary or tertiary structural 
information: 
misc_structure  location

3.2.4 Feature key examples

Key                     Description     

CDS                     Protein-coding sequence 
rep_origin              Origin of replication
protein_bind            Protein binding site on DNA
tRNA                    mature transfer RNA

See Appendix II for descriptions of all feature keys.

3.3 Qualifiers

3.3.1 Purpose

Qualifiers provide a general mechanism for supplying information about 
features in addition to that conveyed by the key and location.

3.3.2 Format and conventions

Qualifiers take the form of a slash (/) followed by the qualifier name and, if 
applicable, an equal sign (=) and a value. Each qualifier should have a single 
value; if multiple values are necessary, these should be represented by 
repeating the same qualifier, e.g.: 
Key             Location/Qualifiers

source          1..1000
                /culture_collection="ATCC:11775"
                /culture_collection="CECT:515"

If the location descriptor does not need a continuation line, the first 
qualifier begins a new line in the feature location column. If the location 
descriptor requires a continuation line, the first qualifier may follow 
immediately after the location. Any necessary continuation lines begin in the 
same column. See Section 4 for a complete description of data item positions.
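
The one-value-per-qualifier rule means a reader of the table has to expect 
the same qualifier name more than once per feature. A minimal Python sketch 
of collecting such repeats is shown here. It assumes each qualifier fits on 
one line and that the column prefix has already been removed; the function 
name collect_qualifiers is hypothetical.

```python
from collections import defaultdict


def collect_qualifiers(lines):
    """Group single-line qualifiers by name, keeping repeated values in order."""
    qualifiers = defaultdict(list)
    for line in lines:
        text = line.strip()
        if not text.startswith("/"):
            continue  # not a qualifier line
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = text[1:].partition("=")
        # Qualifiers without a value (e.g. /focus) are stored as True.
        qualifiers[name].append(value if sep else True)
    return dict(qualifiers)
```

Applied to the two /culture_collection lines above, the sketch returns a 
single key whose list holds both quoted values in their original order.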

3.3.3 Qualifier values

Since qualifiers convey many different types of information, there are several value formats: 
1. Free text 
2. Controlled vocabulary or enumerated values 
3. Citation or reference numbers 
4. Sequences

3.3.3.1 Free text

Most qualifier values will be a descriptive text phrase which must be enclosed 
in double quotation marks. When the text occupies more than one line, a single 
set of quotation marks is required at the beginning and at the end of the 
text. The text itself may be composed of any printable characters (ASCII 
values 32-126 decimal). If double quotation marks are used within a free text 
string, each set (") must be 'escaped' by placing a second double quotation 
mark immediately before it (""). For example: 
              /note="This is an example of ""escaped"" quotation marks"
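
The escaping rule can be sketched in Python as a pair of helper functions. 
The names quote_free_text and unquote_free_text are assumptions for this 
illustration, and the sketch does not check that the text is limited to 
printable ASCII.

```python
def quote_free_text(text):
    """Wrap free text in double quotes, doubling any embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def unquote_free_text(value):
    """Reverse quote_free_text for a value enclosed in double quotes."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("free-text value must be enclosed in double quotes")
    return value[1:-1].replace('""', '"')
```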

3.3.3.2 Controlled vocabulary or enumerated values

Some qualifiers require values from a controlled vocabulary and are entered 
without quotation marks. For example, the '/direction' qualifier has only 
three values: 'left', 'right' or 'both'. Qualifier value controlled 
vocabularies, like feature table component names, must be treated as 
completely case insensitive: they may be entered and displayed in any 
combination of upper and lower case ('/direction=Left', '/direction=left' and 
'/direction=LEFT' are all legal and all convey the same meaning). The database 
staffs reserve the right to regularize the case of qualifier values. Qualifier 
value controlled vocabularies will be maintained by the cooperating database 
staffs. Examples of controlled vocabularies can be found in Appendix IV. The 
database staff should be contacted for the current lists.
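
Case-insensitive handling of a controlled vocabulary can be sketched as 
follows, using the three /direction values named above. The names 
DIRECTION_VALUES and normalize_direction are hypothetical, and folding to 
lower case is only one possible way of regularizing the case.

```python
# The three values of /direction given in the text.
DIRECTION_VALUES = {"left", "right", "both"}


def normalize_direction(value):
    """Return the lower-case form of a /direction value, or raise ValueError."""
    lowered = value.strip().lower()
    if lowered not in DIRECTION_VALUES:
        raise ValueError("not a legal /direction value: " + value)
    return lowered
```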

3.3.3.3 Citation or reference numbers

The citation or published reference number (as enumerated in the entry 
'REFERENCE' or 'RN' data item) should be enclosed in square brackets 
(e.g., [3]) to distinguish it from other numbers.

3.3.3.4 Sequences

Literal sequences of nucleotide bases in location descriptors, e.g., 
join(12..45,"atgcatt",988..1050), became illegal with the implementation of 
version 2.1 of the Feature Table Definition Document (December 15, 1998).

3.3.4 Qualifier examples

Key             Location/Qualifiers

source          1..1509
                /organism="Mus musculus"
                /strain="CD1"
                /mol_type="genomic DNA"
regulatory      <1..9
                /gene="ubc42"
                /regulatory_class="promoter"
mRNA            join(10..567,789..1320)
                /gene="ubc42"
CDS             join(54..567,789..1254)
                /gene="ubc42"
                /product="ubiquitin conjugating enzyme"
                /function="cell division control"

3.4 Location

3.4.1 Purpose

The location indicates the region of the presented sequence which corresponds 
to a feature.

3.4.2 Format and conventions

The location contains at least one sequence location descriptor and may 
contain one or more operators with one or more sequence location descriptors. 
Base numbers refer to the numbering in the entry. This numbering designates 
the first base (5' end) of the presented sequence as base 1. 
Base locations beyond the range of the presented sequence may not be used in 
location descriptors, the only exception being location in a remote entry (see 
3.4.2.1, e).  

Location operators and descriptors are discussed in more detail below.

3.4.2.1 Location descriptors

The location descriptor can be one of the following: 
(a) a single base number
(b) a site between two indicated adjoining bases
(c) a single base chosen from within a specified range of bases (not allowed for new
    entries)
(d) the base numbers delimiting a sequence span
(e) a remote entry identifier followed by a local location descriptor
    (i.e., a-d)

A site between two adjoining nucleotides, such as an endonucleolytic cleavage 
site, is indicated by listing the two points separated by a caret (^). The 
permitted formats for this descriptor are n^n+1 (for example 55^56), or, for 
circular molecules, n^1, where "n" is the full length of the molecule, i.e., 
1000^1 for a circular molecule with length 1000.

A single base chosen from a range of bases is indicated by the first base 
number and the last base number of the range separated by a single period 
(e.g., '12.21' indicates a single base taken from between the indicated 
points). From October 2006 the usage of this descriptor is restricted: 
it is illegal to use "a single base from a range" (c) either on its own or 
in combination with the "sequence span" (d) descriptor for newly created entries. 
Existing entries that contain such descriptors are going to be retrofitted. 

Sequence spans are indicated by the starting base number and the ending base 
number separated by two periods (e.g., '34..456'). The '<' and '>' symbols may 
be used with the starting and ending base numbers to indicate that an end 
point is beyond the specified base number. The starting and ending base 
positions can be represented as distinct base numbers ('34..456') or a site 
between two indicated adjoining bases. 

A location in a remote entry (not the entry to which the feature table 
belongs) can be specified by giving the accession number and sequence version 
of the remote entry, followed by a colon ":", followed by a location 
descriptor which applies to that entry's sequence (e.g., J12345.1:1..15; see 
also the examples below).
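
The five descriptor forms can be told apart by their punctuation alone. The 
Python sketch below does this with regular expressions. It is a 
simplification under stated assumptions: the accession pattern is a guess at 
a reasonable shape, the adjacency requirement of the n^n+1 form is not 
checked, and the names classify_descriptor and _classify_local are 
hypothetical.

```python
import re

# Local descriptor forms (a)-(d); each pattern must match the whole string.
_LOCAL_FORMS = [
    ("single base", re.compile(r"\d+")),
    ("site between bases", re.compile(r"\d+\^\d+")),
    ("single base from range", re.compile(r"\d+\.\d+")),
    ("span", re.compile(r"<?\d+\.\.>?\d+")),
]
# Form (e): accession.version, a colon, then a local descriptor.
_REMOTE = re.compile(r"([A-Za-z0-9_]+\.\d+):(.+)")


def _classify_local(text):
    for label, pattern in _LOCAL_FORMS:
        if pattern.fullmatch(text):
            return label
    raise ValueError("unrecognized location descriptor: " + text)


def classify_descriptor(text):
    """Name the descriptor form used by a single location descriptor."""
    remote = _REMOTE.fullmatch(text)
    if remote:
        return "remote " + _classify_local(remote.group(2))
    return _classify_local(text)
```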

3.4.2.2 Operators

The location operator is a prefix that specifies what must be done to the 
indicated sequence to find or construct the location corresponding to the 
feature. A list of operators is given below with their definitions and most 
common format. 

complement(location) 
Find the complement of the presented sequence in the span specified by 
"location" (i.e., read the complement of the presented strand in its 5'-to-3' 
direction) 

join(location,location, ... location) 
The indicated elements should be joined (placed end-to-end) to form one 
contiguous sequence 

order(location,location, ... location) 
The elements can be found in the 
specified order (5' to 3' direction), but nothing is implied about the 
reasonableness of joining them 

Note: location operator "complement" can be used in combination with either 
"join" or "order" within the same location; combinations of "join" and "order" 
within the same location (nested operators) are illegal.
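
The restriction on mixing "join" and "order" lends itself to a simple 
textual check. The Python sketch below only looks for operator names followed 
by an opening parenthesis and does not validate the rest of the location. The 
function names operators_used and has_legal_operator_mix are assumptions for 
this example.

```python
import re


def operators_used(location):
    """Return the set of operator names appearing in a location string."""
    return set(re.findall(r"\b(complement|join|order)\(", location))


def has_legal_operator_mix(location):
    """Return False if both join and order occur in the same location."""
    ops = operators_used(location)
    return not ({"join", "order"} <= ops)
```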

3.4.3 Location examples

The following is a list of common location descriptors with their meanings: 

Location                  Description   

467                       Points to a single base in the presented sequence 

340..565                  Points to a continuous range of bases bounded by and
                          including the starting and ending bases

<345..500                 Indicates that the exact lower boundary point of a feature
                          is unknown.  The location begins at some  base previous to
                          the first base specified (which need not be contained in 
                          the presented sequence) and continues to and includes the 
                          ending base 

<1..888                   The feature starts before the first sequenced base and 
                          continues to and includes base 888

1..>888                   The feature starts at the first sequenced base and 
                          continues beyond base 888

102.110                   Indicates that the exact location is unknown but that it is 
                          one of the bases between bases 102 and 110, inclusive

123^124                   Points to a site between bases 123 and 124

join(12..78,134..202)     Regions 12 to 78 and 134 to 202 should be joined to form 
                          one contiguous sequence


complement(34..126)       Start at the base complementary to 126 and finish at the 
                          base complementary to base 34 (the feature is on the strand 
                          complementary to the presented strand)


complement(join(2691..4571,4918..5163))
                          Joins regions 2691 to 4571 and 4918 to 5163, then 
                          complements the joined segments (the feature is on the 
                          strand complementary to the presented strand) 

join(complement(4918..5163),complement(2691..4571))
                          Complements regions 4918 to 5163 and 2691 to 4571, then 
                          joins the complemented segments (the feature is on the 
                          strand complementary to the presented strand)
  
J00194.1:100..202         Points to bases 100 to 202, inclusive, in the entry (in 
                          this database) with primary accession number 'J00194'
 
join(1..100,J00194.1:100..202)
                          Joins region 1..100 of the existing entry with the region
                          100..202 of remote entry J00194
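
To make the effect of "join" and "complement" concrete, the following Python 
sketch reconstructs the feature sequence from a local location. It is a 
minimal illustration under stated assumptions: only single bases, spans, 
"join" and "complement" are handled; "order", remote entries, between-base 
sites and ambiguity codes are not; the '<' and '>' symbols are ignored. The 
names extract and _split_top_level are hypothetical.

```python
import re

# Complement table for unambiguous bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def _split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def extract(sequence, location):
    """Return the bases of `sequence` described by `location` (1-based)."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = extract(sequence, location[len("complement("):-1])
        # Complement each base, then reverse to read 5' to 3'.
        return inner.translate(_COMPLEMENT)[::-1]
    if location.startswith("join(") and location.endswith(")"):
        parts = _split_top_level(location[len("join("):-1])
        return "".join(extract(sequence, p) for p in parts)
    m = re.fullmatch(r"<?(\d+)(?:\.\.>?(\d+))?", location)
    if m is None:
        raise ValueError("unsupported location: " + location)
    start = int(m.group(1))
    end = int(m.group(2) or start)
    return sequence[start - 1:end]
```

With this sketch, the two forms complement(join(a,b)) and 
join(complement(b),complement(a)) yield the same bases, which matches the 
descriptions of the two complement examples in the list above.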

4 Feature table Format

The examples below show the preferred sequence annotations for a number of 
commonly occurring sequence types. These examples may not be appropriate in 
all cases but should be used as a guide whenever possible. This section 
describes the columnar format used to write this feature table in "flat-file" 
form for distributions of the database.

4.1 Format examples

Feature table format example (EMBL): 
FT   source          1..1859
FT                   /db_xref="taxon:3899"
FT                   /organism="Trifolium repens"
FT                   /tissue_type="leaves"
FT                   /clone_lib="lambda gt10"
FT                   /clone="TRE361"
FT                   /mol_type="genomic DNA"
FT   CDS             14..1495
FT                   /db_xref="MENDEL:11000"
FT                   /db_xref="UniProtKB/Swiss-Prot:P26204"
FT                   /note="non-cyanogenic"
FT                   /EC_number="3.2.1.21"
FT                   /product="beta-glucosidase"
FT                   /protein_id="CAA40058.1"
FT                   /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSR.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Feature table format example (GenBank):

     source          1..8959
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
     gene            212..8668
                     /gene="NF1"
     CDS             212..8668
                     /gene="NF1"
                     /note="putative"
                     /codon_start=1
                     /product="GAP-related protein"
                     /protein_id="AAA59924.1"
                     /translation="MAAHRPVEWVQAVVSRFDEQLPIKTGQQNTHTKVSTE.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Feature table format example (DDBJ):

 
     source          1..2136
                     /clone="pK28"
                     /organism="Rattus norvegicus"
                     /strain="Sprague-Dawley"
                     /tissue_type="kidney"
                     /mol_type="genomic DNA" 
     mRNA            19..2128
     CDS             31..1212
                     /codon_start=1
                     /function="Dual specificity protein tyrosine/threonine
                     kinase"
                     /product="MAP kinase kinase"
                     /protein_id="BAA02603.1"
                     /translation="MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKL.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

4.2 Definition of line types

The feature table consists of a header line, which contains the column titles 
for the table, and the individual feature entries. Each feature entry is 
composed of a feature descriptor line and qualifier and continuation lines, 
if needed. The feature descriptor line contains the feature's name, key, and 
location. If the location cannot be contained on the first line of the feature 
descriptor, it is continued on a continuation line immediately following the 
descriptor line. If the feature requires further attributes, feature qualifier 
lines immediately follow the corresponding feature descriptor line (or its 
continuation). Qualifier information that cannot be contained on one line 
continues on the following continuation lines as necessary.
 
Thus, there are 4 types of feature table lines: 
      Line type            Content                 #/entry     #/feature
      ---------            -------                 -------     ---------

      Header               Column titles           1*          N/A
      Feature descriptor   Key and location        1 to many*  1
      Feature qualifiers   Qualifiers and values   N/A         0 to many
      Continuation lines   Feature descriptor or   0 to many   0 to many
                           qualifier continuation

4.3 Data item positions

The position of the data items within the feature descriptor line is as follows: 
     column position    data item
     ---------------    ---------

     1-5                blank 
     6-20               feature key
     21                 blank
     22-80              location

Data on the qualifier and continuation lines begins in column position 22 (the 
first 21 columns contain blanks). The EMBL format for all lines differs from 
the GenBank / DDBJ formats in that it includes a line type abbreviation in 
columns 1 and 2.
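
Because the positions are fixed, a line can be split by column index alone. 
The Python sketch below does so for the positions listed above. The function 
name split_feature_line is an assumption for this example. Since the EMBL 
line type abbreviation occupies columns 1 and 2, it falls outside the sliced 
ranges and the same slicing applies to all three formats.

```python
def split_feature_line(line):
    """Split a feature table line into (key, data) by column position.

    `key` is None on qualifier and continuation lines, where columns
    6-20 are blank.
    """
    key = line[5:20].strip()    # columns 6-20: feature key
    data = line[21:80].strip()  # columns 22-80: location or qualifier text
    return (key or None, data)
```

A descriptor line therefore yields a pair such as ("source", "1..8959"), 
while a qualifier line yields None as its first element.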

4.4 Use of blanks

Blanks (spaces) may, in general, be used within the feature location and 
qualifier values to make the construction more readable. The following rules 
should be observed: 
* Names of feature table components may not contain blanks (see Section 3.1) 
* Operator names may not be separated from the following open parenthesis (the 
  beginning of the operand list) by blanks. 
* Qualifiers may not be separated from the preceding slash or the following 
  equals sign (if present) by blanks.

5 Examples of sequence annotation

The examples below show the preferred sequence annotations for a number of 
commonly occurring sequence types. These examples may not be appropriate in 
all cases but should be used as a guide whenever possible.

5.1 Eukaryotic gene

source          1..1509
                /organism="Mus musculus"
                /strain="CD1"
                /mol_type="genomic DNA"
regulatory      <1..9
                /gene="ubc42"
                /regulatory_class="promoter"
mRNA            join(10..567,789..1320)
                /gene="ubc42"
CDS             join(54..567,789..1254)
                /gene="ubc42"
                /product="ubiquitin conjugating enzyme"
                /function="cell division control"
                /translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY
                QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS
                AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"
exon            10..567
                /gene="ubc42"
                /number=1
intron          568..788
                /gene="ubc42"
                /number=1
exon            789..1320
                /gene="ubc42"
                /number=2
regulatory      1310..1317
                /regulatory_class="polyA_signal_sequence"
                /gene="ubc42"

5.2 Bacterial operon

source          1..9430
                /organism="Lactococcus sp."
                /strain="MG1234"
                /mol_type="genomic DNA"
operon          160..6865
                /operon="gal"
regulatory      160..165
                /operon="gal"
                /regulatory_class="minus_35_signal"
regulatory      179..184
                /operon="gal"
                /regulatory_class="minus_10_signal"
CDS             405..1934
                /operon="gal"
                /gene="galA"
                /product="galactose permease"
                /function="galactose transporter"
CDS             2003..3001
                /operon="gal"
                /gene="galM"
                /product="aldose 1-epimerase"
                /EC_number="5.1.3.3"
                /function="mutarotase"
CDS             3235..4537
                /operon="gal"
                /gene="galK"
                /product="galactokinase"
                /EC_number="2.7.1.6"
mRNA            189..6865
                /operon="gal"

5.3 Artificial cloning vector (circular)

source          1..5300
                /organism="Cloning vector pABC"
                /lab_host="Escherichia coli"
                /mol_type="other DNA"
                /focus
source          1..5138
                /organism="Escherichia coli"
                /mol_type="other DNA"
                /strain="K12"
source          5139..5247
                /organism="Aequorea victoria"
                /mol_type="other DNA"
                /dev_stage="adult"
source          5248..5300
                /organism="Escherichia coli"
                /mol_type="other DNA"
                /strain="K12"
CDS             join(complement(1..799),complement(5080..5120))
                /gene="mob1"
                /product="mobilization protein 1"
CDS             complement(1697..2512)
                /gene="Km"
                /product="kanamycin resistance protein"
CDS             3037..3711
                /gene="rep1"
                /product="replication protein 1"
CDS             complement(4170..4829)
                /gene="Cm"
                /product="chloramphenicol resistance protein"
CDS             5139..5247
                /gene="GFP"
                /product="green fluorescent protein"

5.4 Plasmid

source          1..2245
                /organism="Escherichia coli"
                /plasmid="Plasmid XYZ"
                /strain="K12"
                /mol_type="genomic DNA"
rep_origin      6
                /direction=LEFT
                /note="ori"
CDS             join(complement(567..795),complement(21..349))
                /gene="trbC"
                /product="transfer protein C"
CDS             803..1344
                /gene="traN"
                /product="transfer protein N"
CDS             1559..1985
                /gene="incA"
                /product="incompatibility protein A"
CDS             join(2004..2195,3..20)
                /gene="finP"
                /product="fertility inhibition protein P"

5.5 Repeat element

source          1..1011
                /organism="Homo sapiens"
                /clone="pha281u/1DO"
                /mol_type="genomic DNA"
repeat_region   80..401
                /rpt_type=DISPERSED
                /rpt_family="Alu-J"

5.6 Immunoglobulin heavy chain

source          1..321
                /organism="Mus musculus"
                /strain="BALB/c"
                /cell_line="hybridoma 1A4"
                /rearranged
                /mol_type="mRNA"
CDS             <1..>321
                /codon_start=1
                /gene="VFM1-DFL16.1-JH4"
                /product="immunoglobulin heavy chain"
V_region        1..277
                /gene="VFM1"
                /product="immunoglobulin heavy chain variable region"

5.7 T-cell receptor

source          1..402
                /organism="Homo sapiens"
                /sex="male"
                /cell_type="CD4+ T-lymphocyte"
                /rearranged
                /clone="TCR1A.12"
                /mol_type="mRNA"
sig_peptide     1..54
                /gene="TCR1A"
CDS             1..402
                /gene="TCR1A"
                /product="T-cell receptor alpha chain"
mat_peptide     55..399
                /gene="TCR1A"
                /product="T-cell receptor alpha chain"
V_region        55..327
                /gene="TCR1A"
J_segment       328..393
                /gene="TCR1A"
C_region        394..399
                /gene="TCR1A"

5.8 Transfer RNA

source          1..2345
                /organism="Yersinia sp."
                /strain="IP134"
                /mol_type="genomic DNA"
regulatory      644..650
                /gene="tRNA-Leu(UUR)"
                /regulatory_class="minus_35_signal"
tRNA            655..730
                /gene="tRNA-Leu(UUR)"
                /anticodon=(pos:678..680,aa:Leu,seq:taa)
                /product="transfer RNA-Leu(UUR)"

6 Limitations of this feature table design

During the development of the feature table design, numerous choices between 
simplicity and representational power had to be made. To create a design 
capable of representing the most common features of biological significance, 
a certain degree of complexity in the syntax was unavoidable. However, to 
limit that complexity, certain limitations of the design syntax have been 
accepted.

7 Appendices

7.1 Appendix I EMBL, GenBank and DDBJ entries

7.1.1 EMBL Format

ID   X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP.
XX   
AC   X64011; S78972;
XX
SV   X64011.1
XX
DT   28-APR-1992 (Rel. 31, Created)
DT   30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE   Listeria ivanovii sod gene for superoxide dismutase
XX
KW   sod gene; superoxide dismutase.
XX
OS   Listeria ivanovii
OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
OC   Bacillus/Staphylococcus group; Listeria.
XX
RN   [1]
RX   MEDLINE; 92140371.
RA   Haas A., Goebel W.;
RT   "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT   functional complementation in Escherichia coli and characterization of the
RT   gene product.";
RL   Mol. Gen. Genet. 231:313-322(1992).
XX
RN   [2]
RP   1-756
RA   Kreft J.;
RT   ;
RL   Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL   Hubland, 8700 Wuerzburg, FRG
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..756
FT                   /db_xref="taxon:1638"
FT                   /organism="Listeria ivanovii"
FT                   /strain="ATCC 19119"
FT                   /mol_type="genomic DNA"
FT   regulatory      95..100
FT                   /gene="sod"
FT                   /regulatory_class="ribosome_binding_site"
FT   regulatory      723..746
FT                   /gene="sod"
FT                   /regulatory_class="terminator"
FT   CDS             109..717
FT                   /transl_table=11
FT                   /gene="sod"
FT                   /EC_number="1.15.1.1"
FT                   /db_xref="GOA:P28763"
FT                   /db_xref="HSSP:P00448"
FT                   /db_xref="InterPro:IPR001189"
FT                   /db_xref="UniProtKB/Swiss-Prot:P28763"
FT                   /product="superoxide dismutase"
FT                   /protein_id="CAA45406.1"
FT                   /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT                   HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT                   IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT                   DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ   Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
     cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat   60
     gtaatttctt ..........                                               120
//
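
In the EMBL entry above, each feature occupies one FT line carrying the key 
and location, followed by FT lines that carry qualifiers or continue the 
previous qualifier value. The Python fragment below is a rough sketch of how 
those lines might be grouped; the function name parse_ft_lines and the record 
layout are assumptions made for illustration. It treats any continuation line 
beginning with '/' as a new qualifier, which is a simplification.

```python
def parse_ft_lines(lines):
    """Group EMBL 'FT' lines into records with key, location and qualifiers."""
    features = []
    for line in lines:
        if not line.startswith("FT"):
            continue  # skips FH, XX and every other line type
        body = line[5:].rstrip()
        if not body:
            continue
        if not body[0].isspace():
            # A feature key in the key column starts a new feature.
            parts = body.split(None, 1)
            location = parts[1] if len(parts) > 1 else ""
            features.append(
                {"key": parts[0], "location": location, "qualifiers": []}
            )
        elif features:
            text = body.strip()
            current = features[-1]
            if text.startswith("/"):
                name, _, value = text[1:].partition("=")
                current["qualifiers"].append([name, value])
            elif current["qualifiers"]:
                # Continuation of the previous qualifier value; sequence
                # data is rejoined without inserting a blank.
                name, value = current["qualifiers"][-1]
                separator = "" if name == "translation" else " "
                current["qualifiers"][-1][1] = value + separator + text
            else:
                # Continuation of a location that spans several lines.
                current["location"] += text
    return features


sample = [
    "FH   Key             Location/Qualifiers",
    "FT   regulatory      95..100",
    'FT                   /gene="sod"',
    'FT                   /regulatory_class="ribosome_binding_site"',
    "FT   CDS             109..717",
    "FT                   /transl_table=11",
    'FT                   /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG',
    "FT                   HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA",
    "FT                   IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL",
    'FT                   DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"',
]

for feature in parse_ft_lines(sample):
    names = [name for name, _ in feature["qualifiers"]]
    print(feature["key"], feature["location"], names)
```

Qualifier values are kept exactly as written, including the surrounding 
double quotes, so that free text and unquoted values such as /transl_table=11 
remain distinguishable.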

7.1.2 GenBank Format

LOCUS       LISOD                    756 bp    DNA     linear   BCT 30-JUN-1993
DEFINITION  Listeria ivanovii sod gene for superoxide dismutase.
ACCESSION   X64011 S78972
VERSION     X64011.1  GI:44010
KEYWORDS    sod gene; superoxide dismutase.
SOURCE      Listeria ivanovii
  ORGANISM  Listeria ivanovii
            Bacteria; Firmicutes; Bacillales; Listeriaceae; Listeria. 
REFERENCE   1  (bases 1 to 756)
  AUTHORS   Haas,A. and Goebel,W.
  TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii by
            functional complementation in Escherichia coli and characterization
            of the gene product
  JOURNAL   Mol. Gen. Genet. 231 (2), 313-322 (1992)
  MEDLINE   92140371
REFERENCE   2  (bases 1 to 756)
  AUTHORS   Kreft,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
            Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG
FEATURES             Location/Qualifiers
     source          1..756
                     /organism="Listeria ivanovii"
                     /strain="ATCC 19119"
                     /db_xref="taxon:1638"
                     /mol_type="genomic DNA"
     regulatory      95..100
                     /gene="sod"
                     /regulatory_class="ribosome_binding_site"
     gene            95..746
                     /gene="sod"
     CDS             109..717
                     /gene="sod"
                     /EC_number="1.15.1.1"
                     /codon_start=1
                     /transl_table=11
                     /product="superoxide dismutase" 
                     /db_xref="GI:44011"
                     /db_xref="GOA:P28763"
                     /db_xref="InterPro:IPR001189"
                     /db_xref="UniProtKB/Swiss-Prot:P28763"
                     /protein_id="CAA45406.1"
                     /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVS
                     GHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLK
                     AAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPV
                     LGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
     regulatory      723..746
                     /gene="sod"
                     /regulatory_class="terminator"
ORIGIN      
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaa
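
The CDS location and the /translation qualifier in the entry above can be 
cross-checked by simple arithmetic. The Python fragment below is a sketch 
under one stated assumption: that the final codon of the CDS is a stop codon 
which is not represented in /translation. The function name 
expected_residues is hypothetical.

```python
def expected_residues(start, end, codon_start=1, has_stop=True):
    """Estimate the translation length for a simple start..end CDS.

    Assumes a contiguous location on the forward strand; when has_stop is
    true, the last codon is taken to be a stop codon that is absent from
    the /translation value.
    """
    coding_bases = (end - start + 1) - (codon_start - 1)
    codons = coding_bases // 3
    return codons - 1 if has_stop else codons


# CDS 109..717 with /codon_start=1: 609 bases, 203 codons, 202 residues.
print(expected_residues(109, 717, codon_start=1))
```

The /translation value shown for the sod gene contains 202 residues, which 
agrees with this estimate.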
