The DDBJ/ENA/GenBank
Feature Table:
Definition
Version 10.5 November 2015
DNA Data Bank of Japan, Mishima, Japan.
EMBL-EBI, European Nucleotide Archive, Cambridge, UK.
GenBank, NCBI, Bethesda, MD, USA.
1 Introduction
2 Overview of the Feature Table format
2.1 Format Design
2.2 Key aspects of this feature table design
2.3 Feature Table Terminology
3 Feature table components and format
3.1 Naming conventions
3.2 Feature keys
3.2.1 Purpose
3.2.2 Format and conventions
3.2.3 Key groups and hierarchy
3.2.4 Feature key examples
3.3 Qualifiers
3.3.1 Purpose
3.3.2 Format and conventions
3.3.3 Qualifier values
3.3.4 Qualifier examples
3.4 Location
3.4.1 Purpose
3.4.2 Format and conventions
3.4.3 Location examples
4 Feature table Format
4.1 Format examples
4.2 Definition of line types
4.3 Data item positions
4.4 Use of blanks
5 Examples of sequence annotation
5.1 Eukaryotic gene
5.2 Bacterial operon
5.3 Artificial cloning vector (circular)
5.4 Plasmid
5.5 Repeat element
5.6 Immunoglobulin heavy chain
5.7 T-cell receptor
5.8 Transfer RNA
6 Limitations of this feature table design
7 Appendices
7.1 Appendix I EMBL, GenBank and DDBJ entries
7.1.1 EMBL Format
7.1.2 GenBank Format
7.1.3 DDBJ Format
7.2 Appendix II: Feature keys reference
7.3 Appendix III: Summary of qualifiers for feature keys
7.3.1 Qualifier List
7.4 Appendix IV: Controlled vocabularies
7.4.1 Nucleotide base codes (IUPAC)
7.4.2 Modified base abbreviations
7.4.3 Amino acid abbreviations
7.4.4 Modified and unusual Amino Acids
7.4.5 Genetic Code Tables
7.4.6 Country Names
7.4.7 Announces
1 Introduction
Nucleic acid sequences provide the fundamental starting point for describing
and understanding the structure, function, and development of genetically
diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data
banks have from their inception used tables of sites and features to describe
the roles and locations of higher order sequence domains and elements within
the genome of an organism.
In February, 1986, GenBank and EMBL began a collaborative effort (joined by
DDBJ in 1987) to devise a common feature table format and common standards for
annotation practice.
2 Overview of the Feature Table format
The overall goal of the feature table design is to provide an extensive
vocabulary for describing features in a flexible framework for manipulating
them. The Feature Table documentation represents the shared rules that allow
the three databases to exchange data on a daily basis.
The range of features to be represented is diverse, including regions which:
* perform a biological function,
* affect or are the result of the expression of a biological function,
* interact with other molecules,
* affect replication of a sequence,
* affect or are the result of recombination of different sequences,
* are a recognizable repeated unit,
* have secondary or tertiary structure,
* exhibit variation, or
* have been revised or corrected.
2.1 Format Design
The format design is based on a tabular approach and consists of the following
items:
Feature key - a single word or abbreviation indicating functional group
Location - instructions for finding the feature
Qualifiers - auxiliary information about a feature
2.2 Key aspects of this feature table design
* Feature keys allow specific annotation of important sequence features.
* Related features can be easily specified and retrieved.
Feature keys are arranged hierarchically, allowing complex and compound
features to be expressed. Both location operators and the feature keys show
feature relationships even when the features are not contiguous. The hierarchy
of feature keys allows broad categories of biological functionality, such as
rRNAs, to be easily retrieved.
* Generic feature keys provide a means for entering new or undefined features.
A number of "generic" or miscellaneous feature keys have been added to permit
annotation of features that cannot be adequately described by existing feature
keys. These generic feature keys will serve as an intermediate step in the
identification and addition of new feature keys. The syntax has been designed
to allow the addition of new feature keys as they are required.
* More complex locations (fuzzy and alternate ends, for example) can be specified.
Each end point of a feature may be specified as a single point, an alternate
set of possible end points, a base number beyond which the end point lies, or
a region which contains the end point.
* Features can be combined and manipulated in many different ways.
The location field can contain operators or functional descriptors specifying
what must be done to the sequence to reproduce the feature. For example, a
series of exons may be "join"ed into a full coding sequence.
* Standardized qualifiers provide precision and parsability of descriptive details.
A combination of standardized qualifiers and their controlled-vocabulary
values enables free-text descriptions to be avoided.
* The nature of supporting evidence for a feature can be explicitly indicated.
Features, such as open reading frames or sequences showing sequence similarity
to consensus sequences, for which there is no direct experimental evidence can
be annotated. Therefore, the feature table can incorporate contributions from
researchers doing computational analysis of the sequence databases. However,
all features that are supported by experimental data will be clearly marked as
such.
* The table syntax has been designed to be machine parsable.
A consistent syntax allows machine extraction and manipulation of sequences
coding for all features in the table.
2.3 Feature Table Terminology
The format and wording in the feature table use common biological research
terminology whenever possible. For example, an item in the feature table such as:
Key Location/Qualifiers
CDS 23..400
/product="alcohol dehydrogenase"
/gene="adhI"
might be read as:
The feature CDS is a coding sequence beginning at base 23 and ending at base
400, has a product called "alcohol dehydrogenase" and is coded for by a gene
called "adhI".
A more complex description:
Key Location/Qualifiers
CDS join(544..589,688..>1032)
/product="T-cell receptor beta-chain"
which might be read as:
This feature, which is a partial coding sequence, is formed by joining the
indicated elements to form one contiguous sequence encoding a product called
T-cell receptor beta-chain.
The following sections contain detailed explanations of the feature table
design showing conventions for each component of the feature table, examples
of how the format might be implemented, a description of the exact column
placement of all the data items and examples of complete sequence entries that
have been annotated using the new format. The last section of this document
describes known limitations of the current feature table design.
Appendix I gives an example database entry for the DDBJ, GenBank and EMBL
formats.
Appendices II and III provide reference manuals for the feature table keys and
qualifiers, respectively.
Appendix IV includes controlled vocabularies such as nucleotide base codes,
modified base abbreviations, genetic code tables etc.
This document defines the syntax and vocabulary of the feature table. The
syntax is sufficiently flexible to allow expression of a single biological
entity in numerous ways. In such cases, the annotation staffs at the databases
will propose conventions for standard means of denoting the entities.
This feature table format is shared by GenBank, EMBL and DDBJ. Comments,
corrections, and suggestions may be submitted to any of the database staffs.
New format specifications will be added as needed.
3 Feature table components and format
3.1 Naming conventions
Feature table components, including feature keys, qualifiers, accession
numbers, database name abbreviations, and location operators, are all named
following the same conventions. Component names may be no more than 20
characters long (Feature keys 15, Feature qualifiers 20) and must
contain at least one letter. The following characters are permitted to
occur in feature table component names:
* Uppercase letters (A-Z)
* Lowercase letters (a-z)
* Numbers (0-9)
* Underscore (_)
* Hyphen (-)
* Single quotation mark or apostrophe (')
* Asterisk (*)
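The naming rules above can be checked mechanically. The following is a minimal sketch in Python, not part of the definition itself; the function names is_valid_component_name and is_valid_feature_key are hypothetical and chosen only for illustration.

```python
import re

# Characters permitted in component names, taken from the list above.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_'*-]+")

def is_valid_component_name(name, max_len=20):
    """Return True if `name` follows the naming conventions of Section 3.1."""
    return (
        len(name) <= max_len                               # length limit
        and _ALLOWED_CHARS.fullmatch(name) is not None     # permitted characters only
        and re.search(r"[A-Za-z]", name) is not None       # at least one letter
    )

def is_valid_feature_key(name):
    """Feature keys are limited to 15 characters."""
    return is_valid_component_name(name, max_len=15)
```

For example, is_valid_feature_key("misc_RNA") is accepted, while a name containing a blank or consisting only of digits is rejected.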
3.2 Feature keys
3.2.1 Purpose
Feature keys indicate
(1) the biological nature of the annotated feature or
(2) information about changes to or other versions of the sequence.
The feature key permits a user to quickly find or retrieve similar features or
features with related functions.
3.2.2 Format and conventions
There is a defined list of allowable feature keys, which is shown in Appendix
II. Each feature must contain a feature key.
3.2.3 Key groups and hierarchy
The feature keys fall into families which are in some sense similar in
function and which are annotated in a similar manner. A functional family may
have a "generic" or miscellaneous key, which can be recognized by the 'misc_'
prefix, that can be used for instances not covered by the other defined keys of
that group.
The feature key groups are listed below with a short definition and an
annotation example:
1. Difference and change features
Indicate ways in which a sequence should be changed to produce a different
"version":
misc_difference location
/replace="change_location"
2. Transcript features
Indicate products made by a region:
misc_RNA location
3. Binding features
Indicate that a sequence or nucleotide is covalently, non-covalently, or
otherwise bound to something else:
misc_binding location
/bound_moiety="bound molecule"
4. Repeat features
Indicate repetitive sequence elements:
repeat_region location
5. Recombination features
Indicate regions that have been either inserted or deleted by recombination:
misc_recomb location
6. Structure features
Indicate sequence for which there is secondary or tertiary structural
information:
misc_structure location
3.2.4 Feature key examples
Key Description
CDS Protein-coding sequence
rep_origin Origin of replication
protein_bind Protein binding site on DNA
tRNA mature transfer RNA
See Appendix II for descriptions of all feature keys.
3.3 Qualifiers
3.3.1 Purpose
Qualifiers provide a general mechanism for supplying information about
features in addition to that conveyed by the key and location.
3.3.2 Format and conventions
Qualifiers take the form of a slash (/) followed by the qualifier name and, if
applicable, an equal sign (=) and a value. Each qualifier should have a single
value; if multiple values are necessary, these should be represented by
repeating the same qualifier, e.g.:
Key Location/Qualifiers
source 1..1000
/culture_collection="ATCC:11775"
/culture_collection="CECT:515"
If the location descriptor does not need a continuation line, the first
qualifier begins a new line in the feature location column. If the location
descriptor requires a continuation line, the first qualifier may follow
immediately after the location. Any necessary continuation lines begin in the
same column. See Section 4 for a complete description of data item positions.
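As an illustration of the slash/name/value form described above, the sketch below splits qualifiers into names and values and groups repeated qualifiers. It assumes each qualifier has already been assembled into a single string (continuation lines joined), and it leaves values exactly as written, including any quotation marks. The function names are hypothetical.

```python
from collections import defaultdict

def parse_qualifier(text):
    """Split '/name=value' or '/name' into (name, value); value is None if absent."""
    if not text.startswith("/"):
        raise ValueError("a qualifier must begin with a slash: %r" % text)
    name, sep, value = text[1:].partition("=")
    return name, (value if sep else None)

def collect_qualifiers(texts):
    """Group values by qualifier name, keeping repeated qualifiers in order."""
    grouped = defaultdict(list)
    for text in texts:
        name, value = parse_qualifier(text)
        grouped[name].append(value)
    return dict(grouped)
```

Applied to the two /culture_collection qualifiers in the example above, collect_qualifiers returns one entry whose list holds both values.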
3.3.3 Qualifier values
Since qualifiers convey many different types of information, there are several value formats:
1. Free text
2. Controlled vocabulary or enumerated values
3. Citation or reference numbers
4. Sequences
3.3.3.1 Free text
Most qualifier values will be a descriptive text phrase which must be enclosed
in double quotation marks. When the text occupies more than one line, a single
set of quotation marks is required at the beginning and at the end of the
text. The text itself may be composed of any printable characters (ASCII
values 32-126 decimal). If double quotation marks are used within a free text
string, each set (") must be 'escaped' by placing a second double quotation
mark immediately before it (""). For example:
/note="This is an example of ""escaped"" quotation marks"
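The quoting rule can be sketched as a pair of helper functions. This is an illustrative Python sketch only; quote_free_text and unquote_free_text are hypothetical names, and the check on character codes follows the ASCII range 32-126 given above.

```python
def quote_free_text(text):
    """Enclose text in double quotation marks, doubling any embedded ones."""
    if any(not 32 <= ord(c) <= 126 for c in text):
        raise ValueError("free text is limited to printable ASCII (32-126)")
    return '"' + text.replace('"', '""') + '"'

def unquote_free_text(value):
    """Reverse quote_free_text: strip the outer marks and undo the doubling."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("free text must be enclosed in double quotation marks")
    return value[1:-1].replace('""', '"')
```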
3.3.3.2 Controlled vocabulary or enumerated values
Some qualifiers require values from a controlled vocabulary and are entered
without quotation marks. For example, the '/direction' qualifier has only
three values: 'left', 'right' or 'both'. Qualifier value controlled
vocabularies, like feature table component names, must be treated as
completely case insensitive: they may be entered and displayed in any
combination of upper and lower case ('/direction=Left', '/direction=left' and
'/direction=LEFT' are all legal and all convey the same meaning). The database
staffs reserve the right to regularize the case of qualifier values. Qualifier
value controlled vocabularies will be maintained by the cooperating database
staffs. Examples of controlled vocabularies can be found in Appendix IV. The
database staff should be contacted for the current lists.
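Case-insensitive handling of a controlled vocabulary might look like the following sketch. The three /direction values are taken from the text above; choosing lower case as the regularized form is an assumption made for this example, and normalize_direction is a hypothetical name.

```python
# The three permitted /direction values named in the text above.
DIRECTION_VALUES = {"left", "right", "both"}

def normalize_direction(value):
    """Return a regularized (lower-case, by assumption) /direction value."""
    folded = value.lower()
    if folded not in DIRECTION_VALUES:
        raise ValueError("not a permitted /direction value: %r" % value)
    return folded
```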
3.3.3.3 Citation or reference numbers
The citation or published reference number (as enumerated in the entry
'REFERENCE' or 'RN' data item) should be enclosed in square brackets
(e.g., [3]) to distinguish it from other numbers.
3.3.3.4 Sequences
Literal sequences of nucleotide bases in location descriptors, e.g.,
join(12..45,"atgcatt",988..1050), have been illegal since the implementation of
version 2.1 of the Feature Table Definition Document (December 15, 1998).
3.3.4 Qualifier examples
Key Location/Qualifiers
source 1..1509
/organism="Mus musculus"
/strain="CD1"
/mol_type="genomic DNA"
regulatory <1..9
/gene="ubc42"
/regulatory_class="promoter"
mRNA join(10..567,789..1320)
/gene="ubc42"
CDS join(54..567,789..1254)
/gene="ubc42"
/product="ubiquitin conjugating enzyme"
/function="cell division control"
3.4 Location
3.4.1 Purpose
The location indicates the region of the presented sequence which corresponds
to a feature.
3.4.2 Format and conventions
The location contains at least one sequence location descriptor and may
contain one or more operators with one or more sequence location descriptors.
Base numbers refer to the numbering in the entry. This numbering designates
the first base (5' end) of the presented sequence as base 1.
Base locations beyond the range of the presented sequence may not be used in
location descriptors, the only exception being location in a remote entry (see
3.4.2.1, e).
Location operators and descriptors are discussed in more detail below.
3.4.2.1 Location descriptors
The location descriptor can be one of the following:
(a) a single base number
(b) a site between two indicated adjoining bases
(c) a single base chosen from within a specified range of bases (not allowed for new
entries)
(d) the base numbers delimiting a sequence span
(e) a remote entry identifier followed by a local location descriptor
(i.e., a-d)
A site between two adjoining nucleotides, such as an endonucleolytic cleavage
site, is indicated by listing the two points separated by a caret (^). The
permitted formats for this descriptor are n^n+1 (for example 55^56), or, for
circular molecules, n^1, where "n" is the full length of the molecule, i.e.,
1000^1 for a circular molecule with length 1000.
A single base chosen from a range of bases is indicated by the first base
number and the last base number of the range separated by a single period
(e.g., '12.21' indicates a single base taken from between the indicated
points). Since October 2006 the usage of this descriptor has been restricted:
it is illegal to use "a single base from a range" (c) either on its own or
in combination with the "sequence span" (d) descriptor for newly created entries.
Existing entries that contain such descriptors will be retrofitted.
Sequence spans are indicated by the starting base number and the ending base
number separated by two periods (e.g., '34..456'). The '<' and '>' symbols may
be used with the starting and ending base numbers to indicate that an end
point is beyond the specified base number. The starting and ending base
positions can be represented as distinct base numbers ('34..456') or a site
between two indicated adjoining bases.
A location in a remote entry (not the entry to which the feature table
belongs) can be specified by giving the accession-number and sequence version
of the remote entry, followed by a colon ":", followed by a location
descriptor which applies to that entry's sequence (e.g., J12345.1:1..15; see
also the examples below).
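The descriptor forms (a) to (e) can be told apart with a few regular expressions. The sketch below is illustrative only and makes several assumptions: the kind labels and the function name classify_descriptor are hypothetical, the pattern used for remote accessions is a guess at the general accession.version shape rather than a definition of it, and spans whose end points are themselves sites between bases are not handled.

```python
import re

_PATTERNS = [
    ("span", re.compile(r"<?\d+\.\.>?\d+")),        # (d) e.g. 34..456, <1..888
    ("between", re.compile(r"(\d+)\^(\d+)")),       # (b) e.g. 55^56
    ("one_of_range", re.compile(r"\d+\.\d+")),      # (c) e.g. 12.21 (legacy only)
    ("single", re.compile(r"\d+")),                 # (a) e.g. 467
]
# Assumed shape of "accession.version:" in front of a remote descriptor.
_REMOTE = re.compile(r"([A-Za-z0-9_]+\.\d+):(.+)")

def classify_descriptor(text, length=None):
    """Return (kind, remote_entry_or_None) for one location descriptor.

    `length` is the length of the local sequence; it is only needed to accept
    the n^1 form on circular molecules.
    """
    remote = None
    m = _REMOTE.fullmatch(text)
    if m:
        remote, text, length = m.group(1), m.group(2), None
    for kind, pattern in _PATTERNS:
        m = pattern.fullmatch(text)
        if m is None:
            continue
        if kind == "between":
            a, b = int(m.group(1)), int(m.group(2))
            wraps = length is not None and a == length and b == 1
            if b != a + 1 and not wraps:
                raise ValueError("bases in %r are not adjoining" % text)
        return kind, remote
    raise ValueError("unrecognised location descriptor: %r" % text)
```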
3.4.2.2 Operators
The location operator is a prefix that specifies what must be done to the
indicated sequence to find or construct the location corresponding to the
feature. A list of operators is given below with their definitions and most
common format.
complement(location)
Find the complement of the presented sequence in the span specified by
"location" (i.e., read the complement of the presented strand in its 5'-to-3'
direction).
join(location,location, ... location)
The indicated elements should be joined (placed end-to-end) to form one
contiguous sequence.
order(location,location, ... location)
The elements can be found in the specified order (5' to 3' direction), but
nothing is implied about the reasonableness of joining them.
Note: the location operator "complement" can be used in combination with either
"join" or "order" within the same location; combinations of "join" and "order"
within the same location (nested operators) are illegal.
3.4.3 Location examples
The following is a list of common location descriptors with their meanings:
Location                  Description
467                       Points to a single base in the presented sequence
340..565                  Points to a continuous range of bases bounded by and
                          including the starting and ending bases
<345..500                 Indicates that the exact lower boundary point of a
                          feature is unknown. The location begins at some base
                          previous to the first base specified (which need not
                          be contained in the presented sequence) and continues
                          to and includes the ending base
<1..888                   The feature starts before the first sequenced base
                          and continues to and includes base 888
1..>888                   The feature starts at the first sequenced base and
                          continues beyond base 888
102.110                   Indicates that the exact location is unknown but that
                          it is one of the bases between bases 102 and 110,
                          inclusive
123^124                   Points to a site between bases 123 and 124
join(12..78,134..202)     Regions 12 to 78 and 134 to 202 should be joined to
                          form one contiguous sequence
complement(34..126)       Start at the base complementary to 126 and finish at
                          the base complementary to base 34 (the feature is on
                          the strand complementary to the presented strand)
complement(join(2691..4571,4918..5163))
                          Joins regions 2691 to 4571 and 4918 to 5163, then
                          complements the joined segments (the feature is on
                          the strand complementary to the presented strand)
join(complement(4918..5163),complement(2691..4571))
                          Complements regions 4918 to 5163 and 2691 to 4571,
                          then joins the complemented segments (the feature is
                          on the strand complementary to the presented strand)
J00194.1:100..202         Points to bases 100 to 202, inclusive, in the entry
                          (in this database) with primary accession number
                          'J00194'
join(1..100,J00194.1:100..202)
                          Joins region 1..100 of the existing entry with the
                          region 100..202 of remote entry J00194
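To make the effect of the operators concrete, the sketch below extracts the bases selected by a location from a sequence held as a Python string. It is a simplified illustration under stated assumptions: only single bases, spans, "join" and "complement" are handled; the '<' and '>' symbols are ignored; remote entries, "order", and the '.' and '^' descriptors are not supported; and only the unambiguous bases a, c, g and t are complemented. The name extract is hypothetical.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def _split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts

def extract(location, sequence):
    """Return the bases selected by a location made of spans, join and complement."""
    if location.startswith("complement(") and location.endswith(")"):
        inner = extract(location[len("complement("):-1], sequence)
        return inner.translate(_COMPLEMENT)[::-1]        # reverse complement
    if location.startswith("join(") and location.endswith(")"):
        parts = _split_top_level(location[len("join("):-1])
        return "".join(extract(p, sequence) for p in parts)
    start, _, end = location.partition("..")
    end = end or start                                   # single base, e.g. "467"
    first, last = int(start.lstrip("<>")), int(end.lstrip("<>"))
    return sequence[first - 1:last]                      # base 1 is index 0; end inclusive
```

With this sketch, complement(join(a,b)) and join(complement(b),complement(a)) select the same bases, which is the equivalence shown by the two complement/join examples in the list above.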
4 Feature table Format
The examples below show the preferred sequence annotations for a number of
commonly occurring sequence types. These examples may not be appropriate in
all cases but should be used as a guide whenever possible. This section
describes the columnar format used to write this feature table in "flat-file"
form for distributions of the database.
4.1 Format examples
Feature table format example (EMBL):
FT   source          1..1859
FT                   /db_xref="taxon:3899"
FT                   /organism="Trifolium repens"
FT                   /tissue_type="leaves"
FT                   /clone_lib="lambda gt10"
FT                   /clone="TRE361"
FT                   /mol_type="genomic DNA"
FT   CDS             14..1495
FT                   /db_xref="MENDEL:11000"
FT                   /db_xref="UniProtKB/Swiss-Prot:P26204"
FT                   /note="non-cyanogenic"
FT                   /EC_number="3.2.1.21"
FT                   /product="beta-glucosidase"
FT                   /protein_id="CAA40058.1"
FT                   /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSR.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79
Feature table format example (GenBank):
     source          1..8959
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /mol_type="genomic DNA"
     gene            212..8668
                     /gene="NF1"
     CDS             212..8668
                     /gene="NF1"
                     /note="putative"
                     /codon_start=1
                     /product="GAP-related protein"
                     /protein_id="AAA59924.1"
                     /translation="MAAHRPVEWVQAVVSRFDEQLPIKTGQQNTHTKVSTE.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79
Feature table format example (DDBJ):
     source          1..2136
                     /clone="pK28"
                     /organism="Rattus norvegicus"
                     /strain="Sprague-Dawley"
                     /tissue_type="kidney"
                     /mol_type="genomic DNA"
     mRNA            19..2128
     CDS             31..1212
                     /codon_start=1
                     /function="Dual specificity protein tyrosine/threonine
                     kinase"
                     /product="MAP kinase kinase"
                     /protein_id="BAA02603.1"
                     /translation="MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKL.......
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79
4.2 Definition of line types
The feature table consists of a header line, which contains the column titles
for the table, and the individual feature entries. Each feature entry is
composed of a feature descriptor line and qualifier and continuation lines,
if needed. The feature descriptor line contains the feature's name, key, and
location. If the location cannot be contained on the first line of the feature
descriptor, it is continued on a continuation line immediately following the
descriptor line. If the feature requires further attributes, feature qualifier
lines immediately follow the corresponding feature descriptor line (or its
continuation). Qualifier information that cannot be contained on one line
continues on the following continuation lines as necessary.
Thus, there are 4 types of feature table lines:
Line type              Content                   #/entry      #/feature
---------              -------                   -------      ---------
Header                 Column titles             1*           N/A
Feature descriptor     Key and location          1 to many*   1
Feature qualifiers     Qualifiers and values     N/A          0 to many
Continuation lines     Feature descriptor or     0 to many    0 to many
                       qualifier continuation
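The line types can be recognized and regrouped into features along the following lines. This is a sketch for GenBank/DDBJ-style lines (no line type abbreviation in columns 1 and 2) and it omits the header line. Two simplifications are assumptions of the example rather than rules of the format: a line whose data begins with a slash is always treated as a new qualifier, and qualifier continuation lines are rejoined with a single blank, which suits free text but not wrapped /translation values. The function names are hypothetical.

```python
def line_type(line):
    """Classify one feature table line as descriptor, qualifier or continuation."""
    if line[5:20].strip():               # a feature key in columns 6-20
        return "descriptor"
    if line[21:].startswith("/"):        # data in column 22 starts a qualifier
        return "qualifier"
    return "continuation"

def group_features(lines):
    """Return a list of [key, location, [qualifier, ...]] entries."""
    features = []
    target = None                        # what a continuation line extends
    for line in lines:
        kind, body = line_type(line), line[21:].rstrip()
        if kind != "descriptor" and not features:
            raise ValueError("line appears before any feature descriptor")
        if kind == "descriptor":
            features.append([line[5:20].strip(), body, []])
            target = "location"
        elif kind == "qualifier":
            features[-1][2].append(body)
            target = "qualifier"
        elif target == "location":
            features[-1][1] += body                  # location continues unbroken
        else:
            features[-1][2][-1] += " " + body        # rejoin wrapped free text
    return features
```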
4.3 Data item positions
The position of the data items within the feature descriptor line is as follows:
column position     data item
---------------     ---------
1-5                 blank
6-20                feature key
21                  blank
22-80               location
Data on the qualifier and continuation lines begins in column position 22 (the
first 21 columns contain blanks). The EMBL format for all lines differs from
the GenBank / DDBJ formats in that it includes a line type abbreviation in
columns 1 and 2.
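The column positions can be illustrated by a small formatter. The sketch below writes a feature descriptor line and a qualifier line at the positions given above, with an optional "FT" line type abbreviation for the EMBL style. The function names and the embl parameter are hypothetical, and wrapping of long locations onto continuation lines is left out.

```python
def format_descriptor(key, location, embl=False):
    """Lay out a feature descriptor line: key in columns 6-20, location from 22."""
    if not 0 < len(key) <= 15:
        raise ValueError("feature key must fit in columns 6-20")
    if len(location) > 59:
        raise ValueError("location exceeds columns 22-80; a continuation line is needed")
    prefix = "FT   " if embl else " " * 5        # columns 1-5
    return prefix + key.ljust(15) + " " + location

def format_qualifier(text, embl=False):
    """Lay out a qualifier or continuation line: data begins in column 22."""
    prefix = "FT" if embl else "  "              # columns 1-2
    return prefix + " " * 19 + text
```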
4.4 Use of blanks
Blanks (spaces) may, in general, be used within the feature location and
qualifier values to make the construction more readable. The following rules
should be observed:
* Names of feature table components may not contain blanks (see Section 3.1)
* Operator names may not be separated from the following open parenthesis (the
beginning of the operand list) by blanks.
* Qualifiers may not be separated from the preceding slash or the following
equals sign (if there is one) by blanks.
5 Examples of sequence annotation
The examples below show the preferred sequence annotations for a number of
commonly occurring sequence types. These examples may not be appropriate in
all cases but should be used as a guide whenever possible.
5.1 Eukaryotic gene
source 1..1509
/organism="Mus musculus"
/strain="CD1"
/mol_type="genomic DNA"
regulatory <1..9
/gene="ubc42"
/regulatory_class="promoter"
mRNA join(10..567,789..1320)
/gene="ubc42"
CDS join(54..567,789..1254)
/gene="ubc42"
/product="ubiquitin conjugating enzyme"
/function="cell division control"
/translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY
QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS
AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"
exon 10..567
/gene="ubc42"
/number=1
intron 568..788
/gene="ubc42"
/number=1
exon 789..1320
/gene="ubc42"
/number=2
regulatory 1310..1317
/regulatory_class="polyA_signal_sequence"
/gene="ubc42"
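The exon and intron features in this example are mutually consistent: the intron fills the gap between the two exons, and the exons together make up the mRNA join. A small sketch of that relationship follows, with spans held as (start, end) pairs of base numbers; the function names are hypothetical.

```python
def introns_between(exons):
    """Given exon (start, end) pairs, return the spans lying between them."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:                    # a gap separates the exons
            introns.append((prev_end + 1, next_start - 1))
    return introns

def joined_length(spans):
    """Number of bases in the sequence formed by joining the spans."""
    return sum(end - start + 1 for start, end in spans)
```

For the exons 10..567 and 789..1320 above, introns_between gives the single span 568..788, matching the annotated intron.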
5.2 Bacterial operon
source 1..9430
/organism="Lactococcus sp."
/strain="MG1234"
/mol_type="genomic DNA"
operon 160..6865
/operon="gal"
regulatory 160..165
/operon="gal"
/regulatory_class="minus_35_signal"
regulatory 179..184
/operon="gal"
/regulatory_class="minus_10_signal"
CDS 405..1934
/operon="gal"
/gene="galA"
/product="galactose permease"
/function="galactose transporter"
CDS 2003..3001
/operon="gal"
/gene="galM"
/product="aldose 1-epimerase"
/EC_number="5.1.3.3"
/function="mutarotase"
CDS 3235..4537
/operon="gal"
/gene="galK"
/product="galactokinase"
/EC_number="2.7.1.6"
mRNA 189..6865
/operon="gal"
5.3 Artificial cloning vector (circular)
source 1..5300
/organism="Cloning vector pABC"
/lab_host="Escherichia coli"
/mol_type="other DNA"
/focus
source 1..5138
/organism="Escherichia coli"
/mol_type="other DNA"
/strain="K12"
source 5139..5247
/organism="Aequorea victoria"
/mol_type="other DNA"
/dev_stage="adult"
source 5248..5300
/organism="Escherichia coli"
/mol_type="other DNA"
/strain="K12"
CDS join(complement(1..799),complement(5080..5120))
/gene="mob1"
/product="mobilization protein 1"
CDS complement(1697..2512)
/gene="Km"
/product="kanamycin resistance protein"
CDS 3037..3711
/gene="rep1"
/product="replication protein 1"
CDS complement(4170..4829)
/gene="Cm"
/product="chloramphenicol resistance protein"
CDS 5139..5247
/gene="GFP"
/product="green fluorescent protein"
5.4 Plasmid
source 1..2245
/organism="Escherichia coli"
/plasmid="Plasmid XYZ"
/strain="K12"
/mol_type="genomic DNA"
rep_origin 6
/direction=LEFT
/note="ori"
CDS join(complement(567..795),complement(21..349))
/gene="trbC"
/product="transfer protein C"
CDS 803..1344
/gene="traN"
/product="transfer protein N"
CDS 1559..1985
/gene="incA"
/product="incompatibility protein A"
CDS join(2004..2195,3..20)
/gene="finP"
/product="fertility inhibition protein P"
5.5 Repeat element
source 1..1011
/organism="Homo sapiens"
/clone="pha281u/1DO"
/mol_type="genomic DNA"
repeat_region 80..401
/rpt_type=DISPERSED
/rpt_family="Alu-J"
5.6 Immunoglobulin heavy chain
source 1..321
/organism="Mus musculus"
/strain="BALB/c"
/cell_line="hybridoma 1A4"
/rearranged
/mol_type="mRNA"
CDS <1..>321
/codon_start=1
/gene="VFM1-DFL16.1-JH4"
/product="immunoglobulin heavy chain"
V_region 1..277
/gene="VFM1"
/product="immunoglobulin heavy chain variable region"
5.7 T-cell receptor
source 1..402
/organism="Homo sapiens"
/sex="male"
/cell_type="CD4+ T-lymphocyte"
/rearranged
/clone="TCR1A.12"
/mol_type="mRNA"
sig_peptide 1..54
/gene="TCR1A"
CDS 1..402
/gene="TCR1A"
/product="T-cell receptor alpha chain"
mat_peptide 55..399
/gene="TCR1A"
/product="T-cell receptor alpha chain"
V_region 55..327
/gene="TCR1A"
J_segment 328..393
/gene="TCR1A"
C_region 394..399
/gene="TCR1A"
5.8 Transfer RNA
source 1..2345
/organism="Yersinia sp."
/strain="IP134"
/mol_type="genomic DNA"
regulatory 644..650
/gene="tRNA-Leu(UUR)"
/regulatory_class="minus_35_signal"
tRNA 655..730
/gene="tRNA-Leu(UUR)"
/anticodon=(pos:678..680,aa:Leu,seq:taa)
/product="transfer RNA-Leu(UUR)"
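The /anticodon value in this example is a structured value with pos, aa and seq parts. The sketch below reads that form and checks it against the tRNA span. It assumes the position is a simple span as written here (other location forms are not handled), and the function names are hypothetical.

```python
import re

# Matches the form used in the example: (pos:678..680,aa:Leu,seq:taa)
_ANTICODON = re.compile(r"\(pos:(\d+)\.\.(\d+),aa:([A-Za-z]+),seq:([a-z]+)\)")

def parse_anticodon(value):
    """Split an /anticodon value into its position, amino acid and sequence."""
    m = _ANTICODON.fullmatch(value)
    if m is None:
        raise ValueError("unsupported /anticodon value: %r" % value)
    return {"start": int(m.group(1)), "end": int(m.group(2)),
            "aa": m.group(3), "seq": m.group(4)}

def anticodon_fits(anticodon, trna_start, trna_end):
    """True if the anticodon lies within the tRNA span and pos agrees with seq."""
    start, end = anticodon["start"], anticodon["end"]
    inside = trna_start <= start <= end <= trna_end
    return inside and end - start + 1 == len(anticodon["seq"])
```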
6 Limitations of this feature table design
During the development of the feature table design numerous choices between
simplicity and representational power had to be made. In order to create a
design which was capable of representing the most common features of
biological significance, a certain degree of complexity in the syntax was
guaranteed. However, to limit that level of complexity, certain limitations of
the design syntax have been accepted.
7 Appendices
7.1 Appendix I EMBL, GenBank and DDBJ entries
7.1.1 EMBL Format
ID X64011; SV 1; linear; genomic DNA; STD; PRO; 756 BP.
XX
AC X64011; S78972;
XX
SV X64011.1
XX
DT 28-APR-1992 (Rel. 31, Created)
DT 30-JUN-1993 (Rel. 36, Last updated, Version 6)
XX
DE Listeria ivanovii sod gene for superoxide dismutase
XX
KW sod gene; superoxide dismutase.
XX
OS Listeria ivanovii
OC Bacteria; Firmicutes; Bacillus/Clostridium group;
OC Bacillus/Staphylococcus group; Listeria.
XX
RN [1]
RX MEDLINE; 92140371.
RA Haas A., Goebel W.;
RT "Cloning of a superoxide dismutase gene from Listeria ivanovii by
RT functional complementation in Escherichia coli and characterization of the
RT gene product.";
RL Mol. Gen. Genet. 231:313-322(1992).
XX
RN [2]
RP 1-756
RA Kreft J.;
RT ;
RL Submitted (21-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL J. Kreft, Institut f. Mikrobiologie, Universitaet Wuerzburg, Biozentrum Am
RL Hubland, 8700 Wuerzburg, FRG
XX
FH Key Location/Qualifiers
FH
FT source 1..756
FT /db_xref="taxon:1638"
FT /organism="Listeria ivanovii"
FT /strain="ATCC 19119"
FT /mol_type="genomic DNA"
FT regulatory 95..100
FT /gene="sod"
FT /regulatory_class="ribosome_binding_site"
FT regulatory 723..746
FT /gene="sod"
FT /regulatory_class="terminator"
FT CDS 109..717
FT /transl_table=11
FT /gene="sod"
FT /EC_number="1.15.1.1"
FT /db_xref="GOA:P28763"
FT /db_xref="HSSP:P00448"
FT /db_xref="InterPro:IPR001189"
FT /db_xref="UniProtKB/Swiss-Prot:P28763"
FT /product="superoxide dismutase"
FT /protein_id="CAA45406.1"
FT /translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVSG
FT HAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLKAA
FT IESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPVLGL
FT DVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
XX
SQ Sequence 756 BP; 247 A; 136 C; 151 G; 222 T; 0 other;
cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat 60
gtaatttctt .......... 120
//
7.1.2 GenBank Format
LOCUS LISOD 756 bp DNA linear BCT 30-JUN-1993
DEFINITION Listeria ivanovii sod gene for superoxide dismutase.
ACCESSION X64011 S78972
VERSION X64011.1 GI:44010
KEYWORDS sod gene; superoxide dismutase.
SOURCE Listeria ivanovii
ORGANISM Listeria ivanovii
Bacteria; Firmicutes; Bacillales; Listeriaceae; Listeria.
REFERENCE 1 (bases 1 to 756)
AUTHORS Haas,A. and Goebel,W.
TITLE Cloning of a superoxide dismutase gene from Listeria ivanovii by
functional complementation in Escherichia coli and characterization
of the gene product
JOURNAL Mol. Gen. Genet. 231 (2), 313-322 (1992)
MEDLINE 92140371
REFERENCE 2 (bases 1 to 756)
AUTHORS Kreft,J.
TITLE Direct Submission
JOURNAL Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg, FRG
FEATURES Location/Qualifiers
source 1..756
/organism="Listeria ivanovii"
/strain="ATCC 19119"
/db_xref="taxon:1638"
/mol_type="genomic DNA"
regulatory 95..100
/gene="sod"
/regulatory_class="ribosome_binding_site"
gene 95..746
/gene="sod"
CDS 109..717
/gene="sod"
/EC_number="1.15.1.1"
/codon_start=1
/transl_table=11
/product="superoxide dismutase"
/db_xref="GI:44011"
/db_xref="GOA:P28763"
/db_xref="InterPro:IPR001189"
/db_xref="UniProtKB/Swiss-Prot:P28763"
/protein_id="CAA45406.1"
/translation="MTYELPKLPYTYDALEPNFDKETMEIHYTKHHNIYVTKLNEAVS
GHAELASKPGEELVANLDSVPEEIRGAVRNHGGGHANHTLFWSSLSPNGGGAPTGNLK
AAIESEFGTFDEFKEKFNAAAAARFGSGWAWLVVNNGKLEIVSTANQDSPLSEGKTPV
LGLDVWEHAYYLKFQNRRPEYIDTFWNVINWDERNKRFDAAK"
regulatory 723..746
/gene="sod"
/regulatory_class="terminator"
ORIGIN
1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
61 gtaa