ATIDB : The Arabidopsis thaliana Integrated Database


1. What is Arabidopsis thaliana?

2. What is ATIDB?

3. What does the ATIDB system integrate?

4. How do I use ATIDB?

5. How else can I access ATIDB?

6. How can I add my own data?

7. I would like a specific data set integrated within ATIDB, will you add it?

8. How is the insertion point determined?



1. What is Arabidopsis thaliana?

See the Wikipedia article on Arabidopsis thaliana, which describes the role of this model organism in plant research. For a recent review of the Arabidopsis genome, see this Genome Research article.

2. What is ATIDB?

ATIDB is the "Arabidopsis thaliana Integrated Database". It is the result of the continued development of the "Arabidopsis thaliana Insertion Database" whose focus was on merging the TIGR version 4 Arabidopsis genome with insertional mutagens (see Nucleic Acids Research publication). Since 2002, the code base and data have been completely re-developed. The name change also reflects the wider scope of the system.

3. What does the ATIDB system integrate?

The ATIDB system integrates,

4. How do I use ATIDB?

ATIDB is most commonly used through the web interface (atidb.org), although there are several other ways to access and query the system (see later questions). The web interface is built around the powerful Generic Genome Browser (GBrowse) software developed by Lincoln Stein at Cold Spring Harbor Laboratory. This allows free-text searches and dynamically generated views of the genome. The best way to get going is to try some of the examples:



There are many other examples at the top of the ATIDB genome browser page. Apart from having search, scroll and zoom functionality, the browser is highly configurable. Check out the "Set Track Options" and the "Dumps, Searches and Other Operations" buttons.

5. How else can I access ATIDB?

to be added

6. How can I add my own data?

to be added

7. I would like a specific data set integrated within ATIDB, will you add it?

to be added

8. How is the insertion point determined?

ATIDB automates the process of locating insertional mutagens (T-DNAs and transposons) in the complete Arabidopsis thaliana (At) genome sequence. This is achieved by taking each flanking sequence (the "Query") and searching for sequence similarity in each of the Arabidopsis chromosomes (the "Sbjct") using the WU-BLASTN program. Below is an example of a local region of similarity, known as a "High Scoring Segment Pair" or "HSP" for short. A WU-BLASTN report often contains many HSPs per sequence, including hits against more than one chromosome.
Query:       29 CAGCATATAACTCCGGTCTTTAAAA 5
                ||  | ||| ||| |||||||||||
Sbjct: 20080877 CATGAAATAGCTCGGGTCTTTAAAA 20080901
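
To make the structure of an HSP concrete, here is a minimal Python sketch that reads the three text lines of an HSP like the one above into a record and counts identities. It assumes the simplified three-line layout shown on this page (a "Query:" line, a match line and a "Sbjct:" line); real WU-BLASTN reports also carry score, P value, strand and gap information that this sketch ignores. The `HSP` and `parse_hsp` names are hypothetical and not part of ATIDB.

```python
import re
from dataclasses import dataclass

# "Query:  29 CAGC...AAAA 5" or "Sbjct: 20080877 CATG...AAAA 20080901"
ALIGN_LINE = re.compile(r"^(Query|Sbjct):\s+(\d+)\s+(\S+)\s+(\d+)$")


@dataclass
class HSP:
    q_start: int  # Query coordinate of the leftmost aligned base
    q_end: int    # Query coordinate of the rightmost aligned base
    s_start: int  # chromosome (Sbjct) coordinate of the leftmost aligned base
    s_end: int    # chromosome (Sbjct) coordinate of the rightmost aligned base
    q_seq: str
    s_seq: str

    def identities(self) -> int:
        # Aligned columns where Query and Sbjct carry the same base.
        return sum(a == b for a, b in zip(self.q_seq.upper(), self.s_seq.upper()))

    def percent_identity(self) -> float:
        return 100.0 * self.identities() / len(self.q_seq)


def parse_hsp(block: str) -> HSP:
    """Parse a simplified three-line HSP (Query line, match line, Sbjct line)."""
    fields = {}
    for line in block.splitlines():
        m = ALIGN_LINE.match(line.strip())
        if m:  # the match line of '|' characters never matches and is skipped
            fields[m.group(1)] = (int(m.group(2)), m.group(3), int(m.group(4)))
    q_start, q_seq, q_end = fields["Query"]
    s_start, s_seq, s_end = fields["Sbjct"]
    return HSP(q_start, q_end, s_start, s_end, q_seq, s_seq)


example = """\
Query:       29 CAGCATATAACTCCGGTCTTTAAAA 5
                ||  | ||| ||| |||||||||||
Sbjct: 20080877 CATGAAATAGCTCGGGTCTTTAAAA 20080901"""

hsp = parse_hsp(example)
print(hsp.identities(), len(hsp.q_seq), hsp.percent_identity())  # 20 25 80.0
```

Here the Query runs from base 29 down to base 5, so this HSP does not cover the first base of the Query.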

ATIDB automates the interpretation of the WU-BLASTN reports to define the point of insertion of the insertional mutagen.
In the majority of cases (around 58%) the interpretation of the WU-BLASTN report is simple: the highest scoring HSP aligns a region of the Query that includes its first base pair with a region of an At chromosome.

Query:       25 CATGAAATAGCTCGGGTCTTTAAAA 1
                |||||||||||||||||||||||||
Sbjct: 20080877 CATGAAATAGCTCGGGTCTTTAAAA 20080901
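
In this simple case the base of the chromosome aligned to Query base 1 is taken as the insertion point, which comes down to coordinate arithmetic on the HSP. The minimal sketch below assumes an ungapped HSP (every column pairs one Query base with one chromosome base), which holds for the example above; gapped alignments would also need the gap positions. `subject_coord` is a hypothetical helper name.

```python
from typing import Optional


def subject_coord(q_start: int, q_end: int, s_start: int, s_end: int,
                  q_pos: int) -> Optional[int]:
    """Chromosome coordinate aligned to Query position q_pos in an ungapped HSP.

    Coordinates are given as printed: start = leftmost column, end = rightmost.
    Either sequence may be numbered in descending order.
    Returns None if q_pos lies outside the HSP.
    """
    if not min(q_start, q_end) <= q_pos <= max(q_start, q_end):
        return None
    column = abs(q_pos - q_start)            # columns from the left edge
    step = 1 if s_end >= s_start else -1     # direction of Sbjct numbering
    return s_start + step * column


# The simple case above: Query 25..1 aligned to 20080877..20080901.
print(subject_coord(25, 1, 20080877, 20080901, 1))  # 20080901
# The earlier HSP (Query 29..5) does not include Query base 1.
print(subject_coord(29, 5, 20080877, 20080901, 1))  # None
```

Under this rule the insertion point for the simple example is base 20080901, the chromosome base opposite Query base 1 at the right-hand end of the alignment.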

However, in other cases the interpretation is not so simple. The flanking sequences of the insertional mutagens are determined by single-pass sequencing, so they are not highly accurate and contain insertions, deletions and mis-called bases relative to the "true" sequence. In comparison, the Arabidopsis genome sequence was determined from multiple overlapping reads and is therefore of much higher quality. Moreover, the insertion site sequences are determined from templates produced by protocols such as Inverse PCR and Adapter Ligation PCR, and these techniques can introduce further errors, such as amplification of non-adjacent sequences. Finally, the flanking sequence is primed from an oligonucleotide complementary to the insertional mutagen, so it initially contains some T-DNA or transposon sequence that must be clipped to leave only At sequence; this clipping is not perfect and is another source of error. It is therefore unsurprising that in about 42% of cases a more sophisticated approach is required to match the insertion site sequence precisely to the genome.

The algorithm that we currently use is as follows. In the simple case outlined above, we take the base corresponding to the first base of the Query sequence as the insertion point. Where the first base is not matched in the highest scoring HSP, we take the highest scoring HSP as the starting point and look for other HSPs that further refine the location. For example, this is a high scoring HSP, which gave a putative insertion point of 28511905:

Query:          49 TATATTTGGAAGTAATATCTAATTTTCGCTGTTATCACAACCAAATAATGTATACATTCA 108
                   ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    28511905 TATATTTGGAAGTAATATCTAATTTTCGCTGTTATCACAACCAAATAATGTATACATTCA 28511964


Query:         109 AATACTATTTTCATACTCTCCTAAGTTAAGTTTTTTTCTTGTTCAACAAACTATTTTTTT 168
                   |||||||||||||||||||||||||||||||||||||||  |  |  ||    |||||| 
Sbjct:    28511965 AATACTATTTTCATACTCTCCTAAGTTAAGTTTTTTTCTGTTCAACAAACTATTTTTTTA 28512024 


Query:         169 AGCTTGTTCTTGTAATCTTTTTTTTT 194
                      |  |  |   |   ||||||||| 
Sbjct:    28512025 GCTTGTTCTTGTAATCTTTTTTTTTT 28512050

Another HSP from this Query sequence was then used to further refine the insertion point to base 28511858:

Query:        1 GTATAGCTAATGTAACTCGTTTGATGAAAATAAAGTCGGNGAAACTAT 48  
                ||||||||||||||||||||||||||||||  ||    | ||||||||  
Sbjct: 28511858 GTATAGCTAATGTAACTCGTTTGATGAAAAATAAAGTCGGGAAACTAT 28511905
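
The refinement step can be sketched as follows. This is a hypothetical reconstruction, not ATIDB's code: the page does not give the exact rules, so the adjacency thresholds (`max_query_gap`, `max_genome_gap`), the choice of the candidate that reaches closest to Query base 1, and the ungapped coordinate mapping are all assumptions. The chromosome name `chrN` and the HSP scores in the usage example are placeholders; the Query and chromosome coordinates are those of the two HSPs above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HSP:
    chrom: str
    score: float
    q_start: int  # coordinates as printed: start = leftmost column
    q_end: int
    s_start: int
    s_end: int

    @property
    def q_lo(self) -> int:
        return min(self.q_start, self.q_end)

    @property
    def q_hi(self) -> int:
        return max(self.q_start, self.q_end)

    @property
    def strand(self) -> int:
        # +1 if Query and Sbjct are numbered in the same direction, else -1.
        return 1 if (self.q_end >= self.q_start) == (self.s_end >= self.s_start) else -1

    def s_at(self, q_pos: int) -> int:
        # Ungapped approximation of the Sbjct base aligned to Query position q_pos.
        step = 1 if self.s_end >= self.s_start else -1
        return self.s_start + step * abs(q_pos - self.q_start)


def insertion_point(hsps: List[HSP], max_query_gap: int = 10,
                    max_genome_gap: int = 50) -> Tuple[str, int, int, int]:
    """Return (chromosome, insertion point, first Query base matched, HSPs used)."""
    current = max(hsps, key=lambda h: h.score)  # start from the highest scoring HSP
    used = 1
    while current.q_lo > 1:
        # HSPs on the same chromosome and strand that cover earlier Query bases and
        # end close to where the current HSP begins, in both Query and genome.
        candidates = [
            h for h in hsps
            if h.chrom == current.chrom
            and h.strand == current.strand
            and h.q_lo < current.q_lo
            and current.q_lo - h.q_hi <= max_query_gap
            and abs(h.s_at(h.q_hi) - current.s_at(current.q_lo)) <= max_genome_gap
        ]
        if not candidates:
            break
        current = min(candidates, key=lambda h: h.q_lo)  # reach furthest toward base 1
        used += 1
    return current.chrom, current.s_at(current.q_lo), current.q_lo, used


# The two HSPs shown above (chromosome name and scores are placeholders).
first = HSP("chrN", 250.0, 49, 194, 28511905, 28512050)
second = HSP("chrN", 80.0, 1, 48, 28511858, 28511905)
print(insertion_point([first]))          # ('chrN', 28511905, 49, 1)
print(insertion_point([first, second]))  # ('chrN', 28511858, 1, 2)
```

Each refinement step strictly lowers the first Query base matched, so the loop always terminates. The returned first Query base and number of HSPs used are the two quantities that the scoring table relies on.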

Each interpretation is currently scored according to the following table. This serves two purposes: to give the user a feel for the confidence in the automated location of a particular flanking sequence, and to allow us to identify test cases for improving the algorithm.

Score   Query sequence matched to b.p.   HSPs used   % of sequences
1       1                                1           58
2       < 15                             1           11
3       1                                > 1         7
4       < 15                             > 1         2

A score of 6 is given to sequences that do not match the above criteria but show hits to multiple chromosomes with P values less than 1e-30 (approx. 3% of cases). The remaining insertion point determinations (17%) are given a score of 5. Finally, approximately 3% of flanking sequences do not generate any HSPs to the At genome sequence using our criteria and are not represented in ATIDB.
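
The scoring rules can be written as a small function. This is a sketch under assumptions: it reads "< 15" as "the HSPs used reach down to a Query base between 2 and 14", treats "multiple chromosomes" as at least two distinct chromosomes with a hit of P < 1e-30, and uses a hypothetical input format (a list of (chromosome, P value) pairs). ATIDB's exact boundary conditions may differ.

```python
from typing import Iterable, Optional, Tuple


def confidence_score(first_query_base: int, hsps_used: int,
                     hits: Iterable[Tuple[str, float]]) -> Optional[int]:
    """Confidence score (1-6) for an automated insertion point.

    first_query_base: lowest Query base covered by the HSP(s) used for the placement.
    hsps_used: number of HSPs combined to place the insertion.
    hits: (chromosome, P value) for every HSP in the WU-BLASTN report.
    Returns None when there are no HSPs (such sequences are not shown in ATIDB).
    """
    if hsps_used == 0:
        return None
    near_start = 1 < first_query_base < 15
    if first_query_base == 1 and hsps_used == 1:
        return 1
    if near_start and hsps_used == 1:
        return 2
    if first_query_base == 1 and hsps_used > 1:
        return 3
    if near_start and hsps_used > 1:
        return 4
    # Outside the table: check for strong hits on more than one chromosome.
    strong_chromosomes = {chrom for chrom, p in hits if p < 1e-30}
    if len(strong_chromosomes) > 1:
        return 6
    return 5


print(confidence_score(1, 1, []))                                   # 1
print(confidence_score(1, 2, []))                                   # 3
print(confidence_score(49, 1, [("chr1", 1e-60), ("chr4", 1e-45)]))  # 6
print(confidence_score(49, 1, [("chr1", 1e-60)]))                   # 5
```

Checking the table rows before the multi-chromosome rule mirrors the text: a score of 6 applies only to sequences that fall outside scores 1 to 4.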

The "take home" message is that, although the vast majority of locations within ATIDB are accurate and can be confirmed by PCR and sequencing, users will find it worthwhile to re-run BLAST on any insertion of interest to validate the insertion point before ordering seed for experiments. The ATIDB server provides an easy-to-use BLAST service that runs a search in a few seconds for this purpose; look for the "One-click WU-BLASTN" link. Please note that we will continue to develop the flanking sequence location algorithm to increase its accuracy and reliability.

We recommend that, before you invest time analysing the phenotype of an insertion line, you undertake a verification step. This involves designing primers to the genomic DNA flanking the insertion site that will amplify a genomic region of approx. 1 kb. These primers should then be used together with a T-DNA- or transposon-specific primer in a three-primer PCR on DNA isolated from the insertion line. Two reactions are required: one using a T-DNA left border primer or a 3' transposon primer, and a second using a T-DNA right border primer or a 5' transposon primer. Amplification of the predicted fragments will confirm the insertion site, show whether the insertion is homozygous or heterozygous, and reveal whether any deletion or rearrangement has occurred adjacent to the insertion site.
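
The interpretation of each three-primer reaction can be summarised in a short sketch. It assumes that the wild-type band comes from the two gene-specific primers, that the insertion band comes from the border or transposon primer paired with one gene-specific primer, and that an intact insertion is too long to amplify across between the gene-specific primers. The function names, the 50 bp size tolerance and the example coordinates are hypothetical; band sizes would come from your gel, and each of the two reactions is interpreted separately.

```python
from typing import Optional


def expected_insertion_band(genomic_primer_pos: int, insertion_pos: int,
                            border_primer_to_junction: int) -> int:
    """Approximate size (bp) of the insertion-allele product: the genomic stretch
    from the gene-specific primer to the insertion point, plus the stretch of
    insert from the border (or transposon) primer to the insert/genome junction."""
    return abs(insertion_pos - genomic_primer_pos) + border_primer_to_junction


def interpret_reaction(wt_band: Optional[int], insertion_band: Optional[int],
                       expected_insertion: int, tolerance: int = 50) -> str:
    """Interpret one three-primer reaction; band sizes in bp, None if absent."""
    if wt_band is None and insertion_band is None:
        return "no product: repeat the reaction"
    if insertion_band is None:
        return "wild-type band only: no insertion detected at this site"
    genotype = "heterozygous" if wt_band is not None else "homozygous"
    if abs(insertion_band - expected_insertion) > tolerance:
        genotype += "; band size differs from prediction: possible deletion or rearrangement"
    return genotype


# Hypothetical numbers: gene-specific primer at 1000, insertion at 1500,
# border primer 300 bp from the end of the insert.
expected = expected_insertion_band(1000, 1500, 300)
print(expected)                                 # 800
print(interpret_reaction(1000, 800, expected))  # heterozygous
print(interpret_reaction(None, 810, expected))  # homozygous
print(interpret_reaction(None, 1200, expected))
```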
