Module:Bio::DB::SeqFeature::Store
From BioPerl
Pdoc documentation: Bio::DB::SeqFeature::Store | CPAN documentation: Bio::DB::SeqFeature::Store |
---|
- Bio::DB::SeqFeature
- The feature object customized to work well with DB::SeqFeature::Store
- Bio::DB::SeqFeature::Store::GFF3Loader
- The GFF to Bio::DB::SeqFeature loader object. Envoked from the command-line via bp_seqfeature_load.PLS.
- Storage adaptors
- Bio::DB::SeqFeature::Store::DBI::mysql
- Bio::DB::SeqFeature::Store::DBI::Pg
- Bio::DB::SeqFeature::Store::DBI::SQLite
- Bio::DB::SeqFeature::Store::berkeleydb
- Bio::DB::SeqFeature::Store::memory
- Features are returned should be Bio::SeqFeatureI (Bio::DB::SeqFeature).
Contents
|
Notes
See also: HOWTO:Feature-Annotation.
Schema
Using knowledge of the schema, we can directly query the database, independently of the DB::SeqFeature::Store interface... this can be useful, for example, when scripts like bp_seqfeature_delete.pl are slow, or ignore the precious '--namespace' flag.
SELECT @typeid := id FROM typelist WHERE tag = 'someFeature:iWantGone'; DELETE FROM attribute WHERE id IN ( SELECT id FROM feature WHERE typeid = @typeid); DELETE FROM name WHERE id IN ( SELECT id FROM feature WHERE typeid = @typeid); DELETE FROM parent2child WHERE id IN ( SELECT id FROM feature WHERE typeid = @typeid); DELETE FROM parent2child WHERE child IN ( SELECT id FROM feature WHERE typeid = @typeid); -- Finally... DELETE FROM feature WHERE typeid = @typeid;
Of course the above will be DOG SLOW if you have many features (the query within the IN clause is executed once per row of the outer query). To re-write these queries using JOINS is a little tricky, because of the MySQL specific 'multi-table delete' syntax. Here goes anyway:
SELECT @typeid := id FROM typelist WHERE tag = 'someFeature:iWantGone'; DELETE a #SELECT a.* FROM attribute a INNER JOIN feature f ON a.id = f.id WHERE f.typeid = @typeid; DELETE n #SELECT n.* FROM name n INNER JOIN feature f ON n.id = f.id WHERE f.typeid = @typeid; DELETE p #SELECT p.* FROM parent2child p INNER JOIN feature f ON p.id = f.id WHERE f.typeid = @typeid; DELETE c #SELECT c.* FROM parent2child c INNER JOIN feature f ON c.child = f.id WHERE f.typeid = @typeid; -- Finally (as above)... DELETE FROM feature WHERE typeid = @typeid;
This will potentially leave unused values in the three *list tables, but that shouldn't matter.
Note, this will surely screw up the interval_stats table!
To get an overview of an existing DB::SeqFeature::Store database
The following code demonstrates how to get an overview of an existing DB::SeqFeature::Store database:
#!/usr/bin/perl -w use strict; use Getopt::Long; use Bio::DB::SeqFeature::Store; my $adaptor = 'DBI::mysql'; my $database = 'test'; my $host = 'localhost'; my $user; my $pass; GetOptions( 'adaptor=s' => \$adaptor, 'database=s' => \$database, 'host=s' => \$host, 'user=s' => \$user, 'pass=s' => \$pass, ) or die "failure to communicate\n"; my $dsn = "$database:$host"; warn "using '$dsn'\n"; ## $db is a DB::SeqFeature::Store object my $db = Bio::DB::SeqFeature::Store-> new( -adaptor => $adaptor, -dsn => $dsn, -user => $user, -pass => $pass, ); ## Are sub-features being indexed? print "\nIndexed sub-features? : ", $db->index_subfeatures, "\n"; ## What serializer is being used? print "\nThe serializer is : ", $db->serializer, "\n"; ## List all the feature types in the database: print "\nFeature types:\n"; print "\t", join("\n\t", $db->types), "\n"; ## List how many there are for each feature type in the database #print "Feature type counts:\n"; #my %types = $db->types(-count => 1); #print join("\n", map { $_ . "\t". "$types{$_}" } keys %types), "\n"; ## List the feature attributes (tags) in the database: print "\nAttributes:\n"; print "\t", join("\n\t", $db->attributes), "\n"; ## How many sequence ids in the database: print "\nThere are : ", scalar($db->seq_ids), " sequences in the database\n"; ### How many features in the database: #print "There are : ", scalar($db->features), " features in the database\n";
Simple parsing
Sometimes wrestling with Bio::DB::SeqFeature::Store is a waste of time. Here is an example of code that 'gets the job done', and avoids using BioPerl to handle the GFF. Note this only handles the simplest cases w/o sequences and directives.
#!/usr/bin/perl -w use strict; use Bio::GFF3::LowLevel qw/ gff3_parse_feature /; use Bio::SeqIO; open( GFF, '<', 'some.gff') or die "failed to open file : $!\n"; my $seqIO = Bio::SeqIO-> new( -file => 'some.fa', -format => 'fasta'); my %ref_lookup; while(<GFF>){ next if /^#/; my $feat = gff3_parse_feature( $_ ); $ref_lookup{$feat->{attributes}{ID}} = $feat->{seq_id}; } while(my $seqO = $seqIO->next_seq){ print join("\t", $seqO->id, $seqO->length, $ref_lookup{$seqO->id} ), "\n"; } warn "OK\n";