Module:Bio::DB::SeqFeature::Store

From BioPerl

Jump to: navigation, search

Pdoc documentation: Bio::DB::SeqFeature::Store	CPAN documentation: Bio::DB::SeqFeature::Store

Bio::DB::SeqFeature: The feature object customized to work well with DB::SeqFeature::Store
Bio::DB::SeqFeature::Store::GFF3Loader: The GFF to Bio::DB::SeqFeature loader object. Envoked from the command-line via bp_seqfeature_load.PLS.

Storage adaptors
- Bio::DB::SeqFeature::Store::DBI::mysql
- Bio::DB::SeqFeature::Store::DBI::Pg
- Bio::DB::SeqFeature::Store::DBI::SQLite
- Bio::DB::SeqFeature::Store::berkeleydb
- Bio::DB::SeqFeature::Store::memory

Features are returned should be Bio::SeqFeatureI (Bio::DB::SeqFeature).

1 Notes
- 1.1 Schema
- 1.2 To get an overview of an existing DB::SeqFeature::Store database
- 1.3 Simple parsing

Notes

Schema

Using knowledge of the schema, we can directly query the database, independently of the DB::SeqFeature::Store interface... this can be useful, for example, when scripts like bp_seqfeature_delete.pl are slow, or ignore the precious '--namespace' flag.

SELECT @typeid := id FROM typelist WHERE tag = 'someFeature:iWantGone';
 
DELETE FROM attribute    WHERE id IN (
  SELECT id FROM feature WHERE typeid = @typeid);
 
DELETE FROM name         WHERE id IN (
  SELECT id FROM feature WHERE typeid = @typeid);
 
DELETE FROM parent2child WHERE id IN (
  SELECT id FROM feature WHERE typeid = @typeid);
 
DELETE FROM parent2child WHERE child IN (
  SELECT id FROM feature WHERE typeid = @typeid);
 
-- Finally...
DELETE FROM feature      WHERE typeid = @typeid;

Of course the above will be DOG SLOW if you have many features (the query within the IN clause is executed once per row of the outer query). To re-write these queries using JOINS is a little tricky, because of the MySQL specific 'multi-table delete' syntax. Here goes anyway:

SELECT @typeid := id FROM typelist WHERE tag = 'someFeature:iWantGone';
 
DELETE a
#SELECT a.*
FROM attribute a INNER JOIN feature f
ON a.id = f.id WHERE f.typeid = @typeid;
 
DELETE n
#SELECT n.*
FROM name n INNER JOIN feature f
ON n.id = f.id WHERE f.typeid = @typeid;
 
DELETE p
#SELECT p.*
FROM parent2child p INNER JOIN feature f
ON p.id = f.id WHERE f.typeid = @typeid;
 
DELETE c
#SELECT c.*
FROM parent2child c INNER JOIN feature f
ON c.child = f.id WHERE f.typeid = @typeid;
 
-- Finally (as above)...
DELETE FROM feature      WHERE typeid = @typeid;

This will potentially leave unused values in the three *list tables, but that shouldn't matter.

Note, this will surely screw up the interval_stats table!

To get an overview of an existing DB::SeqFeature::Store database

The following code demonstrates how to get an overview of an existing DB::SeqFeature::Store database:

#!/usr/bin/perl -w
 
use strict;
 
use Getopt::Long;
 
use Bio::DB::SeqFeature::Store;
 
my $adaptor  = 'DBI::mysql';
my $database = 'test';
my $host     = 'localhost';
my $user;
my $pass;
 
 
GetOptions( 'adaptor=s'  => \$adaptor,
	    'database=s' => \$database,
	    'host=s'     => \$host,
	    'user=s'     => \$user,
	    'pass=s'     => \$pass,
	    )
  or die "failure to communicate\n";
 
 
 
my $dsn = "$database:$host";
 
warn "using '$dsn'\n";
 
 
 
## $db is a DB::SeqFeature::Store object
my $db = Bio::DB::SeqFeature::Store->
  new( -adaptor => $adaptor,
       -dsn     => $dsn,
       -user    => $user,
       -pass    => $pass,
     );
 
## Are sub-features being indexed?
print "\nIndexed sub-features? : ",
  $db->index_subfeatures, "\n";
 
## What serializer is being used?
print "\nThe serializer is : ",
  $db->serializer, "\n";
 
## List all the feature types in the database:
print "\nFeature types:\n";
print "\t", join("\n\t", $db->types), "\n";
 
## List how many there are for each feature type in the database
#print "Feature type counts:\n";
#my %types = $db->types(-count => 1);
#print join("\n", map { $_ . "\t". "$types{$_}" } keys %types), "\n";
 
## List the feature attributes (tags) in the database:
print "\nAttributes:\n";
print "\t", join("\n\t", $db->attributes), "\n";
 
## How many sequence ids in the database:
print "\nThere are : ", scalar($db->seq_ids), " sequences in the database\n";
 
### How many features in the database:
#print "There are : ", scalar($db->features), " features in the database\n";

Simple parsing

Sometimes wrestling with Bio::DB::SeqFeature::Store is a waste of time. Here is an example of code that 'gets the job done', and avoids using BioPerl to handle the GFF. Note this only handles the simplest cases w/o sequences and directives.

#!/usr/bin/perl -w
 
use strict;
 
use Bio::GFF3::LowLevel qw/ gff3_parse_feature /;
 
use Bio::SeqIO;
 
open( GFF, '<', 'some.gff')
  or die "failed to open file : $!\n";
 
my $seqIO = Bio::SeqIO->
  new( -file => 'some.fa',
       -format => 'fasta');
 
my %ref_lookup;
 
while(<GFF>){
  next if /^#/;
  my $feat = gff3_parse_feature( $_ );
 
  $ref_lookup{$feat->{attributes}{ID}} = $feat->{seq_id};
}
 
while(my $seqO = $seqIO->next_seq){
  print
    join("\t",
	 $seqO->id,
	 $seqO->length,
	 $ref_lookup{$seqO->id}
	), "\n";
}
 
warn "OK\n";

Retrieved from "bioperl.org/wiki/Module:Bio::DB::SeqFeature::Store"

Categories: Modules | Core Modules

Module:Bio::DB::SeqFeature::Store

From BioPerl

Contents

Notes

Schema

To get an overview of an existing DB::SeqFeature::Store database

Simple parsing

Views

Personal tools

Main Links

Search

documentation

community

development

Toolbox