1. Download
2. Introduction
3. Installation
4. Usage
5. File Formats
6. Contact
1. INTRODUCTION
The software is distributed under a BSD-type license. Please read the license (included in the distribution as LICENSE.txt) before downloading and using XML2PDB.
XML2PDB is a command-line program that extracts structure information (.pdb), sequence information (.pdbaa, .pdbnt), and the correspondence between coordinate and sequence residue numbering (.sc) from XML structure files. Source code and makefiles for each operating system are included in the distribution for users who wish to extend the output of XML2PDB.
XML2PDB was initially developed as an auxiliary tool for MolIDE.
Currently, the XML structure files can be obtained from:
ftp://beta.rcsb.org/pub/pdb/uniformity/data/XML/
XML2PDB uses the Expat library, which can be obtained from:
sourceforge.net/projects/expat/
2. INSTALLATION
XML2PDB is provided in binary form for the following operating systems:
Windows 9x/Me/2000/XP
Linux
and the installation kits for each operating system have the following names, respectively:
xml2pdb_win.zip
xml2pdb_lin.tar.gz
After downloading the installation kit for your operating system, uncompress it:
on Windows, unzip the kit with WinZip
on Linux, gunzip and untar the kit:
gzip -d xml2pdb_lin.tar.gz
tar -xf xml2pdb_lin.tar
This step creates the directory xml2pdb_win or xml2pdb_lin, depending on the kit, which contains the executable, the source code, and an example.
3. USAGE
xml2pdb -i <xml_file> [-c] [-s] [-p] [-n] [-d <Out_PDB_Dir>] [-e <Out_S2c_Dir>] [-f <Out_Pdbaa_Dir>]
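-i <xml_file> input XML structure file
-c creates the PDB coordinates file
-s creates the s2c file
-p creates the pdbaa file
-n creates the pdbnt file
-d <Out_PDB_Dir> output directory for the PDB file
-e <Out_S2c_Dir> output directory for the s2c file
-f <Out_Pdbaa_Dir> output directory for the pdbaa file
-h help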
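For batch processing, it can help to assemble the command line programmatically. The following Python sketch builds an argument list from the usage line above. The function names, the keyword names, and the file name in the usage note are hypothetical, and the -d, -e, and -f options are read here as output directories, going by their placeholder names. It assumes the xml2pdb executable is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_xml2pdb_command(
    xml_file: str,
    coords: bool = False,
    s2c: bool = False,
    pdbaa: bool = False,
    pdbnt: bool = False,
    pdb_dir: Optional[str] = None,
    s2c_dir: Optional[str] = None,
    pdbaa_dir: Optional[str] = None,
    executable: str = "xml2pdb",
) -> List[str]:
    """Assemble an argument list that follows the documented usage line."""
    cmd = [executable, "-i", xml_file]
    # Switches that select which output files are produced.
    for enabled, flag in ((coords, "-c"), (s2c, "-s"), (pdbaa, "-p"), (pdbnt, "-n")):
        if enabled:
            cmd.append(flag)
    # Options that take a directory argument (assumed to be output locations).
    for directory, flag in ((pdb_dir, "-d"), (s2c_dir, "-e"), (pdbaa_dir, "-f")):
        if directory is not None:
            cmd.extend([flag, directory])
    return cmd


def run_xml2pdb(cmd: List[str]) -> int:
    """Run the assembled command and return the process exit status."""
    return subprocess.run(cmd, check=False).returncode
```

For example, build_xml2pdb_command("1o0d.xml", coords=True, s2c=True, pdb_dir="out_pdb") yields the list for the command xml2pdb -i 1o0d.xml -c -s -d out_pdb, which can then be passed to run_xml2pdb.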
4. FILE FORMATS
XML2PDB generates the following file types:
PDB
XML2PDB provides a stripped-down PDB file for homology modeling purposes. This file contains the title, SEQRES records, and coordinates. Users who want additional records in their PDB files can edit the source files and recompile the program.
Sequence-Coordinates Correspondence
These files provide the correspondence between the residue numbering implicit in the sequence (1,2,3,...) and the numbering used in the coordinates. The coordinate residue numbering may not start at 1, may skip some residue numbers, and may include insertion codes, so that a residue may be numbered 62A. This correspondence is not provided in the legacy PDB format, but it is contained in the mmCIF and XML file formats now provided by RCSB. These files can supply this information to programs that use the legacy PDB format. They have the following format:
Col. | Pos | Item |
1 | 0-5 | Record identifier |
2 | 7 | Chain |
3 | 9 | One letter residue code |
4 | 11-13 | SEQRES three letter residue code |
5 | 15-17 | ATOM three letter residue code |
6 | 19-23 | SEQRES residue number |
7 | 25-30 | ATOM residue number |
8 | 32 | PDB secondary structure |
Example 1 (from 1o0d.sc)
SEQCRD L T THR --- 1 - -
SEQCRD L F PHE --- 2 - -
SEQCRD L G GLY GLY 3 1F C
SEQCRD L S SER SER 4 1E C
SEQCRD L G GLY GLY 5 1D C
SEQCRD L E GLU GLU 6 1C C
SEQCRD L A ALA ALA 7 1B C
SEQCRD L D ASP ASP 8 1A C
SEQCRD L C CYS CYS 9 1 C
SEQCRD L G GLY GLY 10 2 C
Example 2 (from 1o07.sc)
SEQCRD A A ALA ALA 1 4 C
SEQCRD A P PRO PRO 2 5 H
SEQCRD A Q GLN GLN 3 6 H
SEQCRD A Q GLN GLN 4 7 H
SEQCRD A I ILE ILE 5 8 H
SEQCRD A N ASN ASN 6 9 H
SEQCRD A D ASP ASP 7 10 H
SEQCRD A I ILE ILE 8 11 H
SEQCRD A V VAL VAL 9 12 H
SEQCRD A H HIS HIS 10 13 H
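As an illustration of how a program might consume these records, the Python sketch below splits each SEQCRD line on whitespace and builds a lookup from sequence numbering to coordinate numbering. It is not part of XML2PDB, and all names in it are hypothetical. It assumes every record has eight whitespace-separated fields, as in the examples above, and that "---" and "-" mark a residue with no coordinates. A reader that must handle blank fields would need to slice by the column positions in the table instead.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class SeqCrd(NamedTuple):
    chain: str
    one_letter: str
    seqres_name: str
    atom_name: Optional[str]            # None when the residue has no coordinates
    seqres_number: int
    atom_number: Optional[str]          # kept as text: may carry an insertion code
    secondary_structure: Optional[str]  # None when the field is "-"


def parse_seqcrd(line: str) -> SeqCrd:
    """Parse one SEQCRD record, assuming eight whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "SEQCRD":
        raise ValueError(f"not a SEQCRD record: {line!r}")
    _, chain, one, seqres, atom, seq_no, atom_no, ss = fields
    return SeqCrd(
        chain,
        one,
        seqres,
        None if atom == "---" else atom,
        int(seq_no),
        None if atom_no == "-" else atom_no,
        None if ss == "-" else ss,
    )


def seqres_to_atom(lines: Iterable[str]) -> Dict[Tuple[str, int], Optional[str]]:
    """Map (chain, SEQRES residue number) to the ATOM residue number."""
    mapping: Dict[Tuple[str, int], Optional[str]] = {}
    for line in lines:
        if line.startswith("SEQCRD"):
            record = parse_seqcrd(line)
            mapping[(record.chain, record.seqres_number)] = record.atom_number
    return mapping
```

With the records of Example 1, the lookup for ("L", 3) gives "1F", while ("L", 1) gives None because that residue has no coordinates.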
PDBAA
This file contains the sequences in FASTA format for each peptide chain.
The header has the following structure:
>StructName_And_Chain ChainLength Method Resolution RFactor FreeRFactor Descr <DBCode> [Organism]
Example
>1B6A_ 478 XRAY 1.60 0.187 0.216 METHIONINE AMINOPEPTIDASE <AMP2_HUMAN> [HOMO SAPIENS]
MAGVEEVAASGSHLNGDLDPDDREEGAASTAEEAAKKKRRKKKKSKGPSAAGEQEPDKES
GASVDEVARQLERSALEDKERDEDDEDGDGDGDGATGKKKKKKKKKRGPKVQTDPPSVPI
CDLYPNGVFPKGQECEYPPTQDGRTAAWRTTSEEKKALDQASEEIWNDFREAAEAHRQVR
KYVMSWIKPGMTMIEICEKLEDCSRKLIKENGLNAGLAFPTGCSLNNCAAHYTPNAGDTT
VLQYDDICKIDFGTHISGRIIDCAFTVTFNPKYDTLLKAVKDATNTGIKCAGIDVRLCDV
GEAIQEVMESYEVEIDGKTYQVKPIRNLNGHSIGQYRIHAGKTVPIVKGGEATRMEEGEV
YAIETFGSTGKGVVHDDMECSHYMKNFDVGHVPIRLPRTKHLLNVINENFGTLAFCRRWL
DRLGESKYLMALKNLCDLGIVDPYPPLCDIKGSYTAQFEHTILLRPTCKEVVSRGDDY
PDB chains without chain IDs are indicated with an underscore character. The protein name is obtained from the SwissProt and GenBank records (listed in DBCode) in the XML file. The organism name is obtained from the scientific name given in the XML file.
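The header layout described above can be split into its fields with a regular expression. The Python sketch below is one possible reading and is not taken from the XML2PDB source. The names are hypothetical. It assumes the first four characters of the identifier are the PDB ID and the rest is the chain, and it keeps resolution and R-factor values as text because the representation of missing values is not documented here.

```python
import re
from typing import NamedTuple


class ChainHeader(NamedTuple):
    pdb_id: str
    chain: str          # "_" means the chain has no chain ID
    length: int
    method: str
    resolution: str     # kept as text, see the note above
    r_factor: str
    free_r_factor: str
    description: str
    db_code: str
    organism: str


# >Name Length Method Resolution RFactor FreeRFactor Descr <DBCode> [Organism]
HEADER_RE = re.compile(
    r"^>(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+"
    r"(.*?)\s*<([^>]*)>\s*\[([^\]]*)\]\s*$"
)


def parse_header(line: str) -> ChainHeader:
    """Split a pdbaa/pdbnt header line into its documented fields."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    name, length, method, res, rfac, rfree, descr, db_code, organism = match.groups()
    # Assumption: a four-character PDB ID followed by the chain identifier.
    return ChainHeader(
        name[:4], name[4:], int(length), method, res, rfac, rfree,
        descr, db_code, organism,
    )
```

Applied to the example header, this gives PDB ID 1B6A, chain "_", length 478, and organism HOMO SAPIENS.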
PDBNT
This file contains the nucleotide sequence for each nucleic acid chain. The header has the same structure as that of the PDBAA file.
Example
>1O0BB 75 XRAY 2.70 0.2161 0.289 Glutaminyl tRNA <SYQ_ECOLI> []
UGGGGUAUCGCCAAGCGGUAAGGCACCGGAUUCUGAUUCCGGCAUUCCGAGGUUCGAAUC
CUCGUACCCCAGCCA
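Because sequences are wrapped over several lines, a reader has to join the lines that follow each header. The Python sketch below does this for pdbaa or pdbnt content and reads the chain length from the second header field, so that the length can be checked against the sequence. The function names are hypothetical, and the sketch assumes every record starts with a line beginning with ">".

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def declared_length(header: str) -> int:
    """Return the ChainLength field, the second item of the header."""
    return int(header[1:].split()[1])
```

For the 1O0BB example, the joined sequence has 75 characters, which matches the chain length given in its header.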
CONTACT THE AUTHORS:
Adrian Canutescu
Roland Dunbrack