++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
THE DATA_TEMPLATE.TEXT FILE FOR X-RAY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTES AND REMINDERS
The data template file contains data entries for unique chemical sequences
present in the structure and other non-electronically captured information.
PLEASE CHECK CATEGORIES 1 & 2: Before proceeding any further, make the necessary
corrections here so that all information in these categories is complete
and correct.
You may choose to fill in CATEGORIES (3-19) either here or later in ADIT.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GUIDELINES FOR USING THIS FILE
1. Only strings enclosed between the 'less than' and 'greater than'
signs (<.....>) will be parsed for evaluation by the program. Therefore,
DO NOT write to the left of the 'less than' sign or to the right of the
'greater than' sign.
2. All alphanumeric values or strings that you include in the different
categories should be within double-quotes. Blank spaces or carriage
returns within a pair of double quotes are ignored by the program.
DO NOT use double quotes (") within strings that you enter.
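As a rough illustration of guidelines 1 and 2, the Python sketch below reads
<token = "value"> entries from template text. It is not the parser used by
the program: the function name parse_template_entries and the regular
expression are assumptions made for this example, and "ignored" is read here
as dropping line breaks and trimming surrounding blanks.

```python
import re

# Hypothetical pattern: '<', a token name, '=', a double-quoted value, '>'.
ENTRY = re.compile(r'<\s*(\w+)\s*=\s*"([^"]*)"\s*>')

def parse_template_entries(text):
    """Return (token, value) pairs found between '<' and '>'."""
    entries = []
    for token, raw in ENTRY.findall(text):
        # Assumption: line breaks are dropped and surrounding blanks trimmed.
        value = "".join(raw.splitlines()).strip()
        entries.append((token, value))
    return entries

sample = '''
<space_group = "P 1 21 1"> (use International Table conventions)
<unit_cell_a = "   56.800 " >
<molecule_one_letter_sequence="
MENFQK
VEKIGE" >
'''
for token, value in parse_template_entries(sample):
    print(token, "->", repr(value))
```

Because a double quote ends the value in this pattern, the sketch also shows
why double quotes must not appear inside the strings you enter.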
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~START INPUT DATA BELOW~~~~~~~~~~~~~~~~~~~~~~~
================CATEGORY 1: Crystallographic Data=======================
Enter crystallographic data
<space_group = "P 1 21 1"> (use International Table conventions)
<space_group_number = "? ">
<unit_cell_a = " 56.800 " >
<unit_cell_b = " 69.950 " >
<unit_cell_c = " 60.530 " >
<unit_cell_alpha = " 90.00 " >
<unit_cell_beta = "114.50 " >
<unit_cell_gamma = " 90.00 " >
================CATEGORY 2: Sequence Information =======================
Enter one letter sequence for each polymeric entity in asymmetric unit
--------------------------------------------------------------------------
SOME DEFINITIONS
An ENTITY is defined as any unique molecule present in the asymmetric
unit. Each unique biological polymer (protein or nucleic acids) in the
structure is considered an entity. Thus, if there are five copies of
a single protein in the asymmetric unit, the molecular entity is still
only one. Water and non-polymers like ions, ligands and sugars are
also entities.
Here we only consider the sequences of polymeric entities (protein or
nucleic acid).
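The entity definition above can be pictured with a small Python sketch that
groups chains sharing an identical sequence into one entity. The function
name group_chains_into_entities and the short sequences are made up for
illustration; the program's own grouping may differ in detail.

```python
def group_chains_into_entities(chain_sequences):
    """Map each unique sequence to the list of chain IDs that share it."""
    entities = {}
    for chain_id, sequence in chain_sequences.items():
        entities.setdefault(sequence, []).append(chain_id)
    return entities

# Made-up example: chains A and C are copies of the same protein.
chains = {"A": "MENFQK", "B": "MSHKQI", "C": "MENFQK"}
entities = group_chains_into_entities(chains)
for entity_id, (sequence, chain_ids) in enumerate(entities.items(), start=1):
    print(entity_id, ",".join(chain_ids), sequence)
```

Three chains yield two entities here, which mirrors the rule that several
copies of one protein still count as a single entity.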
GUIDELINES FOR COMPLETING THIS CATEGORY
* In a PDB or mmCIF format file, all residues of a single polymeric
entity should have one chain ID. Multiple copies of the same entity
should each be assigned a unique chain ID. The multiple chain IDs
should be separated by commas as 'A,B,C,...'. If incorrect chain IDs
are used the entity groups extracted by this program will not be
correct. To avoid this, make necessary corrections in the PDB or mmCIF
file used to generate the data_template file and regenerate the
data_template.text file. Alternatively, edit the extracted sequence
in this file to correctly represent the sequence and chain IDs of each
polymeric entity.
* In addition to chain IDs, this program uses distance geometry to
assess if there are any breaks in the polymer sequence. These breaks
may occur due to missing residues (not included in the model due to
missing electron density) or due to poor geometry. Four question marks
'????' are used to denote these chain breaks. Replace these question
marks with the sequence of residues missing from the coordinates. Also
add any residues missing from the N- and/or C-termini here.
* If there are non-standard residues in the coordinates, this program
lists them according to the three letter code used in the coordinate
file as (ABC). If all the residues in your sequence are nonstandard,
check and edit the sequence manually to represent it correctly in this
file.
* If any residue was modeled as Ala or Gly due to lack of the side-chain
density, the sequence extracted here will represent them as A or G
respectively. Correct this to the original sequence that was present in
the crystal.
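A quick way to find the places that need manual review is sketched below in
Python. It only locates '????' chain-break markers and parenthesized three
letter codes in a sequence string; the function name review_sequence and the
example sequence are assumptions for illustration and are not part of the
program.

```python
import re

def review_sequence(sequence):
    """Report chain-break markers and non-standard residues in a sequence."""
    compact = "".join(sequence.split())  # drop blanks and line breaks
    breaks = [m.start() for m in re.finditer(r"\?{4}", compact)]
    nonstandard = re.findall(r"\(([A-Za-z0-9]{1,3})\)", compact)
    return {"chain_breaks_at": breaks, "nonstandard": nonstandard}

# Made-up sequence with one chain break and one non-standard residue (MSE).
report = review_sequence("MENFQKVEKIGEG????TAIRE(MSE)SLLKELNHP")
print(report)
```

Residues modeled as Ala or Gly and residues missing from the termini cannot
be detected this way; they still have to be checked against the sequence
that was present in the crystal.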
----------------------------------------------------------------------------
Below is the one letter chemical sequence extracted from your PDB
coordinate file. The molecular entities are grouped and listed
together.
PLEASE CHECK THE SEQUENCE of each entity carefully and modify it, as necessary.
Make sure that you REVIEW THE FOLLOWING:
* chain breaks due to missing residues,
* missing residues in the N- and/or C-termini,
* non-standard residues and
* cases of residues modeled as Ala or Gly due to missing side-chain density.
<molecule_entity_id="1" >
<molecule_entity_type="polypeptide(L)" >
<molecule_one_letter_sequence="
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNHPNIVKLLDVI
HTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHSHRVLHRDLKPQNLLINTEGA
IKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYYSTAVDIWSLGCIFAEMVTRRALFPGDSEID
QLFRIFRTLGTPDEVVWPGVTSMPDYKPSFPKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAA
LAHPFFQDVTKPVPHLRL" >
< molecule_chain_id="A" >
< target_DB_id=" " > (if known)
<molecule_entity_id="2" >
<molecule_entity_type="polypeptide(L)" >
<molecule_one_letter_sequence="
MSHKQIYYSDKYDDEEFEYRHVMLPKDIAKLVPKTHLMSESEWRNLGVQQSQGWVHYMIHEPEPHILLFR
RPLPKKPKK" >
< molecule_chain_id="B" >
< target_DB_id=" " > (if known)
<molecule_entity_id=" " >
<molecule_entity_type=" " >
<molecule_one_letter_sequence=" " >
<molecule_chain_id=" " >
<target_DB_id=" " > (if known)
================CATEGORY 3: Contact Authors=============================
Enter information about the contact authors.
Note: items marked by (e.g. ) are mandatory.
PI information should be always given.
1. Information about the Principal investigator (PI) should be given.
<contact_author_PI_id = "1 "> (must be given 1)
<contact_author_PI_salutation = " "> ( Dr./Prof./Mr./Mrs./Ms.)
<contact_author_PI_first_name = " "> (e.g. John)
<contact_author_PI_last_name = " "> (e.g. Rodgers)
<contact_author_PI_middle_name = " ">
<contact_author_PI_role = " "> (e.g. investigator/responsible scientist)
<contact_author_PI_organization_type = " "> (e.g. academic/commercial/government/other)
<contact_author_PI_email = " "> (e.g. name@host.domain.country)
<contact_author_PI_address = " "> (e.g. 610 Taylor road)
<contact_author_PI_city = " "> (e.g. Piscataway)
<contact_author_PI_State_or_Province = " "> (e.g. New Jersey)
<contact_author_PI_Zip_Code = " "> (e.g. 08864)
<contact_author_PI_Country = " "> (e.g. UNITED STATES)
<contact_author_PI_fax_number = " ">
<contact_author_PI_phone_numer = " ">
2. Information about other contact authors
<contact_author_id = "2 "> (e.g. 2,3,4..)
<contact_author_salutation = " ">
<contact_author_first_name = " ">
<contact_author_last_name = " ">
<contact_author_middle_name = " ">
<contact_author_role = " ">
<contact_author_organization_type = " ">
<contact_author_email = " ">
<contact_author_address = " ">
<contact_author_city = " ">
<contact_author_State_or_Province = " ">
<contact_author_Zip_Code = " ">
<contact_author_Country = " ">
<contact_author_fax_number = " ">
<contact_author_phone_numer = " ">
...(add more if needed)...
================CATEGORY 4: Structural Genomics========================
If this is a structural genomics project, give the information
<SG_project_id = " 1">
<SG_project_name = " "> (e.g. NPPSFA/PSI, Protein Structure Initiative)
<full_name_of_SG_center = " "> (e.g. Berkeley Structural Genomics Center)
================CATEGORY 5: Release Status==============================
Enter release status for the coordinates, structure factors, and sequence
Status for sequence should be chosen from one of the following:
(release now, hold for release)
Status for others should be chosen from one of the following:
(release now, hold for publication, hold for 4 weeks, hold for 6 weeks,
hold for 6 months, hold for 1 year)
<Release_status_for_coordinates = " "> (e.g. release now)
<Release_status_for_structure_factor = " ">
<Release_status_for_sequence = " ">
================CATEGORY 6: Title=======================================
Enter the title for the structure
<structure_title = " "> (e.g. Crystal Structure Analysis of the B-DNA)
<structure_details = " ">
================CATEGORY 7: Authors of Structure============================
Enter authors of the deposited structures (e.g. Surname, F.M.)
<structure_author_name = " ">
<structure_author_name = " ">
<structure_author_name = " ">
<structure_author_name = " ">
...add more if needed...
================CATEGORY 8: Citation Authors============================
Enter author names for the publications associated with this deposition.
The primary citation is the article in which the deposited coordinates
were first reported. Other related citations may also be provided.
1. For the primary citation
<primary_citation_author_name = " "> (e.g. Surname, F.M.)
<primary_citation_author_name = " ">
<primary_citation_author_name = " ">
<primary_citation_author_name = " ">
...add more if needed...
2. For other related citations (if applicable)
<citation_author_id = " "> (e.g. 1, 2 ..)
<citation_author_name = " ">
<citation_author_name = " ">
<citation_author_name = " ">
<citation_author_name = " ">
...add more if needed...
...(add more other citations if needed)...
================CATEGORY 9: Citation Article============================
Enter citation article (journal, title, year, volume, page)
If the citation has not yet been published, use 'To be published'
for the category 'journal_abbrev' and leave pages and volume blank.
1. For primary citation
<primary_citation_id = "primary">
<primary_citation_journal_abbrev = " "> (e.g. to be published)
<primary_citation_title = " ">
<primary_citation_year = " ">
<primary_citation_journal_volume = " ">
<primary_citation_page_first = " ">
<primary_citation_page_last = " ">
2. For other related citation (if applicable)
<citation_id = "1 "> (e.g. 1, 2, 3 ...)
<citation_journal_abbrev = " ">
<citation_title = " ">
<citation_year = " ">
<citation_journal_volume = " ">
<citation_page_first = " ">
<citation_page_last = " ">
...(add more citations if needed)...
================CATEGORY 10: Molecule Names==============================
Enter the names of the molecules (entities) that are in the asymmetric unit
NOTE: The number of molecule names should match the number of entities in CATEGORY 2!
The name of molecule should be obtained from the appropriate
sequence database reference, if available. Otherwise the gene name or
other common name of the entity may be used.
e.g. HIV-1 integrase for protein
RNA Hammerhead Ribozyme for RNA
<molecule_name = " "> (entity 1)
<molecule_name = " "> (entity 2)
...(add more if needed)...
================CATEGORY 11: Molecule Details============================
Enter additional information about each entity, if known. (optional)
Additional information would include details such as fragment name
(if applicable), mutation, and E.C.number.
1. For entity 1
<Molecular_entity_id = "1 "> (e.g. 1, 2, ...)
<Fragment_name = " "> (e.g. ligand binding domain, hairpin)
<Specific_mutation = " "> (e.g. C280S)
<Enzyme_Comission_number = " "> (if known: e.g. 2.7.7.7)
2. For entity 2
<Molecular_entity_id = "2 ">
<Fragment_name = " ">
<Specific_mutation = " ">
<Enzyme_Comission_number = " ">
...(add more if needed)...
================CATEGORY 12: Genetically Manipulated Source=============
Enter data in the genetically manipulated source category
If the biomolecule has been genetically manipulated, describe its
source and expression system here.
1. For entity 1
<Manipulated_entity_id = "1 "> (e.g. 1, 2, ...)
<Source_organism_scientific_name = " "> (e.g. Homo sapiens)
<Source_organism_gene = " "> (e.g. RPOD, ALKA...)
<Source_organism_strain = " "> (e.g. BH10 ISOLATE, K-12...)
<Expression_system_scientific_name = " "> (e.g. Escherichia coli)
<Expression_system_strain = " "> (e.g. BL21(DE3))
<Expression_system_vector_type = " "> (e.g. plasmid)
<Expression_system_plasmid_name = " "> (e.g. pET26)
<Manipulated_source_details = " "> (any other relevant information)
2. For entity 2
<Manipulated_entity_id = "2 ">
<Source_organism_scientific_name = " ">
<Source_organism_gene = " ">
<Source_organism_strain = " ">
<Expression_system_scientific_name = " ">
<Expression_system_strain = " ">
<Expression_system_vector_type = " ">
<Expression_system_plasmid_name = " ">
<Manipulated_source_details = " ">
...(add more if needed)...
================CATEGORY 13: Natural Source=============================
Enter data in the natural source category (if applicable)
If the biomolecule was derived from a natural source, describe it here.
1. For entity 1
<natural_source_entity_id = " "> (e.g. 1, 2, ...)
<natural_source_scientific_name = " "> (e.g. Homo sapiens)
<natural_source_organism_strain = " "> (e.g. DH5a , BMH 71-18)
<natural_source_details = " "> (e.g. organ, tissue, cell ..)
2. For entity 2
<natural_source_entity_id = " ">
<natural_source_scientific_name = " ">
<natural_source_organism_strain = " ">
<natural_source_details = " ">
...(add more if needed)...
================CATEGORY 14: Synthetic Source=============================
If the biomolecule was chemically synthesized (rather than genetically
manipulated or obtained from a natural source), describe its source here.
1. For entity 1
<synthetic_source_entity_id = " "> (e.g. 1, 2, ...)
<synthetic_source_description = " "> (if known)
2. For entity 2
<synthetic_source_entity_id = " ">
<synthetic_source_description = " ">
...(add more if needed)...
================CATEGORY 15: Keywords===================================
Enter a list of keywords that describe important features of the deposited
structure.
For example, beta barrel, protein-DNA complex, double helix,
hydrolase, structural genomics etc.
<structure_keywords = " ">
================CATEGORY 16: Biological Assembly========================
Enter data in the biological assembly category (if applicable)
The biological assembly describes the functional unit(s) present in the
structure. The asymmetric unit may contain part of a biological assembly,
exactly one biological assembly, or more than one.
Case 1
* If the asymmetric unit is the same as the biological assembly
nothing special needs to be noted here.
Case 2
* If the asymmetric unit does not contain a complete biological unit,
please provide the symmetry operations (including translations) required
to build the biological unit.
(example:
The biological assembly is a hexamer generated from the dimer
in the asymmetric unit by the operations: -y, x-y-1, z-1 and
-x+y, -x-1, z-1.)
Case 3
* If the asymmetric unit has multiple biological units
Please specify how to group the contents of the asymmetric unit into
biological units.
(example:
The biological unit is a dimer. There are 2 biological units in the
asymmetric unit (chains A & B and chains C & D).)
<biological_assembly = " "> (biological unit 1)
<biological_assembly = " "> (biological unit 2)
....(add more if needed)....
================CATEGORY 17: Methods and Conditions=====================
Enter the crystallization conditions for each crystal
1. For crystal 1:
<crystal_number = "1 "> (e.g. 1, 2, ...)
<crystallization_method = " "> (e.g. vapor diffusion, hanging drop)
<crystallization_pH = " "> (e.g. 7.5 ...)
<crystallization_temperature = " "> (e.g. 298) (in Kelvin)
<crystallization_details = " "> (e.g. PEG 4000, NaCl etc.)
2. For crystal 2:
<crystal_number = " ">
<crystallization_method = " ">
<crystallization_pH = " ">
<crystallization_temperature = " ">
<crystallization_details = " ">
...(add more if needed)...
================CATEGORY 18: Crystal Property===========================
Enter solvent content, Matthews coefficient
These values were calculated based on the sequence as shown in
CATEGORY 2. If there are missing residues, you need to add the
missing residues and re-run the program to get accurate values.
(The command to re-run is 'extract -sol data_template.text')
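For orientation, the Python sketch below estimates both quantities using the
widely used Matthews relation: V_M = V / (n * M), with solvent fraction
approximately 1 - 1.23 / V_M. This is not necessarily how the program
computes its values. The function names are hypothetical, the molecular mass
is a made-up round figure, and n = 2 is the number of asymmetric units per
cell for space group P 1 21 1.

```python
import math

def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (cubic angstroms) of a general unit cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def matthews(volume, n_asu, mass_da):
    """Return (V_M in A^3/Da, solvent fraction) using the 1.23/V_M estimate."""
    v_m = volume / (n_asu * mass_da)
    return v_m, 1.0 - 1.23 / v_m

# Cell constants from CATEGORY 1; the mass of 43800 Da is an assumed value.
volume = unit_cell_volume(56.800, 69.950, 60.530, 90.0, 114.5, 90.0)
v_m, solvent = matthews(volume, n_asu=2, mass_da=43800.0)
print(f"V = {volume:.0f} A^3, V_M = {v_m:.2f} A^3/Da, solvent = {100 * solvent:.1f} %")
```

Adding missing residues raises the mass, which lowers V_M and the estimated
solvent content; this is why the values should be recalculated after the
sequence in CATEGORY 2 has been completed.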
1. For crystal 1:
<crystals_number = " 1 "> (e.g. 1, 2, ...)
<crystals_solvent_content = "50.6 ">
<crystals_matthews_coefficient = "2.5 ">
<crystals_mosaicity = " "> (e.g. 0.5 ...)
2. For crystal 2:
<crystals_number = " ">
<crystals_solvent_content = "50.6 ">
<crystals_matthews_coefficient = "2.5 ">
<crystals_mosaicity = " ">
...(add more if needed)...
================CATEGORY 19: Radiation Source (experiment)============
Enter the details of the source of radiation, the X-ray generator,
and the wavelength for each diffraction.
1. For experiment 1:
<radiation_experiment = "1 "> (e.g. 1, 2, ...)
<radiation_source = " "> (e.g. SYNCHROTRON, ROTATING ANODE ...)
<radiation_source_type = " "> (e.g. NSLS BEAMLINE X8C ...)
<radiation_wavelengths= " "> (e.g. 1.502 ...)
<radiation_detector = " "> (e.g. CCD/AREA DETECTOR/IMAGE PLATE ...)
<radiation_detector_type= " "> (e.g. SIEMENS-NICOLET/RIGAKU RAXIS ...)
<radiation_detector_details = " "> (e.g. mirrors...)
<data_collection_date = " "> (e.g. 2004-11-27)
<data_collection_temperature = " "> (e.g. 100) (in Kelvin)
<data_collection_protocol= " "> (e.g. SINGLE WAVELENGTH, MAD, ...)
<data_collection_monochromator= " "> (e.g. GRAPHITE, Ni FILTER ...)
2. For experiment 2:
<radiation_experiment = "2 ">
<radiation_source = " ">
<radiation_source_type = " ">
<radiation_wavelengths= " ">
<radiation_detector = " ">
<radiation_detector_type= " ">
<radiation_detector_details = " ">
<data_collection_date = " ">
<data_collection_temperature = " ">
<data_collection_protocol= " ">
<data_collection_monochromator= " ">
....(add more if needed)....
=====================================END==================================
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
THE LOG_SCRIPT.INP FILE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTES AND REMINDERS
This script file is used to enter the names of the crystallographic
software used for structure determination and the log, PDB, mmCIF or
text files generated by them.
PLEASE COMPLETE the ENTRY FIELDS according to the type of your experiment
and use the command 'extract -ext log_script.inp' to obtain the completed
structure data ready for validation and deposition.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GUIDELINES FOR USING THIS FILE
1. Only strings enclosed between the 'less than' and 'greater than'
signs (<.....>) will be parsed for evaluation by the program. Therefore,
DO NOT write to the left of the 'less than' sign or to the right of the
'greater than' sign.
2. All alphanumeric values or strings that you include in the different
categories should be within double-quotes. Blank spaces or carriage
returns within a pair of double quotes are ignored by the program.
DO NOT use double quotes (") within strings that you enter.
3. Log files used for generating the deposition should be generated from
the best (usually the last) trial for each crystallographic software.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~~~~~~~~START INPUT DATA BELOW~~~~~~~~~~~~~~~~~~~~~~~
===============PART 1: Structure Factor for Final Refinement==============
Enter reflection data file used for final structure refinement
NOTE:
* Usually the highest resolution or best data set is used for the
refinement. Use that structure factor file here.
* In some cases, it may not be possible to collect a complete dataset
from a single crystal. Thus, multiple data sets have to be scaled
and merged together for refinement. Use the merged reflection file
here.
* If the reflection data format is not one of those listed below,
please use OTHER for the data format, and provide an ASCII file
that has at least five values [H, K, L, I (or F), sigmaI (or sigmaF)]
for each reflection and separate each item by one or more spaces.
Include the test flags as the sixth column in the file (if available).
* If the reflection file is in mtz format (e.g. using REFMAC5), convert
it to mmCIF format using the mtz2various application provided by CCP4.
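The OTHER layout described above (H, K, L, I or F, sigma, and an optional
test flag, separated by spaces) could be written and checked with a Python
sketch like the following. The function names, the two-decimal formatting,
and the integer test flag are assumptions for illustration; the note above
only fixes the column order and the space separator.

```python
def format_reflection(h, k, l, value, sigma, test_flag=None):
    """Format one reflection as: H K L I(or F) sigma [test flag]."""
    fields = [str(h), str(k), str(l), f"{value:.2f}", f"{sigma:.2f}"]
    if test_flag is not None:
        fields.append(str(test_flag))
    return " ".join(fields)

def parse_reflection(line):
    """Split a line on whitespace and check it has five or six columns."""
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(parts)}: {line!r}")
    h, k, l = (int(p) for p in parts[:3])
    value, sigma = float(parts[3]), float(parts[4])
    flag = int(parts[5]) if len(parts) == 6 else None
    return h, k, l, value, sigma, flag

# Made-up reflection used only to exercise the two helpers.
line = format_reflection(1, 0, -2, 1523.4, 37.15, test_flag=0)
print(line)
print(parse_reflection(line))
```

Running every line of a hand-made file through a check like parse_reflection
before deposition catches rows with missing or extra columns early.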
Reflection data format:
CNS|SHELX|TNT|REFMAC5|HKL|SCALEPACK|DTREK|SAINT|SCALA|3DSCALE
<reflection_data_type = "F" > [enter I (intensity) or F (amplitude)]
<reflection_data_format = "CNS" >
<reflection_data_file_name = " " >
==============PART 2: Structure Factors for Protein Phasing================
Enter reflection data files used for heavy atom or MAD phasing
NOTE:
* Enter this category if you have more than one complete reflection
file (e.g. in the case of MAD, SIRAS, MIR). The LOG files generated
from data scaling software for all these data sets are also needed.
* If the scaling program is not one of those listed below
(HKL|SCALEPACK|DTREK|SAINT|3DSCALE), enter OTHER for the program
name and provide an ASCII file with five values
[H, K, L, I (or F), sigmaI (or sigmaF)] for each reflection and
separate each item by a space.
* If the same crystal was used for collecting multiple data sets, the
crystal number will remain '1' as the wavelength numbers change.
However, if multiple crystals were used for the data collections,
the corresponding crystal numbers should be used for each data set.
* IT IS IMPORTANT THAT THE LOG FILE AND DATA FILE COME FROM THE
SAME PROGRAM.
<scale_data_type = "I" > [enter I (intensity) or F (amplitude)]
<scale_program_name = "HKL" >
For data set 1:
<crystal_number = "1" >
<diffract_number = "1" >
<scale_data_file_name = " " >
<scale_log_file_name = " " >
For data set 2:
<crystal_number = "1" >
<diffract_number = "2" >
<scale_data_file_name = " " >
<scale_log_file_name = " " >
For data set 3:
<crystal_number = "1" >
<diffract_number = "3" >
<scale_data_file_name = " " >
<scale_log_file_name = " " >
==================PART 3: Statistics for Indexing=====================
Enter log file and software name for data indexing
NOTE:
* This applies only to the data used for final structure refinement.
Software for indexing is one of the following:
(HKL|DENZO|DTREK|MOSFLM)
<data_indexing_software = "HKL" >
<data_indexing_LOG_file_name = " " >
<data_indexing_CIF_file_name = " " > (if mmCIF format)
==================PART 4: Statistics for Data Scaling=====================
Enter log file and software name for data scaling
NOTE:
* The log file included here should have scaling statistics of
the file used for the final structure refinement. If multiple data
sets were scaled and merged for refinement (as described in Part 1
above) use the log file generated during merging of the data sets.
Software for scaling is one of the following:
(HKL|SCALEPACK|DTREK|SAINT|3DSCALE|SCALA)
<data_scaling_software = "HKL" >
<data_scaling_LOG_file_name = " " >
<data_scaling_CIF_file_name = " " > (if mmCIF format)
==============PART 5: Statistics for Molecular Replacement================
Enter log files and software name for molecular replacement
NOTE:
Software is one of the following:
(CNS|AMORE|MOLREP|EPMR|PHASER)
The log file should be from the best trial of MR.
<mr_software = " " >
<mr_log_file_LOG_1 = " " >
<mr_log_file_LOG_2 = " " >
=================PART 6: Statistics for Protein Phasing===================
Enter log files and software name for heavy atom phasing
NOTE:
The phasing method should be one of (SAD|MAD|SIR|SIRAS|MIR|MIRAS).
Software is one of the following:
(CNS|MLPHARE|SOLVE|SHELXS|SHELXD|SNB|BNP|SHARP|PHASES)
The log file should be from the best trial of phasing.
<phasing_method = "MAD" >
<phasing_software = "SOLVE" >
<phasing_log_file_LOG_1 = " " >
<phasing_log_file_PDB_1 = " " > (if PDB format (heavy atom coordinates))
<phasing_log_file_CIF_1 = " " > (if mmCIF format)
<phasing_log_file_LOG_2 = " " >
<phasing_log_file_PDB_2 = " " >
<phasing_log_file_CIF_2 = " " >
... add more if needed ...
===============PART 7: Statistics for Density Modification================
Enter log files and software name for density modification
NOTE:
Software is one of the following:
(CNS|DM|RESOLVE|SOLOMON|SHELXE)
The log file should be from the best trial of density modification.
<dm_software = "RESOLVE " >
<dm_log_file_LOG_1 = " " >
<dm_log_file_CIF_1 = " " > (if mmCIF format)
===============PART 8: Statistics for Structure Refinement================
Enter log files and software name used for final structure refinement
NOTE:
Software is one of the following:
(CNS|REFMAC5|SHELXL|TNT|PROLSQ|NUCLSQ|RESTRAIN)
The log file should be from the final trial of structure refinement.
<refine_software = "REFMAC5" >
<refine_log_file_PDB_1 = " " > (coordinate file in PDB format)
<refine_log_file_CIF_1 = " " > (mmCIF file containing refinement statistics)
<refine_log_file_LOG_1 = " " >
=======================PART 9: Data Template File=========================
Enter file name of the data template file
NOTE:
This file 'data_template.text' was generated by using the
command 'extract -pdb pdb_file' or 'extract -cif cif_file'. It
contains the sequences of all unique polymers (protein or nucleic
acid) present in the structure. It also contains other
non-electronically captured information. Please complete the
data template file before running pdb_extract.
<data_template_file = "data_template.text" >
==========================PART 10: Output Files============================
Enter the output file names
NOTE:
If you do not give the output file names, the default names
pdb_extract_sf.mmcif containing structure factors and
pdb_extract.mmcif containing coordinates will be assigned
by the program.
<sf_output= " " > (for structure factors)
<statistics_output= " " > (for coordinates and statistics)
=====================================END==================================
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
THE DATA_TEMPLATE.TEXT FILE FOR NMR
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTES AND REMINDERS
The data template file contains data entries for unique chemical sequences
present in the structure and other non-electronically captured information.
PLEASE CHECK CATEGORY 1: Before proceeding any further, make the necessary
corrections here so that all information in this category is complete
and correct.
You may choose to fill in CATEGORIES (2-21) either here or later in ADIT.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GUIDELINES FOR USING THIS FILE
1. Only strings enclosed between the 'less than' and 'greater than'
signs (<.....>) will be parsed for evaluation by the program. Therefore,
DO NOT write to the left of the 'less than' sign or to the right of the
'greater than' sign.
2. All alphanumeric values or strings that you include in the different
categories should be within double-quotes. Blank spaces or carriage
returns within a pair of double quotes are ignored by the program.
DO NOT use double quotes (") within strings that you enter.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~START INPUT DATA BELOW~~~~~~~~~~~~~~~~~~~~~~~
================CATEGORY 1: Molecular Entity Sequence===================
Enter one letter code sequence for each molecular entity
A molecular entity is defined as a unique monomer in each model. The
molecular entities are calculated and grouped together.
Please check each entity carefully and modify it, if necessary.
If a chain is broken, four question marks '????' are given at the break
point. Please REPLACE the question marks with the missing sequence, and
add any residues missing from the N- and C-termini. If a residue does not
have a standard one letter code (due to modification), its three letter
name should be given in parentheses.
NOTE: If all the residues are modified, the sequence may not be extracted.
Please add the sequence manually.
<molecule_entity_id="1" >
<molecule_entity_type="polypeptide(L)" >
<molecule_one_letter_sequence="
MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDT????TAIREISLLKELNHPNIVKLLDVIHTENKLY
LVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHSHRVLHRDLKPQNLLINTEGAIKLADFG
LARAFGVPVRTYTHEVVTLWYRAPEILLGCKYYSTAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFR
TLGTPDEVVWPGVTSMPDYKPSFPKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQ
DVTKPVP" >
< molecule_chain_id="A" >
< target_DB_id=" " > (if known)
<molecule_entity_id="2" >
<molecule_entity_type="polypeptide(L)" >
<molecule_one_letter_sequence="
QIYYSDKYDDEEFEYRHVMLPKDIAKLVPKTHLMSESEWRNLGVQQSQGWVHYMIHEPEPHILLFRRPLP
" >
< molecule_chain_id="B" >
< target_DB_id=" " > (if known)
<molecule_entity_id=" " >
<molecule_entity_type=" " >
<molecule_one_letter_sequence=" " >
<molecule_chain_id=" " >
<target_DB_id=" " > (if known)
================CATEGORY 2: Contact Authors=============================
Enter information about the contact authors.
Note: items marked by (e.g. ) are mandatory.
PI information should be always given.
1. Information about the Principal investigator (PI) should be given.
<contact_author_PI_id = "1 "> (must be given 1)
<contact_author_PI_salutation = " "> ( Dr./Prof./Mr./Mrs./Ms.)
<contact_author_PI_first_name = " "> (e.g. John)
<contact_author_PI_last_name = " "> (e.g. Rodgers)
<contact_author_PI_middle_name = " ">
<contact_author_PI_role = " "> (e.g. investigator/responsible scientist)
<contact_author_PI_organization_type = " "> (e.g. academic/commercial/government/other)
<contact_author_PI_email = " "> (e.g. name@host.domain.country)
<contact_author_PI_address = " "> (e.g. 610 Taylor road)
<contact_author_PI_city = " "> (e.g. Piscataway)
<contact_author_PI_State_or_Province = " "> (e.g. New Jersey)
<contact_author_PI_Zip_Code = " "> (e.g. 08864)
<contact_author_PI_Country = " "> (e.g. UNITED STATES)
<contact_author_PI_fax_number = " ">
<contact_author_PI_phone_numer = " ">
2. Information about other contact authors
<contact_author_id = "2 "> (e.g. 2,3,4..)
<contact_author_salutation = " ">
<contact_author_first_name = " ">
<contact_author_last_name = " ">
<contact_author_middle_name = " ">
<contact_author_role = " ">
<contact_author_organization_type = " ">
<contact_author_email = " ">
<contact_author_address = " ">
<contact_author_city = " ">
<contact_author_State_or_Province = " ">
<contact_author_Zip_Code = " ">
<contact_author_Country = " ">
<contact_author_fax_number = " ">
<contact_author_phone_numer = " ">
...(add more if needed)...
================CATEGORY 3: Structural Genomics========================
If this is a structural genomics project, give the information
<SG_project_id = " 1">
<SG_project_name = " "> (e.g. NPPSFA/PSI, Protein Structure Initiative)
<full_name_of_SG_center = " "> (e.g. Berkeley Structural Genomics Center)
================CATEGORY 4: Release Status==============================
Enter Release Status for Coordinates, Constraints, Sequence
Status for sequence should be chosen from one of the following:
(release now, hold for release)
Status for others should be chosen from one of the following:
(release now, hold for publication, hold for 4 weeks, hold for 6 weeks,
hold for 6 months, hold for 1 year)
<Release_status_for_coordinates = " ">
<Release_status_for_NMR_constraints = " ">
<Release_status_for_sequence = " ">
================CATEGORY 5: Title=======================================
Enter a title for the structure
<structure_title = " "> (e.g. Crystal Structure Analysis of the B-DNA)
<structure_details = " ">
================CATEGORY 6: Authors of Structure============================
Enter authors of the deposited structures (e.g. Surname, F.M.)
<structure_author_name = " ">
<structure_author_name = " ">
<structure_author_name = " ">
<structure_author_name = " ">
...add more if needed...
================CATEGORY 7: Citation Authors============================
Enter author names for the publications associated with this deposition.
The primary citation is the article in which the deposited coordinates
were first reported. Other related citations may also be provided.
1. For the primary citation
<primary_citation_author_name = " "> (e.g. Surname, F.M.)
<primary_citation_author_name = " ">
<primary_citation_author_name = " ">
<primary_citation_author_name = " ">
...add more if needed...
2. For other related citations (if applicable)
<citation_author_id = " "> (e.g. 1, 2 ..)
<citation_author_name = " ">
<citation_author_name = " ">
<citation_author_name = " ">
<citation_author_name = " ">
...add more if needed...
...(add more related citations if needed)...
================CATEGORY 8: Citation Article============================
Enter citation article (journal, title, year, volume, page)
If the citation has not yet been published, use 'To be published'
for the category 'journal_abbrev' and leave pages and volume blank.
1. For primary citation
<primary_citation_id = "primary">
<primary_citation_journal_abbrev = " "> (e.g. To be published)
<primary_citation_title = " ">
<primary_citation_year = " ">
<primary_citation_journal_volume = " ">
<primary_citation_page_first = " ">
<primary_citation_page_last = " ">
2. For other related citations (if applicable)
<citation_id = "1 "> (e.g. 1, 2, 3 ...)
<citation_journal_abbrev = " ">
<citation_title = " ">
<citation_year = " ">
<citation_journal_volume = " ">
<citation_page_first = " ">
<citation_page_last = " ">
...(add more citations if needed)...
================CATEGORY 9: Molecule Names==============================
Enter the name of the molecule for each entity
The name of the molecule should be obtained from the appropriate
sequence database reference, if available. Otherwise the gene name or
other common name of the entity may be used.
e.g. HIV-1 integrase for protein
RNA Hammerhead Ribozyme for RNA
The number of entities should be the same as in CATEGORY 2.
<molecule_name = " "> (entity 1)
<molecule_name = " "> (entity 2)
...(add more if needed)...
================CATEGORY 10: Molecule Details============================
Enter additional information about each entity, if known. (optional)
Additional information would include details such as fragment name
(if applicable), mutation, and E.C. number.
1. For entity 1
<Molecular_entity_id = "1 "> (e.g. 1, 2, ...)
<Fragment_name = " "> (e.g. ligand binding domain, hairpin)
<Specific_mutation = " "> (e.g. C280S)
<Enzyme_Comission_number = " "> (if known: e.g. 2.7.7.7)
2. For entity 2
<Molecular_entity_id = "2 ">
<Fragment_name = " ">
<Specific_mutation = " ">
<Enzyme_Comission_number = " ">
...(add more if needed)...
================CATEGORY 11: Genetically Manipulated Source==============
Enter data in the genetically manipulated source category
If the biomolecule has been genetically manipulated, describe its
source and expression system here.
1. For entity 1
<Manipulated_entity_id = "1 "> (e.g. 1, 2, ...)
<Source_organism_scientific_name = " "> (e.g. Homo sapiens)
<Source_organism_gene = " "> (e.g. RPOD, ALKA...)
<Expression_system_scientific_name = " "> (e.g. Escherichia coli)
<Expression_system_strain = " "> (e.g. BL21(DE3))
<Expression_system_vector_type = " "> (e.g. plasmid)
<Expression_system_plasmid_name = " "> (e.g. pET26)
<Manipulated_source_details = " "> (any other relevant information)
2. For entity 2
<Manipulated_entity_id = "2 ">
<Source_organism_scientific_name = " ">
<Source_organism_gene = " ">
<Expression_system_scientific_name = " ">
<Expression_system_strain = " ">
<Expression_system_vector_type = " ">
<Expression_system_plasmid_name = " ">
<Manipulated_source_details = " ">
...(add more if needed)...
================CATEGORY 12: Natural Source=============================
Enter data in the natural source category (if applicable)
If the biomolecule was derived from a natural source, describe it here.
1. For entity 1
<natural_source_entity_id = " "> (e.g. 1, 2, ...)
<natural_source_scientific_name = " "> (e.g. Homo sapiens)
<natural_source_organism_strain = " "> (e.g. DH5a, BMH 71-18)
<natural_source_details = " "> (e.g. organ, tissue, cell ..)
2. For entity 2
<natural_source_entity_id = " ">
<natural_source_scientific_name = " ">
<natural_source_organism_strain = " ">
<natural_source_details = " ">
...(add more if needed)...
================CATEGORY 13: Synthetic Source=============================
If the biomolecule was chemically synthesized, describe its source here.
1. For entity 1
<synthetic_source_entity_id = " "> (e.g. 1, 2, ...)
<synthetic_source_description = " "> (if known)
2. For entity 2
<synthetic_source_entity_id = " ">
<synthetic_source_description = " ">
...(add more if needed)...
================CATEGORY 14: Keywords===================================
Enter a list of keywords that describe important features of the deposited
structure.
For example, beta barrel, protein-DNA complex, double helix,
hydrolase, structural genomics etc.
<structure_keywords = " ">
================CATEGORY 15: Ensemble===================================
Enter data in category ensemble
Skip this section if only one average structure has been deposited.
<conformers_calculated_total_number = " "> (e.g. 200)
<conformers_submitted_total_number = " "> (e.g. 20)
<conformers_selection_criteria = " "> (e.g. 20 structures with the lowest energy)
================CATEGORY 16: Representative Conformers==================
Enter data in category representative conformers
Normally, only one member of the ensemble is selected as the representative
structure.
<conformer_id = " "> (e.g. 1,2..)
<conformer_selection_criteria = " "> (e.g. lowest energy, fewest violations)
================CATEGORY 17: Sample Details=============================
Enter a description of each NMR sample, including the solvent system used.
1. for sample 1.
<solution_id_1= "1 "> (e.g. 1, 2.. )
<solution_content_1= " "> (e.g. 50mM phosphate buffer NA; 90% H2O, 10% D2O)
<solvent_system_1= " "> (e.g. 90% H2O, 10% D2O )
2. for sample 2.
<solution_id_2= " ">
<solution_content_2= " ">
<solvent_system_2= " ">
....add more if needed....
================CATEGORY 18: Sample Conditions==========================
Enter experimental conditions used for each sample.
Each set of conditions is identified by a numerical code.
1. for sample 1.
<Conditions_id_1 = "1 "> (e.g. 1, 2..)
<Temperature_1 = " "> (e.g. 298) (in Kelvin)
<Pressure_1 = " "> (e.g. ambient, 1atm)
<pH_value_1 = " "> (e.g. 7.2)
<Ionic_strength_1 = " "> (e.g. 100 mM KCl)
2. for sample 2.
<Conditions_id_2 = " ">
<Temperature_2 = " ">
<Pressure_2 = " ">
<pH_value_2 = " ">
<Ionic_strength_2 = " ">
....add more if needed....
================CATEGORY 19: Spectrometer===============================
Enter the details about each spectrometer used to collect data.
1. for experiment 1:
<spectrometer_id_1 = "1 "> (e.g. 1, 2..)
<spectrometer_manufacturer_1 = " "> (e.g. Bruker ..)
<spectrometer_model_1 = " "> (e.g. DRX)
<spectrometer_field_strength_1 = " "> (e.g. 500, 700)
2. for experiment 2:
<spectrometer_id_2 = " ">
<spectrometer_manufacturer_2 = " ">
<spectrometer_model_2 = " ">
<spectrometer_field_strength_2 = " ">
....add more if needed....
================CATEGORY 20: Experiment Type============================
Enter information for those experiments that were used to generate
constraint data. For each NMR experiment, indicate which sample and
which sample conditions were used for the experiment.
1. for experiment type 1:
<experiment_type_id_1 = "1 "> (e.g. 1, 2..)
<solution_type_id_1= " 1"> (same ID as solution_id_1 in CATEGORY 17)
<conditions_type_id_1 = "1 "> (same ID as conditions_id_1 in CATEGORY 18)
<Experiment_type_1= " "> (e.g. 3D_15N-separated_NOESY)
2. for experiment type 2:
<experiment_type_id_2 = " "> (e.g. 1, 2..)
<solutio
|