Data formats for automated structure determination with SOLVE
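Your choices about input data include whether to merge the data, how many
data files to read for each dataset, and which data format to use.
These choices are discussed below in more detail. See also the SAMPLE
SCRIPTS.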
Should you merge your data to the asymmetric unit before running
SOLVE?
SOLVE can read unmerged data or data merged to the asymmetric unit.
- PREMERGED data is best if your data is already well scaled
- UNMERGED data is best if your data has not been thoroughly scaled already
Can you input more than one data file for a native, derivative, or
wavelength?
For each native, derivative, or wavelength dataset, you can input one or
more separate data files.
- If a dataset has just one data file, read in that data file
- If a dataset consists of several data files, read them in one after another
What data format? Amplitudes or intensities?
if you have DENZO-SCALEPACK output as your raw data...
...and the data is NOT MERGED to the asymmetric unit, you will use the
flags:
- READDENZO
- UNMERGED
- READ_INTENSITIES
if the data is ALREADY MERGED to the asymmetric unit, substitute the flag:
if you have FREE-FORMAT intensities or amplitudes as your raw data...
...and the data looks like: H K L I SIGMA, use the flags:
- READFORMATTED
- UNMERGED
- READ_INTENSITIES
if the data looks like: H K L I+ SIGMA+ I- SIGMA-, substitute the flag:
if you have free-format F(hkl) instead of intensities:
- substitute the flag READ_AMPLITUDES for READ_INTENSITIES
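
Before choosing flags it can help to check which of the two free-format layouts described above a file actually uses. The following is a minimal Python sketch, not part of SOLVE. The function name record_layout is hypothetical, and it assumes whitespace-separated fields with integer indices; SOLVE's own free-format reader may accept more than this.

```python
def record_layout(line):
    """Classify one free-format reflection record by its number of fields.

    Returns the layout string used in the text, or None if the line does
    not look like a reflection record. Assumes whitespace-separated fields
    and integer H K L (an assumption of this sketch, not a SOLVE rule).
    """
    fields = line.split()
    try:
        # The first three fields must be the integer indices H K L.
        h, k, l = (int(x) for x in fields[:3])
        # Every remaining field must be a number (I, SIGMA, ...).
        [float(x) for x in fields[3:]]
    except ValueError:
        return None
    if len(fields) == 5:
        return "H K L I SIGMA"
    if len(fields) == 7:
        return "H K L I+ SIGMA+ I- SIGMA-"
    return None


if __name__ == "__main__":
    print(record_layout("1 2 3 100.5 4.2"))
    print(record_layout("1 2 3 100.5 4.2 98.1 4.0"))
```

Running this over the first few lines of a data file shows which layout applies before you write the script.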
if you have a CCP4 MTZ file with amplitudes scaled and reduced to the
asymmetric unit as your raw data...
- You will have to make sure that this mtz file contains only the data you want and not lots of other columns of data
- Note what you have called your data columns
- The column names that SOLVE will want assigned are:
- MAD data:
  - FPH1 (amplitude at wavelength 1)
  - SIGFPH1 (sigma of FPH1)
  - DPH1 (anomalous difference wavelength 1)
  - SIGDPH1 (sigma of DPH1)
  - FPH2 (etc for wavelength 2, 3 ...)
- MIR data:
  - FP (amplitude for native)
  - SIGFP (sigma of FP)
  - FPH1 (amplitude for deriv 1)
  - SIGFPH1 (sigma of FPH1)
  - DPH1 (anomalous difference deriv 1)
  - SIGDPH1 (sigma of DPH1)
  - FPH2 (etc for derivs 2, 3 ...)
use the flags LABIN and HKLIN to tell SOLVE how to read your mtz file.
You can use multiple LABIN statements if you can't fit it all on one line.
For example, suppose the input file is input.mtz, the native F is called FP
and its sigma SIG, the deriv F is called FHG and its sigma SIGHG, and the
anom diff for the deriv is called DELHG and its sigma SIGDELHG. The
statements are then:
- LABIN FP=FP SIGFP=SIG FPH1=FHG SIGFPH1=SIGHG
- LABIN DPH1=DELHG SIGDPH1=SIGDELHG
- HKLIN input.mtz
- NOTE: use uppercase letters (unless your column names are lowercase) because
case matters here
- SOLVE figures out if this is MIR or MAD data based on whether or not you define FP and SIGFP.
- When SOLVE reads the HKLIN statement it will read in the file using the
information in all previous LABIN statements. HKLIN can be specified only
once in a SOLVE run.
- You do not need to input cell dimensions or space group if you use HKLIN.
The values read from the mtz file are used unless you change them with
a keyword after the HKLIN statement. SOLVE writes out a symmetry file in
the local directory based on the symmetry information in the mtz file that
you can use later if you wish. It is named with the space group name.
- NOTE: remove the SCALE_MAD command from your script file as your data is
assumed to be scaled already
- if you have a set of CCP4 MTZ files with unmerged intensities (LABIN I=I SIGI=SIGI), use the flag:
- READCCP4_UNMERGED !(instead of readdenzo or readformatted or readtrek)
- Enter data file names just as for readdenzo or premerged
- You may not specify a LABIN line with this option. Your mtz file must contain I and SIGI as the column labels.
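
The LABIN and HKLIN statements described above for merged MTZ amplitudes can be assembled from a table of column names. The following is a minimal Python sketch, not part of SOLVE. The helper names labin_lines, data_type and mtz_input are hypothetical, and the limit of four assignments per LABIN line is only a choice made here to mirror the sample statement.

```python
def labin_lines(columns, per_line=4):
    """Build LABIN statements from {SOLVE name: your MTZ column label}.

    Assignments are split over several LABIN lines, per_line at a time,
    since the text allows multiple LABIN statements.
    """
    pairs = [f"{solve}={label}" for solve, label in columns.items()]
    return ["LABIN " + " ".join(pairs[i:i + per_line])
            for i in range(0, len(pairs), per_line)]


def data_type(columns):
    """Mirror the rule in the text: MIR if FP and SIGFP are defined, else MAD."""
    return "MIR" if "FP" in columns and "SIGFP" in columns else "MAD"


def mtz_input(columns, mtz_file):
    """All LABIN lines first, then a single HKLIN line, as the text requires."""
    return labin_lines(columns) + [f"HKLIN {mtz_file}"]


if __name__ == "__main__":
    # Column labels taken from the sample statement in the text.
    columns = {"FP": "FP", "SIGFP": "SIG",
               "FPH1": "FHG", "SIGFPH1": "SIGHG",
               "DPH1": "DELHG", "SIGDPH1": "SIGDELHG"}
    print(data_type(columns))
    print("\n".join(mtz_input(columns, "input.mtz")))
```

HKLIN is written last because SOLVE reads the file using the LABIN statements that precede it. Labels are copied exactly as given because case matters.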
if you have a d*TREK file with intensities as your raw data...
use the flag READTREK (just one flag needed)
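
The flag combinations named in this section can be collected in a small lookup table. The following is a minimal Python sketch, not part of SOLVE. The case names and the function flags_for are hypothetical, only cases whose flags are spelled out above are included, and the free-format amplitude entry assumes that READ_AMPLITUDES takes the place of READ_INTENSITIES.

```python
# Flags named in the text, keyed by a hypothetical short case name.
FLAGS = {
    "denzo_unmerged_intensities":
        ["READDENZO", "UNMERGED", "READ_INTENSITIES"],
    "formatted_unmerged_intensities":
        ["READFORMATTED", "UNMERGED", "READ_INTENSITIES"],
    # Assumption: READ_AMPLITUDES replaces READ_INTENSITIES here.
    "formatted_unmerged_amplitudes":
        ["READFORMATTED", "UNMERGED", "READ_AMPLITUDES"],
    "ccp4_unmerged_intensities":
        ["READCCP4_UNMERGED"],
    "dtrek_intensities":
        ["READTREK"],
}


def flags_for(case):
    """Return the list of flags for one case, or raise a readable error."""
    try:
        return list(FLAGS[case])
    except KeyError:
        known = ", ".join(sorted(FLAGS))
        raise KeyError(f"unknown case {case!r}; known cases: {known}")


if __name__ == "__main__":
    for name in FLAGS:
        print(name, "->", " ".join(flags_for(name)))
```

Merged MTZ amplitudes are left out because they are read with LABIN and HKLIN statements rather than with these flags.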
What if I have phases from molecular
replacement?
- If you have an "mtz" file containing FC PHIC FOM then specify (myFC is your column name for FC, etc):
- PHASES_LABIN FC=myFC PHIC=myPHIC FOM=myFOM
- PHASES_MTZ xxxx.mtz
- If you have a formatted file with H K L FC PHIC FOM (one record per line; there can be text in between the numbers, such as in CNS or X-PLOR formatted files), then specify:
- PHASES_FORMATTED xxxxx.fmt
- That's it. Put these lines somewhere in your input file before "SOLVE" and SOLVE will read in these phases and use them in initial difference Fouriers to find sites.
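
To illustrate what "text in between the numbers" means for a formatted phase file, the following is a minimal Python sketch that pulls H K L FC PHIC FOM out of one record. It is not SOLVE's reader. The function name parse_phase_record is hypothetical, the labelled sample line is made up in the style of a CNS or X-PLOR file, and the sketch assumes that the text labels themselves contain no digits.

```python
import re

# A signed integer or decimal number, with an optional exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_phase_record(line):
    """Return (h, k, l, fc, phic, fom) from one record, or None.

    All numbers on the line are collected in order and any text between
    them is ignored. Lines without exactly six numbers are skipped.
    Assumes labels contain no digits (an assumption of this sketch).
    """
    numbers = NUMBER.findall(line)
    if len(numbers) != 6:
        return None
    h, k, l = (int(float(x)) for x in numbers[:3])
    fc, phic, fom = (float(x) for x in numbers[3:])
    return (h, k, l, fc, phic, fom)


if __name__ == "__main__":
    # Plain record, then a made-up labelled record.
    print(parse_phase_record("1 2 3 100.0 45.0 0.9"))
    print(parse_phase_record("INDE 1 -2 3 FCALC= 152.4 PHAS= 37.5 FOM= 0.82"))
```

A check like this on a few lines is a quick way to confirm that each record holds exactly the six expected values before passing the file to PHASES_FORMATTED.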