Data formats for automated structure determination with SOLVE

Your choices about input data include:

Premerged or unmerged data?
One data file or more per dataset?
What data format? Intensities or amplitudes? (Scalepack, formatted, CCP4 mtz, d*trek)
An additional file with phases (e.g. from molecular replacement)?

These choices are discussed below in more detail. See also the SAMPLE SCRIPTS.

Should you merge your data to the asymmetric unit before running SOLVE?

SOLVE can read unmerged data or data merged to the asymmetric unit.

PREMERGED data is best if your data is already well scaled
UNMERGED data is best if your data has not been thoroughly scaled already

Can you input more than one data file for a native, derivative, or wavelength?

For each native, derivative, or wavelength dataset, you can input one or more separate data files.

If a dataset has just one data file, just read in the datafile
If a dataset consists of several data files, just read them in one after another

What data format? Amplitudes or intensities?

if you have DENZO-SCALEPACK output as your raw data...

...and the data is NOT MERGED to the asymmetric unit, you will use the flags:

READDENZO
UNMERGED
READ_INTENSITIES

if the data is ALREADY MERGED to the asymmetric unit, substitute the flag:

PREMERGED

if you have FREE-FORMAT intensities or amplitudes as your raw data...

...and the data looks like: H K L I SIGMA, use the flags

READFORMATTED
UNMERGED
READ_INTENSITIES

if the data looks like: H K L I+ SIGMA+ I- SIGMA-, substitute the flag:

PREMERGED

if you have free-format F(hkl) instead of intensities:

substitute the flag READ_AMPLITUDES
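
The flag choices for free-format data described above can be summarized in a short Python sketch. The helper name suggest_formatted_flags and its amplitudes parameter are hypothetical and not part of SOLVE. The code only counts whitespace-separated columns in one record and assumes the file follows one of the two layouts listed above. It cannot tell intensities from amplitudes, so the caller supplies that choice.

```python
def suggest_formatted_flags(first_record, amplitudes=False):
    """Suggest SOLVE flags from the column count of one free-format record.

    Hypothetical helper: it applies only the rules stated in this section.
    """
    n_columns = len(first_record.split())
    if n_columns == 5:
        # Layout: H K L I SIGMA
        merge_flag = "UNMERGED"
    elif n_columns == 7:
        # Layout: H K L I+ SIGMA+ I- SIGMA-
        merge_flag = "PREMERGED"
    else:
        raise ValueError(f"expected 5 or 7 columns, found {n_columns}")

    # The file itself does not say whether it holds I or F(hkl).
    data_flag = "READ_AMPLITUDES" if amplitudes else "READ_INTENSITIES"
    return ["READFORMATTED", merge_flag, data_flag]


if __name__ == "__main__":
    # Example record with made-up numbers, for illustration only.
    print("\n".join(suggest_formatted_flags("1 0 0 1523.4 32.1")))
```

Checking a single record is a deliberate simplification; a more careful version would verify that every record in the file has the same number of columns.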

if you have a CCP4 MTZ file with amplitudes scaled and reduced to the asymmetric unit as your raw data...

You will have to make sure that this mtz file contains only the data you want and not lots of other columns of data
Note what you have called your data columns
The column names that SOLVE will want assigned are:
- MAD data:
  - FPH1 (amplitude at wavelength 1)
  - SIGFPH1 (sigma of FPH1)
  - DPH1 (anomalous difference wavelength 1)
  - SIGDPH1 (sigma of DPH1)
  - FPH2 (etc for wavelength 2, 3 ...)
- MIR data:
  - FP (amplitude for native)
  - SIGFP (sigma of FP)
  - FPH1 (amplitude for deriv 1)
  - SIGFPH1 (sigma of FPH1)
  - DPH1 (anomalous difference deriv 1)
  - SIGDPH1 (sigma of DPH1)
  - FPH2 (etc for derivs 2, 3 ...)

Use the flags LABIN and HKLIN to tell SOLVE how to read your mtz file. You can use multiple LABIN statements if the assignments do not fit on one line. As an example, suppose the input file is input.mtz and its columns are named as follows: native F is FP and its sigma is SIG; derivative F is FHG and its sigma is SIGHG; the anomalous difference for the derivative is DELHG and its sigma is SIGDELHG. The statements are then:

LABIN FP=FP SIGFP=SIG FPH1=FHG SIGFPH1=SIGHG
LABIN DPH1=DELHG SIGDPH1=SIGDELHG
HKLIN input.mtz
NOTE: case matters here, so use uppercase letters unless your column names are lowercase.

SOLVE figures out if this is MIR or MAD data based on whether or not you define FP and SIGFP.
When SOLVE reads the HKLIN statement it will read in the file using the information in all previous LABIN statements. HKLIN can be specified only once in a SOLVE run.
You do not need to input cell dimensions or space group if you use HKLIN. The values read from the mtz file are used unless you change them with a keyword after the HKLIN statement. SOLVE also writes a symmetry file to the local directory, based on the symmetry information in the mtz file and named after the space group; you can use it later if you wish.
NOTE: remove the SCALE_MAD command from your script file as your data is assumed to be scaled already
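
The LABIN and HKLIN conventions above can be sketched in Python. The functions labin_lines and experiment_type are hypothetical helpers, not part of SOLVE, and the limit of four assignments per LABIN line is an arbitrary choice for readability. The column names reuse the sample statement from this section. The MIR/MAD test simply mirrors the rule stated above (FP and SIGFP defined means MIR); it is not SOLVE's actual implementation.

```python
def labin_lines(columns, per_line=4):
    """Format {SOLVE name: mtz column name} as one or more LABIN statements."""
    pairs = [f"{solve_name}={mtz_name}" for solve_name, mtz_name in columns.items()]
    return [
        "LABIN " + " ".join(pairs[i:i + per_line])
        for i in range(0, len(pairs), per_line)
    ]


def experiment_type(columns):
    """Return "MIR" if both FP and SIGFP are assigned, otherwise "MAD"."""
    return "MIR" if "FP" in columns and "SIGFP" in columns else "MAD"


# Column names taken from the sample LABIN statement in this section.
columns = {
    "FP": "FP",
    "SIGFP": "SIG",
    "FPH1": "FHG",
    "SIGFPH1": "SIGHG",
    "DPH1": "DELHG",
    "SIGDPH1": "SIGDELHG",
}

# All LABIN statements must come before the single HKLIN statement.
script = labin_lines(columns) + ["HKLIN input.mtz"]

if __name__ == "__main__":
    print("\n".join(script))
    print("! treated as", experiment_type(columns), "data")
```

Emitting HKLIN last reflects the statement above that SOLVE reads the file using all previous LABIN statements.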

if you have a set of CCP4 MTZ files with unmerged intensities (columns labelled I and SIGI, i.e. the equivalent of LABIN I=I SIGI=SIGI), use the flag:
- READCCP4_UNMERGED !(instead of readdenzo or readformatted or readtrek)
- Enter data file names just as for readdenzo or premerged
- You may not specify a LABIN line with this option. Your mtz file must contain I and SIGI as the column labels.

if you have a d*TREK file with intensities as your raw data...

use the flag READTREK (just one flag needed)

What if I have phases from molecular replacement?

If you have an "mtz" file containing FC, PHIC, and FOM, then specify the following (myFC is your column name for FC, and so on):
PHASES_LABIN FC=myFC PHIC=myPHIC FOM=myFOM
PHASES_MTZ xxxx.mtz
If you have a formatted file with H K L FC PHIC FOM (one record per line; there can be text in between the numbers, such as in CNS or X-PLOR formatted files), then specify:
PHASES_FORMATTED xxxxx.fmt
That's it. Put these lines somewhere in your input file before "SOLVE" and SOLVE will read in these phases and use them in initial difference Fouriers to find sites.
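
If your phases come from another program, a formatted file of the kind described above can be produced with a few lines of Python. This is only a sketch: the function names, the number of decimal places, and the output file name model_phases.fmt are assumptions made for illustration. The source states only that each record holds H K L FC PHIC FOM on one line.

```python
def format_phase_record(h, k, l, fc, phic, fom):
    """Return one record: H K L FC PHIC FOM, separated by spaces."""
    # Decimal precision is an arbitrary choice, not a SOLVE requirement.
    return f"{h} {k} {l} {fc:.3f} {phic:.2f} {fom:.3f}"


def write_phases(path, reflections):
    """Write an iterable of (h, k, l, fc, phic, fom) tuples, one per line."""
    with open(path, "w") as handle:
        for reflection in reflections:
            handle.write(format_phase_record(*reflection) + "\n")


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = [(1, 0, -2, 123.456, 45.0, 0.87), (2, 1, 3, 98.7, 270.5, 0.65)]
    write_phases("model_phases.fmt", example)
```

The resulting file would then be named in a PHASES_FORMATTED line as shown above.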