The routine SOLVE: the core of automated structure determination by SOLVE
The SOLVE routine is an exceptionally powerful feature of this package that can find and evaluate the quality of heavy-atom sites in a MIR, SIR, or MIR-like dataset. The SOLVE routine treats MAD data almost exactly like MIR data, beginning with the output from MADMRG and MADBST.
Ordinarily SOLVE is called after SCALE_MAD and ANALYZE_MAD or SCALE_MIR and ANALYZE_MIR as part of automated structure determination. In this case you don't have to worry about all the keywords because the previous routines figure them out for you and write them to the script file solve_mad.script (or solve_mir.script).
You can, however, control much of what SOLVE does by setting keywords before running it. SOLVE can also be called using the solve_mad.script or solve_mir.script file written out by ANALYZE_MAD or ANALYZE_MIR, or an edited version of this file.
For MAD datasets, SOLVE uses a "compressed" form of MAD data that can be analyzed much more rapidly than the full n-wavelength data. This compressed dataset is generated by MADMRG in ANALYZE_MAD. It essentially consists of the SIR+anomalous scattering equivalent of the full MAD dataset, and it can be used to refine heavy-atom parameters and generate native phases more quickly than the full MAD dataset can. At the conclusion of SOLVE, phases are calculated with full Bayesian correlated MAD phasing.
The SOLVE routine operates by using a new version of HASSP to generate a few or many possible "seed" solutions for the anomalously scattering atoms in the structure. The heavy-atom parameters in each seed are first refined using the very fast refinement procedure in HEAVY (origin-removed Patterson refinement). The refined seed is then used in self-difference Fouriers to suggest possible additional sites. A number of solutions are scored based on each seed, each solution being evaluated based on both the difference Patterson and a "free" difference Fourier. Additionally, the non-randomness of the native Fourier is used to judge the quality of a solution and to identify the correct hand of the structure if anomalous data are present. The figure of merit of phasing is the final scoring criterion.
If desired, a solution may be read in and evaluated directly with ANALYZE_SOLVE. Also, a solution may be read in and used as a seed in generating additional sites and a more complete solution with ADDSOLVE.
Using SOLVE is quite easy, particularly since ANALYZE_MAD or ANALYZE_MIR writes out a script file (usually solve_mad.script or solve_mir.script) that has everything you need to run SOLVE.
The only really non-obvious thing you need to know about running SOLVE on MAD data is that it requires 2 input data files. One is the compressed datafile from MADMRG, usually called "solve.data". The other is the full MAD dataset, usually called "mad_fpfm.scl". SOLVE uses "solve.data" for most of its analyses, then switches to the full MAD dataset at the very end.
Scattering factors are entered a little differently in the SOLVE routine than in SCALE_MAD and ANALYZE_MAD. In the SOLVE routine you define an atom type for each wavelength and specify the scattering factors for that atom type. Then you tell SOLVE which atom type goes with which wavelength. In SCALE_MAD, in contrast, you specified scattering factors directly for each wavelength. The reason for the difference is that SOLVE has to deal with both MAD and MIR data, and defining atom types is a simple way to do that.
The solve_mad.script control file for MAD data
The sample SOLVE script file below shows what you need to specify and what else you can specify. This script is an edited version of a script file written out by the ANALYZE_MAD routine.
This script file is written out during automated SOLVE operation. You may wish to edit the one SOLVE has written out for you and use it if:
!------------------solve_mad.script: solve a MAD problem----------------------
@solve.setup
LOGFILE solve.logfile
INFILE solve.data                  !input file with MADMRG-compressed data
MADFPFMFILE mad_fpfm.scl           !input file with full MAD dataset
JSTD 1                             ! Lambda 1 is reference wavelength used in MADMRG
IMADPHASE 1                        ! this is a MAD dataset, reference
                                   ! wavelength is #1 (should match jstd)
NNATF 1                            ! Pseudo-native F is column 1 of solve.data
NNATS 2                            ! sigma is column 2
! Atom definitions with f' and f" values for the 3 wavelengths:
NEWATOMTYPE LAM1
AVAL 17.0006 5.8196 3.9731 4.3543
BVAL 2.4098 .2726 15.2372 43.8163
CVAL 2.8409
FPRIMV -1.6
FPRPRV 3.4
NEWATOMTYPE LAM2
AVAL 17.0006 5.8196 3.9731 4.3543
BVAL 2.4098 .2726 15.2372 43.8163
CVAL 2.8409
FPRIMV -8.5
FPRPRV 4.8
NEWATOMTYPE LAM3
AVAL 17.0006 5.8196 3.9731 4.3543
BVAL 2.4098 .2726 15.2372 43.8163
CVAL 2.8409
FPRIMV -9.85
FPRPRV 2.86
LAMBDA 1                           ! This is wavelength #1
LABEL Wavelength 1 from MADMRG     ! label for lambda 1
NCOLFBAR 3                         ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 4                        ! in solve.data (MADMRG-compressed)
NCOLDELF 5                         ! datafile
NCOLSDELF 6
INPHASE
INANO
NOREFINESCALE                      ! Don't refine overall scale factor
                                   ! because this is MADMRG data
! Information for MADPHASE:
NCOLFPLUS 1                        ! these 4 column numbers refer to the
NCOLSIGPLUS 2                      ! full MAD datafile (mad_fpfm.scl)
NCOLFMINUS 3
NCOLSIGMINUS 4
! Heavy atoms for this wavelength:
ATOMNAME LAM1                      ! "LAM1" tells the program to use
OCCUPANCY .1                       ! the scattering factors input above for
BVALUE 35.0                        ! LAM1
REFINEALL                          ! the occupancy and b values are guesses
LAMBDA 2
LABEL Wavelength 2 from MADMRG
NCOLFBAR 3
NCOLSFBAR 4
NCOLDELF 5
NCOLSDELF 6
INPHASE
INANO
! Information for MADPHASE:
NCOLFPLUS 5
NCOLSIGPLUS 6
NCOLFMINUS 7
NCOLSIGMINUS 8
! Heavy atoms for this derivative/wavelength:
ATOMNAME LAM2
LAMBDA 3
LABEL Wavelength 3 from MADMRG
NCOLFBAR 3
NCOLSFBAR 4
NCOLDELF 5
NCOLSDELF 6
INPHASE
INANO
! Information for MADPHASE:
NCOLFPLUS 9
NCOLSIGPLUS 10
NCOLFMINUS 11
NCOLSIGMINUS 12
! Heavy atoms for this derivative/wavelength:
ATOMNAME LAM3
! Information for HASSP and SOLVE
NCOLFHCOS 9                        ! column #s for <fh cos theta>
NCOLFHSIN 10                       ! and <fh sin theta> in solve.data
PATTFFTFILE patterson.patt         ! name of Bayesian patterson calculated
                                   ! by MADBST
SOLVE                              ! run SOLVE
!---------------------------------------------------------------------------
The solve_mir.script control file for MIR data
Using SOLVE is quite easy with MIR data too, particularly since ANALYZE_MIR writes out a script file that has everything you need to run SOLVE. The sample SOLVE script file below shows what you need to specify and what else you can specify. This script is an edited version of a script file written out by the ANALYZE_MIR routine.
This script file is written out during automated SOLVE operation. You may wish to edit the one SOLVE has written out for you and use it if:
!------------------solve_mir.script: solve an MIR problem----------------------
@solve.setup
LOGFILE solve.logfile
INFILE mir_fbar.scl                !input file with Fnat,sig, and
                                   !(fbar,sig,delano,sig) for each derivative..
NNATF 1                            ! Native F is column 1 of mir_fbar.scl
NNATS 2                            ! sigma is column 2
Derivative 1                       ! begin information about derivative 1
LABEL deriv 1 HG                   ! label for deriv 1
NCOLFBAR 3                         ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 4                        ! in mir_fbar.scl datafile
NCOLDELF 5
NCOLSDELF 6
INANO                              ! include anomalous differences
! Heavy atoms for this derivative:
ATOMNAME HG                        ! the atom type is "HG"
OCCUPANCY .1                       ! guess for occupancy
BVALUE 35.0                        ! guess for bvalue
REFINEALL                          ! refine everything that is reasonable
Derivative 2                       ! begin information about derivative 2
LABEL deriv 2 Iodine               ! label for deriv 2
NCOLFBAR 7                         ! Ncolfbar...ncolsdelf are column #'s
NCOLSFBAR 8                        ! in mir_fbar.scl datafile
NCOLDELF 9
NCOLSDELF 10
INANO                              ! include anomalous differences
ATOMNAME I-                        ! the atom type is "I-"
OCCUPANCY .1                       ! guess for occupancy
BVALUE 35.0                        ! guess for bvalue
REFINEALL                          ! refine everything that is reasonable
SOLVE                              ! run SOLVE
!---------------------------------------------------------------------------
There are a lot of keywords that can affect what SOLVE does. Ordinarily you do not have to worry about most of these because they are all set for you in ANALYZE_MAD. The solve_mad.script file written out by ANALYZE_MAD or the solve_mir.script file written by ANALYZE_MIR will have most of these keywords set for you. The keywords are listed here so that you can understand what they do and so that you can set them if you want to.
Most of these keywords can be specified at the beginning of automated data analysis to control what happens when SOLVE is called. For example, typing "ntopsolve 2" in the keywords before running SCALE_MAD and ANALYZE_MAD will affect SOLVE when it is called by restricting the number of solutions analyzed at the end of the routine to 2.
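As a rough illustration of the "ntopsolve 2" example, the fragment below sketches where such a line might sit in an automated MAD run. It assumes that SCALE_MAD and ANALYZE_MAD are invoked by name in the same way that SOLVE is invoked at the end of the sample scripts, and it leaves out all the other keywords those routines need.

```
! ... the usual keywords for SCALE_MAD and ANALYZE_MAD go here ...
ntopsolve 2        ! when SOLVE is called, analyze only 2 solutions
                   ! at the end of the routine
SCALE_MAD          ! assumed invocation, by analogy with "SOLVE"
ANALYZE_MAD        ! assumed invocation, by analogy with "SOLVE"
```

The point of the sketch is only the ordering: the keyword is set before the automated routines are run, so it is already in effect when SOLVE is called.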
SOLVE treats MAD phasing and MIR phasing in almost exactly the same way except at the very end of the routine. Consequently "derivative" and "lambda" have the same meaning to SOLVE. You can enter information about lambda 1 by typing "lambda 1" or "derivative 1". The keywords that are specific to MAD phasing are listed at the top of the list.
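For instance, based on the equivalence described above, either of the two lines in this sketch should open the block of information for dataset 1. A real script would contain only one of them; both are shown here purely for comparison.

```
! Either form begins the information for wavelength/derivative 1:
LAMBDA 1           ! form used in the sample solve_mad.script
! ...or, with the same meaning to SOLVE:
DERIVATIVE 1       ! form used in the sample solve_mir.script
```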
Keywords that have a meaning for MAD data but not for MIR data:
INFILE xxx.data
    Principal input dorgbn-style file with compressed MAD data from MADMRG and optional additional columns of data (usual file name = "solve.data"). This file is usually produced by ANALYZE_MAD.
MADFPFMFILE yyy.scl
    Additional input file with (F+,sigma,F-,sigma) for each wavelength will be yyy.scl. This file is used at the very end of SOLVE for Bayesian correlated MAD phasing if the keyword "bayes" is set in ANALYZE_MAD or the keyword "imadphase n" is set in SOLVE. All the wavelengths have "inphase" specified for this work. (DEFAULT="mad_fpfm.scl")
JSTD n
    Wavelength to be used as reference (default = lowest wavelength).
IMADPHASE n
    This is a MAD dataset; n should match JSTD n.
NOREFINESCALE
    Usually include this for all wavelengths, because the refinements in SOLVE are based on MADMRG output, which should not be further refined.
If xx is not recognized by SOLVE you need to specify instead:
Keywords that apply to both MAD and MIR data:
NNATF n
    Column # in "infile" for native F (pseudo-native for MAD).
NNATS n
    Column # in "infile" for sigma of native F.
gotoderiv n
    Go to derivative (wavelength) n and get ready to read some modifications of the parameters for this wavelength.
gotoatom n
    Go to the n'th atom in this wavelength/derivative and get ready to read some modifications of its parameters.
LABEL xxxxxx
    Label for this wavelength/derivative.
NCOLFBAR n
    Column # for Fbar for this wavelength/derivative. For MAD data, this and the next three values are only needed for the one wavelength defined by JSTD. For MIR data, they are needed for all derivatives.
NCOLSFBAR n
    Column # for sigma of Fbar.
NCOLDELF n
    Column # for delAno (if INANO is specified).
NCOLSDELF n
    Column # for sigma of delAno.
NCOLFHCOS xx
    Column # in "infile" for the estimated heavy-atom structure factor component along the native structure factor (output from MADBST for MAD data). This will be used in the calculation of heavy-atom difference Fouriers if ncolfhsin is also specified. For MIR data, you can specify which derivative this applies to by replacing the "1" in "ncolfhcos(1)" with another derivative number. NCOLFHCOS is equivalent to NCOLFHCOS(1).
NCOLFHSIN xx
    Column # in "infile" for the estimate of the heavy-atom structure factor component perpendicular to the native structure factor. See ncolfhcos. NCOLFHSIN is equivalent to NCOLFHSIN(1).
PATTFFTFILE xxxxxx
    MAD data: use the previously calculated Patterson FFT xxxxxx as the Patterson map for the anomalously scattering atoms in this MAD structure. MIR data: use Patterson FFT xxxxxx as the Patterson map for derivative #1. PATTFFTFILE is equivalent to PATTFFTFILE(1). (For other derivatives, change the "1" to the appropriate derivative number.)
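As a sketch of the derivative-number notation described above for MIR data, the fragment below applies these three keywords to derivative 2 instead of derivative 1. The "keyword(n) value" layout is inferred from that description rather than taken from a sample script, and the column numbers and file name are hypothetical placeholders.

```
NCOLFHCOS(2) 11               ! column # for <fh cos theta>, derivative 2
                              ! (11 is a hypothetical column number)
NCOLFHSIN(2) 12               ! column # for <fh sin theta>, derivative 2
                              ! (12 is a hypothetical column number)
PATTFFTFILE(2) deriv2.patt    ! Patterson FFT to use for derivative 2
                              ! (deriv2.patt is a hypothetical file name)
```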
Also see all the commonly-used keywords for SOLVE.