HEAVY: Heavy atom refinement and phasing
HEAVY is a general-purpose heavy atom refinement routine. It can be used to carry out either phase refinement or origin-removed Patterson refinement, as well as to calculate coefficients for native Fourier and difference Fourier maps. Ordinarily you will use HEAVY as part of an automated structure solution with SOLVE. In this case SOLVE will write out a "phases-hl.script" script file that you can edit and use for further refinement. This should usually mean that you will not have to generate the rather lengthy inputs to HEAVY by yourself.
Keywording inputs are most conveniently entered using a script file. Note that any values previously defined do not need to be specified. If you run HEAVY a second time without quitting the main program and do not specify any new parameters, the routine will start where it left off and carry out another set of refinements of the same type that you specified the last time you ran it. Note also that average residuals are maintained throughout. This means that if you want to refine a completely new set of data, you should start the program over.
An easy way to get a file containing all the keywords you can use for HEAVY is to edit the file generated by running HEAVY specifying "newfile heavy.new".
There are 3 ways to phase MAD data using HEAVY.
(1) You can use MADMRG to compress MAD data to data that look like SIR+anomalous data at one wavelength (e.g., L1), then refine heavy atom parameters and phase just as if you did have SIRAS data. This gives you F and phase for the structure without the anomalously scattering atoms.
(2) You can use MADMRG and then Bayesian phasing. This is what SOLVE does. In this case you convert MAD data to SIRAS using MADMRG, then refine heavy atom parameters all as in #1. Then you use these heavy atom parameters with the original MAD data to phase it using Bayesian MAD phasing. As you have already refined the heavy atom parameters at L1, you do not need to redo the refinement at L2, L3 because they are all the same. You simply run HEAVY again, specifying the keyword IMADPHASE n (where n=the wavelength # for which heavy atom structure factors are to be calculated), and specifying REFINENONE for each heavy atom site. You also need to specify a new input file that has (instead of the MADMRG data or FBAR,DELANO data) the complete F+,sig+,F-,sig- scaled MAD data. This can be the data file used to create the Fbar,delano data file. You specify column numbers with NCOLFPLUS, NCOLSIGPLUS, NCOLFMINUS, NCOLSIGMINUS. Derivative 1 is now your L1 data, derivative 2, L2, etc. You include the refined heavy atom parameters for your L1 derivative and dummy atoms with the correct scattering factors for the L2 and L3 derivatives. The dummy atoms are used to calculate scattering at these other wavelengths; they are not used in phasing per se. You don't even have to put in any occupancies or xyz for these atoms. In this way, you have 1 set of heavy atom parameters that are applied to all 3 wavelengths. If your heavy atoms are not selenium at L1, L2, and L3, you will need to use the NEWATOMTYPE keyword to input their scattering factors.
The SOLVE routine with MAD data does all this without even leaving SOLVE. You can also run HEAVY once after MADMRG, then edit the HEAVY.NEW (or whatever you have called it) file as described above, then run HEAVY again with IMADPHASE specified.
If you specify IMADPHASE 1 then your output map will be calculated at the lambda of "derivative" 1, if you specify n then it will be at the wavelength of dataset n. The phasing will be the same in any case. Note that the value of "n" you specify will determine which heavy atom values are used in the MAD phasing calculation. If you specify "2" then the heavy atom parameters that you type in for derivative 2 will be used.
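For concreteness, a minimal keyword sketch of this second HEAVY run might look like the following (the column numbers here are hypothetical placeholders; the input data file and the heavy atom/dummy atom blocks are set up as described above, with REFINENONE specified for each site):

```
IMADPHASE 1        ! calculate heavy atom structure factors at wavelength 1
NCOLFPLUS 4        ! hypothetical column numbers for the scaled
NCOLSIGPLUS 5      !   F+, sig+, F-, sig- MAD data
NCOLFMINUS 6
NCOLSIGMINUS 7
HEAVY              ! initiate the phasing run
```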
(3) Refinement as if the MAD data were MIR data. In this case, you choose one wavelength (the one with a small f" and the most negative f' usually, often called "L1") as "native" and treat the other wavelength data as derivatives. In this case you will need to define new "atoms" with the NEWATOMTYPE keyword that have values of f" that are actual values, but values of f' that are the difference between the value at that wavelength and the value at L1, and values of all the other parts (a1, b1, etc) of zero. For example, if the L1 values of f' and f" are -9.6 and 2.2, and at L2 they are -7.6 and 5.8, then you need a new atom type as follows for the f' difference:
NEWATOMTYPE L2L1 AVAL 0.00000000 0.00000000 0.00000000 0.00000000 BVAL 0.00000000 0.00000000 0.00000000 0.00000000 CVAL 0.00000000 FPRIMV 2.0 FPRPRV 5.8
You then use "L2L1" as your atomname for heavy atoms in derivative 2 (L2). If you have 3 wavelength MAD data, you now have 1 native and 2 derivatives. You can refine the heavy atom parameters of the 2 derivatives in just the usual way and obtain phases for the L1 data (including the heavy atoms) as if this were MIR data.
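The FPRIMV value in the NEWATOMTYPE line above is just the f' difference between the two wavelengths, while FPRPRV is the actual f" at L2. A quick arithmetic sketch in Python, using the example numbers from the text:

```python
# f' and f" values from the example in the text
fprime_L1, fdprime_L1 = -9.6, 2.2   # values at L1 (treated as "native")
fprime_L2, fdprime_L2 = -7.6, 5.8   # values at L2 (treated as a derivative)

# FPRIMV for the new atom type "L2L1": the f' difference, L2 minus L1
fprimv = fprime_L2 - fprime_L1      # -7.6 - (-9.6) = 2.0

# FPRPRV: the actual f" at L2
fprprv = fdprime_L2                 # 5.8
```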
Refinement against the origin-removed Patterson map
Refinement against an origin-removed Patterson map is a way of refining heavy atom parameters of each derivative independently, and is particularly useful because the occupancies of heavy atom sites are quite accurately estimated and because the refinement is very fast. When using this package, the recommended refinement method is this one, with JALT=0 and KALT=0.
This refinement minimizes the sum over all reflections of,
R = WGT * DEL**2
with respect to heavy atom parameters. WGT is a weighting factor, and DEL is defined as:
DEL = (Fph-Fnat)**2 - K*FH**2 - < (Fph-Fnat)**2 - K*FH**2 >
where the average <> is taken in a shell of resolution and FH is the magnitude of the calculated heavy atom structure factor. K is 1 for centric reflections, 1/2 for acentric reflections.
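As a minimal sketch of this residual (plain Python, not HEAVY's actual implementation; all arrays are made up), for a single resolution shell:

```python
def patterson_residual(fph, fnat, fh, centric, wgt=1.0):
    """Origin-removed Patterson residual R for one resolution shell.

    fph, fnat, fh: derivative F, native F, and calculated heavy atom F
    for each reflection in the shell.
    centric: True for centric reflections, False for acentric ones.
    """
    # K is 1 for centric reflections, 1/2 for acentric reflections
    k = [1.0 if c else 0.5 for c in centric]
    # (Fph-Fnat)**2 - K*FH**2 for each reflection
    terms = [(p - n) ** 2 - kk * h ** 2
             for p, n, h, kk in zip(fph, fnat, fh, k)]
    mean = sum(terms) / len(terms)          # the <...> shell average
    # DEL = term - shell average; R = sum of WGT * DEL**2
    return sum(wgt * (t - mean) ** 2 for t in terms)
```

With perfect data, where (Fph-Fnat)**2 equals K*FH**2 for every reflection, every DEL is zero and R vanishes.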
Rejecting data with large Del F's:
HEAVY uses all the data that you give it that satisfy the criteria of minimum F, FOM, etc. that are set above. If you want it to reject data with especially large Del F, you need to specify this when you scale the data with LOCALSCALE. There is an option in LOCALSCALE to "reject large del F" (TOSSBAD). Use this to get rid of the large Del F before going into HEAVY.
Interpreting statistical output from HEAVY
Many of the values listed at the end of a set of refinements are more-or-less self-explanatory. These include the number of reflections read, the number within resolution limits, and the number greater than the minimum figure of merit. As these statistics are usually printed for a cycle in which refinement is not carried out, the number of reflections used to refine is usually zero in this listing.
The statistical output for MAD phasing using Bayesian correlated MAD phasing is not as intuitive as the output for MIR phasing because the standard phasing statistics do not really apply. If you are running SOLVE in automatic format and you want approximate phasing statistics, you can run SOLVE specifying "NOBAYES". This will suppress the Bayesian correlated MAD phasing at the end of SOLVE and use SIRAS phasing, which isn't quite as good but for which the statistics are easy to understand.
Other values listed at the end of a set of refinements include:
RMS HEAVY ATOM F: The rms value of the calculated heavy atom F in the resolution range
RMS PHASE AVG'D RESIDUAL: This is the rms value of the difference between calculated and observed derivative F, where it is averaged not only over all reflections, but over all phases for each reflection, weighted by the phase probability
RMS(FH)/RMS(E): This is the ratio of the rms heavy atom F to the rms phase averaged residual
CENTRIC R FACTOR: This is <| |Fder-Fnat| - |FH| | >/< |Fder-Fnat| >
RMS DERIVATIVE F: This is the rms value of Fder
RMS SIGMA OF FPH: This is the rms sigma of Fder
RMS SIGMA OF FP: This is the rms sigma of Fnat
RMS OBSERVED DIFFERENCE: For anomalous differences, this is the rms value of DelAno= (F+ - F-)
RMS CALCULATED DIFFERENCE: This is the rms calculated anomalous difference
MEAN RATIO OF ISO TO ANO: This is the ratio of calculated |FH| due to normal scattering relative to that due to anomalous scattering. If all anomalous scatterers are identical, this is equal to (f+f')/f" for that anomalous scatterer.
RMS(RES HA SF+LACK OF ISO SF): This is an estimate of the total errors in the heavy atom model plus lack of isomorphism that remain. It is obtained from the rms phase averaged residual and the rms native and derivative sigmas.
RMS LACK OF ISOMORPHISM SF: This is an estimate of the remaining lack of isomorphism. It is based on a comparison of the anomalous and isomorphous differences that remain
RMS RESIDUAL HEAVY ATOM SF: This is an estimate of the remaining heavy atom structure factor, based on the anomalous differences and the errors in measurement.
CENTRIC LOC: This is an estimate of the "centric" lack-of-closure residual, obtained using both centric and acentric reflections and correcting acentric lack-of-closure residuals by a factor of 2. These residuals are all corrected for errors in measurement, so that if the derivative is "solved" and there is little lack of isomorphism, these values should all be near zero.
ANOMALOUS LOC: This is the lack-of-closure error for anomalous differences, corrected for errors in measurement.
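As one concrete example of these statistics, the CENTRIC R FACTOR defined above can be sketched in plain Python (an illustration of the formula, not HEAVY's code):

```python
def centric_r_factor(fder, fnat, fh):
    """CENTRIC R FACTOR: <||Fder-Fnat| - |FH||> / <|Fder-Fnat|>,
    averaged over centric reflections."""
    # isomorphous differences |Fder - Fnat|
    iso = [abs(d - n) for d, n in zip(fder, fnat)]
    # numerator: mean discrepancy between |Fder-Fnat| and |FH|
    num = sum(abs(i - abs(h)) for i, h in zip(iso, fh)) / len(iso)
    # denominator: mean isomorphous difference
    den = sum(iso) / len(iso)
    return num / den
```

When the calculated |FH| exactly matches every isomorphous difference, this R factor is zero; values well below 1 indicate a useful heavy atom model.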
If you specify the keyword "CORRELPHASE" then HEAVY will use a routine in phasing that takes into account the correlations in non-isomorphism errors among the derivatives. The derivatives must be grouped into sets with correlated errors. You can specify this grouping using IEGROUP (see below) or you can let HEAVY group them for you using the flag GETGROUPS. Note that correlated phasing makes a major improvement in the phasing power of a set of derivatives if the errors are highly correlated (>50%). If they are not highly correlated, the routine yields essentially the same results as the standard routine.
Whether or not correlated phasing is being used, the correlation of errors among derivatives is analyzed by HEAVY. An example of part of a log file that illustrates this is shown below:
-----------------------------example-------------------------------------------
Analysis of correlated modeling and non-isomorphism errors
obtained using phased residuals.

The derivatives were grouped into 1 sets where the members
of a set had some mutual correlation.

Set 1 contains derivatives  1  2  3

SUMMARY OF CORRELATED ERRORS AMONG DERIVATIVES

DERIVATIVE: 1
CENTRIC REFLECTIONS:
DMIN:           ALL  10.81   6.94   5.46   4.65   4.11   3.73   3.43   3.20
RMS errors correlated and uncorrelated with others in group:
Correlated:   363.5  322.2  291.7  253.8  458.7  434.5  371.2  404.0  337.8
Uncorrelated: 285.9  362.3  340.3  292.9  288.9  279.9  229.0  201.9  174.7
Correlation of errors with other derivs:
DERIV 2:       0.83   0.66   0.76   0.70   0.88   0.89   0.89   0.98   1.00
DERIV 3:       0.74   0.59   0.64   0.51   0.78   0.83   0.82   0.86   0.95
--------------------------------------------------------------------------------
In this example, there are 3 derivatives, all in the same group (IEGROUP=1 for each). Of the lack-of-closure errors for derivative 1, most (363.5, arbitrary units) were correlated with derivatives 2 and 3, and some (285.9) were unique to this derivative. The overall correlation of errors with derivatives 2 and 3 were 83% and 74%, respectively. In fact, correlated phasing made a major improvement in the phasing for this group of derivatives.
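The per-derivative correlation numbers in the table (e.g., "DERIV 2: 0.83") are correlation coefficients between lack-of-closure errors; one plausible sketch of such a coefficient in plain Python (HEAVY's exact estimator may differ):

```python
def error_correlation(e1, e2):
    """Pearson correlation between two derivatives' lack-of-closure
    errors over a common set of reflections."""
    n = len(e1)
    m1, m2 = sum(e1) / n, sum(e2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(e1, e2))
    v1 = sum((a - m1) ** 2 for a in e1)
    v2 = sum((b - m2) ** 2 for b in e2)
    return cov / (v1 * v2) ** 0.5
```

A value near 1 means the two derivatives' errors rise and fall together, which is the situation (>50% correlation) where correlated phasing pays off.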
Normal refinement/phasing cycles.
A. Refinement vs. origin-removed Patterson map.
Input parameters: all defaults used
NCYCLE = 1 to 30
IREFCY(I) = 1,1,1,2,2,2.....6,6,6,0
results:
Zeroth cycle: phases calculated for all derivatives identified with INPHASE using input lack-of-closure residuals. New lack-of-closure residuals are calculated for these derivatives. Statistics are printed.
Cycles 1 through NCYCLE-1: in this example, IREFCY(I) is zero on the last cycle, but non-zero for all other cycles. For each cycle where IREFCY(I) is non-zero: no phases are calculated, no new residuals are calculated, and derivative IREFCY(I) is refined as described above.
Note that only 1 derivative is refined at a time and all are independent. Therefore, in polar space groups, the coordinate(s) of at least one atom in each derivative must be fixed. In space group P1, parameters for a single heavy atom may not be refined at all. If two atoms are present, the occupancy, xyz, and B of only one of them may be refined. If you use IHEAVYPROC, then all this is taken care of for you (in P1, SOLVE will refine with phase refinement if any derivatives have fewer than 3 sites; otherwise it will use Patterson refinement).
Cycle NCYCLE: IREFCY(NCYCLE)=0 in this example, so this cycle is like the zeroth cycle: phases are calculated, new residuals calculated. If KOUT is non-zero, output data are calculated as well.
B. Refinement by minimization of lack-of-closure at most probable phase.
Input parameters: all default except JALT=1, KALT=0.
Results: identical to the above example except:
(1) phases will be calculated every cycle
(2) derivatives will be refined by minimization of (Fph-Fc)**2
This is not the recommended manner of using HEAVY in this package. In most circumstances origin-removed Patterson refinement is much more accurate. There are some instances in which phase refinement may be useful, however. One is when it is necessary to correlate the origins in different derivatives. In space group C2, for example, the y-coordinate is indeterminate. That means that if you have two derivatives and refine them independently, you will not have refined the relative y-coordinates of the atoms in the two derivatives (though you will have refined the relative y-coordinates of atoms within each derivative). You might wish to use phase refinement to carry this out, using one derivative to phase and refining y-coordinates in the other derivative. In practice, however, these relative y-coordinates can be obtained even more accurately by simply calculating a difference Fourier for one derivative, phasing with the other derivative. The centroid of the peak corresponding to the heavy atom site (which can be found, for example, by PEAKSEARCH in this package) will give you the relative y-coordinate you need with very good accuracy, and refinement of this coordinate is unnecessary. This is how SOLVE does this.
Note that still only 1 derivative may be refined at one time. (If you really want to phase only once per refinement of all derivatives, calculate phases during one run and write them out with KOUT=7. Then merge the file containing phases with the input DORGBN file (3 extra columns). Then run HEAVY with INPHAS=0 for each derivative, specifying INOLD=1. Also set INRESD=-1. The program will then use the input phases during phase refinement if JALT=1. It's probably faster to just phase each time.)
C. Just calculating phases and a map or other output.
Input: all default, except NCYCLE >0
If the input lack-of-closure residuals are OK, you can set INRESD = -1 so that new residuals will not be calculated and a zeroth cycle will not be included. Otherwise leave INRESD = 0.
Specify the type of map with KOUT, the derivative (if applicable) with KDER.
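A minimal keyword sketch for this case (the KOUT and KDER values here are placeholders; consult the keyword list for the output-type codes):

```
NCYCLE 1
INRESD -1      ! input lack-of-closure residuals are OK; skip the zeroth cycle
KOUT 7         ! output type (placeholder; 7 writes out phases, per the text)
KDER 1         ! derivative for the output, if applicable
HEAVY          ! initiate the run
```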
D. Carrying out a procedure with IHEAVYPROC. Heavy has the capability of carrying out an ordered sequence of refinements. These are useful if you want to carry out refinement in a semi-automatic fashion. When you specify a procedure with iheavyproc, you need to specify all the parameters that you want refined at all. Then the procedure you choose decides which parameters to refine on which cycles. Usually you will specify REFINEALL for all atoms, then let the procedures decide which to refine. If you use a procedure, the program will automatically fix all coordinates that cannot possibly be refined. For example, in space group C2 one atom in each derivative must have y fixed if origin-removed Patterson refinement is used, because the y-direction is polar. The program will fix the coordinate(s) of the atom that is the strongest in each derivative. If you have already fixed the coordinate(s) of an atom in a derivative (by not specifying that they be refined) then the program will just fix the atom you chose and not fix any others.
Note that you can carry out any series of refinements that you want by setting up all your keywords for the first type of refinement, initiating refinement with the command HEAVY, then going back to KEYWORD mode, specifying the next type of refinement without changing or setting any other parameters unless you want to, then initiating the next refinement cycles with HEAVY, and so on. For example, you might type in all your heavy atom parameters, finishing with
...
NREP 5
IHEAVYPROC 2   ! now refine 5 cycles with iheavyproc=2
HEAVY
NREP 7
IHEAVYPROC 4   ! now refine 7 cycles with iheavyproc=4
HEAVY
This sequence of commands results in 5 cycles of refinement of xyz for all atoms for which you specified xyz refinement, then 7 cycles of refinement of xyz, occupancy, and B for all atoms for which you specified those parameters to be refined. You can do this sort of thing in any order and ad infinitum if you wish.
Note that there is no procedure to refine just thermal factors. With this package there is no need to alternately refine occupancies and thermal factors. If there is insufficient data (i.e., very low resolution) to refine both occupancies and thermal factors, then set the thermal factors to any reasonable value and just refine the occupancies.
Changing heavy atom parameters after you have gone on to the next atom or derivative
If you want to change which parameters for which atoms are refined after you have already set up the atoms and refinement parameters, then you have to use a special way to reset them. The reason you have to do something special is that if you say "DERIVATIVE" then the routine assumes you are inputting data for a new derivative, so you can't go back to a previous one with that command. Instead, you type:
GOTODERIV 2    ! go to derivative #2
GOTOATOM 3     ! to atom #3 in deriv #2
REFINENONE     ! set all refinement flags back to zero
REFINEXYZB     ! or whatever you want to refine for this atom
GOTOATOM 1     ! now do atom 1 in deriv 2
GOTODERIV 1    ! now do derivative 1