Linear prediction estimates the value of a point based on the values of adjacent points. This method can be used to replace corrupted values in an FID. For example, in NMR experiments with fast relaxation rates, data collection must begin almost immediately following an RF pulse, even before the receiver and preamplifier can fully recover from ringing and saturation caused by the RF pulse. Consequently, several of the first datapoint values can be corrupted by instrumentationinduced noise. The effect of this noise on the spectrum can be profound because it often amounts to large fractions of the highestamplitude signal values. Therefore the spectrum can improve noticeably when these corrupted point values are replaced with values estimated by linear prediction, where the estimated values are based on subsequent points of greater integrity.
A second application of linear prediction to NMR data is to extend an FID. This is useful for experiments in which data collection ceased before the signals completely decayed; that is, the FID is truncated. Here, the data values of the FID are used to estimate new data values that are appended to the end of the FID. Often the FID can be extended 20% or more by this means.
The Process1D/Transform/Linear Prediction menu item uses linear prediction to replace data values at the beginning or end of the FID and can also be used to extend the FID. The mathematical underpinnings of the linear prediction algorithm are discussed next, so that in the ensuing descriptions of the menu items, the settings in the control panels may be related to the mechanics of the algorithm.
Forward prediction is based on the assertion that the value of the nth data point X_{n} is determined from the values of K antecedent points:
Eq. 1 
where a_{k} are the linear prediction (LP) coefficients. These LP coefficients are determined from nk values of X_{n}, cast in matrix notation as:
Eq. 2 
In general, the matrix C is overdetermined; therefore, the solution to this equation is found by leastsquares analysis, which is based on a singular value decomposition in order to ensure a solution when C^{T}C is not invertible:
Eq. 3 
The matrices n, L, and u are obtained from the singular value decomposition of the matrix C. The matrix L is a diagonal matrix of singular values of the columns of n and u containing the right and left singular vectors.
With knowledge of the maximum number of signals in the FID, the number of singular values in the matrix L can be reduced by truncating the matrices n, L^{1}, and uT. The result is reduced noise in the predicted points (Makhoul 1978, Kumaresan and Tufts 1982, Barkhuijsen et al. 1985).
To assure that the predicted data points are well behaved, i.e. decaying, the technique of "root reflection" is often used. In this procedure, one constructs a complex polynomial defined by the LP coefficients. Then the roots are determined from this polynomial. For a noiseless FID, the roots should lie within the unit circle, which, of course, does not always hold for real data. To enforce proper behaviors, one reflects all the roots into the unit circle and reconstructs the polynomial from which a new set of LP coefficients are extracted (see e.g., Zhu and Bax 1993).
For backward linear prediction, you start with an analogous linear equation, as in Eq. 1:
Eq. 4 
The mathematics of determining b_{k} are similar to that used for forward prediction. However, you must pay attention when using root reflection that the roots in this case are reflected outside the unit circle.
Lastly, as pointed out by Zhu and Bax (1993), it is sometimes advantageous to perform both backward and forward predictions to obtain averaged LP coefficients. This procedure tends to be very good at suppressing noise and artifacts.
As mentioned earlier, linear prediction is accessed using the Process1D/Transform/Linear Prediction command. Choosing this displays a control panel. The following three controls, which determine how the LP calculation is performed are set in the control panels: Points to use, Number of coefficients, and Number of peaks. Points to use defines the number of points used to calculate the LP coefficients a_{k }or b_{k}. This corresponds to the value n in Eq. 1. The Number of coefficients corresponds to the value nk. And Number of peaks is used to truncate the matrices n, L, and u (Eq. 3).
FELIX has several alternative autophasing methods for 1D spectra and one method for 2D and 3D spectra. The basic method and the method based on peak integration are used exclusively for 1D spectra. The method based on PAMPAS (Dzakula 1999) or APSL (Heuer 1991) is commonly used for 1D, 2D, and 3D spectra and is described in this section.
For a 1D spectrum in the workspace or a selected dimension of an ND spectrum, FELIX divides the spectrum into 1040 segments of equal widths along the phasing dimension and tries to find a best sample peak per segment. The phase error of each individual sample peak is detected using the PAMPAS or APSL algorithm, and then the global zero and firstorder phase errors (represented by FELIX symbols phase0 and phase1) are obtained by linear regression analysis.
If solvent peaks or other noise peaks exist in the spectrum, it is important to exclude them from the phase detection process. FELIX allows you to define up to 10 excluded areas (using the aph exclude command). An excluded region is specified by the starting and ending datapoint numbers, and the dimension if it is an ND spectrum (which can be different from the phasing dimension). Any peaks that fall in the specified range along this dimension are not used as sample peaks.
For an ND spectrum, FELIX searches all the vectors along the phasing dimension and tries to find a best sample peak for each segment, without considering peaks in the excluded areas. Peaks are selected based on their width (must be wider than a userdefined threshold), height (the higher the better), and symmetry (the more symmetric the better). For a 1D spectrum, sample peaks are searched similarly in the only vector for each segment.
If more than two sample peaks are found for the whole spectrum, the phase errors are detected from each of them. Next, the global phase parameters (phase0 and phase1) are obtained by linear regression analysis. The phase errors detected are between 0° and 180°, which do not reflect the actual phase differences between them. The possible pjumps between the individual phase errors are automatically compensated within the expected range of phase1. The default range of phase1 is approximately 720° to 720°. You can change the range of phase1 when using the aph command. See FELIX Command Language Reference for more details about using this command.
One approach to making assignments, developed by Wüthrich and coworkers (Wüthrich 1986), is the "sequential assignment strategy", which obtains sequencespecific resonance assignments (see following figure).
The first step in this procedure is the delineation of individual spin systems, based on homonuclear 2D and/or 3D spectra or on heteronuclear spectra. If homonuclear 2D spectra are utilized, this spinsystem delineation can be based on COSY type spectra (e.g., DQFCOSY or PCOSY). A spin system is understood to be a group of spins connected by scalar or through bond spinspin couplings. This means that, for homonuclear spectra of a peptide or protein, one amino acid residue can contain one or more spin systems. However, due to the peptide bond (which yields a structure in which no nonlabile hydrogens are three bonds apart in neighbor residues) spin systems do not comprise spins from more than one residue. Therefore, in a COSYtype spectrum, cross peaks are expected between scalarly coupled spins within individual residues.
For spinsystem delineation, the TOCSY or HOHAHA type spectra can also be used, since the cross peaks arise only within individual spin systems. Ideally, TOCSY contains more cross peaks for the same spin system than COSY; thus, in complicated overlapping situations it is advantageous to combine the information from both types of spectra.
For spinsystem delineation or detection it is possible to use two different methods: systematic searching (when all the spin systems in the spectrum are collected through systematically searching the spectrum for wellaligned peaks) and simulated annealing.
The complete delineation of spin systems is facilitated by the fact that the chemical shifts of the protons are in well defined regions depending on structure (primary or secondary). The expectation values have been extracted for different protons in amino acid residues, based on analysis of assigned proteins (Gross 1988). You would expect to find most H_{N}H_{a} cross peaks at 612 ppm and 36 ppm. The chemical shift information and the pattern of cross peaks is necessary, but not sufficient by itself, for identification of spin systems with specific residue types, since neither all the chemical shifts nor all the cross peak patterns are unique.
To reduce ambiguities, additional homonuclear experiments can be performed (either NOESY or changing the solvent, pH, or temperature). In more difficult situations, heteronuclear experiments may also be added, if possible.
After spinsystem delineation is complete, typically some spin systems remain that are not identified uniquely. Also, unique residue type identification is not always enough, since most biopolymers contain several copies of most residue types. These facts then necessitate the second step of sequencespecific resonance assignment, i.e., matching spin systems with specific residues. This is done through identification of NOE connectivities between spin systems corresponding to neighbor residues in the sequence.
Rigorous analysis of Xray structures of globular proteins (Billeter et al. 1982) showed that several interresidue distances are almost always small enough to yield cross peaks in a NOESY spectrum. These are, in particular: d_{a}N(i,i+1), d_{b}N(i,i+1), and d_{NN}(i,i+1). Usually d_{a}N(i+1,i) and d_{b}N(i+1,i) are not short enough. Therefore, these distances can be used to tie together stretches of spin systems by analyzing the NOE interactions between protons of different spin systems. If such stretches can be matched with stretches of residue sequence, sequencespecific assignment is achieved with a high level of confidence.
Since NMR data tend to be incomplete or illresolved, a systematic search of perfect assignments yields only partial results. One has to tolerate a significant amount of imperfection and use loose criteria in order to find a probable assignment. Even then, an exhaustive inspection of all possible solutions might become impossible within reasonable computation times. Optimization methods such as simulated annealing then provide a powerful alternative; all constraints are used to define an energy for the system, which is minimized according to careful schemes. No general tolerance has to be adjusted. The minimizer spontaneously allows imperfections where (and only where) no better solution can be found with the available data (just as it does if peaks are missing from a TOCSY pattern, or if NOEs or even residues are missing from a sequential assignment) (Morelle et al. 1994a, 1994b).
The following steps were used by Eccles et al. (1991) for sequencespecific assignment of proteins based on homonuclear 2D spectra (DQFCOSY, TOCSY, and NOESY):
a. Start with the H_{N}H_{a} cross peak in DQFCOSY.
b. Identify all cross peaks belonging to this spin system using TOCSY (the alignment of COSY and TOCSY spectra may be rather poor  0.03 ppm).
c. Check all resulting groups against a database of 20 amino acid residues and score them.
a. A search can be carried out along the w2 direction at the H_{a}, H_{b},... resonances of each spin system (with an allowed uncertainty of 0.01 ppm) to find the corresponding H_{N} resonance of the sequential neighbor spin system. This can yield a list of NOE connectivities between the alpha, beta, and amide protons of each spin system and the amide proton of other spin systems.
b. Uninterrupted pathways can be searched for, using the sequential NOEs and the amino acid sequence.
For spinsystem delineation you can use (besides 2D spectra) homonuclear 3D spectra such as the 3D TOCSYTOCSY spectrum (Kleywegt et al. 1993). The spin system can be connected sequentially via 3D TOCSYNOE or 3D NOENOE spectra.
Homonuclear experiments can have a great deal of spectral overlap, especially as the size of the molecule being studied increases. One way to alleviate the overlap problem is to use NMRdetectable heteronuclear atom labels such as carbon13 and nitrogen15 to singly or doubly label proteins or nucleic acids (some classic examples include Redfield et al. 1994, Hansen et al. 1994, Freund et al. 1994, Clubb et al. 1994, Anglister et al. 1994). Such isotopic labeling in common practice in the NMR community. Here, several methods of assigning singly or doubly labeled proteins are briefly described. The number of different 3D and 4D experiments available using labeled samples is virtually unlimited, so we limit our discussion here to the most widely utilized and implemented experiments. (For additional detail see Farmer et al. 1993, Sklenar et al. 1993, Marino et al. 1994, Legault et al. 1994.)
One of the first methods described in the literature used several combinations of double and tripleresonance experiments to unravel the assignment for much bigger proteins than was possible earlier with homonuclear methods (Grzesiek et al. 1992). The steps involved can be summarized as follows:
a. Some peaks could have been missing from some spectra. For example, if water presaturation was used to measure HNCA (which might have been necessary) the H_{N} protons, which were in fast exchange, would not give a signal when compared to HNCO.
b. The typical peak halfwidth is 4555 Hz for H_{a }and 3545 Hz for C_{a}, therefore the peak positions were prone to ambiguities.
c. Some spectra could have given extra peaks. For example, HNCA gave extra peaks for H_{N},iN_{i}C_{a},i1, but this can be used in conjunction with HN(CO)CA. Special care was taken in automated assignment.
For smaller proteins you can use fewer tripleresonance experiments and still achieve good sequential assignment of the backbone. In particular, you can use a combination of two 3D tripleresonance experiments (Grzesiek and Bax 1992), which correlates the C_{b}, C_{a}, N, and H_{N} resonances of the ith and i1^{th} residues using CBCANH or HNCACB and CACB(CO)NH experiments. Then the residue types can be identified using a database of C_{a} and C_{b} chemical shifts (Grzesiek and Bax 1993).
You can also choose to apply 4D methods using these steps. Although they are usually limited by spectrometer time, the information content of two 4D spectra can be very large. Ideally, for example, you can use a HNNCAHA or HACANHN experiment and a HACA(CO)NHN experiment to obtain a complete list of H_{a},i, C_{a},i, N_{i}, H_{N,}i, H_{a},i1, and C_{a},i1 resonances. From this list you can then construct a full sequential assignment of the backbone (Boucher et al. 1992a, 1992b, Constantine et al. 1993).
The Assign module includes several building blocks: the database, the visualization and analysis tools, and the automated algorithms.
The database consists of the data structures, which are built on the FELIX database management system. Using this Assign tool, you can keep all necessary information in a compact binary file, from which the specific entities can be exported in ASCII format. Also, FELIX provides spreadsheet capability, allowing you to view and, in certain cases, edit the entities.
The Assign database is structured in several layers. First you define a project entity, which is a container entity (i.e., it is at the top level and contains all information about the lowerlevel entities). The project entity, in which the assignment is carried out, then contains spectrum parameters (e.g., plot appearance, peaks, and spectra type description); spinsystem information (patterns and frequencies, together with properties); sequence information (if any); the library of chemical shifts of residues; and various entities needed in the assignment procedure.
The following table lists the most important entities and symbols used in an Assign project:
^{1}No variables defined

Visualization and analysis tools are provided by the graphical user interface (GUI). Functionality in the Assign module is organized into several submenus. Menu items dealing with repetitive visualization tasks (e.g., changing plot type, redefining plot limits and plot parameters) are easily accessible at all times through icons at the top of the FELIX window. In addition, several commands that help the assignment process are mapped to the right mouse button (drawing frequencies, picking and removing peaks,and displaying correlated cursors). Menu items specific to Assign include tools for tiling patterns in different spectra, overlaying contour plots, easy access to strip plots, and displaying frequency lists from different sources on different spectra.
When working with multiple 3D/4D data sets at once, quick access to different spectra as well as coordinated navigation through 2D planes of different views of different spectra is crucial for successful assignment. Therefore multiple spectra can be connected in one, two, or three dimensions, letting you define which axes from which views are to be changed together when plot limits are changed. There is also the option, as mentioned above, to overlay multiple spectra in the same window.
Many automated algorithms that facilitate the sequential assignment strategy are found within the Collect Prototype Patterns, Frequency Clipboard/Compare Frequencies, Residue Type/Score Residue Type, Residue Type/Match Residue Type, Neighbor/Find Neighbor Via..., Sequential/Systematic Search, Sequential/Simulated Annealing, and Peak Assign/Autoassign Peaks menu items.
The prototypepattern detection menu items serve as spinsystem detection tools. FELIX currently has several methods for automatically detecting spin systems:
The 2D homonuclear systematic search algorithm performs the steps shown in the following figure.
The 3D homonuclear spinsystem detection algorithm can be used on a [J,J], a [J,NOE], or a [NOE,J]type experiment and performs the steps in the following figure.
The 2D simulated annealing detection algorithm uses these steps:
Assign implements several heteronuclear spinsystem detection methods, which can be based on double and tripleresonance experiments.
It is possible to use a ^{15}N separated TOCSY experiment to collect spin systems starting on H_{N} and N frequencies of the pseudodiagonal peaks (i.e., ^{15}NH_{N}H_{N} peaks). The algorithm is similar to the 3D homonuclear one, but you need to specify the dimension that the root frequency (i.e., the H_{N}) is in, and then all the frequencies are collected into spin systems that have peaks with the same H_{N} and N frequencies. A method for collecting spin systems starting with 2D peaks is also implemented. For this method, you need to provide a ^{15}N^{1}H HSQC and an ^{15}NHSQCTOCSY spectra (certainly they can be HMQC, too).
Tripleresonance based spinsystem detection algorithms are also provided in Assign. Particularly, you can use a combination of three 3D spectra  the HNCO, HN(CO)CA, and HNCA  and collect interresidue spin systems or prototype patterns containing H_{N,}iN_{i}C_{a},iC_{a},i1C_{i}1 frequencies. Then during promotion to patterns, an automated routine can find the sequential neighboring systems and store the results as neighbor probabilities.
Similarly, you can use two tripleresonance 3D experiments  a CBCA(CO)NHN and a CBCANHN or HNNCACB  to collect prototype patterns containing H_{N,}iN_{i}C_{a},iC_{b},iC_{a},i1C_{b},i1 frequencies, and a similar promotion can be carried out. Now you can also use the ^{15}N^{1}H HSQC spectrum to help collect the spin systems  mixing the information from heteronuclear double and tripleresonance spectra.
In a similar method to that used in the previous example, you can collect spin systems in three tripleresonance experiments: HNCO, CBCANH, and CBCACO(N)H. This method results in spin systems containing: H_{N,}iN_{i}C_{a},iC_{b},iC_{a},i1C_{b},i1C_{i}1.
It is possible to use an experiment to specifically find glycine spin systems (Wittekind 1993). A spinsystem detection method is implemented in Assign for this, using the HNHA, CBCANH, and CBCA(CO)NH spectra.
Finally, you may use two 4D tripleresonance experiments  an HACA(CO)NHN and HACANHN  to automatically collect prototype patterns containing H_{N,}iN_{i}C_{a},iH_{a},iC_{a},i1H_{a},i1 frequencies.
A visual tool is provided to allow you to quickly find the previously detected spin systems in multidimensional spectra and to verify them visually.
The Frequency Clipboard/Compare Frequencies menu item checks a frequency list against a set of patterns or prototype patterns. The algorithm finds how many "fuzzy similarities" (Kleywegt et al. 1989) are present between the current collection of frequencies and the frequencies in each pattern or prototype pattern. You can use this to decide whether a new frequency list constitutes a new pattern.
The residuetype matching and scoring of patterns helps in assigning frequencies in patterns to specific residues. For each frequency in a pattern, the matchtype action calculates a delta value based on the library expectation values and standard deviations collected for each residue type (delta = [actual_shift_of_freq  expectation_value]/standard_deviation). The scoretype action automatically matches patterns and residue types. The algorithm tries to find a matching frequency in the pattern. The bestmatching frequency may lie no more than a given number of standard deviations away from the expectation value. If at least a certain number of atoms can be matched, a score is computed.
The Neighbor/Find Neighbor Via... menu items serve to find potential neighbor patterns through a 2D NOESY, a 3D [J,NOE], a 3D [NOE,NOE], or a _{15}Nresolved NOESY spectrum. Therefore, this is the first step in sequencespecific assignment of patterns.
The algorithm consists of these steps:
FELIX implements a tripleresonanceneighborfinding algorithm in the Neighbor/Find Neighbor Via 3D/4D menu item. This action works on a tripleresonance experiment (e.g., HN(CO)CA or CBCA(CO)NHN) which contains cross peaks connecting neighboring residues. The algorithms for each pattern look through the spectrum and try to find peaks that connect this pattern to, for example, the i1^{th} residue. Then the sequential connectivity is set for the pattern that had the candidate frequency.
The Sequential/Systematic Search and Sequential/Simulated Annealing menu items find potential matches of patterns with the sequence of an unbranched biopolymer. This can be done using systematic search methodology through a recursive matching procedure or using simulated annealing. In the latter case, the program tries to assign a set of patterns to the primary sequence (or a requested segment of it) of the molecule using optimization techniques rather than systematic searching. Each step in the simulated annealing routine is a permutation between the pool of patterns and a residue in the sequence (initially filled with empty dummy residues and, optionally, some previously assigned residues) or between two stretches of residues tentatively assigned in the primary sequence. The energy is computed from previously determined residuefit scores and neighbor information. Eventually, the results converge to a solution of low energy, which is unique in well defined parts of the sequence. However, when the data are inadequate, several runs of the program might yield locally different solutions.
The Peak Assign/Autoassign Peaks menu item automatically assigns J or NOE peaks based on a set of assigned patterns, library, and molecule information. The patterns must have spectrumspecific shifts set for their resonances. The algorithm works as follows:
a. If one of the possible assignments has a distance connected to it that is shorter than the unambiguous cutoff and none of the others do, then this assignment is stored as a single assignment.
b. If more than one of the possible assignments has a shorter distance than the unambiguous cutoff, or none do, and multiple assignment is requested, then these are stored as multiple possible assignments.
A crucial part of the Model module is the calculation of the theoretical proton NOEs for a model structure. The basic physical principles underlying these calculations are described here.
Multispin relaxation of a biomolecule in solution can be approximately described by the generalized Bloch equations (Boelens et al. 1988). In this approximation, the time development of the matrix of normalized crosspeak intensities A in a 2D NOE spectrum is described by a set of coupled differential equations (Ernst et al.1987, Keeler et al. 1984, Massefski et al. 1985, Dobson et al. 1982). First:
Eq. 5 
which can be formally solved for a mixing time r_{m} as:
Eq. 6 
where R is the relaxation matrix with:
Eq. 7 
The relaxation rate constants in the dipolar crossrelaxation matrix for a rigid molecule with N protons are defined (Ernst et al. 1987, Dobson et al. 1982) by:
Eq. 8 
Eq. 9 
where r_{ij} is the distance between protons i and j and K = 0.1 g4 (h/2p)^{2}. For isotropic tumbling of the molecule with a correlation time r_{c}, the spectral densities J_{n} (nw) take the simple form:
Eq. 10 
where w is the Larmor frequency of the protons. The exponential matrix exp[Rt_{m}] can be expanded in a power series:
Eq. 11  . 
For sufficiently short mixing times, only the first two terms in Eq. 14 contribute appreciably, and the NOE intensity a_{ij} becomes s_{ij} ¥ t_{m},which builds up linearly with the mixing time.
Alternatively, the matrix equations can be solved as:
Eq. 12 
where l is the eigenvalue matrix obtained after diagonalization of R, and X is the matrix of eigenvectors:
Eq. 13 
Therefore, given a molecular model, the NOE matrix can be calculated from the relaxation matrix for each mixing time t_{m} of a 2D NOE experiment, and the buildup of NOE intensities in a time series can be obtained.
The reverse procedure is also possible, although so far it has only been applied to systems with chemical exchange between a small number of sites (Massefski et al. 1985, Dobson et al. 1982). When the complete NOE matrix A is known, the relaxation matrix R can be obtained by diagonalizing the NOE matrix (Ernst et al. 1987, Olejniczak et al. 1986):
Eq. 14 
Eq. 15 
Thus theoretically, matrix R can be obtained from one 2D NOE experiment performed at a suitably chosen mixing time. In practice, averaging over a series of t_{m} values is more accurate.
Additional detail on the full treatment of dipolar spin relaxation of spin networks in the presence of internal motions appear in Boelens et al. (1988, 1989), Koning (1990), and references cited therein.
The Model module of FELIX offers a very efficient method for computation of NOEs.
The NOE matrix is related to the rate matrix as described in Eq. 11. In principle, one can calculate NOEs exactly by diagonalization (Eq. 12). Unfortunately, the computational load of diagonalization is too demanding (O(N 3)), to allow for fast backcalculation. Thus, one must seek an alternative method of evaluating NOEs. We present a method here which utilizes the general properties of the matrix exponential operator and the distance dependence of the rate matrix.
As mentioned earlier, the NOE matrix is related to the rate matrix as follows.
If t_{m} is sufficiently small, you can ignore the higherorder terms and approximate the NOE matrix by the linear term:
Eq. 16 
The problem is then to obtain the NOEs at longer mixing times. The multiplicative properties of the matrix exponential are the key:
Eq. 17 
If the mixing time t is broken down into 2^{n} smaller times steps s, such that t = s2n, then you have:
Eq. 18 
Eq. 19 
Thus in n steps of repeated matrix squaring (or doubling in the time parameter  the notation of the method used in FELIX is matrix doubling and refers to this procedure), we arrive at (exp (R · t)) from (exp (R · (t/2^{n})). For example, to get to t_{m} at 128 ms from 1 ms, we only need to do 7 matrixsquaring operations. The formula in Eq. 16 allows one to bootstrap from NOEs at a small mixing time, where a linear approximation (Eq. 16) is justified. To truly make such an scheme efficient, one must optimize the performance of the matrix multiplication involved. To this end, make use of the distance dependence of the rate matrix:
Eq. 20 
The inverse powerofsix behavior implies that the rate drops off extremely rapidly with distance. Thus, for most practical applications, one can impose a cutoff distance beyond which R is assumed to be zero. This further implies that the resulting R matrix is sparse. The matrix multiplications involved can then exploit the sparsity of the rate matrix (and the resultant NOE matrices) to gain substantially in efficiency.
The application of a general theory of spin relaxation to biological molecules is usually hindered by the lack of a complete description of the molecular motions over a sufficiently long time scale. You can nevertheless account for certain types of internal motions such as methyl rotations and ring flipping. Remember that, although any one particular model for such motions would not be perfectly satisfactory, it is nevertheless important to use more physically based models that involve fewer adjustable parameters and are rigorously derived from well defined assumptions. Thus for methyl rotations and ring flips, we adopt the jump models as proposed by Tropp (1980). We further assume that methyl rotations occur over three sites and on a time scale much shorter the overall tumbling time of the molecule. For ring flips, the jumps are assumed to occur over two states (180° flips) and on a time scale much slower than the tumbling time. The relaxation rates are then modified to include the contributions from the various jump states. For further details see Yip and Case (1991). An important feature of the jump models that we assume here is that no additional parameter is in the relaxation matrix other than those already used by the rigid isotropic tumbling model.
NMR techniques for characterization of proteinligand interactions are being widely used in rational drug design. As an example, the SAR by NMR strategy, devised by the Fesik group at Abbott Laboratories (Shuker et al. 1996; Hajduk et al. 1997), consists of the following general steps:
1. First, a library of small molecules is screened for binding to an ^{15}Nlabeled protein. If a molecule binds to the protein, it alters the local chemical environment and thereby cause changes in the chemical shifts of nuclei in the protein's binding site. Such changes are detected in HSQC spectra acquired in the presence and absence of added ligand.
2. Once initial ligands are identified, analogs are screened and binding constants are obtained in order to optimize the interactions with the protein.
3. Next, the 3D structure of the protein complexed with the ligands is obtained, and linked compounds are synthesized based on this structural information.
The Autoscreen module of FELIX automatically characterizes the proteinligand interactions by identifying the changes between the reference and test spectra. With Autoscreen you can organize the project, process and score the 2D test spectra, and identify highaffinity ligands. For the above SAR by NMR steps, it offers help for the first two steps.
Autoscreen analysis of 2D ^{15}NHSQC data consists of the following general steps:
1. Organizing HSQC experimental data into a project. The data can be FID raw data from various spectrometers or processed data. An experiment can be associated with one or more molecules.
2. If raw data are used, process the control (or reference) spectrum. The same processing methods and parameters are remembered for processing all other spectra. Otherwise, if processed data are used, designate one of them as the control.
3. If available, input the peaks and assignments of the control spectrum. Otherwise, do peak picking and resonance assignment in the 2D and Assign modules, respectively.
4. Process and score the test spectra.
5. Cluster the experiments based on their displaced peaks and identify binding subsites. Export the results for QSAR study, fitting K_{d}, etc.
A crucial step in using the Autoscreen module is scoring of the test spectra against the control spectrum. This consists of the following steps:
1. Pick peaks in the test spectrum.
2. Match test peaks to control peaks based on chemical shifts and peak shapes, and then score the experiment based on the displacements between the matched peak pairs.
3. Add penalties for the unmatched control and test peaks to the score.
The underlying principles are explained in the following sections.
The prerequisites for a test peak to be matched to a control peak are Eqs. 2123:
Eq. 21 
Eq. 22 
Eq. 23 
where DH is the absolute displacement of the ^{1}H chemical shift between the test peak and the control peak, maxDH the upper limit of this ^{1}H displacement, DN the absolute displacement of the ^{15}N chemical shift, maxDN the upper limit of the ^{15}N displacement, S the shape similarity between the test peak and the control peak, and minS the lower limit of the shape similarity.
The test in Eq. 23 is applied when you choose to consider the shape of peaks. You can choose to define the shape similarity S as the product of the similarities of the peak widths and intensity:
Eq. 24 
where S_{w}_{H}, the similarity of peak widths along the ^{1}H dimension, is calculated based on the width of the control peak wC_{H} and the width of the test peak wT_{H} along the ^{1}H dimension:
Eq. 25 
Analogously, S_{w}_{N}, the similarity of peak widths along the ^{15}N dimension, is calculated based on the width of the control peak wC_{N} and the width of the test peak wT_{N}along the ^{15}N dimension:
Eq. 26 
SI, the similarity of peak intensities, is calculated based on the peak heights of the control peak I^{C }and of the test peak I^{T}:
Eq. 27 
If a test peak is matched to a control peak, its contribution to the score of the experiment is:
Eq. 28 
where a and b are the weights of the ^{1}H and ^{15}N chemical shifts, respectively.
A control peak that does not match any test peak contributes a penalty c' (default is 0.6) to the score. The score of an experiment is hence the sum of the contributions of all matched and unmatched control peaks:
Eq. 29 
Sometimes some of the test peaks do not match any control peak. Although most unmatched test peaks are noise, any that are legitimate peaks can, if you choose to, contribute to the score such that:
Eq. 30 
where cð is the penalty for each unmatched test peak.
Normally, the contribution of peaks with toosmall displacements are ignored, namely:
Eq. 31 
where minDH and minDN are the lower limits of the ^{1}H and ^{15}N displacements, respectively.
In practice you can change the weight of some control peaks to hide or highlight their contributions. In this way the reported score becomes:
Eq. 32 
where W is the weight of each control peak. Since test peaks are automatically picked on the fly, you cannot specify weights for them. Furthermore, although they contribute to the score, unmatched test peaks are not stored in the database as other peaks are.
Except in the simplest case, Eqs. 2123 do not guarantee an unambiguous mapping of all test peaks to control peaks. Based on the assumptions that only a small portion of the peaks are displaced and that most peaks retain their locations and shapes in the test spectrum, Autoscreen assumes that the mapping that has a globalminimum value C as defined in Eq. 29 is the true mapping.
Peak picking in test spectra is done on the fly during scoring, i.e., you have no chance to optimize the picked peaks, hence the quality of the picked peaks is one of the key factors for successful scoring. Although most of the peakpicking parameters should remain identical to those used for peak picking in the control spectrum, the threshold may need to be adjusted for differences in sample concentrations.
Autoscreen first determines a peakpicking area, which is the minimum rectangle that includes all the control peaks plus margins of maxDH and maxDN in the D1 and D2 dimensions, respectively.
Autoscreen provides three ways of defining the threshold for peak picking: Automatic, Control, and Define. If the default setting (Automatic) is selected, an initial threshold is automatically calculated, and then the threshold is raised or lowered according to the following rules for each test spectrum:
1. If the number of test peaks is fewer than 1.1 times the number of control peaks, lower the threshold.
2. Otherwise, for each control peak, the test peaks that meet the criteria of Eqs. 2123 are taken as candidate matches to the control peak. If over 10% of control peaks do not have a candidate match, lower the threshold.
3. Otherwise, if the number of test peaks is greater than 1.5 times the number of control peaks and each control peak has more than 3.0 candidate matches on average, raise the threshold.
No more than seven adjustment loops are performed.
If Control or Define is selected for the threshold method, the same threshold as that used for the control spectrum or a userdefined threshold is applied to all test spectra, respectively, and no automatic adjustment of thresholds is done.
For each control peak, the contributions of those test peaks that meet the criteria of Eqs. 2123 are calculated using Eq. 28. These test peaks are sorted in ascending order of c, and the first four are retained as candidate matches. At this stage, similarities in peak heights (SI) are not used in Eq. 24, since the intensity level may be different between the control and test spectra.
To normalize the intensity level of the test spectrum, an intensity ratio IR is calculated as follows:
Eq. 33 
where ÂIC is the sum of peak heights for all control peaks that have at least one candidate match, and ÂIT is the sum of peak heights of the first candidate matches to all control peaks. The peak heights of the test peaks are then normalized by multiplying by IR.
Next, the list of candidate matches is updated by including the peak intensity in Eq. 24.
The remaining task is to choose one or no test peak for each control peak, so that the sum of contributions (Eq. 29) is minimal. Some control peaks may not match any test peak, and such unmatched control peaks also contribute to the score.
Autoscreen provides two alternative methods for searching the mapping to find a global minimum for Eq. 29.
The first is a depthfirst treesearch method. This deterministic method, in principle, should always find the true minimum, but in practice it may be too slow for complex spectra. Several heuristic methods are used to enhance its efficiency, so that, for normal spectra, satisfactory results are obtained in a few seconds. By default, the total CPU time is limited to 10 seconds although you can change this.
Another method is simulated annealing, a stochastic method that efficiently lowers the score (or energy) to or close to the global minimum. This method is recommended for very complicated spectra where tree searching does not give satisfactory results in a reasonable amount of CPU time.
There are many possible reasons for a control peak not to match any test peak. For example, if the upper limits for peak matching are set to too low a value, a peak that is shifted very far will not find its match. More often, two well separated peaks in the control spectrum become overlapped in the test spectrum. When this happens, only one test peak is picked, leading to one of the control peaks' being unmatched.
To unravel the overlapping test peaks in the latter case, Autoscreen fits such unmatched control peaks to the test spectrum using a local optimization method (see Peaks/Optimize in Chapter 4., Processing, visualization, and analysis interface (1D/2D/ND) in the FELIX User Guide.) All unmatched control peaks, together with those neighboring control peaks that lie closer than four times their peak width to the unmatched control peaks, are fitted to the test spectrum with only the peak centers optimized. This process is iterated up to three times or until the change in the penalty is less than 5%. Next, the widths of these peaks are optimized for one iteration.
For each optimized peak, its equivalent peak height in the test spectrum is normalized using the intensity ratio, as for other test peaks. Then the optimized peak is matched to the original control peak to see if the conditions in Eqs. 2123 are met. If this is successful, the contribution is calculated using Eq. 28. Otherwise, the control peak remains unmatched and its penalty is added to the score according to Eq. 29.
The fitting of unmatched control peaks to the test peaks is done only when the percentage of unmatched control peaks is below a certain threshold (default = 40%). If too many control peaks are not matched, this usually means that the spectrum is corrupted, and the fitting is not done. A corrupt spectrum usually has a score significantly higher than a normal spectrum, because of the contribution of unmatched control and test peaks.
Test peaks that are not matched to a control peak are usually noise, but they can also be real peaks corresponding to unwanted control peaks (e.g., the sidechain amide peaks are usually excluded from the control peaks), or to a control peak that was not noticed in the control spectrum because of peak overlap or other reasons. A legitimate test peak that was not observed in the control spectrum but that is picked in the test spectrum should contribute to the score.
Therefore, from all the unmatched test peaks, Autoscreen identifies legitimate test peaks based on their chemical shifts and peak shapes. If you choose to use the chemicalshift constraint, then only those peaks that are displaced less than maxDH and maxDN (in the ^{1}H and ^{15}N dimensions, respectively) with respect to their closest control peaks are considered. Next, the peak widths and heights of all the matched test peaks are analyzed statistically. At this point, an unmatched test peak is considered legitimate if its peak widths (wH and wN) and peak height (I) meet the following criteria:
Eq. 34 
Eq. 35 
Eq. 36 
where , , and are the average values of the peak width along the ^{1}H dimension, the peak width along the ^{15}N dimension, and the peak height, respectively; sH, sN, and_{ }sI are the standard deviations of the peak width along the ^{1}H dimension, the peak width along the ^{15}N dimension, and the peak height, respectively; and n is a userdefined coefficient (default value is 2.0).
By default, each unmatched test peak contributes a penalty of 0.20 to the score, though you can always change this parameter. This penalty is set smaller than that for an unmatched control peak, because you usually have a better chance to refine the control peaks based on the assignment. The test peaks, on the other hand, are automatically picked so are usually less reliable.
Presently, unmatched test peaks are only reported to you in the text window and in a log file, but not saved in the database like other peaks.
The Autoscreen/Setup Scoring menu item (see Chapter 8., Autoscreen user interface) allows you to change or verify the parameters used for scoring. The following table correlates these parameter names with the symbols and terms mentioned in this chapter, for your convenience.
Parameter  Options  Default  Symbol or meaning 

minDH in Eq. 31.  
maxDH in Eq. 21.  
minDN in Eq. 31.  
maxDN in Eq. 22.  
a in Eq. 28.  
b in Eq. 28.  
minS in Eq. 23.  
Method to search for the global minimum. Options are Tree Search and Simulated Annealing.  
Maximum CPU time (seconds) to spend searching the global minimum. Used only when Tree Search is selected.  
Upper limit of unmatched control peaks, above which the spectrum is regarded as corrupt and no fitting is done.  
Penalty for an unmatched control peak (c' in Eq. 29).  
Penalty for an unmatched test peak  cð in Eq. 30. 