// @(#)root/hist:$Name: $:$Id: TMultiDimFit.cxx,v 1.10 2003/01/16 18:07:52 brun Exp $ // Author: Christian Holm Christensen 07/11/2000 //____________________________________________________________________ // // // /*Multidimensional Fits in ROOT
OverviewA common problem encountered in different fields of applied science is to find an expression for one physical quantity in terms of several others, which are directly measurable.
An example in high energy physics is the evaluation of the momentum of a charged particle from the observation of its trajectory in a magnetic field. The problem is to relate the momentum of the particle to the observations, which may consists of of positional measurements at intervals along the particle trajectory.
The exact functional relationship between the measured quantities (e.g., the space-points) and the dependent quantity (e.g., the momentum) is in general not known, but one possible way of solving the problem, is to find an expression which reliably approximates the dependence of the momentum on the observations.
This explicit function of the observations can be obtained by a least squares fitting procedure applied to a representive sample of the data, for which the dependent quantity (e.g., momentum) and the independent observations are known. The function can then be used to compute the quantity of interest for new observations of the independent variables.
This class TMultiDimFit implements such a procedure in ROOT. It is largely based on the CERNLIB MUDIFI package [2]. Though the basic concepts are still sound, and therefore kept, a few implementation details have changed, and this class can take advantage of MINUIT [4] to improve the errors of the fitting, thanks to the class TMinuit.
In [5] and [6] H. Wind demonstrates the utility of this procedure in the context of tracking, magnetic field parameterisation, and so on. The outline of the method used in this class is based on Winds discussion, and I refer these two excellents text for more information.
And example of usage is given in $ROOTSYS/tutorials/multidimfit.C.
The MethodLet by the dependent quantity of interest, which depends smoothly on the observable quantities , which we'll denote by . Given a training sample of tuples of the form, (TMultiDimFit::AddRow)
where are independent variables, is the known, quantity dependent at , and is the square error in , the class TMultiDimFit will try to find the parameterization
such that
is minimal. Here are monomials, or Chebyshev or Legendre polynomials, labelled , in each variable , .So what TMultiDimFit does, is to determine the number of terms , and then terms (or functions) , and the coefficients , so that is minimal (TMultiDimFit::FindParameterization).
Of course it's more than a little unlikely that will ever become exact zero as a result of the procedure outlined below. Therefore, the user is asked to provide a minimum relative error (TMultiDimFit::SetMinRelativeError), and will be considered minimized when
Optionally, the user may impose a functional expression by specifying the powers of each variable in specified functions (TMultiDimFit::SetPowers). In that case, only the coefficients is calculated by the class.
Limiting the Number of TermsAs always when dealing with fits, there's a real chance of over fitting. As is well-known, it's always possible to fit an polynomial in to points with , but the polynomial is not likely to fit new data at all [1]. Therefore, the user is asked to provide an upper limit, to the number of terms in (TMultiDimFit::SetMaxTerms).
However, since there's an infinite number of to choose from, the user is asked to give the maximum power. , of each variable to be considered in the minimization of (TMultiDimFit::SetMaxPowers).
One way of obtaining values for the maximum power in variable , is to perform a regular fit to the dependent quantity , using a polynomial only in . The maximum power is is then the power that does not significantly improve the one-dimensional least-square fit over to [5].
There are still a huge amount of possible choices for ; in fact there are possible choices. Obviously we need to limit this. To this end, the user is asked to set a power control limit, (TMultiDimFit::SetPowerLimit), and a function is only accepted if
where is the leading power of variable in function . (TMultiDimFit::MakeCandidates). So the number of functions increase with (1, 2 is fine, 5 is way out).
Gram-Schmidt Orthogonalisation
To further reduce the number of functions in the final expression, only those functions that significantly reduce is chosen. What `significant' means, is chosen by the user, and will be discussed below (see 2.3).
The functions are generally not orthogonal, which means one will have to evaluate all possible 's over all data-points before finding the most significant [1]. We can, however, do better then that. By applying the modified Gram-Schmidt orthogonalisation algorithm [5] [3] to the functions , we can evaluate the contribution to the reduction of from each function in turn, and we may delay the actual inversion of the curvature-matrix (TMultiDimFit::MakeGramSchmidt).
So we are let to consider an matrix , an element of which is given by
where labels the rows in the training sample and labels functions of variables, and . That is, is the term (or function) numbered evaluated at the data point . We have to normalise to for this to succeed [5] (TMultiDimFit::MakeNormalized). We then define a matrix of which the columns are given by
and is the component of orthogonal to . Hence we obtain [3],
We now take as a new model . We thus want to minimize
where is a vector of the dependent quantity in the sample. Differentiation with respect to gives, using (6),
or
Let be the sum of squares of residuals when taking functions into account. Then
Using (9), we see that
So for each new function included in the model, we get a reduction of the sum of squares of residuals of , where is given by (4) and by (9). Thus, using the Gram-Schmidt orthogonalisation, we can decide if we want to include this function in the final model, before the matrix inversion.
Function Selection Based on ResidualSupposing that steps of the procedure have been performed, the problem now is to consider the function.
The sum of squares of residuals can be written as
where the relation (9) have been taken into account. The contribution of the function to the reduction of S, is given by
Two test are now applied to decide whether this function is to be included in the final expression, or not.
Test 1Denoting by the subspace spanned by the function is by construction (see (4)) the projection of the function onto the direction perpendicular to . Now, if the length of (given by ) is very small compared to the length of this new function can not contribute much to the reduction of the sum of squares of residuals. The test consists then in calculating the angle between the two vectors and (see also figure 1) and requiring that it's greater then a threshold value which the user must set (TMultiDimFit::SetMinAngle).
Test 2Let be the data vector to be fitted. As illustrated in figure 1, the function will contribute significantly to the reduction of , if the angle between and is smaller than an upper limit , defined by the user (TMultiDimFit::SetMaxAngle)
However, the method automatically readjusts the value of this angle while fitting is in progress, in order to make the selection criteria less and less difficult to be fulfilled. The result is that the functions contributing most to the reduction of are chosen first (TMultiDimFit::TestFunction).
In case isn't defined, an alternative method of performing this second test is used: The function is accepted if (refer also to equation (13))
where is the sum of the first residuals from the functions previously accepted; and is the total number of functions allowed in the final expression of the fit (defined by user).From this we see, that by restricting -- the number of terms in the final model -- the fit is more difficult to perform, since the above selection criteria is more limiting.
The more coefficients we evaluate, the more the sum of squares of residuals will be reduced. We can evaluate before inverting as shown below.
Coefficients and Coefficient Errors
Having found a parameterization, that is the 's and , that minimizes , we still need to determine the coefficients . However, it's a feature of how we choose the significant functions, that the evaluation of the 's becomes trivial [5]. To derive , we first note that equation (4) can be written as
where
Consequently, is an upper triangle matrix, which can be readily inverted. So we now evaluate
The model can therefore be written asThe original model is therefore identical with this if
The reason we use rather then is to save storage, since can be stored in the same matrix as (TMultiDimFit::MakeCoefficients). The errors in the coefficients is calculated by inverting the curvature matrix of the non-orthogonal functions [1] (TMultiDimFit::MakeCoefficientErrors).
ConsiderationsIt's important to realize that the training sample should be representive of the problem at hand, in particular along the borders of the region of interest. This is because the algorithm presented here, is a interpolation, rahter then a extrapolation [5].
Also, the independent variables need to be linear independent, since the procedure will perform poorly if they are not [5]. One can find an linear transformation from ones original variables to a set of linear independent variables , using a Principal Components Analysis (see TPrincipal), and then use the transformed variable as input to this class [5] [6].
H. Wind also outlines a method for parameterising a multidimensional dependence over a multidimensional set of variables. An example of the method from [5], is a follows (please refer to [5] for a full discussion):
- Define are the 5 dependent quantities that define a track.
- Compute, for different values of , the tracks through the magnetic field, and determine the corresponding .
- Use the simulated observations to determine, with a simple approximation, the values of . We call these values .
- Determine from a set of at least five relevant coordinates , using contrains, or alternative:
- Perform a Principal Component Analysis (using TPrincipal), and use to get a linear transformation , so that are constrained and linear independent.
- Perform a Principal Component Analysis on , to get linear indenpendent (among themselves, but not independent of ) quantities
- For each component make a mutlidimensional fit, using as the variables, thus determing a set of coefficents .
To process data, using this parameterisation, do
- Test wether the observation within the domain of the parameterization, using the result from the Principal Component Analysis.
- Determine as before.
- Detetmine as before.
- Use the result of the fit to determind .
- Transform back to from , using the result from the Principal Component Analysis.
Testing the parameterizationThe class also provides functionality for testing the, over the training sample, found parameterization (TMultiDimFit::Fit). This is done by passing the class a test sample of tuples of the form , where are the independent variables, the known, dependent quantity, and is the square error in (TMultiDimFit::AddTestRow).
The parameterization is then evaluated at every in the test sample, and
is evaluated. The relative error over the test sampleshould not be to low or high compared to from the training sample. Also, multiple correlation coefficient from both samples should be fairly close, otherwise one of the samples is not representive of the problem. A large difference in the reduced over the two samples indicate an over fit, and the maximum number of terms in the parameterisation should be reduced.It's possible to use Minuit [4] to further improve the fit, using the test sample.
Christian Holm
November 2000, NBI
Bibliography