10 Oct 2017, 11:07

L3 LEP data

Still working our way through this tutorial based on C++ and MINUIT:

http://www.desy.de/~rosem/flc_statistics/data/04_parameters_estimation-C.pdf

Now, we tackle the L3 LEP data. L3 was an experiment at the Large Electron Positron collider, at CERN, near Geneva. Until 2000, it recorded the decay products of e+e- collisions at center of mass energies up to 208 GeV.

An example is the muon pair production:

$$ e^+ e^- \rightarrow \mu^+\mu^-$$

Both muons are mainly detected and reconstructed from the tracking system. From the measurements, the curvature, charge and momentum are determined.

The file L3.dat contains recorded muon pair events. Every line is an event, a recorded collision of a \(e^+e^-\) pair producing a \(\mu^+\mu^-\) pair.

The first three columns contain the momentum components \(p_x\), \(p_y\) and \(p_z\) of the \(\mu^+\). The other three columns contain the momentum components for the \(mu^-\). Units are in \(GeV/c\).

Forward-Backward Asymmetry

An important parameter that constrains the Standard Model (the theoretical framework that models our current understanding of Physics) is the forward-backward asymmetry A:

$$ A = (N_F - N_B) / (N_F + N_B) $$

where:

  • \(N_F\) are the events in which the \(\mu^-\) flies forwards (\(\cos \theta_{\mu^-} > 0\));
  • \(N_B\) are the events in which the \(\mu^-\) flies backwards.

Given the L3.dat dataset, we would like to estimate the value of \(A\) and determine its statistical error.

In a simple counting experiment, we can write the statistical error as:

$$ \sigma_A = \sqrt{ \frac{(1-A^2)}{N} }$$

where \(N = N_F + N_B\).

So, as a first step, we can simply count the forward events.

First estimation

Let’s look at that data:

$> head -n 5 L3.dat
     4.584    -9.763   -18.508   -24.171    50.464    95.865
    62.570   184.448  -175.983   -28.392   -83.491    70.656
     7.387   101.650    13.531    -7.853  -108.002   -14.472
    43.672    56.083   -77.367   -40.893   -52.481    72.804
   -36.620   -60.832   -46.156    35.863    59.591    -3.220

First we need to assemble a bit of code to read in that file:

func read(fname string) []float64 {
	f, err := os.Open(fname)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var costh []float64
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var pp, pm [3]float64
		_, err = fmt.Sscanf(
			sc.Text(), "%f %f %f %f %f %f",
			&pp[0], &pp[1], &pp[2],
			&pm[0], &pm[1], &pm[2],
		)
		if err != nil {
			log.Fatal(err)
		}
		v := pm[0]*pm[0] + pm[1]*pm[1] + pm[2]*pm[2]
		costh = append(
			costh,
			pm[2]/math.Sqrt(v),
		)
	}

	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}

	plot(costh)
	return costh
}

Now, with the \(\cos \theta_{\mu^{-}}\) calculation out of the way, we can actually compute the asymmetry and its associated statistical error:

func asym(costh []float64) (float64, float64) {
	n := float64(len(costh))
	nf := 0
	for _, v := range costh {
		if v > 0 {
			nf++
		}
	}

	a := 2*float64(nf)/n - 1
	sig := math.Sqrt((1 - a*a) / n)
	return a, sig
}

Running the code gives:

events: 346
A = 0.231 +/- 0.052

plot

OK. Let’s try to use gonum/optimize and a log-likelihood.

Estimation with gonum/optimize

To let optimize.Minimize loose on our dataset, we need the angular distribution:

$$ f(\theta, A) = \frac{3}{8} (1 + \cos^2 \theta) + A \cos\theta $$

we just need to feed that through the log-likelihood procedure:

func main() {
	costh := read("L3.dat")
	log.Printf("events: %d\n", len(costh))
	a, sig := asym(costh)
	log.Printf("A= %5.3f +/- %5.3f\n", a, sig)

	fcn := func(x []float64) float64 {
		lnL := 0.0
		A := x[0]
		const k = 3.0 / 8.0
		for _, v := range costh {
			lnL += math.Log(k*(1+v*v) + A*v)
		}
		return -lnL
	}

	grad := func(grad, x []float64) {
		fd.Gradient(grad, fcn, x, nil)
	}

	hess := func(h mat.MutableSymmetric, x []float64) {
		fd.Hessian(h.(*mat.SymDense), fcn, x, nil)
	}

	p := optimize.Problem{
		Func: fcn,
		Grad: grad,
		Hess: hess,
	}

	var meth = &optimize.Newton{}
	var p0 = []float64{0} // initial value for A

	res, err := optimize.Minimize(p, p0, nil, meth)
	if err != nil {
		log.Fatal(err)
	}
	display(res, p)
}

which, when run, gives:

$> go build -o fba fba.go
$> ./fba
events: 346
A= 0.231 +/- 0.052

results: &optimize.Result{Location:optimize.Location{X:[]float64{0.21964440563754575}, F:230.8167059887097, Gradient:[]float64{0}, Hessian:(*mat.SymDense)(0xc42004b440)}, Stats:optimize.Stats{MajorIterations:4, FuncEvaluations:5, GradEvaluations:5, HessEvaluations:5, Runtime:584120}, Status:4}

minimal function value:  230.817
number of parameters: 1
grad=[0]
hess=[429.2102851867676]
errs= [0.04826862636461834]
par-000: 2.196444e-01 +/- 4.826863e-02

ie: the same answer than the MINUIT-based code there:

Results of MINUIT minimisation
-------------------------------------

 Minimal function value:               230.817  
 Estimated difference to true minimum:   9.044e-11 
 Number of parameters:           1     
 Error definition (Fmin + Delta):         0.500  
 Exact covariance matrix.

   Parameter     Value       Error    positive    negative    L_BND    U_BND
 0          A  2.196e-01  4.827e-02 +4.767e-02 -4.879e-02  0.0e+00  0.0e+00

Covariance matrix: 
  2.330e-03

Correlation matrix: 
  1.000

09 Oct 2017, 11:11

Introduction to Minimization with Gonum

Switching gears a bit with regard to last week, let’s investigate how to perform minimization with Gonum.

In High Energy Physics, there is a program to calculate numerically:

  • a function minimum of \(F(a)\) of parameters \(a_i\) (with up to 50 parameters),
  • the covariance matrix of these parameters
  • the (asymmetric or parabolic) errors of the parameters from \(F_{min}+\Delta\) for arbitrary \(\Delta\)
  • the contours of parameter pairs \(a_i, a_j\).

This program is called MINUIT and was originally written by Fred JAMES in FORTRAN. MINUIT has been since then rewritten in C++ and is available through ROOT.

Let’s see what Gonum and its gonum/optimize package have to offer.

Physics example

Let’s consider a radioactive source. n measurements are taken, under the same conditions. The physicist measured and counted the number of decays in a given constant time interval:

rs := []float64{1, 1, 5, 4, 2, 0, 3, 2, 4, 1, 2, 1, 1, 0, 1, 1, 2, 1}

What is the mean number of decays ?

A naive approach could be to just use the (weighted) arithmetic mean:

mean := stat.Mean(rs, nil)
merr := math.Sqrt(mean / float64(len(rs)))
fmt.Printf("mean=  %v\n", mean)
fmt.Printf("µ-err= %v\n", merr)
// Output:
// mean=  1.7777777777777777
// µ-err= 0.31426968052735443 

Let’s plot the data:

func plot(rs []float64) {
	mean := stat.Mean(rs, nil)

	// hbook is go-hep.org/x/hep/hbook.
	// here we create a 1-dim histogram with 10 bins,
	// from 0 to 10.
	h := hbook.NewH1D(10, 0, 10)
	for _, x := range rs {
		h.Fill(x, 1)
	}
	h.Scale(1 / h.Integral())

	// hplot is a convenience package built on top
	// of gonum.org/v1/plot.
	// hplot is go-hep.org/x/hep/hplot.
	p := hplot.New()
	p.X.Label.Text = "r"
	hh := hplot.NewH1D(h)
	hh.FillColor = color.RGBA{255, 0, 0, 255}
	p.Add(hh)

	fct := hplot.NewFunction(distuv.Poisson{Lambda: mean}.Prob)
	fct.Color = color.RGBA{0, 0, 255, 255}
	p.Add(fct)
	p.Add(hplot.NewGrid())

	err := p.Save(10*vg.Centimeter, -1, "plot.png")
	if err != nil {
		log.Fatal(err)
	}
}

which gives:

plot

Ok, let’s try to estimate µ using a log-likelihood minimization.

With MINUIT

From the plot above and from first principles, we can assume a Poisson distribution. The Poisson probability is:

$$P(n|\mu) = \frac{\mu^{n}}{n!} e^{-\mu}$$

This therefore leads to a log-likelihood of:

$$\ln L(\mu) = n \ln(\mu) - \ln(n!) - \mu$$

which is the quantity we’ll try to optimize.

In C++, this would look like:

#include <math.h>
#include <cstdio>
#include "TMinuit.h"

#define NDATA 18
int r[NDATA]    = {1,1,  5, 4,2,0,3,2, 4,1,2,1,1,0,1,1,2,1};
int rfac[NDATA] = {1,1,120,24,2,1,6,2,24,1,2,1,1,1,1,1,2,1};

void fcn(int &npar, double *gin, double &f, double *par, int iflag) {
	int i;
	double mu, lnL;

	mu = par[0];
	lnL = 0.0;
	for (i=0; i<NDATA; i++) {
		lnL += r[i]*std::log(mu) - mu - std::log((double)rfac[i]);
	}
	f = -lnL;
}

int main(int argc, char **argv) {

	double arglist[10];
	int ierflg = 0;

	double start = 1.0; // initial value for mu
	double step = 0.1;
	double l_bnd = 0.1;
	double u_bnd = 10.;

	TMinuit minuit(1); // 1==number of parameters
	minuit.SetFCN(fcn);
	minuit.mnparm(
		0, "Poisson mu",
		start, step,
		l_bnd, u_bnd, ierflg
	);

	// set a 1-sigma error for the log-likelihood
	arglist[0] = 0.5;
	minuit.mnexcm("SET ERR",arglist,1,ierflg);

	// search for minimum.
	// computes covariance matrix and computes parabolic
	// errors for all parameters.
	minuit.mnexcm("MIGRAD",arglist,0,ierflg);

	// calculates exact, asymmetric errors for all
	// variable parameters.
	minuit.mnexcm("MINOS",arglist,0,ierflg);

	// set a 2-sigma error for the log-likelihood
	arglist[0] = 2.0;
	minuit.mnexcm("SET ERR",arglist,1,ierflg);

	// calculates exact, asymmetric errors for all
	// variable parameters.
	minuit.mnexcm("MINOS",arglist,0,ierflg);

	results(&minuit);
	return 0;
}

As this isn’t a blog post about how to use MINUIT, we won’t go too much into details.

Compiling the above program with:

$> c++ -o radio `root-config --libs --cflags` -lMinuit radio.cc

and then running it, gives:

$> ./radio
[...]
Results of MINUIT minimisation
-------------------------------------

 Minimal function value:                29.296  
 Estimated difference to true minimum:   2.590e-09 
 Number of parameters:           1     
 Error definition (Fmin + Delta):         2.000  
 Exact covariance matrix.

   Parameter     Value       Error    positive    negative    L_BND    U_BND
 0 Poisson mu  1.778e+00  6.285e-01 +7.047e-01 -5.567e-01  1.0e-01  1.0e+01

Covariance matrix: 
  3.951e-01

Correlation matrix: 
  1.000

So the mean of the Poisson distribution is estimated to 1.778 +/- 0.629.

With gonum/optimize

func main() {
	rs := []float64{1, 1, 5, 4, 2, 0, 3, 2, 4, 1, 2, 1, 1, 0, 1, 1, 2, 1}
	rfac := []float64{1, 1, 120, 24, 2, 1, 6, 2, 24, 1, 2, 1, 1, 1, 1, 1, 2, 1}

	mean := stat.Mean(rs, nil)
	merr := math.Sqrt(mean / float64(len(rs)))

	fmt.Printf("mean=%v\n", mean)
	fmt.Printf("merr=%v\n", merr)

	fcn := func(x []float64) float64 {
		mu := x[0]
		lnl := 0.0
		for i := range rs {
			lnl += rs[i]*math.Log(mu) - mu - math.Log(rfac[i])
		}
		return -lnl
	}

	grad := func(grad, x []float64) {
		fd.Gradient(grad, fcn, x, nil)
	}

	hess := func(h mat.MutableSymmetric, x []float64) {
		fd.Hessian(h.(*mat.SymDense), fcn, x, nil)
	}

	p := optimize.Problem{
		Func: fcn,
		Grad: grad,
		Hess: hess,
	}

	var meth = &optimize.Newton{}
	var p0 = []float64{1} // initial value for mu

	res, err := optimize.Minimize(p, p0, nil, meth)
	if err != nil {
		log.Fatal(err)
	}

	display(res, p)
	plot(rs)
}

Compiling and running this program gives:

$> go build -o radio radio.go
$> ./radio
mean=1.7777777777777777
merr=0.31426968052735443

results: &optimize.Result{Location:optimize.Location{X:[]float64{1.7777777839915905}, F:29.296294958031794, Gradient:[]float64{1.7763568394002505e-07}, Hessian:(*mat.SymDense)(0xc42022a000)}, Stats:optimize.Stats{MajorIterations:6, FuncEvaluations:9, GradEvaluations:7, HessEvaluations:7, Runtime:191657}, Status:4}

minimal function value:   29.296
number of parameters: 1
grad=[1.7763568394002505e-07]
hess=[10.123388051986694]
errs= [0.3142947001265352]

par-000: 1.777778e+00 +/- 6.285894e-01

Same result. Yeah!

gonum/optimize doesn’t try to automatically numerically compute the first- and second-derivative of an objective function (MINUIT does.) But using gonum/diff/fd, it’s rather easy to provide it to gonum/optimize.

gonum/optimize.Result only exposes the following informations (through gonum/optimize.Location):

// Location represents a location in the optimization procedure.
type Location struct {
	X        []float64
	F        float64
	Gradient []float64
	Hessian  *mat.SymDense
}

where X is the parameter(s) estimation and F the value of the objective function at X.

So we have to do some additional work to extract the error estimations on the parameters. This is done by inverting the Hessian to get the covariance matrix. The error on the i-th parameter is then:

erri := math.Sqrt(errmat.At(i,i)).

And voila.

Exercize for the reader: build a MINUIT-like interface on top of gonum/optimize that provides all the error analysis for free.

Next time, we’ll analyse a LEP data sample and use gonum/optimize to estimate a physics quantity.

NB: the material and orignal data for this blog post has been extracted from: http://www.desy.de/~rosem/flc_statistics/data/04_parameters_estimation-C.pdf.