Final Project for Spring 2006

This is the final project for PHY447 (Tutorial in Advanced Topics: Statistical Data Analysis). The project is due May 16 at 5pm and should be sent by email to my normal email address (clark.mcgrew@sunysb.edu). The solution should consist of a PDF or PostScript file with all plots and text, as well as one or more files containing the source code used in the solution.

  1. Download the data for the final project from the course website. The data are in a text file with a single column of 10,000 numbers randomly generated according to an underlying model consisting of an exponential distribution and two Gaussian distributions. The goal of the final project is to estimate the model that was used to generate the data.

  2. Use a histogram to understand the distribution of the data. You will need to determine the correct range, and a reasonable number of bins for the histogram. A plot of this histogram should be included in your final report.

  3. Describe the features of the histogram, and make initial “eye-ball” estimates for the model parameters. In your report, you should describe how you made these estimates.

  4. Build a function that can be used to fit the best values of the parameters. As mentioned above, the data are drawn from the sum of two Gaussian distributions and an exponential distribution. You will need eight free parameters to describe the model. Use this function to compare your “eye-ball” estimate to the histogram of the data, i.e. overlay a plot of the function, evaluated with your eye-ball parameters, on a plot of the data histogram. This overlay can be combined with the plot from step 2.

  5. Use the function to fit the best values of the model parameters. Describe how you found the best fit values. Your report should give the total chi-squared value and the number of degrees of freedom for the best fit. What is the probability that the best fit model describes the data (i.e. what is the P-value for your chi-squared)? Make a second overlay plot showing the best fit function and the data histogram.

  6. Use the “delta chi-squared equals one” rule to determine the uncertainty on each of the function parameters. In your report, show the value of chi-squared vs each parameter value (8 individual figures, one per parameter). Quote the best fit values for the model parameters as “value +- uncertainty”.
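
As an illustration of steps 2, 4 and 5, the following is a minimal Python sketch of an eight-parameter model function and a binned chi-squared. It is not a required approach. The parameter ordering, the function names (model, histogram, chi_squared), the choice to express the model directly in expected counts per bin, and the use of the expected count as the per-bin variance are all assumptions made for this sketch.

```python
import math

N_PARAMS = 8  # two parameters for the exponential, three for each Gaussian


def model(x, p):
    """Expected counts per bin at x: an exponential plus two Gaussians.

    p = (a_exp, tau, a1, mu1, sigma1, a2, mu2, sigma2). This ordering is a
    hypothetical convention chosen only for this sketch.
    """
    a_exp, tau, a1, mu1, s1, a2, mu2, s2 = p
    expo = a_exp * math.exp(-x / tau)
    gauss1 = a1 * math.exp(-0.5 * ((x - mu1) / s1) ** 2)
    gauss2 = a2 * math.exp(-0.5 * ((x - mu2) / s2) ** 2)
    return expo + gauss1 + gauss2


def histogram(data, lo, hi, nbins):
    """Count the entries of data in nbins equal-width bins covering [lo, hi)."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in data:
        if lo <= x < hi:
            # Clamp to guard against rounding at the upper edge.
            index = min(int((x - lo) / width), nbins - 1)
            counts[index] += 1
    return counts


def chi_squared(counts, lo, hi, p):
    """Sum over bins of (observed - expected)^2 / expected.

    The model is evaluated at each bin center, and the expected count is
    used as the variance (an assumption; it requires expected > 0).
    """
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, observed in enumerate(counts):
        center = lo + (i + 0.5) * width
        expected = model(center, p)
        total += (observed - expected) ** 2 / expected
    return total


def degrees_of_freedom(counts):
    """Number of bins minus the number of fitted parameters."""
    return len(counts) - N_PARAMS
```

With the data read into a list, counts = histogram(data, lo, hi, nbins) followed by chi_squared(counts, lo, hi, p) gives the quantity to minimize over p. The minimizer itself and the conversion of chi-squared to a P-value are left out of this sketch.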

I expect that your report will be approximately 2 pages of text (excluding figures). You will need to make approximately 11 plots.
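
The “delta chi-squared equals one” rule in step 6 can be sketched as a one-dimensional scan. The Python fragment below is only an illustration under stated assumptions: delta_chi2_interval is a hypothetical helper, and chi2_of stands for any user-supplied function that returns chi-squared for a given value of one parameter. How the other seven parameters are treated at each scan point (held fixed or re-minimized) is left to chi2_of.

```python
def delta_chi2_interval(chi2_of, best, step, max_steps=10000):
    """Find where chi-squared rises by 1 above its value at best.

    Walks outward from best in increments of step, in each direction, and
    linearly interpolates the crossing point between the last two samples.
    Returns (lower, upper) parameter values.
    """
    chi2_min = chi2_of(best)

    def walk(direction):
        x = best
        for _ in range(max_steps):
            x_next = x + direction * step
            rise_next = chi2_of(x_next) - chi2_min
            if rise_next >= 1.0:
                rise_here = chi2_of(x) - chi2_min
                fraction = (1.0 - rise_here) / (rise_next - rise_here)
                return x + (x_next - x) * fraction
            x = x_next
        raise ValueError("chi-squared did not rise by 1 within the scan range")

    return walk(-1.0), walk(+1.0)


# Usage with a toy parabola whose minimum is at 3.0 with uncertainty 0.5.
def toy_chi2(x):
    return ((x - 3.0) / 0.5) ** 2 + 7.0


lower, upper = delta_chi2_interval(toy_chi2, best=3.0, step=0.01)
minus_error = 3.0 - lower
plus_error = upper - 3.0
```

The same sampled points (parameter value against chi-squared) are what the per-parameter figures in step 6 would display. If the lower and upper uncertainties differ noticeably, chi-squared is not parabolic near the minimum.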