* Intro to data analysis
  * Class Procedure
  * Philosophy
    * Measurement uncertainty (BR1.1, BR1.2)
      * Statistical
      * Systematic
    * Reporting data vs interpreting data
    * Identifying your assumptions
    * Reporting Data vs Finding Parameters of a Model
  * A word about the text books:
    * Two words to show disdain: "Frequentist" and "Bayesian". They describe someone who has substituted emotion for formalism.
    * To quote a review in the CERN Courier: "When a Physicist writes a book on probability and statistics, it is necessarily as an amateur."
    * "Prob. and Stat. in Experimental Physics", B. Roe
      Byron Roe takes a very emotional approach to statistics, and uses very intemperate language to describe points he disagrees with. Be careful and don't adopt his emotional approach. Specific points:
      1) He confuses interpretation with definition.
      2) He never distinguishes between the knowledge and frequency interpretations of probability.
      3) He attempts to analyze knowledge probabilities as frequency probabilities.
      4) He seldom clearly defines his concepts.
      5) His notation is a little bit unusual.
      That said, this book has many strong points:
      1) He provides detailed mathematical treatments for many topics.
      2) He gives physics motivated "derivations" of many important points.
      3) He touches on many techniques that are often skipped.
    * "Stat. Data Analysis", G. Cowan
      Takes a very pragmatic approach to statistics. Is pretty good about defining terms, and sticks to rather standard notation. The book doesn't have many worked examples and tends to be aimed at someone with a background in experimental physics.
* Random Variables
  * Definition of a random variable (practical and theoretical)
  * Pseudo-random Variables (what we get on computers)
  * Ways to show random variables (notes) (good homework problem)
    * 1D Histograms
* Software for the semester (cover first day, needed for first homework)
  * What's available (web page)
  * What's required
    * symbolic algebra
    * programming w/
      * histograms
      * plots
  * Where to get it (web page)
  * demo of creating a histogram
* Probability (C1.1)
  * Axiomatic definition
  * Basic mathematics
    * Law of total probability
    * Bayes' theorem
  * Representations of probability
    * probability density functions
    * cumulative probability distribution
  * Other definitions
    * Marginal probability
    * Conditional probability
  * Notation
      P(A)         -- a probability
      P(A|B)       -- the conditional probability of A given B
      F(x)         -- the cumulative distribution
      f(x)         -- a pdf, f(x) = dF/dx
      f(x,y)       -- a pdf, f(x,y) = d^2F/dxdy
      f(x;y)       -- a conditional density function (represents P(X|Y))
      E{x}         -- expectation value of x
      var{x}       -- variance of x, often \sigma^2_x
      cov{x_i,x_j} -- covariance of x (usually a matrix V_{ij}, or U_{ij})
* Interpretation of Probability
  * Probability as the limit of a frequency distribution
  * Probability as a representation of knowledge
  * The Red Queen Definition: It means what we mean it to mean (pragmatism)
* Expectation Values (Estimators of Expectation Values) (C1.5)
  * Not really moments
    * Mode (most probable value)
      * definition from pdf
    * Median (middle value)
      * definition from pdf
      * definition from data
  * Mean (average) (BR1.4)
    * definition from pdf
    * estimator from measurement
  * Variance (standard deviation)
    * definition from pdf
    * estimator from measurement
    * Standard deviation as estimator of uncertainty
  * Covariance (correlation coefficient)
    * definition from pdf
    * estimator from measurement
* Basic Python
  * Basic syntax
    * Assigning a value
    * printing a value
    * writing a value to a file
  * The "if" statement
  * The "for" statement
  * Functions
    * defining a function
    * calling a function
    * returning a value
  * How to create a histogram
* Functions of random variables
  * pdf of the function (C1.5)
  * Mellin Convolution (C1.4)
  * Fourier Convolution (C1.4)
* Error Analysis
  * Assumptions:
    * The pdf can be well represented by its mean and variance. This is strictly true for a Gaussian, and mostly true for any "peaked" pdf. Almost all interesting pdfs can be described by the mean and covariance.
    * Result of assumption: the pdf of a random variable and the pdf of a function of that variable can both be described by a mean and covariance.
  * Basic Theory (C1.6)
  * Specific formulas (BR3.2)
  * Handling Statistical and Systematic Errors
* Probability Distributions
  * Discrete
    * Binomial
    * Multinomial
    * Poisson
  * Continuous
    * Uniform
    * Gaussian
    * Chi-Squared
    * Others: Lorentzian (Breit-Wigner), exponential, log-normal
* The Monte Carlo Method
  * Why the MC method
    * Need test data we understand
    * Need to verify analysis sensitivity
    * Allows study of systematics
    * Mention function integration (not part of this class)
  * Methods to generate random distributions
    * Transformational Method
      * exp(x)
      * uniform angles
    * Acceptance/Rejection Method
    * Canned generators
  * Simulating Experiments
    * Simulating a linear dependence
    * Simulating a muon decay experiment
    * Simulating a flat background plus a signal (good homework: linear background plus a signal)
* Hypothesis testing
  * Chi-squared test
  * Student's t test
  * Trials factor
* Least-squares fitting
  * Linear least-squares fits
    * fit to a constant
    * fit to a line
    * fit to a polynomial (homework problem)
  * Non-linear least-squares fits
    * Defining chi-squared for a non-linear model
    * Numeric minimization
      * Brute force (notes)
      * Simplex (notes)
      * Canned routines (scipy)
  * Determining parameter variance with least-squares fitting
    * Finding the Covariance
      * Defined by the second derivative
      * Delta-chi2 method
* Confidence Intervals (tricky)
  * Resources
    * Use Cowan Chap 9
    * Roe Chap 12 to highlight sloppy thinking.
  * Definitions
    * Confidence Interval as a member of a covering set (frequency def.)
      * Can't rigorously include effect of systematics
      * "Frequentist" analyses often have a mish-mash of Bayesian systematic terms handled as ad-hoc frequentist probabilities, with problems swept under the rug. Remember the goal is to increase our knowledge. Don't get too philosophical about a tool.
    * Confidence Interval as a credible interval (knowledge def.)
      * Systematics are handled naturally
      * Needs an assumption about "prior knowledge"
      * Must be very careful since assumed knowledge introduces bias into the interval. Be very clear about assumptions.
  * Which to use:
    * Always report data, and assumptions
    * When practical use both
    * Report data with covering interval
      * Must report systematics separately
    * Make decisions with credible interval
      * Summarizes state of knowledge. Includes effect of systematic and statistical terms.
* Estimating model dependent systematic errors
  * What are model dependent systematics
  * Example (background under a curve)
    * Possible example:
      * background: exp(x) + 1/x + constant
      * signal: Gaussian or Lorentzian
  * Homework (do work for my example)
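
Sketch for "demo of creating a histogram" and the mean/variance "estimator from measurement" items. This is a minimal illustration, not the course's actual demo: it assumes NumPy is among the available software, and the Gaussian parameters, bin range, and seed are hypothetical values chosen only for the example.

```python
import numpy as np

# Assumption: NumPy's default generator stands in for the course's
# pseudo-random number source; the seed is fixed only for reproducibility.
rng = np.random.default_rng(seed=12345)

# Draw pseudo-random values from a Gaussian (illustrative parameters).
true_mean, true_sigma, n_samples = 5.0, 2.0, 10_000
x = rng.normal(loc=true_mean, scale=true_sigma, size=n_samples)

# Fill a 1D histogram: 40 equal-width bins between -3 and 13.
# Values outside the range are simply not counted.
counts, edges = np.histogram(x, bins=40, range=(-3.0, 13.0))
centers = 0.5 * (edges[:-1] + edges[1:])

# Estimators of the mean and variance from the measurements.
mean_est = x.mean()
var_est = x.var(ddof=1)      # ddof=1 gives the unbiased sample variance
sigma_est = np.sqrt(var_est)

# Crude text rendering of the histogram, one line per bin.
for c, n in zip(centers, counts):
    print(f"{c:6.2f} {'#' * (int(n) // 20)}")
print(f"mean = {mean_est:.3f}, sigma = {sigma_est:.3f}")
```

The text rendering avoids a plotting dependency; a plotting package could draw the same `counts` and `edges` as a bar chart.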
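
Sketch for "Methods to generate random distributions". The two functions below illustrate the transformational method for an exponential pdf and the acceptance/rejection method for a bounded pdf. The function names, the triangular test pdf, and the lifetime value (roughly the muon lifetime in microseconds) are assumptions made for illustration, not material taken from the class notes.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # fixed seed, for reproducibility only

def sample_exponential(tau, size):
    """Transformational method: if u is uniform on [0, 1), then
    t = -tau * ln(1 - u) follows f(t) = exp(-t / tau) / tau."""
    u = rng.random(size)
    return -tau * np.log1p(-u)

def sample_accept_reject(pdf, x_min, x_max, f_max, size):
    """Acceptance/rejection: throw points uniformly in the box
    [x_min, x_max] x [0, f_max] and keep x when the point lies under pdf.
    f_max must be at least the maximum of pdf on the interval."""
    accepted = np.empty(0)
    while accepted.size < size:
        x = rng.uniform(x_min, x_max, size)
        y = rng.uniform(0.0, f_max, size)
        accepted = np.concatenate([accepted, x[y < pdf(x)]])
    return accepted[:size]

# Decay times with an illustrative mean lifetime tau.
tau = 2.2
decay_times = sample_exponential(tau, 50_000)

# Hypothetical test pdf: f(x) = 2x on [0, 1], whose maximum is 2.
def triangle(x):
    return 2.0 * x

tri = sample_accept_reject(triangle, 0.0, 1.0, 2.0, 50_000)

print(f"exponential sample mean: {decay_times.mean():.3f} (expect about {tau})")
print(f"triangle sample mean:    {tri.mean():.3f} (expect about {2 / 3:.3f})")
```

Acceptance/rejection wastes the points that land above the curve (half of them for this triangular pdf), which is the price of not needing an invertible cumulative distribution.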
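
Sketch for "Simulating a linear dependence" and "fit to a line". It generates data from a straight line with Gaussian errors, then solves the weighted linear least-squares problem with the normal equations. The true parameters, error size, and variable names are hypothetical; the covariance is taken as the inverse of half the second-derivative matrix of chi-squared, matching the "Defined by the second derivative" item.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # fixed seed, for reproducibility only

# Simulated measurements: y = a + b*x with Gaussian errors (illustrative values).
a_true, b_true, sigma = 1.0, 0.5, 0.3
x = np.linspace(0.0, 10.0, 21)
sigma_y = np.full(x.size, sigma)
y = a_true + b_true * x + rng.normal(0.0, sigma, x.size)

# Design matrix and data vector, each row scaled by 1/sigma_i so that
# chi2(p) = |b_vec - A p|^2.
A = np.column_stack([np.ones_like(x), x]) / sigma_y[:, None]
b_vec = y / sigma_y

# Normal equations: (A^T A) p = A^T b.
# A^T A is half the second-derivative matrix of chi2, so its inverse
# is the parameter covariance matrix.
cov = np.linalg.inv(A.T @ A)
params = cov @ (A.T @ b_vec)
errors = np.sqrt(np.diag(cov))

# Goodness of fit.
chi2 = np.sum((b_vec - A @ params) ** 2)
ndf = x.size - params.size

print(f"a = {params[0]:.3f} +/- {errors[0]:.3f}")
print(f"b = {params[1]:.3f} +/- {errors[1]:.3f}")
print(f"chi2/ndf = {chi2:.1f}/{ndf}")
```

For a model that is linear in its parameters, chi-squared is exactly quadratic, so the second-derivative covariance and the Delta-chi2 = 1 contour give the same one-sigma errors.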
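
Sketch for "Error Analysis: Basic Theory". Under the stated assumption that each pdf is well described by its mean and covariance, the standard first-order propagation formula is written below in the outline's own notation. The function name y and the symbol \mu for the vector of means are introduced here for illustration; the specific formulas in the referenced texts may be organized differently.

```latex
% First-order (linearized) error propagation for y = y(x_1, ..., x_n),
% with derivatives evaluated at the means \mu_i = E\{x_i\}:
E\{y\} \approx y(\mu_1, \ldots, \mu_n)

\mathrm{var}\{y\} \approx \sum_{i=1}^{n} \sum_{j=1}^{n}
    \left.\frac{\partial y}{\partial x_i}\right|_{\mu}
    \left.\frac{\partial y}{\partial x_j}\right|_{\mu} V_{ij},
\qquad V_{ij} = \mathrm{cov}\{x_i, x_j\}

% Special case of uncorrelated variables (V_{ij} = 0 for i \neq j):
\mathrm{var}\{y\} \approx \sum_{i=1}^{n}
    \left(\left.\frac{\partial y}{\partial x_i}\right|_{\mu}\right)^{2}
    \sigma^2_{x_i}
```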