# Tutoring statistics: confidence intervals are important.

A two-sided confidence interval for the population mean is given by

(sample_mean − sig_factor × standard_dev/√n, sample_mean + sig_factor × standard_dev/√n)

The sig_factor (significance factor) depends on the confidence level, that is, the certainty with which we want the confidence interval to include the population mean. For 95% confidence it’s about 2 (more precisely, 1.96 under the normal distribution).
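
To make the formula concrete, here is a minimal Python sketch. The function name `two_sided_ci` and the example numbers are my own, chosen only for illustration; it assumes the normal distribution applies (standard deviation known, or n large enough).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_ci(sample_mean, standard_dev, n, confidence=0.95):
    """Return (lower, upper) for the population mean, using the normal distribution."""
    # Two-sided: the leftover probability is split equally between the two tails.
    sig_factor = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = sig_factor * standard_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean 50, standard deviation 10, n = 100.
lower, upper = two_sided_ci(50, 10, 100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (48.04, 51.96)
```

Passing `confidence=0.99` instead widens the interval, since the sig_factor grows to about 2.58.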

The standard deviation might be known or might be calculated from the sample itself. If it’s known, the normal distribution is used; if calculated, then technically the t-distribution should be used (see point 3 below).
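
A small sketch of the "calculated from the sample" case, using made-up measurements. The t factor here is an assumption on my part, copied from a standard t-table (95% two-sided, 4 degrees of freedom), since Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [4, 8, 6, 5, 7]        # hypothetical measurements
n = len(sample)
sample_mean = mean(sample)
standard_dev = stdev(sample)    # sample standard deviation (n - 1 denominator)

z_factor = NormalDist().inv_cdf(0.975)  # normal factor, about 1.96
t_factor = 2.776                        # assumption: t-table value, 4 degrees of freedom


def interval(sig_factor):
    """Two-sided interval around the sample mean for a given sig_factor."""
    margin = sig_factor * standard_dev / sqrt(n)
    return sample_mean - margin, sample_mean + margin


print("normal:", interval(z_factor))
print("t:     ", interval(t_factor))
```

With a sample this small, the t interval comes out noticeably wider than the normal one, which is why the distinction matters for small n.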

There are a few points that make the two-sided confidence interval for the population mean an elegant construct:

- Its lower and upper boundaries depend on the *sample* size, but not the *population* size.
- For sample size n≥31, the parent population needn’t be normal for the sample mean to be approximately normally distributed. This validates the confidence interval even for a non-normal population when n≥31. It’s a consequence of the Central Limit Theorem. (Actually, the rule of thumb is n≥30, but for the purpose of the next point, I like 31.)
- For n≥31, the t-distribution’s 95% factor is within about 4% of the normal’s (roughly 2.04 versus 1.96), so the normal approximation can probably be used even when the population standard deviation is unknown.
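
The last two points can be checked with a rough simulation. This is only a sketch under my own assumptions: a uniform(0, 1) population stands in for "non-normal", the seed and trial count are arbitrary, and the t factor is a t-table value for 30 degrees of freedom rather than something computed here.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)       # fixed seed so the run is repeatable

N = 31               # sample size from the points above
TRIALS = 2000        # arbitrary number of simulated samples
TRUE_MEAN = 0.5      # mean of the uniform(0, 1) population
Z_FACTOR = 1.96      # normal factor, 95% two-sided
T_FACTOR = 2.042     # assumption: t-table value, 30 degrees of freedom


def coverage(sig_factor, samples):
    """Fraction of samples whose interval contains the true population mean."""
    hits = 0
    for s in samples:
        margin = sig_factor * stdev(s) / sqrt(N)
        if abs(mean(s) - TRUE_MEAN) <= margin:
            hits += 1
    return hits / len(samples)


samples = [[random.random() for _ in range(N)] for _ in range(TRIALS)]
z_cov = coverage(Z_FACTOR, samples)
t_cov = coverage(T_FACTOR, samples)

print(f"coverage with normal factor: {z_cov:.3f}")
print(f"coverage with t factor:      {t_cov:.3f}")
print(f"t factor exceeds normal factor by {100 * (T_FACTOR / Z_FACTOR - 1):.1f}%")
```

Both coverage figures should land near the nominal 95%, even though the population is far from normal, and the two factors differ by only about 4%.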

Source:

Harnett, Donald L. and James L. Murphy. __Statistical Analysis for Business and Economics__, first Can. ed. Don Mills: Addison-Wesley, 1993.

Jack of Oracle Tutoring by Jack and Diane, Campbell River, BC.