# Confidence Intervals


Introduction

A confidence interval gives an estimated range of values that is likely to include an unknown population parameter, e.g. the mean. The range is calculated from a collected sample of data. The width of the confidence interval gives us some idea of how uncertain we are about the unknown parameter. A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter.

The only way to determine a statistical parameter of a population with 100% confidence is to test the whole population. Generally the population is large, and testing all of it is costly and impracticable. However, a sample can be used to calculate a range within which the population parameter value is likely to fall. Normally this is taken to be "95% confidence", and the range is called the 95% confidence interval. It is also possible to produce 90%, 99% or 99.9% confidence intervals for the unknown parameters.

Symbols
- $f(x)$ = probability function (values between 0 and 1)
- $F(x)$ = probability distribution function
- $X_m$ = sample mean
- var = sample variance
- $\Phi(x)$ = probability distribution function (standardised probability)
- $\mu$ = population / random variable mean
- $\sigma^2$ = population / random variable variance
- $\sigma$ = population / random variable standard deviation
- $x_m$ = arithmetic mean of sample
- $s_x^2$ = variance of sample
- $s_x$ = standard deviation of sample

Confidence Intervals Based on the Normal Probability Distribution

It can be shown that, for data that are normally distributed, about 68.3% of the values lie within 1 standard deviation (σ) of the mean μ, i.e. within the range μ ± σ. In general there is a fixed relationship between the fraction of the data included and the distance from the mean measured in standard deviations. The fraction of data lying within μ ± c·σ depends only on c, as shown in the table below.

| Fraction of data values | c |
|---|---|
| 50,0% | 0,674 |
| 68,3% | 1,000 |
| 90,0% | 1,645 |
| 95,0% | 1,960 |
| 95,4% | 2,000 |
| 98,0% | 2,326 |
| 99,0% | 2,576 |
| 99,7% | 3,000 |
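The values of c in the table are quantiles of the standard normal distribution Φ. As a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`), they can be reproduced as follows. The helper names `c_for_fraction` and `fraction_for_c` are hypothetical, chosen only for this illustration, and the code prints with decimal points rather than commas.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (Φ in the symbols list)
std_normal = NormalDist(mu=0.0, sigma=1.0)


def c_for_fraction(fraction: float) -> float:
    """Return c such that `fraction` of normal data lies within μ ± c·σ."""
    # The excluded probability is split equally between the two tails,
    # so look up the quantile at 1 - (1 - fraction) / 2.
    return std_normal.inv_cdf(1.0 - (1.0 - fraction) / 2.0)


def fraction_for_c(c: float) -> float:
    """Return the fraction of normal data lying within μ ± c·σ."""
    return std_normal.cdf(c) - std_normal.cdf(-c)


# Confidence levels mentioned in the text, including 99.9 %
for fraction in (0.50, 0.90, 0.95, 0.98, 0.99, 0.999):
    print(f"{fraction:.1%}  c = {c_for_fraction(fraction):.3f}")

# The reverse lookup: fraction of data within 1, 2 and 3 standard deviations
for c in (1.0, 2.0, 3.0):
    print(f"c = {c:.0f}  fraction = {fraction_for_c(c):.1%}")
```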

For a sample from a normal population one would expect about 68% of the values to lie within ± 1.00 $s_x$ of the sample mean $x_m$, and about 95% of the values to lie within ± 1.96 $s_x$ of $x_m$, where $s_x$ is the sample standard deviation.
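A quick way to check these proportions is to simulate a large sample. The sketch below assumes a hypothetical normal population with mean 10 and standard deviation 2 (illustrative values only). It counts how many sampled values fall within $x_m \pm 1.00\,s_x$ and $x_m \pm 1.96\,s_x$. The helper `fraction_within` is a name introduced here for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical normal population: mean 10, standard deviation 2 (illustrative values)
sample = [random.gauss(10.0, 2.0) for _ in range(10_000)]

xm = statistics.fmean(sample)   # sample mean x_m
sx = statistics.stdev(sample)   # sample standard deviation s_x


def fraction_within(data, centre, half_width):
    """Fraction of values lying within centre ± half_width."""
    return sum(abs(x - centre) <= half_width for x in data) / len(data)


print(f"within ±1.00 s_x: {fraction_within(sample, xm, 1.00 * sx):.3f}")
print(f"within ±1.96 s_x: {fraction_within(sample, xm, 1.96 * sx):.3f}")
```

With 10 000 values the two fractions should come out close to 0,68 and 0,95. Because the sample is random, they will not match exactly.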

Example 1:

A random variable is normally distributed with a standard deviation of 5. A single random sample value from this distribution is 12,4. Find the interval of values such that there is 99% confidence that the population mean lies within the interval.

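From the table above (rounding c = 2,576 to 2,58):

$$P(\mu - 2{,}58\,\sigma < x < \mu + 2{,}58\,\sigma) = 0{,}99$$

With σ = 5, 2,58 σ = 12,9, so

$$P(\mu - 12{,}9 < x < \mu + 12{,}9) = 0{,}99$$

Since x lies within 12,9 of μ exactly when μ lies within 12,9 of x, substituting the observed value x = 12,4 gives

$$P(12{,}4 - 12{,}9 < \mu < 12{,}4 + 12{,}9) = 0{,}99$$

That is, $-0{,}5 < \mu < 25{,}3$ with 99% confidence.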
This simply states that, based on a single sampled value of 12,4, there is 99% confidence that the population mean lies within the range -0,5 to 25,3. This is a wide range and not very useful. A smaller interval requires a larger sample (greater n). The mean of such a sample is normally distributed with a variance of σ²/n (refer to the notes below).
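Example 1 can be sketched in code, assuming Python 3.8 or later (for `statistics.NormalDist`) and a known σ, as in the text. The function `normal_mean_ci` is a hypothetical helper written for this illustration. It computes $x_m \pm c\,\sigma/\sqrt{n}$ with the exact quantile c = 2,5758… rather than the rounded 2,58. This gives roughly -0,48 < μ < 25,28, which rounds to the interval found above. The loop keeps the same observed mean and shows how the interval narrows as n grows.

```python
from math import sqrt
from statistics import NormalDist


def normal_mean_ci(xm: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the population mean μ when σ is known.

    Uses xm ± c·σ/√n, where c is the standard normal quantile that
    leaves (1 - confidence)/2 in each tail.
    """
    c = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = c * sigma / sqrt(n)
    return xm - half_width, xm + half_width


# Example 1: a single observation (n = 1) of 12.4 with σ = 5, 99 % confidence
low, high = normal_mean_ci(12.4, 5.0, 1, 0.99)
print(f"{low:.2f} < μ < {high:.2f}")

# The same observed mean with larger samples gives narrower intervals
for n in (1, 4, 25, 100):
    lo_n, hi_n = normal_mean_ci(12.4, 5.0, n, 0.99)
    print(f"n = {n:3d}: width = {hi_n - lo_n:.2f}")
```

Because the half-width scales with 1/√n, quadrupling the sample size halves the width of the interval.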

Example 2:

Obtain a 95% confidence interval for the mean of a normal distribution with variance σ² = 9, i.e. standard deviation σ = 3, using a sample of n = 100 with a mean $x_m$ = 5.

For a 95% confidence interval c = 1,96, and the standard deviation of the sample mean is σ/√n = 3/√100 = 0,3. The confidence interval is therefore given by

$$P\left(x_m - 1{,}96\,\frac{3}{\sqrt{100}} < \mu < x_m + 1{,}96\,\frac{3}{\sqrt{100}}\right) = 0{,}95$$

$$P(5 - 0{,}588 < \mu < 5 + 0{,}588) = 0{,}95$$

That is, there is 95% confidence that the mean of the population lies between 4,412 and 5,588.
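The arithmetic of Example 2 is short enough to check directly. This is a minimal sketch that uses the table value c = 1,96 and the figures given in the example. Python prints decimal points, not commas.

```python
from math import sqrt

# Example 2 values from the text
sigma = 3.0      # population standard deviation (σ² = 9)
n = 100          # sample size
xm = 5.0         # sample mean x_m
c = 1.96         # c for 95 % confidence, from the table

standard_error = sigma / sqrt(n)        # SD of the sample mean: 3 / 10 = 0.3
half_width = c * standard_error         # 1.96 × 0.3 = 0.588
lower, upper = xm - half_width, xm + half_width
print(f"95% CI for μ: {lower:.3f} to {upper:.3f}")
```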

Background Theory

Sampling distribution of the sample mean

Consider a single random variable X

Now $x_1, \dots, x_n$ are observed values of X. The $x_i$ values can also be regarded as values of random variables $X_1, X_2, \dots, X_n$. These have the same distribution as X, and they are independent because the sample values are independent.

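The sum

$$X_1 + X_2 + \dots + X_n$$

has mean

$$\mu_1 + \mu_2 + \dots + \mu_n$$

and, because the $X_i$ are independent, variance

$$\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of $X_i$. If X is normally distributed, the sum is also normally distributed. Since every $X_i$ has the same distribution as X, with mean μ and variance σ², the mean and variance of the sum become nμ and nσ².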
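Dividing the sum $X_1 + \dots + X_n$ by n gives the sample mean $X_m$. Assume each $X_i$ has mean μ and variance σ², as for the population described below, and use the rule $\operatorname{var}(aY) = a^2 \operatorname{var}(Y)$ for a constant a. The mean, variance and standard deviation of the sample mean then follow directly:

$$
\begin{aligned}
X_m &= \frac{X_1 + X_2 + \dots + X_n}{n} \\
E(X_m) &= \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu \\
\operatorname{var}(X_m) &= \frac{\sigma^2 + \sigma^2 + \dots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \\
SD(X_m) &= \sqrt{\operatorname{var}(X_m)} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
$$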

Consider a population with mean μ and variance σ².
Now take a number of samples of size n from this population. Each sample has a mean $x_m$ and a variance $s_x^2$.
It is useful to obtain the distribution of the sample mean.

- The mean of the distribution of the sample mean: $E(X_m) = \mu$
- The variance of the distribution of the sample mean: $\operatorname{var}(X_m) = \sigma^2 / n$
- The standard deviation of the distribution of the sample mean: $SD(X_m) = \sigma / \sqrt{n}$
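These results can be checked by simulation. The sketch below assumes a hypothetical normal population with μ = 5 and σ = 3 (illustrative values only). It draws many samples of size n = 25, records each sample mean and compares the spread of those means with σ/√n = 0,6.

```python
import random
import statistics
from math import sqrt

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical normal population (illustrative values): μ = 5, σ = 3
mu, sigma = 5.0, 3.0
n = 25                 # size of each sample
num_samples = 20_000   # number of repeated samples

# Draw many samples of size n and record each sample mean x_m
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of x_m: {mean_of_means:.3f}  (theory: μ = {mu})")
print(f"SD of x_m:   {sd_of_means:.3f}  (theory: σ/√n = {sigma / sqrt(n):.3f})")
```

Both simulated values should land close to the theoretical ones. They differ only by random simulation error, which shrinks as `num_samples` increases.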

Central Limit Theorem

If X is a random variable with mean μ and variance σ², then the distribution of the sample mean approaches a Normal distribution with mean μ and variance σ²/n as n → ∞.

As a rule of thumb, the approximation is adequate for most distributions of X when n > 30.
If X is itself normally distributed, the sample mean is exactly normally distributed for all n ≥ 1.

The Central Limit Theorem is the foundation for many statistical procedures, because the distribution of the population under study does not have to be Normal: the sample mean will tend to a normal distribution anyway.

This is very useful when it comes to inference. For example, it permits hypothesis tests that assume normality even if the underlying data appear non-normal (assuming reasonably large sample sizes). This is because the tests use the sample mean, which according to the Central Limit Theorem will be approximately normally distributed.
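A rough simulation can illustrate this. The sketch below assumes a known σ, as in the examples above. It uses an exponential population with rate 1 (so μ = σ = 1), a skewed, clearly non-normal distribution chosen purely for illustration. It repeatedly draws samples of n = 50, builds the normal-theory interval $x_m \pm 1{,}96\,\sigma/\sqrt{n}$ from each one and counts how often that interval contains the true mean. If the Central Limit Theorem approximation is adequate, the fraction should come out close to 0,95.

```python
import random
import statistics
from math import sqrt

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical skewed (non-normal) population: exponential with rate 1,
# so its mean μ = 1 and standard deviation σ = 1.
mu, sigma = 1.0, 1.0
n = 50              # sample size (above the n > 30 rule of thumb)
trials = 10_000     # number of repeated samples
c = 1.96            # 95 % value from the normal table

half_width = c * sigma / sqrt(n)   # same for every sample because σ is known
covered = 0
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(n)]
    xm = statistics.fmean(sample)
    # Does the interval xm ± c·σ/√n contain the true mean?
    if xm - half_width < mu < xm + half_width:
        covered += 1

coverage = covered / trials
print(f"fraction of intervals containing μ: {coverage:.3f}")
```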