Introduction
A confidence interval gives an estimated range of values which
is likely to include an unknown population parameter, e.g. the mean, the estimated range
being calculated from a collected sample of data. The width of the confidence
interval gives us some idea of how uncertain we are about the unknown parameter.
A very wide interval may indicate that more data should
be collected before anything very definite can be said about the parameter.
The only way to obtain a statistical parameter of a population with 100% confidence
is to test the whole population. Generally the population is large and testing the
whole population is costly and impracticable. However, it is possible to
use a sample to calculate a range within which the population parameter value is
likely to fall. Normally this is taken to be "95% likelihood", and the
range is called the 95% confidence interval. It is also possible to
produce 90%, 99% and 99.9% confidence intervals for the unknown parameters.
Symbols
f(x) = probability function. (values between 0 and 1)
F(x) = probability distribution function.
X_{m} = Sample mean
var = sample variance
Φ(x) = probability distribution function (standardised normal distribution)

μ = population /random variable mean
σ ^{2} = population /random variable variance
σ = population /random variable standard deviation
x_{m} = arithmetic mean of sample
s_{x} ^{2} = variance of sample
s_{x} = Standard deviation of sample

Confidence Interval ref. Normal Probability Distribution
It can be shown that for data that is "normally distributed"
about 68.3% of the data will be within 1 standard deviation (σ)
of the mean μ (i.e., within the range μ ± σ).
In general there is a relationship between the fraction of the data included
and the deviation from the mean measured in standard deviations,
i.e. the fraction of data lying within the range μ ± cσ, as shown in the table
below.
Fraction of data values    c
50.0%                      0.674
68.3%                      1.000
90.0%                      1.645
95.0%                      1.960
95.4%                      2.000
98.0%                      2.326
99.0%                      2.576
99.7%                      3.000
Examples of data spread: for a sample from a normal population one would expect about 68% of the
values to be within ± 1.00 standard deviations of the sample mean x_{m}. Similarly, one would expect about 95% of the
values to be within ± 1.96 standard deviations of the sample mean x_{m}.
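The multipliers c in the table can be reproduced numerically. The following is a minimal sketch using the NormalDist class from Python's standard statistics module; the function names c_for_fraction and fraction_for_c are illustrative choices, not part of the original notes.

```python
from statistics import NormalDist

# Standardised normal distribution (mean 0, standard deviation 1).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def c_for_fraction(fraction):
    """Return c such that the given fraction of normal data lies within mu ± c*sigma."""
    # The excluded probability is shared equally between the two tails.
    return STANDARD_NORMAL.inv_cdf(0.5 + fraction / 2.0)


def fraction_for_c(c):
    """Return the fraction of normal data lying within mu ± c*sigma."""
    return STANDARD_NORMAL.cdf(c) - STANDARD_NORMAL.cdf(-c)


# Fraction -> c, for the rows of the table quoted as round percentages.
for fraction in (0.50, 0.90, 0.95, 0.98, 0.99):
    print(f"{fraction:6.1%}  c = {c_for_fraction(fraction):.3f}")

# c -> fraction, for the rows of the table quoted as round multipliers.
for c in (1.0, 2.0, 3.0):
    print(f"c = {c:.3f}  fraction = {fraction_for_c(c):6.1%}")
```

The first loop works from a fraction to c and the second from c to a fraction, which are the two ways the table is normally read.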
Example 1:
A random variable is normally distributed with a standard deviation of 5.
A single random sample from this distribution is 12.4. Find the interval of values such that
there is 99% confidence that the population mean lies within the interval.
From the table above (c = 2.576, rounded to 2.58) P(μ − 2.58σ < x < μ + 2.58σ) = 0.99
Therefore P(μ − 12.9 < x < μ + 12.9) = 0.99
This implies P(12.4 − 12.9 < μ < 12.4 + 12.9) = 0.99
That is −0.5 < μ < 25.3 with 99% confidence.
This is simply stating that, based on a single sampled value of 12.4, there
is a 99% confidence that the population mean is within the range −0.5 to 25.3. This is
a wide range and not very useful. To obtain a narrower interval a larger
sample (greater n) is required. The distribution of the mean of this
sample will be normal with a variance of σ ^{2} /n (refer to the notes below).
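To illustrate how a larger sample narrows the interval, the sketch below evaluates the half-width c·σ/√n for a few sample sizes, using σ = 5 and the rounded multiplier 2.58 from Example 1. The sample sizes and the function name half_width are assumptions chosen for illustration only.

```python
import math

SIGMA = 5.0   # known population standard deviation from Example 1
C_99 = 2.58   # rounded 99% multiplier used in Example 1


def half_width(sigma, n, c):
    """Half-width of the interval x_m ± c*sigma/sqrt(n) for a sample of size n."""
    return c * sigma / math.sqrt(n)


# Illustrative sample sizes; n = 1 reproduces the ±12.9 of Example 1.
for n in (1, 4, 25, 100):
    print(f"n = {n:3d}  half-width = {half_width(SIGMA, n, C_99):.2f}")
```

Because the half-width falls with √n, halving the width of the interval needs a sample four times as large.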
Example 2:
Obtain a 95% confidence interval for the mean of a normal distribution
with a variance σ ^{2} = 9, i.e. a standard deviation σ = 3,
using a sample of n = 100 with a mean x _{m} = 5.
For a 95% confidence interval c = 1.96.
The 95% confidence interval is given by P( x_{m} − 1.96 × 3 / √100 < μ < x_{m} + 1.96 × 3 / √100 ) = 0.95
That is, there is 95% confidence that the mean of the population lies between 4.412 and 5.588.
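The calculation in Example 2 can be written as a small general routine. This is a sketch only, assuming the population standard deviation σ is known and the sample mean is normally distributed; the function name mean_confidence_interval and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (low, high) for the population mean, assuming sigma is known."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Two-sided multiplier c taken from the standardised normal distribution.
    c = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = c * sigma / math.sqrt(n)
    return sample_mean - half, sample_mean + half


# Values from Example 2: x_m = 5, sigma = 3, n = 100, 95% confidence.
low, high = mean_confidence_interval(5.0, 3.0, 100, level=0.95)
print(f"{low:.3f} to {high:.3f}")  # prints "4.412 to 5.588"
```

When σ is not known and has to be estimated from a small sample, the normal multiplier is generally replaced by one from Student's t distribution, which this sketch does not cover.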
Background Theory
Sample distribution of a population mean
Consider a single random variable X.
Now x _{1}....x _{n} are observed
values of X. The x _{i} values can also be regarded as values of random variables X _{1}, X _{2}..
X_{n}. These have the same distribution as
X and are independent because the sample values are independent.
Now consider the sum
X _{1} + X _{2} +.......+X _{n}
If the X _{i} are normally distributed this sum is also normally distributed, with a mean
μ _{1} + μ _{2}+....+μ_{n} = nμ
and a variance
σ ^{2}_{1} + σ ^{2}_{2}+....+σ ^{2}_{n} = nσ ^{2}
Dividing the sum by n gives the sample mean X _{m}, which therefore has a mean μ and a variance nσ ^{2} / n ^{2} = σ ^{2} / n.
Consider a population with a mean μ and
variance σ ^{2}, and take a number of samples of size n from
this population. Each sample has a mean x _{m} and a variance s _{x} ^{2}.
It is useful to obtain the distribution of the sample mean.
The mean of the distribution of the sample mean: m( X _{m}) = μ
The variance of the distribution of the sample mean: var( X _{m}) = σ ^{2} / n
The standard deviation of the distribution of the sample mean: SD( X _{m}) = σ / √n
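The relationships for the distribution of the sample mean can be checked by simulation. The sketch below repeatedly draws samples of size n from a normal population and compares the mean and variance of the resulting sample means with μ and σ²/n. The population values (μ = 10, σ = 3), the sample size, the number of repeats and the seed are arbitrary assumptions made for the illustration.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


MU, SIGMA, N = 10.0, 3.0, 25  # assumed population parameters and sample size
means = simulate_sample_means(MU, SIGMA, N, repeats=20000)

print(f"mean of sample means     = {statistics.fmean(means):.3f} (theory {MU:.3f})")
print(f"variance of sample means = {statistics.variance(means):.3f} (theory {SIGMA**2 / N:.3f})")
print(f"SD of sample means       = {statistics.stdev(means):.3f} (theory {SIGMA / N**0.5:.3f})")
```

The simulated figures will not match the theoretical ones exactly, but they should agree more closely as the number of repeats is increased.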
Central Limit Theorem
If X is a random variable with mean μ and variance σ ^{2}
then the distribution of the sample mean approximates to a Normal distribution with mean μ
and variance σ ^{2} /n as n → ∞.
As a rule of thumb the approximation is adequate for most distributions of X when n > 30.
If X is itself normally distributed the result holds exactly for all values of n > 0.
The Central Limit Theorem is the foundation for many statistical
procedures, because the distribution of the population under study does not
have to be Normal: the sample mean will tend to a normal distribution anyway.
This is very useful when it comes to inference, e.g. it permits hypothesis tests
which assume normality even if the underlying data seem to be non-normal (assuming reasonably large sample sizes).
This is because the tests use the sample mean, which according to the Central Limit Theorem
will be approximately normally distributed.
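The Central Limit Theorem can be illustrated with a population that is clearly not normal. The sketch below samples from an exponential distribution with mean 1 and standard deviation 1 (a choice made purely for illustration) and measures the fraction of sample means falling within ± 1 standard deviation of μ, where the standard deviation of the sample mean is σ/√n. For a normal distribution this fraction is about 68.3%. The function name fraction_within and the sample sizes are assumptions.

```python
import math
import random


def fraction_within(n, c, repeats, seed=1):
    """Fraction of sample means lying within mu ± c*sigma/sqrt(n).

    Samples are drawn from an exponential distribution with rate 1,
    for which mu = 1 and sigma = 1.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    mu, sigma = 1.0, 1.0
    half = c * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(sample_mean - mu) <= half:
            hits += 1
    return hits / repeats


# As n grows the fraction should move towards the normal value of about 68.3%.
for n in (1, 5, 40):
    print(f"n = {n:2d}  fraction within ±1 SD = {fraction_within(n, 1.0, 20000):.3f}")
```

For n = 1 the fraction is well above 68.3% because the exponential distribution is strongly skewed; by n = 40 the sample mean behaves much more like a normal variable.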
