This page lists various identities used in statistics and
provides basic explanatory notes and links to pages with more detailed information.
There are two primary branches of statistics: descriptive statistics and inferential statistics.
Descriptive statistics is the branch of statistics which relates to collecting, summarising and presenting data sets.
Inferential statistics is the branch of statistics which analyses sample data to arrive at conclusions about a population.
An example of descriptive statistics is the average age of people who voted for a party in an election.
An example of inferential statistics is taking a sample of 1000 English mechanical engineers to estimate
the average salary of mechanical engineers in England.
Population: All members of a group under consideration.
Sample: The part of the population selected for analysis in order to obtain information about the population.
Parameter: A numerical measure describing a characteristic of a population. Example: mean μ.
Statistic: A numerical measure describing a characteristic of a sample. Example: arithmetic mean xm.
Variable: A characteristic of an item that is analysed using statistical methods. Examples: length, income.
The above identities relate to samples and populations. Parameters of populations,
as identified below, including the mean μ, variance σ²,
and standard deviation σ, are true values. These true values are ideal:
they assume that the whole population (N) can be measured or that the sample size (n) is infinite.
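The population parameters named above can be illustrated with a short Python sketch. It assumes the whole population is available as a list; the function name and the data values are hypothetical and chosen only for illustration.

```python
import math


def population_parameters(values):
    """Return (mean, variance, standard deviation) treating `values`
    as the complete population, so the divisor is N rather than n - 1."""
    N = len(values)
    mu = sum(values) / N                                # population mean
    variance = sum((x - mu) ** 2 for x in values) / N   # population variance
    sigma = math.sqrt(variance)                         # population standard deviation
    return mu, variance, sigma


# Hypothetical population of eight measurements
population = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, variance, sigma = population_parameters(population)
print(mu, variance, sigma)
```

In practice the whole population can rarely be measured, which is why sample statistics are used as estimates of these values.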
The expectation E(X) is the sum of all the products formed by multiplying
each value in a probability distribution by its corresponding probability:
E(X) = ∑ x_i p_i.
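The expectation formula can be sketched in Python as follows. The function name, the tolerance used to check the probabilities, and the fair-die example are assumptions made for illustration.

```python
def expectation(values, probabilities):
    """Return E(X) = sum of x_i * p_i for a discrete distribution."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    # The probabilities of a distribution should sum to 1 (within rounding)
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))


# Hypothetical example: a fair six-sided die
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expected_value = expectation(faces, probs)
print(expected_value)  # approximately 3.5
```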
|Name ||Symbol ||Description|| Link|
|Population || ||The complete set of items under study, e.g. the
population of the UK, the population of the world, or the total number of fish in a lake.
|| Samples And Populations|
|Population Size ||N ||The total number of items in the population, e.g. the population of the UK. The upper case
N is also used for the number of elementary events for a random variable.|
|Population mean ||μ|| The sum of the value of each
item of the population divided by the number of items in the population. (The mean is synonymous with the expectation E(X).) |
|Population variance ||σ²||The sum
of [square (the value of each item in the population - the value of the population mean)]
divided by N (the number of items in the population). This is used
to determine the standard deviation.|
|Population Standard Deviation ||σ ||The square
root of the variance. This is a measure of the spread of the population.
If every item in a population has a very similar dimension then the standard deviation is small.|
|Sample || || A random collection of units from the
population, collected to establish information on the population as a whole.|
|Sample size ||n|| The number of items in the sample.|
|Sample mean ||xm || The sum of the value of each
item of the sample divided by the sample size.|
|Discrete variable|| ||A variable which is counted.
Examples include population, number of coins, number of fish.|
|Continuous variable|| ||A variable which is measured.
Examples include length of a bar, height of a person, distance to a planet.|
|Sample variance|| sx² || The sum of
[square (the value of each item in the sample - the value of the sample mean)] divided
by (n-1), where n = number of items in the sample. This is used to determine the sample standard deviation.|
|Sample Standard Deviation || sx || The square
root of the sample variance. This is a measure of the spread of the sample.
If every item in a sample has a very similar dimension then the standard deviation is small.|
|Sample median || m || The sample median is the middle value
(in the case of an odd-sized sample), or the average of the two middle values (in the case
of an even-sized sample), when the values in a sample are arranged in ascending order.
||Not used on this website|
|Sample mode || || The value in a sample which occurs
most often. A set of numbers with one mode is called unimodal and a set with
two modes is called bimodal.||Not used on this website|
|Range || || The difference between the largest and smallest value in a set of values for a variable.|
|Quartiles|| ||If a set of numbers, e.g. a sample, is arranged
in order of magnitude then the values which divide the set into four equal parts are
called quartiles. The lower quartile is the value of the item at position (n+1)/4. The upper quartile is the value of the
item at position 3(n+1)/4, and the middle quartile is the same as the median.||Not used on this website|
|Deciles|| ||If a set of numbers is arranged in order of
magnitude then the deciles are the values which divide the set into
ten equal groups.||Not used on this website|
|Percentiles|| ||If a set of numbers is arranged in order of
magnitude then the percentiles are the values which divide the set into
100 equal groups.||Not used on this website|
|Probability||p||A numerical measure of how likely it is that some event will occur.
1 or 100% is the probability of certainty.
0 or 0% is the probability that the event will definitely not occur.|
|Hypothesis|| ||A statistical hypothesis is an assumption about the distribution
of a random variable. Example: "The mean height of children in a school is 1.2 m" is a hypothesis which may be accepted or rejected.|
|Null Hypothesis|| H0||
The null hypothesis, H0, represents a theory that has been put forward,
either because it is considered to be true or because it is to be used as a basis for argument, but
has not been proved. Example: H0: The mean height of children in a school is 1.2 m.|
|Alternative Hypothesis|| H1||
The alternative hypothesis, H1, is the hypothesis which is accepted on rejection of the null hypothesis.
Example: H1: The mean height of children in a school is not 1.2 m.|
|Permutation|| n P(r)||A permutation is an arrangement of (r) things with
the order being important. abcd is different to bcda.
||Permutations / Combinations|
|Combination|| n C(r)||A combination is a selection of (r) things with
the order not being important. abcd is the same combination as bcda: the letter order is not important.|
|Expectation||E(X) ||Synonymous with the mean, this is the sum of the products of each value in a distribution and its respective probability.|
|Uniform Distribution|| ||A distribution in which every possible
value of the variable has the same probability is called a
rectangular or uniform distribution.||
|Binomial Distribution|| ||This distribution relates to
the number of times a success occurs in n independent trials, each with the same probability of success p.|
|Poisson Distribution|| ||The Poisson distribution is a
limiting case of the binomial distribution with p -> 0 and n -> infinity such
that the mean m = np approaches a finite value. Typically, a
Poisson random variable is a count of the number of events that occur in a certain
time interval or spatial area.|
|Hypergeometric Distribution|| ||Where the binomial
probability function involves sampling with replacement, i.e. each trial is independent,
the hypergeometric distribution involves sampling without replacement. The
trials are therefore not independent.|
|Normal Distribution|| ||The normal probability distribution
is the continuous distribution which is most representative of events that occur in the
natural world subject to countless variables. It is mostly used for continuous
variables and for discrete variables with large samples.
|| Normal Distribution|
|t-test || ||When sample sizes are small and the standard deviation of the population is unknown,
it is usual to use the distribution of the t statistic.|
|Chi-Squared test || ||The Chi-squared test is a tool which
enables determination of how much a sample distribution can deviate from a population
distribution if the hypothesis of equivalence is true.|
|F-test || ||When comparing two samples it is often necessary to test the hypothesis that the samples are from the same distribution.
The F ratio test, which compares the variances of the two samples, is used for this purpose.|
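Several of the sample statistics and counting rules tabulated above can be sketched in Python using only the standard library. The sample values and the helper names `quartile_positions` and `value_at_position` are hypothetical; linear interpolation for fractional positions is an assumption, since the table only states the position rule.

```python
import math
import statistics


def quartile_positions(n):
    """1-based positions of the lower and upper quartiles,
    using the (n+1)/4 and 3(n+1)/4 rules from the table."""
    return (n + 1) / 4, 3 * (n + 1) / 4


def value_at_position(sorted_values, position):
    """Value at a 1-based position in an ordered list.
    Fractional positions are linearly interpolated (an assumption)."""
    n = len(sorted_values)
    position = max(1.0, min(float(position), float(n)))  # clamp to valid range
    lower = math.floor(position)
    upper = min(lower + 1, n)
    fraction = position - lower
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]
    return low_value + fraction * (high_value - low_value)


# Hypothetical sample of n = 7 values, arranged in ascending order
sample = sorted([3.0, 5.0, 7.0, 8.0, 9.0, 11.0, 13.0])
n = len(sample)

sample_mean = statistics.mean(sample)          # sum of values / n
sample_variance = statistics.variance(sample)  # divisor is n - 1
sample_std = statistics.stdev(sample)          # square root of the sample variance
sample_median = statistics.median(sample)      # middle value of the ordered sample
sample_range = max(sample) - min(sample)       # largest minus smallest value

q1_pos, q3_pos = quartile_positions(n)
lower_quartile = value_at_position(sample, q1_pos)
upper_quartile = value_at_position(sample, q3_pos)

# Permutations and combinations of r = 2 items chosen from n = 4 (a, b, c, d)
permutations = math.perm(4, 2)  # order matters
combinations = math.comb(4, 2)  # order does not matter

print(sample_mean, sample_variance, sample_std, sample_median, sample_range)
print(lower_quartile, upper_quartile, permutations, combinations)
```

`math.perm` and `math.comb` require Python 3.8 or later. Note that `statistics.variance` and `statistics.stdev` use the (n-1) divisor, while `statistics.pvariance` and `statistics.pstdev` use the population divisor N.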