# Statistics Fundamentals


## Introduction

This page lists various terms used in statistics and provides basic explanatory notes and links to pages with more detailed information.

There are two primary branches of statistics: descriptive statistics and inferential statistics.

Descriptive statistics is the branch of statistics which relates to collecting, summarising and presenting data sets.

Inferential statistics is the branch of statistics which analyses sample data to arrive at conclusions about a population.

An example of descriptive statistics is the average age of people who voted for a party in an election.
An example of inferential statistics is taking a sample of 1000 English mechanical engineers to estimate the average salary of mechanical engineers in England.

The primary terms of statistics are:

Population: All members of a group under consideration.

Sample: The part of the population selected for analysis in order to obtain information about the population.

Parameter: A numerical measure describing a characteristic of a population. Example: mean μ.

Statistic: A numerical measure describing a characteristic of a sample. Example: arithmetic mean xm.

Variable: A characteristic of an item that is analysed using statistical methods. Examples: length, income.

The above terms relate to samples and populations. Parameters of populations, as identified below, including the mean μ, variance σ², and standard deviation σ, are true values. These true values are ideal, assuming that the whole population (N) can be measured or that the sample size (n) is infinite.
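The difference between population parameters and sample statistics can be sketched with Python's standard `statistics` module. The data values and the choice of sample below are hypothetical, chosen only for illustration. The population functions divide by N, while the sample functions divide by n - 1.

```python
import statistics

# Hypothetical data: treat these eight bar lengths (mm) as the whole population.
population = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]

# Population parameters (true values): the variance divides by N.
mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance
sigma = statistics.pstdev(population)      # population standard deviation

# A sample of n = 4 items (an arbitrary, hypothetical selection).
sample = population[:4]

# Sample statistics (estimates): the variance divides by n - 1.
x_m = statistics.fmean(sample)    # sample mean
s2 = statistics.variance(sample)  # sample variance
s = statistics.stdev(sample)      # sample standard deviation

print("population:", mu, sigma2, sigma)
print("sample:    ", x_m, s2, s)
```

The sample statistics only estimate the population parameters, so the two sets of values generally differ.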

The expectation E(X) is the sum of all the products formed by multiplying each value in a probability distribution by its corresponding probability: $E(X) = \sum_i x_i p_i$.
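As a minimal sketch of the expectation, the example below assumes a fair six-sided die (an illustrative choice, not taken from this page) and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

# The probabilities of a distribution must sum to 1.
assert sum(probabilities) == 1

# E(X) = sum of x_i * p_i over all values of the distribution.
expectation = sum(x * p for x, p in zip(values, probabilities))

print(expectation)  # 7/2
```

The result, 7/2 = 3.5, is not itself a possible outcome of a single throw; it is the long-run average value.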

## Notes
| Name | Symbol | Description | Link |
|---|---|---|---|
| Population | | The total of the variables under study, e.g. the population of the UK, the population of the world, or the total number of fish in a lake. | Samples And Populations |
| Population size | N | The total number in the population, e.g. the population of the UK. The upper case N is also used for the number of elementary events for a random variable. | |
| Population mean | μ | The sum of the value of each item of the population divided by the number of items in the population. (The mean is synonymous with the expectation E(X).) | |
| Population variance | σ² | The sum of the squares of (the value of each item in the population minus the population mean), divided by N (the number in the population). This is used to determine the standard deviation. | |
| Population standard deviation | σ | The square root of the variance. This is a measure of the spread of the population. If every item in a population has a very similar dimension then the standard deviation is small. | |
| Sample | | A random collection of units from the population, collected to establish information on the population as a whole. | |
| Sample size | n | The number of items in the sample. | |
| Sample mean | xm | The sum of the value of each item of the sample divided by the sample size. | |
| Discrete variable | | A variable which is counted. Examples include population, number of coins, number of fish. | |
| Continuous variable | | A variable which is measured. Examples include length of a bar, height of a person, distance to a planet. | |
| Sample variance | sx² | The sum of the squares of (the value of each item in the sample minus the sample mean), divided by (n - 1), where n is the number of items in the sample. This is used to determine the standard deviation. | |
| Sample standard deviation | sx | The square root of the sample variance. This is a measure of the spread of the sample. If every item in a sample has a very similar dimension then the standard deviation is small. | |
| Sample median | m | The middle value (for an odd-sized sample), or the average of the two middle values (for an even-sized sample), when the values in a sample are arranged in ascending order. Not used on this website. | |
| Sample mode | | The value in a sample which occurs most often. A set of numbers with one mode is called unimodal and a set with two modes is called bimodal. Not used on this website. | |
| Range | | The difference between the largest and smallest value in a set of values for a variable. | |
| Quartiles | | If a set of numbers, e.g. a sample, is arranged in order of magnitude then the values which divide the set into four equal parts are called quartiles. The lower quartile is the value of the item at position (n+1)/4, the upper quartile is the value of the item at position 3(n+1)/4, and the middle quartile is the same as the median. Not used on this website. | |
| Deciles | | If a set of numbers is arranged in order of magnitude then the deciles are the values which divide the set into ten groups. Not used on this website. | |
| Percentiles | | If a set of numbers is arranged in order of magnitude then the percentiles are the values which divide the set into 100 groups. Not used on this website. | |
| Probability | p | A numerical measure of how likely it is that some event will occur. 1 (or 100%) is the probability of certainty. 0 (or 0%) is the probability that the event will definitely not occur. | Probability |
| Hypothesis | | A statistical hypothesis is an assumption about the distribution of a random variable. Example: "the mean height of children in a school is 1.2 m" is a hypothesis which may be accepted or rejected. | Hypothesis |
| Null hypothesis | H0 | The null hypothesis represents a theory that has been put forward, either because it is considered to be true or because it is to be used as a basis, but which has not been proved. Example: H0: the mean height of children in a school is 1.2 m. | |
| Alternative hypothesis | H1 | The hypothesis which is accepted on rejection of the null hypothesis. Example: H1: the mean height of children in a school is not 1.2 m. | |
| Permutation | nPr | An arrangement of r things in which the order is important: abcd is different to bcda. | Permutations / Combinations |
| Combination | nCr | A selection of r things in which the order is not important: abcd and bcda are the same combination. | |
| Expectation | E(X) | Synonymous with the mean. This is the sum of the products of each value in a distribution and its respective probability. | Expectation |
| Uniform distribution | | A distribution in which every possible value of the variable has the same probability is called a rectangular or uniform distribution. | Discrete Distributions |
| Binomial distribution | | This distribution relates to the number of times a success occurs in n independent trials. | |
| Poisson distribution | | A limiting case of the binomial distribution with p → 0 and n → ∞ such that the mean m = np approaches a finite value. Typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. | |
| Hypergeometric distribution | | Where the binomial probability function involves sampling with replacement, i.e. each trial is independent, the hypergeometric distribution involves sampling without replacement. The trials are therefore not independent. | |
| Normal distribution | | The continuous distribution which is most representative of events that occur in the natural world subject to countless variables. It is mostly used for continuous variables and for discrete variables with large samples. | Normal Distribution |
| t-test | | When sample sizes are small and the standard deviation of the population is unknown, it is usual to use the distribution of the t statistic. | t distribution |
| Chi-squared test | | A tool for determining how much a sample distribution can deviate from a population distribution if the hypothesis of equivalence is true. | Chi-Test |
| F-test | | When comparing two samples it is often necessary to test whether the samples are from the same distribution. The F ratio test is used for this purpose. | F-test |
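Two of the entries above can be checked numerically. The sketch below uses only Python's standard `math` module (Python 3.8 or later); the helper names `binomial_pmf` and `poisson_pmf`, and the numbers chosen, are illustrative assumptions rather than anything defined on this page. It counts permutations and combinations of 2 letters taken from a, b, c, d, and then shows a binomial probability approaching the corresponding Poisson probability as n grows with the mean m = np held fixed.

```python
import math

# Permutations and combinations of r = 2 items chosen from n = 4 (a, b, c, d).
n, r = 4, 2
n_perm = math.perm(n, r)  # order matters: ab and ba are counted separately
n_comb = math.comb(n, r)  # order does not matter: ab and ba count once

def binomial_pmf(k, trials, p):
    """Probability of exactly k successes in `trials` independent trials."""
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)

def poisson_pmf(k, m):
    """Probability of exactly k events for a Poisson distribution with mean m."""
    return math.exp(-m) * m**k / math.factorial(k)

# Hold the mean m = np fixed while n grows and p shrinks.
m, k = 2.0, 3
for trials in (10, 100, 10_000):
    p = m / trials
    print(trials, binomial_pmf(k, trials, p), poisson_pmf(k, m))
```

Each permutation count equals the combination count multiplied by r!, because every selection of r items can be ordered in r! ways.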