Let’s face it, a good statistics refresher is always worthwhile. We all forget basic concepts and calculations from time to time, so I put together a document that could act as a statistics refresher and thought that I’d share it with the world. This is part one of a two-part document that is still being completed. This refresher is based on Principles of Statistics by Bulmer and Statistics in Plain English by Brightman.

### The Two Concepts of Probability

#### Statistical Probability

- Statistical probability pertains to the relative frequency with which an event occurs in the long run.
- Example:

Let’s say we flip a coin twice. What is the probability of getting two heads?

If we flip a coin twice, there are four possible outcomes, $latex [(H,H), (H,T), (T,H), (T,T)] $.

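Assuming a fair coin, these four outcomes are equally likely and exactly one of them is $latex (H,H) $. Therefore, the probability of flipping two heads is $latex \frac{n(H,H)}{N} = \frac{1}{4} $, which agrees with multiplying the probabilities of the two independent flips, $latex \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} $.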
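
The same answer can be checked by brute force. Here is a minimal sketch that enumerates the outcomes of two flips, assuming a fair coin so that every outcome is equally likely (the variable names are my own):

```python
from fractions import Fraction
from itertools import product

# Enumerate every equally likely outcome of two flips of a fair coin.
outcomes = list(product("HT", repeat=2))

# Keep only the outcomes in which both flips are heads.
favourable = [o for o in outcomes if o == ("H", "H")]

# Relative frequency of the event among all possible outcomes.
p_two_heads = Fraction(len(favourable), len(outcomes))
print(p_two_heads)  # 1/4
```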

#### Inductive Probability

- Inductive probability pertains to the degree of belief which it is reasonable to place in a proposition given the available evidence.
- Example:

I’m $latex 95\% $ certain that the answer to $latex 1 + 1 $ is between $latex 1.5 $ and $latex 2.5 $.

### The Two Laws of Probability

#### Law of Addition

- If $latex A $ and $latex B $ are mutually exclusive events, the probability that either $latex A $ or $latex B $ will occur is equal to the sum of their separate probabilities.

$latex \displaystyle P(A \text{ or } B) = P(A) + P(B) $
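
As a quick sanity check of the law of addition, here is a small sketch using one roll of a fair six-sided die. The events `A` and `B` and the helper `prob` are made up for illustration:

```python
from fractions import Fraction

die = range(1, 7)  # faces of a fair six-sided die (an assumed example)

def prob(event):
    """Probability of an event, given as a set of faces, by counting."""
    return Fraction(sum(1 for face in die if face in event), 6)

A = {1, 2}  # roll a 1 or a 2
B = {6}     # roll a 6
assert A.isdisjoint(B)  # the law only applies to mutually exclusive events

p_a_or_b = prob(A | B)
print(p_a_or_b == prob(A) + prob(B))  # True
```

If `A` and `B` overlapped, the plain sum would count the shared faces twice, which is why the law requires mutually exclusive events.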

#### Law of Multiplication

- If $latex A $ and $latex B $ are two events, the probability that both $latex A $ and $latex B $ will occur is equal to the probability that $latex A $ will occur multiplied by the conditional probability that $latex B $ will occur given that $latex A $ has occurred.

$latex P(A \text{ and } B) = P(A) \times P(B|A) $

#### Conditional Probability

- The probability of $latex B $ given $latex A $, or $latex P(B|A) $, is the probability that $latex B $ will occur if we consider only those occasions on which $latex A $ also occurs. This is defined as $latex \frac{n(A \text{ and } B)}{n(A)} $, where $latex n(\cdot) $ counts the occasions on which an event occurs.
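
To see conditional probability and the law of multiplication working together, here is a rough sketch that counts outcomes for one roll of a fair die. The events (`A` is "even", `B` is "greater than 3") are assumptions chosen just for this example:

```python
from fractions import Fraction

faces = set(range(1, 7))               # one roll of a fair six-sided die
A = {f for f in faces if f % 2 == 0}   # even faces: {2, 4, 6}
B = {f for f in faces if f > 3}        # faces above three: {4, 5, 6}

p_a = Fraction(len(A), len(faces))
# Conditional probability: restrict attention to the occasions where A occurs.
p_b_given_a = Fraction(len(A & B), len(A))
# Direct count of the occasions where both A and B occur.
p_a_and_b = Fraction(len(A & B), len(faces))

print(p_b_given_a)                     # 2/3
print(p_a_and_b == p_a * p_b_given_a)  # True
```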

### Random Variables and Probability Distributions

#### Discrete Variables

- Variables which arise from counting and can only take integral values $latex (0, 1, 2, \ldots) $.
- A frequency distribution records the number of occurrences of each possible value of a variable. It can be presented in a table or graphically, and in the long run the relative frequencies approach the probability distribution.
- Associated with any discrete random variable, $latex X $, is a corresponding probability function which tells us the probability with which $latex X $ takes any given value. A particular value that $latex X $ can take is denoted by $latex x $, and the probability that $latex X $ takes the value $latex x $ is written $latex P(x) $.
- The cumulative probability function specifies the probability that $latex X $ is less than or equal to some particular value, $latex x $. This is denoted by $latex F(x) $. The cumulative probability function can be calculated by summing the probabilities of all values less than or equal to $latex x $.

$latex F(x) = Prob[X \leq x] $

$latex F(x) = P(0) + P(1) + \ldots + P(x) = \sum_{u \leq x} P(u) $
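
Here is a small sketch of a probability function and its cumulative probability function. The example variable (number of heads in two fair coin flips) is my own choice, not something the definitions depend on:

```python
from fractions import Fraction

# Assumed example: P(x) for the number of heads in two fair coin flips.
P = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Cumulative probability F(x) = Prob[X <= x], summing P(u) for u <= x."""
    return sum((p for u, p in P.items() if u <= x), Fraction(0))

print(F(0), F(1), F(2))  # 1/4 3/4 1
```

Note that `F` never decreases and reaches one at the largest possible value of the variable.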

#### Continuous Variables

- Variables which arise from measuring and can take any value within a given range.
- Continuous variables are best graphically represented by a histogram, where the area of each rectangle represents the proportion of observations falling in that interval.
- The probability density function, $latex f(x) $, is the smooth continuous curve that describes the relative likelihood of a random variable taking on a given value. The area under $latex f(x) $ between $latex x_1 $ and $latex x_2 $ gives the probability that the random variable will lie between $latex x_1 $ and $latex x_2 $.
- A continuous probability distribution can also be represented by its cumulative probability function, $latex F(x) $, which specifies the probability that $latex X $ is less than or equal to $latex x $.
- A continuous random variable is said to be uniformly distributed between $latex 0 $ and $latex 1 $ if it is equally likely to lie anywhere in this interval but cannot lie outside it.
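
For the uniform variable on the interval from 0 to 1, the density and cumulative functions are simple enough to write by hand. The sketch below (function names are mine) approximates the area under the density with a midpoint rule and compares it with the difference of two cumulative probabilities:

```python
def f(x):
    """Density of a variable that is uniform between 0 and 1."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def F(x):
    """Cumulative probability Prob[X <= x] for the same variable."""
    return min(max(x, 0.0), 1.0)

def prob_between(x1, x2, steps=10_000):
    """Approximate the area under f between x1 and x2 with the midpoint rule."""
    width = (x2 - x1) / steps
    return sum(f(x1 + (i + 0.5) * width) for i in range(steps)) * width

print(prob_between(0.2, 0.5))  # close to 0.3
print(F(0.5) - F(0.2))         # the same area from the cumulative function
```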

#### Multivariate Distributions

- The joint frequency distribution of two random variables is called a bivariate distribution. $latex P(x,y) $ denotes the probability that simultaneously $latex X $ will be $latex x $ and $latex Y $ will be $latex y $. This is expressed through a bivariate distribution table.

$latex P(x,y) = Prob[X = x \text{ and } Y = y] $

- In a bivariate distribution table, the right-hand margin sums the probabilities in each row. It expresses the overall (marginal) probability distribution of $latex X $, regardless of the value of $latex Y $.

$latex P(x) = Prob[X = x] = \sum_{y} P(x,y) $

- In a bivariate distribution table, the bottom margin sums the probabilities in each column. It expresses the overall (marginal) probability distribution of $latex Y $, regardless of the value of $latex X $.

$latex P(y) = Prob[Y = y] = \sum_{x} P(x,y) $
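
The row and column sums can be sketched in a few lines. The joint probabilities below are invented purely for illustration, and `marginal` is a hypothetical helper:

```python
from fractions import Fraction

# Made-up joint probabilities, keyed by (x, y).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}
assert sum(joint.values()) == 1  # a valid joint distribution sums to one

def marginal(table, axis):
    """Sum the joint probabilities over the other variable (axis 0 = x, 1 = y)."""
    totals = {}
    for key, p in table.items():
        totals[key[axis]] = totals.get(key[axis], Fraction(0)) + p
    return totals

p_x = marginal(joint, 0)  # distribution of X, regardless of Y
p_y = marginal(joint, 1)  # distribution of Y, regardless of X
print(p_x[0], p_x[1])  # 1/2 1/2
print(p_y[0], p_y[1])  # 3/8 5/8
```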

### Properties of Distributions

#### Measures of Central Tendency

- The mean is the sum of the observations divided by the number of observations.

$latex \bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \sum_{i=1}^n \frac{x_i}{n} $

- The median is the middle observation once the observations are sorted. If the number of observations is even, the median is the average of the two middle observations, that is, their sum divided by two.
- The mode refers to the most frequent observation.
- The main question of interest is whether the sample mean, median, or mode provides the most accurate estimate of central tendency within the population.
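
All three measures are available in Python's standard `statistics` module. A quick sketch on a made-up sample with an even number of observations:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample

mean = statistics.mean(data)      # sum of the observations divided by their count
median = statistics.median(data)  # average of the two middle values, 3 and 5
mode = statistics.mode(data)      # most frequent observation

print(mean, median, mode)
```

Here the single large value of 10 pulls the mean above the median, which is one reason the three measures can disagree.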

#### Measures of Dispersion

- The standard deviation of a set of observations is the square root of the average of the squared deviations from the mean. The average of the squared deviations from the mean is itself called the variance.
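
Here is a minimal sketch of that calculation on a made-up sample. It follows the definition above and divides by the number of observations; the sample versions that divide by one less than the number of observations are a separate choice not covered here:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = sum(data) / len(data)
# Variance: the average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)  # 4.0 2.0

# The standard library's population versions give the same numbers.
print(statistics.pvariance(data), statistics.pstdev(data))
```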

#### The Shape of Distributions

- Unimodal distributions have only one peak while multimodal distributions have several peaks.
- A distribution that is skewed to the right contains a few large values, which results in a long tail towards the right-hand side of the chart.
- A distribution that is skewed to the left contains a few small values, which results in a long tail towards the left-hand side of the chart.
- The kurtosis of a distribution describes its degree of peakedness and, relatedly, how heavy its tails are compared with a normal distribution.
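
One common way to put a number on skewness is the moment coefficient: the average cubed deviation from the mean divided by the standard deviation cubed. That particular measure is my assumption here and is not defined in the text above. A short sketch on made-up data:

```python
def skewness(data):
    """Moment coefficient of skewness (an assumed measure, not from the text)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in data) / n  # average cubed deviation
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 20]        # one large value, long right tail
left_skewed = [-x for x in right_skewed]     # mirror image, long left tail

print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```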

### The Binomial, Poisson, and Exponential Distributions

#### Binomial Distribution

- Think of a repeated process with two possible outcomes, failure ($latex F $) and success ($latex S $). After repeating the experiment $latex n $ times, we will have a sequence of outcomes that includes both failures and successes, such as $latex SFFFSF $. The primary metric of interest is the total number of successes.
- What is the probability of obtaining $latex x $ successes and $latex n-x $ failures in $latex n $ repetitions of the experiment?
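
The standard answer is the binomial probability $latex \binom{n}{x} p^x (1-p)^{n-x} $, assuming the repetitions are independent and each succeeds with the same probability $latex p $. The symbol $latex p $ and the function name below are my own additions. A minimal sketch:

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each succeeding with probability p (p is an assumed parameter)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Two successes in six trials, as in the sequence SFFFSF, with p = 0.5.
print(binomial_pmf(2, 6, 0.5))  # 0.234375
```

The `math.comb(n, x)` factor counts how many different orderings of successes and failures contain exactly `x` successes.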

#### Poisson Distribution

- The Poisson distribution is the limiting form of the binomial distribution when there are a large number of trials but only a small probability of success at each of them.
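
To see the limiting behaviour numerically, here is a rough sketch comparing the two distributions for many trials and a small success probability. The numbers ($latex n = 1000 $, $latex p = 0.002 $) and function names are arbitrary choices of mine; the Poisson mean is taken to be $latex n \times p $:

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, mean):
    """Poisson probability of exactly x events when the expected count is mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

n, p = 1000, 0.002  # many trials, small chance of success at each
for x in range(5):
    # The two columns should agree closely for these values of n and p.
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))
```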

#### Exponential Distribution

- A continuous, positive random variable is said to follow an exponential distribution if its probability density function has the form $latex f(x) = \lambda e^{-\lambda x} $ for $latex x \geq 0 $, with rate $latex \lambda > 0 $. The density is highest at $latex x = 0 $ and declines steadily as $latex x $ goes from $latex 0 $ to $latex \infty $.
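
A short sketch of the exponential density and its cumulative function, using the standard form with a rate parameter. The parameter name `rate` and the function names are my own:

```python
import math

def exp_pdf(x, rate):
    """Exponential density rate * exp(-rate * x) for x >= 0, zero otherwise."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x, rate):
    """Probability that the variable is less than or equal to x."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

rate = 2.0  # arbitrary example value
densities = [exp_pdf(x, rate) for x in (0.0, 0.5, 1.0, 2.0)]
print(densities)           # highest at x = 0, then steadily declining
print(exp_cdf(1.0, rate))  # probability of a value no larger than 1
```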

### The Normal Distribution

#### Properties of the Normal Distribution

- The real reason for the importance of the normal distribution lies in the central limit theorem, which states that the sum of a large number of independent random variables will be approximately normally distributed regardless of their individual distributions (under mild conditions, such as each variable having a finite variance).
- A normal distribution is defined by its mean, $latex \mu $, and standard deviation, $latex \sigma $. A change in the mean shifts the distribution along the x-axis. A change in the standard deviation flattens it or compresses it while leaving its centre in the same position. The total area under the curve is one, and the mean sits at the middle, dividing the area into two equal halves.
- The interval within one standard deviation of the mean of a normal distribution includes about 68% of the observations for that variable. Within two standard deviations that value is about 95%, and within three standard deviations it is about 99.7%.
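
The coverage figures for one, two, and three standard deviations can be checked with the error function from Python's `math` module. A minimal sketch (the function name `coverage` is mine):

```python
import math

def coverage(k):
    """Probability that a normal variable lies within k standard
    deviations of its mean, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular mean or standard deviation, since both only shift and rescale the curve.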

There you have it, a quick review of basic concepts in statistics and probability. Please leave comments or suggestions below. If you’re looking to hire a marketing scientist, please contact me at mathewanalytics@gmail.com
