- Published on 15/05/2020
A Sample of Probability Distributions and Their Properties
Probability distributions can be broadly categorised into discrete and continuous distributions based on the nature of the random variable they model. Discrete probability distributions model random variables that take on a finite or countably infinite set of values, whereas continuous distributions apply to random variables that can take any value within a specified interval of the real number line. This article surveys common distributions from both families, detailing their probability mass or density functions and their key statistical properties.
Discrete Probability Distributions
Binomial Distribution
The binomial distribution is used when an experiment consists of a fixed number of independent trials, where each trial results in only two possible outcomes, often categorized as success or failure. If the probability of success in each individual trial is denoted by $p$ and there are $n$ total trials, we describe the random variable $X$, which represents the total number of successes, as following a binomial distribution, denoted $X \sim \mathrm{Bin}(n, p)$. To determine the probability of observing exactly $k$ successes across those $n$ trials, we use the probability mass function. This formula accounts for both the probability of the successes and failures occurring, as well as the different sequences in which they can appear:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
The term $\binom{n}{k}$ is known as the binomial coefficient, calculated as $\frac{n!}{k!\,(n-k)!}$, and it represents the total number of distinct ways to choose $k$ successes from $n$ available trials. The rest of the equation, $p^k (1-p)^{n-k}$, calculates the probability of one specific sequence of $k$ successes and $n-k$ failures.
Two key properties describe the center and spread of this distribution. The expected value of $X$, which represents the mean number of successes we would expect over many repetitions of the trials, is simply the product of the number of trials and the probability of success:

$$E[X] = np$$
Additionally, the variance, which quantifies how much the number of successes typically deviates from that mean, is given by:

$$\mathrm{Var}(X) = np(1-p)$$
As the number of trials $n$ increases, or as $p$ moves toward 0.5 so that $p(1-p)$ grows, the spread of possible outcomes increases accordingly.
As an example, consider a biased coin whose head comes up with probability $p$ when tossed. To find the probability of seeing exactly 2 heads in 6 tosses, we use the probability mass function. The binomial coefficient, which represents the number of ways to arrange 2 successes in 6 trials, would be:

$$\binom{6}{2} = \frac{6!}{2!\,4!} = 15$$
With this, we get:

$$P(X = 2) = 15\, p^2 (1-p)^4$$
The expected value, the average number of heads, is $E[X] = 6p$. The variance, which measures the spread of these outcomes, is $\mathrm{Var}(X) = 6p(1-p)$.
Let's consider another example: a manufacturing plant produces light bulbs with a known defect rate of 5%. If we randomly select a batch of 10 bulbs to test, the probability of any one bulb being defective is $p = 0.05$. Observing exactly 2 defective bulbs means 2 successes out of 10 trials, which we calculate as:

$$P(X = 2) = \binom{10}{2} (0.05)^2 (0.95)^8 \approx 0.0746$$
The expected number of defective bulbs is $E[X] = 10 \times 0.05 = 0.5$. The variance for this distribution is $\mathrm{Var}(X) = 10 \times 0.05 \times 0.95 = 0.475$.
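As a quick numerical check of the light-bulb example, here is a minimal Python sketch of the binomial PMF, using only the standard library's `math.comb`:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Light-bulb example: n = 10 bulbs, defect rate p = 0.05.
p_two_defects = binomial_pmf(2, 10, 0.05)   # ≈ 0.0746
mean = 10 * 0.05                            # E[X] = np = 0.5
variance = 10 * 0.05 * 0.95                 # Var(X) = np(1-p) = 0.475
```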
Geometric Distribution
The geometric distribution models the number of independent trials required to achieve the first success. We assume that each trial is independent and has the same probability of success $p$. If we define the random variable $X$ as the trial number on which the first success occurs, we say that $X$ follows a geometric distribution, denoted $X \sim \mathrm{Geom}(p)$. To find the probability that the first success occurs exactly on trial $k$, we use the probability mass function: we must observe exactly $k-1$ consecutive failures before finally reaching a success on the $k$-th attempt:

$$P(X = k) = (1-p)^{k-1}\, p$$
The $(1-p)^{k-1}$ term calculates the probability of the $k-1$ initial failures, while $p$ accounts for the success that stops the sequence. This distribution is particularly useful for "waiting-time" problems, for example predicting the number of coin tosses needed to see the first head or the number of attempts required to pass a quality control test. The expected value, which gives the average number of trials one would expect to perform before succeeding, is:

$$E[X] = \frac{1}{p}$$
This inverse relationship means that as the probability of success decreases, the expected number of attempts increases proportionally. The variance, which measures the spread or uncertainty of when that first success will happen, is calculated as:

$$\mathrm{Var}(X) = \frac{1-p}{p^2}$$
This distribution also has a unique "memoryless" property, implying that the probability of success on the next trial does not depend on how many failures have already occurred.
Let's consider an example of rolling a fair six-sided die repeatedly until a "1" is seen. Since there is one success out of six possible outcomes, the probability of success is $p = \frac{1}{6}$. Following the expected value formula, the average number of rolls needed to see that first "1" is $E[X] = \frac{1}{1/6} = 6$. It is also often useful to distinguish between the total number of trials and the number of failures that occur before the first success. In this case, the average number of failures is calculated as $\frac{1-p}{p} = \frac{5/6}{1/6}$, which simplifies to $5$.
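The die-rolling example can be verified with a short sketch of the geometric PMF; the 200-trial horizon below is just a practical cutoff for the infinite sum:

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(X = k): first success occurs on trial k, X ~ Geom(p)."""
    return (1 - p) ** (k - 1) * p

p = 1 / 6                       # probability of rolling a "1"
expected_trials = 1 / p         # 1/p = 6 rolls on average
expected_failures = (1 - p) / p # 5 failures on average

# Sanity check: the PMF sums to (nearly) 1 over a long horizon.
total = sum(geometric_pmf(k, p) for k in range(1, 200))
```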
Negative Binomial Distribution
This models the number of trials needed to achieve a specific outcome, but shifts the focus from a fixed number of attempts to a fixed number of successes. While a standard binomial distribution counts how many successes occur in $n$ trials, the negative binomial distribution reverses this logic by continuing the trials until the $r$-th success is observed. This makes it an extension of the geometric distribution: where the geometric models the trials until the first success, the negative binomial models the journey toward the $r$-th success. This is invaluable in fields like sales or quality control, where one needs to predict how many "failures" or "rejections" will be faced before hitting a specific target of successful outcomes.
If we define $X$ as the number of failures that occur before reaching the $r$-th success, we denote this as $X \sim \mathrm{NB}(r, p)$. The probability mass function for this distribution is:

$$P(X = k) = \binom{k + r - 1}{k} p^r (1-p)^k$$
The binomial coefficient $\binom{k+r-1}{k}$ is calculated as $\frac{(k+r-1)!}{k!\,(r-1)!}$. It represents the total number of ways to arrange the first $r-1$ successes within the initial $k+r-1$ trials, ensuring the last trial is the $r$-th success.
The mean number of failures before obtaining $r$ successes is:

$$E[X] = \frac{r(1-p)}{p}$$
And the variance is:

$$\mathrm{Var}(X) = \frac{r(1-p)}{p^2}$$
The negative binomial distribution can be expressed in several ways, depending on which quantity is treated as the random variable and which outcome serves as the stopping criterion. For example, if the objective is to determine the total number of trials $Y$ needed to reach the success threshold, then $Y = X + r$ and the probability mass function becomes $P(Y = n) = \binom{n-1}{r-1} p^r (1-p)^{n-r}$, since the total sample size is now the random variable. Another common variant models the number of successes achieved before a fixed failure threshold is reached.
The key difference is that, in a binomial distribution, you have a fixed number of trials and you count how many successes occur. In a negative binomial distribution, the number of successes (or failures) is fixed, and the number of trials is the random variable that continues until that target is met.
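A minimal sketch of the failures-before-the-$r$-th-success form; the sales scenario numbers here ($p = 0.2$ close rate, target of $r = 3$ deals) are hypothetical illustrations, not taken from the text above:

```python
from math import comb

def neg_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): exactly k failures before the r-th success, X ~ NB(r, p)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

p, r = 0.2, 3
mean_failures = r * (1 - p) / p        # r(1-p)/p = 12 rejections on average
var_failures = r * (1 - p) / p**2      # r(1-p)/p^2 = 60

# Probability of exactly 5 rejections before the 3rd closed deal:
prob = neg_binomial_pmf(5, r, p)
```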
Poisson Distribution
This models how many times an event occurs within a specific window of time or space. It is built on the assumption that events happen independently and at a constant average rate $\lambda$. We say that $X \sim \mathrm{Pois}(\lambda)$, and the probability of observing exactly $k$ events is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
For the Poisson distribution, the mean and variance are identical, both being equal to the average rate $\lambda$, which implies that as the average number of events increases, the spread or uncertainty of the distribution increases at the exact same rate.
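This mean-variance identity is easy to check numerically; a minimal sketch, where the rate $\lambda = 4$ (say, 4 calls per hour) is an arbitrary illustrative choice:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0
probs = [poisson_pmf(k, lam) for k in range(100)]  # tail beyond 100 is negligible
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
# Both mean and var come out equal to lam.
```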
Here is a summary table for the common discrete distributions above:

| Distribution | PMF | Mean | Variance |
|---|---|---|---|
| Binomial $\mathrm{Bin}(n, p)$ | $\binom{n}{k} p^k (1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Geometric $\mathrm{Geom}(p)$ | $(1-p)^{k-1} p$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| Negative binomial $\mathrm{NB}(r, p)$ | $\binom{k+r-1}{k} p^r (1-p)^k$ | $\frac{r(1-p)}{p}$ | $\frac{r(1-p)}{p^2}$ |
| Poisson $\mathrm{Pois}(\lambda)$ | $\frac{\lambda^k e^{-\lambda}}{k!}$ | $\lambda$ | $\lambda$ |
Continuous Random Variables and Probability Distributions
Uniform Distribution
This continuous distribution models scenarios where every outcome within a range is equally probable. When a continuous random variable $X$ is defined over an interval $[a, b]$, and no particular value is more likely than another, we say that $X \sim \mathrm{U}(a, b)$. Its density is defined as:

$$f(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
Because the probability is spread evenly, the PDF is a flat rectangle with a height of $\frac{1}{b-a}$, so that the total area under the curve is exactly 1. The expected value is:

$$E[X] = \frac{a+b}{2}$$
The spread of the data, or the variance, is calculated with:

$$\mathrm{Var}(X) = \frac{(b-a)^2}{12}$$
A classic example of this distribution is random number generation, where we pick a value, e.g. between 0 and 1, with perfect neutrality. Another common case is modeling "maximum uncertainty" scenarios, such as estimating waiting times with no prior data about a system's variability.
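The uniform formulas are simple enough to encode directly; a minimal sketch, with the interval $[2, 10]$ chosen arbitrarily for illustration:

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Flat density 1/(b-a) on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 10.0
mean = (a + b) / 2          # (a+b)/2 = 6.0
var = (b - a) ** 2 / 12     # (b-a)^2/12 ≈ 5.333
height = uniform_pdf(5.0, a, b)  # 1/(b-a) = 0.125 inside the interval
outside = uniform_pdf(11.0, a, b)  # 0.0 outside the interval
```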
Normal (Gaussian) Distribution
The normal or Gaussian distribution is the most common continuous probability distribution in statistics, primarily due to the Central Limit Theorem (CLT). The CLT states that the sum or average of a large number of independent random variables tends toward a normal distribution, regardless of the shape of the original distribution.
When a continuous random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, we denote it as $X \sim \mathcal{N}(\mu, \sigma^2)$. Its probability density function (PDF) is the familiar "bell curve" given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
The expected value of a normal distribution is $E[X] = \mu$ and the variance is $\mathrm{Var}(X) = \sigma^2$. Its special case, the standard normal, is denoted $Z \sim \mathcal{N}(0, 1)$, where the mean is 0 and the variance is 1.
We use the cumulative distribution function (CDF) to calculate probabilities for the normal distribution. Because the CDF has no closed form, this always requires standardizing the variable into a $z$-score with:

$$z = \frac{x - \mu}{\sigma}$$
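Python's standard library exposes the error function, which gives the normal CDF without any external packages via the identity $\Phi(z) = \frac{1}{2}\bigl(1 + \mathrm{erf}(z/\sqrt{2})\bigr)$; a minimal sketch:

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    z = (x - mu) / sigma              # standardize to a z-score
    return 0.5 * (1 + erf(z / sqrt(2)))

# About 68% of the mass lies within one standard deviation of the mean:
within_one_sigma = normal_cdf(1) - normal_cdf(-1)
```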
Exponential Distribution
This models the time between independent events that occur at a constant average rate. When a random variable $X$ follows an exponential distribution with rate parameter $\lambda$, it is denoted as $X \sim \mathrm{Exp}(\lambda)$. Its density is defined as:

$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
The probability density starts at its maximum value $\lambda$ and decays exponentially as $x$ increases. Its expected value, the mean time between events, is:

$$E[X] = \frac{1}{\lambda}$$
The variance is calculated with:

$$\mathrm{Var}(X) = \frac{1}{\lambda^2}$$
Common examples of this distribution include survival analysis, reliability modeling, and calculating the failure rates of components. It is also the standard model for the time between arrivals in a Poisson process, such as the time between phone calls at a service center.
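Exponential variates are easy to draw by inverse-transform sampling, $X = -\ln(1-U)/\lambda$ for uniform $U$; a minimal sketch, where the rate $\lambda = 0.5$ (one arrival every 2 minutes on average) and the seed are arbitrary illustrative choices:

```python
import random
from math import log

def sample_exponential(lam: float, rng: random.Random) -> float:
    """Draw from Exp(lam) by inverse-transform sampling."""
    # 1 - rng.random() is in (0, 1], so log never sees zero.
    return -log(1.0 - rng.random()) / lam

rng = random.Random(42)
lam = 0.5
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1/lam = 2
```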
Weibull Distribution
This generalizes the exponential distribution by introducing a shape parameter, allowing for more flexible modeling of lifetimes and failure rates. When a continuous random variable $X$ follows a Weibull distribution with shape parameter $k$ and scale parameter $\lambda$, it is denoted as $X \sim \mathrm{Weibull}(k, \lambda)$. The density is defined as:

$$f(x) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \ge 0$$
By adjusting the shape parameter $k$, we can model problems with decreasing, constant, or increasing failure rates. Because of this flexibility, the calculations for the mean and variance involve the Gamma function $\Gamma$.
The expected value is:

$$E[X] = \lambda\, \Gamma\!\left(1 + \frac{1}{k}\right)$$
Its variance is calculated as:

$$\mathrm{Var}(X) = \lambda^2 \left[\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma\!\left(1 + \frac{1}{k}\right)^{2}\right]$$
This distribution is very common in survival modeling. For example, when $k < 1$, it models "infant mortality", where failure rates decrease over time; when $k = 1$, it reduces to the exponential distribution (constant failure rate); and when $k > 1$, it models "wear-out" periods, where the probability of failure increases as the component ages.
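Since Python's `math.gamma` implements the Gamma function, the mean and variance formulas can be coded directly; a minimal sketch, with a sanity check that $k = 1$ recovers the exponential case:

```python
from math import gamma

def weibull_mean(k: float, lam: float) -> float:
    """E[X] = lam * Gamma(1 + 1/k) for X ~ Weibull(shape=k, scale=lam)."""
    return lam * gamma(1 + 1 / k)

def weibull_var(k: float, lam: float) -> float:
    """Var(X) = lam^2 * (Gamma(1 + 2/k) - Gamma(1 + 1/k)^2)."""
    return lam**2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k) ** 2)

# With k = 1 the Weibull is exponential: mean = lam, variance = lam^2.
m = weibull_mean(1.0, 2.0)  # 2.0
v = weibull_var(1.0, 2.0)   # 4.0
```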
Chi-Squared Distribution
The Chi-squared ($\chi^2$) distribution is a key tool in statistical inference, particularly for hypothesis testing and the construction of confidence intervals. Given $k$ independent standard normal variables $Z_1, \ldots, Z_k$, the sum of their squares follows a chi-squared distribution with $k$ degrees of freedom. This is denoted as $Q \sim \chi^2(k)$, where:

$$Q = \sum_{i=1}^{k} Z_i^2$$
Its density is defined as:

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \quad x > 0$$
The shape of the distribution depends entirely on the degrees of freedom: it is strongly right-skewed for small $k$ and becomes more symmetric as $k$ increases.
The expected value is:

$$E[Q] = k$$
And its variance is calculated with:

$$\mathrm{Var}(Q) = 2k$$
This distribution provides a framework for goodness-of-fit tests, variance estimation, and determining the independence between categorical variables in contingency tables. It is also the basis for the $F$-distribution used in ANOVA.
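The sum-of-squared-normals definition lends itself to a Monte Carlo check of the mean and variance formulas; a minimal sketch, where $k = 5$, the sample size, and the seed are arbitrary illustrative choices:

```python
import random

def chi2_sample(k: int, rng: random.Random) -> float:
    """One draw from chi^2(k): the sum of k squared standard normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

rng = random.Random(0)
k = 5
draws = [chi2_sample(k, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)                          # ≈ k
var = sum((d - mean) ** 2 for d in draws) / len(draws)  # ≈ 2k
```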
Beta Distribution
This probability distribution is defined on the interval $[0, 1]$. Because its domain is bounded, it is the common choice for modeling variables that represent proportions, probabilities, or percentages. When a random variable $X$ is parameterized by two shape parameters $\alpha$ and $\beta$, we denote it as $X \sim \mathrm{Beta}(\alpha, \beta)$. Its density is defined as:

$$f(x) = \frac{x^{\alpha - 1}(1-x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \le x \le 1$$
The denominator $B(\alpha, \beta)$ is the Beta function, a normalization constant that ensures the total area under the PDF is 1:

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
By adjusting $\alpha$ and $\beta$, the distribution can take on a variety of shapes including uniform, U-shaped, bell-shaped, or skewed, making it highly adaptable.
The expected value is:

$$E[X] = \frac{\alpha}{\alpha + \beta}$$
Its variance is calculated with:

$$\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
The Beta distribution is key in Bayesian statistics, where it often serves as a conjugate prior for Bernoulli, Binomial, and Geometric distributions. It is also widely applied in A/B testing, election forecasting, and quality control to model the uncertainty surrounding the true success rate of a process.
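The Beta function can be built from `math.gamma`, which makes the density easy to evaluate. A minimal sketch, where the A/B-testing numbers (8 conversions and 2 misses, giving a $\mathrm{Beta}(9, 3)$ posterior under a uniform $\mathrm{Beta}(1, 1)$ prior) are hypothetical illustrations:

```python
from math import gamma

def beta_function(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x in [0, 1]."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

a, b = 9.0, 3.0
mean = a / (a + b)  # posterior mean success rate = 0.75

# Numeric check that the PDF integrates to ~1 (midpoint rule):
n = 10_000
area = sum(beta_pdf((i + 0.5) / n, a, b) for i in range(n)) / n
```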
Conclusion
We have explored a range of discrete and continuous probability distributions, which form the foundation for most statistical analyses. There's no need to memorize every detail; instead, keep these concepts in mind for future reference. Over time, you'll naturally learn which distribution is most appropriate for a given problem.
For comments, please send me an email.