Published on
15/05/2020

A Sample of Probability Distributions and Their Properties

Probability distributions can be broadly categorised into discrete and continuous distributions based on the nature of the random variable they model. Discrete probability distributions model random variables that take on a finite or countably infinite set of values, whereas continuous distributions apply to random variables that can take any value within a specified interval of the real number line. This article surveys common distributions of both types, detailing their probability mass or density functions and key statistical properties.

Discrete Probability Distributions

Binomial Distribution

The binomial distribution is used when an experiment consists of a fixed number of independent trials, where each trial results in only two possible outcomes, often categorized as success or failure. If the probability of success in each individual trial is denoted by p and there are n total trials, we say the random variable X, which represents the total number of successes, follows a binomial distribution, denoted X \sim \text{Bin}(n, p). To determine the probability of observing exactly x successes across those n trials, we use the probability mass function. This formula accounts for both the probability of the successes and failures occurring, as well as the different sequences in which they can appear:

P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} \tag{1}

The term \binom{n}{x} is known as the binomial coefficient, calculated as \frac{n!}{x!(n - x)!}, and it represents the total number of distinct ways to choose x successes from n available trials. The rest of the equation, p^x (1 - p)^{n - x}, calculates the probability of one specific sequence of x successes and n - x failures.

Two key properties describe the center and spread of this distribution. The expected value of X, which represents the mean number of successes we would expect over many repetitions of the n trials, is simply the product of the number of trials and the probability of success as below:

E(X) = np \tag{2}

Additionally, the variance, which quantifies how much the number of successes typically deviates from that mean, is given by:

V(X) = np(1 - p) \tag{3}

As the probability of failure (1 - p) increases, or as n increases, the potential spread of outcomes changes accordingly.

As an example, consider a biased coin that lands heads with probability p = 0.1 on each toss. To find the probability of seeing exactly 2 heads in n = 6 tosses, we use the probability mass function. The binomial coefficient \binom{6}{2}, which represents the number of ways to arrange 2 successes in 6 trials, is:

\binom{6}{2} = \frac{6!}{2!(6 - 2)!} = \frac{6 \times 5}{2 \times 1} = 15

With this, we get:

f(2, 6, 0.1) = 15 \times 0.1^2 \times 0.9^4 = 0.098415

The expected number of heads is E(X) = 6 \times 0.1 = 0.6. The variance, which measures the spread of these outcomes, is V(X) = 6 \times 0.1 \times 0.9 = 0.54.

Let's consider another scenario, where a manufacturing plant produces light bulbs with a known defect rate of 5%. If we randomly select a batch of n = 10 bulbs to test, the probability p of finding a defective bulb is 0.05. Observing exactly 2 defective bulbs means 2 successes out of 10 trials, which we calculate as:

\binom{10}{2} = \frac{10!}{2!(10 - 2)!} = \frac{10 \times 9}{2 \times 1} = 45

f(2, 10, 0.05) = 45 \times 0.05^2 \times 0.95^8 = 0.074635

The expected number of defective bulbs is E(X) = 10 \times 0.05 = 0.5. The variance for this distribution is V(X) = 10 \times 0.05 \times 0.95 = 0.475.
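Both worked examples can be verified with a few lines of code. The sketch below uses only the standard library; `binom_pmf` is a helper name chosen here for illustration, not a library function.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): number of arrangements times the
    probability of one specific success/failure sequence."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Biased coin: exactly 2 heads in 6 tosses with p = 0.1
print(round(binom_pmf(2, 6, 0.1), 6))    # 0.098415

# Light bulbs: exactly 2 defects in 10 bulbs with p = 0.05
print(round(binom_pmf(2, 10, 0.05), 6))  # 0.074635

# Mean and variance of the bulb example: E(X) = np, V(X) = np(1 - p)
n, p = 10, 0.05
print(n * p, n * p * (1 - p))            # 0.5 0.475
```

The pmf values sum to 1 over x = 0, ..., n, which is a quick sanity check when adapting the helper to other parameters.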

Geometric Distribution

The geometric distribution models the number of independent trials required to achieve the first success. We assume that each trial is independent and has the same probability of success p. If we define the random variable X as the trial number on which the first success occurs, we say that X follows a geometric distribution, denoted X \sim \text{Geom}(p). To find the probability that the first success occurs exactly on trial x, we use the probability mass function: we must observe exactly x - 1 consecutive failures before finally reaching a success on the x-th attempt:

P(X = x) = (1 - p)^{x - 1} p \tag{4}

The term (1 - p)^{x - 1} calculates the probability of the initial failures, while p accounts for the success that stops the sequence. This distribution is particularly useful for "waiting-time" problems, for example predicting the number of coin tosses needed to see the first head or the number of attempts required to pass a quality control test. The expected value, which provides the average number of trials one would expect to perform before succeeding, is given by:

E(X) = \frac{1}{p} \tag{5}

This inverse relationship means that as the probability of success decreases, the expected number of attempts increases proportionally. The variance, which measures the spread or uncertainty of when that first success will happen, is calculated as:

V(X) = \frac{1 - p}{p^2} \tag{6}

This distribution also has a unique "memoryless" property, implying that the probability of success on the next trial does not depend on how many failures have already occurred.

Let's consider an example of rolling a fair six-sided die repeatedly until a "1" is seen. Since there is one success out of six possible outcomes, the probability of success is p = 1/6. Following the expected value formula, the average number of rolls needed to see that first "1" is E(X) = \frac{1}{1/6} = 6. It is also often useful to distinguish between the total number of trials and the number of failures that occur before the first success. In this case, the average number of failures is calculated as \frac{1 - p}{p}, which simplifies to \frac{1 - 1/6}{1/6} = 5.
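The die example translates directly to code. This is a minimal sketch; `geom_pmf` is an illustrative helper name, not a library call.

```python
def geom_pmf(x, p):
    """P(X = x) for X ~ Geom(p): x - 1 failures, then one success."""
    return (1 - p)**(x - 1) * p

p = 1 / 6  # probability of rolling a "1" on a fair die

# Average number of rolls until the first "1", and average failures before it
mean_trials = 1 / p            # E(X) = 1/p, i.e. 6 rolls on average
mean_failures = (1 - p) / p    # 5 failures on average before the first "1"

# Probability the first "1" appears exactly on roll 3: two misses, then a hit
print(geom_pmf(3, p))
print(mean_trials, mean_failures)
```

Summing `geom_pmf(x, p)` over x = 1, 2, 3, ... converges to 1, confirming it is a valid pmf.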

Negative Binomial Distribution

This models the number of trials needed to achieve a specific outcome, but shifts the focus from a fixed number of attempts to a fixed number of successes. While a standard binomial distribution counts how many successes occur in n trials, the negative binomial distribution reverses this logic by continuing the trials until the r-th success is observed. This makes it an extension of the geometric distribution: where the geometric models the trials until the first success, the negative binomial models the journey toward the r-th success. This is invaluable in fields like sales or quality control, where one needs to predict how many "failures" or "rejections" will be faced before hitting a specific target of successful outcomes.

If we define X as the number of failures that occur before reaching the r-th success, we denote this as X \sim \text{NegBin}(r, p). The probability mass function for this distribution is:

P(X = x) = \binom{x + r - 1}{r - 1} p^r (1 - p)^x \tag{7}

The binomial coefficient \binom{x + r - 1}{r - 1} is calculated as \frac{(x + r - 1)!}{x!(r - 1)!}. It represents the total number of ways to arrange the first r - 1 successes within the initial x + r - 1 trials, ensuring the last trial is the r-th success.

The mean number of failures before obtaining rr successes is:

E(X) = \frac{r(1 - p)}{p} \tag{8}

And the variance is:

V(X) = \frac{r(1 - p)}{p^2} \tag{9}

The negative binomial distribution can be expressed in several ways depending on which variable you are solving for and which outcome serves as the stopping criterion, as shown below:

\begin{array}{ll} \text{Goal} & \text{Formula } P(X=x) \\ \hline k \text{ failures, given } r \text{ successes} & \binom{k+r-1}{k} p^r (1-p)^k \\ n \text{ trials, given } r \text{ successes} & \binom{n-1}{r-1} p^r (1-p)^{n-r} \\ n \text{ trials, given } r \text{ failures} & \binom{n-1}{r-1} (1-p)^r p^{n-r} \\ k \text{ successes, given } r \text{ failures} & \binom{k+r-1}{k} (1-p)^r p^k \\ \end{array}

For the second alternative, the objective is to determine the total number of trials needed to reach a success threshold; the formula changes slightly to account for the fact that the total sample size is now the random variable. The fourth version models the number of successes achieved before a failure threshold is reached.

The key difference is that in a binomial distribution you have a fixed number of trials n and count how many successes k occur, whereas in a negative binomial distribution the number of successes (or failures) is fixed, and the number of trials is the random variable that continues until that target is met.
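The counting logic of equation (7), and its relation to the geometric distribution, can be sketched as follows. `negbin_pmf` is a hypothetical helper name; with r = 1 it should reduce to a geometric count of failures before the first success.

```python
from math import comb

def negbin_pmf(x, r, p):
    """P(X = x) failures before the r-th success: the last trial must be
    the r-th success, so we arrange r - 1 successes among the first
    x + r - 1 trials."""
    return comb(x + r - 1, r - 1) * p**r * (1 - p)**x

p = 0.3

# With r = 1 the negative binomial reduces to the geometric case
for k in range(5):
    assert abs(negbin_pmf(k, 1, p) - (1 - p)**k * p) < 1e-12

# Mean and variance of failures before the 3rd success: r(1-p)/p, r(1-p)/p^2
r = 3
print(r * (1 - p) / p, r * (1 - p) / p**2)
```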

Poisson Distribution

This models how many times an event occurs within a specific window of time or space. It is built on the assumption that events happen independently and at a constant average rate \lambda, with k denoting the number of occurrences. We say that X \sim \text{Poisson}(\lambda), and the probability of observing exactly k events is:

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \tag{10}

For the Poisson distribution, the mean and variance are identical, both being equal to the average rate \lambda, which implies that as the average number of events increases, the spread or uncertainty of the distribution increases at the exact same rate.
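Equation (10) and the mean-equals-variance property can be checked numerically. A minimal sketch; the rate of 4 events per interval is a hypothetical choice, and `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0  # hypothetical average rate, e.g. 4 arrivals per hour

# Probability of observing exactly 2 events in the interval
print(poisson_pmf(2, lam))

# Mean and variance both equal lam; check numerically over a long tail
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
var = sum((k - mean)**2 * poisson_pmf(k, lam) for k in range(100))
print(round(mean, 6), round(var, 6))  # both approximately 4.0
```

The tail is truncated at k = 100, where the remaining Poisson(4) mass is negligible.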

Here is the summary table for all the common discrete distributions above:

\begin{array}{lllll} \text{Distribution} & \text{Random Variable } X & \text{PMF} & \text{Mean } E(X) & \text{Variance } V(X) \\ \hline \text{Binomial} & \text{\# of successes in } n \text{ trials} & \binom{n}{x}p^x(1-p)^{n-x} & np & np(1-p) \\ \text{Geometric} & \text{Trials until 1st success} & (1-p)^{x-1}p & \frac{1}{p} & \frac{1-p}{p^2} \\ \text{Negative Binomial} & \text{Failures before } r\text{th success} & \binom{x+r-1}{r-1}p^r(1-p)^x & \frac{r(1-p)}{p} & \frac{r(1-p)}{p^2} \\ \text{Poisson} & \text{\# of events in interval} & \frac{e^{-\lambda}\lambda^x}{x!} & \lambda & \lambda \end{array}

Continuous Random Variables and Probability Distributions

Uniform Distribution

This continuous distribution models scenarios where every outcome within a range is equally probable. When a continuous random variable X is defined over an interval [A, B] and no particular value is more likely than another, we say that X \sim \text{Uniform}(A, B). Its density is defined as:

f(x) = \begin{cases} \frac{1}{B - A}, & A \leq x \leq B \\ 0, & \text{otherwise} \end{cases}

Because the probability is spread evenly, the PDF is a flat rectangle with a height of 1/(B - A), and the total area under the curve is exactly 1. The expected value is:

E(X) = \frac{A + B}{2} \tag{11}

The spread of the data, or the variance, is calculated with:

V(X) = \frac{(B - A)^2}{12} \tag{12}

A classic example of this distribution is random number generation, where we must pick a value, e.g. between 0 and 1, with perfect neutrality. Another common case is modeling "maximum uncertainty" scenarios, such as estimating waiting times with no prior data regarding a system's variability.
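The mean and variance formulas above are easy to confirm by simulation. A rough sketch using the standard library `random` module, seeded for reproducibility:

```python
import random

A, B = 0.0, 1.0
random.seed(42)

# Draw many Uniform(0, 1) samples and compare to the theory
samples = [random.uniform(A, B) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean)**2 for x in samples) / len(samples)

print(sample_mean)   # close to (A + B)/2 = 0.5
print(sample_var)    # close to (B - A)^2 / 12, about 0.0833
```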

Normal (Gaussian) Distribution

The normal or Gaussian distribution is the most common continuous probability distribution in statistics, primarily due to the Central Limit Theorem (CLT). The CLT states that the sum or average of a large number of independent random variables will tend toward a normal distribution, regardless of the original distribution's shape.

When a continuous random variable X follows a normal distribution with a mean \mu and a variance \sigma^2, we denote it as X \sim \mathcal{N}(\mu, \sigma^2). Its probability density function (PDF) is the familiar "bell curve," given by:

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{13}

The expected value of a normal distribution is E(X) = \mu and the variance is V(X) = \sigma^2. Its standard version is denoted Z \sim \mathcal{N}(0, 1), where the mean is 0 and the variance is 1.

We use the cumulative distribution function (CDF) to calculate probabilities for the normal distribution. Because the CDF has no closed form, this typically requires standardizing the variable into a Z-score with:

Z = \frac{X - \mu}{\sigma} \tag{14}
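The standardization step can be sketched in code. The standard normal CDF has no elementary closed form, but it can be expressed via the error function in the math module; the height example and the names `z_score` and `phi` are illustrative choices, not library APIs.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Standardize X ~ N(mu, sigma^2) into Z ~ N(0, 1)."""
    return (x - mu) / sigma

def phi(z):
    """Standard normal CDF via the error function: P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical example: heights with mu = 170, sigma = 10.
# P(X <= 180) = P(Z <= 1)
z = z_score(180, 170, 10)
print(z)                  # 1.0
print(round(phi(z), 4))   # 0.8413
```

By symmetry, phi(-z) = 1 - phi(z), which is a handy check when working with left-tail probabilities.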

Exponential Distribution

This models the time between independent events that occur continuously at a fixed average rate. When a random variable X follows an exponential distribution with rate parameter \lambda, it is denoted as X \sim \text{Exp}(\lambda). Its density is defined as:

f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{15}

The probability density starts at its maximum value \lambda and decays exponentially as x increases. Its expected value, the mean time between events, is:

E(X) = \frac{1}{\lambda} \tag{16}

The variance is calculated with:

V(X) = \frac{1}{\lambda^2} \tag{17}

Common examples of this distribution include survival analysis, reliability modeling, and calculating the failure rates of components. It is also the standard model for the time between arrivals in a Poisson process, such as the time between phone calls at a service center.
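As a sketch of the service-center example, suppose calls arrive at a hypothetical rate of 2 per minute, so the waiting time follows Exp(2). The exponential distribution also shares the memoryless property mentioned for the geometric distribution, which we can verify numerically:

```python
from math import exp

lam = 2.0  # hypothetical rate: 2 calls per minute

def exp_cdf(t, lam):
    """P(X <= t) for X ~ Exp(lam): 1 - e^(-lam * t) for t >= 0."""
    return 1 - exp(-lam * t) if t >= 0 else 0.0

mean = 1 / lam     # E(X) = 1/lam = 0.5 minutes between calls
var = 1 / lam**2   # V(X) = 1/lam^2 = 0.25

# Probability the next call arrives within one mean waiting time
print(exp_cdf(mean, lam))   # 1 - e^-1, about 0.6321

# Memorylessness: P(X > s + t | X > s) equals P(X > t)
s, t = 0.3, 0.7
lhs = (1 - exp_cdf(s + t, lam)) / (1 - exp_cdf(s, lam))
rhs = 1 - exp_cdf(t, lam)
print(abs(lhs - rhs) < 1e-12)   # True
```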

Weibull Distribution

This generalizes the exponential distribution by introducing a shape parameter, allowing for more flexible modeling of lifetimes and failure rates. When a continuous random variable X follows a Weibull distribution with shape parameter \alpha and scale parameter \beta, it is denoted as X \sim \text{Weibull}(\alpha, \beta). The density is defined as:

f(x) = \begin{cases} \frac{\alpha}{\beta} \left(\frac{x}{\beta}\right)^{\alpha - 1} e^{-(x/\beta)^\alpha}, & x > 0 \\ 0, & x \leq 0 \end{cases} \tag{19}

By adjusting the shape parameter \alpha, we can model problems with decreasing, constant, or increasing failure rates. As a result, the calculations for the mean and variance involve the Gamma function \Gamma(n).

The expected value is:

E(X) = \beta \Gamma\left(1 + \frac{1}{\alpha}\right) \tag{20}

Its variance is calculated as:

V(X) = \beta^2 \left[ \Gamma\left(1 + \frac{2}{\alpha}\right) - \left(\Gamma\left(1 + \frac{1}{\alpha}\right)\right)^2 \right] \tag{21}

This distribution is very common in survival modeling. For example, when \alpha < 1, it models "infant mortality," where failure rates decrease over time; when \alpha = 1, it reduces to the exponential distribution (constant failure rate); and when \alpha > 1, it models "wear-out" periods, where the probability of failure increases as the component ages.
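The Gamma-function formulas (20) and (21) can be evaluated with math.gamma. As a sanity check, setting alpha = 1 should recover the exponential case, with mean beta and variance beta squared. A sketch:

```python
from math import gamma

def weibull_mean(alpha, beta):
    """E(X) = beta * Gamma(1 + 1/alpha)."""
    return beta * gamma(1 + 1 / alpha)

def weibull_var(alpha, beta):
    """V(X) = beta^2 * [Gamma(1 + 2/alpha) - Gamma(1 + 1/alpha)^2]."""
    g1 = gamma(1 + 1 / alpha)
    g2 = gamma(1 + 2 / alpha)
    return beta**2 * (g2 - g1**2)

# alpha = 1 reduces to Exp with mean beta = 2 and variance beta^2 = 4
print(weibull_mean(1, 2.0), weibull_var(1, 2.0))

# alpha > 1 (wear-out regime): mean is beta * Gamma(1.5), about 0.8862
print(weibull_mean(2, 1.0))
```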

Chi-Squared Distribution

The Chi-squared (\chi^2) distribution is a statistical inference tool, used particularly for hypothesis testing and the construction of confidence intervals. Given k independent standard normal variables X_1, X_2, \dots, X_k, the sum of their squares follows a chi-squared distribution with k degrees of freedom. This is denoted as X \sim \chi^2(k), where:

X = \sum_{i=1}^{k} X_i^2 \tag{22}

Its distribution is defined as:

f(x) = \begin{cases} \frac{1}{2^{k/2} \Gamma(k/2)} x^{(k/2) - 1} e^{-x/2}, & x > 0 \\ 0, & x \leq 0 \end{cases} \tag{23}

The shape of the distribution depends entirely on the degrees of freedom. It is strongly right-skewed for small k and becomes more symmetric as k increases.

The expected value is:

E(X) = k \tag{24}

And its variance is calculated with:

V(X) = 2k \tag{25}

This distribution provides a framework for goodness-of-fit tests, variance estimation, and testing the independence of categorical variables in contingency tables. It is also the basis for the F-distribution used in ANOVA.
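The defining construction in equation (22) can be checked by simulation: square and sum k standard normal draws, then compare the sample mean and variance with k and 2k. A seeded sketch, with sample sizes chosen here just for illustration:

```python
import random

random.seed(0)
k = 3            # degrees of freedom
n = 100_000      # number of simulated chi-squared draws

# Each draw is the sum of k squared standard normal variables
draws = [sum(random.gauss(0, 1)**2 for _ in range(k)) for _ in range(n)]

mean = sum(draws) / n
var = sum((x - mean)**2 for x in draws) / n

print(mean)   # close to k = 3
print(var)    # close to 2k = 6
```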

Beta Distribution

This probability distribution is defined on the interval [0, 1]. Because its domain is bounded, it is a common choice for modeling variables that represent proportions, probabilities, or percentages. When a random variable X is parameterized by two shape parameters \alpha and \beta, we denote it as X \sim \text{Beta}(\alpha, \beta). Its density is defined as:

f(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1 \tag{26}

The denominator B(\alpha, \beta) is the Beta function, a normalization constant that ensures the total area under the PDF is 1:

B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} \, dt \tag{27}

By adjusting \alpha and \beta, the distribution can take on a variety of shapes, including uniform, U-shaped, bell-shaped, or skewed, making it highly adaptable.

The expected value is:

E(X) = \frac{\alpha}{\alpha + \beta} \tag{28}

Its variance is calculated with:

V(X) = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \tag{29}
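Equations (28) and (29) are easy to sanity-check in code: with alpha = beta = 1 the Beta distribution is just Uniform(0, 1), so the mean and variance should match 1/2 and 1/12. The A/B-testing counts below are hypothetical numbers for illustration.

```python
def beta_mean(a, b):
    """E(X) = a / (a + b)."""
    return a / (a + b)

def beta_var(a, b):
    """V(X) = ab / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b)**2 * (a + b + 1))

# Beta(1, 1) is Uniform(0, 1): mean 1/2, variance 1/12
print(beta_mean(1, 1), beta_var(1, 1))

# Hypothetical A/B test: 30 successes and 70 failures observed,
# starting from a Beta(1, 1) prior, gives a Beta(31, 71) posterior
print(round(beta_mean(31, 71), 4))   # 0.3039
```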

The Beta distribution is key in Bayesian statistics, where it often serves as a conjugate prior for Bernoulli, Binomial, and Geometric distributions. It is also widely applied in A/B testing, election forecasting, and quality control to model the uncertainty surrounding the true success rate of a process.

Conclusion

We have explored the different discrete and continuous probability distributions, which are the foundation for most statistical analyses. There’s no need to memorize every detail; instead, keep these concepts in mind for future reference. Over time, you'll naturally learn the most appropriate distribution for a given problem.

For comments, please send me an email.