In secondary school calculus, we are introduced to the basic concepts of integration, including the Mean Value Theorem for Integrals, the Fundamental Theorem of Calculus, and the techniques of integration. Although I provide a brief review of these concepts, they are analytical methods. Because some functions are difficult or impossible to integrate analytically, we use numerical integration methods to approximate their values, which is the focus of this blog.
The Mean Value Theorem for Integrals
For any continuous function $f(x)$ on an interval $[a,b]$, there exists at least one point $c$ in the interval at which the function takes its average value over $[a,b]$:

$$f(c) = \frac{1}{b-a}\int_a^b f(x)\,dx$$
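To make the theorem concrete, here is a minimal numeric sketch. The function $f(x) = x^2$ on $[0, 3]$ is an illustrative choice, not from the text above: its average value works out to 3, and the guaranteed point is $c = \sqrt{3}$, which indeed lies inside the interval.

```python
import math

# Illustrative example: f(x) = x^2 on [a, b] = [0, 3].
f = lambda x: x * x
a, b = 0.0, 3.0

# Average value of f over [a, b], using the exact antiderivative F(x) = x^3/3:
# (1/(b-a)) * (F(b) - F(a)) = (1/3) * (27/3 - 0) = 3.
avg = (1.0 / (b - a)) * (b**3 / 3.0 - a**3 / 3.0)

# The MVT for integrals guarantees some c in [a, b] with f(c) = avg.
# For this f we can solve directly: c = sqrt(avg) = sqrt(3) ≈ 1.732.
c = math.sqrt(avg)
```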
Fundamental Theorem of Calculus (FTC)
This is called the fundamental theorem of calculus because it connects differentiation and integration: it guarantees that every continuous function has an antiderivative, and that differentiation and integration are inverse operations. The FTC is usually stated in two parts (Part 1, "derivative thinking," and Part 2, "integral thinking"). As commonly said, derivatives model the rate of change, while integrals model the accumulation of the change; both views of the FTC are relevant to machine learning.
Let's look at Part 2 of the FTC, which is the simpler of the two. If $F$ is an antiderivative of $f$ (equivalently, $f$ is the derivative of $F$), then for any given $a$ and $b$ we can calculate the area under the curve of $f$ as:

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

We shall look at the different methods for finding the antiderivative of different functions analytically in the next section.
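Part 2 is easy to check numerically: a fine midpoint Riemann sum of $f$ (accumulation) should match $F(b) - F(a)$ (net change). The sketch below uses $f(x) = x^2$ with antiderivative $F(x) = x^3/3$ on $[1, 4]$, an illustrative choice:

```python
def riemann_midpoint(f, a, b, n=100_000):
    """Midpoint Riemann sum: approximates the accumulated change, the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

f = lambda x: x * x          # f(x) = x^2
F = lambda x: x**3 / 3.0     # an antiderivative of f
a, b = 1.0, 4.0

accumulated = riemann_midpoint(f, a, b)   # "integral thinking"
net_change = F(b) - F(a)                  # FTC Part 2: 64/3 - 1/3 = 21
```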
Common Techniques of Integration
Substitution Method: This is a pretty straightforward way of simplifying a complicated integrand into a simpler one by substituting part of the integrand with a new variable, say $u$. Generally, the substitution algorithm is as follows:
1. Choose a substitution $u = g(x)$ such that the integral becomes easier to evaluate, or such that $g'(x)$ appears in the integrand;
2. Compute the differential $du = g'(x)\,dx$;
3. Rewrite the integral in terms of $u$ and $du$;
4. Integrate with respect to $u$;
5. Substitute $u = g(x)$ back to get the result in terms of $x$.
For example, to integrate $\int 2x\cos(x^2)\,dx$, we can let $u = x^2$, then $du = 2x\,dx$. The integral therefore becomes:

$$\int \cos(u)\,du = \sin(u) + C = \sin(x^2) + C$$
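As a sanity check that substitution really produces a valid antiderivative, the sketch below numerically integrates $f(x) = 2x\cos(x^2)$ (the standard textbook substitution example, used here as an illustrative assumption) and compares against $\sin(b^2)$, which the substitution $u = x^2$ predicts:

```python
import math

def midpoint_sum(f, a, b, n=200_000):
    """Crude midpoint Riemann sum, accurate enough to sanity-check an antiderivative."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Substituting u = x^2, du = 2x dx turns the integral of 2x*cos(x^2)
# into the integral of cos(u), i.e. sin(u) + C, so the definite
# integral from 0 to b should equal sin(b^2).
b = 1.5
numeric = midpoint_sum(lambda x: 2 * x * math.cos(x * x), 0.0, b)
analytic = math.sin(b * b)
```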
- Integration by Parts (IBP):
We use IBP when the integrand is a product of two functions, especially when the factors are power (algebraic), exponential, or trigonometric functions. The formula for IBP is given as:

$$\int u\,dv = uv - \int v\,du$$
With this formula, the hardest part is choosing which part of the integrand to be $u$ and which part to be $dv$. In Ugandan A'level maths, we are introduced to the mnemonic L.I.A.T.E. (Logarithmic, Inverse trigonometric, Algebraic, Trigonometric, Exponential) to help choose $u$: the factor that comes first in the list is chosen as $u$.
That's all there is to the analytical algorithm for IBP; however, LIATE can't cover all cases, and no rule can. Let's look at an example: integrating $\int x e^x\,dx$. We choose $u = x$ (Algebraic) and $dv = e^x\,dx$ (Exponential). Then we compute $du = dx$ and $v = e^x$. Applying the IBP formula, we have:

$$\int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C$$
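A quick numeric check of the IBP result (the bounds $[0, 1]$ are an illustrative choice): with antiderivative $x e^x - e^x$, the definite integral of $x e^x$ over $[0, 1]$ is $(1\cdot e - e) - (0 - 1) = 1$.

```python
import math

def midpoint_sum(f, a, b, n=200_000):
    """Crude midpoint Riemann sum for sanity-checking an antiderivative."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# IBP with u = x, dv = e^x dx gives x*e^x - e^x + C as the antiderivative,
# so the integral of x*e^x from 0 to 1 is (1*e - e) - (0 - 1) = 1.
numeric = midpoint_sum(lambda x: x * math.exp(x), 0.0, 1.0)
```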
- Rational Functions:
Calculus Volume 1, Section 7.4, covers this technique (integrating rational functions via partial fractions) in detail. If you are interested in learning more, I recommend reading through that section.
Numerical Integration
Numerical integration techniques are useful for approximating definite integrals for situations when the antiderivative of the function is difficult or impossible to find analytically.
- Trapezoidal Rule:
Although the trapezoidal rule is a very simple, "crude" method of numerically estimating integrals, even with its inefficiencies it can be useful, and it beats more sophisticated methods for certain problems, such as double exponential functions. The basic idea is to approximate the region under the curve of a function $f(x)$ over an interval $[a,b]$ by a trapezoid. Here is how the method works: we approximate the integral of $f(x)$ over $[a,b]$ with:

$$\int_a^b f(x)\,dx \approx (b-a)\,\frac{f(a) + f(b)}{2}$$
This can be further improved by dividing the interval $[a,b]$ into $n$ subintervals of equal width $h = \frac{b-a}{n}$, where $n$ is a positive integer. The points dividing the intervals are given by:

$$x_i = a + ih, \quad i = 0, 1, \ldots, n$$
And formally summarized as:

$$\int_a^b f(x)\,dx \approx \frac{h}{2}\left[f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)\right]$$
Assuming $K$ is the maximum possible absolute value of the second derivative of $f$ within the integration range: if $K$ is small (the function is nearly linear), the error is small, so the trapezoidal rule is very accurate; if $K$ is large (the function bends sharply), the error increases. We can estimate the error bound, i.e., how far off the trapezoidal approximation might be from the true integral, as:

$$|E| \le \frac{K\,(b-a)^3}{12 n^2}$$
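The composite rule and its error bound fit in a few lines of Python. The test integrand $\sin(x)$ on $[0, \pi]$ is an illustrative choice: its exact integral is 2, and its second derivative is bounded by $K = 1$, so the bound above can be checked directly.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals of width h = (b - a) / n."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))          # endpoints get weight 1/2
    for i in range(1, n):
        total += f(a + i * h)            # interior points get weight 1
    return h * total

a, b, n = 0.0, math.pi, 1000
approx = trapezoid(math.sin, a, b, n)    # exact value is 2

# Error bound |E| <= K (b - a)^3 / (12 n^2), with K = max|f''| = 1 for sin.
bound = (b - a) ** 3 / (12 * n**2)
```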
- Simpson's Rule:
The general idea of Simpson's rule is that, instead of needing many subintervals as in the trapezoidal rule, to integrate a function over an interval $[a,b]$ you evaluate $f(a)$, $f(m)$, and $f(b)$, where $m$ is the midpoint of $[a,b]$, and then find the quadratic curve that passes through these three points. The area under this quadratic is used to approximate the integral of $f(x)$ over $[a,b]$ by:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[f(a) + 4f(m) + f(b)\right]$$
We can further improve the approximation with $n$ subintervals ($n$ even) of width $h = \frac{b-a}{n}$, and the approximation becomes:

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right]$$
- Monte Carlo Integration:
Generally, the Monte Carlo method is a way of solving problems by random sampling. For numerical integration, consider our function $f(x)$ over an interval $[a,b]$, with the goal of approximating $\int_a^b f(x)\,dx$. If $X$ is uniformly distributed over $[a,b]$, its probability density is $p(x) = \frac{1}{b-a}$. Therefore, we can rewrite the integral as an expectation with the math trick of multiplying by $\frac{b-a}{b-a}$, which is 1:

$$\int_a^b f(x)\,dx = (b-a)\int_a^b f(x)\,\frac{1}{b-a}\,dx = (b-a)\,\mathbb{E}\left[f(X)\right]$$
We can then estimate this expectation using $n$ random samples $X_i$ drawn uniformly from the interval $[a,b]$:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n}\sum_{i=1}^{n} f(X_i)$$
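A minimal sketch of this estimator (the integrand $\sin(x)$ on $[0, \pi]$ and the sample size are illustrative choices). Because the error shrinks only like $1/\sqrt{n}$, the answer is far less accurate than the trapezoidal or Simpson results for comparable work:

```python
import math
import random

def mc_integrate(f, a, b, n, seed=0):
    """Plain Monte Carlo: (b - a) times the average of f at n uniform samples."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))    # X_i ~ Uniform(a, b)
    return (b - a) * total / n

approx = mc_integrate(math.sin, 0.0, math.pi, 100_000)   # exact value is 2
```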
The problem with the above is that if $f(X)$ has high variance under uniform sampling, the estimate will be poor. To reduce the variance, we can use importance sampling: choose a different probability distribution $q(x)$ that is more concentrated in the regions where $f(x)$ is large. The integral can then be rewritten as:

$$\int_a^b f(x)\,dx = \int_a^b \frac{f(x)}{q(x)}\,q(x)\,dx = \mathbb{E}_{X \sim q}\left[\frac{f(X)}{q(X)}\right]$$
We can then estimate this expectation using $n$ random samples $X_i$ drawn from the distribution $q$:

$$\int_a^b f(x)\,dx \approx \frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{q(X_i)}$$
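A minimal importance-sampling sketch (the integrand $x^2$ on $[0, 1]$ and the proposal $q(x) = 2x$ are illustrative choices, not from the text above): because the weighted estimator $f(X)/q(X) = X/2$ varies less than $f$ itself under uniform sampling, the estimate converges with noticeably lower variance.

```python
import math
import random

def importance_sample(n, seed=0):
    """Estimate the integral of x^2 over [0, 1] (true value 1/3)
    using the proposal density q(x) = 2x on (0, 1].
    Sampling: if U ~ Uniform(0, 1), then X = sqrt(U) has density q(x) = 2x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = math.sqrt(1.0 - rng.random())   # 1 - U keeps x strictly positive
        total += (x * x) / (2.0 * x)        # weight f(X) / q(X) = X / 2
    return total / n

approx = importance_sample(100_000)   # true value is 1/3
```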
Two other methods worth discussing later once we cover probability theory in the upcoming blogs are Gaussian Quadrature and Laplace approximation.
References:
- Herman, E., & Strang, G. (2016). Calculus Volume 1. OpenStax. Retrieved from https://openstax.org/details/books/calculus-volume-1
- Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press.
- Gentle, J. E. (2003). Random Number Generation and Monte Carlo Methods (2nd ed.). Springer.
For comments, please send me an email.