Linear algebra is a broad topic that one blog post cannot cover in its entirety. We will start with a review of matrix operations, including multiplication, vector spaces, and other related concepts, and then explore the aspects of linear algebra most relevant to machine learning. The point of this blog is not to present formal mathematics; there are plenty of books and resources out there that will do that for you if that is your interest.
Vector and Matrix Multiplication:
Some key properties of matrices worth remembering before looking at the multiplication operations are: non-commutativity, i.e. $AB \neq BA$ in general; associativity, i.e. $(AB)C = A(BC)$; and distributivity over matrix addition, i.e. $A(B+C) = AB + AC$.
Vector-Vector Multiplication:
Given $u, v \in \mathbb{R}^n$ with the same dimension, the inner product satisfies $u^\top v = v^\top u$, and both are guaranteed to result in the same scalar value.
Even when the vectors have different dimensions ($u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$), their product $uv^\top$ results in an $m \times n$ matrix, referred to as the outer product of the vectors.
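As a quick sketch of the two products above, using small made-up vectors (the values are illustrative, not from the text):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

inner = u @ v            # u^T v: a scalar, equal to v^T u
outer = np.outer(u, v)   # u v^T: a 3x3 matrix (m x n in general)

print(inner)        # 32.0
print(outer.shape)  # (3, 3)
```

Note that `np.outer` flattens its inputs, so it also works when `u` and `v` have different lengths.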
Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^n$, their product $Ax$ results in a new vector $b \in \mathbb{R}^m$. Each element $b_i$ is computed as the dot product of the $i$th row of $A$ (denoted $a_{i:}$) with the vector $x$. This operation is the foundation of a linear transformation:
$$b_i = a_{i:}^\top x = \sum_{j=1}^{n} A_{ij} x_j \tag{1}$$
We can further visualize this as a full transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$:
Given two matrices A∈Rm×n and B∈Rn×p, their product C=AB results in a new matrix C∈Rm×p. Each element Cij is computed as the dot product of the ith row of A and the jth column of B:
$$C_{ij} = a_{i:}^\top b_{:j} = \sum_{k=1}^{n} A_{ik} B_{kj} \tag{3}$$
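Equations (1) and (3) can be sketched directly as loops and checked against NumPy's `@` operator (the matrices here are small made-up examples):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # A in R^{3x2}
x = np.array([1., 1.])                          # x in R^2
B = np.array([[1., 0., 2.], [0., 1., 3.]])      # B in R^{2x3}

# b_i = sum_j A_ij x_j, written as an explicit loop
b_loop = np.array([sum(A[i, j] * x[j] for j in range(A.shape[1]))
                   for i in range(A.shape[0])])

# C_ij = sum_k A_ik B_kj
C = A @ B

print(b_loop)      # identical to A @ x
print(C.shape)     # (3, 3): (m x n)(n x p) -> (m x p)
```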
Kronecker Product:
Given matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, their Kronecker product $A \otimes B$ is an $mp \times nq$ matrix.
A closely related concept is vectorization. For a matrix $A \in \mathbb{R}^{m \times n}$, the vectorization of $A$ is the $mn \times 1$ column vector obtained by stacking the columns of $A$ on top of each other:
$$\mathrm{vec}(A) = (a_{11}, \ldots, a_{m1}, \ldots, a_{1n}, \ldots, a_{mn})^\top \tag{5}$$
Alternatively, the row vectorization of an $n \times m$ matrix $X$ is the $nm$-vector that flattens $X$ by concatenating its rows into a single vector:
This is the same as numpy's np.reshape(X, [-1]), and the relationship can be equivalently expressed in terms of a Kronecker product with basis vectors $e_i \in \mathbb{R}^{n \times 1}$:
$$\mathrm{rvec}(X) = \sum_{i=1}^{n} e_i \otimes X^\top e_i \tag{7}$$
Some of the common identities involving Kronecker products are summarized below:
Assuming we had a sequence of numbers indexed as 1,2,…,n, we could represent their sum as ∑k=1nXk, meaning we keep summing using our index from 1 to n. Also, if there was a second sequence Y, the dot product of X and Y would be:
$$\langle X, Y \rangle = \sum_{k=1}^{n} x_k y_k \tag{8}$$
Instead of writing the equation with the $\sum$ sign, Einstein notation is a simplification that drops the summation sign: given an expression, we implicitly sum over every repeated index that occurs in a pair (once in the upper index position and once in the lower index position).
With the summation sign removed implicitly, $x^i$ denotes a contravariant vector, whose components transform in the "opposite" way to the basis vectors, while $x_i$ denotes a covariant vector, whose components transform in the "same" way as the basis vectors. We use lowering ($x_i = g_{ij} x^j$) and raising ($x^i = g^{ij} x_j$) to convert a column vector into a row vector and vice versa.
Some of the common einsum operations for machine learning are summarized below:
| Operation | Summation Notation | Einstein Notation |
|---|---|---|
| Inner product | $\langle x, y \rangle = \sum_{i=1}^n x_i y_i$ | $x^i y_i$ |
| Bilinear form | $A(x,y) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} x_i y_j$ | $A_{ij} x^i y^j$ |
| Linear transformation | $(Ax)_i = \sum_{j=1}^n A_{ij} x_j$ | $(Ax)^i = A^i_{\ j} x^j$ |
| Matrix multiplication | $(AB)_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$ | $(AB)^i_{\ j} = A^i_{\ k} B^k_{\ j}$ |
| Trace of matrix | $\mathrm{tr}\,A = \sum_{i=1}^n A_{ii}$ | $A^i_{\ i}$ |
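Each row of the table maps directly onto an `np.einsum` subscript string; a minimal sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=4), rng.normal(size=4)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

inner = np.einsum("i,i->", x, y)         # <x, y>
bilin = np.einsum("ij,i,j->", A, x, y)   # x^T A y
Ax    = np.einsum("ij,j->i", A, x)       # linear transformation
AB    = np.einsum("ik,kj->ij", A, B)     # matrix multiplication
trace = np.einsum("ii->", A)             # trace

print(np.allclose(AB, A @ B))            # True
```

Repeated indices on the left of `->` are summed over, exactly as the Einstein convention prescribes.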
Vector Spaces
To understand the fundamentals of vector spaces and projection matrices, let's first review what linear independence, span, basis, dimension, and linear maps are.
Vectors $v_1, v_2, \ldots, v_n$ are linearly independent when $c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0$ only if all $c_i = 0$. If there are other values of $c_i$ that produce the zero vector, the vectors are said to be linearly dependent. In general, we can check whether the vectors are linearly independent by merging them as the columns of a matrix $A$. For example:
$$v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 7 \\ 8 \\ 9 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$$
We can check whether the columns of $A$ are linearly independent by solving the equation $Ac = 0$ using the Reduced Row Echelon Form (RREF), resulting in:
$$\mathrm{RREF}(A) = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$
The pivot positions (the leading 1s in each row) appear in columns 1 and 2, so those are the pivot columns. The third column does not contain a pivot, so it is a free column, meaning $v_3$ can be written as a linear combination of the pivot columns:
v3=−1v1+2v2(9)
Since there are only two pivot columns, the rank of A is 2, not 3. This means that rank(A)<n, confirming that the vectors are linearly dependent.
By contrast, for example, given the vectors:
$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
Here, each column contains a pivot, there are no free columns, and rank(A)=3. The null space contains only the zero vector c=0, meaning the columns of A are linearly independent.
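Both examples above can be verified numerically with `np.linalg.matrix_rank`, which is a practical stand-in for the RREF pivot count:

```python
import numpy as np

A_dep = np.array([[1., 4., 7.],
                  [2., 5., 8.],
                  [3., 6., 9.]])
A_ind = np.eye(3)

rank_dep = np.linalg.matrix_rank(A_dep)   # 2: columns are dependent
rank_ind = np.linalg.matrix_rank(A_ind)   # 3: columns are independent

# The free column satisfies v3 = -1*v1 + 2*v2, as the RREF showed
v1, v2, v3 = A_dep.T
print(np.allclose(v3, -1 * v1 + 2 * v2))  # True
```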
Span, Basis, and Dimension
I will use one of the examples of the book to explain what a span is:
Is $\begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}$ in $\mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}$?
Consider the above, can we find scalars x1 and x2 that make the equation consistent?
$$\begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \tag{10}$$
The system is consistent with $x_1 = 2$ and $x_2 = 1$, therefore:
$$\begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} \text{ is in } \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}$$
A basis is a set of vectors $v_1, v_2, \ldots, v_n$ that are linearly independent and span the space. Every basis of a given space contains the same number of vectors, and this number is called the dimension of the space.
Fundamental subspaces
Null Space
Given a matrix $A \in \mathbb{R}^{m \times n}$, the null space is the set of all vectors $v \in \mathbb{R}^n$ that $A$ maps to the zero vector, i.e. $Av = 0$.
Column Space
Given a matrix $A$, the column space of $A$ is the vector space spanned by the column vectors of $A$ -- the set of all possible linear combinations of the columns. That is, given $A \in \mathbb{R}^{m \times n}$, we can view it as a function that maps vectors from $\mathbb{R}^n$ to vectors in $\mathbb{R}^m$ via $Ax = b$. The interesting fact is that the column space is simply the range of the mapping $Ax$, meaning it is the set of all possible output vectors $b$.
Row Space
Given a matrix A∈Rm×n with m rows and n columns, its row space is the vector space spanned by the row vectors of A. If we denote the rows of A as r1,r2,…,rm∈Rn, therefore Row(A)=span{r1,r2,…,rm}. Meaning the row space consists of all possible linear combinations of the row vectors:
The dimension of the row space is the rank of the matrix, $\mathrm{rank}(A)$, which is the maximum number of linearly independent rows in $A$; also $\mathrm{Row}(A) = \mathrm{Col}(A^\top)$. For example, consider $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \in \mathbb{R}^{2 \times 3}$. The row vectors are $r_1 = [1, 2, 3]$ and $r_2 = [4, 5, 6]$, and the row space is $\mathrm{span}\{[1,2,3], [4,5,6]\} \subseteq \mathbb{R}^3$.
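The subspace dimensions for the example matrix above can be checked numerically; one common trick (an assumption of this sketch, not mentioned in the text) is to read a null-space basis off the SVD's trailing right singular vectors:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])          # 2x3 example from the text

r = np.linalg.matrix_rank(A)          # dim Row(A) = dim Col(A) = rank(A)

# Null space basis: right singular vectors for the zero singular values
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[r:].T                 # shape (3, n - r)

print(r)                              # 2
print(np.allclose(A @ null_basis, 0)) # True: A maps the basis to zero
```

This also illustrates the rank-nullity relation: $\dim \mathrm{Null}(A) = n - \mathrm{rank}(A) = 1$ here.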
Matrix Inverses
Generally, given two numbers $x = 5$ and $y = 1/5$, multiplying them results in $xy = 1$. In this case, we call $y$ the multiplicative inverse of $x$. While division plays an important role in scalar arithmetic, there is no such operation as matrix division. Instead, the matrix inverse serves as its counterpart. For a small square matrix, such as a $2 \times 2$ matrix $A$, finding the inverse is straightforward:
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{12}$$
Its inverse, denoted $A^{-1}$, is calculated as:
$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \quad \text{or} \quad \frac{1}{|A|} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{13}$$
There are several relationships between a matrix $A$ and its inverse $A^{-1}$. The identity property states that the product of a square matrix and its inverse always results in the identity matrix $I$, such that $A^{-1}A = I$ and $AA^{-1} = I$. Other relationships are summarized below:
Deciding when to use matrix inversion is generally a tricky question; however, in machine learning, because we take derivatives of inverses and determinants during optimization, inversion is common. While high-speed inversion algorithms exist, the general guideline is to avoid direct inversion whenever possible. To solve a linear system of the form $Ax = b$ for a square matrix $A$, matrix factorization methods are preferred because they are more numerically stable and accurate than computing the inverse. However, when one has to solve $x = A^{-1}b$ across many different vectors $b$, precomputing and storing $A^{-1}$ can become computationally efficient.
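The guideline above can be sketched with a small, deliberately well-conditioned example matrix (made up for illustration): both routes agree here, but `solve` is the numerically safer habit.

```python
import numpy as np

A = np.array([[4., 1., 0., 0.],
              [1., 3., 1., 0.],
              [0., 1., 2., 1.],
              [0., 0., 1., 3.]])
b = np.array([1., 2., 3., 4.])

x_solve = np.linalg.solve(A, b)    # preferred: factorization-based
x_inv = np.linalg.inv(A) @ b       # explicit inverse: avoid when possible

print(np.allclose(x_solve, x_inv)) # True for a well-conditioned A
```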
Some Special Matrices
Diagonal Matrix
This is a matrix for which all the non-diagonal elements are 0. It is usually denoted D=diag(d1,d2,…,dn) or in matrix notation as:
$$D = \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix} \tag{14}$$
We could also rewrite the above as d=diag(D) by just extracting the diagonal vector. One special example of a diagonal matrix is the identity matrix that is usually denoted I∈Rn×n.
Symmetric and Skew-Symmetric Matrices
First, the concept of symmetry only applies to square matrices. Matrix A is said to be symmetric if and only if its transpose equals the original matrix A, that is:
A=AT(15)
Meaning given A symmetric, the element at row i, column j must equal the element at row j, column i (aij=aji).
In practice, we check symmetry by comparing a matrix with its transpose: if $A = A^\top$ holds entrywise, we can conclude that the square matrix $A$ is symmetric.
A simple tweak: when $A^\top = -A$, we say that matrix $A$ is skew-symmetric. In a skew-symmetric matrix, the main diagonal elements must be 0, and $a_{ij} = -a_{ji}$.
Orthogonal Matrix
This is a square matrix whose columns (and also rows) are orthonormal vectors. The vectors $q_1, \ldots, q_n$, taken as the columns of $Q = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}$, form an orthonormal basis if and only if each vector is a unit vector (length 1) and the vectors are mutually orthogonal, hence linearly independent. Therefore, given a square matrix $Q \in \mathbb{R}^{n \times n}$, the matrix is orthogonal if its transpose is equal to its inverse:
QT=Q−1(16)
Additionally, orthogonal matrices satisfy $Q^\top Q = QQ^\top = I$, where $Q^\top Q = I$ implies that the columns of $Q$ are orthonormal and $QQ^\top = I$ implies that the rows of $Q$ are also orthonormal.
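A rotation matrix is a simple concrete orthogonal matrix on which all three properties can be checked (the angle is arbitrary):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))      # columns are orthonormal
print(np.allclose(Q @ Q.T, np.eye(2)))      # rows are orthonormal
print(np.allclose(Q.T, np.linalg.inv(Q)))   # Q^T = Q^{-1}
```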
4. Positive Definite Matrices
4.1 Quadratic Forms and Eigenvalues
Given a quadratic form like $ax_1^2 + bx_1x_2 + cx_2^2$, we can write its associated symmetric matrix, where the squared terms form the diagonal and the cross-term coefficient is divided by 2:
$$\begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix} \tag{17}$$
The same rule applies to more complex quadratic forms. For $ax_1^2 + bx_2^2 - cx_3^2 + dx_1x_2 - fx_1x_3 + gx_2x_3$, the associated symmetric matrix is constructed by placing the coefficients of the squared terms ($x_1^2, x_2^2, x_3^2$) on the main diagonal and splitting the cross-term coefficients across the corresponding symmetric positions:
$$\begin{bmatrix} a & d/2 & -f/2 \\ d/2 & b & g/2 \\ -f/2 & g/2 & -c \end{bmatrix} \tag{18}$$
So, generally, the rules for converting any quadratic form to a symmetric matrix $A$ are: the diagonal elements ($a_{ii}$) are the coefficients of the squared terms $x_i^2$; the off-diagonal elements ($a_{ij}$) are half of the coefficient of the $x_i x_j$ term, to ensure the matrix remains symmetric ($a_{ij} = a_{ji}$).
Therefore, we can formally define the relationship between the quadratic form Q(x) and symmetric matrix A as:
Q(x)=xTAx(19)
We can prove this is correct by rewriting xTAx for our first example of the quadratic form ax12+bx1x2+cx22.
Given vector $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and symmetric matrix $A = \begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}$:
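The identity $Q(x) = x^\top A x$ can be checked numerically with made-up coefficients for $a$, $b$, $c$ and a random point $x$:

```python
import numpy as np

# Hypothetical coefficients for a*x1^2 + b*x1*x2 + c*x2^2
a, b, c = 2.0, 3.0, 4.0
A = np.array([[a, b / 2],
              [b / 2, c]])

rng = np.random.default_rng(2)
x = rng.normal(size=2)

quad_direct = a * x[0]**2 + b * x[0] * x[1] + c * x[1]**2
quad_matrix = x @ A @ x   # x^T A x

print(np.allclose(quad_direct, quad_matrix))  # True
```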
Another interesting property of the calculated eigenvalues, which is useful in machine learning, is that we can find a basis for the eigenspace associated with each eigenvalue. Given the previously calculated eigenvalues $\lambda_1 = 5 + 2\sqrt{2}$ and $\lambda_2 = 5 - 2\sqrt{2}$, we find the basis vectors by solving the homogeneous system $(A - \lambda I)v = 0$.
With the same approach, $\lambda_2 = 5 - 2\sqrt{2}$ results in:
$$v_2 = \begin{bmatrix} 1 \\ \sqrt{2} - 1 \end{bmatrix}$$
Because matrix $A$ is symmetric, these basis vectors are guaranteed to be orthogonal to each other.
4.2 Positive Definite
A symmetric matrix $A \in \mathbb{S}^n$ is positive definite iff $x^\top A x > 0$ for all non-zero vectors $x \in \mathbb{R}^n$; it is positive semidefinite (psd) if $x^\top A x \geq 0$ for all $x$. A symmetric matrix $A \in \mathbb{S}^n$ is negative definite iff $x^\top A x < 0$ for all non-zero $x \in \mathbb{R}^n$; it is negative semidefinite if $x^\top A x \leq 0$ for all $x$. Finally, a symmetric matrix $A \in \mathbb{S}^n$ is indefinite if there exist $x_1, x_2 \in \mathbb{R}^n$ such that $x_1^\top A x_1 > 0$ and $x_2^\top A x_2 < 0$, as summarized below:
Selecting a matrix factorization method for a given problem is mainly dictated by the dimensions and properties of the matrix. Matrices are usually categorized into square and rectangular matrices. For square matrices that are invertible and well-conditioned, methods such as Gauss-Jordan elimination or LU decomposition are preferred. However, a square invertible matrix may be ill-conditioned, meaning that small changes in the input data result in disproportionately large changes in the output. In these instances, Singular Value Decomposition (SVD) or regularization techniques, such as Tikhonov regularization, are used to stabilize the solution. For square singular (non-invertible) matrices, as well as rectangular matrices, the standard approach is the Moore-Penrose pseudoinverse ($A^+$). This is particularly useful for overdetermined systems, where the matrix has more rows than columns ($m > n$), and underdetermined systems, where the matrix has more columns than rows ($n > m$). In these cases, the pseudoinverse provides the best-fit solution in a least-squares sense, calculated as $x = A^+ b$. For an overdetermined system, this often involves the normal equations, where $x = (A^\top A)^{-1} A^\top b$.
With that in mind, we shall now cover the key matrix factorization methods:
LU Decomposition
Given a square matrix $A$, we want to rewrite it as $A = LU$, where $L \in \mathbb{R}^{n \times n}$ is a lower triangular matrix and $U \in \mathbb{R}^{n \times n}$ is an upper triangular matrix.
We follow an elimination process that transforms the original matrix into an upper triangular form while recording the operational history in a lower triangular matrix. For example given:
$$A = \begin{bmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{bmatrix}$$
First, we focus on the first column and use the pivot element A11=2 to eliminate the entries in the rows below it. We first calculate the multipliers by dividing the leading element of each target row by the pivot, resulting in l21=4/2=2 for the second row and l31=8/2=4 for the third row. We use multipliers to construct the L matrix later. We then execute the row operations by subtracting two times the first row from the second row (R2←R2−2R1) and four times the first row from the third row (R3←R3−4R1). This results in the first intermediate matrix, where the first column below the diagonal consists of zeros:
$$A^{(1)} = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{bmatrix}$$
We move to the second column and use the new pivot at $A_{22} = 1$. To eliminate the value below this pivot in the third row, we compute the multiplier $l_{32} = 3/1 = 3$. We then perform the row operation $R_3 \leftarrow R_3 - 3R_2$, subtracting three times the current second row from the third row. This results in $5 - (3 \times 1) = 2$ for the final element, completing the transformation into the upper triangular matrix $U$:
$$U = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix}$$
We then construct the lower triangular matrix $L$, which stores the record of the elimination steps. We begin with an identity matrix and populate the positions below the main diagonal using the multipliers from the previous steps. That is, we place $l_{21} = 2$ and $l_{31} = 4$ in the first column and $l_{32} = 3$ in the second column, resulting in the lower triangular matrix:
$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{bmatrix}$$
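The elimination walkthrough above can be sketched as a small Doolittle-style LU routine without pivoting (a teaching sketch assuming nonzero pivots, not what production libraries do), reproducing the $L$ and $U$ we just derived:

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU without pivoting; assumes all pivots are nonzero."""
    A = A.astype(float)
    n = A.shape[0]
    L, U = np.eye(n), A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]   # record the multiplier
            U[i] -= L[i, k] * U[k]        # eliminate below the pivot
    return L, U

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
L, U = lu_no_pivot(A)
print(L)  # [[1,0,0],[2,1,0],[4,3,1]]
print(U)  # [[2,1,1],[0,1,1],[0,0,2]]
```

Production routines (e.g. `scipy.linalg.lu`) add row pivoting for numerical stability, returning $PA = LU$.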
QR Decomposition
For QR decomposition, our objective is to factorize a rectangular matrix $A \in \mathbb{R}^{m \times n}$ or a square matrix $A \in \mathbb{R}^{n \times n}$ into the product of an orthogonal matrix $Q$ and an upper triangular matrix $R$. As a reminder, a square matrix $B$ is orthogonal if $B^\top B = BB^\top = I$, which implies $B^{-1} = B^\top$. The columns (and also the rows) of an orthogonal matrix are orthonormal: mutually orthogonal with a 2-norm (Euclidean length) of 1, as in the example below:
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}, \quad a^2 + b^2 = 1 \tag{21}$$
While the Gram-Schmidt process is often used in math textbooks to introduce the concept of orthogonalization, it is numerically unstable because it quickly degrades due to rounding errors. NumPy implements QR factorization using Householder reflections, since each reflection preserves the Euclidean norm of the matrix columns, and a "sign choice" avoids numerical cancellation.
The general idea is to iteratively apply these reflections to smaller and smaller submatrices until only the upper triangular form remains. First, we isolate the first column of $A$, denoted $x$, and our goal is to find a reflection that maps $x$ to a multiple of the first unit vector $e_1$, thus zeroing out all entries below the first row. We compute the Householder vector $v_1 = x + \operatorname{sign}(x_1)\lVert x \rVert_2 \, e_1$. Using this vector, we construct the first Householder matrix $H_1 = I - 2\dfrac{v_1 v_1^\top}{v_1^\top v_1}$.
Multiplying the original matrix by this reflection, the first column of the resulting matrix H1A becomes:
$$H_1 A = \begin{bmatrix} -\operatorname{sign}(x_1)\lVert x \rVert & \ast & \cdots & \ast \\ 0 & \ast & \cdots & \ast \\ \vdots & \vdots & & \vdots \\ 0 & \ast & \cdots & \ast \end{bmatrix} \tag{22}$$
We repeat the process on the submatrix formed by ignoring the first row and first column of the transformed matrix, resulting in an $(m-1) \times (n-1)$ submatrix. We calculate a new Householder vector $v_2$ and a corresponding reflection matrix $\hat{H}_2$. To apply this to the full matrix, we embed it into a larger identity structure $H_2 = \operatorname{diag}(1, \hat{H}_2)$, which ensures that the second reflection zeros out the elements below the diagonal in the second column while leaving the zeros created in the first column intact.
This process continues for all columns until the matrix is fully reduced to an upper triangular form R. The final matrix R is the result of applying the sequence of all Householder reflections to the original matrix, such that R=Hn…H2H1A.
Because Householder matrices are orthogonal and symmetric (Hk=Hk⊤=Hk−1), we can solve for Q by multiplying the reflections in reverse order. The orthogonal matrix Q is thus defined as Q=H1H2…Hn. This guarantees that Q⊤Q=I and that the final product QR reconstructs the original matrix A.
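The whole procedure can be sketched in a few lines of NumPy. This is a teaching sketch of Householder QR (it assumes the leading entry of each working column is nonzero so the sign choice is well defined), not the LAPACK implementation NumPy actually calls:

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections; assumes nonzero leading entries."""
    A = A.astype(float)
    m, n = A.shape
    Q, R = np.eye(m), A.copy()
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.sign(x[0]) * np.linalg.norm(x)  # sign choice avoids cancellation
        v = v / np.linalg.norm(v)
        H_k = np.eye(m)
        H_k[k:, k:] -= 2.0 * np.outer(v, v)        # embed H_hat in the identity
        R = H_k @ R
        Q = Q @ H_k                                # H_k is symmetric orthogonal
    return Q, R

A = np.array([[2., 1.], [1., 3.], [0., 1.]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A))   # True
```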
Eigenvalues and Eigenvectors
Given a square matrix $A$, our goal is to find directions in which the linear transformation acts as simple scaling without a change of direction; that is, $Av = \lambda v$. Here $v$ is the eigenvector (also referred to as a characteristic or latent vector), the direction that remains invariant under the transformation, and the scalar $\lambda$ is the eigenvalue, which determines the factor by which that vector is scaled. To solve for these values, we rearrange the equation into a homogeneous linear system:
(A−λI)v=0(22)
For a non-trivial solution $v \neq 0$, the matrix $(A - \lambda I)$ must be singular; that is, its determinant is 0, resulting in the characteristic equation:
det(A−λI)=0(23)
Solving this polynomial equation provides the eigenvalues. Once these are known, each corresponding eigenvector is obtained by solving the resulting linear system for $v$.
Consider matrix A:
$$A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$$
First, we subtract λ from the main diagonal elements of A:
$$A - \lambda I = \begin{bmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{bmatrix}$$
We follow by calculating the determinant of this matrix:
det(A−λI)=(4−λ)(3−λ)−(1)(2)
This results in the characteristic polynomial:
$$(12 - 4\lambda - 3\lambda + \lambda^2) - 2 = 0 \quad\Longrightarrow\quad \lambda^2 - 7\lambda + 10 = 0$$
We then factor by looking for two numbers that multiply to 10 and add to −7:
(λ−5)(λ−2)=0
Setting each factor to zero, we find the eigenvalues for the matrix A:
λ1=5,λ2=2
We can verify this with the trace and determinant checks: the trace equals the sum of the eigenvalues ($4 + 3 = 7$ and $\lambda_1 + \lambda_2 = 5 + 2 = 7$), and the determinant equals their product ($(4)(3) - (1)(2) = 10$ and $\lambda_1 \lambda_2 = 5 \times 2 = 10$).
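The same example, verified with `np.linalg.eig` and the trace/determinant checks:

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

eigvals, eigvecs = np.linalg.eig(A)

print(sorted(eigvals))                               # [2.0, 5.0]
print(np.isclose(eigvals.sum(), np.trace(A)))        # True: sum = trace = 7
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True: product = det = 10
```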
Eigen Decomposition
Eigendecomposition (also known as eigenvalue decomposition, spectral decomposition, or diagonalization) of a square matrix is a factorization of the form $A = Q\Lambda Q^{-1}$, which gives $A^n = Q\Lambda^n Q^{-1}$. Here $Q^{-1}$ applies a change of basis, $\Lambda$ is a scaling operation, and $Q$ undoes the initial change of basis. The matrix $Q$ must be invertible to allow for the change of basis, and $\Lambda$ is a diagonal matrix $\operatorname{diag}(\lambda)$ whose diagonal entries are the eigenvalues. Unless $A$ is symmetric, the eigenvectors in $Q$ are not required to be orthogonal.
Recall the eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 2$ of our previous example matrix $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$; let's use them and their corresponding eigenvectors to calculate $A^{10}$. We first solve for the eigenvectors by substituting the eigenvalues back into $(A - \lambda I)v = 0$. For $\lambda_1 = 5$, this results in $\begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix} v_1 = 0$, which yields $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. For $\lambda_2 = 2$, $\begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix} v_2 = 0$ yields $v_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$. Therefore the matrices $Q$ and $\Lambda$ are:
$$Q = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}$$
Applying the power to the diagonal elements of Λ would result in:
$$A^{10} = Q \begin{bmatrix} 5^{10} & 0 \\ 0 & 2^{10} \end{bmatrix} Q^{-1}$$
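We can check this computation against repeated multiplication with `np.linalg.matrix_power`:

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])
Q = np.array([[1., 1.],
              [1., -2.]])
Lam = np.diag([5.0, 2.0])

# A^10 via the eigendecomposition: only the diagonal gets powered
A10 = Q @ np.diag([5.0**10, 2.0**10]) @ np.linalg.inv(Q)

print(np.allclose(Q @ Lam @ np.linalg.inv(Q), A))        # True: A = Q Lam Q^-1
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True
```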
Singular Value Decomposition (SVD)
Let's first state the Wikipedia definition of SVD: it is a factorization of an $m \times n$ complex matrix $M$ into the form $M = U\Sigma V^\ast$, or $M = U\Sigma V^\top$ if the matrix consists of real numbers. This definition generalizes eigendecomposition to any matrix, regardless of its shape or properties. A matrix is unitary if its conjugate transpose is also its inverse, satisfying $U^\ast U = I$. The matrix $U$ is an $m \times m$ unitary matrix whose columns, the left singular vectors, form an orthonormal basis of the output space. $V$ is an $n \times n$ unitary matrix whose columns (equivalently, the rows of $V^\top$) are the right singular vectors, forming an orthonormal basis of the input space. $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix containing the non-negative singular values, ordered as $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$.
The numpy decomposition is performed using a LAPACK routine, which I don't fully understand; if you are interested in doing it on paper, the power method is much simpler and is the basis for most of the advanced SVD algorithms.
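A minimal sketch of the power method for the leading singular triplet, iterating on $M^\top M$ (the example matrix and iteration count are arbitrary choices, and this recovers only the top triplet, not the full SVD):

```python
import numpy as np

def top_singular_triplet(M, iters=500, seed=0):
    """Power iteration on M^T M to estimate the leading singular triplet."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = M.T @ (M @ v)          # one step of power iteration on M^T M
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(M @ v)  # leading singular value
    u = M @ v / sigma              # leading left singular vector
    return u, sigma, v

M = np.array([[3., 1.], [1., 3.], [0., 2.]])
u, sigma, v = top_singular_triplet(M)
print(np.isclose(sigma, np.linalg.svd(M, compute_uv=False)[0]))  # True
```

Subsequent triplets can be found by deflation, i.e. repeating the iteration on $M - \sigma u v^\top$.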
Cholesky Decomposition
Given a symmetric positive definite matrix $A$, the Cholesky decomposition, or "matrix square root," is a factorization $A = R^\top R$, where $R$ is an upper triangular matrix. This can also be written as $A = LL^\top$, where $L = R^\top$ is lower triangular. It is commonly used when working with the multivariate Gaussian.
We start by iteratively solving for the lower triangular matrix $L$. The first diagonal entry is the square root of the first diagonal element of $A$, $L_{11} = \sqrt{3.3821} \approx 1.8390$, and the remaining entries in this column are found by dividing the corresponding elements of $A$ by $L_{11}$: $L_{21} = 0.8784 / 1.8390 \approx 0.4776$, $L_{31} = 0.3613 / 1.8390 \approx 0.1965$, and $L_{41} = -2.0349 / 1.8390 \approx -1.1065$.
For the second column, we account for the contribution of the first column before solving for the new diagonal: $L_{22} = \sqrt{2.0068 - (0.4776)^2} \approx 1.3337$. The off-diagonal elements are then $L_{32} = (0.5587 - (0.1965)(0.4776)) / 1.3337 \approx 0.3486$ and $L_{42} = (0.1169 - (-1.1065)(0.4776)) / 1.3337 \approx 0.4839$.
In the third column, we continue the subtraction process to find $L_{33} = \sqrt{3.6656 - (0.1965^2 + 0.3486^2)} \approx 1.8722$ and the subsequent value $L_{43} = (0.7807 - ((-1.1065)(0.1965) + (0.4839)(0.3486))) / 1.8722 \approx 0.4441$. The final element is derived by subtracting the squares of all preceding elements in the last row of $L$ from the last diagonal of $A$: $L_{44} = \sqrt{2.5397 - ((-1.1065)^2 + 0.4839^2 + 0.4441^2)} \approx 0.9413$.
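Assembling the entries used in the walkthrough above into the full symmetric matrix, we can check the hand computation against `np.linalg.cholesky`:

```python
import numpy as np

# Symmetric positive definite matrix assembled from the entries above
A = np.array([[ 3.3821, 0.8784, 0.3613, -2.0349],
              [ 0.8784, 2.0068, 0.5587,  0.1169],
              [ 0.3613, 0.5587, 3.6656,  0.7807],
              [-2.0349, 0.1169, 0.7807,  2.5397]])

L = np.linalg.cholesky(A)       # lower triangular factor

print(np.isclose(L[0, 0], np.sqrt(3.3821)))  # True: matches L11 above
print(np.allclose(L @ L.T, A))               # True: L L^T reconstructs A
```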