Linear Algebra and Calculus

The Nature of Mathematical Modeling (draft)

$ \def\%#1{\mathbf{#1}} \def\mat#1{\mathbf{#1}} \def\*#1{\vec{#1}} \def\ve#1{\vec{#1}} \def\ds#1#2{\cfrac{d #1}{d #2}} \def\dd#1#2{\cfrac{d^2 #1}{d #2^2}} \def\ps#1#2{\cfrac{\partial #1}{\partial #2}} \def\pp#1#2{\cfrac{\p^2 #1}{\p #2^2}} \def\p{\partial} \def\ba{\begin{eqnarray}} \def\ea{\end{eqnarray}} \def\eps{\epsilon} \def\del{\nabla} $

Background

  • Assumed for the rest of the text

Variables

integers

  • the set $\mathbb{Z}$
  • $\ldots,-1,0,1,\ldots$

real numbers

  • the set $\mathbb{R}$
  • 1.0, 1/3, $\pi,\ldots$

complex numbers

  • the set $\mathbb{C}$
  • $z = x+iy$ = real + imaginary parts
  • $i^2=-1$
  • complex conjugate $z^*=x-iy$
  • squared magnitude $|z|^2 = zz^*=x^2+y^2$
  • Euler's formula:
import sympy as sp
from sympy import I
from IPython.display import display,Math
theta = sp.symbols('theta',real=True)
z = sp.exp(I*theta)
display(Math(f"e^{{i\\theta}}={sp.latex(sp.expand_complex(z))}"))
$\displaystyle e^{i\theta}=i \sin{\left(\theta \right)} + \cos{\left(\theta \right)}$

quaternions

  • generalization of complex numbers
  • $q = a+bi+cj+dk$
  • $i^2=j^2=k^2=ijk=-1$
  • useful to represent 3D rotations
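A minimal sketch of quaternion arithmetic with a hand-written Hamilton product (not a library routine); the point and rotation are arbitrary example choices:
import numpy as np
def qmul(p,q):
    # Hamilton product of quaternions stored as (a,b,c,d) = a+bi+cj+dk
    a1,b1,c1,d1 = p
    a2,b2,c2,d2 = q
    return np.array([a1*a2-b1*b2-c1*c2-d1*d2,
                     a1*b2+b1*a2+c1*d2-d1*c2,
                     a1*c2-b1*d2+c1*a2+d1*b2,
                     a1*d2+b1*c2-c1*b2+d1*a2])
i = np.array([0,1,0,0]); j = np.array([0,0,1,0]); k = np.array([0,0,0,1])
print(f"i^2 = {qmul(i,i)}, ijk = {qmul(qmul(i,j),k)}")
# rotate the point (1,0,0) by 90 degrees about the z axis with q p q*,
# where q = cos(theta/2)+sin(theta/2)k and p holds the point as a pure quaternion
theta = np.pi/2
q = np.array([np.cos(theta/2),0,0,np.sin(theta/2)])
qconj = q*np.array([1,-1,-1,-1])
p = np.array([0,1,0,0])
print(f"rotated point: {qmul(qmul(q,p),qconj)[1:]}")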

scalar

  • 0D, e.g. position $x$

vector

  • 1D, e.g. 3D point $\*p=(x,y,z)$
  • transpose $\*p^T=\begin{pmatrix}x\\y\\z\end{pmatrix}$

matrix

  • 2D, e.g. collection of points $\%M= \begin{bmatrix} x_1 & x_2 & x_3\\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{bmatrix} $
  • transpose $\%M^T = \begin{bmatrix} x_1 & y_1 & z_1\\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{bmatrix} $

  • identity matrix $\%I = \begin{pmatrix} 1 & 0 & \cdots \\ 0 & 1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} $

tensor

  • $>$2D, generalizes matrices to higher dimensions

Polynomials

  • univariate
$$P(x) = c_0+c_1x+c_2x^2+\ldots$$
  • multivariate
$$P(x,y) = c_{00}+c_{01}x+c_{10}y+c_{11}xy+c_{02}x^2+c_{20}y^2+\ldots$$
  • polynomials will appear throughout this book
  • an important task will be finding the roots where $P(x)=0$
  • roots can be repeated, which is counted by their multiplicity

routines

  • numerical: NumPy roots
import numpy as np
c0 = 2
c1 = 10
c2 = 3
roots = np.roots([c2,c1,c0])
print(f"roots of P(x) = {c2}x^2+{c1}x+{c0}: {roots}")
def P(x):
    return c0+c1*x+c2*x**2
print(f"P(roots): {[P(root) for root in roots]}")
roots of P(x) = 3x^2+10x+2: [-3.12 -0.21]
P(roots): [np.float64(-3.552713678800501e-15), np.float64(8.326672684688674e-17)]
from IPython.display import display,Math
import sympy as sp
from sympy.abc import a,b,c,x
f = a+b*x+c*x**2
display(Math(f"\\mathrm{{polynomial:}}~~{sp.latex(f)}"))
result = sp.solve(f,x)
display(Math(f"\\mathrm{{roots:}}~~{sp.latex(result)}"))
values = {a:2,b:10,c:3}
roots = [root.subs(values) for root in result]
evals = [root.evalf() for root in roots]
display(Math(f"\\mathrm{{roots~of~}}{sp.latex(f.subs(values))}:~{evals}"))
$\displaystyle \mathrm{polynomial:}~~a + b x + c x^{2}$ $\displaystyle \mathrm{roots:}~~\left[ \frac{- b - \sqrt{- 4 a c + b^{2}}}{2 c}, \ \frac{- b + \sqrt{- 4 a c + b^{2}}}{2 c}\right]$ $\displaystyle \mathrm{roots~of~}3 x^{2} + 10 x + 2:~[-3.11963298118022, -0.213700352153109]$

Norms

L2

$$|\*a| = |\*a|_2 = \sqrt{\sum_i a_i^2}$$
  • Equals the familiar Euclidean length (the distance from the origin)

L1

$$|\*a|_1 = \sum_i |a_i|$$
  • Also called the Manhattan norm, measuring distance as traveled along city blocks
  • Will be important in machine learning

unit vector

$$\hat{a} = \dfrac{\*a}{|\*a|}$$
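A minimal numerical check of these norms (the vector is an arbitrary example):
import numpy as np
a = np.array([3.0,4.0])
print(f"L2 norm: {np.linalg.norm(a)}")        # sqrt(3^2+4^2) = 5
print(f"L1 norm: {np.linalg.norm(a,ord=1)}")  # |3|+|4| = 7
print(f"unit vector: {a/np.linalg.norm(a)}")  # has magnitude 1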

Determinant

  • Equal to:

    • the (signed) volume of the parallelepiped formed by the rows or columns of a square matrix
    • the factor by which volumes are scaled by the linear transformation that the matrix defines
  • 2D

$$ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix}= \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad-bc$$
  • 3D
$$ \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}= \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a(ei-fh)-b(di-fg)+c(dh-eg) $$
  • D dimensions
$$ \det(\mathbf{M}) = \sum_{i_1,\ldots,i_D} \epsilon_{i_1\ldots i_D} m_{1,i_1}\cdots m_{D,i_D} $$
  • $\epsilon_{i_1\ldots i_D}$ is the Levi-Civita symbol, equal to 1 for even permutations of the indices, -1 for odd permutations, and 0 if any index is repeated
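A short numerical sketch with NumPy's det (the matrices are arbitrary examples):
import numpy as np
M = np.array([[1,2],[3,4]])
print(f"det: {np.linalg.det(M)}")   # ad-bc = 1*4-2*3 = -2
S = np.diag([2,3])                  # scales x by 2 and y by 3
print(f"area scaling factor: {np.linalg.det(S)}")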

Products

dot (inner)

$$ \begin{eqnarray} \*a\cdot\*b &=& \sum_i a_i b_i \\ &=& |\*a||\*b|\cos\theta \\ &=& \*a^T\*b \end{eqnarray} $$
  • Measures the overlap between vectors, where $\theta$ is the angle between them
  • NumPy notation: a@b

cross

$$ \begin{eqnarray} \vec{a}\times\vec{b} &=& \begin{vmatrix} \hat{x} & \hat{y} & \hat{z} \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix} \\ &=& |\vec{a}||\vec{b}|\sin(\theta)\hat{n} \end{eqnarray} $$
  • Its magnitude equals the area of the parallelogram bounded by the vectors, where $\theta$ is the angle between them
  • $\hat{n}$ is a perpendicular unit vector; if your right-hand fingers curl from $\vec{a}$ to $\vec{b}$ your thumb points in the direction of $\hat{n}$

  • NumPy routine: cross

outer

$$ \begin{eqnarray} (\*a\otimes \*b)_{ij} &=& a_i b_j \\ \*a\otimes \*b &=& \*a\*b^T \end{eqnarray} $$
  • NumPy routine: outer

matrix

$$ \begin{eqnarray} \%C &=& \%A\%B \\ c_{ij} &=& \sum_k a_{ik}b_{kj} \end{eqnarray} $$
  • NumPy notation: A@B
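A minimal numerical sketch of these products (the vectors and matrices are arbitrary examples):
import numpy as np
a = np.array([1,0,0])
b = np.array([0,1,0])
print(f"dot: {a@b}")               # orthogonal vectors have zero overlap
print(f"cross: {np.cross(a,b)}")   # perpendicular to both, here the z axis
print(f"outer:\n{np.outer(a,b)}")  # a b^T
A = np.array([[1,2],[3,4]])
B = np.array([[5,6],[7,8]])
print(f"matrix product:\n{A@B}")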

Inverse

$$\%M~\%M^{-1} = \%I$$
  • Limited to square matrices
  • Can be used to solve a linear system of equations
$$ \begin{eqnarray} \%M~\*a &=& \*b \\ \*a &=& \%M^{-1}\*b \end{eqnarray} $$
  • The determinant of the inverse of a matrix is equal to the inverse of its determinant. If the determinant vanishes then the matrix cannot be inverted, which corresponds to a transformation that reduces the dimension of the vector space.

algorithm

  • Can be found by Gaussian elimination (row reduction)
  • Is computationally expensive, requiring $O(N^3)$ operations
  • Can be simplified for sparse matrices, which frequently occur in numerical methods

routines

  • numerical: NumPy inv
import numpy as np
np.set_printoptions(suppress=True,precision=2)
M = np.array([[1,2],[3,4]])
print(f"M:\n{M}")
Minv = np.linalg.inv(M)
print(f"M inverse:\n{Minv}")
print(f"product:\n{M@Minv}")
M:
[[1 2]
 [3 4]]
M inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
product:
[[1. 0.]
 [0. 1.]]
  • symbolic: SymPy inv
import sympy as sp
from sympy.abc import a,b,c,d
from IPython.display import display,Math
M = sp.Matrix([[a,b],[c,d]])
Minv = M.inv()
display(Math(f"\\%M={sp.latex(M)}"))
display(Math(f"\\%M^{{-1}}={sp.latex(Minv)}"))
display(Math(f"\\mathrm{{\\%M\\%M^{{-1}}=}}{sp.latex(M*Minv)}"))
display(Math(f"\\mathrm{{simplify:}}{sp.latex(sp.simplify(M*Minv))}"))
$\displaystyle \%M=\left[\begin{matrix}a & b\\c & d\end{matrix}\right]$ $\displaystyle \%M^{-1}=\left[\begin{matrix}\frac{d}{a d - b c} & - \frac{b}{a d - b c}\\- \frac{c}{a d - b c} & \frac{a}{a d - b c}\end{matrix}\right]$ $\displaystyle \mathrm{\%M\%M^{-1}=}\left[\begin{matrix}\frac{a d}{a d - b c} - \frac{b c}{a d - b c} & 0\\0 & \frac{a d}{a d - b c} - \frac{b c}{a d - b c}\end{matrix}\right]$ $\displaystyle \mathrm{simplify:}\left[\begin{matrix}1 & 0\\0 & 1\end{matrix}\right]$

Eigenvectors, Values

  • eigen in German means own, characteristic, proper
  • Named by [Hilbert:1904]

definition

  • An eigenvector $\*v$ of a matrix $\%M$, when multiplied by the matrix, is equal to itself times an eigenvalue $\lambda$:
$$\%M\*v = \lambda \*v$$

interpretation

  • Multiplying a vector by a matrix is a linear transformation; the transformation of an eigenvector can change its scale, but does not rotate or shear it

algorithm

$$ \begin{eqnarray} \%M\*v &=& \lambda\*v \\ \%M\*v &=& \lambda\%I\*v \\ (\%M-\lambda\%I)\*v &=& 0 \end{eqnarray} $$

If $\%M-\lambda\%I$ can be inverted then:

$$ \begin{eqnarray} \*v &=& \left(\%M-\lambda\%I\right)^{-1} 0 \\ &=& 0 \end{eqnarray} $$

Therefore a nonzero eigenvector can exist only if $\%M-\lambda\%I$ is not invertible, which requires that its determinant vanish:

$$|\%M-\lambda\%I|=0$$

$|\%M-\lambda\%I|$ is called the characteristic polynomial, and its roots give the eigenvalues $\lambda_i$. The eigenvectors are then found by solving $\%M\*v = \lambda_i \*v$ for each eigenvalue, with an arbitrary scaling of their magnitude.

The product of the eigenvalues is equal to the determinant of the matrix.

diagonalization

If $\%V$ is a matrix with the eigenvectors of $\%M$ as its columns, and $\%\Lambda$ is a matrix with the eigenvalues of $\%M$ on its diagonal, then:

$$\%M\%V = \%V\%\Lambda$$

therefore:

$$\%V^{-1}\%M\%V = \%\Lambda$$

i.e. $\%M$ is transformed into a diagonal matrix. This defines a transformation to an independent set of coordinates, which we'll use for finding normal modes and principal components.
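A quick numerical check of diagonalization, and of the eigenvalue-determinant identity above (the matrix is an arbitrary example):
import numpy as np
np.set_printoptions(suppress=True,precision=2)
M = np.array([[1.0,2.0],[3.0,4.0]])
vals,V = np.linalg.eig(M)
print(f"V^-1 M V:\n{np.linalg.inv(V)@M@V}")   # diagonal matrix of eigenvalues
print(f"product of eigenvalues: {np.prod(vals):.2f}, determinant: {np.linalg.det(M):.2f}")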

routines

  • numerical: NumPy eig
import numpy as np
M = np.array([[1,2],[3,4]])
vals,vects = np.linalg.eig(M)
print(f"M:\n{M}")
print(f"eigenvalues:\n{vals}")
print(f"eigenvectors:\n{vects}")
M:
[[1 2]
 [3 4]]
eigenvalues:
[-0.37  5.37]
eigenvectors:
[[-0.82 -0.42]
 [ 0.57 -0.91]]
  • symbolic: SymPy eigenvects
import sympy as sp
from IPython.display import display,Math,Markdown
#sp.init_printing()
a,b,c,d = sp.symbols('a b c d')
M = sp.Matrix([[a,b],[c,d]])
result = M.eigenvects()
display(Math(f"M:{sp.latex(M)}"))
display(Markdown("eigenvalues, multiplicity, eigenvectors:"))
display(Math(sp.latex(result[0])))
display(Math(sp.latex(result[1])))
$\displaystyle M:\left[\begin{matrix}a & b\\c & d\end{matrix}\right]$

eigenvalues, multiplicity, eigenvectors:

$\displaystyle \left( \frac{a}{2} + \frac{d}{2} - \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}, \ 1, \ \left[ \left[\begin{matrix}- \frac{d}{c} + \frac{\frac{a}{2} + \frac{d}{2} - \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}}{c}\\1\end{matrix}\right]\right]\right)$ $\displaystyle \left( \frac{a}{2} + \frac{d}{2} + \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}, \ 1, \ \left[ \left[\begin{matrix}- \frac{d}{c} + \frac{\frac{a}{2} + \frac{d}{2} + \frac{\sqrt{a^{2} - 2 a d + 4 b c + d^{2}}}{2}}{c}\\1\end{matrix}\right]\right]\right)$

Singular Vectors, Values

  • Linear problem $\%M\*x=\*b$

  • The space of $\*b$ for which the system is solvable is the range

  • The dimension of the range is the rank

  • The space of $\*x$ for which $\%M\*x=0$ is the nullspace

  • If the nullspace contains only the zero vector, the matrix is full rank

Singular Value Decomposition (SVD)

  • For an $H\times W$ matrix $\%M$, the singular value decomposition is:
$$\%M = \%U\%W\%V^T$$
  • $\%U$ and $\%V$ are orthonormal matrices: $\%U^T\%U = \%V\%V^T = \%I$

  • $\%I$ is the $W\times W$ identity matrix

  • $\%W$ is a diagonal matrix with elements $w_i$, the singular values

  • The columns of $\%U$ associated with nonzero singular values form an orthonormal basis for the range of $\%M$

  • The columns of $\%V$ associated with zero singular values form an orthonormal basis for the nullspace of $\%M$

  • The singular values give the lengths of the principal axes of the hyper-ellipsoid defined by $\%M\*x$ where $\*x$ lies on the hyper-sphere $|\*x|^2=1$

  • NumPy routine: svd
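A numerical sketch with NumPy's svd (the matrix is an arbitrary example); note that the routine returns $\%V^T$ rather than $\%V$:
import numpy as np
np.set_printoptions(suppress=True,precision=2)
M = np.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])   # rectangular 3x2 matrix
U,w,Vt = np.linalg.svd(M,full_matrices=False)
print(f"singular values: {w}")
print(f"U W V^T reconstruction:\n{U@np.diag(w)@Vt}")
print(f"U^T U:\n{U.T@U}")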


Pseudoinverse

  • Singular and rectangular matrices cannot be exactly inverted
  • The pseudoinverse $\%M^+$ applied to solving a linear system of equations has two cases:
    • if the equations are over-determined, so that there is no solution, it finds the one that minimizes the least-squares error
    • if the equations are under-determined, so that there are many solutions, it finds the one with the minimum norm
  • The pseudoinverse is also called the Moore-Penrose or generalized inverse
  • It can be found with the SVD, $\%M^+ = \%V\%W^{-1}\%U^T$, giving $$\*x = \%V\%W^{-1}\%U^T\*b$$
    • In $\%W^{-1}$ set 1/0=0 for vanishing singular values (this is explained in Problem 2.)
  • NumPy routines: pinv, lstsq
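A numerical sketch of the two cases (the systems are arbitrary examples):
import numpy as np
np.set_printoptions(suppress=True,precision=2)
# under-determined: one equation, two unknowns; pinv returns the minimum-norm solution
M = np.array([[1.0,1.0]])
b = np.array([2.0])
print(f"minimum-norm solution: {np.linalg.pinv(M)@b}")
# over-determined: three equations, two unknowns; pinv minimizes the least-squares error
M = np.array([[1.0,0.0],[0.0,1.0],[1.0,1.0]])
b = np.array([1.0,1.0,1.0])
print(f"least-squares solution: {np.linalg.pinv(M)@b}")
x,res,rank,sv = np.linalg.lstsq(M,b,rcond=None)
print(f"lstsq agrees: {x}")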

Derivatives

definition

  • The derivative of a function measures the slope:
$$\ds{f(x)}{x} = \lim_{dx\rightarrow 0}\cfrac{f(x+dx)-f(x)}{dx}$$
  • The second derivative measures the curvature:
$$\cfrac{d^2f(x)}{dx^2} = \cfrac{d}{dx}\cfrac{df(x)}{dx}$$
  • Derivatives of variables with respect to time can be notated with dots:
$$\cfrac{dx}{dt} = \dot{x} ~~~ \cfrac{d^2x}{dt^2} = \ddot{x}$$
  • Derivatives of functions can be notated with primes:
$$\cfrac{df(x)}{dx} = f'(x) ~~~ \cfrac{d^2f(x)}{dx^2} = f''(x)$$
  • The derivative of $f(x)$ evaluated at $x_0$ can be notated as:
$$\left.\ds{f(x)}{x}\right|_{x_0}$$

Taylor series

  • Derivatives can be used to expand a function $f(x)$ around a point $x_0$ as a polynomial in the displacement:
$$f(x_0+h) = f(x_0)+h\left.\ds{f}{x}\right|_{x_0}+\cfrac{h^2}{2}\left.\dd{f}{x}\right|_{x_0}+O(h^3)$$

where $O(h^3)$ means terms of order $h^3$ and higher. This is called a Taylor series; it can be verified by taking derivatives with respect to $h$ of both sides and evaluating at $h=0$. These will be important in the numerical methods to come.

import sympy as sp
from sympy.abc import x
from IPython.display import display,Math,Markdown
f = sp.exp(x)
series = sp.series(f,x,0,6)
display(Math(f"e^x={sp.latex(series)}"))
$\displaystyle e^x=1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6} + \frac{x^{4}}{24} + \frac{x^{5}}{120} + O\left(x^{6}\right)$

partial derivatives

  • The partial derivative of a function with multiple variables measures the slope in the direction of selected variables:
$$\ps{f(x,y)}{x} = \lim_{dx\rightarrow 0}\cfrac{f(x+dx,y)-f(x,y)}{dx}$$$$\cfrac{\partial^2 f(x,y)}{\partial y\partial x} = \ps{}{y}\ps{f(x,y)}{x}$$

chain rule

$$\ds{f\left(g(x)\right)}{x} = \ds{f}{g}\ds{g}{x}$$
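A symbolic check of the chain rule, using $f(g)=\sin g$ and $g(x)=x^2$ as an arbitrary example:
import sympy as sp
from sympy.abc import x
from IPython.display import display,Math
g = x**2
chain = sp.cos(g)*sp.diff(g,x)   # df/dg evaluated at g(x), times dg/dx
display(Math(f"\\dfrac{{d}}{{dx}}\\sin(x^2)={sp.latex(sp.diff(sp.sin(g),x))}={sp.latex(chain)}"))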

symbolic differentiation

import sympy as sp
from sympy.abc import x,y,k,N
from IPython.display import display,Math
display(Math(f"\\dfrac{{d}}{{dx}}x^N=\
   {sp.latex(sp.simplify(sp.diff(x**N,x)))}"))
$\displaystyle \dfrac{d}{dx}x^N= N x^{N - 1}$
display(Math(f"\\dfrac{{d}}{{dx}}e^{{kx}}=\
   {sp.latex(sp.diff(sp.exp(k*x),x))}"))
$\displaystyle \dfrac{d}{dx}e^{kx}= k e^{k x}$
display(Math(f"\\dfrac{{d}}{{dx}}\\sin(x)=\
   {sp.latex(sp.diff(sp.sin(x)))}"))
$\displaystyle \dfrac{d}{dx}\sin(x)= \cos{\left(x \right)}$
display(Math(f"\\dfrac{{d^2}}{{dx^2}}\\sin(x)=\
   {sp.latex(sp.diff(sp.diff(sp.sin(x))))}"))
$\displaystyle \dfrac{d^2}{dx^2}\sin(x)= - \sin{\left(x \right)}$
display(Math(f"\\dfrac{{\\partial}}{{\\partial x}}\\sin(x^2y)=\
   {sp.latex(sp.diff(sp.sin(x**2*y),x))}"))
$\displaystyle \dfrac{\partial}{\partial x}\sin(x^2y)= 2 x y \cos{\left(x^{2} y \right)}$

del or nabla operator

$$\nabla = \left(\ps{}{x_1},\ldots,\ps{}{x_D}\right)$$

grad

$$\nabla f(x,y,z) = \ps{f}{x}\hat x+\ps{f}{y}\hat y+\ps{f}{z}\hat z$$
  • Grad (or gradient) points in the direction of steepest increase, with the magnitude giving the slope
import sympy as sp
from sympy.vector import CoordSys3D,gradient
from IPython.display import display,Math
p = CoordSys3D('p')
f = p.x**2+p.y**2+p.z**2
display(Math(f"\\nabla\\cdot(x^2+y^2+z^2)=\
   {sp.latex(gradient(f))}"))
$\displaystyle \nabla\cdot(x^2+y^2+z^2)= \left(2 \mathbf{{x}_{p}}\right)\mathbf{\hat{i}_{p}} + \left(2 \mathbf{{y}_{p}}\right)\mathbf{\hat{j}_{p}} + \left(2 \mathbf{{z}_{p}}\right)\mathbf{\hat{k}_{p}}$

div

$$\begin{eqnarray} \nabla\cdot\*f(\*x) &=& \nabla\cdot\left(a(\*x),b(\*x),c(\*x)\right)\\ &=& \ps{a}{x}+\ps{b}{y}+\ps{c}{z} \end{eqnarray} $$
  • Div (or divergence) measures whether a vector field is expanding or contracting at a point

curl

$$ \begin{eqnarray} \nabla\times\*f(\*x) &=& \nabla\times(a(\*x),b(\*x),c(\*x))\\ &=& \begin{vmatrix} \hat x&\hat y&\hat z\\ \ps{}{x}&\ps{}{y}&\ps{}{z}\\ a&b&c \end{vmatrix} \end{eqnarray} $$
  • The curl measures the circulation around a point
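Corresponding sympy.vector sketches for div and curl, following the gradient example above (the fields are arbitrary examples):
import sympy as sp
from sympy.vector import CoordSys3D,divergence,curl
from IPython.display import display,Math
p = CoordSys3D('p')
f = p.x*p.i+p.y*p.j+p.z*p.k      # radially expanding field
display(Math(f"\\nabla\\cdot(x,y,z)={sp.latex(divergence(f))}"))
g = -p.y*p.i+p.x*p.j             # field circulating around the z axis
display(Math(f"\\nabla\\times(-y,x,0)={sp.latex(curl(g))}"))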

Integrals

  • The indefinite integral is the opposite of the derivative:
$$\int \cfrac{df(x)}{dx}~dx = f(x)$$
  • The definite integral is evaluated between limits:
$$\int_a^b \cfrac{df(x)}{dx}~dx = f(b)-f(a)$$
  • The definite integral gives the area under the curve of a function, and is equal to the limit of breaking it up into a sum over the area of infinitesimal rectangles:
$$\int_a^b f(x)~dx = \lim_{N\rightarrow \infty}\sum_{i=0}^{N-1} f(x_i)~dx$$

where $x_i = a+(b-a)i/N$ and $dx = (b-a)/N$

  • Integration by parts will be useful in calculations to come:
$$\int_a^b f(x)\ds{g(x)}{x}~dx = f(b)g(b)-f(a)g(a)-\int_a^b\ds{f(x)}{x}g(x)~dx$$
  • Convolution will be useful in filtering:
$$(f*g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t-\tau)~d\tau = \int_{-\infty}^{\infty} f(t-\tau)g(\tau)~d\tau$$
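A numerical sketch of the Riemann-sum limit, approximating $\int_1^{10}x^2~dx=333$ (the same integral is evaluated symbolically below):
import numpy as np
a,b = 1.0,10.0
for N in (10,100,1000):
    dx = (b-a)/N
    x = a+dx*np.arange(N)   # left-endpoint sample points
    print(f"N={N}: {np.sum(x**2*dx):.2f}")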

symbolic integration

import sympy as sp
from IPython.display import display,Math
x,y = sp.symbols('x,y')
display(Math(f"\\int x^2~dx=\
   {sp.latex(sp.integrate(x**2))}"))
$\displaystyle \int x^2~dx= \frac{x^{3}}{3}$
display(Math(f"\\int_1^{{10}} x^2~dx=\
   {sp.latex(sp.integrate(x**2,(x,1,10)))}"))
$\displaystyle \int_1^{10} x^2~dx= 333$

References

  • [Hilbert:1904] David Hilbert, Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. (Erste Mitteilung) (Outline of a General Theory of Linear Integral Equations. (First Communication)), Nachrichten der Wissenschaftlichen Gesellschaft zu Göttingen (1904, pp. 49-91)
    • Foundational work
  • [Strang:1986] Strang, Gilbert. Introduction to applied mathematics. Vol. 16. Wellesley, MA: Wellesley-Cambridge Press, 1986.
    • Classic introduction

Problems

  1. Prove Euler's formula by comparing Taylor series of both sides.
  2. Use a numerical example to verify that the SVD pseudoinverse gives the minimum norm solution for underdetermined problems and the minimum residual for overdetermined problems.
  3. Harder: prove that the SVD pseudoinverse gives the minimum norm solution for underdetermined problems and the minimum residual for overdetermined problems. Consider what happens when a vector is added to the optimal solution.

(c) Neil Gershenfeld 1/23/26