Finite Elements¶
The Nature of Mathematical Modeling (draft)¶
$ \def\%#1{\mathbf{#1}} \def\mat#1{\mathbf{#1}} \def\*#1{\vec{#1}} \def\ve#1{\vec{#1}} \def\ds#1#2{\cfrac{d #1}{d #2}} \def\dd#1#2{\cfrac{d^2 #1}{d #2^2}} \def\ps#1#2{\cfrac{\partial #1}{\partial #2}} \def\pp#1#2{\cfrac{\p^2 #1}{\p #2^2}} \def\p{\partial} \def\ba{\begin{eqnarray*}} \def\ea{\end{eqnarray*}} \def\eps{\epsilon} \def\del{\nabla} \def\disp{\displaystyle} $
We have seen how to use finite differences to approximate partial differential equations on a lattice, and how to analyze and improve the stability and accuracy of these approximations. As powerful as these ideas are, there are two important cases where they do not directly apply: problems that are most naturally described in terms of a spatially inhomogeneous grid, and problems that are posed in terms of a variational principle. For example, in studying the deformations of an auto body, it can be most natural to describe it in terms of finding the minimum energy configuration instead of a partial differential equation, and for computational efficiency it is certainly important to match the location of the solution nodes to the shape of the body.
These limitations of finite differences are overcome by finite element methods. They start with general analytical expansion techniques for finding approximate solutions to partial differential equations (the method of weighted residuals for problems that are posed by differential equations, and the Rayleigh--Ritz method for variational problems), and then find a numerical solution by using local basis functions with the spatial components of the field as the expansion weights. Instead of discretizing the space in which the problem is posed, this discretizes the form of the function used to represent the solution.
Because these problems are so important in engineering practice they consume an enormous number of CPU cycles. Finite element solvers are not easy to write; most people use dedicated packages. In addition to the core routines for solving large sparse matrix problems and systems of ordinary differential equations, these packages provide tools to specify the input geometry, generate meshes, assign material properties, and then visualize the output results.
Weighted Residuals¶
The method of weighted residuals converts a partial differential equation into a system of ordinary differential equations (or an algebraic equation if there is no time dependence). Let $u(\vec x,t)$ be a field variable (such as temperature) that depends on space and possibly on time, to be found as the solution of a given partial differential equation specified by a differential operator $D$, possibly with a source term $f$: $$ D[u(\ve{x},t)] = f(\ve{x},t) ~~~~. $$ An example is Poisson's equation $\del^2 u = \rho$. We will assume that $u$ is a scalar, but the following discussion is easily extended to vector $\ve{u}$.
If $\tilde{u}(\ve{x},t)$ is an approximate solution, its *residual* $R$ is defined to be the deviation from the correct value $$ R(\ve{x},t) = D[\tilde{u}(\ve{x},t)] - f(\ve{x},t) ~~~~. $$ Clearly a good approximate solution for $u(\ve{x},t)$ will make the residual as small as possible, but there are many ways to define "small."
The first step in developing finite differences was to consider $\ve{x}$ and $t$ to be discrete variables. Here, we will keep them continuous and instead write $u(\vec x,t)$ as a discrete sum of a set of expansion weights $a_i$ times (for now arbitrary) basis functions $\phi_i$ $$ u(\vec x,t) \approx \sum_i a_i(t)\phi_i(\vec x) $$ (Function Fitting will examine the general problem of expanding a function in terms of basis functions). We require that the $\phi_i$ be a complete set that can represent any function (up to the order of the approximation), but they need not be orthogonal. We will assume that the basis functions have been chosen to satisfy the boundary conditions; if this is not the case then there will be extra boundary equations in the following derivation.
The residual will be identically equal to zero only for the correct solution. For our approximate solution we will attempt the easier task of making the residual small in some average sense. There are many strategies for weighting the residual in the average to determine the best choice for the expansion coefficients $a_i$, the three most important ones for finite elements being:
Collocation¶
In collocation, the residual is set equal to zero at $N$ sites $$ R(\vec x_i) = 0 ~~~~(i = 1,\ldots,N) ~~~~. $$ This gives a system of $N$ equations for the $N$ unknown $a_i$'s. This is straightforward, but says nothing about the value of the residual away from the points where it is evaluated.
Least Squares¶
In the least squares method of weighted residuals, the square of the residual is integrated over the domain of the problem, and the expansion coefficients are sought that minimize this integral: $$ \ps{}{a_i} \int R(\vec x)^2~d\vec x = 0 ~~~~. $$
Galerkin¶
General Galerkin methods choose a family of weighting functions $w_i(\vec x)$ to use in evaluating the residual $$ \int R(\vec x) w_i(\vec x) ~ d\vec x = 0 ~~~~. $$ While any one such equation does not imply that the absolute value of the residual is small, if the weighting functions form a suitably independent set then together these equations can provide a tight constraint on the magnitude of the residual. The most common choice for the weighting functions is just the basis functions themselves $$ w_i(\vec x) = \ps{u}{a_i} = \phi_i(\vec x) ~~~~. $$ This is called the Bubnov--Galerkin method, or sometimes just the Galerkin method. In the Fourier--Galerkin method a Fourier expansion is used for the basis functions (the famous chaotic Lorenz set of differential equations was found as a Fourier--Galerkin approximation to atmospheric convection [Lorenz:1963]).
Of all these methods, the Galerkin techniques are the most common because of the convenient formulation they lead to.
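As a concrete sketch of the Galerkin method with global basis functions (the equation, basis, and grid resolution here are illustrative choices, not from the text): solve $-u'' = 1$ on $[0,1]$ with $u(0)=u(1)=0$, expanding in $\phi_i = \sin(i\pi x)$, which satisfies the boundary conditions.

```python
import numpy as np

# Bubnov-Galerkin with global basis functions: solve D[u] = -u'' = 1
# on [0,1] with u(0) = u(1) = 0, expanding u = sum_i a_i sin(i*pi*x).
# Each weighted residual equation int R phi_j dx = 0 becomes one row
# of the linear system A a = b.
N = 5
x = np.linspace(0, 1, 2001)
dx = x[1] - x[0]

def integrate(f):                       # trapezoid rule on the grid
    return np.sum((f[1:] + f[:-1])/2)*dx

phi = [np.sin(i*np.pi*x) for i in range(1, N+1)]
D_phi = [(i*np.pi)**2*np.sin(i*np.pi*x) for i in range(1, N+1)]   # -phi_i''

A = np.array([[integrate(D_phi[i]*phi[j]) for i in range(N)] for j in range(N)])
b = np.array([integrate(phi[j]) for j in range(N)])
a = np.linalg.solve(A, b)

u = sum(ai*p for ai, p in zip(a, phi))
exact = x*(1 - x)/2                      # exact solution, for comparison
print(np.max(np.abs(u - exact)))         # small truncation error
```

Because these sines are orthogonal the system is diagonal; local element bases instead make it sparse and banded.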
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Markdown, display
x = [1,2,3]
yi = [0,1,0]
ymp = [1,0,1]
ui = [1.2,1.4,1.8]
fig, ax = plt.subplots()
ax.set_aspect(0.6)
plt.plot(x,yi,'k--')
plt.plot(x,ymp,'k--')
plt.plot(x,ui,'ko-')
ax.set_xticks(x,['','',''])
ax.set_yticks([0,1],['',''])
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
plt.ylim(0,2)
def text(x,y,s):
plt.text(x,y,s,fontsize=16,math_fontfamily='cm')
text(0.95,1.3,'$u_{i-1}$')
text(0.95,0.75,'$\\varphi_{i-1}$')
text(0.9,-0.15,'$x_{i-1}$')
text(1.95,1.5,'$u_{i}$')
text(1.95,0.75,'$\\varphi_{i}$')
text(1.95,-0.15,'$x_{i}$')
text(2.9,1.9,'$u_{i+1}$')
text(2.9,0.75,'$\\varphi_{i+1}$')
text(2.9,-0.15,'$x_{i+1}$')
text(0.8,-.03,'$0$')
text(0.8,0.97,'$1$')
plt.show()
display(Markdown("*Figure 1: The 1D finite element 'hat' function*"))
These techniques for weighting residuals do not yet have anything to say about finite elements; they apply equally well to any family of basis functions $\phi_i$, and can be used to find analytical as well as numerical approximations. To be successful, however, the basis functions need to be chosen with care. In coming chapters we will see that global functions generally do a terrible job of fitting local behavior. The trick in finite elements is to recognize that the basis functions can be chosen so that they are nonzero only in small regions (the elements), and further can be defined so that the unknown expansion coefficients $a_i$ are just the values of the field variable $u$ (and as many derivatives as are needed for the problem) evaluated at desired locations, with the basis functions interpolating between these values. For example, in 1D the simplest such expansion is piecewise linear (the hat functions, shown in Figure 1): $$ \phi_i = \left\{ \begin{array}{c l} {\disp \frac{x - x_{i-1}}{x_i - x_{i-1}}} & x_{i-1} \le x < x_i \\ {\rule[-.15in]{0in}{.4in}\disp \frac{x_{i+1} - x}{x_{i+1} - x_i}} & x_i \le x < x_{i+1} \\ {\disp 0} & x < x_{i-1}~{\rm or}~ x \ge x_{i+1} \\ \end{array} \right. ~~~~. $$ Since $\phi_i(x_i) = 1$ and $\phi_j(x_i) = 0$ for all $j\ne i$, $$ \ba u(x_i) &=& \sum_i a_i \phi_i(x_i) \\ &=& a_i ~~~~, \ea $$ therefore the expansion coefficient $a_i$ at the element point $x_i$ is just the field value $u_i=u(x_i)$. In one element $x_i \le x < x_{i+1}$ the field is piecewise linearly interpolated as $$ u(x) = u_i \frac{x_{i+1} - x}{x_{i+1} - x_i} + u_{i+1} \frac{x - x_i}{x_{i+1} - x_i} ~~~~. $$
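A minimal numerical check of this construction (the node locations and field values here are arbitrary choices): summing the hat functions weighted by the nodal values $u_i$ reproduces piecewise-linear interpolation, compared below against NumPy's `np.interp`.

```python
import numpy as np

# Hat-function expansion on arbitrary (non-uniform) nodes:
# u(x) = sum_i u_i phi_i(x) is exactly piecewise-linear interpolation.
nodes = np.array([0.0, 0.3, 0.5, 0.9, 1.0])   # element boundaries
ui = np.sin(np.pi*nodes)                      # field values at the nodes

def hat(i, x):
    # basis function phi_i: 1 at nodes[i], 0 at all other nodes
    xi = nodes[i]
    left = nodes[i-1] if i > 0 else xi - 1.0        # dummy node beyond the domain
    right = nodes[i+1] if i < len(nodes)-1 else xi + 1.0
    return np.clip(np.minimum((x - left)/(xi - left),
                              (right - x)/(right - xi)), 0, 1)

x = np.linspace(0, 1, 500)
u = sum(ui[i]*hat(i, x) for i in range(len(nodes)))
print(np.max(np.abs(u - np.interp(x, nodes, ui))))   # agrees to roundoff
```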
If a finite element expansion is used in one of the residual weighting strategies the result is a set of algebraic equations for the unknown coefficients $a_i$, or a set of ordinary differential equations if the problem is time dependent. Unlike finite differences, we are now free to put the approximation nodes $x_i$ wherever is appropriate for the problem.
Let's return to the simple first-order flux PDE $$ \ps{u}{t} = -v\ps{u}{x} $$ to see how the Galerkin method is applied. For each basis function $\phi_j$ there is a weighted residual equation integrated over the problem domain $$ \int \left(\ps{u}{t} + v\ps{u}{x}\right) \phi_j ~dx = 0 $$ (in this case the source term $f=0$). Plugging in $$ u(x,t) = \sum_i a_i(t) \phi_i(x) $$ gives $$ \sum_i \int \left( \ds{a_i}{t}\phi_i\phi_j + v a_i\phi_j\ds{\phi_i}{x} \right)~ dx = 0~~~~. $$ This can be written in matrix form as $$ {\bf A}\cdot \ds{\vec a}{t} + {\bf B}\cdot \vec a = \vec 0 $$ where $$ \mat{A}_{ij} = \int \phi_i \phi_j ~ dx $$ and $$ \mat{B}_{ij} = v\int \phi_j\ds{\phi_i}{x} ~ dx $$ are matrices that depend only on the basis functions, and the vector $\vec a$ is the set of expansion coefficients. This is now a system of ordinary differential equations that can be solved with the methods that we studied in Finite Differences. Since each basis function overlaps only with its immediate neighbors, the $\mat{A}$ and $\mat{B}$ matrices are very sparse and so they can be solved efficiently (this is the main job of a finite element package, and much of numerical linear algebra).
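The $\mat{A}$ and $\mat{B}$ matrices can be assembled directly from their definitions. The following sketch (the grid resolution, $v$, and number of nodes are arbitrary choices) does this by brute-force quadrature for uniform hat functions, and checks the interior rows against the analytic element integrals: $\mat{A}$ rows are $h\,[1/6,\ 2/3,\ 1/6]$ and $\mat{B}$ rows are $v\,[1/2,\ 0,\ -1/2]$.

```python
import numpy as np

# Assemble the Galerkin matrices A_ij = int phi_i phi_j dx and
# B_ij = v int phi_j dphi_i/dx dx for uniform linear hat elements of
# size h, by quadrature (a real solver integrates element-by-element
# in closed form). Boundary conditions are ignored in this sketch.
v, h, N = 1.0, 0.1, 8
nodes = h*np.arange(N)
x = np.linspace(nodes[0], nodes[-1], 20001)
dx = x[1] - x[0]

def phi(i):                    # hat function centered on node i
    return np.clip(1 - np.abs(x - nodes[i])/h, 0, 1)

def dphi(i):                   # its derivative: +1/h rising, -1/h falling
    return (np.where((x > nodes[i] - h) & (x <= nodes[i]), 1, 0)
            - np.where((x > nodes[i]) & (x < nodes[i] + h), 1, 0))/h

A = np.array([[np.sum(phi(i)*phi(j))*dx for j in range(N)] for i in range(N)])
B = np.array([[v*np.sum(phi(j)*dphi(i))*dx for j in range(N)] for i in range(N)])

# interior rows reproduce the analytic stencils:
print(A[4, 3:6]/h)    # ~ [1/6, 2/3, 1/6]
print(B[4, 3:6]/v)    # ~ [1/2, 0, -1/2]
```

Note that only nearest neighbors overlap, so both matrices are tridiagonal, which is what makes the resulting systems cheap to solve.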
The linear interpolation of the hat functions breaks the solution space up into many elements, each of which has a single degree of freedom $u_i$. On the other hand, if the basis functions extend over the entire domain, then this can be viewed as a single large element that has many degrees of freedom (the $a_i$'s). There is a range of options between these extremes, and part of the art of applying finite elements is balancing the use of more-accurate large complicated elements with the use of less-accurate simple small elements. As with solving ODEs, the simplest linear elements are a poor approximation and it is usually advantageous to use more complex elements. It's also necessary to make sure that the elements have enough degrees of freedom to be able to satisfy the residual weighting equation, which for polynomial expansions will depend on the order of the PDE. For example, the bending of stiff plates has a term that depends on the fourth spatial derivative of the displacement. Since the fourth derivative of a linear element is zero, a linear element cannot satisfy this (other than the trivial solution $u=0$).
It is common to use polynomials for more complicated elements. Within a 1D element $x_i \le x < x_{i+1}$, the piecewise linear approximation is made up of a superposition of two *shape functions* $$ \psi_1 = \frac{x - x_i}{x_{i+1} - x_i} ~~~~ \psi_2 = \frac{x_{i+1} - x}{x_{i+1} - x_i} ~~~~, $$ the parts of the basis functions that are nonzero in the element. These functions have the desirable property of being equal to 1 at one of the element boundaries and vanishing at the other. In addition, they sum to 1 everywhere in the interval, so that if $u_i=u_{i+1}$ then the approximation is constant over the interval. *Lagrange interpolation* generalizes this to a set of $N$ normalized polynomials of degree $N-1$, each defined to vanish at all but one of $N$ sites: $$ \ba \psi_1&=&\frac{(x_2-x)(x_3-x)\cdots(x_N-x)} {(x_2-x_1)(x_3-x_1)\cdots(x_N-x_1)} \\ \psi_2&=&\frac{(x_1-x)(x_3-x)\cdots(x_N-x)} {(x_1-x_2)(x_3-x_2)\cdots(x_N-x_2)} \\ &\vdots& \\ \psi_N&=&\frac{(x_1-x)(x_2-x)\cdots(x_{N-1}-x)} {(x_1-x_N)(x_2-x_N)\cdots(x_{N-1}-x_N)} ~~~~. \ea $$ These can be used in higher-order elements.
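The defining property of these polynomials, $\psi_i(x_j) = \delta_{ij}$, and the fact that they sum to 1 everywhere (since they interpolate a constant exactly), can be checked numerically; the site locations in this sketch are arbitrary.

```python
import numpy as np

# Lagrange shape functions for N sites: psi_i is the degree N-1
# polynomial that is 1 at x_i and 0 at every other site.
sites = np.array([0.0, 0.2, 0.7, 1.0])    # N = 4 arbitrary sites

def psi(i, x):
    terms = [(sites[j] - x)/(sites[j] - sites[i])
             for j in range(len(sites)) if j != i]
    return np.prod(terms, axis=0)

# Kronecker delta property: psi_i(x_j) = 1 if i == j, else 0
P = np.array([[psi(i, xj) for xj in sites] for i in range(len(sites))])
print(np.round(P, 12))          # identity matrix

# partition of unity: the shape functions sum to 1 everywhere
x = np.linspace(0, 1, 11)
print(sum(psi(i, x) for i in range(len(sites))))   # all ones
```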
A more intuitive way to define polynomial basis functions is in terms of the value of the field and its derivatives at the boundary of the element. For example, in 1D for a cubic polynomial $$ u = a_0 + a_1 x + a_2 x^2 + a_3 x^3 $$ defined over the interval $0\le x< h$, a trivial calculation shows that the $a$'s are related to the boundary values $u_0, u_h$ and the boundary slopes $\dot u_0, \dot u_h$ by $$ \left(\begin{array}{c} u_0 \\ \dot u_0 \\ u_h \\ \dot u_h \end{array}\right) = \left[\begin{array}{c c c c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & h & h^2 & h^3 \\ 0 & 1 & 2h & 3h^2 \end{array}\right] \left(\begin{array}{c} a_0 \\ a_1 \\ a_2 \\ a_3 \end{array}\right) ~~~~. $$ Given desired values for the $u$'s and $\dot u$'s, this can be inverted to find the $a$'s. In particular, indexed in terms of $(u_0, \dot u_0, u_h, \dot u_h)$, the four shape functions in this representation are $(1,0,0,0)$, $(0,1,0,0)$, $(0,0,1,0)$, $(0,0,0,1)$. These give the element four degrees of freedom, but if we impose continuity on the function and slope across the element boundaries this is reduced to two degrees of freedom. Figure 2 shows how the element shape functions can be assembled into two basis functions, one having the value of $u_i$ for the expansion coefficient, and the other $\dot u_i$. These handy functions, used for Hermite interpolation, will return when we look at splines.
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Markdown, display
h = 1
M = np.array([
[1,0,0,0],
[0,1,0,0],
[1,h,h*h,h*h*h],
[0,1,2*h,3*h*h]])
Minv = np.linalg.inv(M)
#
fig, ax = plt.subplots()
ax.set_aspect(0.8)
plt.xlim(0,2)
plt.ylim(-.2,1.1)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines['bottom'].set_position(('data', 0))
ax.set_xticks([0,1,2],['','',''])
ax.set_yticks([0,1],['',''])
#
# the four cubic shape functions, selected by unit vectors indexed as
# (u_0, udot_0, u_h, udot_h); each is drawn solid in one element and
# dashed in the neighboring element
x = np.linspace(0,1,100)
for v,s0,s1 in [((1,0,0,0),'k','k--'),
                ((0,1,0,0),'k','k--'),
                ((0,0,1,0),'k--','k'),
                ((0,0,0,1),'k--','k')]:
    a = Minv@v
    y = a[0]+a[1]*x+a[2]*x**2+a[3]*x**3
    plt.plot(x,y,s0)
    plt.plot(1+x,y,s1)
#
def text(x,y,s):
plt.text(x,y,s,fontsize=16,math_fontfamily='cm')
text(-0.1,-.03,'$0$')
text(-0.1,0.5,'$u$')
text(-0.1,0.97,'$1$')
text(0.9,0.29,'$u=0$')
text(0.9,0.15,'$\\dot u=1$')
text(0.9,0.85,'$u=1$')
text(0.9,0.71,'$\\dot u=0$')
text(0.03,-0.15,'$x_{i-1}$')
text(1,-0.15,'$x_i$')
text(1.9,-0.15,'$x_{i+1}$')
#
plt.show()
display(Markdown("*Figure 2: 1D finite element polynomial basis functions defined in terms of the value and derivative of the function at the element boundaries.*"))
The generalization of these elements to higher dimensions is straightforward. For example, in 2D a triangular element with corners at $(x_0,y_0)$, $(x_1,y_1)$, and $(x_2,y_2)$ can be interpolated by three bilinear basis functions $$ \ba \phi_0 &=& \frac{(x_1-x)(y_2-y)}{(x_1-x_0)(y_2-y_0)} \\ \phi_1 &=& \frac{(x_2-x)(y_0-y)}{(x_2-x_1)(y_0-y_1)} \\ \phi_2 &=& \frac{(x_0-x)(y_1-y)}{(x_0-x_2)(y_1-y_2)} ~~~~. \ea $$ To use these elements it is necessary to cover space with a triangulation (a mesh of triangles); such mesh generation is a major task of a finite element environment. In 3D a tetrahedron is the primitive element, and so forth.
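A quick numerical check that these basis functions interpolate as claimed, with each $\phi_i$ equal to 1 at its own corner and 0 at the other two (the corner coordinates in this sketch are arbitrary):

```python
import numpy as np

# Verify phi_i(corner_j) = delta_ij for the triangular element basis
# functions, on a sample (non-degenerate) triangle.
corners = np.array([[0.0, 0.0], [1.0, 0.2], [0.3, 1.0]])
(x0, y0), (x1, y1), (x2, y2) = corners

def phi(i, x, y):
    if i == 0: return (x1 - x)*(y2 - y)/((x1 - x0)*(y2 - y0))
    if i == 1: return (x2 - x)*(y0 - y)/((x2 - x1)*(y0 - y1))
    return (x0 - x)*(y1 - y)/((x0 - x2)*(y1 - y2))

P = np.array([[phi(i, *c) for c in corners] for i in range(3)])
print(np.round(P, 12))    # 3x3 identity matrix
```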
It is useful to integrate by parts in the Galerkin method to reduce the order of the highest spatial derivative and therefore the required element order. For example, if we start with a 1D wave equation $$ \pp{u}{t} = v^2 \pp{u}{x} $$ defined in the interval $(0,1)$, the Galerkin expansion is $$ \sum_i \int_0^1 \left( \dd{a_i}{t}\phi_i\phi_j - v^2 a_i\phi_j\dd{\phi_i}{x} \right)~ dx = 0~~~~. $$ The second term can be integrated by parts: $$ \sum_i \dd{a_i}{t}\underbrace{\int_0^1 \phi_i\phi_j~dx}_{\disp \mat{A}_{ij}} + \sum_i a_i \underbrace{\left[ \int_0^1 v^2 \ds{\phi_i}{x} \ds{\phi_j}{x} ~ dx - \left. v^2 \phi_j \ds{\phi_i}{x} \right|_0^1 \right]}_{\disp \mat{B}_{ij}} = 0 $$ $$ {\bf A}\cdot \dd{\vec a}{t} + {\bf B}\cdot \vec a = \vec 0 ~~~~. $$ This is now a second-order matrix differential equation for the vector of expansion coefficients $\vec a$, and because of the integration by parts the highest spatial derivative is first rather than second order (so first-order elements can be used). In Differential and Difference Equations we turned a string of harmonic oscillators into a wave equation; here we've turned a wave equation back into (generalized) harmonic oscillators.
Since this is a linear problem we can go further and assume a periodic time dependence $$ \vec a(t) = \vec a_0 e^{i\omega t} $$ to find an eigenvalue problem for the modes $$ \omega^2{\bf A}\cdot \vec a_0 = {\bf B}\cdot \vec a_0 ~~~~. $$ In higher dimensions, Green's theorem can be used to reduce the order of the equation by relating area or volume integrals to integrals over boundaries.
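For a string fixed at both ends this eigenvalue problem can be solved directly. In this sketch (the unit wave speed, unit length, and uniform mesh are illustrative choices) the hat-element matrices reduce to tridiagonal stencils, and the computed mode frequencies approach the exact $\omega_n = n\pi v$.

```python
import numpy as np

# Modes of a string fixed at both ends, omega^2 A a0 = B a0, with
# uniform linear hat elements on (0,1). For interior nodes the element
# integrals give tridiagonal matrices: A_ij = int phi_i phi_j dx
# -> h/6*[1,4,1], and B_ij = v^2 int phi_i' phi_j' dx -> v^2/h*[-1,2,-1];
# the boundary term vanishes because the interior basis functions are
# zero at the fixed ends.
v, N = 1.0, 40                  # wave speed, number of interior nodes
h = 1.0/(N + 1)
off = np.diag(np.ones(N-1), 1) + np.diag(np.ones(N-1), -1)
A = h/6*(4*np.eye(N) + off)
B = v**2/h*(2*np.eye(N) - off)

w2 = np.linalg.eigvals(np.linalg.solve(A, B))   # generalized eigenvalues
omega = np.sort(np.sqrt(w2.real))
print(omega[:3]/np.pi)          # ~ [1, 2, 3]: the exact modes are n*pi*v
```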
Rayleigh--Ritz Variational Methods¶
In Variational Principles we saw how many physical problems are most naturally posed in terms of a variational integral, and then saw how to use Euler's equation to find a differential equation associated with a variational principle. In this section we will look at finite element methods that start directly from a variational integral ${\cal I}$ and find the field distribution that makes the integral extremal
$$ \delta {\cal I} = \delta \int F[u(\vec x,t)]~d\vec x = 0 ~~~~. $$$F$ might be the energy, or action, or time, and there can be other integral constraints added with Lagrange multipliers, such as the path length.
One of the most important applications of finite elements is to structural mechanics problems, for which the integral to be minimized is the potential energy in a structure. For example, ignoring shear, the potential energy $V$ of a beam bent from its equilibrium configuration $u(x)=0$ by a lateral force $f(x)$ is given in terms of the elasticity modulus $E$ and the area moment of inertia $I$ by
$$ \tag{beam} V = \int_0^L\left(\frac{1}{2}EI\left(\dd{u}{x}\right)^2-u(x)f(x)\right)~dx~~~~. $$The first term is the elastic energy stored in the beam by bending, and the second term is the work done against the applied force. The equilibrium of the beam is given by the variation $\delta V=0$.
As with the method of weighted residuals, we will approximate $u$ by an expansion in basis functions
$$ u(\vec x,t) = \sum_i a_i(t)\phi_i(\vec x) ~~~~, $$and try to find the best choice for the $a_i$'s. If the integral is extremal, then its partial derivative with respect to all of the $a_i$'s must vanish:
$$ \ps{{\cal I}}{a_i} = 0 ~~~~ (i=1,\ldots,N) $$(with the equilibrium stability determined by whether this is a maximum or a minimum). This Rayleigh--Ritz set of equations applies whether the $\phi_i(\vec x)$ are defined globally or locally, but this becomes a prescription for a finite element method if each $\phi_i$ is nonzero only in a local neighborhood and has the field variables and as many derivatives as needed as the expansion coefficients (as we saw with the method of weighted residuals). If a problem already has a discrete set of coordinates (such as the generalized coordinates of a Lagrangian), the Rayleigh--Ritz method can be used directly without needing an approximate expansion in basis functions.
For example, in 1D a mass on a spring has a potential energy $kx^2/2$, and an applied force $F$ contributes a potential energy $-Fx$, so the equilibrium position is easily found to be
$$ \ps{}{x}\left(\frac{1}{2}kx^2-Fx\right) = 0 ~~~~ \Rightarrow ~~~~ x = \frac{F}{k} ~~~~. $$For the more difficult continuous case of the (beam) equation, plugging in the expansion and asking that the energy be extremal gives
$$ \ba 0 &=& \ps{}{a_j} \int_0^L \left[ \frac{1}{2} EI \left( \sum_i a_i \dd{\phi_i}{x} \right)^2 - \sum_i a_i\phi_i(x) f(x)\right] dx \\ &=& \sum_i a_i \underbrace{\int_0^L E I \dd{\phi_i}{x} \dd{\phi_j}{x} ~dx}_{\disp\mat{A}_{ij}} - \underbrace{\int_0^L \phi_j(x) f(x) ~ dx}_{\disp\ve{b}_j} \\ &=& \mat{A}\cdot\ve{a} - \ve{b} \\ \Rightarrow \ve{a} &=& \mat{A}^{-1}\cdot\ve{b} ~~~~. \ea $$Because of the second derivative, quadratic elements are needed.
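This derivation can be checked on a cantilever (a sketch with illustrative unit parameters, and a boundary condition chosen for this example): clamped at $x=0$ with a uniform load $f$, using a global polynomial basis $\phi_n = x^n$ that satisfies the clamped boundary conditions, the Rayleigh--Ritz solution reproduces the classic tip deflection $fL^4/(8EI)$.

```python
import numpy as np

# Rayleigh-Ritz for a cantilever beam clamped at x=0 (u(0)=u'(0)=0)
# under a uniform load f. The basis phi_n = x^n for n = 2,3,4 satisfies
# the clamped boundary conditions, and for monomials the integrals
# A_mn = EI int phi_m'' phi_n'' dx and b_n = int phi_n f dx have
# closed forms.
E = I = L = f = 1.0             # unit parameters for illustration
p = [2, 3, 4]                   # monomial powers in the basis

A = np.array([[E*I*m*(m-1)*n*(n-1)*L**(m+n-3)/(m+n-3) for n in p] for m in p])
b = np.array([f*L**(n+1)/(n+1) for n in p])
a = np.linalg.solve(A, b)       # a = A^-1 . b

tip = sum(an*L**n for an, n in zip(a, p))
print(tip, f*L**4/(8*E*I))      # Ritz tip deflection vs exact fL^4/(8EI)
```

Because the exact deflection is a quartic polynomial that lies in the span of this basis, the Ritz answer here is exact to roundoff; with local elements it would instead converge as the mesh is refined.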
The Rayleigh--Ritz method has converted this variational problem into an algebraic one. More complex finite element problems result in nonlinear equations to find the coefficients; solving these requires the search techniques to be covered in Search.
Finite Element Packages¶
References¶
- [Ames:2014] Ames, William F. Numerical Methods for Partial Differential Equations. Academic Press, 2014.
- While Ames does not have much on the practical use of finite elements, he has a good introduction to the basic approximations used.
- [Bower:2009] Bower, Allan F. Applied Mechanics of Solids. CRC Press, 2009.
- Good intro to solid mechanics problems, and their solution with finite elements.
- [Cook:2007] Cook, Robert D. Concepts and Applications of Finite Element Analysis. John Wiley & Sons, 2007.
- There are many finite element texts with similar titles; this has a good balance between rigor and practice.
- [Lorenz:1963] Lorenz, Edward N. "Deterministic Nonperiodic Flow." Journal of the Atmospheric Sciences 20, no. 2 (1963): 130-141.
- The birth of chaos.
Problems¶
- Consider the damped wave equation:
$$
\pp{u}{t} = v^2 \pp{u}{x} + \gamma \ps{}{t} \pp{u}{x} ~~~~.
$$
Take the solution domain to be the interval $[0,1]$.
- Use the Galerkin method to find an approximating system of differential equations.
- Evaluate the matrix coefficients for linear hat basis functions, using elements with a fixed size of $h$.
- Now find the matrix coefficients for Hermite polynomial interpolation basis functions, once again using elements with a fixed size of $h$.
- Extra credit: simulate the motion of the string.
- Model the bending of a beam under an applied load using the (beam) equation.
- Use Hermite polynomial interpolation, and boundary conditions fixing the displacement and slope at one end, and applying a force at the other end.
- Compare your result to bending a beam in a finite-element package.
(c) Neil Gershenfeld 2/25/26