Square root of 2 by 2 matrix using Cayley–Hamilton theorem

By Martin McBride, 2025-11-21
Tags: matrix matrix algebra determinant square root trace
Categories: matrices


The square of a matrix R is just R multiplied by R. We can define the square root of a matrix as the inverse function, like this:

$$R = \sqrt{S} \iff S = R^2$$

That is, if S is the square of R, then R is the square root of S. And just like regular square roots, a matrix can have more than one square root.

One important thing to notice is that R and S must both be square matrices, with the same shape. R must be square because only a square matrix can be multiplied by itself. And a square matrix multiplied by itself creates a matrix of the same order, so S must be square and of the same shape as R.

There are various ways to find the square root of a matrix, but for the case of a 2 by 2 matrix, there is actually a fairly simple formula we can use. We will introduce that formula here and then derive it.

Formula for the square root of a 2 by 2 matrix

The square root R of a 2 by 2 matrix S can be written as:

$$R = \frac{S + \delta I}{\tau}$$

Where:

$$\delta = \pm\sqrt{|S|}$$

And:

$$\tau = \pm\sqrt{\mathrm{Tr}(S) + 2\delta}$$

In these formulas, |S| is the determinant of S, Tr(S) is the trace of S, and I is the unit matrix (see the next section for a recap of what these terms mean).

Notice that both terms are square roots, and the positive and negative values each give a valid solution. This means that a 2 by 2 matrix might have up to 4 square roots. However, sometimes multiple roots might have the same values, leading to fewer than 4 distinct roots.
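As a quick sanity check, the formula can be sketched in a few lines of plain Python (matrices as nested lists; the helper names `sqrt_2x2` and `mat_mul`, and the example matrix, are my own):

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sqrt_2x2(S, delta_sign=1, tau_sign=1):
    """One square root of a 2x2 matrix S, using R = (S + delta*I) / tau.
    Assumes both radicands are non-negative and tau is non-zero."""
    (p, q), (r, s) = S
    delta = delta_sign * math.sqrt(p * s - q * r)     # delta = ±sqrt(|S|)
    tau = tau_sign * math.sqrt((p + s) + 2 * delta)   # tau = ±sqrt(Tr(S) + 2*delta)
    return [[(p + delta) / tau, q / tau],
            [r / tau, (s + delta) / tau]]

S = [[7, 6], [18, 19]]   # the square of [[2, 1], [3, 4]]
R = sqrt_2x2(S)          # recovers [[2.0, 1.0], [3.0, 4.0]]
assert mat_mul(R, R) == S
```

Here both sign choices default to positive; passing other combinations of `delta_sign` and `tau_sign` yields the other roots.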

Some useful values

We will start by calculating some values that will be useful in deriving the formula. First, let's name the elements of the 2 by 2 matrix R:

$$R = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

The determinant of a matrix is a single number that is calculated from the elements of a matrix. For a 2 by 2 matrix, the determinant is calculated by combining the 4 elements as follows:

$$|R| = ad - bc$$

The trace of a matrix is simply the sum of all the elements on the leading diagonal. For a 2 by 2 matrix, it is given by:

$$\mathrm{Tr}(R) = a + d$$

We will make use of the rule that the determinant of the product of two matrices is equal to the product of the determinants of the two matrices:

$$|AB| = |A|\,|B|$$
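This product rule is easy to spot-check numerically; a small sketch in plain Python (the helper names and example matrices are my own):

```python
def det2(M):
    """Determinant of a 2x2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, 1], [3, 4]]
B = [[1, -2], [5, 0]]
assert det2(mat_mul(A, B)) == det2(A) * det2(B)  # 50 == 5 * 10
```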

Using this rule, we can show that the determinant of S is equal to the square of the determinant of R:

$$|S| = |R^2| = |R|\,|R| = |R|^2$$

This means that, if R is the square root of S, then the determinant of R must be equal to plus or minus the square root of the determinant of S. This is the quantity we previously called δ:

$$|R| = \pm\sqrt{|S|} = \delta$$

Also, as a reminder, the 2 by 2 unit matrix I is equal to:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

The characteristic equation

We are going to make use of the characteristic equation of R. We won't make direct use of it, but it plays a part in the Cayley–Hamilton theorem, which we will use next. The characteristic equation tells us that λ is an eigenvalue of the matrix X if and only if:

$$|X - \lambda I| = 0$$

We can substitute the known values for R and I into the general equation:

$$\left|\begin{pmatrix} a & b \\ c & d \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right| = 0$$

Then we can add the two matrices to simplify the determinant:

$$\begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = 0$$

We can then expand the 2 by 2 determinant:

$$(a - \lambda)(d - \lambda) - bc = 0$$

Simplifying the terms gives:

$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

Now we know from earlier that (a + d) is equal to Tr(R), and also that (ad - bc) is equal to |R|, which in turn is equal to δ. This gives us a quadratic equation in λ:

$$\lambda^2 - \mathrm{Tr}(R)\,\lambda + \delta = 0$$

We could solve this equation to find the eigenvalues, but we aren't going to do that here. Instead, we are going to use the Cayley-Hamilton theorem.
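The quadratic can be checked numerically: for a sample matrix, each of its two roots should make the determinant |R − λI| vanish (the function name and example matrix here are my own):

```python
import math

def char_eq_roots(R):
    """Roots of lambda^2 - Tr(R)*lambda + |R| = 0 for a 2x2 matrix,
    assuming the eigenvalues are real."""
    (a, b), (c, d) = R
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

R = [[2, 1], [3, 4]]   # an arbitrary example matrix
for lam in char_eq_roots(R):
    # each eigenvalue should make |R - lambda*I| vanish
    assert abs((R[0][0] - lam) * (R[1][1] - lam) - R[0][1] * R[1][0]) < 1e-9
```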

The Cayley-Hamilton theorem

We won't look at the Cayley-Hamilton theorem in detail here, but it can be summarised as follows: every square matrix satisfies its own characteristic equation.

What does this mean? Well, if a matrix M has a characteristic equation of the form:

$$\lambda^2 + q\lambda + p = 0$$

Then Cayley-Hamilton says that if we replace λ with M, the equation will still be satisfied:

$$M^2 + qM + pI = 0$$

There is a small wrinkle here. λ is a scalar, so the original characteristic equation is a scalar equation. But M is a matrix, so we need to use a matrix equation. We can't add the scalar p to the other matrix terms. The theorem requires us to first multiply p by the unit matrix, as shown.

We can apply this to our matrix R. Substituting our previous values into the equation:

$$R^2 - \mathrm{Tr}(R)\,R + \delta I = 0$$
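This identity is easy to verify numerically for a sample matrix; a minimal sketch (the function name and example matrices are my own, and |R| plays the role of δ):

```python
def cayley_hamilton_residual(R):
    """R^2 - Tr(R)*R + |R|*I for a 2x2 matrix; the theorem says
    this is always the zero matrix."""
    (a, b), (c, d) = R
    tr, det = a + d, a * d - b * c
    R2 = [[a * a + b * c, a * b + b * d],
          [c * a + d * c, c * b + d * d]]
    return [[R2[0][0] - tr * a + det, R2[0][1] - tr * b],
            [R2[1][0] - tr * c, R2[1][1] - tr * d + det]]

# holds for any 2x2 matrix; two arbitrary examples
assert cayley_hamilton_residual([[2, 1], [3, 4]]) == [[0, 0], [0, 0]]
assert cayley_hamilton_residual([[1, -2], [5, 0]]) == [[0, 0], [0, 0]]
```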

Finding R in terms of S

Our solution is now closer than it might look. We aim to find R, the square root, for any given S. The problem is that our equations currently contain only terms in R. But R and S are related. If we could express some of those terms using S instead, we might be able to solve for R.

There is one thing we can do straight away. The previous equation had a term in R squared, and of course we know that is equal to S:

$$S - \mathrm{Tr}(R)\,R + \delta I = 0$$

The next obvious term to look at is Tr(R). Can we convert this to something else, perhaps something involving Tr(S)? We know from earlier that Tr(R) is a + d. Can we find a similar expression for Tr(S)? Well we know that S is R squared, so we can find S in terms of the values a to d by matrix multiplication:

$$S = R^2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix}$$

Tr(S) is just the sum of the two terms in the leading diagonal:

$$\mathrm{Tr}(S) = a^2 + 2bc + d^2$$

Now the a squared and d squared terms are quite interesting. We know that Tr(R) is a + d, so squaring that will give us quite a similar expression:

$$\mathrm{Tr}(R)^2 = (a + d)^2 = a^2 + 2ad + d^2$$

Comparing the previous expressions gives the following relationship between Tr(R) and Tr(S):

$$\mathrm{Tr}(R)^2 = \mathrm{Tr}(S) + 2(ad - bc)$$

But, of course, ad - bc is the determinant of R. And we already know that the values of |R| for the solutions of the C-H equation are our old friend δ, in its positive and negative forms:

$$ad - bc = |R| = \pm\sqrt{|S|} = \delta$$

Putting this back into the previous equation gives:

$$\mathrm{Tr}(R)^2 = \mathrm{Tr}(S) + 2\delta$$

We can now take the square root to find Tr(R). Once again, we must consider the positive and negative cases. This turns out to be the value 𝜏 that we defined right at the start:

$$\mathrm{Tr}(R) = \pm\sqrt{\mathrm{Tr}(S) + 2\delta} = \tau$$
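The chain of identities above can be confirmed for a concrete matrix; a quick sketch (the example values of a to d are my own choice):

```python
import math

# An arbitrary example matrix R, and S = R^2 expanded symbolically
a, b, c, d = 2, 1, 3, 4
S = [[a * a + b * c, b * (a + d)],
     [c * (a + d), b * c + d * d]]

trace_R = a + d                  # Tr(R)
trace_S = S[0][0] + S[1][1]      # Tr(S) = a^2 + 2bc + d^2
delta = a * d - b * c            # |R|, one signed value of sqrt(|S|)

assert trace_R ** 2 == trace_S + 2 * delta        # Tr(R)^2 = Tr(S) + 2*delta
assert trace_R == math.sqrt(trace_S + 2 * delta)  # the positive value of tau
```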

Finding the solution

If we go back to our previous solution to the C-H equation using R:

$$R^2 - \mathrm{Tr}(R)\,R + \delta I = 0$$

Making the substitutions for R squared and Tr(R):

$$S - \tau R + \delta I = 0$$

This can be easily rearranged to prove the square root formula:

$$R = \frac{S + \delta I}{\tau}$$

An example

Let's verify this with an example. We will find the root of the following matrix:

Example

The matrix has been deliberately chosen as the square of a reasonably simple matrix, so we don't have to deal with messy radicals when we calculate R. But it isn't a trivial case, so it is a fair test.

We can find the trace and determinant of S. We won't go through this in detail; it can be easily verified using an online matrix calculator:

$$\mathrm{Tr}(S) = 26, \qquad |S| = 25$$

We can then calculate the positive values of δ and 𝜏:

$$\delta = \sqrt{25} = 5, \qquad \tau = \sqrt{26 + 2 \times 5} = 6$$

Putting these values into the square root formula gives:

Example

This is the value of R we used to create S, so we know it is the correct square root.

What if we choose the negative value of δ? This will also affect the value of 𝜏:

$$\delta = -5, \qquad \tau = \sqrt{26 + 2 \times (-5)} = 4$$

Performing the same calculation as before, we get:

Example

This is a different matrix, but if we square it, we get the same result, S.

We must also consider the negative values of 𝜏. It can be -6 when δ is 5, or -4 when δ is -5. Since 𝜏 appears only in the denominator, changing its sign simply negates the whole matrix. So S has 4 square roots: the two given above and their negatives.
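Putting the four sign combinations together, here is a short sketch that enumerates every root of a sample matrix and checks each one by squaring it (the matrix S here is my own example, the square of [[2, 1], [3, 4]]):

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[7, 6], [18, 19]]   # my own example: the square of [[2, 1], [3, 4]]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
trace_S = S[0][0] + S[1][1]

roots = []
for delta in (math.sqrt(det_S), -math.sqrt(det_S)):
    radicand = trace_S + 2 * delta
    if radicand <= 0:
        continue             # this sign of delta gives no real tau
    for tau in (math.sqrt(radicand), -math.sqrt(radicand)):
        roots.append([[(S[0][0] + delta) / tau, S[0][1] / tau],
                      [S[1][0] / tau, (S[1][1] + delta) / tau]])

assert len(roots) == 4       # all four sign combinations are real here
for R in roots:
    RR = mat_mul(R, R)       # each candidate squares back to S
    assert all(abs(RR[i][j] - S[i][j]) < 1e-9
               for i in range(2) for j in range(2))
```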

