Matrix Multiplication – chrwittm.github.io

Since matrix multiplication is a big thing for deep learning and visualizations like http://matrixmultiplication.xyz/ were a bit to fast for me to properly re-understand what I learned in highschool, I decided dive in more systematically without falling into the trap of learning lots of math before continuing with deep learning. This will be short and sweet:

Taking a few lessons on Khan Academy.
Implementing matrix multiplication in python from scratch (with numpy and pytorch)

This has been done a million times before already, but nonetheless, let me explain what I learned along the way.

Takeaways from Khan Academy

One thing I was struggling with was to intuit the dimensions of the target matrix.

Let’s assume two matrixes: \(A = (m \times n)\) and \(B = (n \times k)\)

This picture from Khan Academy sums it all up for me:

Illustration by Khan Academy CC BY-NC-SA 3.0 US
Note: All Khan Academy content is available for free at (www.khanacademy.org)“

Therefore, a matrix multiplication is defined if the number of columns of matrix \(A\) matches the number of rows of matrix \(B\).

The resulting matrix \(C = A \times B\) has the same number of rows as matrix \(A\) and the same number of columns as matrix \(B\).

Implementing Matrix Multiplication in Python from scratch

Once that was done, I decided to implement matrix multiplication in python. I found this tutorial which provided me with the task and some guidance along the way, especially on a few things in python.

As a starting point, here are 2 matrixes that we want to multiply (example from tutorial sightly adjusted):

import numpy as np
np.random.seed(27)
A = np.random.randint(1,10,size = (4,3))
B = np.random.randint(1,10,size = (3,2))
print(f"Matrix A:\n {A}\n")
print(f"Matrix B:\n {B}\n")

Matrix A:
 [[4 9 9]
 [9 1 6]
 [9 2 3]
 [2 2 5]]

Matrix B:
 [[7 4]
 [4 1]
 [6 4]]

This is the final result, we want to re-implement from scratch:

A@B

array([[118,  61],
       [103,  61],
       [ 89,  50],
       [ 52,  30]])

Indexing in Python

Maybe this is too obvious for many, but I find it worth noting, that the sequence in which python addresses arrays (or tensors) is first by row, than by column. What do I mean by saying that?

When you want to index into an array, you do this by array_name[row:column], for example A[1,2] return 6, it is the second line (which is index 1 when starting to count at 0), and the third column (which is index 1 when starting to count at 0):

A[1,2]

Is there a way to not only remember this, but to also understand this? Yes, I think so: The most basic array (tensor) is a list (rank 1 tensor), which we can think of as one row of numbers. Therefore, the first index represents the row. You can think of a 2-dimensional array (a rank 2 tensor) as adding the columns to a row of numbers (by adding more rows), therefore the second index represents the columns. Hence to access an element in a 2D-array (rank-2 tensor), this is done by array_name[row:column].

Why do we think about indexing? First, to determine if a matrix multiplication is defined, we need to find the dimensions of the matrixes, and later on we need to access the matrix content for the calculation.

To access a complete row or column, we use:

For a row: array_name[row, : ] or the short form array_name[row]
For a column: array_name[ : ,column]

This means: We access a specific row or column by index, and from the other dimension, we access all elements. For example:

# accessing the first row of matrix A

A[0] #same as A[0,:]

array([4, 9, 9])

# accessing the first column of matrix B

B[:,0]

array([7, 4, 6])

Constructing a target matrix of zeros

The \(C\) target matrix has the same number of rows as A and the same number of columns of B, so in our example that is a matrix with 4 rows and 2 columns:

np.zeros((4, 2), dtype = int)

array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])

The number of rows is the length of a column, therefore, to get the number of rows of matrix A, we can write:

len(A[:,0]) #i.e. the length of the first column

Similarly, the number of elements in a row if the number of columns, Therefore, the number of columns of B is:

len(B[0]) #the number of entries in the first row

While to above is correct, there is a more elegant way to write this. Each array (tensor) has an attribute .shape which tells us how many rows and columns an array has (notice the sequence in the tuple: (row,column)):

print(A.shape)
print(B.shape)

(4, 3)
(3, 2)

Therefore, we can re-write:

print(f'Number of rows in matrix A: {A.shape[0]}') 
print(f'Number of columns in matrix B: {B.shape[1]}')

Number of rows in matrix A: 4
Number of columns in matrix B: 2

Now we can generically construct the target matrix \(C\):

C = np.zeros((A.shape[0], B.shape[1]), dtype = int)
C.shape

(4, 2)

Exercise: Implement Matrix Multiplication with numpy arrays

Implement a function multiply_matrix(A,B) which does the following:

Accept two matrices, A and B, as inputs.
Check if matrix multiplication between A and B is valid, if not raise an error.
If valid, multiply the two matrices A and B, and return the product matrix C.

def multiply_matrix(A,B):
    
    if A.shape[1] != B.shape[0]:
        raise ValueError('Number of columns of A and number of rows of B do not match')
    
    C = np.zeros((A.shape[0], B.shape[1]), dtype=int)

    for row in range(C.shape[0]):
        for column in range(C.shape[1]):
            for step in range(A.shape[1]):
                C[row, column] += A[row, step] * B[step, column]
    
    return C

C1 = multiply_matrix(A, B)
C1

array([[118,  61],
       [103,  61],
       [ 89,  50],
       [ 52,  30]])

C2 = A@B
assert np.array_equal(C1, C2)

Exercise: Implement Matrix Multiplication with tensors

Just for the fun of it, let’s re-implement the same with pytorch tensors. It turns out it same, same, but a little different:

import torch

torch.manual_seed(27) #https://pytorch.org/docs/stable/notes/randomness.html
X = torch.randint(1,10,size = (4,3)) #https://pytorch.org/docs/stable/generated/torch.randint.html
Y = torch.randint(1,10,size = (3,2))
print(f"Matrix X:\n {X}\n")
print(f"Matrix Y:\n {Y}\n")

Matrix X:
 tensor([[1, 1, 5],
        [8, 8, 6],
        [1, 7, 1],
        [4, 4, 1]])

Matrix Y:
 tensor([[7, 6],
        [3, 7],
        [9, 5]])

Z = torch.zeros((4, 2), dtype = int)
Z

tensor([[0, 0],
        [0, 0],
        [0, 0],
        [0, 0]])

Z.shape

torch.Size([4, 2])

def multiply_matrix_torch(A,B):
    
    if A.shape[1] != B.shape[0]:
        raise ValueError('Number of columns of A and number of rows of B do not match')
    
    C = torch.zeros((A.shape[0], B.shape[1]), dtype=int)

    for row in range(C.shape[0]):
        for column in range(C.shape[1]):
            for step in range(A.shape[1]):
                C[row, column] += A[row, step] * B[step, column]
    
    return C

Z1 = multiply_matrix_torch(X, Y)
Z1

tensor([[ 55,  38],
        [134, 134],
        [ 37,  60],
        [ 49,  57]])

Z2 = X@Y
assert torch.equal(Z1, Z2) == True

That concludes the “exploration” of matrix multiplication, I learned a lot along the way :).