Since matrix multiplication is a big thing for deep learning, and visualizations like http://matrixmultiplication.xyz/ were a bit too fast for me to properly re-understand what I learned in high school, I decided to dive in more systematically without falling into the trap of learning lots of math before continuing with deep learning. This will be short and sweet:
Therefore, a matrix multiplication is defined if the number of columns of matrix \(A\) matches the number of rows of matrix \(B\).
The resulting matrix \(C = A \times B\) has the same number of rows as matrix \(A\) and the same number of columns as matrix \(B\).
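As a minimal sketch of this rule, the check and the resulting shape can be written in a few lines. The helper name `product_shape` is made up for illustration and is not from the tutorial; shapes are assumed to be (rows, columns) tuples.

```python
import numpy as np


def product_shape(a_shape, b_shape):
    """Return the shape of A x B, or raise if the product is undefined.

    Hypothetical helper for illustration; shapes are (rows, columns) tuples.
    """
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    if a_cols != b_rows:
        raise ValueError(
            f"undefined: A has {a_cols} columns but B has {b_rows} rows"
        )
    return (a_rows, b_cols)


# A 4x3 matrix times a 3x2 matrix gives a 4x2 matrix.
print(product_shape((4, 3), (3, 2)))  # (4, 2)

# NumPy's built-in matrix product follows the same rule.
print((np.zeros((4, 3)) @ np.zeros((3, 2))).shape)  # (4, 2)
```

Swapping the operands, `product_shape((4, 3), (4, 3))`, raises a `ValueError` because 3 columns do not match 4 rows.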
Implementing Matrix Multiplication in Python from scratch
Once that was done, I decided to implement matrix multiplication in Python. I found this tutorial which provided me with the task and some guidance along the way, especially on a few things in Python.
As a starting point, here are 2 matrices that we want to multiply (example from the tutorial, slightly adjusted):
Maybe this is too obvious for many, but I find it worth noting that Python addresses arrays (or tensors) first by row, then by column. What do I mean by that?
When you want to index into an array, you write array_name[row, column]. For example, A[1,2] returns 6: it is the second row (index 1 when counting from 0) and the third column (index 2 when counting from 0):
A[1,2]
6
Is there a way to not only remember this, but to also understand it? Yes, I think so: The most basic array (tensor) is a list (a rank-1 tensor), which we can think of as one row of numbers. A 2-dimensional array (a rank-2 tensor) is a list of such rows, so the first index selects a row and the second index selects the column within that row. Hence an element of a 2D array (rank-2 tensor) is accessed by array_name[row, column].
Why do we think about indexing? First, to determine if a matrix multiplication is defined, we need to find the dimensions of the matrices, and later on we need to access the matrix content for the calculation.
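To make the row-then-column order concrete, here is a small sketch with placeholder values. The arrays `row` and `grid` are invented for illustration and are not the matrices from the tutorial.

```python
import numpy as np

# Placeholder values, not the matrices from the tutorial.
row = np.array([1, 2, 3])        # rank 1: a single row of numbers
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])     # rank 2: two rows stacked on top of each other

print(row[2])      # 3       -> one index picks an element of the row
print(grid[1])     # [4 5 6] -> the first index picks a whole row
print(grid[1, 2])  # 6       -> the second index picks the column within that row
print(grid.ndim)   # 2       -> number of dimensions (the rank)
```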
To access a complete row or column, we use:
For a row: array_name[row, : ] or the short form array_name[row]
For a column: array_name[ : ,column]
This means: We access a specific row or column by index, and from the other dimension, we access all elements. For example:
# accessing the first row of matrix A
A[0]  # same as A[0,:]
array([4, 9, 9])
# accessing the first column of matrix B
B[:,0]
array([7, 4, 6])
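Rows and columns matter because each entry of the product is the sum of element-wise products of one row of A and one column of B. As a small sketch, using only the two vectors printed above (the variable names are my own), the top-left entry of the result can be computed like this:

```python
import numpy as np

first_row_of_A = np.array([4, 9, 9])  # A[0], as printed above
first_col_of_B = np.array([7, 4, 6])  # B[:, 0], as printed above

# Multiply matching positions and add everything up: 4*7 + 9*4 + 9*6
c_00 = 0
for a, b in zip(first_row_of_A, first_col_of_B):
    c_00 += a * b
print(c_00)  # 118

# np.dot performs the same multiply-and-sum in one call.
print(np.dot(first_row_of_A, first_col_of_B))  # 118
```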
Constructing a target matrix of zeros
The target matrix \(C\) has the same number of rows as A and the same number of columns as B, so in our example that is a matrix with 4 rows and 2 columns:
np.zeros((4, 2), dtype=int)
array([[0, 0],
[0, 0],
[0, 0],
[0, 0]])
The number of rows is the length of a column. Therefore, to get the number of rows of matrix A, we can write:
len(A[:,0]) #i.e. the length of the first column
4
Similarly, the number of elements in a row is the number of columns. Therefore, the number of columns of B is:
len(B[0]) #the number of entries in the first row
2
While the above is correct, there is a more elegant way to write this. Each array (tensor) has an attribute .shape which tells us how many rows and columns the array has (notice the order in the tuple: (row, column)):
print(A.shape)
print(B.shape)
(4, 3)
(3, 2)
Therefore, we can re-write:
print(f'Number of rows in matrix A: {A.shape[0]}')
print(f'Number of columns in matrix B: {B.shape[1]}')
Number of rows in matrix A: 4
Number of columns in matrix B: 2
Now we can generically construct the target matrix \(C\):
C = np.zeros((A.shape[0], B.shape[1]), dtype=int)
C.shape
(4, 2)
Exercise: Implement Matrix Multiplication with numpy arrays
Implement a function multiply_matrix(A,B) which does the following:
Accept two matrices, A and B, as inputs.
Check if matrix multiplication between A and B is valid; if not, raise an error.
If valid, multiply the two matrices A and B, and return the product matrix C.
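def multiply_matrix(A, B):
    # The product is only defined if the columns of A match the rows of B.
    if A.shape[1] != B.shape[0]:
        raise ValueError('Number of columns of A and number of rows of B do not match')
    # Target matrix: rows of A x columns of B, filled with zeros.
    C = np.zeros((A.shape[0], B.shape[1]), dtype=int)
    for row in range(C.shape[0]):
        for column in range(C.shape[1]):
            for step in range(A.shape[1]):
                # Accumulate the products along one row of A and one column of B.
                C[row, column] += A[row, step] * B[step, column]
    return C

C1 = multiply_matrix(A, B)
C1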
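One way to sanity-check a hand-written implementation is to compare it against NumPy's built-in `@` operator. The sketch below restates the triple loop under the hypothetical name `naive_matmul` so that it runs on its own, and uses random placeholder matrices `P` and `Q` rather than the tutorial's A and B.

```python
import numpy as np


def naive_matmul(A, B):
    """Triple-loop matrix product, restated here so the check is self-contained."""
    if A.shape[1] != B.shape[0]:
        raise ValueError('Number of columns of A and number of rows of B do not match')
    C = np.zeros((A.shape[0], B.shape[1]), dtype=int)
    for row in range(C.shape[0]):
        for column in range(C.shape[1]):
            for step in range(A.shape[1]):
                C[row, column] += A[row, step] * B[step, column]
    return C


# Random integer placeholders with the same shapes as in the example (4x3 and 3x2).
rng = np.random.default_rng(0)
P = rng.integers(0, 10, size=(4, 3))
Q = rng.integers(0, 10, size=(3, 2))

# The loop result should match NumPy's matrix product exactly.
print(np.array_equal(naive_matmul(P, Q), P @ Q))  # True

# Mismatched shapes (3x2 times 3x2) should trigger the error.
try:
    naive_matmul(Q, Q)
except ValueError as err:
    print(err)
```

Because the inputs are integers, an exact comparison with `np.array_equal` is enough; for floating-point matrices `np.allclose` would be the safer choice.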
import torch

# Same algorithm with PyTorch tensors; X and Y are assumed to be 2-D tensors defined beforehand.
def multiply_matrix_torch(A, B):
    if A.shape[1] != B.shape[0]:
        raise ValueError('Number of columns of A and number of rows of B do not match')
    C = torch.zeros((A.shape[0], B.shape[1]), dtype=int)
    for row in range(C.shape[0]):
        for column in range(C.shape[1]):
            for step in range(A.shape[1]):
                C[row, column] += A[row, step] * B[step, column]
    return C

Z1 = multiply_matrix_torch(X, Y)
Z1