Lesson 3 took me a while to work through, because it spawned quite a few interesting projects:
- Visualizing Gradient Descent in 3D
- Working on the Kaggle Titanic Competition
- Implementing a model for recognizing the digits of the MNIST dataset
Alongside these 3 main activities, I also started a blog about my machine learning journey, built with Quarto.
Let me summarize what I have done and learned.
Gradient Descent
The main focus of lesson 3 for me was gradient descent, which I found pretty easy to understand at a high level (a minimal code sketch follows the list):
- Calculate the predictions and the loss (forward pass)
- Calculate the gradients of the parameters, i.e. how changing each parameter changes the loss (backward pass)
- Update the parameters (using the learning rate)
- Don’t forget to reset the gradients
- Repeat
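Here is a minimal sketch of this loop in PyTorch, fitting a quadratic to some made-up noisy data (the model, the data and the learning rate are only illustrative, not taken from the lesson notebook):

```python
import torch

def f(x, params):
    a, b, c = params
    return a * x**2 + b * x + c

# made-up noisy data from a known quadratic, just for illustration
x = torch.linspace(-2, 2, 20)
y = 3 * x**2 + 2 * x + 1 + torch.randn(20) * 0.1

params = torch.randn(3, requires_grad=True)
lr = 1e-2

for step in range(100):
    preds = f(x, params)                # forward pass: predictions ...
    loss = ((preds - y) ** 2).mean()    # ... and the loss (mean squared error)
    loss.backward()                     # backward pass: gradients of the loss w.r.t. the parameters
    with torch.no_grad():
        params -= lr * params.grad      # update the parameters via the learning rate
    params.grad.zero_()                 # reset the gradients before the next iteration
```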
However, behind the simplicity of the actual implementation there is a lot of magic, which I tried to unpack for myself. Working through Jeremy’s notebook “How does a neural net really work?”, I tried not only to think through the concept, but also to visualize it. The result is available as
- a blog post
- a forum post
- a Kaggle notebook in which you can also easily play with the visualizations interactively (by copying and running the notebook)
- a GitHub notebook
I was excited and honored to read that Ben, a fellow Fast.AI student, created even better visualizations building on my work. I highly recommend playing with them and also checking out his other projects.
While I truly love the Fast.AI content, I also need to mention Andrej Karpathy’s great video “The spelled-out intro to neural networks and backpropagation: building micrograd”, which dives at least one level deeper. If there is one key takeaway from this video, it is this: “A single gradient tells us the effect changing a parameter has on the loss”. This insight is powerful, yet from my point of view it tends to get lost: either the model is so complex, with so many parameters, that the idea is difficult to grasp, or the model is seemingly simple and you mix up the slope of the quadratic with a gradient. Implementing the visualization of gradient descent was also about building an intuition for what is actually going on under the hood.
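That takeaway is easy to see with PyTorch’s autograd on a single made-up parameter (the numbers below are purely illustrative):

```python
import torch

# one parameter w, one data point (x, y), squared-error loss
w = torch.tensor(3.0, requires_grad=True)
x, y = torch.tensor(2.0), torch.tensor(10.0)

loss = (w * x - y) ** 2   # forward pass: prediction w*x = 6, loss (6 - 10)^2 = 16
loss.backward()           # backward pass: d(loss)/dw = 2 * (w*x - y) * x = -16

# the gradient tells us the effect changing w has on the loss:
# increasing w slightly changes the loss by about -16 times that change
print(w.grad)             # tensor(-16.)
```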
The Titanic Competition
Inspired by the Excel-based version of the Titanic Competition, I decided to enter the Kaggle competition. As with many good projects, this resulted in a few other mini-projects:
- The logistics of how a Kaggle competition actually works, which included installing kaggle on my local machine. The Live-Coding Sessions of the 2022 Fast.AI course are probably quite underrated (at least looking at the number of views they get on YouTube). I find them a great addition to the official lessons because they tackle side problems like installing kaggle (in Live-Coding Session 7 and the related official topic in the forums) which would otherwise set you back some hours (or more). A big shout-out for these sessions!
- Revisiting matrix multiplication. Apart from the math, this was also about some Python basics for me. While implementing matrix multiplication from scratch has probably been done a million times, it still taught me some valuable lessons (see the sketch after this list).
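For reference, a naive from-scratch version looks roughly like this (the triple loop below is only one of many ways to write it, and certainly not the fastest):

```python
import torch

def matmul(a, b):
    """Naive matrix multiplication with explicit loops; illustrative, not fast."""
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):          # rows of a
        for j in range(bc):      # columns of b
            for k in range(ac):  # shared inner dimension
                c[i, j] += a[i, k] * b[k, j]
    return c

a, b = torch.randn(3, 4), torch.randn(4, 2)
assert torch.allclose(matmul(a, b), a @ b, atol=1e-5)
```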
In the actual Titanic Competition, I did not focus too much on submitting a perfect result, but rather aimed at revisiting and solidifying the topic of gradient descent by replicating the actual lesson content. I built the following 2 notebooks:
- The first one uses a Fast.AI tabular learner to create a baseline while getting to know the data (a minimal sketch follows this list).
- Next, I re-implemented the Excel-based version from the video in Python in this notebook.
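For context, a tabular-learner baseline like the one in the first notebook can be set up in just a few lines. The column names come from the Kaggle training csv; the split into categorical/continuous features and the number of epochs below are just one reasonable configuration, not necessarily the one I used:

```python
import pandas as pd
from fastai.tabular.all import *

df = pd.read_csv('train.csv')   # the Kaggle Titanic training data

dls = TabularDataLoaders.from_df(
    df,
    y_names='Survived', y_block=CategoryBlock,     # binary classification target
    cat_names=['Pclass', 'Sex', 'Embarked'],
    cont_names=['Age', 'SibSp', 'Parch', 'Fare'],
    procs=[Categorify, FillMissing, Normalize],
)

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(5)
```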
While my final score of 77.2% is far from perfect, I decided to come back to this competition another time and focus more on the competition itself, not just on gradient descent (as I did this time).
MNIST, the ‘Hello World’ of Computer Vision
As Jeremy points out at the end of lesson 3, this lesson corresponds to chapter 4 of the book. Indeed, it covers very similar topics, but the example used is the light version of the MNIST dataset (which only contains 3s and 7s). Following the recommendation for further research, I implemented a model for the complete MNIST dataset. As predicted, this was a significant project that took quite a bit of time to complete, and I needed to do some of my own research to figure out how to overcome some obstacles along the way.
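The step from the book’s “3s vs 7s” model to all ten digits is conceptually small: ten output activations and a multi-class loss instead of a single sigmoid output. A minimal sketch of that kind of model (the architecture and hyperparameters are illustrative, not necessarily what I ended up with):

```python
import torch
from torch import nn

# chapter-4-style model, extended from "3 vs 7" to all ten digits
model = nn.Sequential(
    nn.Flatten(),             # 28x28 images -> 784-dimensional vectors
    nn.Linear(28 * 28, 30),
    nn.ReLU(),
    nn.Linear(30, 10),        # one activation per digit class
)
loss_func = nn.CrossEntropyLoss()   # multi-class loss instead of a single sigmoid
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(xb, yb):
    """One gradient descent step on a batch of images xb and digit labels yb."""
    preds = model(xb)
    loss = loss_func(preds, yb)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()
```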
After all the previous activities around gradient descent, the actual mechanics of what needed to be done were not too difficult. Nonetheless, I found the competition hard because of the technicalities of the Python implementation. Put differently, I think I could have easily written a good specification of how to solve the MNIST competition, but actually implementing it is a different thing.
Seemingly simple tasks like converting the csv files to images, converting a PIL image to a Fast.AI PILImage, or getting the tensors into the right shape took me some time to implement in Python. I am still struggling with Python as a language, but I am seeing good progress, and the only way to improve is to keep coding.
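As an example of the kind of plumbing involved, here is roughly how one row of the digit csv can become an image and then a tensor of the right shape (the helper and the random stand-in data are mine, and the details may differ from my notebook):

```python
import numpy as np
import torch
from PIL import Image
from fastai.vision.all import PILImage

def row_to_image(pixels):
    """Turn the 784 pixel values of one csv row into a 28x28 grayscale PIL image."""
    arr = np.array(pixels, dtype=np.uint8).reshape(28, 28)
    return Image.fromarray(arr, mode='L')

pixels = np.random.randint(0, 256, size=784)   # stand-in for a real csv row (label already stripped)

img = row_to_image(pixels)
fimg = PILImage.create(img)                    # wrap the PIL image as a fastai PILImage
x = torch.tensor(np.array(fimg), dtype=torch.float32) / 255.0   # scale pixels to [0, 1]
x = x.view(-1, 28 * 28)                        # flatten to shape (1, 784) for a linear layer
```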
Wrapping up Lesson 3
While I could improve the results of my projects, both for Titanic and MNIST, it feels like that would be over-optimizing. I did not enter the competitions to win, but to learn about gradient descent. Having spent the last 8 weeks on my lesson 3 projects (and allowed myself to get somewhat side-tracked along the way), I feel it is time to move on to the next lesson. I am looking forward to the next challenging projects!