Shortest solutions for CS231n 2021-2024
Convolutional Neural Networks for Visual Recognition
Stanford - Spring 2021-2023
These are my solutions for the CS231n course assignments offered by Stanford University (Spring 2021). The solutions also work for later years, such as 2022 and 2023. Inline questions are explained in detail, and the code is brief and commented (see the examples below). From what I have investigated, these should be the shortest code solutions (excluding open-ended challenges). In assignment 2, a DenseNet is used in the PyTorch notebook and a ResNet in the TensorFlow notebook.
Check out the solutions for CS224n as well. They contain more comprehensive explanations than most others.
It is advised to run in Colab; however, you can also run locally. To do so, first set up your environment, either through conda or venv, preferably installing PyTorch with GPU acceleration in advance. Then follow these steps:
1. Install the required packages:

pip install -r requirements.txt

2. Change the dataset-download cell in the .ipynb notebooks to:

%cd cs231n/datasets/
!bash get_datasets.sh
%cd ../../

3. Change the Cython extension setup cell (in the notebooks that have one) to:

%cd cs231n
!python setup.py build_ext --inplace
%cd ..
I've gathered the requirements for all 3 assignments into a single requirements.txt file, so there is no need to separately install the requirements specified under each assignment folder. If you plan to complete TensorFlow.ipynb, you also need to install TensorFlow.
Note: to use MPS acceleration on Apple M1, see the comment in #4.
It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not, strictly speaking, differentiable.
Your Answer:

The SVM loss is built from $\max$ terms, so in the simplest 1D case we can define it as follows ($\hat{y}$ - score, $i$ - any class, $c$ - correct class, $\Delta$ - margin):
$$f(x)=\max(0, x),\ \text{ where } x=\hat{y}_i-\hat{y}_c+\Delta$$
Let's now see how our $\max$ function fits into the definition of the gradient. This is the formula we use for computing the gradient numerically when, instead of taking the limit as $h$ approaches $0$, we choose some arbitrarily small $h$:
$$\frac{df(x)}{dx}=\lim_{h \to 0}\frac{\max(0,x+h)-\max(0,x)}{h}$$
Now we can talk about the possible mismatches between numeric and analytic gradient computation:

1. Precision: analytic computation produces an exact result (as precise as computation precision allows), while the numeric solution only approximates the result, because it uses a small finite $h$ instead of the limit at $0$.
2. Kinks: $\max(0, x)$ is not differentiable at $x=0$, which gives a simple 1D example of a numeric gradient failure. Take $x=-10^{-9}$, for which the analytic gradient is $0$. However, if we choose our $h=10^{-8}$, then the numeric computation would yield $\frac{\max(0,\,-10^{-9}+10^{-8})-0}{10^{-8}}=0.9$.
3. Margin: there is always some chance for the score difference $\hat{y}_i-\hat{y}_c$ to land around $0$ (when $i\ne c$). But that still means, if we add $\Delta$, there is the same chance for $x$ to result on the edge: the non-differentiable point merely shifts to $\hat{y}_i-\hat{y}_c=-\Delta$, so changing the margin does not change how often this happens.

Such occasional mismatches at kinks are expected and not a reason for concern.
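To make the kink case concrete, here is a minimal sketch in plain Python (illustrative values only, matching the numbers above) showing the one-sided numeric check disagreeing with the analytic gradient:

```python
f = lambda x: max(0.0, x)  # the 1D hinge term from above

x = -1e-9  # just left of the kink, so the analytic gradient is 0
h = 1e-8   # finite step used by the numeric check

analytic = 0.0
numeric = (f(x + h) - f(x)) / h  # (max(0, 9e-9) - 0) / 1e-8

print(analytic, numeric)  # 0.0 0.9 -- the gradient check "fails" at the kink
```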
def conv_forward_naive(x, w, b, conv_param):
"""A naive implementation of the forward pass for a convolutional layer.
The input consists of N data points, each with C channels, height H and
width W. We convolve each input with F different filters, where each filter
spans all C channels and has height HH and width WW.
Input:
- x: Input data of shape (N, C, H, W)
- w: Filter weights of shape (F, C, HH, WW)
- b: Biases, of shape (F,)
- conv_param: A dictionary with the following keys:
- 'stride': The number of pixels between adjacent receptive fields in the
horizontal and vertical directions.
- 'pad': The number of pixels that will be used to zero-pad the input.
During padding, 'pad' zeros should be placed symmetrically (i.e., equally on both sides)
along the height and width axes of the input. Be careful not to modify the original
input x directly.
Returns a tuple of:
- out: Output data, of shape (N, F, H', W') where H' and W' are given by
H' = 1 + (H + 2 * pad - HH) / stride
W' = 1 + (W + 2 * pad - WW) / stride
- cache: (x, w, b, conv_param)
"""
out = None
###########################################################################
# TODO: Implement the convolutional forward pass. #
# Hint: you can use the function np.pad for padding. #
###########################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
P1 = P2 = P3 = P4 = conv_param['pad'] # padding: up = right = down = left
S1 = S2 = conv_param['stride'] # stride: vertical = horizontal
N, C, HI, WI = x.shape # input dims
F, _, HF, WF = w.shape # filter dims
HO = 1 + (HI + P1 + P3 - HF) // S1 # output height
WO = 1 + (WI + P2 + P4 - WF) // S2 # output width
# Helper function (warning: numpy version 1.20 or above is required for usage)
to_fields = lambda x: np.lib.stride_tricks.sliding_window_view(x, (WF,HF,C,N))
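    # to_fields creates a zero-copy strided view of every sliding window; with
    # the transposes and the [..., ::S1, ::S2] step below, this is an im2col-style
    # trick that turns the whole convolution into a single matrix multiplication.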
w_row = w.reshape(F, -1) # weights as rows
x_pad = np.pad(x, ((0,0), (0,0), (P1, P3), (P2, P4)), 'constant') # padded inputs
x_col = to_fields(x_pad.T).T[...,::S1,::S2].reshape(N, C*HF*WF, -1) # inputs as cols
out = (w_row @ x_col).reshape(N, F, HO, WO) + np.expand_dims(b, axis=(2,1))
x = x_pad # we will use padded version as well during backpropagation
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
###########################################################################
# END OF YOUR CODE #
###########################################################################
cache = (x, w, b, conv_param)
return out, cache
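As a quick sanity check, here is a minimal usage sketch (hypothetical sizes; assumes numpy is imported as np alongside the function above):

```python
import numpy as np

x = np.random.randn(2, 3, 8, 8)  # 2 images, 3 channels, 8x8 pixels
w = np.random.randn(4, 3, 3, 3)  # 4 filters spanning all 3 channels, 3x3 each
b = np.random.randn(4)           # one bias per filter

out, cache = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
print(out.shape)  # (2, 4, 8, 8): H' = 1 + (8 + 2*1 - 3) // 1 = 8, same for W'
```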