Linear Algebra
Define a function dot_product(v, w) that takes two equal-length lists of
numbers v and w and returns their dot product.
Criteria
The dot product is defined as the sum of the products of the corresponding
elements of the two sequences. For example, the dot product of [1, 2, 3]
and [4, 5, 6] is 1*4 + 2*5 + 3*6 = 32.
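One possible way to write this, shown as a minimal sketch rather than the reference solution. The length check that raises ValueError is an assumption added for illustration; the exercise only states that the inputs have equal length.

```python
def dot_product(v, w):
    """Return the sum of the products of corresponding elements of v and w."""
    # Assumption: reject mismatched lengths explicitly instead of letting
    # zip() silently truncate to the shorter list.
    if len(v) != len(w):
        raise ValueError("v and w must have the same length")
    return sum(a * b for a, b in zip(v, w))


print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```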
Define a function matrix_multiply(A, B) that takes two matrices A and B
and returns their product.
Criteria
The matrices A and B are represented as lists of lists, where each inner
list represents a row of the matrix. The number of columns in A must equal
the number of rows in B. The resulting matrix should have as many rows as A
and as many columns as B.
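The shape rule above can be sketched in plain Python as follows. This is one possible approach, assuming non-empty rectangular matrices; the ValueError on mismatched shapes is an illustrative choice, not something the exercise prescribes.

```python
def matrix_multiply(A, B):
    """Multiply matrices A (n x m) and B (m x p), both given as lists of rows."""
    # Assumption: both matrices are non-empty and rectangular.
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("number of columns in A must equal number of rows in B")
    inner = len(B)        # shared dimension m
    n_cols = len(B[0])    # p, the number of columns in the result
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(n_cols)]
        for i in range(len(A))
    ]


print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A 2 x 3 matrix times a 3 x 1 matrix gives a 2 x 1 result, for example matrix_multiply([[1, 2, 3], [4, 5, 6]], [[1], [0], [1]]) returns [[4], [10]].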
Define a function transpose(M) that takes a matrix M and returns its
transpose.
Criteria
The matrix M is represented as a list of lists, where each inner list
represents a row of the matrix. The transpose of M is obtained by
swapping its rows and columns.
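A compact sketch of the row/column swap, assuming M is rectangular (every row has the same length). With ragged rows, zip would silently truncate to the shortest row.

```python
def transpose(M):
    """Return the transpose of M, where M is a list of rows."""
    # zip(*M) groups the i-th element of every row, i.e. it yields the columns.
    return [list(column) for column in zip(*M)]


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```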
The softmax function is defined as follows on vector inputs:
\[\softmax{\vect{s},\alpha} = \vect{p} \text{, with } \vect{p}_i = \frac{e^{\alpha s_i}}{\sum_{j} e^{\alpha s_j}}\]
where \(\vect{s}\) is a vector of scores and \(\alpha\) is the softmax parameter.[1]
Implement the softmax function in Python using numpy.
Notes
- In library implementations the temperature parameter is usually not handled by the softmax function itself. So write a “bare” softmax function and apply the temperature in your calls, by scaling the scores before passing them in.
- To improve numerical stability, it is common to shift the logits by subtracting the maximum logit value from each logit before applying the exponential function. The result is unchanged, because the common factor cancels between numerator and denominator, but the largest exponent becomes 0, which prevents overflow for large logit values. Once you are done with your solution, integrate this functionality as well.
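The two notes above could be combined roughly as in the sketch below: a bare softmax for 1-D input with the maximum subtracted, and the parameter applied at the call site. The values of scores and alpha are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(s):
    """Bare softmax of a 1-D vector of scores (no temperature argument)."""
    s = np.asarray(s, dtype=float)
    # Subtracting the maximum leaves the ratio unchanged but makes the
    # largest exponent 0, so np.exp cannot overflow.
    e = np.exp(s - np.max(s))
    return e / np.sum(e)


scores = np.array([1.0, 2.0, 3.0])  # hypothetical scores
alpha = 2.0                         # hypothetical softmax parameter (tau = 1 / alpha)
p = softmax(alpha * scores)         # the parameter is handled in the call
print(p, p.sum())
```

Without the shift, an input such as [1000.0, 1000.0] would overflow in np.exp; with it, the function returns [0.5, 0.5].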
Library
from scipy.special import softmax
Extend your previous implementation of the softmax function to handle
multi-dimensional inputs. Your function should take an additional argument
axis that specifies the axis along which to compute the softmax. If the user
does not provide an axis, compute the softmax over the entire input. Remember
that axis=-1 means the last axis for numpy arrays.
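One way the axis argument might be added is sketched below. Defaulting to axis=None (softmax over the entire input) follows the requirement stated above; using keepdims=True so that the maximum and the sum broadcast back against the input is an implementation choice, not something the exercise mandates.

```python
import numpy as np


def softmax(x, axis=None):
    """Numerically stable softmax along `axis`; axis=None uses the whole array."""
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the reduced axis with length 1, so the result
    # broadcasts correctly against x for any choice of axis.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


x = np.array([[1.0, 2.0, 3.0],
              [1.0, 1.0, 1.0]])
print(softmax(x, axis=-1).sum(axis=-1))  # each row sums to 1
print(softmax(x).sum())                  # the whole array sums to 1
```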
[1] In some contexts the inverse of the softmax parameter \(\alpha\) is used and it is called the “temperature” parameter \(\tau = 1/\alpha\).