Define a function dot_product(v, w) that takes two equal-length lists of numbers v and w and returns their dot product.

Criteria

The dot product is defined as the sum of the products of the corresponding elements of the two sequences. For example, the dot product of [1, 2, 3] and [4, 5, 6] is 1*4 + 2*5 + 3*6 = 32.
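One possible sketch of this definition in plain Python is shown here. The explicit length check and the `ValueError` are assumptions added for illustration; the task statement only says the inputs have equal length.

```python
def dot_product(v, w):
    """Return the sum of the products of corresponding elements of v and w."""
    # Assumption: reject mismatched lengths instead of silently truncating,
    # which is what zip() would otherwise do.
    if len(v) != len(w):
        raise ValueError("v and w must have the same length")
    return sum(a * b for a, b in zip(v, w))


print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

Using `zip` pairs up corresponding elements directly, so no index arithmetic is needed.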

Define a function matrix_multiply(A, B) that takes two matrices A and B and returns their product.

Criteria

The matrices A and B are represented as lists of lists, where each inner list represents a row of the matrix. The number of columns in A must equal the number of rows in B. The resulting matrix should have dimensions equal to the number of rows in A and the number of columns in B.
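A minimal sketch of the row-by-column rule, assuming non-empty rectangular matrices given as lists of rows. The dimension check and its error message are illustrative choices, not part of the task statement.

```python
def matrix_multiply(A, B):
    """Return the product of A (n x m) and B (m x p) as an n x p list of lists."""
    # Assumption: both matrices are non-empty and rectangular.
    if not A or not B or len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # zip(*B) yields the columns of B, so each entry of the result is the
    # dot product of one row of A with one column of B.
    columns_of_B = list(zip(*B))
    return [
        [sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
        for row in A
    ]


print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The result has one inner list per row of A, and each inner list has one entry per column of B.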

Define a function transpose(M) that takes a matrix M and returns its transpose.

Criteria

The matrix M is represented as a list of lists, where each inner list represents a row of the matrix. The transpose of M is obtained by swapping its rows and columns.
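Swapping rows and columns can be sketched with `zip`, assuming M is rectangular (every row has the same length). That assumption is not stated in the task; with ragged rows, `zip` would silently drop the extra elements.

```python
def transpose(M):
    """Return the transpose of M, where M is a list of rows."""
    # zip(*M) groups the i-th elements of every row, i.e. the i-th column.
    # Each column tuple becomes a row of the result.
    return [list(column) for column in zip(*M)]


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```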

The softmax function is defined as follows on vector inputs:

\[\softmax{\vect{s},\alpha} = \vect{p} \text{, with } p_i = \frac{e^{\alpha s_i}}{\sum_{j} e^{\alpha s_j}}\]

where \(\vect{s}\) is a vector of scores and \(\alpha\) is the softmax parameter (see footnote 1).

Implement the softmax function in Python using numpy.

Notes
  • In most library implementations the temperature parameter (equivalently, \(\alpha\)) is not handled by the softmax function itself. So write a “bare” softmax function without \(\alpha\) and handle the temperature in your calls by scaling the scores before passing them in.

  • To improve numerical stability, it is common to shift the logits (the scores) by subtracting the maximum logit value from each logit before applying the exponential function. The shift does not change the result, because the common factor cancels between numerator and denominator, and it helps prevent overflow when dealing with large logit values. Once you are done with your solution, integrate this functionality as well.

Library
from scipy.special import softmax
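The notes above can be sketched as follows for a one-dimensional input. This is one possible solution, not a reference implementation: the variable names `scores` and `alpha` in the usage lines are illustrative, and the function deliberately takes no \(\alpha\) argument, so the scaling happens in the call.

```python
import numpy as np


def softmax(s):
    """Bare softmax of a 1-D vector of scores, with the max-shift for stability."""
    s = np.asarray(s, dtype=float)
    # Subtracting the maximum leaves the result unchanged but keeps every
    # exponent <= 0, so np.exp cannot overflow.
    shifted = s - np.max(s)
    exp = np.exp(shifted)
    return exp / np.sum(exp)


# Handling the softmax parameter in the call (alpha = 1 / temperature).
scores = np.array([1.0, 2.0, 3.0])
alpha = 2.0
p = softmax(alpha * scores)
print(p, p.sum())
```

Without the shift, an input such as `[1000.0, 1000.0]` would overflow in `np.exp`; with it, the result is the expected uniform distribution.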

Extend your previous implementation of the softmax function to handle multi-dimensional inputs. Your function should take an additional argument axis that specifies the axis along which to compute the softmax. If the user does not provide an axis, compute the softmax over the entire input. Remember that axis=-1 means the last axis for numpy arrays.
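A hedged sketch of the multi-dimensional extension, assuming `axis=None` is used as the default to mean "over the entire input". The key idea is `keepdims=True`, which keeps the reduced axis with size 1 so that the maximum and the sum broadcast back against the input.

```python
import numpy as np


def softmax(x, axis=None):
    """Softmax along `axis`; with axis=None, over all elements of x."""
    x = np.asarray(x, dtype=float)
    # keepdims=True preserves the number of dimensions, so the subtraction
    # and division broadcast correctly for any axis (including None).
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


x = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])
print(softmax(x, axis=-1).sum(axis=-1))  # each row sums to 1
print(softmax(x).sum())                  # the whole array sums to 1
```

In every case the output has the same shape as the input; only the set of elements that are normalized together changes with `axis`.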

  1. In some contexts the inverse of the softmax parameter \(\alpha\) is used and it is called the “temperature” parameter \(\tau = 1/\alpha\).