Prerequisite Review & Refresh
∑ Mathematics
Deep learning is applied linear algebra and calculus with a probabilistic flavor. A mathematician's depth is not required, but these ideas should feel familiar so the course can move quickly from notation to networks.
The multiple-choice questions below cover the topic itself. Pick an answer, then check it against the answer and explanation that follow. If several are unclear, work through the review above first.
1. If A is 3x4 and B is 4x2, the product AB has shape:
- A. 4x4
- B. 3x2
- C. 2x3
- D. undefined
Correct: B. The inner dimensions (4) match, and the result takes the outer dimensions: 3x2.
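The shape rule can be checked with a minimal pure-Python sketch. The helper names `matmul` and `shape` and the all-ones matrices are illustrative assumptions, not part of the course material.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    inner = len(A[0])
    if inner != len(B):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def shape(M):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(M), len(M[0]))


A = [[1] * 4 for _ in range(3)]  # 3x4, values chosen arbitrarily
B = [[1] * 2 for _ in range(4)]  # 4x2, values chosen arbitrarily
C = matmul(A, B)
print(shape(C))  # (3, 2)
```

Swapping the arguments, `matmul(B, A)`, raises `ValueError` here because the inner dimensions (2 and 3) no longer match.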
2. Matrix multiplication is:
- A. commutative (AB = BA)
- B. generally not commutative
- C. only defined for square matrices
- D. the same as the elementwise product
Correct: B. In general AB does not equal BA; order matters, and the elementwise (Hadamard) product is a different operation.
3. The transpose of a product, (AB)^T, equals:
- A. A-transpose times B-transpose
- B. B-transpose times A-transpose
- C. AB
- D. BA
Correct: B. Transposing a product reverses the order: (AB)^T = B^T A^T.
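The order-sensitivity in questions 2 and 3 can be seen on a small example. This is a sketch with arbitrarily chosen 2x2 matrices; `matmul` and `transpose` are hypothetical helpers written for this illustration.

```python
def matmul(A, B):
    """Row-by-column product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


# Arbitrary example matrices (assumed for illustration only).
A = [[1, 2], [3, 4]]
B = [[0, 1], [2, 3]]

AB = matmul(A, B)  # [[4, 7], [8, 15]]
BA = matmul(B, A)  # [[3, 4], [11, 16]]
print(AB != BA)    # True: order matters

# (AB)^T matches B^T A^T, not A^T B^T.
print(transpose(AB) == matmul(transpose(B), transpose(A)))  # True
print(transpose(AB) == matmul(transpose(A), transpose(B)))  # False
```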
4. The dot product of two non-zero vectors is zero when they are:
- A. parallel
- B. of unit length
- C. orthogonal (perpendicular)
- D. equal
Correct: C. u . v = |u||v|cos(theta); it is zero when the angle is 90 degrees.
5. The L2 (Euclidean) norm of the vector (3, 4) is:
- A. 7
- B. 5
- C. 12
- D. 25
Correct: B. sqrt(3^2 + 4^2) = sqrt(25) = 5.
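Questions 4 and 5 can both be verified in a few lines. The vectors `u` and `v` below are an assumed example of a perpendicular pair; `dot` and `l2_norm` are illustrative helper names.

```python
import math


def dot(u, v):
    """Sum of elementwise products."""
    return sum(a * b for a, b in zip(u, v))


def l2_norm(v):
    """Euclidean length: the square root of the vector's dot product with itself."""
    return math.sqrt(dot(v, v))


u, v = (1, 2), (-2, 1)   # assumed example: v is u rotated by 90 degrees
print(dot(u, v))         # 0
print(l2_norm((3, 4)))   # 5.0
```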
6. For a fair six-sided die, the expected value of one roll is:
- A. 3
- B. 3.5
- C. 6
- D. 1
Correct: B. (1+2+3+4+5+6)/6 = 21/6 = 3.5.
7. Variance measures:
- A. the average value
- B. the most frequent value
- C. the spread around the mean
- D. the maximum value
Correct: C. Variance is the expected squared deviation from the mean.
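For the fair die from question 6, both quantities can be computed exactly. This sketch uses `fractions.Fraction` to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

# Expected value: probability-weighted average of the outcomes.
mean = sum(p * x for x in faces)

# Variance: expected squared deviation from the mean.
variance = sum(p * (x - mean) ** 2 for x in faces)

print(mean)      # 7/2
print(variance)  # 35/12
```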
8. Two events A and B are independent when:
- A. P(A and B) = P(A) + P(B)
- B. P(A and B) = P(A) P(B)
- C. they are mutually exclusive
- D. P(A given B) = 0
Correct: B. Independence means the joint probability factorizes into the product of the marginals.
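The difference between independent and mutually exclusive events is easy to confuse, so here is a small enumeration over two fair dice. The events are assumptions chosen for illustration, and `prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def first_is_even(o):
    return o[0] % 2 == 0


def sum_is_seven(o):
    return o[0] + o[1] == 7


def first_is_one(o):
    return o[0] == 1


def first_is_two(o):
    return o[0] == 2


# Independent pair: the joint probability equals the product of the marginals.
p_joint = prob(lambda o: first_is_even(o) and sum_is_seven(o))
independent = p_joint == prob(first_is_even) * prob(sum_is_seven)

# Mutually exclusive pair: the joint probability is 0, but the product is 1/36.
p_exclusive_joint = prob(lambda o: first_is_one(o) and first_is_two(o))
exclusive_independent = p_exclusive_joint == prob(first_is_one) * prob(first_is_two)

print(independent)            # True
print(exclusive_independent)  # False
```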
9. Bayes' rule writes P(A given B) in terms of:
- A. P(B given A), P(A), P(B)
- B. P(A) + P(B)
- C. P(A) P(B) only
- D. P(A - B)
Correct: A. P(A|B) = P(B|A) P(A) / P(B).
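A worked instance may help make the formula concrete. All numbers below are made up for illustration (a 1% prior, a test that fires 90% of the time when A holds and 5% of the time when it does not); P(B) is obtained from the law of total probability.

```python
from fractions import Fraction

# Hypothetical inputs, chosen only to illustrate the formula.
p_a = Fraction(1, 100)             # P(A): prior
p_b_given_a = Fraction(9, 10)      # P(B|A)
p_b_given_not_a = Fraction(5, 100) # P(B|not A)

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b)         # 2/13
print(float(p_a_given_b))  # roughly 0.154
```

Even with a fairly accurate test, the posterior stays low under these assumed numbers because the prior is small.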
10. The partial derivative of f(x, y) = x^2 y with respect to x treats:
- A. both x and y as variables
- B. y as a constant
- C. x as a constant
- D. f as a constant
Correct: B. A partial derivative w.r.t. x holds the other variables (y) constant, giving 2xy.
11. The gradient of a scalar function points in the direction of:
- A. steepest descent
- B. zero change
- C. steepest ascent
- D. the x-axis
Correct: C. The gradient points toward the greatest rate of increase; its negative is the steepest-descent direction.
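Questions 10 and 11 can be checked numerically for f(x, y) = x^2 y. This is a sketch using central finite differences; the evaluation point (1, 2), the step sizes, and the helper name `numerical_gradient` are assumptions made for the example.

```python
def f(x, y):
    return x ** 2 * y


def numerical_gradient(func, x, y, h=1e-5):
    """Central-difference estimate of (df/dx, df/dy) at (x, y)."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)  # y held constant
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)  # x held constant
    return df_dx, df_dy


x0, y0 = 1.0, 2.0  # assumed evaluation point
gx, gy = numerical_gradient(f, x0, y0)  # close to (2*x0*y0, x0**2) = (4, 1)

# A small step along the gradient raises f; a step against it lowers f.
step = 0.01
uphill = f(x0 + step * gx, y0 + step * gy)
downhill = f(x0 - step * gx, y0 - step * gy)
print(uphill > f(x0, y0) > downhill)  # True
```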
12. The chain rule gives the derivative of f(g(x)) as:
- A. f'(x) g'(x)
- B. f'(g(x)) times g'(x)
- C. f'(g(x))
- D. g'(f(x))
Correct: B. Differentiate the outer function at the inner value, times the derivative of the inner function.
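As a sanity check of the chain rule, the sketch below compares the analytic derivative of sin(x^2) with a finite-difference estimate. The choice of f(u) = sin(u), g(x) = x^2, and the point x = 1.5 are assumptions for illustration.

```python
import math


def g(x):
    """Inner function (assumed example)."""
    return x ** 2


def f(u):
    """Outer function (assumed example)."""
    return math.sin(u)


def chain_rule_derivative(x):
    """f'(g(x)) * g'(x) with f' = cos and g'(x) = 2x."""
    return math.cos(g(x)) * (2 * x)


def finite_difference(x, h=1e-5):
    """Central-difference estimate of d/dx f(g(x))."""
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)


x = 1.5
analytic = chain_rule_derivative(x)
numeric = finite_difference(x)
print(abs(analytic - numeric) < 1e-6)  # True
```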
13. To minimize a function, gradient descent moves a parameter:
- A. along the gradient
- B. opposite the gradient
- C. perpendicular to the gradient
- D. randomly
Correct: B. It steps in the negative-gradient direction, scaled by the learning rate.
14. If the learning rate is far too large, gradient descent typically:
- A. converges faster with no downside
- B. diverges or oscillates
- C. stops immediately
- D. ignores the gradient
Correct: B. Overshooting the minimum makes the loss oscillate or blow up.
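Questions 13 and 14 can be illustrated on the simplest possible loss, f(x) = x^2, whose gradient is 2x. The function, the starting point, and the two learning rates are assumptions picked to make the contrast visible; `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(x0, learning_rate, steps):
    """Minimize f(x) = x**2 and return every iterate, starting with x0."""
    x = x0
    history = [x]
    for _ in range(steps):
        grad = 2 * x                   # derivative of x**2
        x = x - learning_rate * grad   # step opposite the gradient
        history.append(x)
    return history


small = gradient_descent(1.0, learning_rate=0.1, steps=50)
large = gradient_descent(1.0, learning_rate=1.1, steps=50)

print(abs(small[-1]) < 1e-3)  # True: shrinks toward the minimum at 0
print(abs(large[-1]) > 1e3)   # True: magnitude grows with every step
print(large[:4])              # signs alternate as each step overshoots 0
```

For this particular function each update multiplies x by (1 - 2 * learning_rate), so any rate above 1.0 makes the iterates grow in magnitude while flipping sign.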
15. A key property of a convex function is that:
- A. it has many local minima
- B. any local minimum is also global
- C. its gradient is always zero
- D. it is always increasing
Correct: B. For a convex function every local minimum is also a global minimum, so a descent method cannot get trapped in a worse local minimum.
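To see why this property matters, the sketch below runs plain gradient descent from two starting points on a convex function and on a non-convex one. Both functions, the learning rates, and the helper `descend` are assumptions chosen for illustration.

```python
def descend(grad, x0, learning_rate, steps=500):
    """Plain gradient descent on a one-dimensional function."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Convex example: f(x) = x**2, with its single minimum at 0.
def convex_grad(x):
    return 2 * x


# Non-convex example: g(x) = x**4 - 2*x**2 + 0.5*x, which has two local minima.
def g(x):
    return x ** 4 - 2 * x ** 2 + 0.5 * x


def nonconvex_grad(x):
    return 4 * x ** 3 - 4 * x + 0.5


# On the convex function both starting points reach the same minimum.
c_left = descend(convex_grad, -2.0, 0.1)
c_right = descend(convex_grad, 2.0, 0.1)

# On the non-convex function the result depends on the starting point.
n_left = descend(nonconvex_grad, -2.0, 0.01)
n_right = descend(nonconvex_grad, 2.0, 0.01)

print(abs(c_left - c_right) < 1e-6)  # True
print(n_left < 0 < n_right)          # True: two different minima
print(g(n_right) > g(n_left))        # True: the right one is only local
```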