Prerequisite Review & Refresh
🧮 Basic Machine Learning Concepts
This course assumes you have taken an introductory machine-learning course. Deep learning reuses that course's vocabulary, models, losses, data splits, and account of overfitting, so the neural-network material builds on familiar ground.
The multiple-choice questions below cover the topic itself. Pick an answer, then check it against the answer given after each question. If several are unclear, work through the review above first.
1. Predicting a house price (a continuous number) is a:
- classification task
- regression task
- clustering task
- ranking task
Correct: B. Regression predicts continuous values; classification predicts discrete classes.
2. A typical loss for binary classification is:
- mean squared error
- binary cross-entropy
- R-squared
- accuracy
Correct: B. Cross-entropy (log loss) matches probabilistic classification; accuracy is a metric, not a loss.
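As a rough illustration of the answer above, binary cross-entropy can be computed by hand. This is a minimal sketch: the function name `binary_cross_entropy`, the clipping constant `eps`, and the toy labels and probabilities are assumptions made for the example, not part of the course material.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) over labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]                      # toy ground truth (assumed)
confident_right = [0.9, 0.1, 0.9, 0.1]     # probabilities close to the labels
confident_wrong = [0.1, 0.9, 0.1, 0.9]     # probabilities far from the labels

bce_right = binary_cross_entropy(labels, confident_right)
bce_wrong = binary_cross_entropy(labels, confident_wrong)
print(bce_right, bce_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows sharply for confident mistakes, which is the behavior a probabilistic classifier is trained to avoid.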
3. The difference between a loss and a metric is:
- they are the same
- the loss is optimized in training; the metric is the reported measure
- the metric is always the loss
- the loss is only for testing
Correct: B. Training minimizes the differentiable loss; the metric is the human-facing measure, and the two can differ.
4. Overfitting is indicated by:
- high training error and high test error
- low training error and high test error
- low error on both
- high error on training only
Correct: B. The model memorizes the training data (low train error) but generalizes poorly (high test error).
5. The validation set is used to:
- train the weights
- tune hyperparameters and select models
- report the final result
- increase the data size
Correct: B. The validation set guides model and hyperparameter choices; the test set is the untouched final measure.
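The three-way split described in the answer above might look like the following sketch. The helper name `train_val_test_split`, the 15% fractions, and the fixed seed are illustrative assumptions; real projects often use a library routine instead.

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]                 # untouched until the final report
    val = shuffled[n_test:n_test + n_val]    # used for hyperparameters and model selection
    train = shuffled[n_test + n_val:]        # used to fit the weights
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The point of the sketch is only that the three parts are disjoint and each has a single job.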
6. k-fold cross-validation:
- trains once on all data
- splits data into k folds and rotates the validation fold
- only works for images
- removes the need for a test set
Correct: B. Each fold serves as validation once; the results are averaged for a more robust estimate.
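The rotation of the validation fold can be sketched with plain index lists. The generator name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled; in a real run, a model would be trained and scored once per rotation and the k scores averaged.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is the validation fold exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "val:", val_idx)
```

Every sample appears in exactly one validation fold, so each one contributes to the estimate once.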
7. L2 regularization (weight decay) penalizes:
- the sum of absolute weights
- the sum of squared weights
- the number of layers
- the learning rate
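Correct: B. L2 penalizes the sum of squared weights; L1 penalizes the sum of absolute values. L2 regularization and weight decay coincide for plain SGD, though they can differ for adaptive optimizers such as Adam.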
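The two penalty terms can be written out directly. In this sketch the function names, the strength `lam`, and the toy weight vector are assumptions for illustration; some texts also put a factor of 1/2 in front of the L2 term.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]   # toy weights (assumed)
l2 = l2_penalty(weights, lam=0.01)
l1 = l1_penalty(weights, lam=0.01)
print(l2, l1)
```

Either term is added to the data loss during training, so large weights cost more than small ones.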
8. Compared with L2, L1 regularization tends to produce:
- denser weights
- sparser weights (more exact zeros)
- larger weights
- no effect
Correct: B. The L1 penalty drives some weights exactly to zero, giving sparse solutions.
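One way to see the sparsity effect is a one-dimensional problem with a closed-form answer. The sketch below assumes the objective 0.5*(w - a)^2 plus a penalty, where `a` stands for the unregularized weight; the function names and numbers are illustrative only.

```python
def l1_solution(a, lam):
    """argmin over w of 0.5*(w - a)**2 + lam*abs(w): soft-thresholding."""
    magnitude = max(abs(a) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0                      # small weights are set exactly to zero
    return magnitude if a > 0 else -magnitude

def l2_solution(a, lam):
    """argmin over w of 0.5*(w - a)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return a / (1.0 + lam)

unregularized = [3.0, 0.4, -0.2, -1.5]  # toy values (assumed)
lam = 0.5
l1_weights = [l1_solution(a, lam) for a in unregularized]
l2_weights = [l2_solution(a, lam) for a in unregularized]
print(l1_weights)
print(l2_weights)
```

Under these assumptions, L1 zeroes every weight whose magnitude is below `lam`, while L2 only scales weights toward zero without reaching it.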
9. The bias-variance trade-off says that:
- bias and variance are unrelated
- reducing one often increases the other
- both always decrease together
- variance is always zero
Correct: B. Simple models tend to have high bias and low variance; complex models tend to have the reverse. The goal is to balance the two.
10. A model that is too simple for the data has:
- high variance
- high bias (underfitting)
- perfect accuracy
- too many parameters
Correct: B. Underfitting is high bias: the model cannot capture the underlying pattern.
11. Why can accuracy be misleading on imbalanced data?
- it cannot be computed
- a trivial majority-class predictor can score high
- it requires probabilities
- it only works for regression
Correct: B. With 95% of samples in one class, always predicting that class scores 95% accuracy while missing the minority class entirely.
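The 95% example can be checked in a few lines. The label counts below are made up to match the example, with class 1 as the minority.

```python
labels = [0] * 95 + [1] * 5        # 95% majority class, 5% minority class
predictions = [0] * len(labels)    # trivial model: always predict the majority

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(y == 1 for y in labels)
print(accuracy, recall)
```

Accuracy looks strong, but recall on the minority class is zero, which is why imbalanced problems are usually reported with precision, recall, or F1 as well.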
12. Precision is defined as:
- TP / (TP + FN)
- TP / (TP + FP)
- (TP + TN) / all
- FP / all
Correct: B. Precision is true positives over predicted positives; recall is TP / (TP + FN).
13. The F1 score is:
- the sum of precision and recall
- the harmonic mean of precision and recall
- accuracy minus error
- the area under the ROC curve
Correct: B. F1 = 2PR / (P + R), where P is precision and R is recall; it balances the two in one number.
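Precision, recall, and F1 can all be computed from the same three counts. In this sketch the function name, the toy labels, and the convention of returning 0.0 when a denominator is zero are assumptions; libraries differ on that last point.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed convention for 0/0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]   # toy labels (assumed)
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # TP = 2, FP = 1, FN = 2
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 2/3, recall is 1/2, and F1 is 4/7, which sits closer to the lower of the two values, as a harmonic mean does.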
14. A parameter and a hyperparameter differ in that:
- they are identical
- parameters are learned from data; hyperparameters are set before training
- hyperparameters are learned by backprop
- parameters are chosen by hand
Correct: B. Weights are parameters learned in training; learning rate, depth, etc. are hyperparameters set beforehand.
15. Data leakage is:
- losing data files
- information from the test set or from the future leaking into training
- a memory error
- having too little data
Correct: B. For example, scaling features with statistics computed over the whole dataset before splitting lets test-set information into training and inflates the reported results.
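The scaling example can be made concrete with a toy feature. The numbers, the `standardize` helper, and the choice of the last value as the test sample are all assumptions for illustration.

```python
import statistics

def standardize(values, mean, std):
    """Scale values with a given mean and standard deviation."""
    return [(v - mean) / std for v in values]

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # toy feature; the last value is the test sample
train, test = data[:4], data[4:]

# Leaky: statistics computed over train and test together.
leaky_mean = statistics.mean(data)
leaky_std = statistics.pstdev(data)

# Leak-free: statistics computed on the training split only, then reused for test.
train_mean = statistics.mean(train)
train_std = statistics.pstdev(train)
train_scaled = standardize(train, train_mean, train_std)
test_scaled = standardize(test, train_mean, train_std)

print(leaky_mean, train_mean)
```

The leaky mean is pulled toward the test value, so the training features already carry information about data the model is not supposed to have seen.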