Introduction to Deep Learning · HIT

Prerequisite · Review & refresh

🧮 Basic Machine Learning Concepts

This course assumes you have completed an introductory machine-learning course. Deep learning reuses that course's vocabulary, models, loss functions, data splits, and its account of overfitting, so the neural-network material builds on familiar ground.

Supervised learning

Error, cost, and loss

Overfitting and generalization

Regularization

The bias-variance dilemma

Other essentials

Readiness check

Self-check questions

Multiple-choice questions on this topic. Pick an answer before reading the one given. If several are unclear, work through the review above first.

1. Predicting a house price (a continuous number) is a:

  A. classification task
  B. regression task
  C. clustering task
  D. ranking task
Correct: B. Regression predicts continuous values; classification predicts discrete classes.

2. A typical loss for binary classification is:

  A. mean squared error
  B. binary cross-entropy
  C. R-squared
  D. accuracy
Correct: B. Cross-entropy (log loss) matches probabilistic classification; accuracy is a metric, not a loss.
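
A minimal sketch of binary cross-entropy, assuming labels in {0, 1} and predicted probabilities for class 1. The function name `binary_cross_entropy` and the clipping constant `eps` are illustrative choices, not a specific library's API.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) over a batch.

    y_true: labels in {0, 1}; p_pred: predicted probabilities of class 1.
    eps clips probabilities away from 0 and 1 so log() stays finite.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident correct predictions give a small loss; confident wrong ones a large loss.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ≈ 0.105
print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # ≈ 2.303
```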

3. The difference between a loss and a metric is:

  A. they are the same
  B. the loss is optimized in training; the metric is the reported measure
  C. the metric is always the loss
  D. the loss is only for testing
Correct: B. Training minimizes the differentiable loss; the metric is the human-facing measure, and the two can differ.
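
A small sketch of how the two can differ: below, two sets of made-up predictions reach the same accuracy at a 0.5 threshold, yet the log loss still distinguishes them. The helper names `accuracy` and `log_loss` are illustrative, not a library API.

```python
import math

def accuracy(y_true, p_pred, threshold=0.5):
    # Metric: fraction of thresholded predictions that match the label.
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

def log_loss(y_true, p_pred):
    # Loss: mean binary cross-entropy, smooth in the predicted probabilities.
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

y = [1, 1, 0, 0]
barely_right = [0.55, 0.6, 0.45, 0.4]        # made-up predictions
confidently_right = [0.95, 0.9, 0.05, 0.1]   # made-up predictions

print(accuracy(y, barely_right), accuracy(y, confidently_right))   # 1.0 1.0
print(log_loss(y, barely_right) > log_loss(y, confidently_right))  # True
```

Accuracy is flat almost everywhere as a function of the predictions, so it gives no gradient to train on; the loss does.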

4. Overfitting is indicated by:

  A. high training error and high test error
  B. low training error and high test error
  C. low error on both
  D. high error on training only
Correct: B. The model memorizes the training data (low train error) but generalizes poorly (high test error).
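
A toy illustration, using made-up 1-D data whose true rule is "label 1 if x > 0.5" with two training labels flipped as noise. A 1-nearest-neighbour memorizer (a swapped-in example model, not one the course prescribes) reproduces the noise: zero training error, nonzero test error. A single threshold does the opposite.

```python
def one_nn_predict(train_x, train_y, x):
    # Memorizer: copy the label of the closest training point.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def threshold_predict(x):
    # Simple model: a single cut at 0.5.
    return 1 if x > 0.5 else 0

def error_rate(predict, xs, ys):
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)

# Made-up data: training labels at x = 0.3 and x = 0.8 are flipped (noise).
train_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_y = [0, 0, 1, 0, 1, 1, 0, 1]
test_x = [0.12, 0.22, 0.31, 0.42, 0.58, 0.68, 0.79, 0.88]
test_y = [0, 0, 0, 0, 1, 1, 1, 1]

def memorize(x):
    return one_nn_predict(train_x, train_y, x)

print(error_rate(memorize, train_x, train_y), error_rate(memorize, test_x, test_y))                    # 0.0 0.25
print(error_rate(threshold_predict, train_x, train_y), error_rate(threshold_predict, test_x, test_y))  # 0.25 0.0
```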

5. The validation set is used to:

  A. train the weights
  B. tune hyperparameters and select models
  C. report the final result
  D. increase the data size
Correct: B. The validation set guides model and hyperparameter choices; the test set is the untouched final measure.
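
A minimal sketch of a three-way split, assuming the data can be shuffled (no time ordering). The function name and the 70/15/15 proportions are illustrative assumptions; the point is that the three parts are disjoint and fixed once.

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    # Shuffle once with a fixed seed, then cut into three disjoint parts.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]                 # touched once, for the final report
    val = items[n_test:n_test + n_val]    # used to compare models and hyperparameters
    train = items[n_test + n_val:]        # used to fit the weights
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```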

6. k-fold cross-validation:

  A. trains once on all data
  B. splits data into k folds and rotates the validation fold
  C. only works for images
  D. removes the need for a test set
Correct: B. Each fold serves as validation once; the results are averaged for a more robust estimate.
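
A sketch of the fold rotation over indices 0..n-1. It does not shuffle (in practice you would usually shuffle first), and the generator name `k_fold_indices` is hypothetical. You would train once per `(train_idx, val_idx)` pair and average the k validation scores.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each index is in exactly one val fold."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, 3):
    print(val_idx)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```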

7. L2 regularization (weight decay) penalizes:

  A. the sum of absolute weights
  B. the sum of squared weights
  C. the number of layers
  D. the learning rate
Correct: B. L2 penalizes squared weight magnitude; L1 penalizes absolute values.
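
The two penalties as plain functions, added to the data loss during training. The strength `lam` is an assumed hyperparameter name; some texts write the L2 term as (λ/2)·Σw² instead, which only rescales λ.

```python
def l2_penalty(weights, lam):
    # Weight decay: lam * sum of squared weights.
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    # Lasso-style: lam * sum of absolute weights.
    return lam * sum(abs(w) for w in weights)

w = [0.5, -2.0, 0.0, 3.0]
print(l2_penalty(w, 0.1))  # ≈ 1.325  (0.1 * (0.25 + 4 + 0 + 9))
print(l1_penalty(w, 0.1))  # ≈ 0.55   (0.1 * (0.5 + 2 + 0 + 3))
# Training objective (sketch): total = data_loss + l2_penalty(w, lam)
```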

8. Compared with L2, L1 regularization tends to produce:

  A. denser weights
  B. sparser weights (more exact zeros)
  C. larger weights
  D. no effect
Correct: B. The L1 penalty tends to drive some weights exactly to zero, giving sparse solutions; L2 shrinks weights toward zero but rarely makes them exactly zero.
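
A one-weight sketch of why L1 gives exact zeros, under a simplifying assumption: each weight w minimizes 0.5·(w − z)² plus a penalty, where z is its unpenalized value. The L1 solution is soft-thresholding (as in lasso / proximal-gradient methods); the L2 solution only rescales z. The helper names are illustrative.

```python
def l1_shrink(z, lam):
    # Minimizer of 0.5*(w - z)**2 + lam*|w|: soft-thresholding.
    # Any |z| <= lam is set exactly to zero.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def l2_shrink(z, lam):
    # Minimizer of 0.5*(w - z)**2 + lam*w**2: proportional shrinkage,
    # which is never exactly zero for z != 0.
    return z / (1 + 2 * lam)

zs = [0.05, -0.3, 1.2]
print([l1_shrink(z, 0.1) for z in zs])  # roughly [0.0, -0.2, 1.1]
print([l2_shrink(z, 0.1) for z in zs])  # roughly [0.042, -0.25, 1.0]
```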

9. The bias-variance trade-off says that:

  A. bias and variance are unrelated
  B. reducing one often increases the other
  C. both always decrease together
  D. variance is always zero
Correct: B. Simple models tend to have high bias and low variance; complex models tend toward the reverse. The goal is to balance the two so that total error is low.
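
For squared-error loss the trade-off has a standard decomposition. This assumes y = f(x) + ε with zero-mean noise of variance σ² independent of the training set, where \hat f is the model fitted on a random training set and the expectation is over training sets and noise:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Model complexity moves error between the first two terms; the noise term is a floor that no model removes.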

10. A model that is too simple for the data has:

  A. high variance
  B. high bias (underfitting)
  C. perfect accuracy
  D. too many parameters
Correct: B. Underfitting is high bias: the model cannot capture the underlying pattern.

11. Why can accuracy be misleading on imbalanced data?

  A. it cannot be computed
  B. a trivial majority-class predictor can score high
  C. it requires probabilities
  D. it only works for regression
Correct: B. If 95% of examples belong to one class, always predicting that class scores 95% accuracy while missing every minority example.
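
The 95% case can be checked directly on made-up labels: a constant "always predict 0" model reaches 0.95 accuracy while its recall on the minority class is zero. The helper names are illustrative.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    # Accuracy of a "model" that always predicts the most common label.
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def recall_of(labels, predictions, cls):
    # Fraction of examples of class `cls` that were predicted as `cls`.
    hits = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
    return hits / sum(1 for y in labels if y == cls)

labels = [0] * 95 + [1] * 5          # made-up data: 95% majority class
always_zero = [0] * len(labels)
print(majority_baseline_accuracy(labels))   # 0.95
print(recall_of(labels, always_zero, 1))    # 0.0
```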

12. Precision is defined as:

  A. TP / (TP + FN)
  B. TP / (TP + FP)
  C. (TP + TN) / all
  D. FP / all
Correct: B. Precision is true positives over predicted positives; recall is TP / (TP + FN).
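
A sketch that counts TP, FP, and FN for the positive class (label 1) and applies both formulas. Returning 0.0 when a denominator is zero is an assumed convention; libraries differ on this edge case.

```python
def precision_recall(y_true, y_pred):
    # Count the confusion-matrix cells for the positive class (label 1).
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real positives, how many were found
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]   # made-up predictions: TP = 2, FP = 1, FN = 2
print(precision_recall(y_true, y_pred))  # precision 2/3, recall 0.5
```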

13. The F1 score is:

  A. the sum of precision and recall
  B. the harmonic mean of precision and recall
  C. accuracy minus error
  D. the area under the ROC curve
Correct: B. F1 = 2 P R / (P + R), balancing precision and recall in one number.
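
A sketch of F1 as a harmonic mean, with a comparison showing why the harmonic mean is used: it stays low when either precision or recall is low. Returning 0 when both are 0 is an assumed convention.

```python
def f1_score(precision, recall):
    # Harmonic mean: 2PR / (P + R); defined as 0 when both are 0 (a convention).
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean punishes imbalance more than the arithmetic mean does.
p, r = 1.0, 0.1
print(f1_score(p, r))  # ≈ 0.182
print((p + r) / 2)     # 0.55
```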

14. A parameter and a hyperparameter differ in that:

  A. they are identical
  B. parameters are learned from data; hyperparameters are set before training
  C. hyperparameters are learned by backprop
  D. parameters are chosen by hand
Correct: B. Weights are parameters learned in training; learning rate, depth, etc. are hyperparameters set beforehand.
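
A minimal sketch of the distinction, using gradient descent on mean squared error for a one-weight model y ≈ w·x (a hypothetical toy setup). The slope `w` is a parameter learned from the data; `learning_rate` and `epochs` are hyperparameters fixed before the loop runs.

```python
def fit_slope(xs, ys, learning_rate=0.1, epochs=100):
    # learning_rate and epochs are hyperparameters: chosen before training.
    # w is a parameter: learned from the data by gradient descent on MSE.
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]       # made-up data generated with slope 2
print(fit_slope(xs, ys))   # close to 2.0
# A poorly chosen hyperparameter breaks training: this step size diverges.
print(abs(fit_slope(xs, ys, learning_rate=0.3, epochs=20)) > 1000)  # True
```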

15. Data leakage is:

  A. losing data files
  B. information from the test set (or from the future) leaking into training
  C. a memory error
  D. having too little data
Correct: B. For example, computing scaling statistics over the whole dataset before splitting lets test-set information into training and inflates the reported results.
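
A sketch of leakage-safe standardization: the mean and standard deviation are fitted on the training split only, then applied unchanged to the test split. The values are made up and the helper names are illustrative; the "leaky" line shows how pooling the splits lets test values shift the statistics.

```python
import statistics

def fit_scaler(values):
    # Learn mean and standard deviation from the TRAINING split only.
    return statistics.mean(values), statistics.pstdev(values)

def transform(values, mean, std):
    return [(v - mean) / std for v in values]

train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]        # made-up test values, deliberately shifted

mean, std = fit_scaler(train)              # correct: statistics from train only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

leaky_mean, _ = fit_scaler(train + test)   # leaky: test values influence the statistics
print(mean, leaky_mean)                    # 2.5 and roughly 5.33
```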

📚 Refresher resources


