ExploreDatabase – Your one-stop study guide for interview and semester exam preparations with solved questions, tutorials, GATE MCQs, online quizzes and notes on DBMS, Data Structures, Operating Systems, AI, Machine Learning and Natural Language Processing.
✔ Scroll down and test yourself — answers are hidden under the “View Answer” button.
Model Validation in Machine Learning – 10 HOT MCQs with Answers | Cross-Validation, Hold-Out & Nested CV Explained
1. A data scientist performs 10-fold cross-validation and reports 95% accuracy. Later, they find that data preprocessing was fitted on the entire dataset before it was split into folds. What does this imply?
A. Accuracy is still valid
B. Accuracy may be optimistically biased
C. Folds were too small
D. It prevents data leakage
Answer: B
Explanation: Preprocessing that is fitted on the whole dataset before splitting can leak information from the validation folds into the training folds, so the measured accuracy tends to overestimate true performance.
When a preprocessing step such as scaling, normalization, or feature selection is fitted on the entire dataset before it is divided into folds, statistics computed from the validation data (means, variances, selected features) influence the training process. This leakage inflates the measured performance, so a result like the reported 95% accuracy can be higher than what the model would achieve on truly unseen data. This is a well-known pitfall in cross-validation.
Correct procedure of data preprocessing in cross-validation
Proper practice is to split the data first, then fit each preprocessing step on the training portion of every fold only, so that no validation data influences it.
For each fold:
Split → Training and Validation subsets
Fit preprocessing only on training data
Transform both training and validation sets
Train model
Evaluate
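The per-fold steps above can be sketched with scikit-learn by placing the preprocessing inside a pipeline. This is a minimal illustration, not a prescribed setup: the synthetic dataset, the choice of StandardScaler and the logistic regression model are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The scaler is a pipeline step, so cross_val_score refits it on the training
# part of every fold and only applies it to the validation part.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Because the scaler never sees a validation fold while being fitted, the averaged score is not inflated by this kind of leakage.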
2. Which validation strategy most likely overestimates model performance?
A. Nested cross-validation
B. Random train/test split without stratification
C. Cross-validation on dataset used for feature selection
D. Stratified k-fold
Answer: C
Explanation: Feature selection before CV leaks information from the validation data, inflating scores.
If you perform feature selection on the entire dataset before cross-validation, the model has already “seen” information from all samples (including what should be test data). This causes data leakage, which makes accuracy look higher than it truly is, so the performance is overestimated.
More explanation:
This happens because when feature selection is carried out on the entire dataset before performing cross-validation, information from test folds leaks into the training process. This makes accuracy estimates unrealistically high and not representative of unseen data.
Feature selection should always be nested inside the cross-validation loop — i.e., done within each training subset.
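The effect can be illustrated on data that contains no signal at all. In the sketch below the features are pure noise and the labels are random, so an honest estimate should sit near 50%. The dataset sizes, the SelectKBest selector and the classifier are assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # pure noise features
y = rng.integers(0, 2, size=100)   # labels unrelated to X

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: choose the 20 "best" features using all labels, then cross-validate.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=cv).mean()

# Correct: selection is a pipeline step, refitted on each training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"selection before CV: {leaky:.2f}")
print(f"selection inside CV: {honest:.2f}")
```

The first estimate typically comes out well above chance even though there is nothing to learn, while the second stays close to chance level.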
3. After tuning using 5-fold CV, how should you report final accuracy?
A. CV average
B. Retrain on full data and test on held-out test set
C. Best fold score
D. Validation score after tuning
Answer: B Explanation: Always test final tuned model on unseen test set.
The correct way to report final accuracy after tuning with 5-fold cross-validation (CV) is to retrain the model on the full training set with the best hyperparameters found and then evaluate it on a held-out test set that played no part in tuning.
What is 5-fold cross validation?
5-fold cross-validation is a technique used to estimate model performance and tune hyperparameters during the development phase. It divides the training data into 5 folds, trains the model 5 times (each time using 4 folds for training and 1 fold for validation), and averages the results to get a more robust performance estimate.
Steps
5-fold CV is used to estimate the model's performance and tune hyperparameters without overfitting to the training data.
The accuracy scores obtained from each fold are averaged (CV average) to estimate expected model performance, but these scores are based on training subsets and cannot be considered final.
After selecting the best hyperparameters from CV, you typically retrain the model on the entire training data to leverage all available data.
The final accuracy should then be reported based on evaluation on a separate held-out test set that was not used in any training or validation to provide an unbiased estimate of performance.
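A minimal sketch of these steps with scikit-learn is shown below. The synthetic data, the SVC model and the small grid of C values are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hold out the test set first; it is not touched again until the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold CV on the training data only, used to pick hyperparameters.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5, refit=True)
search.fit(X_train, y_train)

# refit=True has retrained the best configuration on all of X_train.
cv_accuracy = search.best_score_                 # used for tuning, not for reporting
test_accuracy = search.score(X_test, y_test)     # the number to report

print(f"best C: {search.best_params_['C']}")
print(f"CV accuracy: {cv_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```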
Why Retrain on Full Data while using cross-validation?
Training on all available data maximizes the information the model learns from and typically results in better performance than models trained on only 80% of the training data.
Why Use a Held-Out Test Set?
A separate test set ensures you have an unbiased estimate of how the model will perform on truly unseen data. If you report CV scores as final accuracy, you're reporting performance on data that was used (indirectly) in tuning decisions, which can lead to optimistic estimates.
4. Why might Leave-One-Out CV lead to high variance?
A. Too little training data
B. Needs resampling
C. Fold too large
D. Almost all data used for training
Answer: D Explanation: Small change in one sample affects result → high variance.
What is LOOCV?
In Leave-One-Out Cross-Validation (LOOCV), we use n folds, where n = number of samples.
For each iteration, we train the model on n − 1 samples and test it on the single remaining sample.
This is repeated n times and the results are averaged.
Why is it high variance?
Each training set is almost the same with only one sample changing between folds. That means the model sees nearly all the data each time, so each trained model is very similar, but each test case (the one left-out point) can cause a big swing in the error if the model slightly mispredicts it.
As a result, the estimated performance for each fold fluctuates heavily depending on which single observation is left out. When you average them, the mean may still vary a lot between datasets — hence high variance in the performance estimate.
Example
Suppose you have a dataset of 100 SUVs (Toyota Fortuner, Volkswagen Taigun, Mercedes-Benz GLS, etc.).
You train your model 100 times, each time leaving one SUV out as the test case.
For the first run you train on 99 SUVs and test on 1 (say, a Range Rover Evoque), then repeat for all 100 cars so every car gets to be the “left-out” test case once.
Most SUVs in your dataset might be mid-range models (₹20–30 lakhs), but a few might be luxury SUVs (like a Range Rover at ₹90 lakhs).
Because LOOCV tests on just one car at a time, if that single car happens to be a rare or unusual model (e.g., the only electric SUV), or has outlier features (very high horsepower, unique brand, etc.), the model trained on the other 99 cars may not generalize well to that one.
That single prediction will produce a large error, which strongly affects that fold’s test score. The average of these highly variable fold scores becomes unstable — small changes in the dataset (or presence/absence of a few outliers) can lead to large changes in the reported CV score.
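A tiny numeric sketch of this effect, using six hypothetical prices (in ₹ lakhs) and a deliberately naive "model" that predicts the mean of the training prices. Both the prices and the model are assumptions made up for illustration.

```python
# Five mid-range prices and one luxury outlier (hypothetical values, in lakhs).
prices = [22, 25, 24, 28, 26, 90]

errors = []
for i, held_out in enumerate(prices):
    training = prices[:i] + prices[i + 1:]        # leave exactly one sample out
    prediction = sum(training) / len(training)    # naive model: predict the mean
    errors.append(abs(held_out - prediction))

for price, error in zip(prices, errors):
    print(f"left out {price:>3}: absolute error {error:.1f}")

print(f"LOOCV mean absolute error: {sum(errors) / len(errors):.1f}")
```

The fold that leaves out the 90-lakh car has by far the largest error, and the outlier also shifts the predictions in every other fold, so the averaged score depends heavily on whether such a point is present.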
5. When should Time Series CV be used?
A. Independent samples
B. Predicting future from past
C. Imbalanced data
D. Faster training
Answer: B
Explanation:
Time Series CV preserves temporal order to avoid lookahead bias.
Use Time Series Cross-Validation when the data have a temporal order,
and you want to predict future outcomes from past patterns without data leakage.
Time Series Cross-Validation (TSCV) is used when data points are ordered over time — for example, stock prices, weather data, or sensor readings.
The order of data matters.
Future values depend on past patterns.
You must not shuffle the data, or it will leak future information.
Unlike standard k-fold cross-validation, TSCV respects the chronological order and ensures that the model is trained only on past data and evaluated on future data, mimicking real-world forecasting scenarios.
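One common way to build such splits is an expanding window, where each training set contains everything before the test block. The helper below is a hypothetical, dependency-free sketch; its name and parameters are not taken from any library.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) with training strictly before test."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train = list(range(test_start))                          # all earlier points
        test = list(range(test_start, test_start + test_size))   # the next block
        yield train, test


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print(train, "->", test)
```

Indices are assumed to be in chronological order, so no test index is ever smaller than a training index.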
6. Performing many random 80/20 splits and averaging accuracy is called:
A. Bootstrapping
B. Leave-p-out
C. Monte Carlo Cross-Validation
D. Nested CV
Answer: C
Explanation: Monte Carlo validation averages performance over multiple random splits.
Monte Carlo Cross-Validation (also known as Repeated Random Subsampling Validation) involves randomly splitting the dataset into training and testing subsets multiple times (e.g., 80% training and 20% testing).
The model is trained and evaluated on these splits repeatedly, and the results (such as accuracy) are averaged to estimate the model's performance.
This differs from k-fold cross-validation because the splits are random and may overlap — some data points might appear in multiple test sets or not appear at all in some iterations.
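The overlap between test sets can be seen by counting how often each sample lands in a test split. The generator below is a hypothetical sketch using only the standard library; the function name, seed and sizes are assumptions.

```python
import random


def monte_carlo_splits(n_samples, n_iterations, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) from independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_iterations):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


test_counts = [0] * 20
all_splits = list(monte_carlo_splits(n_samples=20, n_iterations=5))
for train, test in all_splits:
    for i in test:
        test_counts[i] += 1

# Unlike k-fold, counts are not guaranteed to be exactly 1 for every sample.
print(test_counts)
```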
When is Monte Carlo Cross-Validation useful?
You have limited data but want a more reliable performance estimate.
You want flexibility in training/test split sizes.
The dataset is large, and full k-fold CV is too slow.
You don’t need deterministic folds.
The data are independent and identically distributed (i.i.d.).
7. Model performs well in CV but poorly on test set. Why?
A. Too many folds
B. Overfitting during tuning
C. Underfitted model
D. Large test set
Answer: B
Explanation:
Repeated tuning on the same cross-validation folds can cause overfitting to validation data.
A model that performs well in cross-validation (CV) but poorly on the test set has often been overfitted to the validation folds during hyperparameter tuning. In CV, the model and its hyperparameters are repeatedly adjusted to optimize performance on the validation folds. This can produce a model that is too closely tailored to those specific folds, capturing noise or patterns that do not generalize to unseen data.
How can cross-validation lead to overfitting?
We use cross-validation (CV) to choose hyperparameters (e.g., best learning rate, number of trees, etc.).
Each time, we train and validate the model on different CV folds, and we pick the hyperparameters that give the best CV score.
Because many hyperparameter combinations are tried, the final set may end up accidentally tuned to the specific folds used in CV rather than the true data pattern.
When we finally test the model on a completely unseen test set, performance drops — showing that the CV score was over-optimistic.
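The selection effect can be simulated without any real model. In the sketch below every "configuration" guesses labels at random, so its true accuracy is 50% by construction. For brevity a single validation set stands in for the CV folds, and the sizes and seed are arbitrary assumptions.

```python
import random

rng = random.Random(0)
n_val, n_test, n_configs = 100, 100, 200

# Random binary labels for a validation set and a separate test set.
y_val = [rng.randint(0, 1) for _ in range(n_val)]
y_test = [rng.randint(0, 1) for _ in range(n_test)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


results = []
for _ in range(n_configs):
    # Each configuration predicts by coin flip on both sets.
    val_pred = [rng.randint(0, 1) for _ in range(n_val)]
    test_pred = [rng.randint(0, 1) for _ in range(n_test)]
    results.append((accuracy(y_val, val_pred), accuracy(y_test, test_pred)))

# Pick the configuration with the best validation score, as tuning would.
best_val, test_of_best = max(results, key=lambda r: r[0])
print(f"best validation accuracy: {best_val:.2f}")
print(f"test accuracy of that configuration: {test_of_best:.2f}")
```

The winning validation score looks clearly better than chance purely because it is the maximum of many noisy estimates; the test score of the same configuration carries no such selection advantage.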
8. Which gives most reliable generalization estimate with extensive tuning?
A. Single 80/20 split
B. Nested CV
C. Stratified 10-fold
D. Leave-One-Out
Answer: B
Explanation:Nested CV separates tuning and evaluation, avoiding bias. When you perform extensive hyperparameter tuning, use Nested Cross-Validation to get the most reliable, unbiased estimate of true generalization performance.
How does Nested CV handle optimistic bias?
In standard cross-validation, if the same data is used both to tune hyperparameters and to estimate model performance, it can lead to an optimistic bias. That is, the model "sees" the validation data during tuning, which inflates performance estimates but does not truly represent how the model will perform on new unseen data.
Nested CV solves this by separating the tuning and evaluation processes into two loops:
Inner loop: Used exclusively to tune the model's hyperparameters by cross-validation on the training data.
Outer loop: Used to evaluate the generalized performance of the model with the tuned hyperparameters on a held-out test fold that was never seen during the inner tuning.
This structure ensures no data leakage between tuning and testing phases, providing a less biased, more honest estimate of how the model will perform in real-world scenarios.
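In scikit-learn the two loops can be expressed by passing a tuning object to an outer cross-validation call. The following is a minimal sketch; the synthetic data, the SVC model, the parameter grid and the fold counts are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)   # tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # evaluation

# Inner loop: grid search runs only on the training part of each outer fold.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}
tuned_model = GridSearchCV(SVC(), param_grid, cv=inner_cv)

# Outer loop: each outer test fold is scored by a model tuned without it.
nested_scores = cross_val_score(tuned_model, X, y, cv=outer_cv)

print(f"nested CV accuracy: {nested_scores.mean():.3f} "
      f"(+/- {nested_scores.std():.3f})")
```

Note the cost: with 6 grid points, 3 inner folds and 5 outer folds this fits 90 models during tuning, plus one refit per outer fold.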
When to use Nested Cross-Validation?
Nested CV is computationally expensive. It is recommended especially when you do extensive hyperparameter optimization to avoid overfitting in model selection and get a realistic estimate of true model performance.
9. Major advantage of k-fold CV over simple hold-out?
A. Ensures higher accuracy
B. Eliminates overfitting
C. Uses full dataset efficiently
D. Requires less computation
Answer: C Explanation: k-fold cross-validation allows each data sample to serve as both training and validation data, making efficient use of the entire dataset.
Other advantages include:
Provides a more reliable estimate of model performance.
Reduces variance in model evaluation compared to a single train/test split.
Ensures better use of limited data when datasets are small.
10. What best describes the purpose of model validation?
A. Improve training accuracy
B. Reduce dataset size
C. Reduce training time
D. Measure generalization to unseen data
Answer: D Explanation: Validation estimates generalization performance before final testing.
Top Machine Learning MCQs with Answers | AI, Data Science & Python Interview Questions
Introduction:
Welcome to the complete index of Machine Learning MCQs with Answers — your one-stop resource for quick revision, interview preparation, and AI certification practice. This page organizes topic-wise MCQs on essential concepts such as Python for Data Science, Supervised and Unsupervised Learning, Support Vector Machines (SVM), Decision Trees, Deep Learning, Regression, Feature Selection, and Model Evaluation. Whether you are preparing for a Machine Learning interview, pursuing a Data Science certification course, or exploring online AI training, these quizzes will strengthen your theoretical and practical knowledge. Bookmark this page for continuous updates and new question sets covering the latest AI, SQL, and Python optimization techniques.
Top 10 New MCQs on SVM Concepts (2025 Edition) | Explore Database
1. Which of the following best describes the margin in an SVM classifier?
A. Distance between two closest support vectors
B. Distance between support vectors of opposite classes
C. Distance between decision boundary and the nearest data point of any class
D. Width of the separating hyperplane
Answer: C
Explanation: The margin is the perpendicular distance from the decision boundary to the closest data point (support vector). SVM aims to maximize this margin.
2. In soft-margin SVM, the penalty parameter C controls what?
A. The kernel function complexity
B. The balance between margin width and classification errors
C. The learning rate during optimization
D. The dimensionality of transformed space
Answer: B
Explanation: Parameter C determines how much misclassification is tolerated. A large C → fewer violations, smaller margin; a small C → allows more violations, larger margin.
A Soft Margin SVM is a type of Support Vector Machine that allows some misclassification or margin violations in order to achieve better generalization when data is not perfectly linearly separable. In simple words, Soft Margin SVM finds the best possible separating hyperplane that balances maximum margin and minimum classification error.
Overfitting vs Underfitting: Large C - risk of overfitting, small C - risk of underfitting.
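For reference, the role of C can be read off the standard soft-margin primal problem. The slack variables $\xi_i$ (one per training point, measuring how far that point violates the margin) are notation introduced here for illustration:

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i (w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
```

A large C makes every violation expensive, so the optimizer accepts a narrower margin to avoid them; a small C makes violations cheap, so a wider margin is preferred.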
3. Which of the following statements about the kernel trick in SVM is true?
A. It explicitly computes higher-dimensional feature mappings
B. It avoids computing transformations by using inner products in the feature space
C. It can only be applied to linear SVMs
D. It reduces the number of support vectors required
Answer: B
Explanation:The kernel trick enables SVMs to work in high-dimensional spaces without explicitly computing the transformed features. It uses kernel functions to calculate inner products in that space, making non-linear separation computationally efficient.
What is the kernel trick?
A kernel function K(x, z) returns the inner product of x and z after both have been mapped into a higher-dimensional feature space, without ever constructing the mapped vectors. Because SVM training and prediction depend on the data only through inner products, replacing each inner product with K(x, z) lets the SVM learn a non-linear boundary without paying for the explicit mapping.
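A small worked check of the kernel trick: for 2-D inputs, the degree-2 polynomial kernel (x · z)² gives the same number as an ordinary inner product in a 3-D feature space. The feature map `phi` and the sample vectors below are chosen for illustration.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return dot(a, b) ** 2


x, z = (1.0, 2.0), (3.0, 0.5)

explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the inner product
implicit = poly_kernel(x, z)     # stay in 2-D and apply the kernel

print(explicit, implicit)
```

Both routes agree up to floating-point rounding, but the kernel never builds the 3-D vectors. For kernels such as RBF the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.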
4. Which step is unique to non-linear SVMs?
A. Feature normalization
B. Slack variable introduction
C. Kernel trick application
D. Margin maximization
Answer: C
Explanation:The kernel trick allows mapping non-linearly separable data into a higher-dimensional space without explicitly computing transformations — used only in non-linear SVMs.
Why is the kernel trick unique to non-linear SVMs?
The kernel trick is used only in non-linear SVMs because linear SVMs already find a separating hyperplane directly in the input space, so there is no need to map the data to a higher-dimensional space. Non-linear data, by contrast, needs an implicit mapping to higher dimensions to become linearly separable.
In other words:
• Linear SVM → data is already separable by a straight hyperplane → no mapping needed.
• Non-linear SVM → data is not separable in the original space → kernel trick is applied to find a linear boundary in a higher-dimensional feature space.
5. If the data is perfectly linearly separable, what is the ideal value of C?
A. Very small (close to 0)
B. Moderate (around 1)
C. Very large (→ ∞)
D. Exactly equal to margin value
Answer: C
Explanation: A large C ensures no margin violations (hard-margin SVM), which is suitable when data is perfectly separable.
What are the problems with a very large C value?
An SVM with a very large C value will try to classify every training point correctly, no matter how narrow or overfitted the margin becomes. This leads to:
Very small margin (the model sacrifices margin width to perfectly fit all training points).
Overfitting risk (it fits even noisy or outlier points, harming generalization to new data).
Hard-margin behavior (the soft-margin SVM effectively becomes a hard-margin SVM, demanding perfect separation).
Unstable model (small changes in data may cause large shifts in the decision boundary).
6. Which optimization problem does SVM solve during training?
A. Minimization of loss function via gradient descent
B. Maximization of likelihood function
C. Quadratic optimization with linear constraints
D. Linear programming without constraints
Answer: C
Explanation: SVM training is a quadratic optimization problem where a convex quadratic function is minimized under linear constraints.
Why is the optimization problem a "quadratic optimization with linear constraints"?
During SVM training, the goal is to find the best separating hyperplane between two classes. That means we need to find $w$ and $b$ such that the classifier $f(x) = w \cdot x + b$ correctly separates the classes with the largest possible margin.
The objective is to maximize the margin, $2/\|w\|$. Maximizing this margin is equivalent to minimizing the squared L2 norm of the weight vector, $\|w\|^2$. So the objective becomes a quadratic function of $w$, namely $\tfrac{1}{2}\|w\|^2$. Hence the name quadratic optimization.
To ensure correct classification, every data point $(x_i, y_i)$ must satisfy $y_i (w \cdot x_i + b) \ge 1$. This means:
Points of the positive class lie on one side of the hyperplane.
Points of the negative class lie on the other side, at least one margin unit away.
These constraints are linear in terms of w and b.
7. What is the primary reason for using a kernel function in SVM?
A. To increase training speed
B. To handle non-linear relationships efficiently
C. To reduce the number of features
D. To minimize overfitting automatically
Answer: B
Explanation: Kernels implicitly project input data into a higher-dimensional space where linear separation becomes possible for non-linear relationships.
8. In SVM, support vectors are:
A. All training samples
B. Only samples lying on the margin boundaries
C. Samples inside the margin or misclassified
D. Both B and C
Answer: D
Explanation: Support vectors are data points that either lie on the margin boundaries or violate the margin (inside it or misclassified).
Some points lie exactly on the margin boundary. These are support vectors on the margin.
Some points may lie inside the margin or even be misclassified (in soft margin SVM). These are support vectors violating the margin.
9. When the gamma (γ) parameter of an RBF kernel is too high, what typically happens?
A. The decision boundary becomes smoother
B. Model generalizes better
C. Model overfits by focusing on nearby points
D. Model underfits with large bias
Answer: C
Explanation: High γ makes each point’s influence very localized, leading to an overly complex boundary and overfitting.
What is gamma (γ) parameter?
γ (gamma) is a hyperparameter that controls how quickly similarity decreases with distance.
Small γ - Large influence radius (Each data point affects a wide region, leading to smoother, more general decision boundaries)
Large γ - Small influence radius (Each data point affects only nearby points, leading to tighter, more complex decision boundaries)
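The influence radius can be made concrete by evaluating the RBF kernel, K(a, b) = exp(−γ‖a − b‖²), for one fixed pair of points and several values of γ. The points and γ values below are arbitrary choices for illustration.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF similarity: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * squared_distance)


a, b = (0.0, 0.0), (1.0, 1.0)   # squared distance between a and b is 2

for gamma in (0.01, 1.0, 100.0):
    print(f"gamma={gamma:>6}: similarity={rbf_kernel(a, b, gamma):.3g}")
```

With a small γ the two points still look almost identical to the model, while with a large γ their similarity is effectively zero, so each training point only influences its immediate neighbourhood.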
10. Which of the following metrics is most relevant for evaluating SVM on imbalanced datasets?
A. Accuracy
B. Precision and Recall
C. Log-loss
D. Margin width
Answer: B
Explanation: On imbalanced data, accuracy is misleading. Precision and recall (or F1-score) better reflect SVM’s real performance.
Top 10 Machine Learning Testing Stage MCQs with Answers (2025 Updated)
1. What is the primary purpose of the testing stage in a machine learning workflow?
A. To tune model hyperparameters
B. To evaluate model performance on unseen data
C. To collect additional labeled data
D. To select the best optimization algorithm
Answer: B
Explanation: The testing stage confirms whether the model has truly learned general patterns rather than memorized the training examples. It is needed to verify the model’s reliability, fairness, and readiness for deployment, ensuring that what you built in training will work in the real world.
2. During testing, why must the test dataset remain untouched during training and validation?
A. It helps speed up model convergence
B. It ensures the model learns from all available data
C. It prevents data leakage and gives an unbiased estimate of performance
D. It improves the model’s interpretability
Answer: C
Explanation:The test dataset must remain completely untouched during training and validation because its sole purpose is to measure how well your trained model performs on new, unseen data just like in the real world.
If the test data is used (even indirectly) during training or validation, the model may “learn” patterns or information from it. This is called data leakage, and it causes the model to appear more accurate than it truly is — leading to overestimated performance.
3. If a model performs well on validation data but poorly on test data, what does this most likely indicate?
A. Data leakage in training
B. Overfitting to the validation set
C. Underfitting to the training set
D. Insufficient regularization in test data
Answer: B
Explanation:When a model performs well on validation data but poorly on test data, it usually means that the model has overfitted to the validation set. That is, it has learned patterns that are too specific to the validation data, instead of learning general patterns that apply to new, unseen data.
Analogy: Imagine preparing for an exam by practicing only past question papers (validation set). You ace those, but when you get new questions (test set), you struggle because you memorized patterns, not concepts.
Training data: To train the model, i.e., adjust weights and learn relationships between features and target.
Validation data: To fine-tune hyperparameters, choose model configurations, and decide when to stop training.
Testing data: To evaluate the final model’s performance on completely unseen data.
4. Which metric is least reliable for evaluating a classifier on an imbalanced dataset?
A. Precision
B. Recall
C. Accuracy
D. F1-score
Answer: C
Explanation:When a classification dataset is imbalanced, meaning one class (say, “negative”) has far more samples than the other (“positive”), then accuracy becomes a misleading metric.
It may show high values even when the model completely fails to detect the minority class.
We may use Precision, Recall, F1-score, or AUC instead for fair evaluation.
Example: Suppose you have a dataset of 10,000 samples with two classes — “Yes” and “No.” If the dataset is imbalanced with 9,900 “Yes” samples and only 100 “No” samples, a model that simply predicts “Yes” for every instance will achieve an accuracy of 99%.
At first glance, that seems excellent — but in reality, the model fails to detect even a single “No” case. This means it completely ignores the minority class, even though the reported accuracy looks perfect.
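The arithmetic in this example can be checked directly. The sketch below rebuilds the 9,900 / 100 dataset from the text and scores a "model" that always predicts “Yes”; the variable names are illustrative.

```python
# Labels from the example: 9,900 "Yes" and 100 "No".
y_true = ["Yes"] * 9900 + ["No"] * 100

# A trivial model that predicts the majority class for every sample.
y_pred = ["Yes"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the minority class: correctly predicted "No" / all actual "No".
true_no = sum(t == "No" and p == "No" for t, p in zip(y_true, y_pred))
actual_no = sum(t == "No" for t in y_true)
recall_no = true_no / actual_no

print(f"accuracy: {accuracy:.2%}")        # 99.00%
print(f"recall for 'No': {recall_no:.2%}")  # 0.00%
```

Recall for the minority class exposes the failure that accuracy hides.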
5. In model evaluation, what does a large difference between training and test accuracy typically indicate?
A. The model is well-calibrated
B. The model is overfitting
C. The model is generalizing well
D. The dataset is balanced
Answer: B
Explanation:A large difference between training and test accuracy (especially when training accuracy is much higher) signals overfitting. This means that the model has learned patterns specific to the training data instead of general trends that apply to new data.
Overfitting = Model performs much better on training data than on unseen data. This means the model memorized rather than learned.
Can test accuracy be much higher than training accuracy?
Not normally. A large gap in which test accuracy is significantly higher than training accuracy is a red flag and usually means something is wrong with the data split or the evaluation. A small difference (test slightly higher) is possible and sometimes expected, for example because dropout and similar regularization are applied only during training, or simply because of randomness.
6. Which of the following statements about test data is TRUE?
A. Test data should be augmented the same way as training data
B. Test data should be collected after the model is deployed
C. Test data should be used for hyperparameter tuning
D. Test data should come from the same distribution as training data but remain unseen
Answer: D
Explanation: Test data should come from the same distribution as training data for two reasons:
Generalization - The goal of machine learning is to generalize, i.e., perform well on new data drawn from the same population as the training data. If the test data comes from a different distribution, you are not measuring generalization but domain shift or transfer performance (a different problem). For example, if you train a model to predict a person's height using Indian data and test it with European data, accuracy drops. This drop is not due to a bad model but to the difference between the two distributions (human biological variation).
Fair performance estimation - Using data from the same distribution ensures the test accuracy reflects how the model will behave on future, similar data (i.e., from the same source). If distributions differ, test results may underestimate or overestimate performance, giving a false impression of model quality.
Same distribution ensures test data represents the same problem domain.
Remain unseen ensures unbiased, realistic evaluation of model generalization.
7. In cross-validation, what plays the role of the test set in each fold?
A. The validation split of each fold
B. The training split of each fold
C. The combined training and validation splits
D. A completely new dataset
Answer: A
Explanation:In cross-validation, each fold’s validation split acts as the test set for that round, giving a fair way to test every data point exactly once.
Cross-validation: Cross-validation (often k-fold cross-validation) is a technique to evaluate a model’s performance more reliably, especially when the dataset is small. Instead of having one fixed “train-test” split, cross-validation reuses the data multiple times by dividing it into k parts (called folds).
It is called the validation split because in each iteration, the fold that is left out is not used for training. The model is trained on the remaining folds and evaluated on this left-out fold, which acts like a test set in that iteration.
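The fold mechanics can be sketched in a few lines of plain Python. The helper below is hypothetical (not a library function) and splits indices into contiguous folds without shuffling, which is an assumption made to keep the example short.

```python
def k_fold_splits(n_samples, k):
    """Yield (training_indices, validation_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


validation_counts = [0] * 10
for training, validation in k_fold_splits(n_samples=10, k=3):
    print("train:", training, "validate:", validation)
    for i in validation:
        validation_counts[i] += 1

# Every sample is held out exactly once across the k iterations.
print(validation_counts)
```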
8. Which evaluation method best simulates real-world testing conditions for time-series models?
A. Random K-fold cross-validation
B. Leave-one-out validation
C. Rolling window validation
D. Stratified sampling
Answer: C
Explanation:In time-series problems (example: stock prices by date, weather readings etc.), data points are ordered in time. So, future values depend on past values. This means you can’t randomly shuffle the data or use ordinary k-fold cross-validation (which mixes past and future samples).
Rolling Window Validation (also called Walk-Forward Validation) is designed specifically for time-series models. It simulates how models are used in the real world: the model is trained on past data, then tested on data that occurs later in time.
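One variant keeps the training window at a fixed length and slides it forward, so the oldest observations drop out as newer ones arrive. The generator below is a hypothetical sketch of that variant; the function name, window sizes and step are assumptions.

```python
def rolling_window_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) for a fixed-size sliding window."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step   # slide the whole window forward in time


windows = list(rolling_window_splits(n_samples=10, train_size=4, test_size=2))
for train, test in windows:
    print(train, "->", test)
```

Each test block comes immediately after its training window, which mirrors retraining a model periodically on recent history and then forecasting the next period.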
9. Why is the test stage essential before model deployment in real applications?
A. It confirms that the model architecture is optimal
B. It ensures low training loss
C. It verifies generalization ability under unseen scenarios
D. It automatically adjusts hyperparameters
Answer: C
Explanation:The test stage is the final evaluation phase of a machine learning workflow. After a model is trained (and tuned using validation data), it’s tested on a completely unseen dataset called the test set.
This stage checks how well the model will perform on new, real-world data that it hasn’t seen during training or validation.
10. What is a common mistake made during the testing phase of ML models?
A. Using standard metrics like RMSE
B. Using separate data splits
C. Measuring inference speed
D. Using test data for model selection
Answer: D
Explanation: The most common mistake during the testing phase is using the test data to make modeling decisions (model selection or hyperparameter tuning).
This leads to data leakage and overestimates true performance.
The test phase is the final, unbiased evaluation of your trained model. It measures how well your model generalizes to unseen data. The test set is not supposed to influence the model in any way.
Model selection means deciding on which model architecture to use (e.g., Random Forest vs. Neural Network) and which hyperparameters perform best (e.g., learning rate, number of layers, etc.). This selection process should happen during validation, not testing.
However, a common mistake is: Checking performance on the test set repeatedly while tuning models, and then picking the one that performs best on the test set.