Questions
Multiple choice

Question at position 3

Here’s a snippet from a hypothetical study:

“Our dataset comprised 100,000 healthy individuals with blood markers, cognitive performance measures and clinical diagnosis (“Alzheimer’s disease” or “cognitively normal”). The number of individuals was well-balanced between the two diagnostic categories. Missing data was filled in using an iterative procedure in which missing values were predicted using existing values. The dataset was then divided into training and test sets in an 80%:20% ratio. We trained a machine learning model on the training set to predict whether an individual has Alzheimer’s disease or is cognitively normal based on the individual’s blood markers and cognitive performance measures. Cross-validation was performed in the training set to obtain the best hyperparameters. The model was then retrained using the best hyperparameters on the training set, and then evaluated on the test set, yielding 90% sensitivity and 90% specificity.”

Which of the following is correct? Select all correct answers below. There can be more than one correct answer. There is a penalty for selecting the wrong answer.

Options
A. The reported prediction performance is likely to be an under-estimate of true accuracy
B. The hyperparameter selection procedure is valid
C. The hyperparameter selection procedure is invalid
D. The reported prediction performance is likely to be an over-estimate of true accuracy

Step-by-Step Analysis
Re-state the scenario in your own words: a large dataset of 100,000 healthy individuals with blood markers and cognitive measures had its missing data filled in via an iterative imputation method, and was then split 80/20 into training and test sets. Within the training set, cross-validation was used to select hyperparameters, then the model was retrained on the training data with those best hyperparameters and finally evaluated on the held-out test set, reporting 90% sensitivity and 90% specificity.

Option 1: The reported prediction performance is likely to be an under-estimate of true accuracy

This option would imply that the model’s perf......
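
One detail of the scenario that is easy to miss is the order of the steps: the snippet imputes missing values before dividing the data, so the imputation step has access to test-set values. The sketch below is a minimal illustration under stated assumptions, not the study's method. It swaps in simple mean imputation for the iterative procedure, uses a made-up toy marker column, and all helper names (`split_80_20`, `fit_mean`, `impute`, `sensitivity_specificity`) are hypothetical. It shows how a fill value can be learned from the training rows only, and how sensitivity and specificity are computed from predictions.

```python
import random
from statistics import mean


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80:20 ratio."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_mean(values):
    """Learn a fill value from the observed (non-None) entries only."""
    return mean(v for v in values if v is not None)


def impute(values, fill):
    """Replace each None with a previously learned fill value."""
    return [fill if v is None else v for v in values]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = disease."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy blood-marker column with missing values (made-up numbers).
marker = [1.0, None, 3.0, 4.0, None, 6.0, 7.0, 8.0, 9.0, 100.0]

# Split first, then learn the fill value from the training rows only.
train, test = split_80_20(marker)
fill_train_only = fit_mean(train)
train_imputed = impute(train, fill_train_only)
test_imputed = impute(test, fill_train_only)

# For comparison: a fill value learned before the split also sees test rows.
fill_all_rows = fit_mean(marker)

# Toy predictions: 9 of 10 cases and 9 of 10 controls classified correctly.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 9 + [1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Depending on which rows land in the test set, `fill_all_rows` can differ from `fill_train_only`; that difference is the information about the test set that reaches the training data when imputation is done before the split.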
