Questions
Questions

2254 BIOSC 1544 SEC1000 Exam #2

Single choice

Consider the following Python code, which implements both regression and classification models to predict molecular activity: import random import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression from sklearn.ensemble import RandomForestClassifier random.shuffle(compound_data) split_idx = int(len(compound_data) * 0.6) train_set, test_set = compound_data[:split_idx], compound_data[split_idx:] features = ["logP", "num_hbd", "num_hba", "mw", "num_rotatable_bonds"] target_reg = "pKi" target_cls = "status"  # "Active" or "Inactive" X_train_reg = [[mol[feat] for feat in features] for mol in train_set] y_train_reg = [mol[target_reg] for mol in train_set] X_test_reg = [[mol[feat] for feat in features] for mol in test_set] y_test_reg = [mol[target_reg] for mol in test_set] X_train_cls = X_train_reg  # Using same features as regression y_train_cls = [1 if mol[target_cls] == "Active" else 0 for mol in train_set] X_test_cls = X_test_reg y_test_cls = [1 if mol[target_cls] == "Active" else 0 for mol in test_set] reg_model = LinearRegression() reg_model.fit(X_train_reg, y_train_reg) print("R² on test set:", reg_model.score(X_test_reg, y_test_reg)) cls_model = RandomForestClassifier() cls_model.fit(X_train_cls, y_train_cls) print("Accuracy on test set:", cls_model.score(X_test_cls, y_test_cls)) Based on this code, which of the following statements is most accurate?

Options
A.A model like cls_model is preferred if the data set contains only active compounds, but no inactives.
B.A model like reg_model is better than one like cls_model because it provides more detailed predictions rather than just a binary label.
C.A model like reg_model is preferred when predicting continuous activity values (e.g., pKi), while a model like cls_model is useful when distinguishing between active and inactive compounds.
D.cls_model is better than reg_model because accuracy is easier to interpret than R².
View Explanation

View Explanation

Verified Answer
Please login to view
Step-by-Step Analysis
We start by restating the problem setup in our own words to ground the discussion: the code trains two models on molecular data using the same feature set, but for different targets and purposes — a regression model (LinearRegression) to predict a continuous activity value (pKi), and a classification model (RandomForestClassifier) to predict a binary status (Active/Inactive). The question asks which statement best describes when to use each type of model given this setup. Option 1: "A model like cls_model is preferred if the data set contains only active compounds, but no inactives." This is flawed because a classifier requires examples of both classes to learn a decision boundary; if there are no inactives, the model cannot learn what distinguishes Active from Inactive, and ......Login to view full explanation

Log in for full answers

We've collected over 50,000 authentic exam questions and detailed explanations from around the globe. Log in now and get instant access to the answers!

More Practical Tools for Students Powered by AI Study Helper

Join us and instantly unlock extensive past papers & exclusive solutions to get a head start on your studies!