Question

2254 BIOSC 1544 SEC1000 Exam #2

Multiple choice (single answer)

Consider the following Python code, which implements both regression and classification models to predict molecular activity:

    import random
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestClassifier

    random.shuffle(compound_data)
    split_idx = int(len(compound_data) * 0.6)
    train_set, test_set = compound_data[:split_idx], compound_data[split_idx:]

    features = ["logP", "num_hbd", "num_hba", "mw", "num_rotatable_bonds"]
    target_reg = "pKi"
    target_cls = "status"  # "Active" or "Inactive"

    X_train_reg = [[mol[feat] for feat in features] for mol in train_set]
    y_train_reg = [mol[target_reg] for mol in train_set]
    X_test_reg = [[mol[feat] for feat in features] for mol in test_set]
    y_test_reg = [mol[target_reg] for mol in test_set]

    X_train_cls = X_train_reg  # Using same features as regression
    y_train_cls = [1 if mol[target_cls] == "Active" else 0 for mol in train_set]
    X_test_cls = X_test_reg
    y_test_cls = [1 if mol[target_cls] == "Active" else 0 for mol in test_set]

    reg_model = LinearRegression()
    reg_model.fit(X_train_reg, y_train_reg)
    print("R² on test set:", reg_model.score(X_test_reg, y_test_reg))

    cls_model = RandomForestClassifier()
    cls_model.fit(X_train_cls, y_train_cls)
    print("Accuracy on test set:", cls_model.score(X_test_cls, y_test_cls))

Based on this code, which of the following statements is most accurate?

Options
A. A model like cls_model is preferred if the data set contains only active compounds, but no inactives.
B. A model like reg_model is better than one like cls_model because it provides more detailed predictions rather than just a binary label.
C. A model like reg_model is preferred when predicting continuous activity values (e.g., pKi), while a model like cls_model is useful when distinguishing between active and inactive compounds.
D. cls_model is better than reg_model because accuracy is easier to interpret than R².

Analysis
To restate the setup: the code trains two models on molecular data using the same feature set but different targets. A regression model (LinearRegression) predicts a continuous activity value (pKi), and a classification model (RandomForestClassifier) predicts a binary status (Active/Inactive). The question asks which statement best describes when to use each type of model.

Option A: "A model like cls_model is preferred if the data set contains only active compounds, but no inactives." This is flawed because a classifier requires examples of both classes to learn a decision boundary; if there are no inactives, the model cannot learn what distinguishes Active from Inactive, and ...
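The distinction between the two target types, and the both-classes requirement raised for Option A, can be illustrated with a minimal standard-library sketch. The compound records and the helper names (regression_targets, classification_targets, has_both_classes) are hypothetical and made up for illustration; they are not part of the exam code.

```python
from collections import Counter

# Hypothetical compound records; the values are invented for illustration only.
compound_data = [
    {"logP": 2.1, "mw": 310.0, "pKi": 7.4, "status": "Active"},
    {"logP": 3.5, "mw": 420.0, "pKi": 5.1, "status": "Inactive"},
    {"logP": 1.2, "mw": 250.0, "pKi": 8.0, "status": "Active"},
    {"logP": 4.0, "mw": 480.0, "pKi": 4.6, "status": "Inactive"},
]


def regression_targets(records):
    """Continuous targets, the kind of values a model like reg_model predicts."""
    return [mol["pKi"] for mol in records]


def classification_targets(records):
    """Binary targets (1 = Active, 0 = Inactive), as built for cls_model."""
    return [1 if mol["status"] == "Active" else 0 for mol in records]


def has_both_classes(labels):
    """Return True only if both label 0 and label 1 appear."""
    return set(labels) == {0, 1}


y_reg = regression_targets(compound_data)
y_cls = classification_targets(compound_data)
class_counts = Counter(y_cls)

# Simulate the situation in Option A: a data set with actives only.
actives_only = [mol for mol in compound_data if mol["status"] == "Active"]
y_cls_actives_only = classification_targets(actives_only)

print("Regression targets:", y_reg)
print("Class counts:", dict(class_counts))
print("Both classes present:", has_both_classes(y_cls))
print("Both classes present (actives only):", has_both_classes(y_cls_actives_only))
```

With actives only, every label is 1, so a classifier has nothing to contrast against. The continuous pKi values remain usable as regression targets regardless of the status labels.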
