题目
2254 BIOSC 1544 SEC1000 Exam #2
单项选择题
Consider the following Python code, which implements both regression and classification models to predict molecular activity: import random import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression from sklearn.ensemble import RandomForestClassifier random.shuffle(compound_data) split_idx = int(len(compound_data) * 0.6) train_set, test_set = compound_data[:split_idx], compound_data[split_idx:] features = ["logP", "num_hbd", "num_hba", "mw", "num_rotatable_bonds"] target_reg = "pKi" target_cls = "status" # "Active" or "Inactive" X_train_reg = [[mol[feat] for feat in features] for mol in train_set] y_train_reg = [mol[target_reg] for mol in train_set] X_test_reg = [[mol[feat] for feat in features] for mol in test_set] y_test_reg = [mol[target_reg] for mol in test_set] X_train_cls = X_train_reg # Using same features as regression y_train_cls = [1 if mol[target_cls] == "Active" else 0 for mol in train_set] X_test_cls = X_test_reg y_test_cls = [1 if mol[target_cls] == "Active" else 0 for mol in test_set] reg_model = LinearRegression() reg_model.fit(X_train_reg, y_train_reg) print("R² on test set:", reg_model.score(X_test_reg, y_test_reg)) cls_model = RandomForestClassifier() cls_model.fit(X_train_cls, y_train_cls) print("Accuracy on test set:", cls_model.score(X_test_cls, y_test_cls)) Based on this code, which of the following statements is most accurate?
选项
A.A model like cls_model is preferred if the data set contains only active compounds, but no inactives.
B.A model like reg_model is better than one like cls_model because it provides more detailed predictions rather than just a binary label.
C.A model like reg_model is preferred when predicting continuous activity values (e.g., pKi), while a model like cls_model is useful when distinguishing between active and inactive compounds.
D.cls_model is better than reg_model because accuracy is easier to interpret than R².
查看解析
标准答案
Please login to view
思路分析
We start by restating the problem setup in our own words to ground the discussion: the code trains two models on molecular data using the same feature set, but for different targets and purposes — a regression model (LinearRegression) to predict a continuous activity value (pKi), and a classification model (RandomForestClassifier) to predict a binary status (Active/Inactive). The question asks which statement best describes when to use each type of model given this setup.
Option 1: "A model like cls_model is preferred if the data set contains only active compounds, but no inactives." This is flawed because a classifier requires examples of both classes to learn a decision boundary; if there are no inactives, the model cannot learn what distinguishes Active from Inactive, and ......Login to view full explanation登录即可查看完整答案
我们收录了全球超50000道考试原题与详细解析,现在登录,立即获得答案。
类似问题
In a consumer society, many adults channel creativity into buying things
Economic stress and unpredictable times have resulted in a booming industry for self-help products
People born without creativity never can develop it
A product has a selling price of $20, a contribution margin ratio of 40% and fixed cost of $120,000. To make a profit of $30,000. The number of units that must be sold is: Type the number without $ and a comma. Eg: 20000
更多留学生实用工具
希望你的学习变得更简单
加入我们,立即解锁 海量真题 与 独家解析,让复习快人一步!