Which statement is correct?

Question

BlackTom AI · Accepted Answer

Question restatement: Which statement is correct?

Option A: 'Stochastic Gradient Descent (SGD) computes the gradients using the whole training set to update the model parameters once.' This is incorrect. SGD updates the parameters from a single randomly chosen sample (or, in looser usage, a few samples) at a time. Computing the gradient over the whole training set before each update is what defines Batch Gradient Descent, so the statement describes BGD, not SGD.

Option B: 'Batch Gradient Descent (BGD) computes the gradients using one data point to update the model's parameters once.' This is incorrect, and it reflects a common point of confusion. BGD computes the gradient over the entire training set, treated as one batch, before each parameter update. Updating from a single data point is SGD behavior, so the statement has the two methods swapped.
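The distinction in Options A and B comes down to how many samples feed each update. A minimal sketch, assuming a toy one-parameter model y ≈ w·x with mean squared error; the functions `minibatches` and `train` and the data are hypothetical and only illustrate the idea:

```python
import random

def minibatches(n_samples, batch_size, rng):
    """Yield lists of sample indices that cover the shuffled data set once."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

def train(xs, ys, batch_size, epochs=1, lr=0.01, seed=0):
    """Fit y ~ w * x by gradient descent on mean squared error.

    Returns the fitted weight and the number of parameter updates performed.
    """
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    for _ in range(epochs):
        for batch in minibatches(len(xs), batch_size, rng):
            # Gradient of mean((w*x - y)^2), computed over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
            updates += 1
    return w, updates

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

_, sgd_updates = train(xs, ys, batch_size=1)         # SGD: one sample per update
_, mini_updates = train(xs, ys, batch_size=4)        # mini-batch: a few samples per update
_, bgd_updates = train(xs, ys, batch_size=len(xs))   # BGD: whole training set per update
print(sgd_updates, mini_updates, bgd_updates)        # prints: 8 2 1
```

With 8 samples and one epoch, SGD performs 8 updates, a mini-batch size of 4 gives 2 updates, and BGD performs a single update from the full set.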

Option C: 'Mini-batch Gradient Descent has the most bouncing behavior compared to SGD and BGD.' 'Bouncing behavior' is an informal term for how noisy the optimization trajectory is, that is, how much the loss and parameters fluctuate from one update to the next. Mini-batch gradient descent sits between the two extremes: averaging over several samples makes its gradient estimates less noisy than those of SGD, though noisier than the full-batch gradient used by BGD. SGD therefore bounces the most and BGD the least, so this statement is incorrect.
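The ordering of noise levels can be checked empirically by measuring how much the gradient estimate varies across randomly drawn batches at a fixed parameter value. This is a rough sketch on made-up data for a one-parameter model y ≈ w·x; `batch_gradient` and `gradient_noise` are illustrative names, not part of any library:

```python
import random
import statistics

def batch_gradient(w, xs, ys, batch):
    """Gradient of mean((w*x - y)^2) over the samples indexed by `batch`."""
    return sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)

def gradient_noise(xs, ys, batch_size, w=0.0, draws=2000, seed=0):
    """Standard deviation of the gradient estimate across random batches."""
    rng = random.Random(seed)
    n = len(xs)
    grads = [
        batch_gradient(w, xs, ys, rng.sample(range(n), batch_size))
        for _ in range(draws)
    ]
    return statistics.pstdev(grads)

# Toy data (an assumption for illustration): y = 2x plus small integer noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0, 3.0, 8.0, 6.0, 11.0, 11.0, 16.0, 14.0]

noise = {b: gradient_noise(xs, ys, b) for b in (1, 4, len(xs))}
for b, spread in noise.items():
    print(f"batch_size={b}: gradient std = {spread:.2f}")
```

The spread is largest for a batch size of 1 (SGD), smaller for the mini-batch, and zero for the full batch, because every full-batch draw contains the same samples.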

Option D: '10 training epochs mean each data point has the opportunity to update the model parameters 10 times.' This is correct. One epoch is one full pass over the training set, so in each epoch every data point contributes to one parameter update (on its own in SGD, or as part of its mini-batch or the full batch otherwise). Over 10 epochs, each data point therefore has 10 opportunities to influence the parameters.
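The epoch definition can be verified by counting how often each sample takes part in an update. The sketch below assumes the usual loop that shuffles the data and splits it into batches once per epoch; `count_sample_uses` is a hypothetical helper written for this illustration:

```python
import random
from collections import Counter

def count_sample_uses(n_samples, batch_size, epochs, seed=0):
    """Count how many parameter updates each sample index takes part in."""
    rng = random.Random(seed)
    uses = Counter()
    updates = 0
    for _ in range(epochs):
        order = list(range(n_samples))
        rng.shuffle(order)  # a new random order each epoch
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            uses.update(batch)  # every sample in the batch contributes once
            updates += 1        # one parameter update per batch
    return uses, updates

uses, updates = count_sample_uses(n_samples=10, batch_size=3, epochs=10)
print(sorted(set(uses.values())))  # prints: [10]
print(updates)                     # prints: 40
```

The batch size changes the total number of updates (here 4 per epoch, 40 in total), but each sample is still used exactly once per epoch and 10 times over 10 epochs.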

In summary, each incorrect option rests on a common misconception: A attributes full-batch gradients to SGD, B attributes single-sample updates to BGD, and C assigns the noisiest trajectory to mini-batch gradient descent when it actually belongs to SGD. Option D is the correct statement: it describes an epoch as one pass over the training set, so 10 epochs give each data point 10 opportunities to update the parameters, which is consistent with how training loops are typically structured.

BU.330.775.T2.FA25 Final- Requires Respondus LockDown Browser


