Question
25S-STATS-102B-LEC-3 S25 Midterm Exam- Requires Respondus LockDown Browser
Multiple-choice question
Consider a bivariate quadratic function g(w) = w^T C w, where w = [w1, w2]^T and C = [[0.01, 0], [0, 12]]. Using RMSprop, provide the vector w_k given that w_{k-1} = [1, 1]^T. Assume the step-length parameter is 0.1, the tuning parameter is 0.9, and that all exponential averages used in RMSprop at step k−2 are 0.
Options
A. w_k = [0.999, 0.76]^T
B. None of these options are correct.
C. w_k = [0.998, −1.4]^T
D. w_k = [0.994, −6.590]^T
E. w_k = [0.610, −1.915]^T
Solution outline
We start by restating the problem setup and the quantities we will use to compute the RMSprop update.
- The bi-variate quadratic function is g(w) = w^T C w with w = [w1, w2]^T and C = [[0.01, 0], [0, 12]]. Since C is symmetric, the gradient is ∇g(w) = (C + C^T) w = 2 C w.
- With w_{k-1} = [1, 1]^T, we compute the gradient at k-1: ∇g(w_{k-1}) = 2 C w_{k-1} = 2 * [[0.01, 0], [0, 12]] * [1, 1]^T = [0.02, 24]^T.
- The RMSprop update maintains an exponential moving average of squared gradients, v_t = β v_{t-1} + (1 − β) g_t^2 (elementwise), and updates w_t = w_{t-1} − α g_t / sqrt(v_t). (The small ε usually added to the denominator for numerical stability is omitted here, following the formula used in this course.)
- Given that at k−2 the exponential averages are 0, we first compute v_{k-1} using g_{k-1} = ∇g(w_{k-1}) = [0.02, 24]. With β = 0.9 and α = 0.1:
  v_{k-1} = 0.9 * 0 + 0.1 * (g_{k-1} ∘ g_{k-1}) = 0.1 * [0.0004, 576]^T = [0.00004, 57.6]^T.
- The update is then w_k = w_{k-1} − α g_{k-1} / sqrt(v_{k-1}) (elementwise):
  w_k = [1, 1]^T − 0.1 * [0.02 / sqrt(0.00004), 24 / sqrt(57.6)]^T = [1, 1]^T − 0.1 * [3.162, 3.162]^T ≈ [0.684, 0.684]^T.
- Notice that with the averages initialized to 0, each component of g / sqrt(v) reduces to sign(g) / sqrt(1 − β) ≈ 3.162, so both coordinates move by the same amount (≈ 0.316) regardless of the gradient's scale. Since [0.684, 0.684]^T matches none of A, C, D, or E, this computation points to option B.
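As a sanity check, the derivation above can be reproduced in a few lines of Python. This is a minimal sketch of a single RMSprop step under the formula stated in this solution (no ε term in the denominator); the variable names are illustrative, not from the original exam.

```python
import math

# Problem setup: g(w) = w^T C w with C = diag(0.01, 12)
C_diag = [0.01, 12.0]
w = [1.0, 1.0]           # w_{k-1}
alpha, beta = 0.1, 0.9   # step length and tuning (decay) parameter

# Gradient of w^T C w is 2 C w (C is symmetric)
grad = [2 * c * wi for c, wi in zip(C_diag, w)]   # [0.02, 24.0]

# Exponential average of squared gradients, starting from v = 0 at k-2
v = [beta * 0.0 + (1 - beta) * g ** 2 for g in grad]

# RMSprop step: w_k = w_{k-1} - alpha * g / sqrt(v), elementwise
w_next = [wi - alpha * g / math.sqrt(vi) for wi, g, vi in zip(w, grad, v)]
print([round(x, 3) for x in w_next])  # → [0.684, 0.684]
```

Both components land at the same value because, with v initialized to 0, the step size alpha / sqrt(1 − beta) is independent of the gradient magnitude.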
Similar questions
Which of the following statements about gradient descent and learning rate is true?
Which statement is correct?
Suppose that you are training a network with parameters [4.5, 2.5, 1.2, 0.6], a learning rate of 0.2, and a gradient of [-1, 9, 2, 5]. After one update step of gradient descent, what would your network's parameters be equal to?
How do we update the parameters in gradient descent?
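For plain gradient descent, the update rule is w ← w − η ∇g. A minimal sketch, using the example numbers from the question above (not an official answer key):

```python
params = [4.5, 2.5, 1.2, 0.6]
lr = 0.2
grad = [-1, 9, 2, 5]

# One gradient-descent step: theta_new = theta - lr * gradient, elementwise
updated = [p - lr * g for p, g in zip(params, grad)]
print([round(x, 1) for x in updated])  # → [4.7, 0.7, 0.8, -0.4]
```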