Which statement best contrasts REINFORCE vs Actor-Critic?

Question

BlackTom AI · Accepted Answer

Question: Which statement best contrasts REINFORCE vs Actor-Critic?

Option 1: 'REINFORCE is Monte Carlo; Actor-Critic bootstraps via a learned value function' 
This is accurate. REINFORCE uses Monte Carlo estimates of returns to update the policy directly, relying on complete episodes. Actor-Critic methods, on the other hand, use a critic—typically a value function—that is learned (often via bootstrapping) to provide a baseline or value estimate for updating the actor. The critic’s value function is updated using temporal-difference (TD) learning, which is a bootstrap method.

Option 2: 'REINFORCE bootstraps; Actor-Critic does not' 
This is misleading. REINFORCE does not bootstrap; it uses full-episode returns to update the policy. Actor-Critic explicitly relies on bootstrapped value estimates from the critic to inform the actor, so the statement that Actor-Critic does not bootstrap is incorrect.

Option 3: 'Actor-Critic is model-based' 
While some variants can be model-based, the standard Actor-Critic framework does not require a model of the environment. It typically relies on a learned value function and a policy without a specified environmental model. Saying it is model-based overgeneralizes and is not a defining feature of all Actor-Critic methods.

Option 4: 'Both are purely Monte Carlo' 
This is inaccurate. REINFORCE is Monte Carlo by design, but Actor-Critic uses bootstrapping for the critic’s value estimates, which is not purely Monte Carlo. Therefore, this option mischaracterizes Actor-Critic.

In summary, option 1 correctly highlights the key contrast: REINFORCE uses Monte Carlo returns, while Actor-Critic uses a learned, bootstrapped value function to guide policy updates.

COMP90054_2025_SM2 Practice exam (long answer questions)- Requires Respondus LockDown Browser

Which statement best contrasts REINFORCE vs Actor-Critic?

查看解析

登录即可查看完整答案

类似问题

Which statement best describes the difference between SARSA and Q-learning?

Which of the following best describes a key difference between Monte Carlo and Temporal-Difference (TD) learning?

Select all of the following methods that use bootstrapping to estimate values

Choose all that apply to Reinforcement Learning (RL). I Regression tree algorithms power deep RL. II An RL agent wants to maximize its cumulative reward. III It is an ML paradigm that differs from supervised and unsupervised. IV It mathematically formalized the idea of learning by interactions.

强化学习的重点是什么？What is the focus of reinforcement learning?

更多留学生实用工具

考试浏览器助手

风格化写作助手

论文查重助手

文献引用助手

课堂转译助手

课堂笔记助手

Quiz搜索助手

学校历年真题

智能刷题助手

智能匹配练习

希望你的学习变得更简单