Question

COMP90054_2025_SM2 Supplementary or Special Exam: AI Planning for Autonomy (COMP90054_2025_SM2)- Requires Respondus LockDown Browser

Multiple-choice question

Shown is the Q Actor-Critic (QAC) function, with line numbers.

1. Initialise s, θ
2. Sample a ∼ π_θ
3. for each step do
4.     Sample reward r = R_s^a; sample transition s' ∼ P_{s,·}^a
5.     Sample action a' ∼ π_θ(s', a')
6.     δ = r + γ Q_w(s', a') − Q_w(s, a)
7.     θ ← θ + α ∇_θ log π_θ(s, a) Q_w(s, a)
8.     w ← w + β δ φ(s, a)
9.     a ← a', s ← s'
10. end for

Which of the following statements is true (can be more than one)?

Options
A. The critic is used to estimate the value function on line 6
B. The actor is used to estimate the value function on line 6
C. Actor parameters are updated on line 7 and critic parameters are updated on line 8
D. Critic parameters are updated on line 7 and actor parameters are updated on line 8

Analysis
First, restate the setup and walk through what each line of the Q Actor-Critic (QAC) algorithm does. Line 6 defines δ (the TD error) as r + γ Q_w(s', a') − Q_w(s, a). Both value estimates here come from Q_w, the critic's current Q function: it evaluates the next state-action pair and compares the result with the current estimate, which is how the critic drives the value update. Line 7 shows an update to θ with α ∇_θ l...
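To make the roles of lines 6 to 8 concrete, the listing can be sketched in Python. This is a minimal sketch under assumptions that are not in the original question: a tabular softmax actor (`theta[s][a]`), a critic with one-hot features so that Q_w(s, a) is simply `w[s][a]`, and a hypothetical two-state toy environment `toy_env`. All function names are illustrative.

```python
import math
import random


def softmax_policy(theta, s):
    """Action probabilities pi_theta(s, .) for a tabular softmax actor."""
    m = max(theta[s])
    exps = [math.exp(x - m) for x in theta[s]]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, s, rng):
    """Sample an action index from pi_theta(s, .)."""
    probs = softmax_policy(theta, s)
    return rng.choices(range(len(probs)), weights=probs)[0]


def qac_step(theta, w, s, a, r, s_next, a_next, alpha, beta, gamma):
    """Apply lines 6-8 of the listing once, in place; return the TD error."""
    # Line 6: the critic Q_w supplies both value estimates.
    delta = r + gamma * w[s_next][a_next] - w[s][a]
    # Line 7: actor update. For a tabular softmax, the gradient of
    # log pi_theta(s, a) w.r.t. theta[s][b] is 1[b == a] - pi_theta(s, b).
    probs = softmax_policy(theta, s)
    q_sa = w[s][a]
    for b in range(len(probs)):
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += alpha * grad * q_sa
    # Line 8: critic update. With one-hot phi(s, a), only w[s][a] changes.
    w[s][a] += beta * delta
    return delta


def toy_env(s, a, rng):
    """Hypothetical MDP: action 1 pays reward 1, action 0 pays 0;
    the next state is drawn uniformly from {0, 1}."""
    return (1.0 if a == 1 else 0.0), rng.randrange(2)


def run_qac(steps=5000, alpha=0.05, beta=0.1, gamma=0.9, seed=0):
    """Run the full loop (lines 1-10) on the toy environment."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor parameters
    w = [[0.0, 0.0], [0.0, 0.0]]      # critic parameters
    s = 0                                            # line 1
    a = sample_action(theta, s, rng)                 # line 2
    for _ in range(steps):                           # line 3
        r, s_next = toy_env(s, a, rng)               # line 4
        a_next = sample_action(theta, s_next, rng)   # line 5
        qac_step(theta, w, s, a, r, s_next, a_next,
                 alpha, beta, gamma)                 # lines 6-8
        s, a = s_next, a_next                        # line 9
    return theta, w
```

Calling `run_qac()` returns the final actor parameters `theta` and critic parameters `w`. In `qac_step`, only `w` is read on line 6 and written on line 8, while only `theta` is written on line 7, which mirrors the actor/critic split the question asks about.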
