Questions
COMP90054_2025_SM2 Supplementary or Special Exam: AI Planning for Autonomy (COMP90054_2025_SM2)- Requires Respondus LockDown Browser
Multiple choice
Shown is the Q Actor-Critic (QAC) function, with line numbers.
1. Initialise $s, \theta$
2. Sample $a \sim \pi_\theta$
3. for each step do
4.     Sample reward $r = \mathcal{R}_s^a$; sample transition $s' \sim \mathcal{P}_{s,\cdot}^a$
5.     Sample action $a' \sim \pi_\theta(s', a')$
6.     $\delta = r + \gamma Q_w(s', a') - Q_w(s, a)$
7.     $\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi_\theta(s, a) \, Q_w(s, a)$
8.     $w \leftarrow w + \beta \delta \phi(s, a)$
9.     $a \leftarrow a'$, $s \leftarrow s'$
10. end for
Which of the following statements is true (can be more than one)?
Options
A. The critic is used to estimate the value function on line 6
B. The actor is used to estimate the value function on line 6
C. Actor parameters are updated on line 7 and critic parameters are updated on line 8
D. Critic parameters are updated on line 7 and actor parameters are updated on line 8
Step-by-Step Analysis
First, let's restate the setup and walk through what each line does in the Q Actor-Critic (QAC) algorithm as given.
Line 6 defines $\delta$ (the TD error) as $r + \gamma Q_w(s', a') - Q_w(s, a)$. This uses the critic's current Q function to evaluate the next state-action pair and compare it to the current estimate, which is precisely how the critic helps update value estimates.
Line 7 shows an update to $\theta$ with $\alpha \nabla_\theta l$...
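The ten numbered lines of the QAC function can be sketched in Python to make the actor and critic roles concrete. This is a minimal sketch under stated assumptions, not the course's reference implementation: the actor is assumed to be a tabular softmax policy, the critic is assumed to use one-hot features so that Q_w(s, a) is simply `w[s][a]`, and the names `softmax_policy`, `sample_action`, `qac`, and `toy_step`, the two-state environment, and the hyperparameter values are all hypothetical.

```python
import math
import random


def softmax_policy(theta, s):
    """Action probabilities pi_theta(s, .) for a tabular softmax actor (an assumption)."""
    m = max(theta[s])
    exps = [math.exp(t - m) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(theta, s, rng):
    """Draw an action index from pi_theta(s, .)."""
    probs = softmax_policy(theta, s)
    return rng.choices(range(len(probs)), weights=probs)[0]


def qac(step_fn, n_states, n_actions, steps=20000,
        alpha=0.01, beta=0.1, gamma=0.5, seed=0):
    rng = random.Random(seed)
    # Line 1: initialise s and the actor parameters theta (critic w starts at zero).
    theta = [[0.0] * n_actions for _ in range(n_states)]
    w = [[0.0] * n_actions for _ in range(n_states)]  # Q_w(s, a) = w[s][a]
    s = 0
    a = sample_action(theta, s, rng)                  # line 2
    for _ in range(steps):                            # line 3
        r, s_next = step_fn(s, a, rng)                # line 4
        a_next = sample_action(theta, s_next, rng)    # line 5
        # Line 6: the critic supplies both value estimates in the TD error.
        delta = r + gamma * w[s_next][a_next] - w[s][a]
        # Line 7: actor update. For a tabular softmax, the gradient of
        # log pi_theta(s, a) w.r.t. theta[s][b] is 1[b == a] - pi_theta(s, b).
        probs = softmax_policy(theta, s)
        q_sa = w[s][a]
        for b in range(n_actions):
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha * grad * q_sa
        # Line 8: critic update. phi(s, a) is one-hot, so only w[s][a] moves.
        w[s][a] += beta * delta
        s, a = s_next, a_next                         # line 9
    return theta, w


def toy_step(s, a, rng):
    """Hypothetical 2-state MDP: action 1 pays reward 1, action 0 pays 0."""
    reward = 1.0 if a == 1 else 0.0
    return reward, rng.randrange(2)


theta, w = qac(toy_step, n_states=2, n_actions=2)
```

In this sketch the only place `theta` changes is the block marked line 7 and the only place `w` changes is the statement marked line 8, while the TD error on line 6 reads from `w` alone.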
Similar Questions
The value of an action $q_\pi(s, a)$ depends on the expected next reward and the expected value of the next state. We can think of this in terms of a small backup diagram. Let $p(s' \mid s, a)$ be the transition probability and $\bar{r}(s, a, s') = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s']$ the expected reward for the transition from state $s$ to state $s'$ via action $a$. Rearrange the definition of $q_\pi(s, a)$ in terms of these quantities, such that no expected-value notation appears in the equation.
A. $q_\pi(s, a) = \sum_{s'} p(s' \mid s, a) \, [\bar{r}(s, a, s') + \gamma q_\pi(s', a)]$
B. $q_\pi(s, a) = \sum_{s'} [\bar{r}(s, a, s') + \gamma] \, p(s' \mid s, a) \, v_\pi(s')$
C. $q_\pi(s, a) = \sum_{s'} p(s' \mid s, a) \, [\bar{r}(s, a, s') + \gamma v_\pi(s')]$
D. $q_\pi(s, a) = p[s' \mid s, a] \, [\bar{r}(s, a, s') + \gamma v_\pi(s')]$
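An expectation over the next state becomes a probability-weighted sum over successor states, with the expected reward plus the discounted state value inside the sum. The snippet below is a small numerical illustration of that expansion for a single fixed (s, a) pair; the function name `action_value` and all the numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def action_value(p, r_bar, v, gamma):
    """Sum over s' of p(s'|s,a) * (r_bar(s,a,s') + gamma * v_pi(s')) for one (s, a)."""
    return sum(p[s2] * (r_bar[s2] + gamma * v[s2]) for s2 in p)


# Hypothetical model for one (s, a) pair with two possible successor states.
p = {"s1": 0.75, "s2": 0.25}      # transition probabilities p(s' | s, a)
r_bar = {"s1": 1.0, "s2": 0.0}    # expected rewards r_bar(s, a, s')
v = {"s1": 2.0, "s2": 4.0}        # state values v_pi(s')

q = action_value(p, r_bar, v, gamma=0.5)
# 0.75 * (1.0 + 0.5 * 2.0) + 0.25 * (0.0 + 0.5 * 4.0) = 1.5 + 0.5 = 2.0
```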
Which statement best describes the difference between SARSA and Q-learning?
Which of the following best describes a key difference between Monte Carlo and Temporal-Difference (TD) learning?
Select all of the following methods that use bootstrapping to estimate values