Questions
COMP90054_2025_SM2 Exam: AI Planning for Autonomy (COMP90054_2025_SM2) - Requires Respondus LockDown Browser
Single choice
What is the "advantage," typically denoted A(s,a), used for in Actor-Critic?
Options
A. It measures how much better an action is than the average at that state
B. It equals the value function minus the reward
C. It estimates the return from a state only
D. It equals the TD error plus entropy
Step-by-Step Analysis
In Actor-Critic methods, the advantage A(s,a) quantifies how much better a specific action is than the policy's average action value at that state: A(s,a) = Q(s,a) - V(s).
Option A: It measures how much better an action is than the average at that state. This aligns with the s......
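The relationship A(s,a) = Q(s,a) - V(s) can be illustrated with a small numeric sketch. The action values and policy probabilities below are made-up numbers for a single hypothetical state, not values from the exam; V(s) is taken to be the policy-weighted average of Q(s,a).

```python
# Hypothetical numbers for one state s with three actions (assumed, not from the exam).
q_values = {"left": 1.0, "stay": 2.0, "right": 4.0}   # Q(s, a)
policy = {"left": 0.25, "stay": 0.25, "right": 0.5}   # pi(a | s)


def state_value(q, pi):
    """V(s) = sum_a pi(a|s) * Q(s, a): the policy-weighted average action value."""
    return sum(pi[a] * q[a] for a in q)


def advantages(q, pi):
    """A(s, a) = Q(s, a) - V(s) for every action a available in s."""
    v = state_value(q, pi)
    return {a: q[a] - v for a in q}


adv = advantages(q_values, policy)
for action, value in adv.items():
    print(action, value)
```

A positive advantage marks an action that is better than the policy's average at that state, and a negative one marks a worse action. Weighted by the policy, the advantages sum to zero.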
Similar Questions
Shown is the Q Actor-Critic (QAC) function, with line numbers.
1. Initialise $s$, $\theta$
2. Sample $a \sim \pi_\theta$
3. for each step do
4.    Sample reward $r = \mathcal{R}_s^a$; sample transition $s' \sim \mathcal{P}_{s,\cdot}^a$
5.    Sample action $a' \sim \pi_\theta(s', a')$
6.    $\delta = r + \gamma Q_w(s', a') - Q_w(s, a)$
7.    $\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi_\theta(s, a) \, Q_w(s, a)$
8.    $w \leftarrow w + \beta \delta \phi(s, a)$
9.    $a \leftarrow a'$, $s \leftarrow s'$
10. end for
Which of the following statements is true (can be more than one)?
The value of an action $q_\pi(s, a)$ depends on the expected next reward and the expected value of the next state. We can think of this in terms of a small backup diagram. Let $p(s' \mid s, a)$ be the transition probability and $\bar{r}(s, a, s') = E[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s']$ the expected reward for the transition from state $s$ to state $s'$ via action $a$. Rearrange the definition of $q_\pi(s, a)$ in terms of these quantities, such that no expected-value notation appears in the equation.
A. $q_\pi(s, a) = \sum_{s'} p(s' \mid s, a) \, [\bar{r}(s, a, s') + \gamma q_\pi(s', a)]$
B. $q_\pi(s, a) = \sum_{s'} [\bar{r}(s, a, s') + \gamma] \, p(s' \mid s, a) \, v_\pi(s')$
C. $q_\pi(s, a) = \sum_{s'} p(s' \mid s, a) \, [\bar{r}(s, a, s') + \gamma v_\pi(s')]$
D. $q_\pi(s, a) = p[s' \mid s, a] \, [\bar{r}(s, a, s') + \gamma v_\pi(s')]$
Which statement best describes the difference between SARSA and Q-learning?
Which of the following best describes a key difference between Monte Carlo and Temporal-Difference (TD) learning?
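The Q Actor-Critic listing in the first similar question above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the policy is a tabular softmax over preferences `theta[s][a]`, the critic uses one-hot features so that Q_w(s,a) is simply `w[s][a]`, and the two-state MDP (`R`, `P`) and all hyperparameter values are invented for the example.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def q_actor_critic(R, P, gamma=0.9, alpha=0.05, beta=0.1, steps=2000, seed=0):
    """Tabular sketch of QAC; comments give the matching listing line."""
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor parameters
    w = [[0.0] * n_actions for _ in range(n_states)]      # critic weights
    s = 0                                                  # line 1
    a = sample(softmax(theta[s]), rng)                     # line 2
    for _ in range(steps):                                 # line 3
        r = R[s][a]                                        # line 4: reward
        s2 = sample(P[s][a], rng)                          # line 4: transition
        a2 = sample(softmax(theta[s2]), rng)               # line 5
        delta = r + gamma * w[s2][a2] - w[s][a]            # line 6: TD error
        pi_s = softmax(theta[s])
        q_sa = w[s][a]
        for b in range(n_actions):                         # line 7: actor update
            # Gradient of log softmax w.r.t. theta[s][b]: 1[b == a] - pi(b | s).
            grad = (1.0 if b == a else 0.0) - pi_s[b]
            theta[s][b] += alpha * grad * q_sa
        w[s][a] += beta * delta                            # line 8: phi is one-hot
        s, a = s2, a2                                      # line 9
    return theta, w


# Hypothetical two-state, two-action MDP: action 1 always pays reward 1.
R = [[0.0, 1.0], [0.0, 1.0]]
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
theta, w = q_actor_critic(R, P)
print([softmax(row) for row in theta])
```

With one-hot features the critic update on line 8 reduces to a SARSA-style update of a single table entry, which keeps the sketch short. A function approximator would replace `w[s][a]` with a dot product between `w` and the feature vector.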