Question
Artificial Intelligence Lecture 7 Quiz
Short-answer question
For the dice problem, the rules are now: every round we roll the dice. If you get to stay, you get $4. If you are kicked out, you get nothing. If you choose to quit, you get $10. We will now use a mixed strategy: "I want to first take the risk and earn at least X dollars before I quit and take my $10". What is the optimal X? Implement policy iteration to find X, supposing the maximum money you can get is $100. Define the state space to include the money you have earned so far. https://colab.research.google.com/drive/13VwGV6JRm5_mwuKb2mtX6XE45cKC8t14?usp=sharing The optimal X is:
Analysis
We start by restating the problem setup and identifying what the decision-maker is optimizing. In this dice problem, each round ends in one of three ways: you stay and earn $4, you are kicked out and earn $0, or you quit and lock in $10. Under the mixed strategy, you keep playing until your accumulated money reaches a threshold X, and then you quit and take the $10 rather than risk further rounds. The question therefore asks for the optimal stopping threshold, to be found by policy iteration, given a maximum possible total of $100 and a state space that includes the amount of money earned so far.
First, consider the dynamic programming structure. The state is the total amount you have accumulated so far. From any non-terminal state s < X, the available action is to continue playing, which leads to a probabilistic next state: you might stay and gain some additional amount, ...
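The policy iteration loop described above can be sketched as follows. This is only a minimal illustration under assumptions that the question text does not confirm: the probability of getting to stay (`p_stay`) is not given here, so it is left as a parameter; being kicked out is assumed to forfeit everything earned so far; and quitting is assumed to pay the accumulated money plus the $10. The function names `policy_iteration` and `optimal_threshold` are hypothetical. The sketch does not claim to reproduce the quiz's official answer.

```python
def policy_iteration(p_stay, step=4, quit_bonus=10, max_money=100):
    """Policy iteration over money states 0, 4, ..., max_money.

    ASSUMPTIONS (not stated in the question text):
      - p_stay is the chance of surviving a round and gaining `step` dollars;
      - being kicked out ends the game with a total payoff of 0;
      - quitting ends the game with payoff money + quit_bonus;
      - at max_money the only available action is to quit.
    Returns (money_levels, policy, values).
    """
    money = list(range(0, max_money + 1, step))
    n = len(money)
    # Initial policy: always stay, except at the cap where quitting is forced.
    policy = ["stay"] * (n - 1) + ["quit"]

    while True:
        # Policy evaluation. Money only ever increases, so one backward
        # sweep from the cap gives the exact value of the current policy.
        values = [0.0] * n
        for i in range(n - 1, -1, -1):
            if policy[i] == "quit":
                values[i] = float(money[i] + quit_bonus)
            else:
                values[i] = p_stay * values[i + 1]

        # Policy improvement: act greedily with respect to those values.
        new_policy = []
        for i in range(n):
            q_quit = money[i] + quit_bonus
            q_stay = p_stay * values[i + 1] if i < n - 1 else float("-inf")
            new_policy.append("stay" if q_stay > q_quit + 1e-12 else "quit")

        if new_policy == policy:  # stable policy, so iteration has converged
            return money, policy, values
        policy = new_policy


def optimal_threshold(p_stay):
    """Smallest money level at which the converged policy quits (the X)."""
    money, policy, _ = policy_iteration(p_stay)
    return next(m for m, action in zip(money, policy) if action == "quit")


if __name__ == "__main__":
    for p in (2 / 3, 5 / 6):
        print(f"p_stay={p:.3f} -> X={optimal_threshold(p)}")
```

Under these assumptions the threshold depends strongly on `p_stay`: with `p_stay = 2/3` the sketch quits immediately (X = 0), while with `p_stay = 5/6` it keeps playing at $0, $4 and $8 and quits at X = 12. A different reading of "you get nothing" would change the payoff lines and possibly the result.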