Reinforcement Learning problem

A reinforcement learning agent has two actions (a1, a2) and three states (S1, S2, S3).

After a period of interacting with the environment, we have the following values of the Q function:

Q1(S1,a1) = -2

Q2(S1,a2) = -6

Q3(S2,a1) = -4

Q4(S2,a2) = -2

Q5(S3,a1) = -4

Q6(S3,a2) = -2

Now the agent is in state S2 and it chooses action a1, receiving reward -1.

Assuming it stays in S2, what is the probability that action a1 will be chosen again?

ε = δ = 0.1 and discount factor γ = 0.9

I think we have to use temporal difference learning.
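One way to read the question is sketched below. It rests on assumptions the post does not state: that δ is the learning rate, that ε is the exploration rate of an ε-greedy policy which picks uniformly among all actions when exploring, and that the temporal difference rule meant is one-step Q-learning. The function names are hypothetical and only serve the illustration.

```python
# Assumptions (not stated in the post): DELTA is the learning rate,
# EPSILON is the exploration rate of an epsilon-greedy policy, and the
# temporal difference rule is one-step Q-learning.

ACTIONS = ("a1", "a2")
EPSILON = 0.1
DELTA = 0.1
GAMMA = 0.9

# Q values as given in the problem statement.
Q = {
    ("S1", "a1"): -2.0, ("S1", "a2"): -6.0,
    ("S2", "a1"): -4.0, ("S2", "a2"): -2.0,
    ("S3", "a1"): -4.0, ("S3", "a2"): -2.0,
}


def q_learning_update(q, state, action, reward, next_state):
    """Return a copy of q after one Q-learning update of (state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + GAMMA * best_next - q[(state, action)]
    updated = dict(q)
    updated[(state, action)] = q[(state, action)] + DELTA * td_error
    return updated


def epsilon_greedy_probability(q, state, action):
    """Probability that epsilon-greedy selects `action` in `state`.

    Exploration is assumed to pick uniformly among ALL actions,
    including the greedy one.
    """
    best = max(q[(state, a)] for a in ACTIONS)
    greedy = [a for a in ACTIONS if q[(state, a)] == best]
    prob = EPSILON / len(ACTIONS)
    if action in greedy:
        prob += (1.0 - EPSILON) / len(greedy)
    return prob


# The agent takes a1 in S2, receives reward -1 and stays in S2.
Q_after = q_learning_update(Q, "S2", "a1", -1.0, "S2")
p_a1 = epsilon_greedy_probability(Q_after, "S2", "a1")
print(round(Q_after[("S2", "a1")], 4), round(p_a1, 4))
```

Under these assumptions the update moves Q(S2,a1) from -4 to -4 + 0.1 · (-1 + 0.9 · (-2) + 4) = -3.88, so a2 remains the greedy action in S2 and a1 would be chosen again with probability ε/2 = 0.05. If exploration instead picks only among non-greedy actions, the same reasoning would give ε = 0.1.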

Skills: Algorithm


About the employer:
(7 reviews) Athens, Greece

Project ID: #1684809

Awarded to:


Please read PM

$70 USD in 1 day
(0 reviews)

2 freelancers have bid on this job


Hi, let's do this.

$100 USD in 2 days
(18 reviews)