In progress

Reinforcement Learning problem

A reinforcement learning agent has two actions (a1, a2) and three states (S1, S2, S3).

After a period of interacting with the environment, the Q function has the following values:

Q(S1,a1) = -2
Q(S1,a2) = -6
Q(S2,a1) = -4
Q(S2,a2) = -2
Q(S3,a1) = -4
Q(S3,a2) = -2

The agent is now in state S2 and chooses action a1, receiving a reward of -1.

Assuming the agent stays in S2, what is the probability that action a1 is chosen again?

Parameters: ε = δ = 0.1 and discount factor γ = 0.9.

I think temporal difference (TD) learning has to be used here.
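One TD step followed by an ε-greedy choice can be sketched in Python. This is a minimal sketch under assumptions the problem does not state: δ is taken to be the learning rate, the TD update is the Q-learning rule (bootstrapping from the best action in the next state), the next state is S2, and ε-greedy exploration picks uniformly among all actions, including the greedy one. The helpers `q_learning_update` and `epsilon_greedy_probs` are hypothetical names written for this illustration.

```python
# Minimal sketch: one tabular Q-learning update, then epsilon-greedy probabilities.
# Assumptions (not stated in the problem): delta is the learning rate, the update
# is Q-learning, and exploration picks uniformly among ALL actions.

ACTIONS = ["a1", "a2"]

Q = {
    ("S1", "a1"): -2.0, ("S1", "a2"): -6.0,
    ("S2", "a1"): -4.0, ("S2", "a2"): -2.0,
    ("S3", "a1"): -4.0, ("S3", "a2"): -2.0,
}


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Apply Q(s,a) += alpha * (r + gamma * max_b Q(s_next,b) - Q(s,a)) in place."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]


def epsilon_greedy_probs(Q, s, epsilon):
    """Return each action's selection probability under epsilon-greedy in state s."""
    values = {a: Q[(s, a)] for a in ACTIONS}
    best = max(values, key=values.get)  # on ties, the first action in ACTIONS wins
    n = len(ACTIONS)
    return {
        a: (1 - epsilon + epsilon / n) if a == best else epsilon / n
        for a in ACTIONS
    }


epsilon = alpha = 0.1  # epsilon = delta = 0.1 (delta assumed to be the learning rate)
gamma = 0.9

new_q = q_learning_update(Q, "S2", "a1", r=-1.0, s_next="S2", alpha=alpha, gamma=gamma)
probs = epsilon_greedy_probs(Q, "S2", epsilon)
print(round(new_q, 4))        # -3.88
print(round(probs["a1"], 4))  # 0.05
```

Under these assumptions Q(S2,a1) moves from -4 to -4 + 0.1 · (-1 + 0.9 · (-2) + 4) = -3.88, which is still below Q(S2,a2) = -2. So a2 stays greedy and a1 is chosen only through exploration, with probability ε/2 = 0.05. If exploration is instead defined to pick only non-greedy actions, the probability would be ε = 0.1. A SARSA-style update that bootstraps from a1 in S2 gives Q(S2,a1) = -4.06, which also leaves a2 as the greedy action.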

Skills: Algorithm


About the Employer:
(7 reviews) Athens, Greece

Project ID: #1684809

Awarded to:

conatus

Please read PM

$70 USD in 1 day
(0 reviews)
1.7

2 freelancers have bid an average of $85 on this job

dobreiiita

Hi, let's do this.

$100 USD in 2 days
(18 reviews)
4.5