Question
(Artificial Intelligence Problem)
Flappy Bird: http://flappybird.io/
Consider the task of designing an intelligent agent to play Flappy Bird. Describe the performance measure, environment, actuators, and sensors of your agent. For each of the seven environment properties we discussed in class, indicate how this environment would be described along these seven dimensions and give a short explanation as to why. Finally, of the four types of agents (simple reflex, model-based reflex, goal-based, utility-based), indicate which you think would be best for your Flappy Bird agent and give a short explanation as to why.
Explanation / Answer
Here the intelligent agent takes an action in its current state, lands in a new state, and receives a reward based on that transition. There are many variants of this scheme for different situations.
The agent requires only a few inputs and produces a single output. The inputs are quantities such as the bird's height and the distances between the bird and the pipes; the output is the decision to flap or not, with the score incremented as the bird survives.
State Space
The state space is discretized along the following parameters:
Vertical distance between lower pipe and bird
Horizontal distance between next pair of pipes and bird
Is the bird living or dead
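A minimal sketch of such a discretization in Python. The bin size of 10 pixels and the example coordinates are illustrative assumptions, not values from the answer above:

```python
def discretize(dx, dy, alive, bin_size=10):
    """Map continuous observations to a discrete state tuple.

    dx: horizontal distance (pixels) from the bird to the next pipe pair
    dy: vertical distance (pixels) from the bird to the lower pipe
    alive: whether the bird is still alive
    bin_size: width of each discretization bucket (an assumed value)
    """
    return (dx // bin_size, dy // bin_size, alive)

# Nearby positions fall into the same bucket, keeping the Q table small.
s1 = discretize(57, -23, True)
s2 = discretize(53, -21, True)  # same bucket as s1
```

Coarser bins shrink the table and speed up learning, at the cost of lumping together positions that may deserve different actions.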
Actions
Here we can perform only two actions:
Flap (click)
Do nothing
Rewards
The bird earns +1 for every frame it stays alive and -100 when it dies.
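This reward scheme can be written as a one-line helper (a sketch; the function name is my own):

```python
def reward(alive):
    """+1 for every frame the bird survives, -100 on death."""
    return 1 if alive else -100
```

The large death penalty dominates many frames of survival reward, so the agent learns to avoid pipes rather than merely stall.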
The Learning Loop
Step 1: Observe which state the bird is in and take the action expected to maximize the reward. The game engine then advances, and the bird lands in a new state, s'.
Step 2: Look at the new state, s', and the reward associated with it: +1 if the bird is still alive, -100 if it is dead.
Step 3: Update the Q array as per the given Q Learning rule.
Q[s,a] ← Q[s,a] + α(r + λ·V(s') − Q[s,a])
The learning rate α chosen is 0.7, because the environment is deterministic and it should be fairly hard to un-learn something once learned. The discount factor, λ, was set to 1.
Step 4: Set the current state to s' and start over.
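The four steps above can be sketched as a tabular Q-learning update in Python. The α = 0.7 and λ = 1 values follow the answer above; the function and variable names are my own, and V(s') is taken as the maximum Q-value over the two actions in s':

```python
from collections import defaultdict

ACTIONS = [0, 1]        # 0 = do nothing, 1 = flap (click)
ALPHA = 0.7             # learning rate, as stated above
LAMBDA = 1.0            # discount factor, as stated above

Q = defaultdict(float)  # Q[(state, action)] -> estimated return, default 0

def best_action(state):
    """Step 1: pick the action with the highest Q-value in this state."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state):
    """Step 3: Q[s,a] <- Q[s,a] + alpha * (r + lambda * V(s') - Q[s,a]),
    where V(s') = max over a' of Q[s', a']."""
    v_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + LAMBDA * v_next - Q[(state, action)])
```

Step 4 is simply the caller assigning `state = next_state` after each `update` and looping back to `best_action`.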