


Question

(Artificial Intelligence Problem)

Flappy Bird: http://flappybird.io/

Consider the task of designing an intelligent agent to play Flappy Bird. Describe the performance measure, environment, actuators, and sensors of your agent. For each of the seven environment properties we discussed in class, indicate how this environment would be described along these seven dimensions and give a short explanation as to why. Finally, of the four types of agents (simple reflex, model-based reflex, goal-based, utility-based), indicate which you think would be best for your Flappy Bird agent and give a short explanation as to why.

Explanation / Answer

Here the intelligent agent performs an action in a certain state, then finds itself in a new state and receives a reward based on that. There are many variants of this scheme for different situations.

The agent requires only a few inputs and generates a single output. The inputs are things like the height of the bird and the distance between the bird and the pipes; the output is whether to tap (flap) or not, with the score incrementing as the bird survives.

State Space

The state space is discretized along the following parameters (a small sketch of this discretization follows the list):

Vertical distance between lower pipe and bird

Horizontal distance between next pair of pipes and bird

Is the bird living or dead
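
To make this concrete, here is a rough sketch in Python of how the state could be discretized. The 10-pixel bin size and all the names (discretize_state, bird_y, and so on) are just illustrative assumptions, not something fixed by the problem.

# Rough sketch of the discretized state (bin size and names are assumptions).
def discretize_state(bird_x, bird_y, pipe_x, lower_pipe_y, alive, bin_size=10):
    """Map raw pixel coordinates to a coarse (dy, dx, alive) state tuple."""
    dy = (lower_pipe_y - bird_y) // bin_size   # vertical distance to the lower pipe
    dx = (pipe_x - bird_x) // bin_size         # horizontal distance to the next pipe pair
    return (dy, dx, alive)

Coarser bins mean fewer entries in the Q table and faster learning, at the cost of precision in the resulting policy.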

Actions

In this environment we can perform only two actions:

Click (flap)

Do Nothing

Rewards

We get +1 for every time step the bird stays alive, and -100 if the bird dies.
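
As a small sketch of the action set and reward scheme just described (the names FLAP, DO_NOTHING, and reward are placeholders I am assuming for illustration):

FLAP, DO_NOTHING = 0, 1      # the two possible actions: click, or do nothing
ACTIONS = (FLAP, DO_NOTHING)

def reward(alive):
    """+1 for every time step the bird survives, -100 when it dies."""
    return 1 if alive else -100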

The Learning Loop

Step 1: Observe which state the bird is in and perform the action that is expected to increase the reward points.

The game engine then performs its operation, and the bird is now in a new state, s'.

Step 2: Look at the new state, s', and the reward associated with it: +1 if the bird is still alive, -100 if the bird is dead.

Step 3: Update the Q array according to the Q-learning rule:

Q[s,a] ← Q[s,a] + alpha * (r + lambda * V(s') - Q[s,a]),   where V(s') = max over a' of Q[s',a'] is the value of the best action in the new state.

The learning rate, alpha, was chosen as 0.7 because the environment is deterministic and it should be fairly hard to un-learn something. The discount factor, lambda, was set to 1.

Step 4: Set the current state to s' and start over.
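
Putting Steps 1 to 4 together, a minimal tabular Q-learning loop could look roughly like the sketch below. The env object (with reset and step methods) is a hypothetical wrapper around the game, not an actual Flappy Bird API, and the greedy action choice mirrors Step 1 above.

from collections import defaultdict

FLAP, DO_NOTHING = 0, 1                  # same two actions as above
ACTIONS = (FLAP, DO_NOTHING)

def reward(alive):
    return 1 if alive else -100          # +1 while alive, -100 on death

ALPHA = 0.7                              # learning rate chosen above
LAMBDA = 1.0                             # discount factor chosen above

Q = defaultdict(float)                   # Q[(state, action)] -> value estimate

def best_action(state):
    """Step 1: greedily pick the action with the highest current Q value."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_episode(env):
    """Run one episode of the Steps 1-4 loop over a hypothetical game wrapper."""
    state = env.reset()                              # initial discretized state
    alive = True
    while alive:
        action = best_action(state)                  # Step 1: act greedily
        next_state, alive = env.step(action)         # game engine moves to s'
        r = reward(alive)                            # Step 2: +1 alive, -100 dead
        v_next = max(Q[(next_state, a)] for a in ACTIONS)   # V(s') = max_a' Q[s', a']
        # Step 3: Q[s,a] <- Q[s,a] + alpha * (r + lambda * V(s') - Q[s,a])
        Q[(state, action)] += ALPHA * (r + LAMBDA * v_next - Q[(state, action)])
        state = next_state                           # Step 4: s <- s', start over

In practice some exploration (for example an epsilon-greedy choice in Step 1) would usually be added so the agent does not get stuck with its first estimates.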