| Title | PADI apontamentos |
|---|---|
| Course | Planeamento, Aprendizagem e Decisão Inteligente |
| Institution | Instituto Superior Técnico |
| Pages | 30 |
| File Size | 2.7 MB |
Markov Chain
Irreducibility
Aperiodicity
Stationary distribution
Ergodicity
Hidden Markov Models
Forward Algorithm
Forward-Backward Algorithm
Example: Calculate most likely state at t=0
Viterbi Algorithm
Expected utility
Decision Trees
Markov Decision Problems
If we want to calculate the Discounted Cost-to-Go for a policy 𝜋:
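The formula itself did not survive extraction; the standard definition of the discounted cost-to-go of a fixed policy 𝜋, which these notes rely on, is (in both expectation and vector form):

```latex
J^{\pi}(x) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, c\big(x_t, \pi(x_t)\big) \,\middle|\, x_0 = x\right],
\qquad
J^{\pi} \;=\; c_{\pi} + \gamma P_{\pi} J^{\pi} \;=\; (I - \gamma P_{\pi})^{-1} c_{\pi}
```

where \(c_{\pi}\) is the cost vector and \(P_{\pi}\) the transition matrix induced by 𝜋, and \(0 \le \gamma < 1\) is the discount factor.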
To find the optimal Cost-to-Go we can use Value Iteration:
• Compute the Q function for every action and take the new J as the row-wise minimum of Q; when J no longer changes between iterations, it has converged and J = J*
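The step above can be sketched in Python; the two-state, two-action MDP below (transition probabilities, costs, and discount factor) is a hypothetical example, not one from the notes:

```python
# Value iteration sketch for a hypothetical 2-state, 2-action MDP (costs, so we minimize).
# P[a][x][y] = P(y | x, a); c[x][a] = immediate cost; gamma = discount factor.
gamma = 0.9
P = [
    [[0.8, 0.2], [0.3, 0.7]],  # transitions for action 0
    [[0.5, 0.5], [0.9, 0.1]],  # transitions for action 1
]
c = [[1.0, 0.5], [0.0, 2.0]]

def value_iteration(P, c, gamma, tol=1e-8):
    n, n_actions = len(c), len(P)
    J = [0.0] * n
    while True:
        # Q(x, a) = c(x, a) + gamma * sum_y P(y | x, a) * J(y)
        Q = [[c[x][a] + gamma * sum(P[a][x][y] * J[y] for y in range(n))
              for a in range(n_actions)] for x in range(n)]
        J_new = [min(row) for row in Q]  # new J = row-wise minimum of Q
        if max(abs(J_new[x] - J[x]) for x in range(n)) < tol:
            # J stopped changing: it has converged to J*
            policy = [min(range(n_actions), key=lambda a: Q[x][a]) for x in range(n)]
            return J_new, policy
        J = J_new

J_star, policy = value_iteration(P, c, gamma)
```

Because the update is a γ-contraction, the loop is guaranteed to converge for γ < 1.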
POMDP
The belief
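A minimal sketch of the belief update after taking action a and observing z, using the standard rule b'(y) ∝ O(z | y, a) · Σ_x P(y | x, a) b(x); the two-state numbers are hypothetical:

```python
# POMDP belief update sketch (hypothetical 2-state example).
def belief_update(b, P_a, O_az):
    # P_a[x][y]: P(y | x, a) for the chosen action; O_az[y]: P(z | y, a).
    n = len(b)
    unnorm = [O_az[y] * sum(P_a[x][y] * b[x] for x in range(n)) for y in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]  # normalize so the belief sums to 1

b = [0.5, 0.5]
P_a = [[0.9, 0.1], [0.2, 0.8]]
O_az = [0.85, 0.15]  # observation z is much likelier in state 0
b_new = belief_update(b, P_a, O_az)  # mass shifts toward state 0
```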
Value iteration
Point-based methods
Policy Iteration
We represent a policy for a POMDP in a policy graph
MDP Heuristics
MLS Heuristic
• Select the most likely state given the belief and execute the action that an optimal policy for the underlying MDP would choose in that state
Problem: because the MDP policy assumes the state is known, it may never select actions such as "Listen" that are only useful under partial observability
AV Heuristic
• Executes the action that, given the belief, would be the most likely to be executed: each state votes for its MDP-optimal action, weighted by the belief
Problem: it selects the most-voted action, and since an MDP policy can fully observe the state, it may never select actions such as "Listen" that are only useful under partial observability
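The two heuristics above can be sketched as follows; the belief and the MDP-optimal policy `pi_mdp` are hypothetical placeholders, not values from the notes:

```python
# Sketch of the MLS and AV heuristics, assuming an optimal policy for the
# underlying MDP (pi_mdp, a per-state action lookup) is already available.
def mls_action(b, pi_mdp):
    # MLS: act as the MDP-optimal policy would in the most likely state.
    x_ml = max(range(len(b)), key=lambda x: b[x])
    return pi_mdp[x_ml]

def av_action(b, pi_mdp, n_actions):
    # AV (action voting): each state votes for its MDP-optimal action,
    # weighted by the belief; execute the most-voted action.
    votes = [0.0] * n_actions
    for x, p in enumerate(b):
        votes[pi_mdp[x]] += p
    return max(range(n_actions), key=lambda a: votes[a])

b = [0.3, 0.3, 0.4]
pi_mdp = [0, 0, 1]            # hypothetical MDP-optimal action per state
mls_action(b, pi_mdp)         # most likely state is 2 -> action 1
av_action(b, pi_mdp, 2)       # action 0 collects 0.6 of the vote -> action 0
```

Note how the two can disagree on the same belief: MLS trusts the single most likely state, while AV aggregates over all of them.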
Q-MDP Heuristic
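The Q-MDP heuristic weights the Q-values of the underlying MDP by the belief and picks the minimizing action (costs). A minimal sketch, with a hypothetical Q table:

```python
# Q-MDP heuristic sketch: a(b) = argmin_a  sum_x b(x) * Q_MDP(x, a).
def qmdp_action(b, Q):
    # Q[x][a]: optimal Q-values of the fully observable MDP (hypothetical here).
    n_actions = len(Q[0])
    scores = [sum(b[x] * Q[x][a] for x in range(len(b))) for a in range(n_actions)]
    return min(range(n_actions), key=lambda a: scores[a])

b = [0.2, 0.8]
Q = [[1.0, 3.0], [4.0, 0.5]]
qmdp_action(b, Q)  # expected costs: a0 = 3.4, a1 = 1.0 -> action 1
```

Like MLS and AV, it inherits the MDP's blind spot: an information-gathering action never looks best under Q-values computed assuming full observability.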
FIB Heuristic
Theoretical Questions Forward-Backward Mapping
Belief in a POMDP
Pros and Cons of the Q-MDP Heuristic
Optimal Policy
Period in Markov Chain
Utility preferences
Extras
Aperiodicity extra:
The Biblical Question...