//****************************************************************************//
//************ Behavior Planning (cont.) - February 20th, 2019 **************//
//**************************************************************************//

- Ah, the rain - cool and refreshing, or in this case freezing and pneumonia-causing
- Also, because of the delayed midterm (which is on MONDAY, by the way), we ARE going to delay homework 5 by a few days
    - Homework 5 isn't "hard," per se, but it does involve figuring out the function calls
- The midterm can cover anything we've talked about so far, up to and INCLUDING what we're going to talk about today
--------------------------------------------------
- Last time we started getting into behavior planning, which is really the first decision-making strategy we've talked about that feels more like "artificial intelligence" (planning, searching, etc.) than a clever trick
- We started comparing and contrasting pathfinding and behavior planning with A*, because even though they use the same algorithm there are some pretty significant differences:
    - With pathfinding, a lot of stuff is pretty intuitive: a grid is a decent representation of a physical space, the actions are limited and simple (one movement action isn't inherently preferable to another, i.e. we don't 'want' to move left any more than we want to move right), the distance heuristic makes sense, etc.
    - Behavior planning, on the other hand, is more complicated; the actions are very different, comparing them is more apples-to-oranges, and it's difficult to estimate how "far away" we are from reaching a goal
- As an example, here's a planning system for simulating airplanes that uses STRIPS-style actions (STRIPS = the "Stanford Research Institute Problem Solver")
    - We have a state like this, listing the airports and paths available to us:

        State: airport(LAX), airport(ATL), at(plane1, ATL), at(plane2, LAX), path(ATL, LAX), !at(plane1, LAX), etc.

    - With the following action (in this case, the action takes in parameters, although it doesn't NEED to):

        Fly(?plane, ?from, ?to):
            Precondition (checking we're at the right spot, that the plane really is a plane, that a path between the airports exists, etc.):
                at(?plane, ?from), plane(?plane), airport(?from), airport(?to), path(?from, ?to), ?from != ?to
                - Side note: even though this last rule prevents the airport we're at from being the destination, this has happened to Professor Riedl several times in Atlanta ("It isn't the greatest feeling when your plane turns around")
            Effect (update the state variables we're changing):
                at(?plane, ?to), !at(?plane, ?from)

    - So, if all the variables check out and the path exists, we're good! (there's a rough code sketch of this action below, after the bank robbery example)
- A more complicated example from the slides: a bank robbery!
    - Here, the state keeps track of whether the man is a criminal yet, whether he has a gun, whether he has ammo, whether he's hungry, whether he has money, etc.
    - There might be actions like "RobBank," "LoadGun" (if we have the gun and ammo), "ShootGun" (if the gun is loaded), "BuyGun" (if you have money and you're not a criminal - "background checks, right?"), "HuntPossum," "EatPossum," "TakeBath," etc.
    - "...it's not a very realistic bank robbery simulation, okay?"
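- Here's roughly what that Fly action could look like in code, just to make the state/precondition/effect bookkeeping concrete - the representation below (a state as a set of tuples, an action as a plain Python function) is my own sketch, not anything from the homework or the slides:

    # A state is just a set of true propositions, written as tuples.
    def make_state():
        return {
            ("airport", "LAX"), ("airport", "ATL"),
            ("plane", "plane1"), ("plane", "plane2"),
            ("at", "plane1", "ATL"), ("at", "plane2", "LAX"),
            ("path", "ATL", "LAX"),
        }

    def fly(state, plane, src, dst):
        """Apply Fly(?plane, ?from, ?to) if its preconditions hold; return the new state (or None)."""
        precondition = {
            ("plane", plane), ("airport", src), ("airport", dst),
            ("at", plane, src), ("path", src, dst),
        }
        if src == dst or not precondition <= state:   # ?from != ?to, plus the positive checks
            return None                               # action isn't applicable in this state
        # Effect: add at(?plane, ?to), delete at(?plane, ?from)
        return (state - {("at", plane, src)}) | {("at", plane, dst)}

    s = fly(make_state(), "plane1", "ATL", "LAX")
    print(("at", "plane1", "LAX") in s)               # True - plane1 has flown to LAX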
- So, we can't lay these states out in a physical map like we could for pathfinding, but we CAN still lay out a state space with the possible actions we can take and the successor states they'll get us to
    - For instance, if we "haveMoney" and we're "!isCriminal," then we can take the "BuyGun" action to reach the state of losing our money and having a gun
    - Similarly, if we have money, we can "takeBath" to lose our money!
    - On the other hand, if we don't "haveGun," then we can't take the "loadGun" option
- So, with this, we've got ourselves a successor function that tells us which states we can reach given our current conditions - that's an ingredient for A*! (a rough code sketch of a successor function follows at the end of this heuristics discussion)
- So, let's say our goal is to reach the state of being rich and not being hungry - knowing that, plus our successor function, we can search through our state space to find a path to a state that fulfills that goal!
- ...but what can we do to get a heuristic? Without knowing how close we are to the goal, this is just breadth-first search!
- So, what're some ideas?
    - We could use the number of actions it takes to get to the goal, but we don't know how many actions it'll take until we've solved the problem
    - We could use the number of goal conditions met, but we might meet a few conditions and still not be able to reach the goal - besides, it's a monotonically increasing condition, so it'll get us there anyway (?)
- What CAN we do instead, then?
    - With pathfinding, straight-line distance worked so well because it was a "relaxation" of the problem; we pretended we didn't have to worry about obstacles, simplifying the problem and getting an underestimating cost estimate
    - This is the whole idea of heuristics, right? Simpler versions of the problem so that we don't have to deal with NP-hard type stuff!
- So, how can we relax our problem here in the STRIPS model? Here's what those dorks at Stanford came up with:
    - 1) Only consider POSITIVE effects
        - So, positive effects are traits that are "true" in the problem, like "gunLoaded" or "isRich," as opposed to "~gunLoaded" or "~possumAlive"
        - A BIG complication in these sorts of planning problems is when things go from true to false, which makes order matter (e.g. if we shootPossum before we robBank, then our gun isn't loaded and we can't rob the bank - but if we robBank first, then we have money, so we can buy more ammo and then shootPossum)
        - If we pretended that "shootPossum" didn't unload our gun, though, or that we could always buy another gun, then actions never get ruled out halfway through our plan, which makes the problem easier!
        - In fact, this means all plans can be generated in linear time!
    - 2) Consider each condition in the current state as independent of the others
- So, to actually calculate the heuristic, we'll do something like this:

    heuristic(state):
        for each condition in the current state:
            get the minimum # of actions between that condition and the goal
        take the maximum over all conditions in the current state as the heuristic value
        (so that we get the closest underestimate)

- There's some magic going on here: how do we know how many actions are between each condition and the goal?
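- (Quick aside before we answer that: here's a rough sketch of the successor-function half in code, using a handful of made-up bank-robbery actions in the spirit of the lecture - the specific action names, preconditions, and effects are guesses for illustration, not the actual homework domain)

    # A state is a frozenset of true propositions; each action lists positive
    # preconditions, negative preconditions (things that must NOT hold), an add
    # list, and a delete list. All of these are illustrative stand-ins.
    ACTIONS = [
        # (name,      must hold,               must not hold,   add,                          delete)
        ("BuyGun",    {"haveMoney"},           {"isCriminal"},  {"haveGun"},                  {"haveMoney"}),
        ("LoadGun",   {"haveGun", "haveAmmo"}, set(),           {"gunLoaded"},                {"haveAmmo"}),
        ("RobBank",   {"gunLoaded"},           set(),           {"haveMoney", "isCriminal"},  set()),
        ("TakeBath",  {"haveMoney"},           set(),           {"isClean"},                  {"haveMoney"}),
    ]

    def successors(state):
        """Return (action name, next state) for every action applicable in `state`."""
        results = []
        for name, pos, neg, add, delete in ACTIONS:
            if pos <= state and not (neg & state):    # all positives hold, no negatives hold
                results.append((name, frozenset((state - delete) | add)))
        return results

    start = frozenset({"haveMoney", "haveAmmo"})
    for action, nxt in successors(start):
        print(action, "->", sorted(nxt))              # BuyGun and TakeBath are applicable here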
- (got a little confused here)
- So, let's say our current state has the positive effects "hasGun," "possumAlive," and "has"
    - Knowing the goal node, we'll construct a mini search problem where we ignore all the negative effects in our states/actions (just pretend they don't exist; for now, we'll assume none of our goal conditions are negative); we'll start at the goal and then work BACKWARDS
    - Any node adjacent to the goal must have been produced by an action that turned some condition we have into a goal condition, which means that action took in some input conditions and gave the goal condition as output
    - We say the goal conditions are "0 steps" from the goal, while the input conditions that allowed us to take those actions are "1 step" from the goal
    - We'll then look backwards at the actions that got us to THAT node, and any input conditions that allowed us to take the action that got us to that 1-step-away condition are "2 steps" away
    - We save all these values in a table; if the same condition ever gets 2 different steps-away values, then we save the max value
- So, we run this search on a simplified version of the state-space map once we know the goal (it takes polynomial time - run until our 'openList' from the goal runs out), and then we save all the "steps-away" values in a table
- So, we run this whole search to build the table ONCE, when we get our goal node; then, for each "successor()" call, we just look up the steps-away values in the table and say our heuristic is the max applicable one (there's a rough code sketch of this at the very end of these notes)
- This magical technique has *drumroll* the WORST name ever: "heuristic search planning"
    - "...I don't care what you kids say, I think it's cool"
- OKAY - phew.
- There's ONE more thing to talk about when it comes to decision making - hierarchical planning - but I think we'll save that for after the exam. So, 'til then, happy studying!
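- (One last aside, tacked on at the end: here's a rough sketch of that steps-away table trick. It reuses the made-up bank-robbery actions from the earlier sketch with their negative effects thrown away, does the backward search from the goal with an open list, records the first - i.e. minimum - number of steps at which each condition shows up, matching the "minimum # of actions" in the heuristic pseudocode above, and then the heuristic itself is just a table lookup)

    from collections import deque

    # Delete-free versions of the hypothetical bank-robbery actions: (name, preconditions, add effects)
    RELAXED_ACTIONS = [
        ("BuyGun",  {"haveMoney"},            {"haveGun"}),
        ("LoadGun", {"haveGun", "haveAmmo"},  {"gunLoaded"}),
        ("RobBank", {"gunLoaded"},            {"haveMoney", "isCriminal"}),
    ]

    def build_steps_away_table(goal):
        """Backward search from the goal: table[condition] = # of steps between it and the goal."""
        table = {condition: 0 for condition in goal}        # goal conditions are 0 steps away
        open_list = deque(goal)
        while open_list:                                    # run until the open list runs out
            condition = open_list.popleft()
            for _, preconditions, adds in RELAXED_ACTIONS:
                if condition in adds:                       # this action could have produced it...
                    for pre in preconditions:
                        if pre not in table:                # ...so its inputs are one step further out
                            table[pre] = table[condition] + 1
                            open_list.append(pre)
        return table

    def heuristic(state, table):
        """Max steps-away value over the conditions in the state that appear in the table."""
        values = [table[c] for c in state if c in table]
        return max(values) if values else 0

    table = build_steps_away_table({"haveMoney"})           # goal: be rich
    print(heuristic({"haveGun", "haveAmmo"}, table))        # 2 in this toy domain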