//****************************************************************************//
//************** Intro to Probabilistic Agents - May 31st, 2018 *************//
//**************************************************************************//

- Today from the board, we see what's writ:
    - Wrapping up logic
    - Intro Probability
    - Probability Agent (inference)
- Once again, project 1 is due Sunday night - so do it! Don't overlook the extra credit!
- People are constantly surprised that the TAs exist for this class - they do! They have office hours at 11am tomorrow in TSRB's common-stairwell-ish area!

---------------------------------------------------

- So, for our "Hunt the Wumpus" game, we saw the map omnisciently and ran through what we should ideally do, making inferences about the world from our prior observations
- The big advantage of logical agents: they can use rules to make INFERENCES or DEDUCTIONS about things they don't know, based on facts they do know (in their knowledge base)
    - A logical agent starts off knowing some rules about the world, adds facts from its sensors to its knowledge base as it moves, asks questions about the world, and tries to answer them with a proof from its facts/rules
        - Example question: "is (2,2) safe?"
    - The answer to any of these questions is true, false, or unknown
    - For this reason, logic agents are great when we only have partially-observable environments
- At this point, let's drag just a bit of discrete math back into things:
    - We have our knowledge base facts: true/false formal sentences
        - Basically, a collection of propositions
    - A possible world: a possible explanation of our knowledge base facts, described by a set of propositions
        - These can be "atomic" propositions ("It's sunny", "It's raining", represented by "P" or "Q"), or larger facts we've deduced ("(1,1) is a safe tile")
    - And we have our boolean logic operators: "iff", "and", "or", "implies", and "not"
        - For complicated stuff, we also have parentheses to group logical statements
    - We can then make FORMULAE out of combinations of the above, e.g. "Q -> !P" ("If it's sunny (Q), then it's not raining (!P)")
    - Our sensors give us new "atoms" of information; THINKING is trying to prove new propositions from ones we already know, while ACTING is the action we take to approach the goal, in light of our knowledge base and deductions
- So, what are the atoms for our "Hunt the Wumpus" game?
    - P(i,j) = "There's a pit at (i,j)"
    - W(i,j) = "There's a Wumpus at (i,j)"
    - G(i,j) = "... Gold ..."
    - B(i,j) = "Breeze detected at (i,j)"
    - S(i,j) = "Smell detected at (i,j)"
- And then what're the rules we know? Well, something like this:
    - B(i,j) <--> (P(i+1,j) or P(i-1,j) or P(i,j+1) or P(i,j-1))
        - "A breeze at (i,j) means there's a pit in some adjacent square, and vice versa"
    - (...)
- Now, our agent's goal is just to keep moving to new rooms for as long as possible, so there's one question we need to keep asking: is (i,j) safe to enter?
    - Let's define a new atom: K(i,j) = "(i,j) is safe to move into"
    - Let's then say what safe means: K(i,j) <--> (not P(i,j) and not W(i,j))
    - In the book, there's a pretty good algorithm for doing this; I won't make you learn it this semester
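- As a tiny illustration of that kind of inference, here's a minimal Python sketch (NOT the book's algorithm) that enumerates possible worlds on an assumed 2x2 grid, models only pits, and uses a single made-up observation of "no breeze at (0,0)" to decide which squares are provably safe:

        # A minimal sketch (not the book's algorithm): enumerate every possible
        # world consistent with our observations, then check each square.
        # Assumptions: 2x2 grid, pits are the only hazard, and our one
        # observation is "no breeze at (0,0)".
        from itertools import product

        CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]

        def neighbors(cell):
            i, j = cell
            return [(i + di, j + dj) for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]
                    if (i + di, j + dj) in CELLS]

        def consistent(pits, observations):
            # Rule: B(i,j) <--> there's a pit in some adjacent square
            return all(breeze == any(n in pits for n in neighbors(cell))
                       for cell, breeze in observations.items())

        observations = {(0, 0): False}  # we're standing at (0,0) and feel no breeze

        # Every assignment of "pit / no pit" to the unvisited cells is a possible world;
        # keep only the worlds consistent with our observations and the breeze rule
        unknown_cells = [c for c in CELLS if c not in observations]
        worlds = [set(c for c, has_pit in zip(unknown_cells, flags) if has_pit)
                  for flags in product([False, True], repeat=len(unknown_cells))]
        worlds = [pits for pits in worlds if consistent(pits, observations)]

        # K(i,j) holds iff (i,j) is pit-free in EVERY consistent world
        for cell in [(0, 1), (1, 0), (1, 1)]:
            if all(cell not in pits for pits in worlds):
                print(cell, "is provably safe")
            elif all(cell in pits for pits in worlds):
                print(cell, "provably has a pit")
            else:
                print(cell, "is unknown")

    - Note how the three possible answers (true / false / unknown) fall out naturally; this brute-force "enumerate every world" approach gets exponentially expensive as the grid grows, so it's only meant to illustrate the idea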
- Now, while this system is GREAT and lets our agent infer quite a lot, it does break down in some situations
    - Let's say that as soon as we enter the maze, we detect a smell, so the Wumpus could be either up or right - but we don't know which one to pick!
    - A possible solution for this is to shoot an arrow up or right - if we kill the Wumpus, great! If we don't, we know the way in that direction is safe!
        - Solutions like this are called "coercing the environment" - transforming an unknown environment into one more favorable to us by acting
    - Now, what if instead of a smell, it's a breeze?
        - ...in this case, there's no way for us to decide where the pit is; we would just have to choose randomly, which isn't something a purely logical agent should do
- So, even in this simple discrete game, there are situations where we can't figure things out for sure; in the breeze example, we don't know for certain which location is safe
    - ...but are both locations equally likely to be safe? Not always; in some situations where we can't know for CERTAIN what the next step is, we CAN figure out which option is most likely to be successful
    - In the case of "Hunt the Wumpus," for instance, certain board configurations are more likely to occur than others (e.g. given some known sensor data, there might be 4 valid configurations with a pit to the right, but only 2 with a pit to the left)
- Furthermore, in the real world, sensors aren't 100% accurate "true-or-false" things; they're messy, and would give us something more like "Wind speed = 1.5mph +/- 0.2mph"
    - ...and our actions aren't guaranteed either; we might not always shoot the arrow perfectly straight, for example
- So, instead of working with pure "true" and "false" values - with certainties - it makes more sense to work with probabilities... which brings us to our next topic
    - "Before we move on from logic: I won't ask you questions about logical algorithms or stuff like that... but things like 'given this set of rules, where's the pit? What action should you take?', etc., are fair game"
- So, for this reason, logic agents in the real world have mostly been replaced with PROBABILISTIC AGENTS
    - "Now, about half of you have not taken a probability course here at Georgia Tech. For those who already know this stuff: please bear with me. For everyone else: pay attention. You'll actually need this stuff for the rest of the course."
- And now, a super quick review/lesson on probability:
    - In academic classes, probability usually gets talked about like this: "There exists some universe U that contains all possible worlds"
        - In probability, a WORLD is some given combination of all relevant variables
            - It's a bit like a possible state, if you want to think of it like that
        - An INFERENCE is the process of figuring out which world we're actually in
        - "This is like some weird quantum-parallel-timey-wimey thing, where there's any number of possible worlds that COULD happen and we're trying to figure out which one we're probably in"
    - Canonical example: flipping a coin!
        - With a fair coin, there's an equal chance of getting heads or tails
    - This brings us to the ever-confusingly named topic of RANDOM VARIABLES: a symbol that can take on values, where the value can change every time we check it, and the values are governed by a probability distribution
        - "We can know the probability distribution, but we can NOT know for sure what the exact next value of the random variable is; we can't know for sure if it'll be heads or tails next, but we can calculate the probability"
        - A random variable can also take on 2 OR MORE values (e.g. instead of just being "true" or "false", the weather can be sunny, rainy, cloudy, stormy, or dusty, and a die can be 1, 2, 3, 4, 5, or 6)
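- To make "governed by a probability distribution" concrete, here's a quick Python sketch; the weather distribution and the sample() helper below are made up purely for illustration:

        # A random variable: we know the distribution, but not the next value.
        # The probabilities here are invented for illustration only.
        import random

        weather_dist = {"sunny": 0.5, "rainy": 0.2, "cloudy": 0.2, "stormy": 0.05, "dusty": 0.05}

        def sample(dist):
            # Draw one value of the random variable according to its distribution
            r = random.random()
            cumulative = 0.0
            for value, p in dist.items():
                cumulative += p
                if r < cumulative:
                    return value
            return value  # guard against floating-point round-off

        draws = [sample(weather_dist) for _ in range(10000)]
        for value in weather_dist:
            # Each individual draw is unpredictable, but the long-run
            # frequencies settle near the distribution's probabilities
            print(value, draws.count(value) / len(draws))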
- So, with these definitions, we say the PROBABILITY of a value is the proportion of all possible worlds in which the random variable takes on that value
    - The probabilities of ALL possibilities must add up to 1, since we must exist in at least one of those worlds, so the probability of being within the universe as a whole is 100%
- If we think of this as a set, and there's a subset "A" in the universe U that covers 1/10th of the possible worlds, then we can state a few simple rules:
    1) A + !A = U, so P(A) + P(!A) = 100%
    2) The probability of the universal set U minus the set A = 1.0 - P(A) = P(!A)
        - Notice that this is mostly just a reshuffling of the first rule, but it's a useful property in its own right ("The probability of me getting ice cream is 30%, so the probability of me not getting ice cream is 70%")
    3) If we have a set A and a set B, and part of them overlap, then P(A or B) = P(A) + P(B) - P(A and B)
        - We have to subtract the overlapping part so we don't double-count the middle
- All of that's pretty simple and intuitive, but we need some more exciting stuff for our AI, so what happens if we have multiple random variables interacting?
    - Well, that's where we start doing inference on 2+ random variables
- Let's say you're visiting the dentist, and the dentist is scraping around in your mouth with that horrible metal scrapey thing to see if you have a cavity
    - Now, you might have a toothache, you MIGHT have a cavity, and the doctor's scrapey-hook MIGHT catch on something
    - Knowing this, we can create a FULL-JOINT DISTRIBUTION table to describe the probabilities of each possible combination of events (all entries in the table together sum to 1):

                   |      Toothache      |    No Toothache     |
                   |  Catch   | No Catch |  Catch   | No Catch |
        Cavity     |  0.108   |  0.012   |  0.072   |  0.008   |  = 0.2
        No Cavity  |  0.016   |  0.064   |  0.144   |  0.576   |  = 0.8

    - So, we can answer questions like "What's the probability that I have a toothache AND a cavity AND the doctor's hook catches?"
        - In that case, we can just look up the number in the table: 0.108
    - But how do we calculate the probability of having a cavity?
        - ...we can just sum up the row! So, in this case, it'd be 0.2
        - This is the MARGINAL probability... since, well, you write the sum of the row in the margin (statisticians are very creative)
    - What's the probability of us having a cavity AND a toothache?
        - Well, then we just add up the corresponding table entries: P = 0.108 + 0.012 = 0.120
    - What is the probability we have a cavity OR a toothache?
        - The naive way to do it would be to sum up the exact 6 entries we need (the whole cavity row, plus the chance of a toothache without a cavity)
        - OR, we could use the identity we found earlier: P(cavity) + P(toothache) - P(cavity AND toothache) = 0.2 + (0.124 + 0.076) - 0.120 = 0.28
        - Or, doing even less math: 1 - P(NOT(cavity OR toothache)) = 1 - P(NOT(cavity) AND NOT(toothache)) = 1 - (0.144 + 0.576) = 0.28
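- Here's a small Python sketch that reproduces those numbers straight from the table; the tuple ordering and the prob() helper are just my own way of encoding it:

        # The full-joint distribution as a lookup table (entries from the notes above)
        FJD = {
            # (cavity, toothache, catch): probability
            (True,  True,  True):  0.108, (True,  True,  False): 0.012,
            (True,  False, True):  0.072, (True,  False, False): 0.008,
            (False, True,  True):  0.016, (False, True,  False): 0.064,
            (False, False, True):  0.144, (False, False, False): 0.576,
        }

        def prob(event):
            # Sum the table entries of every world where the event holds
            return sum(p for world, p in FJD.items() if event(*world))

        print(round(prob(lambda cav, tooth, catch: cav), 3))            # P(cavity)               -> 0.2
        print(round(prob(lambda cav, tooth, catch: cav and tooth), 3))  # P(cavity AND toothache) -> 0.12
        print(round(prob(lambda cav, tooth, catch: cav or tooth), 3))   # P(cavity OR toothache)  -> 0.28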
- That's all the UNCONDITIONAL probability stuff you've gotta know for Bayesian inference; but most of the time, we'll be worried about the probability of something given that something ELSE has happened
- CONDITIONAL probability is just the probability that answers "given X happened, what is P(Y)?"
    - A quick example: what's the chance I have a cavity given that I have a toothache?
        - More formally, what's P(cavity | toothache)?
    - Well, we're ASSUMING that toothache is true, so we ONLY care about the left half of the FJD table
    - So, we end up with the probabilities where cavity AND toothache both hold; since we're assuming the toothache is true, we divide its probability back out
    - The formula for this: P(cavity | toothache) = P(cavity AND toothache) / P(toothache)
        - So, 0.12 / 0.2 = 0.6
    - More generally, the formula for the probability of something given something else is: P(y|z) = P(y AND z) / P(z)
        - Again, we divide so that the probabilities will sum up to 1, since only the probabilities where "z" is true are "in play"
        - "Any time I talk about 'normalization' in this class, I just mean making sure the probabilities we deal with all sum up to 1"
- In AI, the problem will often come up in this form: P(X|E) = "the probability of X, given all the evidence E we've received from our sensors"
    - Which, again, works out to this: P(X|E) = P(X AND E) / P(E)
    - P(E) is just the number that normalizes everything back to 1; we usually don't figure it out directly, since it's just P(X AND E) summed over all possible values of X
    - So, we can just say that P(X|E) = Normalized( P(X AND E) )
- In order to figure out these problems, we need the FJD table for these probabilities; but often, in partially observable environments, we don't yet know how likely some things are, so we don't have the full table
    - So, what we need in the meantime are LATENT VARIABLES - hidden variables that help us to marginalize probabilities (i.e. calculate them without needing to know everything)
- Two more rules we use: the "total probability rule" and the "chain/product rule" (which are actually the same rule if you squint just right)
    - So, we said the "marginal" probability of something was the sum of the probability of that thing paired with each possible value of the other variables (i.e., the sum of the row)
    - Now, the PRODUCT RULE states that for any 2 events Y and Z, the probability that both Y and Z occur is: P(Y, Z) = P(Y AND Z) = P(Y|Z) * P(Z)
        - This lets us switch between conditional probabilities and the full FJD table!
        - A boringly classic example: rolling 2 dice, the probability of rolling a 4 and a 6 = P(r4, r6) = P(r4 | r6) * P(r6) = 1/6 * 1/6 = 1/36
            - In this particular case, the two rolls are independent - one roll doesn't affect the probabilities of the other - so P(r4 | r6) is the same as P(r4), which is still 1/6
        - Notice here that we could also expand it the other way as P(Z|Y) * P(Y) - either way is valid!
        - Applying this in the 3-variable case: P(X, Y, Z) = P(X|Y,Z) * P(Y,Z) = P(X|Y,Z) * P(Y|Z) * P(Z)
    - Additionally, the TOTAL PROBABILITY RULE states that: P(Y) = sum over all possible values z of Z( P(Y|Z=z) * P(Z=z) )
        - This is really just a fancy way of restating marginal probabilities
        - This lets us go back and forth between single variables and conditionals
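- Continuing the earlier sketch, here's the conditional-probability and total-probability arithmetic in Python (same table as before, repeated so this runs on its own; the prob() helper and variable ordering are my own):

        FJD = {
            (True,  True,  True):  0.108, (True,  True,  False): 0.012,
            (True,  False, True):  0.072, (True,  False, False): 0.008,
            (False, True,  True):  0.016, (False, True,  False): 0.064,
            (False, False, True):  0.144, (False, False, False): 0.576,
        }

        def prob(event):
            return sum(p for world, p in FJD.items() if event(*world))

        # P(cavity | toothache) = P(cavity AND toothache) / P(toothache) = 0.12 / 0.2
        p_toothache = prob(lambda cav, tooth, catch: tooth)
        p_cav_and_tooth = prob(lambda cav, tooth, catch: cav and tooth)
        print(round(p_cav_and_tooth / p_toothache, 3))  # -> 0.6

        # Total probability rule: P(toothache) = sum over cavity of P(toothache|cavity) * P(cavity)
        p_cavity = prob(lambda cav, tooth, catch: cav)
        p_tooth_given_cav = p_cav_and_tooth / p_cavity                                         # 0.6
        p_tooth_given_no_cav = prob(lambda cav, tooth, catch: tooth and not cav) / (1 - p_cavity)  # 0.1
        print(round(p_tooth_given_cav * p_cavity + p_tooth_given_no_cav * (1 - p_cavity), 3))  # -> 0.2, same as summing the column directly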
"Given you have migraines, 67% of sufferers have noise sensitivity, 40% of have impaired cognition..."); this is P(s|D), the probability of a symptom given the disease - But, what do we actually WANT to know? We want to know P(D|S) - the probability of a disease given a SET of symptoms "S" - We know the following: P(B|A) = P(A and B)/P(A) - Using the Product Rule, though can expand the top part to this: P(A and B)/P(A) = (P(A|B) * P(B)) / P(A) - This has an issue, though - we could be dividing by zero if P(A) = 0! - So, to eliminate that, we'll multiply both sides by P(A) to get: P(A|B)*P(B) = P(B|A) * P(A) - DIVIDING BY P(A), we get P(B|A) = (P(A|B) * P(B)) / P(A) - THAT'S BAYES' RULE! - (wait, why did we multiply it out then divide to put it back in?...I'll assume I missed the explanation, since it was the end of class and I was writing sort-of fast) - Applying this to the disease, we can say this: P(D|S) = P(S|D)*P(D) / P(S) - Next week, we'll go through an example to make this concrete, and then get into how we can use this rule to help improve our searching methods when the environment itself is uncertain - until then, enjoy your weekend and get project 1 turned in!