//****************************************************************************//
//************* Output Analysis / Validation - March 26th, 2019 *************//
//**************************************************************************//
- Alright - the 2nd project checkpoint is due next week on MONDAY, April 1st
- Hopefully, you're all wrapping up the initial document stuff and working very hard to get your simulator working
- Your initial simulator should just be as simple as possible (even dumb simple) to verify that it runs; from there, you can improve the quality of the simulator as much as you need until you're happy with it
- For the checkpoint, we want to see that you have SOMETHING that actually runs and is usable, even if it's very simple - it'll make the analysis portion of the project far easier
-----------------------------------------------------------------------
- Alright, we're almost done with the "lifecycle" part of discrete simulations, but there're 2 important parts left for us to cover:
- Output analysis
- Verification/Validation
- Hopefully, we'll have both of these wrapped up after today
- If you're following along in the textbook, you should be reading Birta-Arbez Chapter 6 for today
- Now, most simulations you'll be dealing with are stochastic, and that means that a single simulation run won't do; instead, you'll need to take the outputs from many runs and aggregate them
- In general, we'll try to take this data and create a confidence interval for the outputs we care about
- There are two broad types of studies we can do on this sort of output:
- BOUNDED HORIZON studies are where we know the end-time of the system (e.g. traffic data between 12-1pm), and try to get some statistics from that
- How many runs do we need before we're confident in our results, though?
- STEADY STATE studies are where we want to know the behavior of the system over a long period of time, i.e. after the "warm up" period of the simulation has finished
- This "warm-up" time might be relevant for bounded horizon studies, too - for your traffic simulations, you'll probably want to wait until the roads have filled up before you start collecting data
- How do we know when the warm up period is finished, then? And how long should we run the simulation for?
- So, some vocabulary to be aware of for this topic:
- A POINT SET OUTPUT VARIABLE (PSOV) just means a single measurement of whatever we're interested in (e.g. the time it takes for a specific car to get through our simulation)
- A DERIVED SCALAR OUTPUT VARIABLE (DSOV) is some value we get based on a bunch of PSOVs (e.g. the average wait time of a bunch of cars)
	- Typically, we care much more about the average behavior of a bunch of things, so DSOVs are what we'll usually focus on
- Since our output numbers are all random variables, the DSOV from a given run is just a single sample for one of those variables
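	- To make the PSOV/DSOV distinction concrete, here's a tiny Python sketch; simulate_run() and its numbers are hypothetical stand-ins for whatever your own simulator produces:

      import random

      def simulate_run(seed):
          # Hypothetical stand-in for one stochastic simulation run:
          # returns one travel time (a PSOV) per simulated car, in seconds.
          rng = random.Random(seed)
          return [rng.expovariate(1 / 120.0) for _ in range(50)]

      # Each individual travel time is a PSOV; the per-run average is a DSOV.
      one_run = simulate_run(seed=0)
      dsov = sum(one_run) / len(one_run)

      # Re-running with different seeds gives one DSOV sample per run --
      # this is the data the rest of this section analyzes.
      dsov_samples = []
      for s in range(10):
          run = simulate_run(s)
          dsov_samples.append(sum(run) / len(run))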
- For Bounded Horizon studies, there are two things we're particularly interested in:
		- A POINT ESTIMATE of the data - typically the mean of the DSOVs collected across all of our runs
- But once we get this value, how confident should we be that it's correct?
- An "Interval Estimate" of the confidence interval for our point estimates
- To get this confidence interval, we need to know the distribution of our sample mean, which we can find using the CENTRAL LIMIT THEOREM!
		- If it's been a while: the CLT says that the mean of n samples of a random variable is approximately normally distributed about the variable's true mean u with variance var/n (for large enough n), REGARDLESS of the distribution of the variable itself
- That's great...but how do we figure out the mean and variance of our data points?
		- Furthermore, the CLT is only a good approximation when n is large - but that's not good! We don't want to run our simulation a million times!
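	- As a quick numeric check of that normality claim (just a sketch using cheap random draws, not simulation runs): draw many size-n samples from a decidedly non-normal distribution and look at the spread of their means - the variance comes out close to var/n:

      import random
      import statistics

      rng = random.Random(42)
      n = 30  # size of each sample

      # Underlying variable: exponential with mean 1 and variance 1 --
      # nothing like a normal distribution.
      sample_means = [statistics.mean(rng.expovariate(1.0) for _ in range(n))
                      for _ in range(10_000)]

      print(statistics.mean(sample_means))      # close to the true mean, 1.0
      print(statistics.variance(sample_means))  # close to var/n = 1/30 ~ 0.033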
- So, if we don't have that many samples to work with, we can use the T-DISTRIBUTION instead!
		- Famously, this distribution was worked out by William Gosset, a chemist at Guinness who published it under the pseudonym "Student" (pour one out)
		- How does it work? Let's say that Y1, Y2, ..., Yn are n independent random variables, each normally distributed w/ mean u and variance var, and "y1, ..., yn" are the observed samples of those variables
- We can calculate the sample mean and sample variance from these data points in the usual way:
mean_y(n) = 1/n * sum(yi)
				var_y(n) = sum((yi - mean_y(n))^2) / (n-1)
- From that, we'll say that the random variable defined as so:
x = (mean_y(n) - u) / (sqrt(var_y(n)) / sqrt(n))
- ...has a "Student's T-Distribution" with n-1 degrees of freedom
- A BIG caveat here is that Yi NEEDS to be normally distributed, which doesn't always hold true
- Also, notice that we're using the SAMPLE mean/variance here, with "n-1"
- The great thing about this is that n doesn't have to be big - we can have a small number of samples and still figure out this distribution!
		- The T-distribution itself looks like a flatter, heavier-tailed version of the normal distribution, giving us wider (more conservative) confidence intervals
	- Once we've got that T-distribution, we can calculate the bounds of our confidence interval as the sample mean plus or minus some offset "zeta" (the half-width of the confidence interval):
CI = (mean - zeta, mean + zeta)
- Where zeta can be calculated as:
zeta = (t_n-1(a) * s(n) / sqrt(n))
- Where "a" is the confidence level we care about (e.g. a = 0.05 means a 90% confidence interval, a = 0.1 means an 80% confidence interval, etc.), and t_n-1(a) is the corresponding value for the T-distribution (we'll just look this up in a table)
- For your final project, we do NOT just want to see an average output; we want to see these kind of confidence intervals for your estimates!
	- Now, as you can tell from this equation, more runs still give us a narrower, more useful confidence interval - so the number of runs we need is usually chosen as however many it takes to reach our target accuracy
- A good starting point is to get the confidence interval smaller than 10% of the sample mean, but what's acceptable will vary from application to application
		- If the confidence interval isn't within this 10% range, keep doing more runs until it is (a small code sketch of this loop follows below)
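	- Here's a minimal sketch of that whole loop in Python; run_simulation() is a hypothetical stand-in for one replication of your model, scipy is assumed to be available for the t critical value, and "a" follows the same tail-probability convention as above:

      import math
      import random
      from scipy import stats  # assumed available, for the t critical value

      def run_simulation(seed):
          # Hypothetical stand-in: one replication, returning its DSOV
          # (e.g. that run's mean travel time in seconds).
          rng = random.Random(seed)
          return 120.0 + rng.gauss(0, 15)

      def confidence_interval(samples, a=0.05):
          # zeta = t_{n-1}(a) * s(n) / sqrt(n), with "a" the tail probability
          # on each side (a = 0.05 -> a 90% interval, as above).
          n = len(samples)
          mean = sum(samples) / n
          var = sum((y - mean) ** 2 for y in samples) / (n - 1)
          zeta = stats.t.ppf(1 - a, df=n - 1) * math.sqrt(var / n)
          return mean, zeta

      # Keep adding runs until the half-width is under 10% of the sample mean.
      samples = [run_simulation(seed) for seed in range(10)]
      mean, zeta = confidence_interval(samples)
      while zeta > 0.10 * mean:
          samples.append(run_simulation(len(samples)))
          mean, zeta = confidence_interval(samples)

      print(f"{mean:.1f} +/- {zeta:.1f}  ({len(samples)} runs)")

	- In practice you'd swap run_simulation() for your actual simulator and report both the mean and the interval in your write-up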
- So, to sum up the important stuff for bounded horizon studies:
- The values produced by a stochastic simulation should just be viewed as samples from the underlying random variable
- To figure out information about that underlying variable, then, we need multiple samples - i.e. multiple simulation runs!
- Confidence intervals let us characterize the range of likely outputs we can expect from our model (and hopefully, therefore, the actual system)
- Increasing the number of runs lets us shrink the confidence interval down to the size we need
- Now, let's talk a bit about Steady State simulations and their all-important WARM-UP periods
- The warm up period is just "the transient period from initialization until the system reaches a steady state," like the time it takes for an initially empty traffic simulation to fill up with cars
- Typically, we don't collect any data from the simulation until the warm-up period is over - but when does THAT happen?
- Figuring out when the warm-up period ends can be tricky, so to help figure it out we'll use WELCH'S METHOD (or "Welch's moving average method")
- To smooth out the data, first divide the time scale into cells large enough to each contain multiple data points (e.g. into 1-week long cells for a data set spanning months)
		- The value for each time cell is then the average of all the data points that land in that cell, averaged across every run (replication)
- Then, let's define a "replication size" of how many cells we'll use to compute an average, e.g. the average of 5 cells
- The window size "w" will be the number of additional samples we take on EITHER SIDE of our current cell (e.g. w = 3 means current cell + 3 to the left + 3 to the right = average of 7 total cells)
- Finally comes the "moving average" part of this procedure: take the average of cells [current - replicationSize, current + replicationSize] and plot that average; then take the average of [1, replicationSize+1] and plot that, etc., sliding our average "window" to the right until we reach the end of our cells
- If our window is too big for the current sample, then shrink the window's width until it fits inside the data
- This might seem arbitrary, but this method can significantly smooth out noisy data where it's hard to see a trend, making warm-up times more obvious
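	- Here's a small sketch of that procedure in plain Python, assuming each replication has already been reduced to a list of per-cell averages as described above:

      def welch_moving_average(replications, w):
          # replications: list of runs, each a list of per-cell averages
          #               (same length for every run)
          # w:            number of extra cells averaged on EITHER side
          n_cells = len(replications[0])

          # Step 1: average across the replications, cell by cell
          cell_avg = [sum(run[i] for run in replications) / len(replications)
                      for i in range(n_cells)]

          # Step 2: moving average; near the start, shrink the window so it
          # still fits inside the data
          smoothed = []
          for i in range(n_cells - w):
              half = min(w, i)
              window = cell_avg[i - half:i + half + 1]
              smoothed.append(sum(window) / len(window))
          return smoothed

	- Plot the smoothed values against the cell index; the point where the curve levels off is a reasonable estimate of where the warm-up period ends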
	- So, in summary: we often need to account for this warm-up period, which doesn't reflect the steady-state system we actually want to study, and Welch's method gives us a visual way of spotting where it ends, even in noisy data
- Another thing you'll HAVE to do for your project is verification/validation of your models - so let's do that!
- There's a nice paper by Bob Sargent on this if you want a reference, which we'll base our discussion on
- Very generally, there are 4 things we're concerned about here:
		- VERIFICATION - Does our model/software match what we intended it to do? Is it working as we planned?
		- VALIDATION - Does the model represent the ACTUAL system under investigation "well enough" for our purposes?
- CREDIBILITY - Do our customers believe in our model? Are they confident enough to actually use it?
- Even if our model is statistically accurate, if our users don't trust it, they won't use it!
- USABILITY: Do our users understand how to use this program/data? Is it "easy" enough to be helpful?
- For VERIFICATION, as we've stated before, we're asking if we actually built our software correctly: that it's free from bugs, that it doesn't crash, and so on, and that it matches the conceptual model
		- In general, the more general-purpose the language/tooling we used to build our software (e.g. a general programming language instead of a dedicated simulation package), the longer verification takes
- How do we actually do this, then?
		- The classic technique here is STATIC verification, where the team walks through the software/code and checks that everything looks right
		- More relevant for your project are DYNAMIC techniques, where we vary parameters and make sure the output is reasonable, test simplified versions of our model for correctness, do unit-testing on individual modules/sub-models, etc.
			- This also includes testing simple configurations for "reasonable" results (e.g. if we increase the amount of traffic, do the travel times increase? - see the sketch just below)
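	- As one concrete (and hypothetical) example of a dynamic check, here's a sketch of a unit test for the "more traffic -> longer travel times" sanity check; TrafficSimStub and its parameters are placeholders for your own simulator:

      import random
      import unittest

      class TrafficSimStub:
          # Placeholder for your own simulator class and parameters.
          def __init__(self, arrival_rate, seed=0):
              self.arrival_rate = arrival_rate
              self.rng = random.Random(seed)

          def mean_travel_time(self):
              # Dummy behavior so this sketch runs on its own.
              return 60.0 + 10.0 * self.arrival_rate + self.rng.gauss(0, 1)

      class SanityChecks(unittest.TestCase):
          def test_more_traffic_means_longer_travel_times(self):
              light = TrafficSimStub(arrival_rate=0.5).mean_travel_time()
              heavy = TrafficSimStub(arrival_rate=2.0).mean_travel_time()
              self.assertGreater(heavy, light)

      if __name__ == "__main__":
          unittest.main()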
- VALIDATION, once again, is asking a different question: did we build the right model in the first place?
- It's possible for a model to be valid for some conditions, but break down for others (e.g. a traffic model might work fine for moderate traffic, but underestimate the effects of heavy traffic)
- What's an acceptable level of accuracy will vary from project to project; a stock simulation for teaching kids about money management isn't going to be as stringent as a professional flight simulator
- There are 3 broad types of validation:
- DATA VALIDATION is where we're making sure the data we're basing our model on "makes sense"
- Almost all data sets have errors within them, or missing data, or something
- You should look through the data, check that any outliers make sense, that the data is consistent, etc.
- The difficulty here is that collecting all this data is often quite time consuming, and can be impossible in some circumstances
- CONCEPTUAL MODEL VALIDATION is where we're checking that the theories and assumptions our model makes are valid, and the representation of the problem is sufficient
- A very common way of doing this is "face validation:" you present your model to experts on the subject, and they give their opinion
- OPERATIONAL VALIDATION is where we're comparing our model's outputs with real-world data, and ensuring that the output makes sense
- For your projects, this is probably the most relevant test
- When you can, you'll want to do this using objective approaches, like comparing your output data directly to real-world datasets
- If that's not possible, you can resort to subjective approaches instead, like having experts review your model's output, or comparing it to similar models
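		- A minimal sketch of the objective approach (the numbers here are made-up placeholders - you'd substitute your collected real-world data and your model's DSOV samples; scipy is assumed to be available):

      from scipy import stats

      # Placeholder data -- swap in your real-world measurements and the
      # DSOVs produced by your simulation runs.
      real_wait_times = [112.0, 130.5, 124.1, 118.8, 127.3, 121.0, 115.6]
      sim_wait_times = [119.4, 125.2, 122.7, 116.9, 128.8, 120.3, 123.5]

      real_mean = sum(real_wait_times) / len(real_wait_times)
      sim_mean = sum(sim_wait_times) / len(sim_wait_times)

      # Two simple checks: relative error in the means, and a two-sample
      # t-test for whether the difference is statistically distinguishable.
      rel_error = abs(sim_mean - real_mean) / real_mean
      result = stats.ttest_ind(real_wait_times, sim_wait_times, equal_var=False)

      print(f"relative error in mean wait time: {rel_error:.1%}")
      print(f"two-sample t-test p-value: {result.pvalue:.3f}")

		- Whether the result counts as "close enough" depends on the accuracy level agreed on for the project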
	- You can spend a LOT of time on validation, and it often takes up a large portion of the project - but at the end of the day, there are deadlines; you'll need to draw a line somewhere between confidence and expedience
- If you incorrectly reject a model that's actually fine, that's a type I error, and can cost developers money
		- If you incorrectly ACCEPT an invalid model, that's a type II error, and a MUCH bigger problem: people will base their ideas off of your incorrect study!
- There's also the slippery type III error: your model is valid, but it doesn't actually answer the problem you care about
	- In his paper, Sargent recommends the following general procedure for validating things properly without taking an eternity:
- Make some agreement with your clients about what validation techniques will be used/needed
- Agree with your clients on what range of accuracy is acceptable BEFORE you start building the model
- As often as you can, test the assumptions underlying your model
- For every model iteration, at least do a face validity check on its conceptual model
- For each iteration, you should also explore the model's output behavior in some way
- At LEAST once the final model is built (and during prior iterations if possible), compare the model with the real-world system (preferably via datasets)
- Document all of your validation findings, good or bad, and present them
- If your model will continue to be used, make sure to periodically revalidate it
- ALRIGHT, that should be it for the modeling-simulating lifecycle - and you should actually now have everything you need to finish your second project! On Thursday, we'll start diving into parallel computing, so make sure not to miss that!