Jake's CS Notes - Intro. Modeling/Simulation

//****************************************************************************//
//***** Activity Scanning / Random Number Generation - March 12th, 2019 *****//
//**************************************************************************//

- Some people have already started asking about the 2nd exam, which is WAYYYYY out at the end of the semester; the hope is that your project will cover all the "simulation lifecycle" stuff we're going over now, but NOT the parallel stuff we'll start covering after Spring break; we'll talk about details when we get to them
    - The second exam will EMPHASIZE parallel processing stuff because of that, but there *might* still be a question or two on this lifecycle stuff...
----------------------------------------------------------------------------

- Okay, we started talking about activity-scanning simulations last week; let's try going over it in more detail
    - For activities, we need to worry about 2 different categories of events:
        - CONDITIONAL (or "C") events occurs when a condition is met
        - BOUND (or "B") events occur when we're at a particular timestep
    - There are 3 main types of activities we'll be worried about in this class:
        - A regular ACTIVITY is started by a condition and runs for a certain amount of time
            - So, for instance, an airplane activity might be started when the runway is free (a C-type event), and then end after 5 minutes of landing procedures (a B-type event) 
        - A TRIGGERED activity starts at a certain time after being created by an event, and runs for a certain amount of time
            - For instance, the "docking" activity might be started as soon as we're done with landing, and then take a certain amount of time
        - ACTION SEQUENCES are where one event schedules another event in sequence
    - "Again, the main point isn't the specific terminology here, but the concepts"

- How is this kind of simulation actually run, though?
    - Well, we'll have a list of "current activities," with new activities being created throughout the simulation and old ones being deleted when they're completed
        - Each activity here has a "trigger" that tells us when it should start running
- The simulation time will be discretized into different "time steps," and for each time step, we'll scan through the whole list of current activities
    - If it's a B-type event, and it's time for it to start, we execute it!
    - If it's a C-type event, and their condition is true, we execute it!
        - We continue doing this for the timestep until no more events are executed
            - Keep in mind that events can create new events, INCLUDING more events for the current time, so we need to check if any of those were added!
        - After that, we'll advance to the next step

- This scanning approach is pretty simple, but it's not the most efficient - it's okay for small projects, but we end up scanning the entire list EVERY timestep, even if we don't have any events that need to be run right now!
    - To speed this up, remember that conditions change in the simulation ONLY when an event occurs - so, no new C-type events can fire until a B-type event triggers their condition!
        - That means that we can maintain a "future event list" priority queue and get the smallest timestep B-type event, along with any others w/ the same timestep
        - We'll then execute those events, and any of the conditional events that're now true; we'll then get the next smallest timestamp from the FEL
            - This means that we can skip a BUNCH of timesteps when nothing is happening, greatly improving performance

- So, in summary, there are 3 "worldviews" we can take when building our simulation model, and they all have different trade-offs
    - Event-oriented approaches are straightforward and efficient, but they can have a complexity issue when scaling up for large problems
    - Process-oriented simulations are probably the most widely used, since they simplify model maintenance and organization
        - However, this requires threading, and handling those threads can incur some overhead and complexity of its own
    - Activity-scanning approaches make more sense for some applications; they can have inefficiency problems, but many of these can be mitigated

- Shifting gears, one of the most common things we need when simulating stuff is randomness - so how do we add that in? We use RANDOM NUMBER GENERATORS!
    - This is a MASSIVE topic, so we're just going to go through the 10,000 foot high version
    - To be clear, we're generating PSEUDO-random numbers
        - Ignoring the whole "deterministic" vs "random" philosophical debate, "truly random" numbers can be generated with certain hardware (e.g. measuring radioactive decay from uranium, or white noise in electronic circuits)
            - The problem with these "truly random" approaches is that it makes debugging harder for us - since the numbers aren't repeatable, we can't reproduce a case that gives us trouble and investigate it!
                - It also mans we can't reproduce results if someone else needs to see them - "when the IRS calls, they'll want to verify your accounting simulation for themselves"
    - Instead, 99.9% of the time, we'll use pseudo-random numbers that generate numbers using a deterministic formula

- What properties does this function need to have? Well, we'd like to have the following:
    - Should produce numbers over the (0, 1) range without any obvious pattern or correlation between the inputs (satisfies "statistical tests of randomness")
    - Should be repeatable, as we've discussed 
    - Should be "portable" and work on a variety of hardware
    - Needs to be reasonably efficient
    - Needs to be usable with multiple data streams at a time (i.e. parallelizable)
    - Should be well-documented and understood
- With that in mind, we want a function like this:

        u = random();

    - That returns a floating-point number between 0 and 1, with any number between 0 and 1 equally likely to occur
        - Ideally, we start with a "seed" value for this function, but after every call it'll jump to a new state, where the valid range of states (for our purposes) will be the set of integers {1, 2, ..., m-1}
        - There'll be a function "g()" that maps one number in our state to another
            - Notice that since there's a finite number of states, this function will eventually get back to a state we've been to previously after at MOST "M" calls; once we reach that, the function will start repeating its sequence
                - The length of this loop is called the PERIOD of the generator, which should preferably be as long as possible (For a "k"-bit integer, it should be as close to 2^k as we can get it)
                - With a well-designed generator, the "build-up" period to first reach a number in this loop should also be small or non-existent
        - ...and an "output function" h() that can map our current state to an output number in the (0, 1) range
            - For our purposes, "h(xi) = xi / m" should be sufficient

- With those pieces, we'll talk about probably the most common RNG technique, known as LEHMER'S ALGORITHM (pronounced like the monkey-like creature)
    - Here, we start with our seed value in our set of states, and then have the following state-mapping function:

            g(x) = (a*x_i-1) % m

        - Where "%" is the modulus operator, i.e. the remainder (or, more precisely, u%v = u - v*floor(u/v))
        - This is a generalized form of the "linear congruential generator," without a constant-value being added
            - Because of this, Lehmer's algorithm is sometimes known as the "multiplicative LCG"
    - Intuitively, this algorithm is like taking a "pebble" (the current state's number), throwing it HARD down a tiled floor, and then measuring how far into a given tile our pebble landed to get the next value
        - If we analyze this function, we see the following:
            - If our state is ever 0, then the function gets "stuck" at 0 - yikes! We don't want that!
                - Luckily, it turns out that this will never happen if we choose a prime number for "m" - great!
            - Due to our modulus operator, if m < "M", then the next number will always be in our state space
            - Since we're going through numbers in our "period," it actually means our values ARE dependent on the previous numbers!
                - However, if our number of draws is significantly less than M, this is still a "good-enough" approximation of true randomness
    - What should "a" and "m" be, though? Well, that's the tricky part - in fact, the choice of those 2 numbers is the crucial difference between a good generator and a bad one!
        - It turns out to be pretty complicated to work this out, and intimately related to number theory, but the basic idea is that we want m as close to a "full-period sequencer" (i.e to M-1) as possible, so we're not wasting any states
            -  We also want to avoid the "burn-up" period to our cyclical part - which, as it turns out, is eliminated if we pick "m" to be a prime number
            - That doesn't help us with the full-period sequencer, though, and calculating this is much more difficult
                - Some people also argue using powers of 2 for M to allow for faster computation, but that isn't crucial
        - Large periods of m are NOT sufficient to guarantee randomness (since the numbers could all be clustered together), so we need to find a value of "a" that will make this work
            - There are a few suggested values for this, but Lehmer suggested a number ~= 48271
            - For m, it turns out that 2^31-1 is prime, so that's often used to maximize our periods for 32-bit integers
    - Notice here that we're also multiplying a by x_i-1 before dividing, so we need to deal with integer overflow in our implementation
- Overall, Lehmer's is commonly used because it's simple to implement, straightforward to parallelize, and it requires a pretty small state, but the limited period size is a major drawback that scares some people off
    - There are more sophisticated algorithms that overcome that, like the Mersenne Twister, but we won't go into the details of those

- Alright, that's enough to get us a uniform random distribution - but how can we use this to get other distributions, like, say, the normal distribution?
    - To start off, let's remind ourselves about discrete random variables:
        - A RANDOM VARIABLE is just a variable that's "random," and has different values each time we look at it
            - We say its current value is "x," and the possible values it can take are in the state space "chi"
                - "Chi" is pronounced "kai" (it's statistics, not tea)! 
            - The probability of a given value "x" occurring is given by the PROBABILITY DENSITY FUNCTION (PDF)
            - The probability of getting a value <= x is given by the CUMULATIVE distribution function, the CDF
        - A RANDOM VARIATE is an algorithmically-generated version of a random variable with a given PDF "p"
    - This leads us into the topic of EMPIRICAL DISTRIBUTIONS
        - As an example, suppose we're running a fine Italian restaurant, and we want to figure out the average size of the groups that come into our restaurant
        - Being fancy Georgia Tech students, we collect this sort of data on all of our customers, and from that data we can create an "empirical distribution" of the group size
            - You'll need to do something similar using your traffic simulation data in project 2
        - To do this, we'll have integer states from {1, ..., k}, where "k" is the size of the largest group
            - We'll then record how many groups of each size came into our restaurant...
            - ...and calculate the PDF (i.e. the probability of each group size) by just dividing the number of groups of that size by the total number of groups, e.g.:

                    P(group size x) = (# of groups of x) / total # of groups

                - We can then generate a staircase-like CDF by just successively adding each of these probabilities up to 1
    - Once we have an empirical distribution, we can use the INVERSE-DISTRIBUTION method to generate our random variate like this:
        - Select a random number "u" between (0, 1)
        - Compute F*(u), the INVERSE of our CDF (search for the smallest CDF(x) greater than u): we plug in our "probability" u, and get the "x" value that has that range
            - That's it! It works for a TON of stuff!
            - People have derived the formulas for "classical" distributions using this method, like:
                - Equilikely(a,b) = a + (u * (b - a + 1))
                - Geometric(p) = (log(1.0 - u)) / log(p)
                - (a ton more)

- Similarly, these techniques hold for continuous functions; the main difference is that the PDF and CDF are now actual functions, so we need the actual inverse CDF function "f-1(i)"
    - A commonly-used example of this is the exponential distribution
    - "For a number of distributions, though, finding this inverse isn't straightforward at all"

- This is great, but occasionally, if we can't compute the inverse distribution, we can try to instead use the REJECTION-ACCEPTANCE METHOD
    - Basically, if we have a range of x values (a, b) we want a value between, and a curve for our PDF, we'll throw darts at a box with base (a, b) and height equal to the highest value of the PDF
        - We'll compute x/y values for our random "dart," where u1 and u2 are random numbers in the range (0,1):
        
                x' = a + u1*(b - a)
                y' = highestPDFVal * u2

            - If y' is under the curve (i.e. y' < PDF(x')), then we can take x' as a valid sample! Otherwise, we'll just regenerate these numbers and try again
    - There's a slight catch here, though: how many times do we need to try this before we get a value underneath the curve?
        - Since the total area under the curve is 1, the probability a given dart will be accepted is "1 / (highestPDFVal * (b - a))", which is usually fairly reasonable

- In general, you'll want to use the inverse distribution method, with the rejection-acceptance method as a backup if you have trouble calculating the inverse CDF

- OKAY - that was moving a little fast, but there you go! We'll keep going on this journey come Thursday!