//****************************************************************************//
//****************** Performance Basics - May 31st, 2018 ********************//
//**************************************************************************//

- Only 16 people coming to lecture in a class with ~45 people...is 10am early all of a sudden?
- First test is next Thursday - in recitation, at MINIMUM, you guys'll get a list of topics, and hopefully a few practice questions (probably not a full practice test)
- NOTE TO SELF: Are there any computers designed for the blind? Half of my laptop is a monitor, massive amounts of computing resources are devoted to drawing graphics (GPUs, system calls, etc)
-----------------------------------------------------

- So, today, we'll start getting into chapter 5 - improving the processor's performance, especially via pipelining!

- Let's say you've been hired by ILM to find the "best performing" workstation out of a few different options
    - What would you be looking at? CPU speed? Number of cores? Amount of RAM? Graphics card(s)? Upgradeability? Parallelizability (how easily it could be run in parallel w/ other computers)? Company reputation? Cost? Designed for the problem we're looking at?
        - ...those are things people actually consider! A computer might be "better" in one metric and worse in another
    - Back in the day, VHS and Betamax were competing home video standards, and Betamax was actually considered to be slightly BETTER from an engineering standpoint - but today, everyone remembers VHS! VHS was "good-enough," it was more cost-effective, they did a better job marketing themselves to Hollywood studios, etc.

- First things first: how do we actually measure performance?
    - There are two important objective metrics for any program:
        - SPACE - how much memory does the program code/data require?
            - If we ONLY care about space, we could have an array of data crammed together - but then we'd have to search for things every time
        - TIME - what is the execution time for the program?
            - If we ONLY cared about fast execution time, we'd want a hash map, but there's a lot of extra memory used by it
    - One example of this trade-off: CISC vs RISC architectures!
        - CISC, 
        - RISC, even though it might take more individual instructions to do the same thing (something)
    - Now, Space and Time are NOT necessarily correlated - in some cases, we can have things that use less space AND have better performance 
    
- How do we actually measure execution time?
    - One metric is that: 
        - Execution Time = sigma(CPI, individual instruction #) * clock cycle time
            - CPI means "Cycles Per Instruction"
        - Execution Time = n * average CPI * clock cycle time
            - N is the number of instructions we're running
    - STATIC instruction frequency is the frequency that an instruction occurs in compiled code; DYNAMIC instruction frequency refers to how frequently a particular instruction is executed when the program runs
        - Loops can mean a program with supposedly only a couple of instructions can actually run forever!
- Looking at this, it seems that there's a couple obvious ways to speed up our processor:
    - Reduce the number of executed instructions
    - Increase the clock speed
    - Reduce the cycles per instructiom
- How can we measure our speedup? Well, basically:
    - Performance = 1 / (Execution Time)
    - A machine "X" is n times faster than machine "Y" per:
        - Y executed time / X execution time = n
            - NOTE: This is different than pure "percent improved", which is (old - new) / old!

- Early on, people tried to focus on optimizing individual instructions - looking at the instructions that took the longest or were used the most and reducing how many clock cycles it took to run them
    - That's still a valid option, but people started having a thought: what if we INSTEAD focused on improving the throughput of the processor as a whole (i.e., how many instructions we can run in a given span of time)?
- First off, to measure how long an instruction takes, "wall clock" time can be a little unreliable - we're trying to measure in picoseconds, after all!
    - So, to handle this, we instead figure it out in "CPU time":
        
        Execution Time = Instructions per Program * Clock Cycles per Instruction * Seconds per Clock Cycle

    - Now, some instructions take more clock cycles than others
        - Remember: the clock cycle can only be as short as the longest microinstruction
        - To deal with this, we can ammend the equation to be:

            CPU Clock Cycles = Sigma(CPI for instruction * Number of instructions for this class, # of different instruction length classes)

- Again, how can we speed this up? Fewer instructions executed, reduce # of clock cycles per instruction, and reduce the time per clock cycle - those jump out immediately
    - But there are a few more, less obvious ways we can speed this up:
        - Improve the datapath itself! We can save time by adding extra buses, using registers more, putting components physically closer, etc.
        - Break up instructions - this way, complex instructions can be broken into 2 or more simpler, faster-to-run ones
    
- AMDAHL'S LAW:

        Time After = (Time affected / Amount of Improvement) + Time Unaffected

    - An example of this:
        - 20% of a processor's time is spent adding
        - A new ALU is 4 times as fast (speedup) at adding
        - What's the overall improvment? well, assuming it originally took 10 seconds, then
            
            Time After = (0.2) * 10s / 4 +  (0.8) * 10s = 2/4 + 8 = 8.5 seconds
            Speedup = 10/8.5 = 1.18

        - So, adding an ALU that's 4x as fast won't magically make our whole computer 4x faster; it'll only speedup the parts of the process it affects, and only when it's being used!

- BENCHMARKS - these can get pretty complicated (depends on what program you pick, etc.), but in general, the best choice is to use real programs, chosen from typical applications, using a large amount of code (to discourage cheating and optimizing JUST for the benchmark)

- IN THE SLIDES: 3 problems near the end about CPI
    - Professor Moss: "I like these questions. In fact, I like them so much that something similar MIGHT just end up on a test in the near future..."

- Okay, everything UP TO NOW is fair game for the test
---------------------------------------------------------

- So, let's now get to the main event - PIPELINING
    - Consider Chipotle - whne you get to the first person, they ask you if you want a bowl or a burrito - then, the next person asks you what kind of meat you want - then, the next person asks if you want rice or beans - then, the next person gives you any toppings you want - then, finally, the last person acts as the cashier and takes your payment
        - The SECOND I finish telling the first person my general order, they start helping the person behind me - EVEN THOUGH I'm still in the line, finishing the rest of my order!
    - So, right now, we fetch our instruction, wait until we're COMPLETELY done executing it, and then go back to fetch the next instruction
        - ...meanwhile, fetch is just sitting there, not doing anything. Why can't we have it do something while it's waiting?
        - "But wait! We only have one bus, so we can't do anything right now..."
            - WELL THEN, we'll have to improve the hardware!
    - By doing this improvement, we go from executing our instructions looking like this, waiting for each whole instruction to finish:

            ====
                ====
                    ====
                        ====
                            (...)

    - To THIS, handling multiple things at the same time:

            ====
             ====
              ====
               ====
                (...)

    - ...but, of course, the devil's in the details, and getting to this point isn't as easy as just slapping on a new bus; getting our hardware ready to do this will involve some major challenges and pitfalls

- What're those challenges, and how will we deal with them? Tune in next week to find out! In the meantime, enjoy the weekend!