//****************************************************************************// //**************** Load Balancing (cont.) - April 21st, 2020 ****************// //**************************************************************************// - Currently, if I do raw grading scores, there are 3 people in the A range, 14 in the B range, and 6 in the C range - I'd tried to give enough extra-credit to avoid having to use a class-wide curve, but given the extraordinary circumstances with the coronavirus, I do think some extra leniency is warranted - Because of that, I'm considering a "letter-sign shift" in the grade assignments, where a B+ becomes and A-, an A becomes an A+, etc. - This isn't a huge shift - just ~3.3% - but hopefully it should give us a more normal-ish looking curve - In addition, it also seems that the final being a full 30% of the course when I've been less available for office hours since the break isn't fair, so I'm going to have the final work like this: - If the final raises your grade, you can count it as 30% - OTHERWISE, it will only count for 15% of your grade (meaning your total course grade will sum to 85%, rather than 100%) - Programming Project 3 is due tonight, so PLEASE make sure you're working on that! - Homework 4, for extra credit, is still out, and also due tonight - For problem 2, you can think of this Toeplitz matrix as being like a polynomial multiplication problem, like with FFT - - "My philosophy is that turning something in late is better than turning in nothing at all, so if you need extra time reach out to me 1-on-1 and, while you probably won't get full credit, something can be arranged" -------------------------------------------------------------------------------- - Last time, we were talking about LOAD BALANCING - One prominent feature of physical simulations is that data tends to have some sort of locality/spatial relations, and the density of the data at points can shift over time - Think of a wind tunnel simulation, where the air can shift around and move - We had a STATIC load balancing problem, where we wanted to minimize computational imbalances, and minimize communication across cuts - and these 2 goals are usually opposed! - Typically, if we end up with cuts that cross a small number of edges, it means - The DYNAMIC load balancing problem, then, also tries to minimize the movement of data from the old partitions to the new partitions we choose - Minimizing the movement of data from the old to the new partition is the (TODO: on lecture) - We tend to prefer incremental algorithms that can update the data in small increments, and small changes in the input result in gradual changes to the partitions - So, those are our problems; let's take a look at some solutions! - There are 2 types of models for these problems: - GEOMETRIC models are where coordinates for our data items are available, and are great for problems where the data is in an actual physical space - COMBINATORIAL models are more abstract for non-physical problems, and are where we don't have any physical locations but we DO have connectivity information for our data items - Let's look at some geometric solutions! - The RECURSIVE COORDINATE BISECTION (RCB) algorithm divides work into 2 equal parts by cutting it with a flat line/plane, then recursively divides each of THOSE portions to end up with an equal amount of work, lternating the direction we cut - Why alternate the dimensions (i.e. horizontal, then vertical, instead of just all horizontal)? Because of a ratio: Surface Area to Volume! - By alternating cuts, we're keeping our subdomains as close to square as possible, trying to minimize the size of our cuts - This algorithm is hierarchical, and is IMPLICITLY incremental with the cuts changing gradually to match the data, making it a good fit for dynamic load-balancing problems - So: - Pros: - Fairly simple and efficient to implement - Regular subdomains - Doesn't need connectivity information - Cons: - No explicit control of communication costs (since we don't use any connectivity information) - Can generate disconnected domains - Basically, the partitions it generates aren't guaranteed to line up naturally with the problem space, which can make it more awkward/inefficient to calculate (e.g. ) - Partition quality can be meh - Needs geometric coordinates - Still, though, this algorithm is commonly used in particle simulations, volume rendering, crash simulations, etc. - Another approach: the SPACE-FILLING CURVE PARTITION (SFC) - The idea here is that we have a space-filling curve that completely fills a domain, and we can apply it recursively to obtain however much granularity as we need - Given a coordinate, we can find its partition quickly, and vice-versa - We can use this for partitioning as follows: - Run a space-filling curve on a grid containing all our data points - Order objects based on their 1D position along the curve - Evenly split the data along that 1D curve (i.e. so the points are evenly distributed) - Doing this gives us non-rectangular but still pretty compact sub-domains, and figuring out where to divide is SUPER simple once we have this space-filling curve function (lie Hilbert's curve) - We can make this algorithm incremental, too, by taking a given grid square and recursively splitting it where we need more data! - So: - Pros: - Simple and fast - Maintains geometric locality (good for dynamic balancing) - Linear ordering of objects can give great cache performance - Partitioning and cache-blocking are distinct, but definitely related - Cons: - No explicit control of communication costs (again, it's wholly determined by the data's position) - Can still generate disconnected subdomains/partitions - Partitions are often lower quality than PCB - Still need geometrics coordinates - Then, for combinatorial solutions - First up, MULTI-LEVEL GRAPH PARTITIONING! - The idea here is that we construct small approximations of our graph repeatedly until there's one we know how to partition easily, we partition that easier problem, and then re-scale the problem back up to the original graph (making some slight changes as we scale it up to keep it good) - This multi-level approach has a BUNCH of sub-approaches; the default approach is okay for static partitioning, but for dynamic partitioning some approaches use a "diffusive strategy" to take some information from the original graph into account, keeping the changes incremental - So: - Pros: - High-quality partitions known for MANY types of graphs and applications (e.g. highly optimized engineering models, which take edges into accounts) - Can control communication costs - Widely used for static partitioning - Cons: - More expensive than geometric approaches - Difficult to make incremental - So, what are some applications of graph partitioning? There's tons! - Finite element analysis (?) - "Multi-physics" simulations (rarely need to rebalance, but need high-quality partitions to start) - Linear solvers - These are square and structurally symmetric, and decomposing based on the mesh indices is a natural fits - This graph-based approach to load balancing is very popular in scientific computing, but it has a flaw that becomes evident in non-spatial applications: it assumes the number of edge cuts is the same as the communication volume - In reality, this ISN'T always true! Some edges get used much more than others - Graph models also tend to assume a symmetri, square (???) - So, is the graph based model good enough? - In mesh-based applications, probably! It works well in prctice and the structure of meshes guarantees low vertex degrees and decent partitions - For irregular networks, no, it doesn't; - If you want to play around with some of these approaches, a good resource is the Zoltan Toolkit (which was party developed at Georgia Tech!), which lets you write code and freely switch between these algorithms - So, to compensate for the problems of the graph model, some researchers have tried to devise the HYPERGRAPH MODEL - We won't have time to cover the details, but be aware that it's a thing - Okay, I'll have office hours from 1:00 to 2:00 today to talk about the project if you have any questions; thanks for showing up today and sticking with it through this semester! - I'll