//***************************************************************// //*****************B Trees - March 1st, 2017********************// //*************************************************************// -Exam is in ONE WEEK (AHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH) -Don't be a child and complain if TAs take a week to grade homework... -Don't whine for exam topics when it's on the syllabus... -...apparently these are actual issues on Piazza? -"If you thought AVLs were bad, wait till you hear about 2-4 trees" -...2-4 trees WILL be on the exam! -------------------------------------------------- -B trees are similar to BSTs, but with one HUGE difference: each node can hold MULTIPLE DATA ITEMS -The "order" of the B-Tree describes how many data items/children the node can have -e.g. If the "order" of the B-tree is 5, that means each node can have up to 5 children, and each node can have 4 (5-1 = 4) data items -Order is based on # of children -THE AMOUNT OF DATA EACH NODE CAN HOLD IS (Order - 1) -Data in children will be ordered "least to most" -B-Trees DO have balancing, but it's different from AVL balancing -We will be focusing on 2-4 TREES; these are a subtype of B-Trees that have an order of 4 -i.e., each node can hold up to 3 pieces of data, w/ up to 4 children -The data items WITHIN each node are stored in ASCENDING order (least -> greatest) -The first child is less than the first data item, the 2nd child is between the first/second data item, the third child between the 2nd data/3rd data, etc. [x y z] / / \ \ A B C D -All data items in node A are less than X, everything in B is between X and Y, everything in C is between Y and Z, and everything in D is greater than Z -"[]" is the node, "x y z" are the data elements in "[]" -In a B-Tree, the depth of ALL leaf nodes is the same -If there are "n" data items in a node, then there must be either 0 OR "n+1" children -BECAUSE of these rules, a B-Tree will ALWAYS be balanced, and searching/adding/removing from a B-Tree will ALWAYS be O(log n) -SEARCHING is done by -Starting at the root -Compare the first data item in the node w/ what you're searching for -If the data you're looking for is LESS THAN that 1st data in the node, then go to the "nth" child, where "n" is the position of the data in the node -If the data is equal, hey, you've found the node! Exit and give yourself a cookie -PSEUDOCODE- search (data, node) if node == null return false else for (i = 1, i < n, i++) if data == node.data[i] return true else if data < node.data[i] child = node.getChild(i) return search(data, child) -When ADDING to a B-Tree, the new data item is ALWAYS added to a leaf node, not to an internal node -We follow the same steps as searching until we find the data in the tree, or reach a leaf node... -Then, if the data is NOT in the tree (i.e. we're at a leaf), then we add the data to that leaf node -If the data IS in the tree, then we can do a few different things depending on implementation. We might do nothing, we might replace the existing data, we might panic and trigger an emergency disco sequence, etc. -OVERFLOW- after the data is added to a leaf, the node might have too many data items for the type of tree (e.g. for a 2-4 tree, we can only have a max of 3 data items, but after adding we might have 4) -If this is the case, the MIDDLE item (in a 2-4 tree, either the 2nd or 3rd node) is "promoted" to be the parent of the existing node; then, we SPLIT the existing node into a left half (the data values less than the "promoted" one) and a "right half" (the data values greater than the promoted node) -HOWEVER: If the parent node ALREADY exists and has room for another data item, we instead move the "middle" data item to the existing parent instead of creating a new one -WE STILL SPLIT THE CHILDREN IN THE SAME WAY, however -If the parent node DOESN'T have room for another element, WE STILL MOVE THE MIDDLE ITEM TO THE PARENT, then recursively call "Overflow()" on the (now-overflowing) parent! -Sometimes when doing this, we end up with a node with 5+ children (i.e. more children than we're allowed), meaning we have to split up the children again, with all of the nodes less than the removed "middle" node becoming children of the left child, and all of the node greater than it becoming children of the right child... -PSEUDOCODE- add(data, node) for (i = 1, i < n, i++) if (data == node.data[i]) //do something w/ duplicate else if (data < node.data[i]) if node.numberOfChildren() > 0 child = node.getChild(i) add(data, child) if (child.numItems > 3) break child into 2 nodes, promote "middle" item return else add data to "ith" position in this node if (node.numberOfChildren() > 0) child = node.getChild(i) add(data, child) if (child.numItems > 3) break child into 2 nodes, promote "middle" item else add "data" to last spot in this node -With REMOVING, it actually gets even messier...but we'll get to that on Friday. READ SAIKRISHNA SLIDES TO PREPARE