Jake's CS Notes - Data Structures / Algorithms

//*****************************************************************//
//**************Binary Search Trees - February 1st, 2017**********//
//***************************************************************//

-"We have the best terminology in CS" - Prof. HB
-The test is ONE WEEK from today; be prepared!
-----------------------------------------------------------------

-So, important question: "Why are tree traversals important?"
    -e.g. level-order traversals could be used to find the oldest element in a tree; in-order traversals can retrieve a sorted list of data, etc.
        -In other words, look up their applications later

-REMEMBER: for BSTs, left is smaller, right is larger

-The purpose of a BST is to be used for sorting and searching elements

-The idea of BINARY SEARCH TREES comes from the idea of "Binary search" in arrays, where you split the list repeatedly in halves
    -BSTs simply STORE the nodes in a way that traversing them IS a binary search, with the nodes ALREADY sorted by adding them to either the left or the right depending on their value

-To search in a BST:
    1) Start at the root node
    2) Check the data; if you HAVEN't found it...
        -If what you're searching for is LESS than the current node, go to the left
        -If what you're searching for is GREATER than the current node, go to the right
    3) Repeat until you either find the data, or you reach a null node
        -This should be O(log n), as it is identical to a binary search
    PSEUDOCODE:
        search(data, node)
            if (node == null)
                return false
            else
                if (data == node.data)
                    return true
                else if (data < node.data)
                    return search(data, node.getLeft())
                else
                    return search(data, node.getRight())

-To ADD a new element
    -Follow the same steps as searching until you either find the data, OR reach a leaf node
        -(for this class, we will NEVER have duplicates in the tree; HOWEVER, this depends on the implementation)
    -Once the proper position is found, if the data does not already exist there, then add it to that position
        -This whole procedure should take O(log n), as it's basically just following the same steps as for searching, and the actual process of adding the node is O(1)

    PSEUDOCODE:
        add(data, node)
            if (node == null)
                return new node w/ data
            else
                if (data == node.data)
                    {depends on implementation}
                else if (data < node.data)
                    node.left = add(data, node.getLeft())
                else
                    node.right = add(data, node.getRight())
            return node //keeps nodes the same, as we replace the node w/ itself UNLESS the "node == null", where we add the new node containing the data we want to add

        ANOTHER implementation:
        //this is the "look-ahead" implementation, and it works just fine; BUT, we want you to get used to the "pointer reinforcement" technique above if you can, as when we get to AVL trees, doing it w/o that technique is HARD, and needlessly complex

        add(data, node)
            if (data < node.data)
                if(node.left == null)
                    node.left = new Node(data)
                else
                    add(data, node.left)
            else
                if(node.right == null)
                    node.right = new Node(data)
                else
                    add(data, node.right)

-First, some terms for a BST:
    -The SUCCESSOR of a node is the next-highest value node in the tree
        -Obviously, all nodes except for the highest-value node itself (the one most to the right) have a successor
    -The PREDECESSSOR of a node is the next-smallest value node in the tree
        -Like the successor, ALL nodes in the list have a predecessor EXCEPT for the lowest-value (farthest left) node in the tree

-So, adding/searching really isn't too bad; how do we remove data, then?
    -3 cases: the node could have NO children, 1 child, or 2 children
        -Removing a node with 0/1 child nodes is pretty easy to deal with; 2 children is MUCH harder
    -0 CHILDREN: We just set the node to null to "disconnect it" from the tree. Easy, right?
    -1 CHILD: We don't want to lose the child when we disconnect the node, BUT, since the current node is greater/less than its parent node, its child must ALSO be greater/less than the parent node; so we just change the parent node's pointer from the current node to the child; aka, the child replaces the parent node
    -2 CHILDREN: We replace the node with EITHER its predecessor or successor (depending on the implementation), which is GUARANTEED to itself have only 1 child or no children (and thus be easier to itself remove)
    -COST TO REMOVE: O(log n), as locating the node to remove is O(log n), but re-assigning the node is still itself an O(1) processu

    -PSEUDOCODE:
        -

-UPDATE METHODS: