//*******************************************************************// //***********Rabin Karp String Search - April 5th, 2017*************// //*****************************************************************// -Well, when it rains, it pours...and generates tornado warnings -In other news, CS 2051 test: I HAS TO STUDY NOWS!!!! -...but, gripped by the endless curse of colossal laziness, I'm kinda hoping for a storm delay (which, c'mon, won't happen) -Note From Future Me: It did not happen (the storm delay, I mean; the test very much did happen) -Test 3 is in ONE WEEK -AVL Coding questions ARE FAIR GAME! Will probably focus on rotations -Will include matching questions, a few where you have to trace pattern matching, and instead of 1 big coding question, 2 or 3 smaller coding questions -...of course, be prepared for anything -------------------------------------------------------------- -Well, let's start with the last pattern-matching algorithm: Rabin-Karp! -Hey, I spelled it right this time! -The 3 previous searches we did: -BRUTE FORCE -The ONLY one we went over that has no pre-processing step -It's considered a "naive" algorithm; it takes the most obvious, simple approach -BOYER-MOORE -Uses a last-occurrence table -Starts from the back of the pattern, uses LOT to shift efficiently up -Lines up matching characters, but still has to check them -KMP -Generates a failure table -Uses prefixes / suffixes -RABIN-KARP is different from these 3 since it does NOT directly compare characters; instead, it uses a "rolling hash" to calculate a hash pattern for the text substring we're searching through, and we compare the HASHCODES -NOTE: If the hashes match, it's possible we still have a collision, so we STILL have to check the characters to verify it's a valid match -O(n) to generate the initial hash, but O(1) to shift it over -To generate the initial hash for the pattern, we take the sum from 0 to m-1 of character(i) * BASE^((m-1) - i) -In code: String pattern; int hash = 0; int base = 10; for (int i = 0; i < m.length; i++) { hash += pattern.getCharAt(i) * pow(base, m.length - 1 - i); } -e.g. for the message "347," it would be hash = 3*100 + 4*10 + 7 = 347 -This function is sometimes called "Rabin's fingerprint function" -We then calculate this hash for the substring from 0 to pattern.length - 1 in the text we're searching through -To "roll" this hash to the next character, we say: newHash = base * (oldHash - (oldChar * base^(pattern.length - 1))) + newChar -So, let's do an example, with CHARACTERS this time around -Let's say "a" =1, "b" = 2, "c" = 3, etc., to make the math easier -In your actual code, you'll just use the ASCII values of each character -Let's also say our "base" is 2 (for example's sake) pattern: bac text: abcbabac -The FIRST THING we should check is to see if pattern.length > text.length; if it is, then we'll NEVER find a match, so we shouldn't waste time preprocessing and should just return "false" right away -In this case, "bac" is smaller than the text, so let's move on -To generate the hash for "bac", we say: -"hash" = "b" * 2^(3-1) + "a" * 2^(3-2) + "c" * 2^(3-3) = 2*4 + 1*2 + 3*1 = 13 -We then do the same thing for the first 3 characters in the text, "abc," and get 11 -This doesn't match our pattern's hash, so we "roll" the hash over by 1 -"bcb"s hash = (11 - 1*2^(3-1))*2 + 2 = 16 -Still not match, so we keep going -Once we DO find a match, then we are NOT done; we might have just happened to find a collision -So, we go in and compare the characters old-fashioned style; if all the characters match, YAY! Otherwise, we have to keep looking -We keep going until we hit the end of the text; at this point, the substring will be smaller than our pattern, so we're done -So, what's performance for Rabin-Karp? -The best case is O(m), where "m" is the length of the pattern string -This is in the event that the first "m" characters match; this best case exists for pretty much all the sorts that we just looked through -In the WORST-CASE, where we have a bad hash function and all of the hashes match, we end up with O(m*n); the algorithm basically is just brute-force search at this point -In the AVERAGE case, Rabin-Karp is O(m + n) -Another example (illustrate the worst case): pattern: c a b text: a b c b a c -With a base of 2, the "abc" hash is 1*4 + 2*2 + 3*1 = 11, and "cab" is