u//************************************************************************//
//********Intro. to Dynamic Programming - April 19th, 2017****************//
//***********************************************************************//
-"Two-factor login, heed the professor's call..."
-...but really, 8 minutes in, and HB is still wrestling to get the presentation up and running because of the new two-factor login policy
-"The document camera is locked down currently, so let's see how this goes..."
-------------------------------------------------
-So, we introduced Kruskal's algorithm last time but didn't quite get through an example (actual example on T-Square)
-As a reminder: with KRUSKAL'S MST algorithm, we start on a weighted graph and find the edge with the LOWEST weight
-We then add the two vertices connected to that edge, and that edge
-We then find the NEXT lowest weight edge, and add that...and so on
-If we find an edge is connected to one of the vertices we've ALREADY looked at, then we "connect" those two edges together into a group; if two subgroups are joined by a new edge, we then join those subgroups together into a single group
-If adding an edge would create a CYCLE (i.e. connect two vertices that are already part of the same group), then we do NOT add it as an edge
-"We CONSIDER the edges in order of edge-weights, but we may not actually add all the edges we're considering"
-Once we have formed a single group that connects ALL of the vertices, then we're done!
-So, in order to execute Kruskal's, we need a list of edges and a list of vertices for each of the subgroups
-DYNAMIC PROGRAMMING-
-This is *GASP* our LAST topic of the semester: DYNAMIC PROGRAMMING
-"I like this because it's a VERY powerful idea, but by itself, it's really not all that complicated"
-Some VERY complex problems actually can be boiled down to a bunch of very simple sub-problems that have been put together; dynamic programming is a way of doing this
-So, actual definition: DYNAMIC PROGRAMMING is a technique where we break a complex problem into a collection of simpler, recurring subproblems, SOLVE each of those subproblems ONCE, and then store the solutions to those subproblems and use them to solve the complex problem
-Technically, this is considered the "Bottom-Up" dynamic programming (DP) approach, where we break the problem down into simpler problems, solve ALL the sub-problems first, and then move up a step to the larger problem
-In this scheme, the subproblems are ALL independent (i.e. the solution of one sub-problem is NOT dependent on the solution of another)
-Classic example of an application: Matrix multiplication!
-"So, in matrix multiplication, you take a given row's entries and multiply it by a column's entries, then sum them up"
-So, DP allows us to efficiently solve a seemingly complex or time-consuming problem if it has:
-(relatively) Simple subproblems
- (other thing)
- (...another thing)
-So, one example of this: String Subsequences
-A subsequence is not *quite* the same thing as a substring;
-e.g., from a string "ABCDEFGHIJK",
-ACEGIJK is a valid subsequence
-DFGHK is a valid subsequcne
-DAGH is NOT a valid subsequence, since it didn't preserve the relative ordering of the characters
-A common problem is finding the LONGEST COMMON SUBSEQUENCE (LCS): given 2 strings X and Y, we want to find the longest subsequence that BOTH X and Y share
-e.g. For "ABCDEFG" and "XZACKDFWGH", the longest common subsequence is "ACDFG"
-Seems kind of arbitrary at first, but this problem does have real-world applications is gene sequencing, DNA testing, etc.
-So, how do we do this using DP?
-Let's say that L[i, j] is the length of the LCS of the substrings from X[0] to X[i], and from Y[0] to Y[j]
-Let's also say that -1 is allowed as an index (to prevent OutOfBounds errors in the upcoming algorithm), where L[-1,*] = 0 and L[*, -1] = 0 (i.e. "the null subsequence of X has no match w/ any subsequence in Y", and vice-versa)
-We can then say that, in general, this is our problem-solving strategy for finding a new LCS for the substrings:
-If X[i] == Y[j], then L[i,j] = L[i-1, j-1] + 1 (i.e. if the last characters match, then the LCS length for this substring is equal to 1 + the length of the LCS of tese substrings w/ the last characters removed)
-If X[i] =/= Y[j], then L[i,j] = max(L[i-1, j], L[i, j-1]) (if there's no match, then the LCS length is whatever it was with the non-matching characters removed)
-Using this algorithm, we can use the previously found LCS lengths to calculate the new ones, meaning we don't have to start over from scratch!
-That's the power of DP; by remembering our old results, we can avoid having to recalculate the same things over and over!
-In Pseudocode:
lcs(String x, String y) {
int[,] L = new int[x.length][y.length];
for (int i = 0; i < x.length; i++) {
l[i, -1] = 0;
}
for (int j = 0; j < y.length; j++) {
l[-1, j] = 0;
}
for (int i = 0; i < x.length; i++) {
for (int j = 0; j < y.length; y++) {
if (x.charAt(i) == y.charAt(j)) {
l[i, j] = l[-1, j-1] + 1
} else {
l[i, j] = max(l[i-1, j], l[i, j-1]);
}
}
}
}
-An example of this LCS algorithm in action:
-Let's say we have strings "bacbad" and "abazdc"
-We first fill in the first row/column as zero, since nothing will match the blank/"null" character
-We then go from left-to-right and top-to-bottom
-If there's a match, then we take the entry to the TOP-LEFT of the current entry and add 1 to it (top-left + 1), then put it in the entry
-If there's NOT a match, then we say the entry = the maximum previous entry in the row/column (i.e. immediately to the left/top) of the current entry
b | a | c | b | a | d
0 0 0 0 0 0
a|0 0 1 1 1 1 1
b|0 1 1 1 2 2 2
a|0 1 2 2 2 3 3
z|0 1 2 2 2 3 3
d|0 1 2 2 2 3 4
c|0 1 2 3 3 3 4
-The largest entry is "4," so we know the LCS is 4 characters long
-How do we know what those characters are? Well, we pretend that for each entry, an arrow is pointing from the entry we took the value from
-So, for each entry where we incremented the value from the top-left corner (i.e. the arrow is pointing to the bottom-right), we know that that letter was a match!
-Going backwards from the bottom-right entry, we see that this subsequence is "abad"
(***ADDENDUM - April 21st, 2017)
-Woke up at 8:53, ran here in a panic, still somehow beat HB to the room...
-"Graphs WILL be on your final, and could show up as either coding questions or diagrams; dynamic programming / LCS could make an appearance as a diagramming question"
-Remember that the final IS CUMULATIVE; we can cover ANYTHING from this semester. What were the most important topics? What did the previous exams not cover? Be prepared for anything!
-Remember, the homework is due on THIS SUNDAY; AFTER MONDAY, NO REGRADE REQUESTS WILL BE ACCEPTED
-Also: FILL OUT CIOS!!!
- Another example (* indicates a match, meaning we've taken the top-left value + 1 for that entry):
t | a | c | o | c | a | t
0 0 0 0 0 0 0 0
t|0*1 1 1 1 1 1 *1
o|0 1 1 1 *2 2 2 2
g|0 1 1 1 2 2 2 2
a|0 1 *2 2 2 2 *3 3
s|0 1 2 2 2 2 3 3
-So, the largest entry (in the bottom-left) is "3", so we know the LCS is 3 characters long
-To reconstruct the string, we follow the arrows back up until we hit the top-left corner; doing this, we get the substring "toa"
-----------FROM THIS POINT, NOTHING NEW WE TEACH'LL BE ON THE FINAL-------------
-(...they taught nothing new after this point)