//****************************************************************************//
//*********** Feature-Based Image Alignment - September 25th, 2019 **********//
//****************************************************************************//

- "Okay, today I'm excited; we get to talk about MY research area!"
    - Structure from motion is SUPER cool, but we're going to ease into it with feature-based alignment first
- Also, don't forget that project 2 is due FRIDAY
    - "Pytorch is GOLDEN in industry right now, so this is actually really useful stuff you're learning"
--------------------------------------------------------------------------------

- Now, between any 2 images, there's going to be some kind of geometric transformation between them (either 2D or 3D)
    - Let's say we have 3 photos of the same scene but with the camera translated a bit (like a panorama, but the lens isn't allowed to rotate); how can we stitch them together?
        - Well, all 2D alignment problems involve making a set of feature matches and figuring out what transformation was responsible for going from one image to the next
            - To do this, after we get that set of matches, we need to determine the best parametric model "p*" that defines the transformation
            - If we're allowing translation ONLY, then we would say this model is

                    f(x, t) = x + t

                - ...where p == t
            - So, for instance, let's say that a point "x1" in the first image is at [600, 150], and its position in the second image "x1'" is [50, 50]; with the translation model above, solving for "p" looks like this (sanity-checked in the numpy snippet below):

                    x1' = f(x1, t)
                =>  [50, 50]  = [600, 150] + t
                =>    t = [-550, -100]
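
            - Quick numpy sanity check of that arithmetic (my own snippet, not from the lecture):

                    import numpy as np

                    x1 = np.array([600, 150])      # point in image 1
                    x1_prime = np.array([50, 50])  # same point in image 2

                    # x1' = x1 + t  =>  t = x1' - x1
                    t = x1_prime - x1
                    print(t)  # [-550 -100]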

        - So, what if we have MORE than 1 point - what should we do?
            - Your first guess would be to average all of these per-point offsets, right? And it turns out that's exactly right!
            - Formally, we want the parameter vector "p" that minimizes the "sum of squared differences" between where the model sends each point and where we actually observed it:

                    E_ls = sum{ (f(x_i, p) - x_i')^2 }

            - However, it turns out this can be generalized! We can solve for "p," the best guess to minimize this error!
                - Most of the time, f(x, p) is going to be linear in the parameters p, so that:

                        f(x, p) = x + J(x)*p

                    - ...where "J(x)" is the Jacobian of x, and "x" is a 2x1 vector and p is an Nx1 vector (meaning J(x)*p is a 2xN vector)
                - Therefore, writing dx_i = x_i' - x_i for each point's measured displacement, the loss function becomes

                        E_ls = sum{ (J(x_i)*p - dx_i)^2 }

                - Differentiating this with respect to p and setting the result to 0 gives us the optimum
                    - The Jacobian transpose that shows up just comes from the multivariable chain rule: the derivative of (J*p - dx)^2 with respect to p pulls out a factor of J^T, which is also what makes the matrix dimensions work out
                - So, going forward a bit, we end up with the normal equations, where the NxN matrix A = sum(J^T*J) is the Hessian of E_ls:

                        [sum(J^T(x_i)*J(x_i))]*p = sum( J^T(x_i) * dx_i)

                    =>  A*p = b
                    =>  p* = A^-1 * b

                - So, this looks complicated, but it turns out the average DOES come out of this, since all 2D models are linear! (with the exception of perspective transforms)
                    - Now, for translation J = I, which means A = sum(I) = n*I, and we end up with an equation where:

                             n*p = sum( dx_i )
                          => p* = (1/n) * sum( dx_i )

                    - So, after a LOT of mathematical haggling, it turns out that the optimal p* really is just the average of all the "flow vectors"/displacements/differences! (see the numpy sketch below)
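
    - To see that machinery end-to-end, here's a small numpy sketch (my own, not from the lecture) that builds A and b for the translation case and confirms that p* = A^-1 * b is just the mean displacement:

            import numpy as np

            # Hypothetical matched points: x_i in image 1, x_i' in image 2
            # (a translation of roughly [-550, -100], plus a little noise)
            x  = np.array([[600., 150.], [300., 400.], [120.,  80.]])
            xp = np.array([[ 52.,  48.], [-248., 302.], [-431., -18.]])

            dx = xp - x                     # the displacement "flow vectors" dx_i

            # For translation, J(x_i) = I, so A = sum(J^T J) = n*I and b = sum(dx_i)
            n = len(x)
            A = n * np.eye(2)
            b = dx.sum(axis=0)

            p_star = np.linalg.solve(A, b)  # p* = A^-1 * b

            print(p_star)                   # [-549.  -99.33...]
            print(dx.mean(axis=0))          # identical: the mean of the flow vectors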

- Actually, I lied: the Euclidean transformation is NOT linear, because it has a sine/cosine term for theta!
    - That's an issue, since it means our "just take the mean" trick doesn't apply anymore!
        - So, what do we do? We still try to minimize the sum of squared errors, but unlike with a linear model, we now have to linearize our model around some initial guess, which gets us a linear approximation of the true model using our Jacobian term
            - So, this gets us a Hessian term again; we solve the linearized problem for an update, apply it to our guess, re-linearize around the new guess, and repeat until it converges (sketched in code below)
            - "This is a SUPER important algorithm, called NONLINEAR LEAST SQUARES"

- Okay; let's now talk about something SUPER important in computer vision: the Projective transformation!
    - Here, the Jacobian is a bit more complicated than in the previous examples (standard form filled in below)
        - (crazy linear algebra equation I don't have time to type out)
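
        - For reference, here's the standard textbook form (filled in from memory, not from the slides, so double-check it): with parameters p = (h00, h01, h02, h10, h11, h12, h20, h21) and denominator D = h20*x + h21*y + 1, the warped point is

                x' = (h00*x + h01*y + h02) / D
                y' = (h10*x + h11*y + h12) / D

            - ...and the 2x8 Jacobian with respect to p works out to

                J = (1/D) * [ x  y  1  0  0  0  -x'*x  -x'*y ]
                            [ 0  0  0  x  y  1  -y'*x  -y'*y ]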

- ...okay, that's the math. Let's use it!

- Let's talk about the RANSAC algorithm!
    - What is this? Well, when we make feature matches across images, we're matching purely on the descriptors, and we know some of those matches will be false-positive "outliers" that'll mess up our least-squares fit if we feed them in
        - So, RANSAC is an algorithm to try and get rid of these outlier points so that only the "true" matches remain
            - Why don't we just take the median instead of the mean, or use a more robust error function? That actually is a valid approach, and some people do that - but not us!
    - So, that sounds great! How can we do this?
        - Well, how would we get rid of outliers in a 2D scatter plot? If a point looks like it's too far away from the main group, we get rid of it!
        - We're trying to do a similar thing here:
            - To find the "main group," we pick 2 points at random and fit a line between them, and say its "support" is the number of points within distance "d" of the line
                - We do this for all pairs of points, pick the line with the most inliers, and then get rid of all the rest of the points that aren't within "d" of the line - that's it!
    - Here are the actual details of the algorithm (the full loop is sketched in code at the end of these notes):
        - Randomly select "S" points
        - Instantiate a model (i.e. a line)
        - Get all of the inliers within your chosen distance threshold "d"
            - How do we determine this distance threshold?
            - To do this, we assume the inlier measurement noise is Gaussian, which makes the squared distance follow a chi-squared distribution, and we pick "d" from one of its quantiles (the details aren't super important)
        - If the match is above a "good enough" threshold we set (i.e. enough inlier matches have been found for our purposes), just stop and return
        - Otherwise, repeat for "N" trials, and return the best model we found
            - How many samples should we take? This is a VERY important question ("It's DEFINITELY going to be on the midterm")
            - So, it depends on how many points we're looking for, and how many outliers there are
                - If the portion of outliers is "e", then the portion of inliers is p = 1 - e
                - Therefore, the probability of getting a sample of size "s" with ALL inliers is:

                    p^s

                - ...and the probability of a sample with any outliers is:

                    1-p^s

                - So, the probability that ALL "N" samples we draw are contaminated (i.e. we never draw an all-inlier sample) is:

                    (1 - p^s)^N

            - What we actually WANT, then, is for that "never draw a clean sample" probability to be under some threshold (e.g. having a 99% chance of drawing at least one good sample)
                - So, if our threshold certainty is P = 0.99, then we want:

                    p(no clean sample in N trials) < 1 - P

                - In other words, we want:

                    (1 - p^s)^N < 1 - P

                - Which means we can solve for the number of trials "N" we need (worked example below)!

                    N > log(1-P) / log(1-p^s)
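
                - Plugging in some typical numbers (my own example, not from the slides): for a homography, s = 4, and with 50% outliers and P = 0.99, N works out to about 72 trials

                        import numpy as np

                        def ransac_trials(inlier_frac, sample_size, confidence=0.99):
                            """N > log(1 - P) / log(1 - p^s), rounded up."""
                            return int(np.ceil(np.log(1 - confidence) /
                                               np.log(1 - inlier_frac ** sample_size)))

                        print(ransac_trials(inlier_frac=0.5, sample_size=4))  # -> 72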

- Now, for translation, we only need 1 match to figure it out; for homography, we need 4 points; and for Euclidean transforms, we need 2 (each 2D match gives two constraints, so you need enough matches to cover the model's degrees of freedom: 2, 8, and 3 respectively)
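
    - Putting the whole RANSAC recipe together, here's a sketch for the simplest case above (fitting a line to 2D points, so s = 2); the data, names, and thresholds are all my own choices, not from the lecture:

            import numpy as np

            def fit_line(p1, p2):
                """Line through two points as (a, b, c) with a*x + b*y + c = 0, normalized."""
                (x1, y1), (x2, y2) = p1, p2
                a, b = y2 - y1, x1 - x2
                c = -(a * x1 + b * y1)
                norm = np.hypot(a, b)
                return np.array([a, b, c]) / norm

            def ransac_line(points, d=1.0, n_trials=100, rng=None):
                rng = rng or np.random.default_rng()
                best_inliers = np.zeros(len(points), dtype=bool)
                for _ in range(n_trials):
                    # 1. Randomly select s = 2 points and instantiate a model (a line)
                    i, j = rng.choice(len(points), size=2, replace=False)
                    line = fit_line(points[i], points[j])
                    # 2. Support = points within distance d of the line
                    dist = np.abs(points @ line[:2] + line[2])
                    inliers = dist < d
                    # 3. Keep the model with the most support
                    if inliers.sum() > best_inliers.sum():
                        best_inliers = inliers
                return best_inliers

            # Synthetic data: 80 points near y = 2x + 1, plus 20 uniform outliers
            rng = np.random.default_rng(1)
            xs = rng.uniform(0, 10, 80)
            good = np.column_stack([xs, 2 * xs + 1 + rng.normal(0, 0.2, 80)])
            bad = rng.uniform(0, 20, size=(20, 2))
            pts = np.vstack([good, bad])

            mask = ransac_line(pts, d=0.5, n_trials=100, rng=rng)
            print(mask.sum(), "inliers kept out of", len(pts))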

- Okay, we'll keep going on with this stuff next week!