//****************************************************************************//
//******* Edge Detection and Hough Transforms - September 23rd, 2019 ********//
//**************************************************************************//
- Professor Dellaert has arisen from his Belgian slumber!
- "Next class, we will have our 2nd participation quiz - COME PREPARED! If you can't make it for a legitimate reason, email us BEFORE the quiz and we'll do our best to take care of you"
- We're skipping our segmentation lectures for now so that we can revisit it when we talk about classification, and so we can start getting into your project 3 stuff on Wednesday
--------------------------------------------------------------------------------
- Now, last time we talked about SIFT and feature matching, but let's go over the rest of chapter 4: edge detection!
- To go from image gradients to edges, we've gotta go through some steps:
- First, we'll do some slight smoothing/gaussian blurring to cut down on our noise
- We'll then try to enhance the edges by playing around with contrast
- Finally, we'll determine which edges are actually edges, and which are just noise
- "Your brain just goes to town doing this work for you, filling in edges and gaps, doing depth perception and whatnot...we don't have that luxury when we're coding things"
- One of the simplest ways to determine edges from noise is just to use a THRESHOLD value, and keep all the gradients above a certain level of contrast
- This'll get us a lot of the edges we want, but it'll also include plenty of edges we don't actually care about (background clutter, etc.); and if we use a higher threshold to compensate, we might end up losing legitimate edges
- There's also the classic, annoying problem of backgrounds which're a similar color to the thing we care about, which can lead to missing edges
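- Putting the smoothing, gradient, and threshold steps together in code looks roughly like this (a NumPy/OpenCV sketch; the filename, blur size, and threshold value are placeholder choices you'd tune per image):

      import cv2
      import numpy as np

      # Read as grayscale, then blur slightly to suppress noise (step 1)
      img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
      blurred = cv2.GaussianBlur(img, (5, 5), 1.0)

      # Horizontal/vertical gradients via Sobel filters, then magnitude (step 2)
      gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
      gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
      magnitude = np.sqrt(gx**2 + gy**2)

      # Keep only the pixels whose contrast clears the threshold (step 3)
      threshold = 60.0  # too low = background clutter, too high = lost edges
      edges = (magnitude > threshold).astype(np.uint8) * 255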
- Another edge-detecting algorithm is the CANNY EDGE DETECTOR
- "Canny is a Professor out at Stanford who actually works with robotic planning, but in my world, this is the only thing he's ever done"
- This algorithm has a few different steps
- First, we filter the image with a derivative of the Gaussian ("or the Gaussian of the derivative; it's a linear operation, so screw order"), then take the magnitude/orientation of the image's gradients
- We then do NON-MAXIMUM SUPPRESSION; since the "edges" our gradient returns will be regions rather than single-pixel-wide slivers, we have to thin them down
- To turn these thick bands into curves, we'll look along the gradient direction at each pixel and keep the pixel only if it's the maximum across the edge
- This should turn the image into a pure black-and-white one, but it has the problem that not all "true" pixels survive, and there can be gaps
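- Here's a minimal NumPy sketch of that suppression step (it assumes the gradient magnitude/orientation from the earlier sketch; quantizing the orientation to 4 directions is the usual textbook simplification):

      import numpy as np

      def non_max_suppression(magnitude, angle_deg):
          # Keep a pixel only if it's the max along its gradient direction
          h, w = magnitude.shape
          out = np.zeros_like(magnitude)
          angle = angle_deg % 180  # edge direction is symmetric mod 180
          for i in range(1, h - 1):
              for j in range(1, w - 1):
                  a = angle[i, j]
                  if a < 22.5 or a >= 157.5:   # horizontal gradient: compare left/right
                      n1, n2 = magnitude[i, j - 1], magnitude[i, j + 1]
                  elif a < 67.5:               # 45-degree diagonal
                      n1, n2 = magnitude[i - 1, j + 1], magnitude[i + 1, j - 1]
                  elif a < 112.5:              # vertical gradient: compare up/down
                      n1, n2 = magnitude[i - 1, j], magnitude[i + 1, j]
                  else:                        # 135-degree diagonal
                      n1, n2 = magnitude[i - 1, j - 1], magnitude[i + 1, j + 1]
                  if magnitude[i, j] >= n1 and magnitude[i, j] >= n2:
                      out[i, j] = magnitude[i, j]
          return out

      # angle_deg can be computed as np.degrees(np.arctan2(gy, gx))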
- To combat these gaps, we'll use HYSTERESIS THRESHOLDING: the start of a new curve has to meet a high-contrast threshold, but to continue an existing curve we just need to meet a lower one
- "...this is super annoying to program. We could assign it as a homework - I had to code it as a kid - but we'll spare you guys"
- Putting this all together, the hysteresis thresholding gives a much better result than either plain high/low thresholds
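- In practice you'd just call a library for all of this; OpenCV's Canny, for instance, takes exactly these two hysteresis thresholds (the filename and threshold values here are placeholder choices):

      import cv2

      img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
      # threshold2 starts new curves, threshold1 lets existing curves continue
      edges = cv2.Canny(img, threshold1=50, threshold2=150)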
- So, that's all well and good, but there's a problem with these sorts of algorithms: the edges they find don't line up with how humans see things!
- Humans tend to only draw the big, "important" edges in an image, but the algorithm just doesn't care; it'll draw any little edge that has a lot of contrast, and will occasionally miss forms that humans consider important because they aren't high-contrast enough
- At Berkeley, there was a research project several years ago where they had multiple people draw the boundaries they thought were correct, and tried to use that data to guide their contours using color/texture/etc. gradients (well beyond what'd been done before)
- This worked to some extent, but it still wasn't that amazing; it was obvious which ones were done by the AI
- Of course - as with everything in this world - there are people with PhDs who've tried throwing this problem into the deep learning woodchipper
- "This 2016 paper is cool, except that it draws babies with missing eyes. Generally, people who draw babies put eyes there. We should know that. From data."
- Coming up with a "goodness" measure for this stuff is kinda difficult, so many of these papers focus on qualitative measures
- "When I want to learn about a topic, I go to Google Scholar, find a famous paper, and then see who references it and go from there"
- Alright - we're about to cover a really famous algorithm known as the HOUGH TRANSFORM
- To talk about this, we have to first talk about a political topic: voting!
- Let's say we have an image, and we want to identify all the straight lines in the image - how can we do that?
- "This idea for object recognition has faded away, but Hough transforms themselves have started making a comeback"
- So, how would we define a line, parametrically?
- Well, a line has just 2 degrees of freedom, so we could define it with "y = mx + b" (a slope and an intercept), or with polar coordinates (since we again only need 2 numbers: an angle and a distance)
- For circles, we'd need 3 numbers (the x/y of the circle's center, and the radius)
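- (For instance, a circle with center (a, b) and radius r satisfies (x - a)^2 + (y - b)^2 = r^2, so its Hough space would be a 3D grid with one axis per parameter)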
- Now, once we've defined this parametric model for our features, we can't just look at a single point to determine which line/circle it belongs to; instead, we have to look across the whole image
- Importantly, we need to worry about performance a LOT; we can't check every possible line combination
- Note that an assumption we're making here is that lines in photos are actually straight (i.e. the camera has been calibrated well)
- However, edge detection on its own CAN'T hand us clean straight lines, since there's too much noise, missing pixels, etc.
- What can we do to handle all of these? We can use VOTING!
- For each edge pixel, we'll generate the lines we think could fit it; we'll then pick the lines that accumulate the most votes
- HOUGH TRANSFORMS are a way of actually dealing with this voting
- The main idea here is that we record a "vote" for each possible line that an edge point (found via edge detection) touches, and look for the lines that touch the most edge points
- To help us with this, we'll say the image is rendered in "image space" (x/y axes) and that we want to convert each line in the image space to a POINT in "Hough space" (m/b axes, for "y = mx + b")
- This also works the other way: for a given (x, y) point in image space, there'll be a LINE of m/b parameters in Hough space (namely b = y - m*x), since every (m, b) pair on that line describes an image-space line passing through our point
- So, EACH pixel we found using our edge detection algorithm will vote for EVERY possible line in Hough space that passes through it - and we'll then take the lines with the most votes
- If we look at 2 points in image space, then, the line that goes through both of them in image space will be the INTERSECTION POINT of their 2 Hough space lines
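- As a quick worked example: the image points (1, 2) and (3, 4) give the Hough-space lines b = 2 - m and b = 4 - 3m; setting them equal gives m = 1, b = 1, which is exactly the image-space line y = x + 1 through both points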
- Due to a bit of noise, the lines often won't exactly intersect for 3, 4, 5, etc. edge points - so, we'll have to take an average of nearly-intersecting lines instead
- One minor issue is that the "y = mx + b" form can't represent vertical lines (the slope becomes infinite), so we'll instead represent our lines in polar coordinates!
      x*cos(theta) - y*sin(theta) = d
- ...where "d" is the perpendicular distance from the origin to the line, and "theta" gives the line's orientation
- Now, points in image space will map to sinusoidal curves instead of straight lines, and vice-versa
- So, the formal Hough Transform algorithm will be:
- Initialize our polar Hough grid H[d, theta] = 0
- For each edge point (x,y) we found via edge detection:
- For theta = [theta_min, theta_max]
- d = x*cos(theta) - y*sin(theta)
- H[d,theta] += 1 // vote for this line
- Find where H[d, theta] is maximum, and record those line parameters (i.e. d and theta)
- The final detected line is given by: d = x*cos(theta) - y*sin(theta)
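- And here's a minimal NumPy implementation of that algorithm (a sketch under assumptions: it uses the notes' x*cos(theta) - y*sin(theta) = d convention, and the grid resolutions and example points are arbitrary choices):

      import numpy as np

      def hough_lines(edge_points, max_d, n_theta=180, n_d=200):
          # Accumulate votes in a (d, theta) grid and return the best line
          thetas = np.deg2rad(np.arange(n_theta))      # theta in [0, 180) degrees
          d_bins = np.linspace(-max_d, max_d, n_d)     # signed distance range
          H = np.zeros((n_d, n_theta), dtype=np.int64)

          for (x, y) in edge_points:                   # each edge pixel...
              for t_idx, theta in enumerate(thetas):   # ...votes once per theta
                  d = x * np.cos(theta) - y * np.sin(theta)
                  d_idx = min(np.searchsorted(d_bins, d), n_d - 1)
                  H[d_idx, t_idx] += 1                 # vote for this line

          # The (d, theta) cell with the most votes is our detected line
          d_idx, t_idx = np.unravel_index(np.argmax(H), H.shape)
          return d_bins[d_idx], thetas[t_idx]

      # Three collinear points on y = x: the winner lands near theta = 45 deg, d = 0
      d_best, theta_best = hough_lines([(0, 0), (1, 1), (2, 2)], max_d=5)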
- While Hough Transforms are most often done with lines, they can actually work for any shape you can parameterize: circles, squares, taxicabs, eyes, faces, crazy Computer Science professors, you name it!
- In fact, Hough voting used to be used for object recognition before deep learning took off: we would create some parametrization for a portion of an object (e.g. a cow's leg), and then have small image patches matching that part cast Hough votes for the whole object's location
- While this raw technique has fallen out of favor, a version of it IS being used with neural networks
- "A recent paper from Stanford and Facebook by Kaiming He shows how you can use Hough voting on 3D point clouds to get the bounding boxes for different objects"
- Alright, hopefully you thought that was cool - work on your projects, and fly forth!