//****************************************************************************// //**************** Pose Estimation - September 30th, 2019 *******************// //**************************************************************************// - Ah, lack of sleep: turning everything I hear into an odd mixture of a lullaby and someone reciting "Jabberwocky" - Alright, we're on September 30th and - Project 3 will be out tomorrow, and'll be about pose estimation, finding correspondences between images, and - - *at this point Professor Dellaert's microphone died* - "...I'll just project and totally kill my voice" - These projects are obviously pretty complex; you have the code, the python notebook, the textbook, the slides, and you have to synthesize all of them - but the slides should be helpful in guiding you through what to do -------------------------------------------------------------------------------- - So, last week we were talking about image alignment and the RANSAC algorithm, where randomly select some points, generate a straight-line regression, check the number of inliers, and repeat until we get a good enough model or have run for too many trials - If we don't know the percentage of outliers in advance, then we'll start off by assuming it's 50% - We'll then run the algorithm for some number of trials "N," and update this based on the results we find - We also talked about 2D image alignment, where we have a set of matching points and try to fit a parametric model representing the image transformation - Today, we're going to move from this 2D alignment onto 3D alignment, where map 3D points into a 2D image! - *cue video from Oculus Quest, which Professor Dellaert casually mentions his grad students worked on* - This device actually builds a 3D map of the environment in real-time - "In principle, this is what we'll be doing for project 3!" - ...*shudder of fear goes through the class* - "...I mean, it won't be THIS advanced, but the same ideas apply" - So, what's pose estimation? - POSE ESTIMATION is where we try to take some 2D points in an image that we know the corresponding 3D points for, and try to map a camera matrix that'll fit there - The 3D to 2D projection, wayyyyy back from our projection lesson, looks like this: X = K * [R|t] * X = PX - Geometrically, "t" is a vector from an origin we choose to the camera - R, though, is the 3x3 rotation matrix that converts from camera coordinates to absolute world coordinates, with each column mapping to a single camera axis (X/Y/Z) - A rotation matrix should be "wRc," then; from the world coordinates to the camera's - "Remember, an inverse is just a transpose for a rotation matrix" - In homogenous coordinates, we do this mapping from a 3D world point "X" to a 2D camera point "x" as (I THINK?): x = K * R * [I|-t] * X = P*X - Or, from the camera to the world (might have that backwards?): x = K * [R|t] * X = P*X - Essentially, though, - How many parameters are in the right side? 11, right - P is a 3x4 matrix, but scale doesn't matter. On the left side, K has 5 DoF (since scale doesn't matter) and the rotation - So, what do the columns of P mean? - The vector [0, 0, 0, 1] would get us the image of the origin in homogenous coordinates - so, the 4th column is the origin - Similarly, the 1st column is at infinity on the X-axis, the 2nd column is the point at infinity on the Y-axis, etc. - So, the columns of P are basically the 3 vanishing points of the image's x/y/z coordinates + the arbitrary position of the origin - What do the rows mean? That's an exercise I'll leave to you - So, if we want to do pose estimation, can't we just slap 4 points (the 3 vanishing points and origin) into a 3x4 matrix and call it a day? - Unfortunately, NO, since this doesn't work; each point only has 2 degrees of freedom (since it'd a 2D point in the image), so 4 points only gets us 8 DOF - short of our goal of 11 - So, we need at LEAST 6 points (since this gets us 12 pieces of information, which is >= 11) - Once we have 6 points that we know the 3D/2D coordinates for, we look at the 2D points, estimate from our model where the 3D coordinates are, find the least-squares error from the actual guess, and minimize until we get close to the correct "P" matrix - You can use nonlinear least squares to do this, which we will NOT ask you to implement since sci-py can do it for you - Alright, that's pose estimation - let's now start looking at structure from motion! - Dealing with multiple views of the object seems like a pain, but if we know where the cameras are then we can actually triangulate points and figure out EXACTLY where they are in the world! - Stereo vision kind of reduces to single-depth cues past ~10 meters ("beyond that, you have to use other stuff for depth cues"), but is effective within that range - In a stereo camera rig, all we have to do is search horizontally to see how offset the same image's pixels are - "In Project 2, to get 2 different perspectives, you CANNOT just rotate the camera - the world doesn't get 'more 3D' by rolling your eyes. You need to actually pick up the camera and MOVE." - In particular, we'll start by talking about the FUNDAMENTAL MATRIX - It's hard to find 3D correspondences between different images, but the good news is that we can check if something is a good match with the fundamental matrix theorem! - Suppose we have 2 different 2D views of the same point, P - where can the point be in the world? - Well ,the thing we don't know is at what depth that point appears, so it could be anywhere along a ray shooting into the image - and if we have 2 images, and we know where the point is in both images, then we know! - What if we only know the point in one image and the actual, 3D location of the point - could we figure out where it appears in the 2nd image? Yes, we could! - And if we DON'T know the exact 3D position of the point, we still know that the point has to appear along a certain line in the 2nd image (at a different point for each possible depth the point is at) - and we can calculate that line! - How the heck do we compute that line of "possible appearances", though? Well, that gets stupidly mathematical with 2D geometry - and we'll do that next lecture!