
Finding Objects via Feature Matching and Perspective Transforms
In the previous chapter, you learned how to detect and track a simple object (the silhouette of a hand) in a very controlled environment. To be more specific, we instructed the user of our app to place the hand in the central region of the screen and then made assumptions about its size and shape. In this chapter, we want to detect and track objects of arbitrary sizes, possibly viewed from several different angles or under partial occlusion.
For this, we will make use of feature descriptors, which are a way of capturing the important properties of our object of interest. We do this so that the object can be located even when it is embedded in a busy visual scene. We will apply our algorithm to the live stream of a webcam and do our best to keep the algorithm robust yet simple enough to run in real time.
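To give a first flavor of what working with feature descriptors looks like in code, here is a minimal sketch that extracts keypoints and descriptors from a template image and from a single webcam frame, and then matches them. The ORB detector, the brute-force matcher, and the file name template.png are illustrative assumptions, not necessarily the choices made later in this chapter:

```python
import cv2

# Illustrative sketch: ORB keypoints/descriptors and brute-force matching.
# "template.png" is a placeholder for the object of interest.
orb = cv2.ORB_create()

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
kp_template, desc_template = orb.detectAndCompute(template, None)

cap = cv2.VideoCapture(0)  # live webcam stream
ret, frame = cap.read()
if ret and desc_template is not None:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kp_frame, desc_frame = orb.detectAndCompute(gray, None)

    # Match the template's descriptors against the current frame
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_template, desc_frame)
    print(f"{len(matches)} candidate matches found")
cap.release()
```

Each match pairs a keypoint in the template with its closest-looking counterpart in the frame; the rest of the chapter is about turning such raw matches into a reliable detection.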
In this chapter, we will cover the following topics:
- Listing the tasks performed by the app
- Planning the app
- Setting up the app
- Understanding the process flow
- Learning feature extraction
- Looking at feature detection
- Understanding feature descriptors
- Understanding feature matching
- Learning feature tracking
- Seeing the algorithm in action
The goal of this chapter is to develop an app that can detect and track an object of interest in the video stream of a webcam—even if the object is viewed from different angles or distances or under partial occlusion. Such an object can be the cover image of a book, a drawing, or anything else with a sufficiently distinctive surface structure.
Once the template image is provided, the app will be able to detect that object, estimate its boundaries, and then track it in the video stream.
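As a rough preview of how detection and boundary estimation fit together, the following sketch matches template descriptors against a frame, fits a perspective transform (a homography) to the matched points with RANSAC, and projects the template's corners into the frame to obtain the object's boundary. The helper name locate_object, the ORB detector, and the thresholds are assumptions for illustration, not the final implementation developed in this chapter:

```python
import numpy as np
import cv2

def locate_object(template, frame, min_matches=10):
    """Sketch: estimate the template's boundary in a frame via a homography."""
    orb = cv2.ORB_create()
    kp_t, desc_t = orb.detectAndCompute(template, None)
    kp_f, desc_f = orb.detectAndCompute(frame, None)
    if desc_t is None or desc_f is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_t, desc_f)
    if len(matches) < min_matches:
        return None  # not enough evidence that the object is present

    # Corresponding point coordinates in the template and the frame
    src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly fit a perspective transform (homography) with RANSAC
    H, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Project the template's corners into the frame to get the boundary
    h, w = template.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)
```

Running such a function on every frame (or on selected frames, with a lighter tracking step in between) is, in broad strokes, the process flow we will build up over the course of this chapter.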