
Finding Objects via Feature Matching and Perspective Transforms
In the previous chapter, you learned how to detect and track a simple object (the silhouette of a hand) in a very controlled environment. To be more specific, we instructed the user of our app to place the hand in the central region of the screen and then made assumptions about its size and shape. In this chapter, we want to detect and track objects of arbitrary sizes, possibly viewed from several different angles or under partial occlusion.
For this, we will make use of feature descriptors, which are a way of capturing the important properties of our object of interest. We do this so that the object can be located even when it is embedded in a busy visual scene. We will apply our algorithm to the live stream of a webcam and do our best to keep the algorithm robust yet simple enough to run in real time.
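To give a first flavor of what working with feature descriptors looks like in code, here is a minimal sketch that extracts keypoints and descriptors from a template image and from a single webcam frame, and then matches them. The ORB detector, the brute-force matcher, and the file name template.png are illustrative assumptions, not necessarily the choices made later in this chapter:

```python
import cv2

# Illustrative sketch: ORB keypoints/descriptors and brute-force matching.
# "template.png" is a placeholder for the object of interest.
orb = cv2.ORB_create()

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
kp_template, desc_template = orb.detectAndCompute(template, None)

cap = cv2.VideoCapture(0)  # live webcam stream
ret, frame = cap.read()
if ret and desc_template is not None:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kp_frame, desc_frame = orb.detectAndCompute(gray, None)

    # Match the template's descriptors against the current frame
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_template, desc_frame)
    print(f"{len(matches)} candidate matches found")
cap.release()
```

Each match pairs a keypoint in the template with its closest-looking counterpart in the frame; the rest of the chapter is about turning such raw matches into a reliable detection.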
In this chapter, we will cover the following topics:
- Listing the tasks performed by the app
- Planning the app
- Setting up the app
- Understanding the process flow
- Learning feature extraction
- Looking at feature detection
- Understanding feature descriptors
- Understanding feature matching
- Learning feature tracking
- Seeing the algorithm in action
The goal of this chapter is to develop an app that can detect and track an object of interest in the video stream of a webcam—even if the object is viewed from different angles or distances or under partial occlusion. Such an object can be the cover image of a book, a drawing, or anything else with a sufficiently distinctive surface structure.
Once the template image is provided, the app will be able to detect that object, estimate its boundaries, and then track it in the video stream.
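As a rough preview of how detection and boundary estimation fit together, the following sketch matches template descriptors against a frame, fits a perspective transform (a homography) to the matched points with RANSAC, and projects the template's corners into the frame to obtain the object's boundary. The helper name locate_object, the ORB detector, and the thresholds are assumptions for illustration, not the final implementation developed in this chapter:

```python
import numpy as np
import cv2

def locate_object(template, frame, min_matches=10):
    """Sketch: estimate the template's boundary in a frame via a homography."""
    orb = cv2.ORB_create()
    kp_t, desc_t = orb.detectAndCompute(template, None)
    kp_f, desc_f = orb.detectAndCompute(frame, None)
    if desc_t is None or desc_f is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_t, desc_f)
    if len(matches) < min_matches:
        return None  # not enough evidence that the object is present

    # Corresponding point coordinates in the template and the frame
    src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Robustly fit a perspective transform (homography) with RANSAC
    H, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Project the template's corners into the frame to get the boundary
    h, w = template.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)
```

Running such a function on every frame (or on selected frames, with a lighter tracking step in between) is, in broad strokes, the process flow we will build up over the course of this chapter.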