Mean Shift Summary

Both CamShift and ABCshift derive from the Mean Shift algorithm first proposed by Fukunaga and Hostetler [12]. Mean Shift is a non-parametric gradient estimator that works on discrete probability images. It operates on a portion of the image called the search window, in which it finds the centroid, or mean location, of the probability distribution.

An in-depth technical explanation of the algorithm is given in [3] and [12], and a brief one in [1] and [2]. The following steps are excerpted from [1]:

  1. Choose a search window size.
  2. Choose the initial location of the search window.
  3. Compute the mean location in the search window.
  4. Center the search window at the mean location computed in Step 3.
  5. Repeat Steps 3 and 4 until convergence (or until the mean location moves less than a preset threshold).
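The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [1]: it assumes the probability image is a list of rows of floats, that the window is an axis-aligned rectangle given by its top-left corner, and the names window_centroid, mean_shift, eps and max_iter are hypothetical.

```python
import math


def window_centroid(prob, x0, y0, w, h):
    """Return the centroid (xc, yc) of prob inside the window, or None if empty.

    The window has top-left corner (x0, y0) and size w by h, and is clipped
    to the image borders.
    """
    m00 = m10 = m01 = 0.0
    for y in range(max(y0, 0), min(y0 + h, len(prob))):
        for x in range(max(x0, 0), min(x0 + w, len(prob[0]))):
            p = prob[y][x]
            m00 += p        # total probability mass in the window
            m10 += x * p    # mass weighted by column
            m01 += y * p    # mass weighted by row
    if m00 == 0:
        return None
    return m10 / m00, m01 / m00


def mean_shift(prob, x0, y0, w, h, eps=0.5, max_iter=20):
    """Steps 3 to 5: re-center a fixed-size window until it stops moving."""
    for _ in range(max_iter):
        centroid = window_centroid(prob, x0, y0, w, h)
        if centroid is None:
            break  # no probability mass under the window, nothing to follow
        xc, yc = centroid
        # Distance between the current window center and the mean location.
        shift = math.hypot(xc - (x0 + (w - 1) / 2), yc - (y0 + (h - 1) / 2))
        if shift < eps:
            break  # converged (Step 5)
        # Step 4: move the window so that its center sits on the centroid.
        x0 = round(xc - (w - 1) / 2)
        y0 = round(yc - (h - 1) / 2)
    return x0, y0
```

Calling mean_shift(prob, 3, 2, 5, 5) starts a 5 by 5 window at column 3, row 2 and returns the top-left corner of the window after convergence. The iteration cap is a safety measure added for this sketch; the excerpted steps only mention the convergence threshold.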

CamShift Summary

CamShift is a non-parametric, model-based visual tracker. It works like Mean Shift but is adaptive: it handles probability distributions that change over time. The tracked object is modeled as a static color histogram.

In every frame, the algorithm operates on a subset of the image (the search window) and stores the distribution's centroid and the region's zeroth moment, that is, the sum of all probabilities in the window. The size and location of the search window in a frame are functions of the previous frame's zeroth moment and centroid location, respectively. Hence, the algorithm is a feedback loop. It can be summarized as follows [1] [4]:

  1. Set the region of interest (ROI) of the probability distribution image to the entire image.
  2. Select an initial location of the Mean Shift search window. The selected location is the target distribution to be tracked.
  3. Calculate a color probability distribution of the region centered at the Mean Shift search window.
  4. Iterate Mean Shift algorithm to find the centroid of the probability image. Store the zeroth moment (distribution area) and centroid location.
  5. For the following frame, center the search window at the mean location found in Step 4 and set the window size to a function of the zeroth moment. Go to Step 3.
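The feedback loop in Steps 3 to 5 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of [1] or [4]: each frame is assumed to be already converted to a probability image with values in [0, 1], the window is kept square, and the sizing rule side = 2 * sqrt(M00) is one plausible choice for "a function of the zeroth moment", not something the text above specifies. The names moments, camshift_step and track are hypothetical.

```python
import math


def moments(prob, x0, y0, w, h):
    """Zeroth and first moments of prob inside the window, clipped to the image."""
    m00 = m10 = m01 = 0.0
    for y in range(max(y0, 0), min(y0 + h, len(prob))):
        for x in range(max(x0, 0), min(x0 + w, len(prob[0]))):
            m00 += prob[y][x]
            m10 += x * prob[y][x]
            m01 += y * prob[y][x]
    return m00, m10, m01


def camshift_step(prob, window, max_iter=10):
    """Process one frame: Mean Shift at fixed size, then resize from M00."""
    x0, y0, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = moments(prob, x0, y0, w, h)
        if m00 == 0:
            return window  # target lost: keep the previous window
        xc, yc = m10 / m00, m01 / m00
        nx, ny = round(xc - (w - 1) / 2), round(yc - (h - 1) / 2)
        if (nx, ny) == (x0, y0):
            break  # the window no longer moves
        x0, y0 = nx, ny
    # Assumed sizing rule: a square window whose side grows with sqrt(M00).
    side = max(1, round(2 * math.sqrt(m00)))
    return (round(xc - (side - 1) / 2), round(yc - (side - 1) / 2), side, side)


def track(prob_frames, window):
    """Feed each frame's resulting window into the next frame (Step 5)."""
    windows = []
    for prob in prob_frames:
        window = camshift_step(prob, window)
        windows.append(window)
    return windows
```

The only state carried between frames is the window tuple (x0, y0, w, h), which is what makes the tracker a feedback loop: the centroid sets the next location and the zeroth moment sets the next size.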

Color Spaces

The algorithm tracks using pixel color. Digital colors are elements of sets called color spaces. Mathematically, a color space is an abstract model describing how colors can be represented as tuples of numbers, typically three or four values, or channels [6]. Typical color spaces are RGB, which is cubic, and HSV/HSI, which are conic. All of these are three-dimensional, that is, they have three channels.
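To make the relation between the two kinds of color space concrete, the sketch below converts an 8-bit RGB triple to HSV with Python's standard colorsys module. The wrapper name rgb8_to_hsv and the choice to report hue in degrees are assumptions made for this illustration.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value).

    colorsys works on floats in [0, 1], so the channels are scaled first
    and the returned hue fraction is rescaled to [0, 360).
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v
```

For example, rgb8_to_hsv(255, 0, 0) gives a hue of 0 degrees with full saturation and value. Gray pixels come back with saturation 0 and a hue of 0, which is a convention of the conversion rather than a meaningful hue.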

Object Model

The CamShift algorithm uses a color histogram as its object model. A color histogram represents the distribution of colors in an image and is obtained by counting the number of pixels that fall into each of a given set of colors [7]. The dimension of the histogram depends on the number of channels used to describe the object. In [1], only the Hue channel of the HSV color space is used, to track faces, while [4] experiments with combinations of the HSV channels to track general objects. Note that, for color histograms and histograms in general, dimensions are different from bins: a dimension corresponds to a (random) variable, while bins are non-overlapping sets of that variable's possible values.
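A one-dimensional hue histogram of the kind described above can be sketched as follows. The function name hue_histogram, the default of 16 bins and the normalization to a sum of 1 are assumptions for this illustration; [1] and [4] may use different bin counts and scalings.

```python
def hue_histogram(hues, bins=16):
    """Build a normalized 1-D histogram from hue values given in degrees.

    One dimension (hue) is split into `bins` equal-width, non-overlapping
    intervals covering [0, 360).
    """
    counts = [0] * bins
    for h in hues:
        # Map the hue to a bin index; min() guards against rounding at 360.
        index = min(bins - 1, int((h % 360) / 360 * bins))
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]
```

With bins=4, the hues 0, 10 and 20 share the first bin and 200 lands in the third, so the histogram has one dimension and four bins. A hue-plus-saturation model would instead be two-dimensional, with one set of bins per channel.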

Color Image Probability Distribution

Most of CamShift's processing works on the color probability image of a subset of each frame called the search window. As discussed in [5], in a strictly mathematical sense the probability of a pixel within the ROI (region of interest), or search window, is computed using Bayes' law as

$$P(O \mid C) = \frac{P(C \mid O)\, P(O)}{P(C)}$$

where P(O|C) denotes the probability that the pixel belongs to the object given its color, and P(C|O) is the object model, that is, the color histogram of the object. P(O) and P(C) are the prior probabilities that a pixel belongs to the object and that a pixel has the color C, respectively.
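Bayes' law, P(O|C) = P(C|O) P(O) / P(C), can be applied per histogram bin and then looked up for every pixel. The sketch below assumes that colors have already been quantized to bin indices, that both histograms are normalized, and that the prior P(O) is supplied by the caller; the names posterior_per_bin and back_project are hypothetical, and this is not claimed to be the exact procedure of [5].

```python
def posterior_per_bin(obj_hist, img_hist, prior_obj):
    """Return P(O|C) for each color bin C.

    obj_hist  -- P(C|O), the normalized object histogram
    img_hist  -- P(C), the normalized histogram of the whole ROI
    prior_obj -- P(O), the fraction of pixels assumed to belong to the object
    """
    posterior = []
    for p_c_given_o, p_c in zip(obj_hist, img_hist):
        if p_c == 0:
            posterior.append(0.0)  # color never occurs in the ROI
        else:
            # Clip at 1.0 in case the estimated histograms are inconsistent.
            posterior.append(min(1.0, p_c_given_o * prior_obj / p_c))
    return posterior


def back_project(bin_image, posterior):
    """Replace each pixel's bin index by the posterior of that bin."""
    return [[posterior[b] for b in row] for row in bin_image]
```

The result of back_project is the probability image on which the search window operates: a pixel whose color is common on the object but rare in the ROI as a whole receives a value close to 1.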

Centroid/Mean Location Calculation

The CamShift algorithm tracks by re-centering the search window in every frame on the mean location of the probability distribution, which Mean Shift reaches by climbing the distribution's gradient. The mean location, or centroid, C = (xc, yc) of the 2D discrete color probability image is computed as follows:

Find the zeroth moment

$$M_{00} = \sum_{x} \sum_{y} I(x,y)$$

Find the first moment for x and y

$$M_{10} = \sum_{x} \sum_{y} x\, I(x,y), \qquad M_{01} = \sum_{x} \sum_{y} y\, I(x,y)$$

Then the mean search window location (the centroid) is

$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}}$$

where I(x,y) is the pixel (probability) value at position (x,y) in the image, and x and y range over the search window.
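As a small worked example, consider a hypothetical search window three pixels wide and two pixels high, with x = 0, 1, 2 and y = 0, 1, and made-up probability values I(0,0) = 0.2, I(1,0) = 0.4, I(2,0) = 0.4, I(0,1) = 0, I(1,1) = 0.5, I(2,1) = 0.5. Writing M00 for the zeroth moment and M10, M01 for the first moments in x and y, the centroid works out as:

```latex
\begin{aligned}
M_{00} &= 0.2 + 0.4 + 0.4 + 0 + 0.5 + 0.5 = 2.0 \\
M_{10} &= 0(0.2) + 1(0.4) + 2(0.4) + 0(0) + 1(0.5) + 2(0.5) = 2.7 \\
M_{01} &= 0(0.2 + 0.4 + 0.4) + 1(0 + 0.5 + 0.5) = 1.0 \\
x_c &= \frac{M_{10}}{M_{00}} = \frac{2.7}{2.0} = 1.35, \qquad
y_c = \frac{M_{01}}{M_{00}} = \frac{1.0}{2.0} = 0.5
\end{aligned}
```

The centroid lies to the right of the window's center column (x = 1) because more probability mass sits in the right-hand columns, so the next window position would shift to the right.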
