Fast Wavelet-Based Visual Classification

 

Guoshen Yu, Jean-Jacques Slotine

 

 

Summary:

We investigate a biologically motivated approach to fast visual classification, directly inspired by the recent work of [Serre et al 07]. Specifically, trading off biological accuracy for computational efficiency, we explore using standard wavelet transforms and patch transforms to parallel the tuning of visual cortex V1 and V4 cells, alternated with max operations to achieve scale and translation invariance. A feature selection procedure is applied during learning to accelerate recognition. We introduce a simple attention-like feedback mechanism that significantly improves recognition and robustness in multiple-object scenes. In experiments, the proposed algorithm achieves or exceeds state-of-the-art performance not only in object recognition but also in new applications such as texture classification, satellite image classification, and alphabet identification. Preliminary results on sound classification are shown as well.

 

 

References:

 

Outline:

 

Algorithm description

 

Algorithm overview: 4 steps (S1, C1, S2, C2)

 

 

 

S1: Wavelet Transform 

  • Visual cortex V1: frequency and orientation tuning.

  • Wavelet transform: scales, orientations (horizontal, vertical, diagonal)
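As an illustration of the S1 step, the sketch below computes a multi-scale wavelet decomposition with horizontal, vertical, and diagonal bands. It assumes the Haar wavelet and an image given as a list of rows whose side length is divisible by 2 at every scale; the page only says "standard wavelet transforms", so the wavelet choice and the function names `haar_level` and `s1_layer` are hypothetical.

```python
def haar_level(image):
    """One level of a 2-D Haar transform (assumed wavelet, not from the source).

    Returns (approx, horizontal, vertical, diagonal), each half the input size.
    """
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            p, q = image[r][c], image[r][c + 1]          # top row of the 2x2 block
            s, t = image[r + 1][c], image[r + 1][c + 1]  # bottom row of the block
            a_row.append((p + q + s + t) / 2.0)  # low-pass in both directions
            h_row.append((p + q - s - t) / 2.0)  # top minus bottom: horizontal edges
            v_row.append((p - q + s - t) / 2.0)  # left minus right: vertical edges
            d_row.append((p - q - s + t) / 2.0)  # diagonal structure
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


def s1_layer(image, n_scales):
    """Return one (horizontal, vertical, diagonal) triple per scale, finest first."""
    bands = []
    current = image
    for _ in range(n_scales):
        current, h, v, d = haar_level(current)
        bands.append((h, v, d))
    return bands


# A striped image: the stripes are horizontal edges at the finest scale.
stripes = [[1, 1, 1, 1],
           [0, 0, 0, 0],
           [1, 1, 1, 1],
           [0, 0, 0, 0]]
bands = s1_layer(stripes, n_scales=2)
```

Only the horizontal band at the finest scale responds to the stripes here; the vertical and diagonal bands, and all bands at the coarser scale, are zero.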

 

 

 

C1: Local Maximum and Subsampling

  • Local maximum => local translation invariance

  • Subsampling => scale invariance
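The C1 step can be sketched as a local maximum followed by subsampling. The window size, the stride, and the use of absolute coefficient values are assumptions made for illustration; the function name `c1_layer` is hypothetical.

```python
def c1_layer(band, window=2, stride=2):
    """Local maximum of |coefficient| over window x window cells, then subsample.

    `window` and `stride` are illustrative defaults, not values from the source.
    """
    rows, cols = len(band), len(band[0])
    pooled = []
    for r in range(0, rows - window + 1, stride):
        pooled_row = []
        for c in range(0, cols - window + 1, stride):
            pooled_row.append(max(abs(band[i][j])
                                  for i in range(r, r + window)
                                  for j in range(c, c + window)))
        pooled.append(pooled_row)
    return pooled


band = [[0, -3, 1, 0],
        [2,  0, 0, 0],
        [0,  0, 0, 5],
        [0,  0, -1, 0]]
pooled = c1_layer(band)

# Moving a coefficient inside its pooling cell leaves the output unchanged,
# which is the local translation invariance mentioned above.
shifted_a = c1_layer([[0, -3], [0, 0]])
shifted_b = c1_layer([[0, 0], [-3, 0]])
```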

 

 

S2: Patch Transform   

  • V4: larger receptive fields, responsive to more complex stimuli.
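A minimal sketch of a patch transform on one C1 map follows. It assumes the matching measure is a plain inner product between the patch and each same-sized window of the map; the page does not state the measure, so this choice and the name `s2_map` are hypothetical.

```python
def s2_map(c1, patch):
    """Slide `patch` over the C1 map and return the inner product at each position.

    The inner product is an assumed matching measure, used for illustration only.
    """
    ph, pw = len(patch), len(patch[0])
    rows, cols = len(c1), len(c1[0])
    responses = []
    for r in range(rows - ph + 1):
        response_row = []
        for c in range(cols - pw + 1):
            response_row.append(sum(c1[r + i][c + j] * patch[i][j]
                                    for i in range(ph)
                                    for j in range(pw)))
        responses.append(response_row)
    return responses


c1 = [[0, 0, 0],
      [0, 1, 2],
      [0, 3, 4]]
patch = [[1, 2],
         [3, 4]]
responses = s2_map(c1, patch)  # strongest response where the patch pattern sits
```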

 

 

 

C2: Global Maximum

  • Global maximum => translation and scale invariance
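The C2 step can be sketched as a maximum over all positions and all scales of each patch's S2 responses, which yields one number per patch. The input layout (one list of response maps per patch, one map per scale) and the names `c2_feature` and `c2_vector` are assumptions for illustration.

```python
def c2_feature(maps_across_scales):
    """Global maximum of one patch's S2 responses over all positions and scales."""
    return max(value
               for response_map in maps_across_scales
               for row in response_map
               for value in row)


def c2_vector(s2_responses):
    """One C2 coefficient per patch; s2_responses[k] holds patch k's maps."""
    return [c2_feature(maps) for maps in s2_responses]


# Two patches, each with a fine-scale 2x2 map and a coarse-scale 1x1 map.
s2_responses = [
    [[[1, 5], [2, 0]], [[3]]],
    [[[0, 0], [0, 0]], [[7]]],
]
features = c2_vector(s2_responses)

# The same response at a different position or scale gives the same feature.
moved = c2_vector([[[[0, 2], [5, 1]], [[3]]]])
```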

 

 

Patch Selection in S2

  • Randomly sampled from the C1 coefficients of the training images.

  • Patch selection: active patches are selected and inactive ones are discarded.

  • Patch grouping: similar patches are grouped in clusters; a single patch is used to represent each cluster.
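The selection and grouping steps above might look like the following sketch. The page does not define what makes a patch "active" or how similarity is measured, so two hypothetical criteria are used here: a patch counts as active when its C2 response varies enough across training images, and patches are grouped greedily by Euclidean distance, with the first member of each cluster kept as its representative. The thresholds `min_variance` and `radius` are illustrative parameters.

```python
import math


def select_active(c2_by_patch, min_variance):
    """Indices of patches whose C2 response varies across training images.

    Variance as the activity measure is an assumption, not the source's criterion.
    """
    keep = []
    for index, responses in enumerate(c2_by_patch):
        mean = sum(responses) / len(responses)
        variance = sum((x - mean) ** 2 for x in responses) / len(responses)
        if variance >= min_variance:
            keep.append(index)
    return keep


def group_patches(patches, radius):
    """Greedy grouping of flattened patches (assumed clustering scheme).

    A patch within `radius` of an existing representative is absorbed by it;
    otherwise it starts a new cluster and becomes that cluster's representative.
    """
    representatives = []
    for patch in patches:
        if not any(math.dist(patch, rep) <= radius for rep in representatives):
            representatives.append(patch)
    return representatives


# Patch 0 responds identically to every training image, so it is discarded.
active = select_active([[1, 1, 1], [0, 2, 4]], min_variance=0.5)

# The first two patches are near-duplicates and share one representative.
reps = group_patches([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], radius=1.0)
```

Fewer patches mean fewer S2 maps to compute at recognition time, which is how the feature selection procedure accelerates recognition.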

 

Attention Focusing

  • Focus attention on the objects one by one.

  • Spatially cluster the C2 coefficients.

  • Recognize each object using its own C2 coefficients.
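The attention steps above can be sketched as follows, under several assumptions: each C2 coefficient is stored together with the image position of its maximum, positions are clustered greedily by distance, and an object's "own" C2 vector is formed by zeroing the coefficients that belong to other clusters. None of these choices is specified on the page; the names `cluster_positions` and `per_object_c2` and the parameter `radius` are hypothetical.

```python
import math


def cluster_positions(positions, radius):
    """Greedy spatial clustering (assumed scheme); returns one label per position."""
    centers, labels = [], []
    for position in positions:
        for label, center in enumerate(centers):
            if math.dist(position, center) <= radius:
                labels.append(label)
                break
        else:
            # No existing cluster is close enough: start a new one here.
            centers.append(position)
            labels.append(len(centers) - 1)
    return labels


def per_object_c2(c2_values, positions, radius):
    """Split a scene's C2 vector into one vector per spatial cluster.

    Coefficients whose maxima fall in another cluster are set to zero
    (an illustrative choice, not taken from the source).
    """
    labels = cluster_positions(positions, radius)
    n_objects = max(labels) + 1
    return [[value if label == k else 0.0
             for value, label in zip(c2_values, labels)]
            for k in range(n_objects)]


# Three patches: two peak near (10, 10), one peaks far away near (80, 75).
c2_values = [0.9, 0.8, 0.7]
positions = [(10, 10), (12, 11), (80, 75)]
objects = per_object_c2(c2_values, positions, radius=20)
```

Each resulting vector would then be passed to the classifier separately, so that one object's features do not interfere with the recognition of another.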

 

 

 

Experiments

 

Object Classification (object vs background)

 

 

 

 

 

 

Object Recognition with Attention Focusing

 

 

Accuracy without/with attention focusing: 74% / 98%

 

 

Texture Classification (111-class Brodatz Database, only 10 are shown here)

 

 

Accuracy: 87.8%.

The state-of-the-art texture classification algorithm in [Lazebnik et al 05] achieves 88.2%.

 

 

Satellite Image Classification (4-class, multi-resolution: forest, urban areas, rural areas, sea.)

 

 

Accuracy: 100%.  (Images provided by CNES.)

 

 

Alphabet Classification (8-class: Arabic, Chinese, English, Greek, Hebrew, Japanese, Korean, Russian.)

 

 

Accuracy: 100%.

 

 

Sound Classification (5-class)

 

 

Accuracy: 100%.