CovarianceNet: Convolutional Neural Network Feature
Extraction Using Covariance Tensor Decomposition
IEEE Access 2021 (Oral Presentation, Best Poster Award)

Abstract

[Figure: method overview]

This work describes a new method for extracting image features using tensor decomposition to model the data. Given an image dataset, we extract patches from the images, compute the covariance tensor over all patches, decompose it with the Tucker decomposition, and obtain the most relevant features from the tensor core. These features can be leveraged in diverse forms and for different objectives. In this work, we explored rearranging them into block-wise kernels, which are plugged into the convolutional layers of CNNs for classification tasks. Preliminary experiments tested the discriminative and initializer capabilities of these kernels. The discriminative experiments reached classification accuracies of around 67% on CIFAR-10, 64% on CIFAR-100, and 98% on MNIST, using 10, 100, and 1,000 samples and a single feed-forward training pass. The initialization experiments compared the method's feature extraction capability against available initializers (He random, He uniform, Glorot, random), returning results comparable to the state of the art: around 91% on CIFAR-10, 72% on CIFAR-100, and 99% on MNIST. These promising results allowed us to identify the proposed method's advantages over traditional convolutional neural networks.

Covariance Tensor

The covariance tensor, also called the n-mode cross-covariance, is computed over the patches extracted from the images in the training dataset. It generalizes the covariance matrix to tensor-valued samples.
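
As a concrete illustration, the following NumPy sketch computes one plausible form of this mean covariance tensor, assuming the patches are stacked into an array of shape (N, h, w, c); the function name and data layout are ours, not taken from the paper.

```python
import numpy as np

def mean_covariance_tensor(patches):
    """Mean covariance tensor of a stack of image patches.

    patches: array of shape (N, h, w, c), one patch per row.
    Returns a 6th-order tensor of shape (h, w, c, h, w, c): the average
    outer product of the mean-centered patches, which generalizes the
    covariance matrix to tensor-valued samples.
    """
    centered = patches - patches.mean(axis=0, keepdims=True)
    # Average the outer product of each centered patch with itself.
    return np.einsum('nijk,npqr->ijkpqr', centered, centered) / len(patches)
```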

The next step involves obtaining the factor matrices (the U's) from the mean covariance tensor. Unlike the standard Tucker decomposition, we compute only the factor matrices U1, U2, and U3: because the covariance tensor is supersymmetric, a good approximation of the core tensor can be obtained from these three matrices alone.

We obtain the factor matrices by unfolding the mean covariance tensor along each of its first three modes and computing the Singular Value Decomposition (SVD) of each unfolding.
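
A minimal sketch of this step under the same assumed layout; the per-mode ranks are free parameters chosen by the user.

```python
import numpy as np

def factor_matrices(cov, ranks):
    """Factor matrices U1, U2, U3 from the mean covariance tensor.

    cov: 6th-order tensor of shape (h, w, c, h, w, c).
    ranks: number of components to keep per mode, e.g. (r1, r2, r3).
    """
    Us = []
    for mode, r in enumerate(ranks):
        # Mode-n unfolding: bring the mode to the front, flatten the rest.
        unfolding = np.moveaxis(cov, mode, 0).reshape(cov.shape[mode], -1)
        # Keep the r leading left singular vectors as the factor matrix.
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        Us.append(U[:, :r])
    return Us
```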

Tucker Decomposition

Once we have the factor matrices, we can get an approximation of the core tensor by Tucker Decomposition.
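
Concretely, the core is obtained by n-mode products of the covariance tensor with the transposed factor matrices; a sketch under the assumptions above, projecting only modes 0-2 and relying on the tensor's symmetry for the remaining modes:

```python
import numpy as np

def core_tensor(cov, Us):
    """Approximate Tucker core of the covariance tensor.

    cov: 6th-order tensor of shape (h, w, c, h, w, c).
    Us: factor matrices [U1, U2, U3], each of shape (d_mode, r_mode).
    """
    core = cov
    for mode, U in enumerate(Us):
        # n-mode product: contract axis `mode` of `core` with U^T.
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    return core  # shape (r1, r2, r3, h, w, c)
```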

The resulting core tensor's dimensions are given by the number of components retained in each factor matrix U together with the patch dimensions.

The core tensor encapsulates relevant features of our dataset. We propose rearranging these features into kernels of size (patch height, patch width, channels), which are subsequently plugged into convolutional layers.
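
One plausible rearrangement under the assumed shapes above (the exact layout used in the paper may differ):

```python
import numpy as np

def core_to_kernels(core):
    """Rearrange core-tensor features into convolution kernels.

    core: tensor of shape (r1, r2, r3, h, w, c).
    Returns kernels of shape (h, w, c, r1 * r2 * r3): one kernel of size
    (patch height, patch width, channels) per retained component, in the
    (height, width, in_channels, out_channels) layout used by Keras.
    """
    r1, r2, r3, h, w, c = core.shape
    kernels = core.reshape(r1 * r2 * r3, h, w, c)
    return np.transpose(kernels, (1, 2, 3, 0))
```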

Results

We evaluated the discriminative and initializer capabilities of the generated kernels. We observed distinct patterns through a visual sanity check, a similarity check of the correlation between kernels generated from different small subsets, and a comparison with the state of the art.
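
As an illustration of the initializer use case, the following Keras sketch plugs precomputed kernels into a convolutional layer; the layer configuration is an assumption for demonstration, not the paper's exact setup.

```python
import tensorflow as tf

def conv_layer_from_kernels(kernels):
    """Conv2D layer whose weights start from covariance-derived kernels.

    kernels: array of shape (h, w, in_channels, out_channels), as
    produced by the core_to_kernels sketch above.
    """
    h, w, _, n_filters = kernels.shape
    return tf.keras.layers.Conv2D(
        filters=n_filters,
        kernel_size=(h, w),
        use_bias=False,
        # Callable initializer that returns the precomputed kernels.
        kernel_initializer=lambda shape, dtype=None: tf.constant(
            kernels, dtype=dtype),
    )
```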

We also studied the correlation between the kernels generated (per layer) from training sub-datasets of different sizes.
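
A simple similarity measure for this check (our assumed metric, for illustration) is the Pearson correlation between two flattened kernel banks:

```python
import numpy as np

def kernel_correlation(kernels_a, kernels_b):
    """Pearson correlation between two flattened kernel banks, e.g.
    kernels generated from two different training sub-datasets."""
    return np.corrcoef(kernels_a.ravel(), kernels_b.ravel())[0, 1]
```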

We compared our model with state-of-the-art architectures on CIFAR-10, CIFAR-100, and MNIST.

Related links

- The book Multilinear Subspace Learning: Dimensionality Reduction of Multidimensional Data provides an excellent introduction to tensor operations: unfolding, the n-mode product, the n-mode cross-covariance (covariance tensor), and the Tucker decomposition.

- To understand Tucker Decomposition with R, explore Understanding the Tucker decomposition, and compressing tensor-valued data (with R code)

- To know more about the cascade style used in our work, check PCA Based Kernel Initialization for Convolutional Neural Networks.

Citation
