Representation learning with a temporally coherent mixed-representation
[Thesis]. Manchester, UK: The University of Manchester; 2017.
Abstract
Guiding a representation towards capturing temporally coherent aspects present in video improves object identity encoding. Existing models apply temporal coherence uniformly over all features based on the assumption that optimal encoding of object identity only requires temporally stable components. We test the validity of this assumption by exploring the effects of applying a mixture of temporally coherent invariant features, alongside variable features, in a single 'mixed' representation. Applying temporal coherence to different proportions of the available features, we evaluate a range of models on a supervised object classification task. This series of experiments was tested on three video datasets, each with a different complexity of object shape and motion. We also investigated whether a mixed representation improves the capture of information components associated with object position, alongside object identity, in a single representation. Tests were initially applied using a single-layer autoencoder as a test bed, followed by subsequent tests investigating whether similar behaviour occurred in the more abstract features learned by a deep network. A representation applying temporal coherence in some fashion produced the best results in all tests, on both single-layered and deep networks. The majority of tests favoured a mixed representation, especially in cases where the quantity of labelled data available to the supervised task was plentiful. This work is the first time a mixed representation has been investigated, and demonstrates its use as a method for representation learning.
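As a rough illustration only (not code from the thesis), the core idea of applying temporal coherence to a proportion of the available features can be sketched as a slowness penalty restricted to the first fraction of a latent code, leaving the remaining features unconstrained. The function name `mixed_temporal_loss` and its parameters are hypothetical; in training this term would be added to an autoencoder's reconstruction loss.

```python
import numpy as np

def mixed_temporal_loss(z_t, z_t1, coherent_frac=0.5, lam=1.0):
    """Slowness penalty on a fraction of latent features.

    z_t, z_t1     : latent codes for consecutive video frames
    coherent_frac : fraction of features treated as temporally coherent
                    (the 'invariant' part of the mixed representation)
    lam           : weight of the penalty relative to the reconstruction loss
    """
    k = int(round(coherent_frac * z_t.shape[-1]))
    if k == 0:  # fully variable representation: no coherence penalty
        return 0.0
    # Penalise change between frames only on the first k features
    diff = z_t1[..., :k] - z_t[..., :k]
    return lam * float(np.mean(diff ** 2))
```

Setting `coherent_frac=1.0` recovers the uniform temporal-coherence scheme the thesis compares against; intermediate values give the mixed representations under test.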
Keyword(s)
Autoencoders; Computer vision; Neural networks; Representation learning; Temporal coherence; Unsupervised learning