
Representation learning with a temporally coherent mixed-representation

Parkinson, Jon Charles

[Thesis]. Manchester, UK: The University of Manchester; 2017.


Abstract

Guiding a representation towards capturing temporally coherent aspects present in video improves object identity encoding. Existing models apply temporal coherence uniformly over all features based on the assumption that optimal encoding of object identity only requires temporally stable components. We test the validity of this assumption by exploring the effects of applying a mixture of temporally coherent invariant features, alongside variable features, in a single 'mixed' representation. Applying temporal coherence to different proportions of the available features, we evaluate a range of models on a supervised object classification task. This series of experiments was tested on three video datasets, each with a different complexity of object shape and motion. We also investigated whether a mixed-representation improves the capture of information components associated with object position, alongside object identity, in a single representation. Tests were initially applied using a single layer autoencoder as a test bed, followed by subsequent tests investigating whether similar behaviour occurred in the more abstract features learned by a deep network. A representation applying temporal coherence in some fashion produced the best results in all tests, on both single layered and deep networks. The majority of tests favoured a mixed-representation, especially in cases where the quantity of labelled data available to the supervised task was plentiful. This work is the first time a mixed-representation has been investigated, and demonstrates its use as a method for representation learning.
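The mixed-representation idea described above can be illustrated with a minimal numpy sketch: a reconstruction objective plus a temporal-coherence penalty applied only to a chosen fraction of the hidden features, leaving the rest free to vary. The function name, the squared-difference penalty, and the `lam` weighting are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def mixed_temporal_coherence_loss(x, x_hat, h, coherent_frac=0.5, lam=1.0):
    """Reconstruction loss plus a temporal-coherence penalty on a feature slice.

    x, x_hat : (T, d) consecutive video frames and their reconstructions
    h        : (T, m) hidden representations for the T frames
    coherent_frac : proportion of hidden features treated as temporally
                    coherent ("invariant"); the rest are left variable
    lam      : weight on the coherence penalty
    """
    recon = np.mean((x - x_hat) ** 2)
    k = int(coherent_frac * h.shape[1])  # number of coherent features
    if k == 0:
        return recon  # pure autoencoder: no coherence constraint
    # Penalise frame-to-frame change for the coherent slice only;
    # the remaining m - k features may vary freely over time.
    coherence = np.mean((h[1:, :k] - h[:-1, :k]) ** 2)
    return recon + lam * coherence
```

Setting `coherent_frac` to 0 recovers a plain autoencoder and to 1 a fully temporally coherent one; intermediate values give the mixed representations the experiments compare.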

Layman's Abstract

Guiding a representation towards capturing temporally coherent aspects present in video improves object identity encoding. Existing models apply temporal coherence uniformly over all features based on the assumption that optimal encoding of object identity only requires temporally stable components. We test the validity of this assumption by exploring the effects of applying a mixture of temporally coherent invariant features, alongside variable features, in a single 'mixed' representation. Applying temporal coherence to different proportions of the available features, we evaluate a range of models on a supervised object classification task. This series of experiments was tested on three video datasets, each with a different complexity of object shape and motion. We also investigated whether a mixed-representation improves the capture of information components associated with object position, alongside object identity, in a single representation. Tests were initially applied using a single layer autoencoder as a test bed, followed by subsequent tests investigating whether similar behaviour occurred in the more abstract features learned by a deep network. A representation applying temporal coherence in some fashion produced the best results in all tests, on both single layered and deep networks. The majority of tests favoured a mixed-representation, especially in cases where the quantity of labelled data available to the supervised task was plentiful. This work is the first time a mixed-representation has been investigated, and demonstrates its use as a method for representation learning.

Bibliographic metadata

Type of resource:
Content type:
Form of thesis:
Type of submission:
Degree type:
Doctor of Philosophy
Degree programme:
PhD Computer Science (CDT)
Publication date:
Location:
Manchester, UK
Total pages:
190
Additional digital content not deposited electronically:
Code was produced to run the various neural networks described in this work. This code is being prepared and will be passed to my supervisor for appropriate storage within the next week.
Non-digital content not deposited electronically:
None
Thesis main supervisor(s):
Thesis co-supervisor(s):
Language:
en

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:308982
Created by:
Parkinson, Jon
Created:
4th May, 2017, 20:38:24
Last modified by:
Parkinson, Jon
Last modified:
7th September, 2017, 12:32:40
