Modality Translation: From Image To Sound

Wang, Jiaxuan

[Thesis]. Manchester, UK: The University of Manchester; 2018.

Abstract

A software system that performs signal modality translation is described, tested and evaluated. Motivated by the goal of aiding visually impaired and blind people in navigation and localisation, the system takes a live video stream as its input, applies modality translation, and produces a live audio stream as its output. The generated audio signals are determined and modulated by statistical parameters derived from the image data. These parameters are calculated by methods including RGB channel splitting, the Fast Fourier Transform, grey-level co-occurrence matrix algorithms, and other feature-extraction algorithms. The real-time modulation of the audio signals is based on chords composed of five tones, where each tone's frequency is determined by one or two statistical parameters and their corresponding coefficients, obtained through repeated tests and trials. In addition, a graphical user interface (GUI) track bar lets users adjust two modulation factors within a fixed range. The work exploits real-time Digital Signal Processing (DSP) techniques and was implemented within a software development framework based on OpenCV. The system met all of its design and performance objectives: the output sound is continuous, real-time, easy to distinguish for each captured image or video stream, and pleasant to the ear. The system represents an initial exploration of using real-time DSP to modulate sound in response to live video data, and offers a potential route forward for developing a wide range of techniques and systems in the area of acoustic assistive technologies.
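The core idea described in the abstract — deriving statistical parameters from image data and using them to set the frequencies of a five-tone chord — can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name, the choice of statistics (channel means, brightness, contrast), and the frequency-mapping coefficients are all hypothetical stand-ins for the empirically tuned coefficients the thesis derives from repeated trials, and the FFT- and GLCM-based features are omitted for brevity.

```python
import numpy as np

def image_to_chord(image, base_freq=220.0, sample_rate=8000, duration=0.25):
    """Map per-channel statistics of an RGB image to a five-tone chord.

    Hypothetical mapping: each tone's frequency is the base frequency
    scaled by one normalised statistic (R mean, G mean, B mean, overall
    brightness, overall contrast). The thesis instead uses one or two
    statistical parameters per tone with empirically tuned coefficients.
    """
    img = image.astype(np.float64) / 255.0
    # One statistic per tone: three channel means plus two global measures.
    params = [
        img[..., 0].mean(),   # red channel mean
        img[..., 1].mean(),   # green channel mean
        img[..., 2].mean(),   # blue channel mean
        img.mean(),           # brightness
        img.std(),            # contrast
    ]
    freqs = [base_freq * (1.0 + p) for p in params]
    # Synthesise the chord as the average of five sine tones.
    t = np.arange(int(sample_rate * duration)) / sample_rate
    chord = sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs)
    return freqs, chord
```

In the described system this mapping would run per captured frame, with OpenCV supplying the live video frames and the GUI track bars scaling two of the modulation factors before synthesis.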

Bibliographic metadata

Type of resource:
Content type:
Form of thesis:
Type of submission:
Degree type:
Master of Philosophy
Degree programme:
MPhil Electrical and Electronic Engineering
Publication date:
Location:
Manchester, UK
Total pages:
142
Thesis main supervisor(s):
Thesis co-supervisor(s):
Language:
en

Institutional metadata

University researcher(s):

Record metadata

Manchester eScholar ID:
uk-ac-man-scw:316876
Created by:
Wang, Jiaxuan
Created:
10th October, 2018, 10:34:22
Last modified by:
Wang, Jiaxuan
Last modified:
2nd November, 2018, 14:26:25
