This page provides additional online material for the following paper:

I S HOWARD & M A HUCKVALE, Learning to Control an Articulator Synthesizer by Imitating Real Speech.
Submitted to the Special Issue of ZASPIL, derived from the Franco-German Speech Production/Speech Perception Summer school in Lubmin, Germany, 2004.

This supplement provides .wav files for the various input and output speech utterances described in the text. A more up-to-date PowerPoint presentation covering further developments of this work is available here.

The Babble Generator

The input parameters of the vocal tract synthesiser were driven with a random signal. This signal was generated by interpolating between target values sampled from the synthesiser's parameter space. This space can be sampled in a variety of ways. In the examples below, the synthesiser output results from three different sampling strategies:

  1. Sampling targets from the parameter space directly (in which case the selected targets constitute a uniform sample of the parameter space, with no constraint on phonetic relevance).
  2. Sampling targets from the parameter space that represent 5 vowels.
  3. Sampling targets from the parameter space that represent 5 vowels and the consonants /b/ and /g/.

1. Direct Babbling in the Synthesiser Parameter Space | 2. Babbling Targets Chosen from Pure Vowel Space | 3. Babbling Targets Chosen from Vowel and Consonant Space
babble_vtSpace | babble_vowelSpace | babble_vowelConSpace
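
The target-sampling and interpolation scheme described above can be illustrated with a minimal sketch. It is not the code used in the paper: the parameter count, the parameter ranges, the placeholder vowel targets, and the choice of linear interpolation are all assumptions made for illustration, and the names `PARAM_RANGES`, `VOWEL_TARGETS` and `babble` are hypothetical.

```python
import random

# Assumption: four control parameters, each normalised to [0, 1].
# The real synthesiser's parameters and ranges are not given on this page.
PARAM_RANGES = [(0.0, 1.0)] * 4

# Assumption: placeholder vectors standing in for five vowel targets.
VOWEL_TARGETS = [
    [0.1, 0.9, 0.2, 0.5],
    [0.8, 0.2, 0.3, 0.5],
    [0.5, 0.5, 0.9, 0.4],
    [0.3, 0.7, 0.6, 0.6],
    [0.9, 0.1, 0.1, 0.3],
]


def sample_uniform_target(rng, ranges):
    """Strategy 1: draw a target uniformly from the whole parameter space."""
    return [rng.uniform(lo, hi) for lo, hi in ranges]


def sample_from_set(rng, targets):
    """Strategies 2 and 3: draw a target from a fixed set of phonetic targets."""
    return list(rng.choice(targets))


def babble(sample_target, n_targets, steps_per_segment):
    """Interpolate linearly (an assumption) between successively sampled targets."""
    current = sample_target()
    trajectory = [current]
    for _ in range(n_targets - 1):
        nxt = sample_target()
        for s in range(1, steps_per_segment + 1):
            t = s / steps_per_segment
            trajectory.append([a + t * (b - a) for a, b in zip(current, nxt)])
        current = nxt
    return trajectory


rng = random.Random(0)
direct = babble(lambda: sample_uniform_target(rng, PARAM_RANGES), 5, 10)
vowels = babble(lambda: sample_from_set(rng, VOWEL_TARGETS), 5, 10)
print(len(direct), len(vowels))
```

Each trajectory frame would be passed to the synthesiser as one set of control parameters. Swapping the sampling function is the only change needed to move between the three strategies; a vowel-and-consonant set would simply add further target vectors to the list.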

 

Using the Inverse Model to Re-synthesise Speech from an Identical Vocal Tract

After the inverse model had been trained using a babbling phase, its operation was investigated by first re-synthesising speech generated by an identical vocal tract synthesiser.  This constitutes the simplest case, since the issue of speaker normalization does not arise.

The input .wav files to the inverse model system are given in column 1 and the resulting synthesised output .wav files are given in column 2.

1. Speech Synthesised Driven from Babble Generator | 2. Speech Synthesised Driven via Inverse Model Running on Babble Speech Input
initiator_babble_abaSpace | imitator_babble_abaSpace
initiator_babble_AiubAiuSpace | imitator_babble_AiubAiuSpace
initiator_babble_vowelSpace | imitator_babble_vowelSpace
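
The babble, train, and re-synthesise loop for the identical-vocal-tract case can be sketched with a toy example. Everything here is an assumption for illustration: `toy_synth` is a hypothetical two-parameter stand-in for the synthesiser plus auditory analysis, and a nearest-neighbour lookup over the babbled pairs stands in for the trained inverse model, whose actual form is not described on this page.

```python
import math
import random


def toy_synth(motor):
    """Hypothetical stand-in for synthesiser + auditory analysis:
    maps 2 motor parameters to 2 acoustic features."""
    m0, m1 = motor
    return (m0 + 0.5 * m1, m1 + 0.2 * m0 * m0)


def collect_babble(rng, n_samples):
    """Babbling phase: random motor settings paired with their acoustics."""
    pairs = []
    for _ in range(n_samples):
        motor = (rng.random(), rng.random())
        pairs.append((toy_synth(motor), motor))
    return pairs


def inverse_model(pairs, acoustic):
    """Return the motor setting whose babbled acoustics lie closest to the input.
    Nearest-neighbour lookup is an assumed substitute for the trained model."""
    _, motor = min(pairs, key=lambda p: math.dist(p[0], acoustic))
    return motor


def resynthesise(pairs, acoustic_frames):
    """Estimate motor parameters per frame, then drive the same synthesiser."""
    return [toy_synth(inverse_model(pairs, frame)) for frame in acoustic_frames]


rng = random.Random(1)
babble_pairs = collect_babble(rng, 500)

# "Initiator" speech: produced by the same toy synthesiser.
initiator = [toy_synth((0.1 * k, 0.5)) for k in range(11)]
imitator = resynthesise(babble_pairs, initiator)

worst = max(math.dist(a, b) for a, b in zip(initiator, imitator))
print(f"worst frame error: {worst:.3f}")
```

Because the initiator and imitator share one synthesiser, every input frame lies in the range the babble could have produced, which is why no speaker normalisation is needed in this case.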

 

Using the Inverse Model to Re-synthesise Real Speech from a Human Subject

The inverse model was then investigated using real speech from a single male subject. This is a much more difficult task, since the characteristics of the subject's vocal tract and those of the synthesiser will generally differ.

The input .wav files to the inverse model system are given in column 1 and the resulting synthesised output .wav files are given in column 2.

Real Speech Input | Speech Synthesised Using Inverse Model with Real Speech Input
realSpeechIAH1 /babababa/ | resynthSpeechIAH1 /bababab/
realSpeechIAH3 'boogie boogie bababa' x2 | resynthSpeechIAH3 'boogie boogie bababa' x2

 

Using the Inverse Model to Re-synthesise Real Speech from a Human Subject incorporating an additional Sparse Coding Stage in the Auditory Analysis

An additional sparse coding stage was incorporated in the auditory analysis.  

The input .wav files to the inverse model system are given in column 1 and the resulting synthesised output .wav files are given in column 2.

Real Speech Input | Speech Synthesised Using Inverse Model with Real Speech Input (Auditory Analysis Also Used a Sparse Coding Stage)
realSpeechIAH1 /babababa/ | SresynthSpeechIAH1
realSpeechIAH3 'boogie boogie bababa' x2 | SresynthSpeechIAH3
 

Please send your comments about this web page to: drianhoward@gmail.com 
Copyright © 2005-2010 Ian Howard. All Rights Reserved
Last Changed: 05 August 2010