ZASPIL Paper 2005

This page provides additional online material for the following paper:

I S HOWARD & M A HUCKVALE,
Learning to Control an Articulatory Synthesizer by Imitating Real Speech.
Submitted to the Special Issue of ZASPIL, derived from the Franco-German Speech Production/Speech Perception Summer School in Lubmin, Germany, 2004.

This supplement provides .wav files for the various input and output speech utterances described in the text. A more up-to-date PowerPoint presentation covering further developments of this work is available here.

The Babble Generator

The input parameters of the vocal tract synthesiser were driven with a random signal. This signal was generated by interpolating between target values chosen by sampling from the synthesiser's parameter space. This space can be sampled in a variety of ways. In the examples below, the synthesiser output results from three different sampling strategies:

1. Sampling targets directly from the parameter space (in which case the selected targets constitute a uniform sample of the parameter space, with no constraint on phonetic relevance).
2. Sampling targets from the parameter space that represent 5 vowels.
3. Sampling targets from the parameter space that represent 5 vowels and the consonants /b/ and /g/.

1. Direct Babbling in the Synthesiser Parameter Space	2. Babbling Targets chosen from Pure Vowel Space	3. Babbling Targets chosen from Vowel and Consonant Space
babble_vtSpace	babble_vowelSpace	babble_vowelConSpace
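
The babble generation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the paper: the three normalised parameters, the placeholder vowel targets in `VOWEL_TARGETS`, the use of linear interpolation, and all function names are hypothetical. The real synthesiser has its own parameter set and the paper's vowel inventory has 5 vowels.

```python
import random

# Assumption: three articulator parameters, each normalised to [0, 1].
N_PARAMS = 3

# Assumption: placeholder vowel targets, not the values used in the paper.
VOWEL_TARGETS = {
    "a": [0.9, 0.2, 0.5],
    "i": [0.1, 0.9, 0.3],
    "u": [0.2, 0.1, 0.9],
}


def sample_uniform_target(rng):
    """Strategy 1: draw a target uniformly from the whole parameter space."""
    return [rng.random() for _ in range(N_PARAMS)]


def sample_vowel_target(rng):
    """Strategies 2 and 3: draw a target from a fixed phonetic inventory."""
    return list(VOWEL_TARGETS[rng.choice(sorted(VOWEL_TARGETS))])


def babble(sample_target, n_targets, frames_per_transition, rng):
    """Interpolate linearly between successive randomly chosen targets."""
    targets = [sample_target(rng) for _ in range(n_targets)]
    trajectory = []
    for start, end in zip(targets, targets[1:]):
        for k in range(frames_per_transition):
            w = k / frames_per_transition
            trajectory.append([(1 - w) * s + w * e for s, e in zip(start, end)])
    trajectory.append(targets[-1])  # finish exactly on the last target
    return trajectory


rng = random.Random(0)
direct = babble(sample_uniform_target, n_targets=5, frames_per_transition=10, rng=rng)
vowels = babble(sample_vowel_target, n_targets=5, frames_per_transition=10, rng=rng)
```

Each row of `direct` or `vowels` is one frame of control parameters that would be passed to the synthesiser. Adding consonant targets to the inventory would correspond to the third strategy.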

Using the Inverse Model to Re-synthesise Speech from an Identical Vocal Tract

After the inverse model had been trained using a babbling phase, its operation was investigated by first re-synthesising speech generated by an identical vocal tract synthesiser. This constitutes the simplest case, since the issue of speaker normalization does not arise.

The input .wav files to the inverse model system are given in column 1 and the resulting synthesised output .wav files are given in column 2.

1. Speech Synthesised from the Babble Generator	2. Speech Synthesised via the Inverse Model Running on Babble Speech Input
initiator_babble_abaSpace	imitator_babble_abaSpace
initiator_babble_AiubAiuSpace	imitator_babble_AiubAiuSpace
initiator_babble_vowelSpace	imitator_babble_vowelSpace
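
The train-by-babbling and re-synthesis loop for this identical-vocal-tract case can be illustrated with a toy example. Everything here is an assumption made for illustration: `toy_synthesiser` is a made-up two-parameter function standing in for the synthesiser plus auditory analysis, and the inverse model is a simple nearest-neighbour lookup over the babbled examples, which is not claimed to be the model used in the paper.

```python
import math
import random


def toy_synthesiser(params):
    """Hypothetical stand-in for synthesiser plus auditory analysis:
    maps two control parameters to two 'acoustic' features."""
    p0, p1 = params
    return (p0 + 0.5 * p1, p0 * p1)


def babble_training_set(n_samples, rng):
    """Babbling phase: pair random parameters with the sound they produce."""
    pairs = []
    for _ in range(n_samples):
        params = (rng.random(), rng.random())
        pairs.append((toy_synthesiser(params), params))
    return pairs


def inverse_model(acoustic, training_set):
    """Nearest-neighbour inverse (an assumption, see lead-in): return the
    babbled parameters whose acoustic features are closest to the input."""
    _, params = min(training_set, key=lambda pair: math.dist(pair[0], acoustic))
    return params


rng = random.Random(0)
training_set = babble_training_set(2000, rng)

# Input sound: produced by the same synthesiser from parameters not
# necessarily seen during babbling.
initiator_sound = toy_synthesiser((0.3, 0.7))

# Output sound: parameters recovered by the inverse model, then re-synthesised.
imitator_sound = toy_synthesiser(inverse_model(initiator_sound, training_set))
imitation_error = math.dist(initiator_sound, imitator_sound)
```

Because the input and output share one synthesiser, every input sound is in principle reachable, so `imitation_error` is limited only by how densely the babbling phase covered the parameter space.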

Using the Inverse Model to Re-synthesise Real Speech from a Human Subject

The inverse model was then investigated using real speech from a single male subject. This is a much more difficult task, since the exact characteristics of the subject's vocal tract and those of the synthesiser will generally differ.

The input .wav files to the inverse model system are given in column 1 and the resulting synthesised output .wav files are given in column 2.

Real Speech Input	Speech Synthesised Using Inverse Model with Real Speech Input
realSpeechIAH1 /babababa/	resynthSpeechIAH1 /babababa/
realSpeechIAH3 'boogie boogie bababa' x2	resynthSpeechIAH3 'boogie boogie bababa' x2

Using the Inverse Model to Re-synthesise Real Speech from a Human Subject, Incorporating an Additional Sparse Coding Stage in the Auditory Analysis

An additional sparse coding stage was incorporated in the auditory analysis.

The input .wav files to the inverse model system are given in column 1 and the resulting synthesised output .wav files are given in column 2.

Real Speech Input	Speech Synthesised Using Inverse Model with Real Speech Input (Auditory Analysis Including a Sparse Coding Stage)
realSpeechIAH1 /babababa/	SresynthSpeechIAH1
realSpeechIAH3 'boogie boogie bababa' x2	SresynthSpeechIAH3