Generating and Preparing Sound Files for Analysis

Recording guidelines

Reverberation and echoes reduce the fidelity of the speech signal, so recordings should ideally be made in a recording studio or a similar anechoic space. To maximise the signal-to-noise ratio (SNR), check the environment for background noise and use a directional microphone (cardioid or hypercardioid).

Sound files can be recorded and edited into phrases, or other speech of interest, using the freely available Audacity software: https://www.audacityteam.org/

The speech recordings should be in .wav format.

The sampling rate should be either 44.1 kHz or 16 kHz.

Ensure that the recording level is in the range -12 dB to -6 dB and that there is no clipping. Clipping occurs when the input level is too high for the system. Check for clipping using the “View\Show Clipping” option in Audacity, as shown in the figure below. The red sections indicate clipping.

Figure: Audacity waveform with clipped sections shown in red (figure6_generating_and_preparing_sound_files_for_analysis.jpg)
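
The format, sampling rate and level checks above can also be run outside Audacity. The following is a minimal sketch using only the Python standard library, not part of the original guidelines. It assumes 16-bit PCM .wav files, reads the -12 dB to -6 dB range as peak level relative to digital full scale, and treats any sample at the 16-bit limit as clipped. The function name inspect_recording is hypothetical.

```python
import array
import math
import sys
import wave

ALLOWED_RATES = (44100, 16000)  # 44.1 kHz or 16 kHz, as recommended above
FULL_SCALE = 32768              # magnitude of the largest 16-bit sample


def inspect_recording(path):
    """Return the sampling rate check, channel count, peak level and clipping count."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    # Assumption: level is expressed in dB relative to full scale.
    peak_db = 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")
    # Assumption: a sample sitting at either 16-bit limit counts as clipped.
    clipped = sum(1 for s in samples if s in (32767, -32768))

    return {
        "rate_ok": rate in ALLOWED_RATES,
        "channels": channels,
        "peak_db": peak_db,
        "level_ok": -12.0 <= peak_db <= -6.0,
        "clipped_samples": clipped,
    }
```

Calling inspect_recording on a recording returns a dictionary. A file that follows the guidelines should have rate_ok and level_ok set to True and clipped_samples equal to 0. A visual check in Audacity is still worthwhile, since this sketch only looks at the single highest peak.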

Editing guidelines

Convert the entire file to mono.

Make edits only at silent points in the speech. Cutting in the middle of the signal creates an abrupt discontinuity, which produces ‘spectral splatter’.

Pad each resulting sound file with 500 ms of silence at the beginning and at the end before processing.
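
The mono conversion and padding steps above can be scripted when many files need preparing. The following is a minimal sketch using only the Python standard library, not part of the original guidelines. It assumes 16-bit PCM .wav input, down-mixes by averaging the channels (one common choice, not necessarily what Audacity does), and uses the hypothetical function name prepare_for_analysis.

```python
import array
import sys
import wave


def prepare_for_analysis(src_path, dst_path, pad_ms=500):
    """Write a mono copy of src_path with pad_ms of silence at each end.

    Returns the number of samples written.
    """
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = src.getframerate()
        channels = src.getnchannels()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Down-mix by averaging the interleaved channel samples of each frame.
    mono = array.array(
        "h",
        (sum(samples[i:i + channels]) // channels
         for i in range(0, len(samples), channels)),
    )

    # Zero-valued samples are silence in signed 16-bit PCM.
    # At 16 kHz, 500 ms corresponds to 8000 samples.
    silence = array.array("h", [0]) * (rate * pad_ms // 1000)
    padded = silence + mono + silence
    if sys.byteorder == "big":
        padded.byteswap()  # convert back to little-endian for writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(padded.tobytes())
    return len(padded)
```

The sketch writes to a separate output file so that the original recording is left untouched. It does not check that the file starts and ends at a silent point, so the edit points should still be chosen by hand as described above.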

