Lesson 2: Speaker Comparison

Definition: the comparison of two or more voice recordings to determine whether they come from the same speaker.

Key Concepts

Phonetic Analysis
Speaker-specific Features
Analytic Approach

UNIT 1: Introduction to Speaker Comparison

A speaker comparison requires audio recordings of both the perpetrator and the suspect. Forensic language experts carefully evaluate the two recordings to answer the question of whether the suspect is the perpetrator of the crime.

Because speaker comparison takes an analytic approach, combining phonetic analysis with speaker-specific characteristics, it requires a more detailed analysis than a speaker profile does. Nowadays, that level of detail can only be achieved with software or artificial intelligence. We will therefore turn to the phonetic software Praat, which is well known in research and will help us forensically analyze specific characteristics of the two suspicious voices. Let's track them down…

In this lesson, we will focus primarily on phonetic features. In the first activity, we will look at fundamental frequency, the rate at which the vocal folds vibrate, which listeners perceive as pitch. In adult men, the average fundamental frequency is usually between 85 and 180 hertz (Hz). In adult women, it is typically between 165 and 255 Hz.

In order to see how this fundamental frequency is measured, you will need to download a copy of Praat onto your computer. You can download it here: https://www.fon.hum.uva.nl/praat/
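Praat reports the fundamental frequency for you, but the underlying idea can be illustrated in a few lines of Python. The following is a minimal sketch, not Praat's actual algorithm: it estimates the fundamental frequency of one short voiced stretch by autocorrelation, that is, by finding the time shift at which the signal best matches itself. The function name, the search range of 75 to 300 Hz, and the synthetic 120 Hz test tone are assumptions chosen for illustration.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=300.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Simple autocorrelation sketch; the search range is an assumption.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]                   # remove any constant offset
    lag_min = int(sample_rate / f0_max)               # shortest period considered
    lag_max = min(int(sample_rate / f0_min), n - 1)   # longest period considered
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Average product of the signal with a copy of itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

# Synthetic stand-in for a voice: a pure 120 Hz tone, 0.1 s long at 12 kHz.
SAMPLE_RATE = 12000
tone = [math.sin(2 * math.pi * 120 * i / SAMPLE_RATE) for i in range(1200)]
f0 = estimate_f0(tone, SAMPLE_RATE)
print(f"Estimated F0: {f0:.1f} Hz")
```

Real speech is far less regular than a pure tone, so an average over many voiced stretches is more informative than a single measurement.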

Activity 1: The Mystery of the Stolen Lindt Recipes

Is it the same person? No!

Remember from the last lesson:

We had the case of the theft of valuable recipes from the Lindt chocolate factory. We decided that the math expert is a highly dangerous person, so we have already created a speaker profile of him. In addition, an anonymous source told us that he saw a person hanging around the Lindt chocolate factory late at night while the robbery was going on. Why was this person hanging around? We interviewed him, and he seemed suspicious, so we had him repeat the same sentence from the audio recording to find out whether we are dealing with the math expert.

We therefore need to compare his voice with the voice in the recording we have of the robber. Save the two audio recordings to your desktop and open them in Praat. Listen carefully and determine the average fundamental frequency of each recording. Is it the same person?
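Once you have pitch values for both recordings, the comparison itself is simple arithmetic. The sketch below averages per-frame pitch values (skipping unvoiced frames, marked here as None) and expresses the gap between two recordings in hertz and in semitones. The function names and the short value lists are hypothetical placeholders, not measurements from the case recordings.

```python
import math

def mean_f0(frame_values):
    """Average the pitch values of voiced frames; None marks an unvoiced frame."""
    voiced = [v for v in frame_values if v is not None]
    if not voiced:
        raise ValueError("no voiced frames to average")
    return sum(voiced) / len(voiced)

def f0_gap(mean_a, mean_b):
    """Return the difference between two mean F0 values in Hz and in semitones."""
    hz = abs(mean_a - mean_b)
    semitones = abs(12 * math.log2(mean_a / mean_b))
    return hz, semitones

# Hypothetical per-frame values (Hz), made up purely for illustration.
recording_a = [110.0, None, 120.0, 130.0]
recording_b = [None, 200.0, 240.0, 280.0]

mean_a = mean_f0(recording_a)
mean_b = mean_f0(recording_b)
gap_hz, gap_st = f0_gap(mean_a, mean_b)
print(f"A: {mean_a:.1f} Hz, B: {mean_b:.1f} Hz, gap: {gap_hz:.1f} Hz ({gap_st:.1f} semitones)")
```

A large gap points toward different speakers, but a small gap alone does not show that two voices are the same, because many people share a similar average pitch.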

Audio recording: Original (00:26)

Audio recording: Sample (00:16)

UNIT 2: Analyzing Phonetic Characteristics

As we saw in Unit 1 of this lesson, it can be extremely difficult to tell speakers apart by ear and guess which voice belongs to whom. Now that we have taken a look at Praat and how to use it, we will dive deeper into analyzing the phonetic characteristics that differ between speakers.

Some of the more noticeable differences that we can measure using Praat are loudness, formants, and the pronunciation of vowels and consonants (which are shown in a spectrogram). These elements can convey information about the speaker's identity or emotional state.

Here's how these factors can affect a speaker's voice:

Pronunciation of vowels & consonants

The pronunciation of vowels and consonants can significantly affect a speaker's voice in various ways, including clarity, tone, pitch, and overall sound. Additionally, the way they are pronounced can give insight into accents, general speaking patterns and moods.

In Praat, users can see the pronunciation of vowels and consonants in a spectrogram, which displays the acoustic characteristics of speech: formants, pitch contour, duration, and intensity. The spectrogram is shown in the bottom half of the image, highlighted by the red box.
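To make the idea of a spectrogram concrete, here is a minimal sketch of how one can be computed: the signal is cut into short overlapping frames, and the strength of each frequency in each frame is measured with a discrete Fourier transform. This is an illustration rather than Praat's implementation; the frame length, hop size, window choice, and the synthetic 1000 Hz tone are assumptions.

```python
import cmath
import math

FRAME_LEN = 256   # samples per analysis frame (assumed value)
HOP = 128         # step between frame starts (assumed value)

def spectrogram(samples, frame_len=FRAME_LEN, hop=HOP):
    """Return one list of frequency magnitudes per frame (naive DFT, Hann window)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):   # bins from 0 Hz up to half the sample rate
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows

# Synthetic input: a pure 1000 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(512)]
spec = spectrogram(tone)
first = spec[0]
peak_bin = max(range(len(first)), key=lambda k: first[k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(f"{len(spec)} frames, strongest frequency in first frame: {peak_hz:.0f} Hz")
```

Each row corresponds to one vertical slice of the spectrogram picture, and larger magnitudes correspond to darker areas.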

Loudness

Loudness contributes to the overall impression of a speaker's confidence, emotional expressiveness, and emphasis.

In Praat, users can analyze the intensity of an audio signal. A blue waveform can be seen in the upper part of the screen, which shows how the volume of the signal changes over time.
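An intensity curve can be approximated by measuring the root-mean-square (RMS) amplitude of short frames and converting it to decibels. The sketch below reports levels relative to a full-scale amplitude of 1.0, which is an assumption made for simplicity; Praat's own intensity values use a different reference, so the absolute numbers will not match. The function name and the two synthetic tones are illustrative.

```python
import math

def frame_levels_db(samples, frame_len):
    """Return the RMS level of each non-overlapping frame in dB (re amplitude 1.0)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # A silent frame has no finite level in decibels.
        levels.append(20 * math.log10(rms) if rms > 0 else float("-inf"))
    return levels

SAMPLE_RATE = 8000
# A 200 Hz tone that is loud for 0.1 s and then half as strong for 0.1 s.
loud = [1.0 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(800)]
quiet = [0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(800)]
levels = frame_levels_db(loud + quiet, frame_len=800)
print([round(level, 1) for level in levels])
```

Halving the amplitude lowers the level by about 6 dB, which is why the second frame reads lower than the first.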

Formants

Formants are the resonant frequencies of the vocal tract. They are crucial for studying regional accents, dialects, and individual speech patterns based on vowel quality.

In Praat, information about vowel quality can be found by identifying these resonant frequencies. In the spectrogram in the image above, the red dots represent formants. When analyzing vowels, we focus on the first three formants: F1, F2, and F3. These reflect how a vowel is articulated, for example the height of the tongue and how far forward or back it is.
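Once formant values have been read off for the same vowel in several recordings, they can be compared numerically. The sketch below is one simple way to do that, assuming a plain Euclidean distance between (F1, F2, F3) triples; it is not an established forensic procedure. The function names, the labels, and all formant values are invented for illustration and are not measurements from the case recordings.

```python
import math

def formant_distance(a, b):
    """Euclidean distance (Hz) between two (F1, F2, F3) measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_voice(reference, candidates):
    """Return the label of the candidate whose formants lie nearest to the reference."""
    return min(candidates, key=lambda name: formant_distance(reference, candidates[name]))

# Hypothetical formant values (Hz) for one vowel, invented for illustration.
robber = (700.0, 1200.0, 2500.0)
suspects = {
    "Voice A": (500.0, 1500.0, 2400.0),
    "Voice B": (690.0, 1220.0, 2550.0),
    "Voice C": (300.0, 2200.0, 3000.0),
}

nearest = closest_voice(robber, suspects)
print(nearest, round(formant_distance(robber, suspects[nearest]), 1))
```

A single vowel is thin evidence, so a comparison like this would normally be repeated across several vowels and combined with the other features discussed in this unit.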

Unfortunately, the man who was brought into the police station didn't turn out to be our math guru. However, our witness went to university the next day and heard several of his mathematics professors talking, and he was almost certain that one of them matched the voice he heard at the chocolate factory. Detective X then went to the university and recorded three of these professors saying the same sentence that our witness heard on the night of the break-in.

Download the three recordings of our suspects and open them in Praat along with the recording of the robber from the night of the break-in. Analyze these recordings using the phonetic characteristics we have just learned about and see whether one of the suspects is the math expert.

Activity 2: Finding the Math Expert

Audio recording: Voice 1 (00:14)

Audio recording: Voice 2 (00:12)

Audio recording: Voice 3 (00:06)

Final thoughts for this lesson:

Speaker comparison is used to determine how likely it is that the speakers in two or more recordings are one and the same person.
Understanding of language and sound: Students have learned how language and sounds work on a technical level and especially how they are visualized. This experience can raise awareness of the complexity and diversity of language and potentially spark interest in areas such as linguistics, phonetics and acoustics.

What other challenges do you think this part of forensic linguistics has to face? How do you think we can differentiate between AI-generated voices and real human voices?

