Fundamental frequency as an acoustic cue to phonological phrase boundary in Spanish

Mario Casado-Mancebo

Departamento de Lengua española y Lingüística general (UNED)

Introduction

Acoustic cues encoding phonological phrase (\(\varphi\)) boundaries include duration, pauses and fundamental frequency (\(f_{0}\)) (Wightman et al., 1992).

The autosegmental-metrical approach to intonation (Pierrehumbert and Beckman, 1988) has focused on the pitch events of prosody.

Tonal events:
  • H: upward movement
  • L: downward movement

Tonal domains:
  • Words: pitch accent (H*/L*)
  • Phrases: boundary tone (H-/L-, H%/L%)

Tonal events are anchored within the domain of syllables (Xu, 1998), but peak alignment is subject to variation. In prenuclear positions, peaks often appear after their tone-bearing unit (peak delay, H<*). In nuclear positions, where a boundary tone (H- or H%) is also present (tonal crowding), peaks align with their unit (H*).

The peak delay hypothesis suggests that this earlier alignment results from tonal anticipation, which leaves temporal room for the second tone (Frota et al., 2012). Boundary tones are often followed by a partial reset that signals the beginning of the next constituent (Féry and Truckenbrodt, 2005; Pijper and Sanderman, 1994).

Objectives

  • Identify the presence of phonological phrase boundaries in Spanish.
  • Explore \(f_{0}\) behaviour around \(\varphi\) boundaries.
  • Quantify the effects of phonological phrase on boundary tones.

Methods

Participants: 30 speakers from Madrid
Procedure: Reading-aloud task in a recording booth
Materials: 8 texts; 60 pairs of declarative sentences (30 \(\omega\) + 30 \(\varphi\))

Example:

  1. Las débiles fibras de ese algodón\(]\varphi [\)dejarán bolas al lavarlo.
    That cotton’s weak fibers will bobble after washing it
  2. El algodón\(]\omega [\)decente no causa esos problemas.
    Quality cotton doesn’t cause those issues

Analysis

Samples for the study:

  • Interpolated segments of speech spanning, for each sentence, from the word before the boundary to the word after it
  • Pitch floor and ceiling adjusted individually for each participant
  • Time normalized by extracting a fixed number of points (153) from each sentence
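
The time-normalization step above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline, which the poster does not show: it assumes linear interpolation (also across unvoiced frames) and resamples a toy contour onto the 153 points mentioned above. The function name `time_normalize` and the toy values are hypothetical.

```python
import numpy as np

N_POINTS = 153  # fixed number of points per sentence, as stated above


def time_normalize(times, f0_hz, n_points=N_POINTS):
    """Resample an f0 contour onto n_points equally spaced in normalized time [0, 1].

    Unvoiced frames (NaN) are dropped first, so the resampling also
    interpolates linearly across voiceless gaps (an assumption of this sketch).
    """
    times = np.asarray(times, dtype=float)
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = ~np.isnan(f0_hz)
    t, f = times[voiced], f0_hz[voiced]
    t_norm = (t - t[0]) / (t[-1] - t[0])  # map the voiced span to [0, 1]
    grid = np.linspace(0.0, 1.0, n_points)
    return grid, np.interp(grid, t_norm, f)


# Toy contour: 1.2 s of f0 with one unvoiced frame (hypothetical values).
times = np.array([0.0, 0.3, 0.6, 0.9, 1.2])
f0 = np.array([180.0, 200.0, np.nan, 220.0, 190.0])
grid, contour = time_normalize(times, f0)
```

After this step every sentence is represented by a vector of the same length, which is what allows contours of different durations to be compared point by point.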

Considerations:

  • Second boundary of interest at the noun phrase (NP) boundary in \(\omega\) sentences:
    \(\bigl[\text{El } [\text{algodón}]_{N}\ [\text{decente}]_{Adj}\bigr]_{NP/\varphi}\) no causa esos problemas

  • Influence of sex on \(f_{0}\)

  • A pause at a boundary generally signals a prosodically higher-level constituent (Estebas-Vilaplana and Prieto, 2008)

  • Speech rate influences the H rise (Torres and Fletcher, 2020)

Functional Principal Components Analysis

Functional Principal Components Analysis (FPCA) allows the analysis of phonetic phenomena involving dynamic changes. Principal components (PCs) are numbered from 1 onwards, ranked by the decreasing percentage of variance that they capture (Gubian et al., 2015).

Eight PCs were explored from log-transformed Hz samples as a function of normalized time, using the landmarkregUtils R package (Gubian, 2024).
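
To illustrate how PCs and their variance shares arise, the sketch below runs an ordinary PCA (via SVD) on discretized log-\(f_{0}\) curves. It is only an approximation of FPCA under stated assumptions: the study used the landmarkregUtils R package, whose functions are not reproduced here, and the curves are synthetic stand-ins generated with hypothetical level and tilt parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 40 contours x 153 normalized-time points.
n_curves, n_points = 40, 153
t = np.linspace(0.0, 1.0, n_points)
height = rng.normal(0.0, 0.15, size=(n_curves, 1))  # overall level shifts
slope = rng.normal(0.0, 0.05, size=(n_curves, 1))   # rising/falling tilt
noise = rng.normal(0.0, 0.005, size=(n_curves, n_points))
log_f0 = np.log(200.0) + height + slope * (t - 0.5) + noise

# Center the curves and decompose them; rows of `components` act as PC functions.
mean_curve = log_f0.mean(axis=0)
centered = log_f0 - mean_curve
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

explained = singular_values**2 / np.sum(singular_values**2)  # variance share per PC
scores = centered @ components.T                              # one score per curve and PC

n_pc = 8  # number of PCs explored in the study
print(np.round(100 * explained[:n_pc], 2))
```

Because the synthetic curves differ mostly in overall level, the first component absorbs most of the variance, which mirrors the kind of "vertical variation" component described in the Results.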

Linear Mixed Models

To understand the influence of the \(\varphi\) boundary, one linear mixed model (LMM) per PC was fitted with the following structure, using the lmerTest R package (Kuznetsova et al., 2017):

Dependent: PC1 / PC2 / PC3 / PC4
Independent: Boundary * Sex (c.) + Boundary * Rate (c.)
Random intercepts: Participant + Utterance

  • To control for the effects of sex and speech rate, two fixed effects were added, both centered around the mean.
  • The Boundary fixed effect includes the categories \(\omega\) and \(\varphi\), and also \(\varphi\) + pause.
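
Written out, the structure above corresponds to an equation of the following form for the score of observation \(n\) on component \(k\). This is a sketch that assumes treatment (dummy) coding of Boundary with \(\omega\) as the reference level, which the poster does not state explicitly; the index notation (\(p(n)\) for the participant and \(s(n)\) for the utterance of observation \(n\)) is introduced here for illustration only.

\[
\begin{aligned}
\mathrm{PC}_{k,n} ={}& \beta_{0} + \beta_{S}\,\mathrm{Sex}^{c}_{n} + \beta_{R}\,\mathrm{Rate}^{c}_{n} \\
&+ \sum_{b \in \{\varphi,\ \varphi+\text{pause}\}} \bigl(\beta_{b} + \beta_{b \times S}\,\mathrm{Sex}^{c}_{n} + \beta_{b \times R}\,\mathrm{Rate}^{c}_{n}\bigr)\,\mathbf{1}[B_{n} = b] \\
&+ \gamma_{p(n)} + \delta_{s(n)} + \varepsilon_{n},
\end{aligned}
\]

with \(\gamma_{p} \sim \mathcal{N}(0, \sigma^{2}_{P})\), \(\delta_{s} \sim \mathcal{N}(0, \sigma^{2}_{U})\) and \(\varepsilon_{n} \sim \mathcal{N}(0, \sigma^{2})\). Since \(\mathrm{Sex}^{c}\) and \(\mathrm{Rate}^{c}\) are centered, \(\beta_{b}\) can be read as the effect of boundary category \(b\) at the average value of both covariates.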

Results

Only the first four PCs captured relevant variation.

Fixed effects (\(p < 0.01\))

PC1 (87.18 %)

Category               |Size|   Est. (log)
\(\varphi\)            9.1      0.833
\(\varphi\) + pause    7.7      0.773
Sex (c.)               3.9      -1.323

PC2 (6.54 %)

Category                           |Size|   Est. (log)
\(\varphi\)                        10.8     -0.759
\(\varphi\) + pause                8.9      -0.71
\(\varphi\) + pause * Rate (c.)    5.4      -0.265
\(\varphi\) * Rate (c.)            2.7      -0.1

PC3 (4.08 %)

Category               |Size|   Est. (log)
\(\varphi\) + pause    5.1      -0.478
\(\varphi\)            3.9      -0.356

PC4 (2.2 %)

Category                   |Size|   Est. (log)
\(\varphi\) + pause        8.3      -0.378
\(\varphi\)                5.8      -0.239
\(\varphi\) * Rate (c.)    4.6      0.097
Sex (c.) * \(\varphi\)     3.9      -0.086
Rate (c.)                  2.9      0.051

  • Logarithmic transformation reduced the influence of sex:
    • Even in PC 1, which captures vertical variation, the effect of sex is smaller than that of the boundary categories.
    • However, it is not enough to fully neutralize sex differences, as PC 1 and PC 4 still show a significant effect.
  • PC 2 and PC 4 captured significant speech rate effects, including interactions with boundary category.
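
The pattern noted above (sex influence reduced but not removed) follows from what a logarithm does to Hz values. A small worked example with hypothetical values: a rise of the same proportion in a lower-pitched and a higher-pitched voice differs in Hz but is identical in log Hz, while the difference in overall level between the two voices remains.

```python
import math

# Hypothetical rises of 20 % for a lower- and a higher-pitched voice (values in Hz).
low_start, low_peak = 100.0, 120.0
high_start, high_peak = 200.0, 240.0

# In Hz the two rises differ in size...
rise_hz = (low_peak - low_start, high_peak - high_start)

# ...but in log Hz they are identical, because log turns ratios into differences.
rise_log = (math.log(low_peak) - math.log(low_start),
            math.log(high_peak) - math.log(high_start))

# The overall level difference between the two voices survives the transformation.
level_gap_log = math.log(high_start) - math.log(low_start)  # equals log(2)

print(rise_hz, rise_log, level_gap_log)
```

This is consistent with a residual sex effect showing up in a component that captures vertical variation, and it motivates the additional centered fixed effect in the models.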

FPCA models

Reconstruction of prosodic boundary effects from LMM estimates and FPCA functions. Solid lines represent the central estimate for each prosodic category, and shaded areas display its 95 % confidence interval. Each line is annotated with the grammatical category of each word around the first boundary.
  • L+H* and L+H<* patterns appear, as expected for Spanish declarative sentences (Estebas-Vilaplana and Prieto, 2008).
  • Before the first boundary, \(\varphi\) contours are predicted to be steeper than \(\omega\) contours. This produces peak anticipation in \(\varphi\), which, according to the peak delay hypothesis, would signal the presence of an H- tone.
  • This H- tone aligns with the final upstep, which is followed by a partial reset, as indicated by the arrows. This is expected at the end of a phrasal constituent.
  • \(\omega\) contours show peak anticipation before the second boundary. This matches expectations, as there is a noun phrase boundary after the adjective.
  • The LMMs do not predict a difference when a pause appears at a \(\varphi\) boundary. Central estimates for \(\varphi\) boundaries with and without a pause run parallel with a 2 Hz difference, and their confidence intervals fully overlap.
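
The reconstruction behind these predicted contours can be sketched as the mean curve plus each PC function weighted by its predicted score, back-transformed from log Hz to Hz. The Python example below assumes this standard FPCA reconstruction; the mean curve, PC functions and scores are hypothetical stand-ins, not the study's actual functions or estimates.

```python
import numpy as np

n_points = 153
t = np.linspace(0.0, 1.0, n_points)

# Hypothetical stand-ins for the FPCA output (not the study's actual functions).
mean_log_f0 = np.full(n_points, np.log(180.0))  # mean curve, in log Hz
pc_functions = np.vstack([
    np.full(n_points, 0.05),                     # PC1-like: vertical shift
    0.05 * np.sin(2 * np.pi * t),                # PC2-like: change in shape
])

# Hypothetical predicted PC scores per boundary category (e.g. from LMM estimates).
predicted_scores = {
    "omega": np.array([0.0, 0.0]),
    "phi": np.array([0.8, -0.7]),
}


def reconstruct_hz(scores):
    """Mean curve plus score-weighted PC functions, back-transformed to Hz."""
    log_contour = mean_log_f0 + scores @ pc_functions
    return np.exp(log_contour)


contours = {name: reconstruct_hz(s) for name, s in predicted_scores.items()}
```

Applying the same function to the lower and upper bounds of each predicted score would give a rough band around each contour, although how the poster's confidence intervals were computed is not stated.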

Conclusions

  • \(\varphi\) samples show a different \(f_0\) contour than \(\omega\) samples: they have a steeper rise and tonal anticipation.
  • These different contours align with syntactic phrase boundaries.
  • These features match those described in prosodic theory for the \(\varphi\) level.
  • A pause at the boundary could not be confirmed as an index of a stronger boundary than one without a pause.
  • Log-transforming Hz samples was not enough to neutralize the influence of sex on \(f_0\). Including a centered fixed effect for sex made it possible to estimate boundary effects independently.

References

Estebas-Vilaplana, E., and Prieto, P. (2008). La notación prosódica del español: Una revisión del Sp-ToBI. Estudios de Fonética Experimental, 264–283. https://raco.cat/index.php/EFE/article/view/140072
Féry, C., and Truckenbrodt, H. (2005). Sisterhood and tonal scaling. Studia Linguistica, 59(2-3), 223–243. https://doi.org/10.1111/j.1467-9582.2005.00127.x
Frota, S., Arvaniti, A., and D’Imperio, M. (2012). Prosodic Representations: Prosodic Structure, Constituents, and Their Implementation; Segment-To-Tone Association; Tonal Alignment. In A. C. Cohn, C. Fougeron, and M. K. Huffman (Eds.), The Oxford Handbook of Laboratory Phonology. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199575039.013.0011
Gubian, M. (2024). landmarkregUtils: Utilities for Landmark Registration. https://github.com/uasolo/landmarkregUtils
Gubian, M., Torreira, F., and Boves, L. (2015). Using Functional Data Analysis for investigating multidimensional dynamic phonetic contrasts. Journal of Phonetics, 49, 16–40. https://doi.org/10.1016/j.wocn.2014.10.001
Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82, 1–26. https://doi.org/10.18637/jss.v082.i13
Pierrehumbert, J. B., and Beckman, M. E. (1988). Japanese Tone Structure. MIT Press. https://doi.org/10.1017/S095267570000066X
Pijper, J. R. de, and Sanderman, A. A. (1994). On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues. The Journal of the Acoustical Society of America, 96(4), 2037–2047. https://doi.org/10.1121/1.410145
Torres, C., and Fletcher, J. (2020). The alignment of F0 tonal targets under changes in speech rate in Drehu. The Journal of the Acoustical Society of America, 147(4), 2947–2958. https://doi.org/10.1121/10.0001006
Wightman, C. W., Shattuck‐Hufnagel, S., Ostendorf, M., and Price, P. J. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. The Journal of the Acoustical Society of America, 91(3), 1707–1717. https://doi.org/10.1121/1.402450
Xu, Y. (1998). Consistency of Tone-Syllable Alignment across Different Syllable Structures and Speaking Rates. Phonetica, 55(4), 179–203. https://doi.org/10.1159/000028432