A quantitative study of disfluencies in French broadcast interviews

Publication Type:

Conference Paper


The 4th Workshop on Disfluency in Spontaneous Speech, Aix-en-Provence, France, pp. 27-32 (2005)






The reported study aims to improve our understanding of phenomena specific to spontaneous speech, using sibling corpora of speech and orthographic transcriptions at various levels of elaboration. It draws on 9 hours of French broadcast interview archives, involving 10 journalists and 10 personalities from politics or civil society. First we considered press-oriented transcripts, where most of the so-called disfluencies are discarded. These were then aligned with automatic transcripts produced by the LIMSI speech recogniser. This facilitated the production of exact transcripts, in which all audible phenomena in non-overlapping speech segments were transcribed manually. Four types of disfluencies were distinguished: discourse markers, filled pauses, repetitions and revisions, each of which accounts for about 2% of the corpus (8% in total). They were analysed by utterance, speaker and disfluency pattern type. Three questions were raised. Where do disfluencies occur in the utterance? What is the influence of the speakers' status? And what are the most frequent disfluency patterns?


Université de Provence; September 10-12, 2005