Frisian Audio Mining Enterprise

Summary

In this project we will make 2600 hours of radio broadcasts from Omrop Fryslân (the Frisian broadcaster) accessible for search and retrieval. The broadcasts contain spoken Frisian and Dutch and cover the period 1950–2000. We will use speech technology for spoken document retrieval (speech-to-text conversion) and for speaker tracking (speaker diarization and recognition). This will allow us to locate broadcasts that address specific topics and to find specific speakers in the audio signal. To ensure that retrieved results are relevant, the project will also develop an enriched Frisian lexicon and a semantic search engine for searching the broadcasts in Frisian and Dutch. The non-academic project partners regard this data, once made accessible, as a rich source of Frisian cultural heritage. The project carries out innovative research by investigating the efficiency and performance of: 1. automatic speech recognition of Frisian and Dutch using either two separate recognizers or a single hybrid one; 2. the integration of speaker diarization and speaker recognition applied to a large longitudinal data set; 3. a flexible semantic search interface targeted at various user groups. All of these topics require efficient processing because of the sheer volume of the data.

Keywords: audio mining; big data; semantic searching; cultural heritage; Frisian; radio broadcasts; language variation; language domains; spoken document retrieval

Project details

Project number

314-99-119

Main applicant

Prof. dr. ir. D.A. van Leeuwen

Affiliated with

Radboud Universiteit Nijmegen, Faculteit der Letteren, Taalwetenschap

Project members

Dr. J.E. Dijkstra, Dr. E. Yilmaz

Duration

01/07/2015 to 30/06/2018