Audio description and technologies. Study on the semi-automatisation of the translation and voicing of audio descriptions

Fernández-Torné, Anna

dc.contributor

Universitat Autònoma de Barcelona. Departament de Telecomunicació i Enginyeria de Sistemes

dc.contributor.author

Fernández-Torné, Anna

dc.date.accessioned

2016-09-29T07:29:47Z

dc.date.available

2016-09-29T07:29:47Z

dc.date.issued

2016-09-06

dc.identifier.isbn

9788449066191

en_US

dc.identifier.uri

http://hdl.handle.net/10803/394035

dc.description.abstract

This thesis explores the application of technologies to the field of audio description in order to semi-automatise the process. On the one hand, speech synthesis is implemented in the voicing of audio description in Catalan and, on the other, machine translation with post-editing is applied to English audio descriptions to obtain audio description scripts in Catalan. As regards speech synthesis, natural and synthetic voices available in Catalan (5 male and 5 female for each category) are evaluated by means of a self-administered questionnaire based mainly on the Mean Opinion Score (MOS) scales of ITU-T Recommendation P.85 for the subjective assessment of the quality of synthetic speech. Participants thus rate the voices on several items (overall impression, accentuation, pronunciation, speech pauses, intonation, naturalness, pleasantness, listening effort and acceptance). The voices that obtain the best results in each category are then used to assess how blind or low-vision persons receive audio descriptions voiced with a synthetic voice compared with audio descriptions recorded with a natural voice. Both the quantitative and the qualitative data obtained show that blind or low-vision persons prefer audio description to be voiced with natural voices rather than by speech synthesis systems, since natural voices obtain statistically higher scores than synthetic voices. Even so, end users accept audio description with a synthetic voice (94% of participants) as an alternative solution, and in fact 20% of the subjects state that their preferred voice of the four evaluated is a synthetic one. As for machine translation, five free online machine translation engines from English into Catalan are evaluated to determine which is the most suitable for audio description.
The machine-translated versions and the post-editing effort are evaluated using eight different scores, which include both human judgments (post-editing time, necessity and difficulty, and the adequacy, fluency and ranking of the machine-translated versions) and automatic measures (HBLEU and HTER). The results show that there are clear quality differences among the systems evaluated and that one (Google Translate) obtains the best scores in six of the eight measures used for the evaluation. Once the best-performing engine has been selected, the effort, both objective and subjective, is compared in three situations: creating an audio description from scratch, manually translating an audio description, and post-editing a machine-translated audio description. The results indicate that the objective post-editing effort is lower than that of creating an audio description ex novo and of translating it manually, although the subjective effort is perceived as higher in the post-editing task.

en_US

dc.description.abstract

This PhD thesis explores the application of technologies to the audio description (AD) field with the aim of semi-automatising the process in two ways. On the one hand, text-to-speech (TTS) is applied to the voicing of audio description in Catalan and, on the other hand, machine translation (MT) with post-editing (PE) is applied to English audio descriptions to obtain Catalan AD scripts. In relation to TTS, a selection of available synthetic and natural voices in Catalan (5 male and 5 female for each category) is assessed by means of a self-administered questionnaire based mainly on the Mean Opinion Score (MOS) scales of ITU-T Recommendation P.85 for the subjective assessment of the quality of synthetic speech. Participants thus assess the voices on various items (overall impression, accentuation, pronunciation, speech pauses, intonation, naturalness, pleasantness, listening effort, and acceptance). The voices obtaining the best scores in each category are then used to assess how blind and visually impaired persons receive text-to-speech audio descriptions compared with human-voiced audio descriptions. Both the quantitative and the qualitative data obtained show that blind and partially sighted persons prefer audio description voiced by a human rather than by a speech synthesis system, since natural voices obtain statistically higher scores than synthetic voices. However, TTS AD is accepted by end users (94% of the participants) as an acceptable alternative solution, and 20% of the respondents actually state that their preferred voice of the four under analysis is a synthetic one. As regards MT, a selection of five free online machine translation engines from English into Catalan is evaluated in order to determine which is the most suitable for audio description.
Their raw machine translation outputs and the post-editing effort involved are assessed using eight different scores, including human judgments (PE time, PE necessity, PE difficulty, MT output adequacy, MT output fluency and MT output ranking) and automatic metrics (HBLEU and HTER). The results show that there are clear quality differences among the systems assessed and that one of them (Google Translate) is the best rated in six of the eight evaluation measures used. Once the best-performing engine has been selected, the effort, both objective and subjective, involved in three scenarios is compared: creating an audio description from scratch (AD creation), manually translating an audio description (AD translation), and post-editing a machine-translated audio description (AD PE). The results show that the objective post-editing effort is lower than that of creating an AD ex novo and of manually translating it, although the subjective effort is perceived to be higher for the post-editing task.

en_US

dc.format.extent

380 p.

en_US

dc.format.mimetype

application/pdf

dc.language.iso

eng

en_US

dc.publisher

Universitat Autònoma de Barcelona

dc.rights.license

Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons licence: http://creativecommons.org/licenses/by-nc-nd/4.0/

dc.rights.uri

http://creativecommons.org/licenses/by-nc-nd/4.0/

dc.source

TDX (Tesis Doctorals en Xarxa)

dc.subject

Audiodescripció

en_US

dc.subject

Audiodescripción

en_US

dc.subject

Audio description

en_US

dc.subject

Síntesi de parla/síntesis de habla

en_US

dc.subject

Speech synthesis

en_US

dc.subject

Traducció automàtica

en_US

dc.subject

Traducción automática

en_US

dc.subject

Machine translation

en_US

dc.subject.other

Social Sciences

en_US

dc.title

Audio description and technologies. Study on the semi-automatisation of the translation and voicing of audio descriptions

en_US

dc.type

info:eu-repo/semantics/doctoralThesis

dc.type

info:eu-repo/semantics/publishedVersion

dc.subject.udc

en_US

dc.contributor.authoremail

anna.torne@gmail.com

en_US

dc.contributor.director

Matamala, Anna

dc.embargo.terms

none

en_US

dc.rights.accessLevel

info:eu-repo/semantics/openAccess

Documents

aft1de2.pdf

1.683Mb PDF

aft2de2.pdf

6.379Mb PDF

This item appears in the following collection(s)

Departament de Telecomunicació i Enginyeria de Sistemes [58]