Literature record - detail view

 
Authors: Yousef, Ahmed M.; Deliyski, Dimitar D.; Zacharias, Stephanie R. C.; de Alarcon, Alessandro; Orlikoff, Robert F.; Naghibolhosseini, Maryam
Title: A Deep Learning Approach for Quantifying Vocal Fold Dynamics during Connected Speech Using Laryngeal High-Speed Videoendoscopy
Source: In: Journal of Speech, Language, and Hearing Research, 65 (2022) 6, pp. 2098-2113 (16 pages)
Additional information: ORCID (Naghibolhosseini, Maryam)
Language: English
Document type: print; online; journal article
ISSN: 1092-4388
Keywords: Voice Disorders; Speech; Video Technology; Equipment; Automation; Artificial Intelligence; Ohio (Cincinnati)
Abstract: Purpose: Voice disorders are best assessed by examining vocal fold dynamics in connected speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV), which enables us to study vocal fold mechanics with high temporal details. Analysis of vocal fold vibration using HSV requires accurate segmentation of the vocal fold edges. This article presents an automated deep-learning scheme to segment the glottal area in HSV from which the glottal edges are derived during connected speech. Method: Using a custom-built HSV system, data were obtained from a vocally healthy participant reciting the "Rainbow Passage." A deep neural network was designed for glottal area segmentation in the HSV data. A recently introduced hybrid approach by the authors was utilized as an automated labeling tool to train the network on a set of HSV frames, where the glottis region was automatically annotated during vocal fold vibrations. The network was then tested against manually segmented frames using different metrics, intersection over union (IoU), and Boundary F1 (BF) score, and its performance was assessed on various phonatory events on the HSV sequence. Results: The designed network was successfully trained using the hybrid approach, without the need for manual labeling, and tested on the manually labeled data. The performance metrics showed a mean IoU of 0.82 and a mean BF score of 0.96. In addition, the evaluation assessment of the network's performance demonstrated an accurate segmentation of the glottal edges/area even during complex nonstationary phonatory events and when vocal folds were not vibrating, thus overcoming the limitations of the previous hybrid approach that could only be applied to the vibrating vocal folds. Conclusions: The introduced automated scheme guarantees accurate glottis representation in challenging color HSV data with lower image quality and excessive laryngeal maneuvers during all instances of connected speech. This facilitates the future development of HSV-based measures to assess the running vibratory characteristics of the vocal folds in speakers with and without voice disorder. (As Provided).
Notes: American Speech-Language-Hearing Association. 2200 Research Blvd #250, Rockville, MD 20850. Tel: 301-296-5700; Fax: 301-296-8580; e-mail: slhr@asha.org; Web site: http://jslhr.pubs.asha.org
Indexed by: ERIC (Education Resources Information Center), Washington, DC
Updated: 2024/1/01
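
Illustration: The abstract reports a mean intersection over union (IoU) of 0.82 between the network's glottal area segmentation and manually segmented frames. As a minimal sketch of how IoU is commonly computed for one pair of binary masks, the following pure-Python function may help. It is not the authors' implementation: the function name `iou`, the nested-list mask format, and the choice to score two empty masks as 1.0 are assumptions made for illustration only.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixels; hypothetical mask format


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    same_rows = len(pred) == len(truth)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as glottis in both masks
    union = 0         # pixels marked as glottis in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumption: two empty masks (no glottal area in either) count as
    # perfect agreement; other conventions exist.
    return 1.0 if union == 0 else intersection / union


predicted = [[0, 1, 1],
             [0, 1, 0]]
manual = [[0, 1, 0],
          [0, 1, 1]]
print(iou(predicted, manual))  # 2 shared pixels / 4 pixels in the union = 0.5
```

A mean IoU over a test set would then be the average of this value across all evaluated frames.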