Literature Record - Detail View

 
Authors: Chen, Zhen; Zhu, Peixi; Qiu, Wei; Guo, Jiajie; Li, Yike
Title: Deep Learning in Automatic Detection of Dysphonia: Comparing Acoustic Features and Developing a Generalizable Framework
Source: In: International Journal of Language & Communication Disorders, 58 (2023) 2, pp. 279-294 (16 pages)
Additional information: ORCID (Li, Yike)
Language: English
Document type: print; online; journal article
ISSN: 1368-2822
DOI: 10.1111/1460-6984.12783
Keywords: Voice Disorders; Acoustics; Mandarin Chinese; German; Models; Vowels; Feasibility Studies; Screening Tests; Artificial Intelligence
Abstract: Background: Auditory-perceptual assessment of voice is a subjective procedure. Artificial intelligence with deep learning (DL) may improve the consistency and accessibility of this task. It is unclear how a DL model performs on different acoustic features. Aims: To develop a generalizable DL framework for identifying dysphonia using a multidimensional acoustic feature. Methods & Procedures: Recordings of sustained phonations of /a/ and /i/ were retrospectively collected from a clinical database. Subjects contained 238 dysphonic and 223 vocally healthy speakers of Chinese Mandarin. All audio clips were split into multiple 1.5-s segments and normalized to the same loudness level. Mel frequency cepstral coefficients and mel-spectrogram were extracted from these standardized segments. Each set of features was used in a convolutional neural network (CNN) to perform a binary classification task. The best feature was obtained through a five-fold cross-validation on a random selection of 80% data. The resultant DL framework was tested on the remaining 20% data and a public German voice database. The performance of the DL framework was compared with those of two baseline machine-learning models. Outcomes & Results: The mel-spectrogram yielded the best model performance, with a mean area under the receiver operating characteristic curve of 0.972 and an accuracy of 92% in classifying audio segments. The resultant DL framework significantly outperformed both baseline models in detecting dysphonic subjects on both test sets. The best outcomes were achieved when classifications were made based on all segments of both vowels, with 95% accuracy, 92% recall, 98% precision and 98% specificity on the Chinese test set, and 92%, 95%, 90% and 89%, respectively, on the German set. Conclusions & Implications: This study demonstrates the feasibility of DL for automatic detection of dysphonia. The mel-spectrogram is a preferred acoustic feature for the task. This framework may be used for vocal health screening and facilitate automatic perceptual evaluation of voice in the era of big data. (As Provided).
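Illustrative sketch (not part of the record): the abstract describes splitting recordings into 1.5-s segments, normalizing loudness, classifying each segment, and then deciding per subject from all segments. The following minimal Python sketch shows what those steps might look like. Everything beyond the 1.5-s segment length is an assumption made for illustration: the function names are hypothetical, RMS scaling stands in for the unspecified loudness normalization, the trailing remainder is dropped, and averaging segment probabilities against a 0.5 threshold is only one plausible aggregation rule. The CNN itself is omitted; segment probabilities are taken as given.

```python
from statistics import mean


def split_into_segments(samples, sample_rate, segment_seconds=1.5):
    """Cut a 1-D sample sequence into non-overlapping fixed-length segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    seg_len = int(round(sample_rate * segment_seconds))
    last_start = len(samples) - seg_len
    return [samples[i:i + seg_len] for i in range(0, last_start + 1, seg_len)]


def normalize_rms(segment, target_rms=0.1):
    """Scale a segment to a common RMS level.

    Assumption: RMS scaling stands in for the loudness normalization,
    which the abstract does not specify. Silent segments are returned as-is.
    """
    rms = (sum(x * x for x in segment) / len(segment)) ** 0.5
    if rms == 0:
        return list(segment)
    gain = target_rms / rms
    return [x * gain for x in segment]


def classify_subject(segment_probs, threshold=0.5):
    """Label a subject dysphonic if the mean segment probability reaches the threshold.

    Assumption: mean-probability voting; the abstract only says that
    decisions were based on all segments of both vowels.
    """
    return mean(segment_probs) >= threshold


def binary_metrics(y_true, y_pred):
    """Accuracy, recall, precision and specificity from boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num, den):
        # Return None when a metric is undefined (empty denominator).
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
    }
```

Under these assumptions, a recording would pass through split_into_segments and normalize_rms before feature extraction, and the per-segment outputs of whatever classifier is used would be passed to classify_subject; binary_metrics then yields the four subject-level measures reported in the abstract.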
Notes: Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Indexed by: ERIC (Education Resources Information Center), Washington, DC
Update: 2024/1/01