Literature Reference - Detailed View

 
Author: Mukund, Smruthi
Title: An NLP Framework for Non-Topical Text Analysis in Urdu--A Resource Poor Language
Source: (2012), (233 pages)
Ph.D. Dissertation, State University of New York at Buffalo
Language: English
Document type: print; online; monograph
ISBN: 978-1-2674-5770-7
Keywords: University thesis; Dissertation; Guidelines; Urdu; Natural Language Processing; Cues; Language Role; Cultural Awareness; Emotional Response; Electronic Publishing; Classification; English; Phonemes; Computational Linguistics; Written Language; Search Strategies; News Reporting; Sociocultural Patterns; Language Usage; Semitic Languages; Correlation; Gender Differences; Code Switching (Language); Discourse Analysis; Public Opinion
Abstract: Language plays a very important role in understanding the culture and mindset of people. Given the abundance of electronic multilingual data, it is interesting to see what insight can be gained by automatic analysis of text. This in turn calls for text analysis which is focused on non-topical information such as emotions being expressed that is in contrast to topical text analysis designed to elicit factual information or classify documents into subject categories. Non-topical tasks such as sentiment analysis or emotion detection are dependent on identifying several useful linguistic cues or indicators and go beyond the bag of words model. Performing such tasks is additionally challenging when the text is written in a language such as Urdu. This is due to: (i) the paucity of annotated Urdu data, and (ii) the lack of natural language processing tools to preprocess text and extract useful features. The tasks of interest in Urdu NLP include analyzing data sources such as blogs and comments to news articles, which in turn provide insight into social and human behavior. All of this requires a robust NLP system. The first objective of this work is to develop an NLP infrastructure for Urdu that is customizable and capable of providing basic analysis on which more advanced information extraction tools can be built. Novel techniques based on bootstrap learning and resource sharing are developed to augment available annotated Urdu data needed to train the learning models. A unique Urdu-to-English named-entity transliteration method based on phoneme alignments is also provided to enable faceted search using entities keyed in Latin script. Each of the new Urdu text processing modules is further integrated into a general text-mining platform for future ease of use. The second objective of this work is to detect emotions in Urdu newswire data.
In the process, interesting socio-cultural aspects of language usage, such as the marked use of formal Arabic words when expressing intense emotions and the correlation between gender and emotion being expressed, are exposed. To facilitate such discoveries, we provide an annotated Urdu newswire corpus for emotion detection using the newly developed language-specific non-topical annotation guidelines. Language-specific features, resources borrowed from other languages, and co-training techniques are leveraged to generate modules needed to quantify subjective cues. Novel methods that identify opinion entities, intensity of the opinions, and contexts in which the opinions are expressed are also illustrated. Our analyses provide valuable insights into how language usage frames the reporting of news and thereby influences readers. The work here is not limited to only Urdu newswire data. Novel techniques to generate part-of-speech information and sentiment polarity in blog data exhibiting code-mixing and code-switching behavior are also illustrated. The work reported here advances the state of the art in both Urdu NLP and non-topical analysis; much of the newly developed framework can be extended to other Indic languages as well. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.] (As Provided).
Notes: ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Recorded by: ERIC (Education Resources Information Center), Washington, DC
Updated: 2017/4/10