Literature Record - Detail View

Author: Kim, Hyounghun
Title: Multimodal and Embodied Learning with Language as the Anchor
Source: Ph.D. Dissertation, The University of North Carolina at Chapel Hill (2022), 171 pages
Full text: PDF
Language: English
Document type: print; online; monograph
ISBN: 979-8-4387-9847-7
Keywords: University thesis; Dissertation; Video Technology; Task Analysis; Language Role; Artificial Intelligence; Information Sources; Visual Aids; Benchmarking; Navigation (Information Systems); Scoring; Human Body; Guidance; Editing; Computer Software; Multimedia Materials; Problem Solving
Abstract: Since most worldly phenomena can be expressed via language, language is a crucial medium for transferring information and for integrating multiple information sources. For example, humans can describe what they see, hear, and feel in words, and can also explain how they move; conversely, humans can imagine scenes, sounds, and feelings from language descriptions, and move their bodies accordingly. Language therefore plays an important role in solving machine learning (ML) and artificial intelligence (AI) problems with multimodal input sources. This thesis studies how different modalities can be integrated with language in multimodal learning settings, as follows. First, we explore integrating external information from textual descriptions of an image into a visual question answering system, which incorporates the key words/phrases of paragraph captions in semi-symbolic form to make the alignment between features easier. We extend this direction to a video question answering task, employing dense captions, which provide object-level descriptions of an image, to help localize the key frames in a video clip for answering a question. Next, we build benchmarks to evaluate embodied agents that perform tasks according to natural language instructions from humans. We introduce a new instruction-following navigation and object assembly system, called ArraMon, in which agents follow natural language instructions to collect an object and put it in a target location, requiring them to deeply understand referring expressions and the concept of direction from an egocentric perspective. We also suggest a new task setup for the Cooperative Vision-and-Dialog Navigation (CVDN) dataset: we analyze the scoring behaviors of models, identify issues with the existing Navigation from Dialog History (NDH) task, and propose a more realistic and challenging task setup, called NDH-Full, which better reflects the purpose of the CVDN dataset. Finally, we explore AI assistant systems that help humans with different tasks. We introduce a new correctional captioning dataset on human body pose, called FixMyPose, to encourage the ML/AI community to build guidance systems that require models to distinguish different degrees of pose difference in order to describe the desired pose change. We also introduce a new conversational image search and editing assistant system, called CAISE, in which an agent helps a user search and edit images by holding a conversation. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone (800-521-0600) or on the Web: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.] (As Provided).
Notes: ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Indexed by: ERIC (Education Resources Information Center), Washington, DC
Updated: 2024/01/01