The 2nd AMICUS Workshop will be held on Thursday, October 20, 2011.

Antal van den Bosch (Radboud University Nijmegen)

10:45-11:15 Thierry Declerck (DFKI Saarbruecken, Germany, and ICLTT, Vienna, Austria) and Piroska Lendvai (Hungarian Academy of Sciences, Research Institute for Linguistics)

Linguistic and semantic enrichment of Stith Thompson's Motif-Index of Folk Literature

In this work we aim to provide a taxonomical or ontological "upgrade" of the Thompson Motif-Index of Folk Literature (TMI) to support the automated semantic annotation and indexing of folktales. As a first step, we propose a linguistic annotation of the labels (the motifs) used in Thompson's classification, so that the classes (or indexes) are associated with linguistic objects rather than with plain label strings. These linguistic objects serve as the interface for mapping folktale texts to the TMI resource. We also expect our approach to support a multilingual extension of TMI. Our work builds on CTL (Declerck & Lendvai, LREC 2010) and on the lemon model for the representation of lexicon information in ontologies.

11:15-11:45 Sándor Darányi (University of Borås, Swedish School of Library and Information Science, Borås, Sweden), and László Forró (Abádszalók, Hungary)

Detecting multiple motif co-occurrences in the Aarne-Thompson-Uther tale type catalog: A preliminary survey

Catalogs project subject field experience onto a multidimensional map which is then converted to a hierarchical list. In the case of the Aarne-Thompson-Uther Tale Type Catalog (ATU), this subject field is the global pattern of tale content defining tale types as canonical motif sequences. To extract and visualize such a map, we considered ATU as a corpus and analysed two segments of it, "Supernatural adversaries" (types 300-399) in particular and "Tales of magic" (types 300-749) in general. The two corpora were scrutinized for multiple motif co-occurrences and visualized by two-mode clustering of a bag-of-motif co-occurrence matrix. Findings indicate the presence of canonical content units above motif level as well. The organization scheme of folk narratives utilizing motif sequences is reminiscent of nucleotide sequences in the genetic code.
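As a rough illustration of what counting multiple motif co-occurrences could look like, the sketch below tallies how often each combination of motifs appears together within a tale type. It is only a minimal example under stated assumptions: the tale type names, the motif labels, and the function name motif_cooccurrences are hypothetical placeholders, not ATU data or the authors' procedure, and the clustering and visualization steps are left out.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: tale type -> motifs it contains.
# These labels are placeholders, not actual ATU entries.
tale_types = {
    "type_A": ["M1", "M2", "M3"],
    "type_B": ["M1", "M2"],
    "type_C": ["M2", "M3", "M4"],
}


def motif_cooccurrences(types, size=2):
    """Count how many tale types contain each combination of `size` motifs."""
    counts = Counter()
    for motifs in types.values():
        # Deduplicate and sort so each combination has one canonical form.
        for combo in combinations(sorted(set(motifs)), size):
            counts[combo] += 1
    return counts


pair_counts = motif_cooccurrences(tale_types, size=2)
triple_counts = motif_cooccurrences(tale_types, size=3)
```

Raising size above 2 gives counts for larger motif groups, which is one simple way to look for recurring units above the level of a single motif.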

13:30-14:00 Theo Meder (Meertens Institute, Amsterdam, The Netherlands) and Antal van den Bosch (Radboud University Nijmegen, Nijmegen, The Netherlands)

Folktales as classifiable texts and motif sequences: The FACT and Tunes & Tales projects

As of October 2011, the Meertens Institute (Amsterdam) has announced the start of the e-Laboratory Oral Culture. The lab will start with two projects on the computational modeling of higher-level annotations and structures in folktales. In the Tunes & Tales project, which also has a musical component, both tales and folksongs are represented as (layered) sequences of motifs. Given this, can variations of orally transmitted and changed tales and tunes be recognized through their motif structures? The FACT project adds complementary knowledge by focusing on automatic classification of folktales by their international folktale type (based on the Aarne-Thompson-Uther index, among others) and on unsupervised clustering of folktales.

14:00-14:30 Anita de Waard (Elsevier Science Publishers)

Identifying rhetorical moves in scholarly text: Towards a model for scientific epistemic markup

In this talk, I'd like to discuss two efforts: 1) ongoing work under the aegis of the W3C Health Care and Life Sciences group on scientific discourse; 2) work with the University of Utrecht on defining linguistic markers for rhetorical moves in scientific text.

The first project focuses on providing a semantic system to allow epistemic markup of scientific and medical content. The current main use case aims to enable links between pharmaceutical Product Inserts and clinical research papers. The second project is an attempt to classify the key linguistic parameters that indicate truth value in biological research articles. We have defined 20 elementary Discourse Segments (roughly corresponding to a clause) and three linguistic markers to identify them: verb tense/mood/voice; verb class; and modality markers. In both cases, we hope these models will enable and facilitate automated/NLP systems to define core rhetorical components.

15:30-16:00 Peter Wittek (University of Borås, Swedish School of Library and Information Science, Borås, Sweden)

Encoding sequences of motifs: Moving towards concept combinations

The extraction and analysis of latent topics and motifs in text corpora has made good progress over the years, and the computational time required has become acceptable as algorithms and computing hardware have improved. The next frontier is the combination of concepts, that is, higher-level conglomerates of content. Encoding these combinations and sequences calls for more sophisticated mathematical tools: there have been promising experiments with complex vectors, Hilbert spaces, tensors, tensor products, compressed tensor products, and convolution. The aim of this talk is to give a brief overview of the challenges and opportunities in this emergent field.
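To make one of the tools named above concrete, the following sketch encodes a short motif sequence with circular convolution: each motif vector is bound to a position vector, and the bound vectors are summed into a single fixed-length vector. This is an assumption about which variant of convolution is meant; it follows the general idea of holographic reduced representations rather than any method stated in the abstract, and the function names and the tiny integer vectors are purely illustrative.

```python
def circular_convolve(a, b):
    """Circular convolution: c[k] = sum over j of a[j] * b[(k - j) mod n]."""
    n = len(a)
    if len(b) != n:
        raise ValueError("vectors must have the same length")
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def encode_sequence(motif_vectors, position_vectors):
    """Bind each motif to its position and superpose the results."""
    n = len(motif_vectors[0])
    total = [0] * n
    for motif, position in zip(motif_vectors, position_vectors):
        bound = circular_convolve(motif, position)
        total = [t + x for t, x in zip(total, bound)]
    return total


# Illustrative 3-dimensional vectors; real systems would use
# high-dimensional random vectors instead.
motifs = [[1, 2, 3], [4, 5, 6]]
positions = [[1, 0, 0], [0, 1, 0]]  # identity and a one-step shift
encoded = encode_sequence(motifs, positions)
```

The result has the same dimensionality as its inputs regardless of sequence length, which is what distinguishes this compressed form from a full tensor product.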

16:00-16:45 Mariët Theune (Twente University, Enschede, The Netherlands)

Invited talk: Fabulating stories with the Virtual Storyteller

In this talk I give an overview of our efforts to generate stories using an 'emergent narrative' approach, where stories emerge from the actions of autonomous intelligent agents. In our story generation system, the Virtual Storyteller, the actions of the story characters ('played' by intelligent agents) are captured in a causal network, based on story comprehension theory. This fabula representation forms the input for generation of a natural language story text (in Dutch), which can in turn be presented by a virtual human embodying the Storyteller. Our long-term aim is to build virtual agents that interact with humans as characters in an emergent story. A first step in this direction has recently been made with the development of a multi-user tabletop interface that allows for interactive recreation of (variations of) the story of Little Red Riding Hood.

