Session 3-1

Language Processing Pipeline for Narrative Emergence: Digging into Human Rights Violations

  • Ben Miller (Georgia State University)
  • Jennifer Olive (Georgia State University)
  • Ayush Shrestha (Georgia State University)
  • Nicolas Subtirelu (Georgia State University)
  • Jin Zhao (Georgia State University)
  • Yanjun Zhao (Georgia State University)

Narratives of survival begin, according to the Russian Formalists, with the basic elements of event, person, location, and time. This argument can be operationalized computationally. Identifying these elements automatically within each narrative, and then correlating their correspondences across a corpus of such stories, can algorithmically elicit new narratives. These new narratives run transversely through a corpus and can stitch together material from many hundreds to many tens of thousands of individual narratives. Because they emerge from the cross-document connections among the narratological elements, these new narratives can reveal the stories of victims and perpetrators who had not previously offered testimony about an event. Our paper presents a new language processing pipeline with three parts. The first integrates various low-level natural language processing (NLP) tools for recognizing entities, spatiality, and temporality. The second is a machine learning component for fuzzy matching their outputs. The third is a data visualization component for cleaning the resulting correlations so that higher-level humanities problematics can be explored.
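The following sketch illustrates the cross-document correlation step in rough form. It represents each extracted unit as a hypothetical `FabulaUnit` record (source document, person, place, time). It then groups units whose person strings are similar into threads that run across documents. The abstract does not specify the machine learning component used for fuzzy matching. Here, Python's standard-library `difflib.SequenceMatcher` ratio with an arbitrary 0.8 threshold stands in for it. The helper names (`normalize`, `similar`, `transversal_threads`), the threshold, and the sample records are illustrative assumptions, not the project's code or data.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FabulaUnit:
    """Hypothetical 'person at place at time' unit extracted from one document."""
    doc_id: str
    person: str
    place: str
    time: datetime


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(name.split())


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in fuzzy matcher: difflib ratio at or above an arbitrary threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def transversal_threads(units: list[FabulaUnit], threshold: float = 0.8) -> list[list[FabulaUnit]]:
    """Greedily group units whose person strings fuzzy-match, then sort each
    group by time so it reads as one thread running across documents."""
    groups: list[list[FabulaUnit]] = []
    for unit in units:
        for group in groups:
            # The first unit in a group serves as its representative.
            if similar(unit.person, group[0].person, threshold):
                group.append(unit)
                break
        else:  # no existing group matched: start a new one
            groups.append([unit])
    return [sorted(group, key=lambda u: u.time) for group in groups]


# Hypothetical toy records; not drawn from the project's corpora.
units = [
    FabulaUnit("statement_02", "Officer Amess", "lobby", datetime(2000, 1, 1, 9, 30)),
    FabulaUnit("statement_01", "Officer Ames", "street", datetime(2000, 1, 1, 9, 5)),
    FabulaUnit("statement_03", "The Lieutenant", "street", datetime(2000, 1, 1, 9, 10)),
]

for thread in transversal_threads(units):
    print([(u.doc_id, u.place, u.time.strftime("%H:%M")) for u in thread])
# [('statement_01', 'street', '09:05'), ('statement_02', 'lobby', '09:30')]
# [('statement_03', 'street', '09:10')]
```

Greedy matching against a single representative is order-sensitive. It would also need blocking or indexing to scale to tens of thousands of narratives. The sketch is meant only to show the shape of the correlation step.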

In Narratology: Introduction to the Theory of Narrative [1], Mieke Bal terms the Formalists' essential components of narrative the fabula. The fabula precedes and facilitates the organizational work of narrative, or syuzhet. When combined, NLP tools provide the processes for encoding the various fabulaic elements so that they can be extracted and incorporated into other analyses. The fabula/syuzhet schema has been widely criticized. Even so, it serves as a good foundation for testing the possibilities of algorithmically produced transversal narratives and for experimenting with methods for cross-document coreference.

Our paper will demonstrate this methodology, developed for “Digging into Human Rights Violations,” with selected data from that project. The data consist of heterogeneous corpora related to two traumas. The first concerns the specific events surrounding the 9/11 attacks in New York and consists of 511 first responder witness statements collected in the months following the attacks. The second concerns the longer historical trauma of apartheid in South Africa. It consists of documentation from the three-year Truth and Reconciliation Commission, which heard from 22,000 witnesses and received 7,400 amnesty petitions in 12 languages.

Our pipeline runs existing modules in parallel, which reinforces each step in the sequence of textual markup tasks needed to constitute a fabula. By parallel, we mean that multiple modules process the text for each fabulaic element. For example, both Stanford CoreNLP's SUTime and Brandeis's TARSQI identify lexical indicators of temporality. The pipeline performs five markup tasks:

  • marking up named and unnamed entities, such as “Chief Ganci” or “The Lieutenant”;
  • resolving anaphora that refer to those entities;
  • classifying entities into the traditional NLP categories of Person, Location, Organization, and Unknown;
  • identifying location entities for lookup in external GIS resources and internal gazetteers;
  • tagging absolute and relative temporal indicators, such as “the morning of May 7” or “after that.”

Following these multiple layers of parallel markup, machine learning algorithms chunk the marked text into fabulaic units of person at place at time. These units are visualized with two techniques developed for this project, StoryGraph and StoryGram. These visualizations show the movements of actors across the spatiotemporal geography of a corpus and give end users a mechanism for cleaning the data.
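The following sketch pictures the parallel arrangement as several taggers annotating the same text for one fabulaic element, with their spans reconciled afterward. It is not the project's code and does not call SUTime or TARSQI. Instead, two deliberately simple regular-expression taggers (`tagger_a` and `tagger_b`, hypothetical stand-ins) mark temporal expressions as character spans. The merge step then unions overlapping spans and records how many modules agreed. The tagger patterns, the merge-by-overlap rule, and the agreement count are all illustrative assumptions.

```python
from __future__ import annotations

import re
from typing import Callable

Span = tuple[int, int]  # (start, end) character offsets; end is exclusive

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")


def tagger_a(text: str) -> list[Span]:
    """Hypothetical temporal module A: month-day dates, with an optional
    part-of-day prefix such as 'the morning of'."""
    pattern = rf"\b(?:the (?:morning|afternoon|evening) of )?(?:{MONTHS}) \d{{1,2}}\b"
    return [m.span() for m in re.finditer(pattern, text)]


def tagger_b(text: str) -> list[Span]:
    """Hypothetical temporal module B: bare month-day dates plus a few
    relative expressions, matched case-insensitively."""
    pattern = rf"\b(?:after that|before that|later on|(?:{MONTHS}) \d{{1,2}})\b"
    return [m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]


def merge_parallel(text: str, taggers: list[Callable[[str], list[Span]]]) -> list[tuple[str, int]]:
    """Run every tagger over the same text and union overlapping spans.

    Returns (surface text, number of taggers that proposed an overlapping span)
    pairs in document order.
    """
    proposals = sorted((span, i) for i, tagger in enumerate(taggers) for span in tagger(text))
    merged: list[tuple[int, int, set[int]]] = []
    for (start, end), source in proposals:
        if merged and start < merged[-1][1]:  # overlaps the previous merged span
            prev_start, prev_end, sources = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), sources | {source})
        else:
            merged.append((start, end, {source}))
    return [(text[s:e], len(sources)) for s, e, sources in merged]


SAMPLE = "We arrived on the morning of May 7 and after that we waited."
print(merge_parallel(SAMPLE, [tagger_a, tagger_b]))
# [('the morning of May 7', 2), ('after that', 1)]
```

Merging by overlap keeps the widest extent that any module proposed. A real pipeline might instead prefer one module's boundaries or weight agreement differently, which this sketch does not attempt.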

computational narrative, collective memory, text mining, natural language processing, digging into data


  1. Bal, Mieke. Narratology: Introduction to the Theory of Narrative. Trans. Christine van Boheemen. Buffalo, NY: U of Toronto P, 1985. Print.