Analyzing Microtext

AAAI 2013 Spring Symposium - March 25-27 - Stanford, California


What is Microtext?

Microtext refers to short snippets of text found in many modes of communication: microblogs (e.g., Twitter, Plurk), Short Message Service (SMS) messages, chat (e.g., instant messaging, Internet Relay Chat), and transcribed conversations (e.g., FBI hostage negotiations). Microtext is often characterized by informality, brevity, varied grammar, frequent misspellings (both accidental and purposeful), and the use of abbreviations, acronyms, and emoticons. More conversational forms of microtext, such as multiparticipant chat, also contain entangled conversation threads. These characteristics make microtext difficult to analyze and understand, and often cause traditional NLP techniques to fail.

Research on microtext is becoming increasingly necessary given the explosion of online microtext. Yet very few suitable tools have been developed for analyzing it, and there are few sufficiently large, publicly available data sets (such as the Twitter corpus). Currently, most NLP tools are designed to deal with grammatical, properly spelled and punctuated language corpora. However, a vast portion of online data does not conform to the canons of standard grammar and spelling. There is a growing need for specialized tools that tolerate noisy and fragmented microtext. Bringing together researchers from various fields to discuss microtext analysis will help align NLP methods, tools, and corpora with the current needs of the NLP community in academia, industry, and government.

Symposium Goals

This symposium will provide a multi-day forum to bring together researchers from different communities who have an interest in analyzing microtext: artificial intelligence, machine learning, computational linguistics, information retrieval, linguistics, human-computer interaction, education, and the social sciences. It will provide enough time for the different communities to present their perspectives and methodologies, to learn one another's terminology and techniques, and to begin to form connections that will hopefully lead to fruitful collaborations.


Topics of interest include:

  • Identification of message characteristics (e.g., relevancy, centrality, repeatability, trustworthiness)
  • Creation of participant profiles (e.g., age, gender, expertise topics, emotional states, social roles)
  • Author attribution
  • Topic detection and monitoring
  • Topic-to-subtopic decomposition and topic stage evolution tracking and prediction
  • Thread summarization
  • Modeling of influence and attitude changes
  • Corpus creation
  • Language structure (e.g., part of speech, dialogue acts, speech acts)
  • Visualization

Publications, Presentations, & Posters

AAAI has published the proceedings of this symposium.

The following authors and invited speakers have allowed their presentations and posters to be published online:

Invited Talks

  • Macskassy, S.A. Social Media Analytics: Text Mining for User Profiles and More [Slides]


Papers

  • Liu, W., & Ruths, D. What's in a Name? Using First Names as Features for Gender Inference in Twitter [Slides] [Poster]
  • Mikros, G., & Perifanos, K. Authorship Attribution in Greek Tweets Using Author's Multilevel N-gram Profiles [Slides] [Poster]
  • Petkovic, D., & Ringsquandl, M. Analyzing Political Sentiment on Twitter [Slides] [Poster]
  • Schwartz, H.A., Eichstaedt, J., Dziurzynski, L., Blanco, E., & Ungar, L.H. Toward Personality Insights from Language Exploration in Social Media [Slides] [Poster]
  • Uthus, D.C., & Aha, D.W. The Ubuntu Chat Corpus for Multiparticipant Chat Analysis [Slides] [Poster]


Interested participants should submit papers (8 pages maximum) in AAAI style via EasyChair. We welcome papers describing completed work, work in progress, interesting ideas that may not yet be completely worked through, and discussion pieces.


Paper submission: October 5, 2012
Acceptance Notification: November 7, 2012
Camera-ready Copies: January 18, 2013
Symposium: March 25-27, 2013


Twitter: @microtext2013

Organizing Committee


Eduard Hovy (Carnegie Mellon University)
Vita Markman (Disney Interactive Media Group)
Craig Martell (Naval Postgraduate School)
David Uthus (National Research Council and Naval Research Laboratory)

Program Committee

David Aha (Naval Research Laboratory)
Kyle Dent (PARC)
Mark Dredze (Johns Hopkins University)
Andrew Duchon (Aptima)
Jeffrey Ellen (SPAWAR-PACIFIC)
Micha Elsner (Ohio State University)
Jennifer Foster (Dublin City University)
Fei Liu (Bosch Research)
Sofus Macskassy (USC Information Sciences Institute)
Donald Metzler (Google)
Leora Morgenstern (Science Applications International Corporation)
Jim Nagy (Air Force Research Laboratory)
Douglas Oard (University of Maryland, College Park)
Sowmya Ramachandran (Stottler Henke Inc.)
Alan Ritter (University of Washington)
Sara Owsley Sood (Pomona College)
Joel Young (Naval Postgraduate School)

Keynote Speakers

Sofus Macskassy, USC Information Sciences Institute
Noah Smith, Carnegie Mellon University

Invited Panel

Rachel Greenstadt, Drexel University
Susan Herring, Indiana University
Bernardo Huberman, HP Labs
Alek Kolcz, Twitter



Updated: 19th April, 2013. Copyright © 2012-13 SAM2013 Co-Chairs