Intelligent analysis of text and video content for thematic retrieval and evaluation of accessibility for different user profiles
Speaker: Eleni Miltsakaki, Researcher at the Computer and Information Science Department and the Institute of Research in Cognitive Science of the University of Pennsylvania
Date: 31 July 2008 Time: 14:00-15:30
Location: "Stelios Orphanoudakis" Seminar Room, FORTH, Heraklion, Crete
Host: Prof. C. Stephanidis


Interest in the automatic analysis and categorization of web content has grown rapidly with the increased availability of information in a wide variety of formats (txt, ppt, pdf, pictures, audio, movies, etc.), content, genre and authorship. We present two intelligent search systems:

  1. Read-X, a tool that searches the web and performs in real time a) html-free text extraction, b) classification for thematic content, and c) evaluation of expected reading difficulty. We are currently taking Read-X a step further by modeling reader characteristics. Word frequencies built from a theme-labeled corpus are used to predict vocabulary difficulty relative to the reader's prior familiarity with thematic content.
  2. Intelligent video content analysis focusing on recovering scene structure in movies for object tracking and action retrieval (project led by Ben Taskar). A weakly supervised algorithm uses screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels. We use NLP techniques to a) retrieve descriptions of actions from the parsed text and b) resolve referential ambiguity in the screenplay. Text and movie alignment is used to label names of characters and common actions. The resulting annotations will be shown on the video of a popular TV series.
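
The vocabulary-difficulty idea in item 1 can be illustrated with a minimal sketch. This is not the Read-X implementation: the function names, the tiny corpus format, and the fixed frequency threshold are all assumptions made for illustration. The sketch builds relative word frequencies per theme from a theme-labeled corpus, then scores a text as the fraction of its words that are rare in the themes a given reader already knows.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def theme_frequencies(labeled_corpus):
    """Build relative word frequencies per theme.

    labeled_corpus: iterable of (theme, text) pairs (hypothetical format).
    Returns {theme: {word: relative_frequency}}.
    """
    counts = {}
    for theme, text in labeled_corpus:
        counts.setdefault(theme, Counter()).update(tokenize(text))
    freqs = {}
    for theme, counter in counts.items():
        total = sum(counter.values())
        freqs[theme] = {word: n / total for word, n in counter.items()}
    return freqs


def vocabulary_difficulty(text, familiar_themes, freqs, threshold=0.01):
    """Fraction of tokens that are rare in every theme the reader knows.

    The threshold is an assumed cut-off, not a value from the talk.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0

    def familiarity(word):
        # Highest relative frequency of the word across familiar themes.
        return max(
            (freqs.get(theme, {}).get(word, 0.0) for theme in familiar_themes),
            default=0.0,
        )

    hard = sum(1 for word in tokens if familiarity(word) < threshold)
    return hard / len(tokens)
```

Under these assumptions, the same text receives a different score for each reader profile: a reader familiar with a theme finds its typical vocabulary easy, while a reader without that background does not.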


Eleni Miltsakaki is a researcher at the Computer and Information Science Department and the Institute of Research in Cognitive Science of the University of Pennsylvania. Her BA degree in Linguistics is from the School of English Philology at the Aristotle University of Thessaloniki (1988).

She holds a Master's degree in Applied Linguistics from the University of Essex (1991) and did her PhD in Computational Linguistics at the University of Pennsylvania under the supervision of Ellen Prince and Aravind Joshi.

Her major publications are in the areas of discourse parsing, anaphora resolution and topic tracking, automated evaluation of textual coherence for essay scoring systems, and semantics and retrieval of discourse relations. She has played a key role in the development of the Penn Discourse Treebank, from its initial conception to its final release in January 2008. Her current research focuses on analysis of content accessibility (in text and video) for different user profiles.
