A key enabling technology for next-generation robots for the service, domestic and entertainment market is Human-Robot-Interaction. A robot that collaborates with humans on a daily basis – be this in care applications, in a professional or private context – requires interactive skills that go beyond keyboards, button clicks or metallic voices. For this class of robots, human-like interactivity is a fundamental part of their functionality.
INDIGO aims to develop human-robot communication technology for intelligent mobile robots that operate and perform tasks in populated environments. In doing so, the project will draw on technologies from various sectors and will attempt to introduce advances in the respective areas, i.e. natural language interaction, autonomous navigation, visual perception, dialogue systems, and virtual emotions. The project will address human-robot communication from two sides: by enabling robots to correctly perceive and understand natural human behaviour and by making them act in ways that are familiar to humans.
The INDIGO agenda - scientific and technological objectives
In order to achieve the desired overall goals, INDIGO will exploit, integrate and extend a number of enabling technologies that have recently become available. All the topics involved are open-ended research issues, so RTD activities within them could encompass virtually any research direction. Given the limited resources of the current project, the research tasks at hand will be tackled in specific ways, avoiding generalizations that may eventually lead to dead ends in particular project tasks. Sufficient details on the proposed approaches are given in the following sections, in particular in section 7, which describes the RTD course of INDIGO in detail.
More specifically, INDIGO will conduct RTD activities as follows:
- Robotic hardware that includes a robotic base with appropriate mobility capabilities and a mechanical head capable of mimicking human emotions and facial expressions, achieving eye contact, and supporting a naturalistic spoken conversation. The robotic hardware will include adequate sensory apparatus to support all of INDIGO’s navigation and interaction competences, enough processing power to perform a number of crucial (i.e. safety-related) tasks on-board, and enough communication capacity to transmit, for off-board processing, all information required for computationally intensive tasks. Besides operational functionality, a major objective for the robotic hardware design is the overall aesthetics of the robot, which has to be as friendly and appealing to the human user as possible.
- Multilingual speech recognition technology to recognize an adequate number of simple spoken phrases in potentially noisy, densely-populated environments, and advanced speech synthesis, capable of producing spoken output of very high quality, including near-human prosody. INDIGO’s speech recognition modules will support relatively simple user utterances in the context of system-initiative, task-specific human-robot dialogues. Besides speech recognition, text-to-speech technology will be deployed, which will take into account emotion-denoting and other intonation-related markup in the input texts. To demonstrate multilingual support, two natural languages will be supported: English and Greek.
- Natural language generation technology that is able to produce multi-sentence, coherent natural language descriptions of objects in multiple languages (English and Greek, in the context of INDIGO) from a single semantic representation; the resulting descriptions will be annotated with prosodic markup that drives the speech synthesizers, and they will vary dynamically, both in terms of contents and language expressions, depending on the interaction history (e.g., comparing to previously given information), the background and interests of the visitors, etc.
- Robust natural language interpretation (for Greek and English), to extract semantics from natural language expressions. Full semantic interpretation is beyond current technology. In INDIGO, however, it is feasible to extract reasonable semantic indicators from the users' utterances by developing a shallow interpretation approach inspired by question-answering (QA) for document collections. QA systems typically determine the type of each request (e.g., asking for a person, location, time), and rely on the recognition of entity names (e.g., person names, names of historical periods) and phrases that suggest interest in particular properties (e.g., the creator of an entity), to figure out what the user wishes to be told. A similar strategy is possible in INDIGO, because the generation resources include a domain hierarchy of entity types, the names of the entities, nouns that can be used to refer to each entity type, and phrases that can express each property. These resources can be used to identify the entities and properties an utterance relates to. Furthermore, information about the current state of the dialogue can suggest the most likely semantics of an utterance. It is also possible to guide the user towards requests the system can comprehend, for example by making comparisons to unseen exhibits that the system wants to talk about.
- Visual perception capabilities, including detection and tracking of humans, identification of groups of humans, speaker detection/tracking and identification of hand and face gestures. For people detection and tracking, INDIGO’s goal is to develop a multi-resolution, multi-hypothesis tracker that will be able to reliably track multiple people in the vicinity of the robot even in difficult situations, such as when people are partially occluded or when they form groups. Additionally, the robot will be able to recognize and distinguish between the faces and the hands of the people around it. It will also be able to “lock” onto a single person (in the case of a dialogue) and robustly track that person's face and hands even if the person is moving (the robot may have to adjust the direction of its head and/or body so that it always faces the person it is in dialogue with). Besides tracking, appropriate algorithms will be utilized in order to recognize and interpret a simple set of hand gestures, and some simple facial features and/or expressions. The set of hand gestures that the system will be able to recognize will include the pointing gesture (for the “come here” and “what is this?” commands), the “stop” gesture and a number of gestures that will be used to respond to simple questions (e.g. “yes”, “no”, “ok”). Moreover, facial features will be used for speaker detection, but may also be used to detect simple emotions and/or identify some speaker characteristics, such as whether a user has a moustache or a beard, whether he/she wears glasses, etc., which will be used to initiate specific scenario-based dialogues aimed at impressing the speaker.
- Advanced mobility and navigation capabilities in order to achieve behaviours that resemble those of humans. INDIGO will customize and extend existing techniques in order to develop the navigation modules needed to allow the system to arrive at its target positions safely and in a timely manner, to avoid obstacles, to move around objects of interest, to localize itself in the environment and to update the model of the environment if changes are detected. Research will also be carried out in order to achieve (re)action behaviours such as moving at the speed of humans when guiding them, addressing a person when reacting to a question, collectively addressing a group when offering a guided tour, etc.
- Appropriate modelling of user profiles and of the personality of INDIGO robots, which may include knowledge about the interests of different types of users and the interaction history with certain users. That is, in order to allow the robots to adapt their behaviour, INDIGO researchers will develop mechanisms to model the backgrounds, interests and interaction histories of the people interacting with the robots. Moreover, besides people, appropriate mechanisms will be developed to represent and keep track of the personalities, knowledge, and verbal and non-verbal competences of individual INDIGO robots.
- Multimodal dialogue management capabilities involving and combining input and output from various interaction modalities and technologies, such as speech recognition and synthesis, natural language interpretation and generation, and recognition of and response to human actions, gestures, emotions and facial expressions. INDIGO’s multimodal dialogue controller will be the heart of its human-robot interaction technology. It will invoke the speech recognizer and the natural language interpretation module to recognize speech and extract semantic representations from the user's spoken utterances. Moreover, it will anchor these representations in the context of the physical surroundings and the discourse history, also taking into account non-verbal information contributed by the gesture and facial feature recognizers, as well as by the modules that track the positions of the users.
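The shallow interpretation strategy described in the natural language interpretation objective above (request typing, entity-name spotting, property-phrase spotting, with the dialogue state as a fallback) can be illustrated with a minimal sketch. This is not INDIGO's implementation: the lexicons, entity identifiers, the `Interpretation` class and the `interpret` function are all hypothetical, and plain whole-phrase matching stands in for whatever recognition techniques the project actually adopts.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical domain resources, invented for illustration only. In the
# approach described above they would be derived from the generation
# resources (entity names, nouns per entity type, phrases per property).
ENTITY_NAMES = {
    "statue-1": ["bronze statue", "statue"],
    "vase-7": ["red vase", "vase"],
}
PROPERTY_PHRASES = {
    "creator": ["who made", "who created", "sculptor", "made by"],
    "creation-period": ["when was", "how old", "which period"],
    "location": ["where was", "where is", "found"],
}
REQUEST_CUES = {
    "person": ["who"],
    "time": ["when", "how old"],
    "location": ["where"],
}


@dataclass
class Interpretation:
    request_type: Optional[str]  # e.g. "person", "time", "location"
    entity: Optional[str]        # identifier of the entity referred to
    properties: List[str]        # properties the user seems interested in


def _contains(text: str, phrase: str) -> bool:
    """Whole-word (or whole-phrase) match inside lower-cased text."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def _first_match(text: str, lexicon: dict) -> Optional[str]:
    """Return the first lexicon key with any phrase occurring in text."""
    for key, phrases in lexicon.items():
        if any(_contains(text, p) for p in phrases):
            return key
    return None


def interpret(utterance: str, focus_entity: Optional[str] = None) -> Interpretation:
    """Extract shallow semantic indicators from one user utterance.

    focus_entity stands in for the dialogue state: if no entity is named
    in the utterance, the entity currently under discussion is assumed.
    """
    text = utterance.lower()
    request_type = _first_match(text, REQUEST_CUES)
    entity = _first_match(text, ENTITY_NAMES) or focus_entity
    properties = [
        prop for prop, phrases in PROPERTY_PHRASES.items()
        if any(_contains(text, p) for p in phrases)
    ]
    return Interpretation(request_type, entity, properties)
```

Under these assumed lexicons, `interpret("Who made the bronze statue?")` yields a person-type request about `statue-1` with the `creator` property, while `interpret("When was it made?", focus_entity="vase-7")` names no entity and therefore falls back to the entity supplied by the dialogue state.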
Integration and evaluation of the resulting technology
The developed technologies will be integrated in a prototype robot. More specifically, INDIGO will aim at integrating navigational, visual, emotional, and natural language speech skills in a single robotic platform with human-like interactive characteristics. This will constitute a main thrust of the project’s research efforts, aiming to implement a novel robotic platform furnished with advanced human-robot communication capabilities, such as conducting contextually relevant and meaningful simple dialogue (interpretation and generation), interpreting gestures and other human features, achieving eye contact, and following and leading people while adapting to their pace.
The resulting system will be evaluated in real-world conditions; it will be deployed in an exhibition-like context in collaboration with the end user partner FHW. There, the robot will operate fully autonomously for a long period of time in a densely populated public space.
The robot’s interaction with people will be monitored and analyzed, allowing us to perform a realistic assessment of the technology's merit and its exploitation potential. Intermediate versions of the prototype robotic system and its individual components will also be subjected to detailed evaluations (in a spiral development model), as explained in the following sections.