If you are reading this article, it is very likely that you are interested in robots, artificial intelligence or technology-related applications. And even if this is not the case, chances are that you use a computer more than once a week to facilitate your work (and your life in general). It is even more likely that you do this through dedicated software. You may write a report, send an e-mail, check your budget on a spreadsheet, or process images from your digital camera, and in doing so, you rely on existing user-friendly applications. Even if you are a programmer or know the technical details of how a computer works, you will start most of the tasks you want to achieve with the computer by opening the proper application instead of reprogramming the machine from scratch. This type of computer use seems obvious today, but at the beginning of the computer age, familiarity with a programming language was considered a self-evident prerequisite for making use of such a machine. After the appearance of the first personal computers, much time went by before end-users could finally do without programming skills and interact efficiently with their computers through simple (or at least quite simple) interfaces.
Performing tasks with a computer without knowing how its processor works sounds obvious today, but in the field of robotics, this path is unfortunately being followed very slowly. Many people today would not deny that robots are very useful in a wide range of applications, but many of them still do not conceive that these robots need to be used and reprogrammed by end-users as easily as computers. In other words, the common thinking is that, to use a robot to facilitate your task, you need to know how to program it.
Creating user-friendly systems to teach robots new skills is a challenging task that is at the core of numerous research activities aiming to demonstrate that robots will probably have their place in our homes and offices in a couple of years. To accomplish this, we may take inspiration from the development of user-friendly applications in computer science, but we cannot simply re-apply this knowledge to robotics. Indeed, unlike a computer, a robot moves in its environment. Taking this embodiment into account is a key issue when designing the controllers required to interact with these robots. Compared to the use of a fixed 2-dimensional computer screen, creating applications for robots involves simultaneously controlling several degrees of freedom with versatile methods so that the robot can be used in very different environments.
Robot Programming by Demonstration (PbD) covers methods by which a robot learns new skills through human guidance. Also referred to as learning by imitation or apprenticeship learning, PbD takes inspiration from the way humans learn new skills by imitation, thereby developing methods by which new skills can be transmitted to a robot. PbD covers a broad range of applications. In industrial robotics, the goal is to reduce the time and costs required to program the robot. The rationale is that PbD would allow the modification of an existing product, the creation of several versions of a similar product or the assembly of new products in a very rapid way, and this could be done by lay users without help from an expert in robotics. PbD is perceived as particularly useful when it comes to service robots, i.e., robots intended to work in direct collaboration with humans. In this case, methods for PbD go beyond transferring skills and offer new ways for the robot to interact with the human, from being capable of recognizing people’s motion to predicting their intention and assisting them in the accomplishment of complex tasks.
As the technology has improved to provide these robots with more and more complex hardware, including multiple sensor modalities and numerous degrees of freedom, robot control and especially robot learning have become increasingly complex. Learning control strategies for platforms with numerous degrees of freedom that are intended to interact in complex and variable environments, such as households, faces two key challenges. First, the complexity of the tasks to be learned is such that pure trial-and-error learning would be too slow. PbD thus appears to be a good approach for speeding up learning by reducing the search space, while still allowing the robot to refine its model of the demonstration through trial and error. Second, there should be a continuum between learning and control, so that control strategies can adapt on the fly to drastic changes in the environment. The present work addresses both challenges by investigating methods by which PbD is used to learn the dynamics of robot motion and, in so doing, provide the robot with a generic and adaptive model of control.
Robot PbD is now a regular topic at the two major conferences on robotics (IROS and ICRA), as well as at conferences on related fields, such as Human-Robot Interaction (HRI, AISB) and Biomimetic Robotics (BIOROB, HUMANOIDS).
REVIEW OF ROBOT PROGRAMMING BY DEMONSTRATION (PBD)
The birth of programmable machines
The possibility of programming and re-programming a machine very quickly became the most important requirement for utilizing this machine efficiently. The idea of programming machines appeared with mechanical automata long before the advent of computers or modern robots, and it is difficult to determine when the first programmable automaton came into existence.
Heron of Alexandria (AD 10–70; M. E. Rosheim, 1994) was one of the greatest experimenters of antiquity, having invented numerous user-configurable automated devices. He may be more famous as a mathematician than as an engineer, but some of his inventions are extraordinarily visionary with respect to robotics. His Automaton Theater describes theatrical constructions that moved by means of strings wrapped around rotating drums, animating figures that acted out a series of dramatic events. Energy was provided by a weight resting on a hopper full of grain, which leaked out through a small hole in the bottom. As the weight gradually sank, it pulled a rope wound around an axle of the stand to turn its wheels, thereby creating movement. Along with the power of falling weights, these figures used the basic mechanical resources of wheels, pulleys, and levers to create a variety of motions by delivering power over a relatively long period.
Despite the possibility of fine-tuning the device by hand, it remains difficult to determine whether the mechanism was intended to be reprogrammable to perform other plays. The first automaton that was reprogrammed every day can probably be credited to the Muslim inventor Al-Jazari, who fabricated one of the earliest forms of programmable humanoid robots. His Castle Water Clock, described in 1206, was an elaborate machine designed to keep time (Hill, 1973). At regular intervals throughout the day, the machine was designed to carry out specific actions such as opening doors, letting falcons drop balls and actuating automaton musicians to show the passing hours. Behind the clock was an ingenious mechanism driving this sophisticated chain of events. Water drained out of a vertical tube under gravity to provide power to turn cams on a shaft, thus actuating the events. The medieval Islamic day began at sunset and the hours were counted between sunset and the end of twilight. Since the days became longer in summer and shorter in winter, the automaton had to be re-programmed to match the different lengths of hours, which was done every day.
The first self-propelled programmable robot can probably be attributed to Leonardo da Vinci during the Renaissance, with a mechanical lion robot offered in 1515 to Francis I, King of France. The automaton was said to be capable of walking a few steps across the floor, rising on its hindquarters, opening its chest and then delivering flowers (M. Rosheim, 2006). In 1495, Leonardo also designed a humanoid automaton in knight’s armor to entertain. This automaton was supposedly able to sit up, wave its arms and move its head, but it remains unknown if the design was ever built.
The first truly modern, digitally operated and programmable robot was invented by George Devol and Joseph Engelberger in 1954 and was called Unimate (Waurzyniak, 2006). This 4000-pound robotic arm was bought by General Motors and installed in 1961 in a plant in New Jersey, where it obeyed step-by-step commands stored on a magnetic drum in order to lift hot pieces of metal from a die casting machine and stack them.
Early work of PbD in software development
Programming by Example (PbE) or Programming by Demonstration (PbD) appeared both in robotics and software development research in the early 1980s. In software development, PbD provided an intuitive and flexible way for the end-user to re-program a computer simulation without having to learn a computer language (Cypher et al., 1993; Smith, Cypher, & Spohrer, 1994; Mitchell, Caruana, Freitag, McDermott, & Zabowski, 1994). The idea emerged from the observation that most end-users were skilled at using a computer and its associated interfaces and applications (editing skills) but were only able to reprogram them in a limited way (by setting preferences). Because the numerous attempts at creating simple languages for end-users to re-program their computers had had only limited impact, certain researchers focused on re-formulating the problem from its source. Instead of designing an improved syntax for a programming language, they took the perspective that the computer could learn differently by taking into consideration that most computer applications follow similar graphical user interface strategies (environments with windows, menus, icons and a pointer controlled by the mouse). By taking advantage of the user’s knowledge of the task (i.e., the fact that the user knows how to perform a task but not how to program it), they designed software that was able to extract rules simply from the user's use of the interfaces (Cypher et al., 1993; Lieberman, 2001). The syntax of the program was thus hidden from the users, who were only demonstrating the skill to the computer. Due to the restricted behaviors that can be applied to the considered graphical environment, the learned skills consisted mostly of discrete sets of “if-then” rules that were automatically extracted by demonstrating the tasks to the computer. It was thus possible to generalize the skill and reproduce the behavior under other circumstances that shared similarities with the learned skill (Smith et al., 1994).
For example, Mitchell et al. (1994) proposed to use feed-forward neural networks to learn the user’s preferences when utilizing a calendar, where the application was progressively able to automatically plan adequate meetings and appointments with respect to the end-user’s needs.
Early work of PbD in robotics
At the beginning of the 1980s, PbD started attracting attention in the field of manufacturing robotics. It appeared as a promising route for automating the tedious manual programming of robots and as a way of reducing the costs involved in the development and maintenance of robots in a factory.
An operator has implicit knowledge of the task to achieve. He/she knows how to do it but does not have the necessary programming skills or the time required to reconfigure the robot. Demonstrating how to achieve the task through examples would thus permit learning the skill without explicitly programming each detail. The PbD paradigm in robotics, however, generated new perspectives compared to PbD in software development by allowing the demonstrations and the reproduction attempts to be performed on different media. Indeed, when considering PbD in software development, the demonstrator shares the same embodiment as the imitator (the application on the 2D computer screen). In contrast, using different architectures to demonstrate and reproduce the skills (as may be the case for robotics applications) introduces issues related to the difference in embodiments, later referred to as correspondence problems.
As a first approach to PbD, symbolic reasoning was commonly adopted in robotics with processes referred to as teach-in, guiding or play-back methods (Lozano-Perez, 1983; Dufay & Latombe, 1984; Levas & Selfridge, 1984; Segre & DeJong, 1985; Segre, 1988). In these studies, PbD was performed through manual (teleoperated) control. The position of the end-effector and the forces applied on the manipulated object were stored throughout the demonstrations together with the positions and orientations of the obstacles and of the target. This sensorimotor information was then segmented into discrete subgoals (keypoints along a trajectory, see Fig. 1.1) and into appropriate primitive actions to attain these subgoals. Primitive actions were commonly chosen to be the simple point-to-point movements employed by the industrial robots of that time. Examples of subgoals included, for instance, the robot’s gripper orientation and position in relation to the goal. Consequently, the demonstrated task was segmented into a sequence of state-action-state transitions.
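The segmentation of a recorded trajectory into keypoints and point-to-point movements can be illustrated with a minimal sketch. The cited studies do not prescribe the rule used here: taking a keypoint wherever the end-effector comes to rest is an assumption made purely for illustration, and the function name extract_keypoints and the speed threshold are hypothetical.

```python
import math


def extract_keypoints(positions, dt, speed_threshold=0.01):
    """Return indices of samples where the end-effector comes to rest.

    positions: list of (x, y, z) samples recorded at a fixed period dt.
    The rest-detection rule and the threshold are illustrative assumptions.
    The first and last samples are always kept as keypoints.
    """
    if not positions:
        return []
    keypoints = [0]
    for i in range(1, len(positions) - 1):
        prev_speed = math.dist(positions[i - 1], positions[i]) / dt
        next_speed = math.dist(positions[i], positions[i + 1]) / dt
        # Keep the sample at which motion stops after the arm was moving.
        if prev_speed >= speed_threshold and next_speed < speed_threshold:
            keypoints.append(i)
    last = len(positions) - 1
    if last not in keypoints:
        keypoints.append(last)
    return keypoints


# A toy demonstration: move along x, pause, then move along y.
trajectory = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
    (0.2, 0.0, 0.0), (0.2, 0.0, 0.0),
    (0.2, 0.1, 0.0), (0.2, 0.2, 0.0),
]
keypoints = extract_keypoints(trajectory, dt=0.1)

# Each pair of consecutive keypoints defines one point-to-point primitive.
primitives = list(zip(keypoints[:-1], keypoints[1:]))
```

In this toy case the pause in the middle of the motion yields one intermediate subgoal, so the demonstration reduces to two point-to-point movements.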
To take into account the variability of human motion and the noise inherent to the sensors capturing the movements, it appeared necessary to develop a method that would consolidate all demonstrated movements. For this purpose, the state-action-state sequence was converted into symbolic “if-then” rules, describing the states and the actions according to symbolic relationships, such as “in contact”, “close-to”, “move-to”, “grasp-object”, “move-above”, etc. Appropriate numerical definitions of these symbols (i.e., at what distance an object would be considered as “close-to” or “far-from”) were given as prior knowledge to the system. A complete demonstration was thus encoded in a graph-based representation, where each state constituted a graph node and each action a directed link between two nodes. Symbolic reasoning could subsequently unify different graphical representations for the same task by merging and deleting nodes (Dufay & Latombe, 1984).
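The encoding described above can be sketched in a few lines. This is only a simplified illustration: the threshold value, the function names symbolize, encode and merge, and the toy measurements are all hypothetical, and merging is reduced to taking the union of identically labelled nodes and links rather than the full unification procedure of Dufay and Latombe (1984).

```python
# Hypothetical prior knowledge: distance (in metres) below which an
# object is considered "close-to" rather than "far-from".
CLOSE_TO_THRESHOLD = 0.05


def symbolize(distance, in_contact):
    """Map raw sensor values to a symbolic state label."""
    if in_contact:
        return "in-contact"
    return "close-to" if distance < CLOSE_TO_THRESHOLD else "far-from"


def encode(samples, actions):
    """Turn one demonstration into state-action-state transitions.

    samples: list of (distance, in_contact) pairs, one per subgoal.
    actions: list of action labels linking consecutive subgoals.
    """
    states = [symbolize(distance, contact) for distance, contact in samples]
    return list(zip(states[:-1], actions, states[1:]))


def merge(demonstrations):
    """Build one graph: node -> set of (action, next node) directed links."""
    graph = {}
    for transitions in demonstrations:
        for state, action, next_state in transitions:
            graph.setdefault(state, set()).add((action, next_state))
    return graph


# Two noisy demonstrations of the same reach-and-grasp task.
demo_a = encode([(0.30, False), (0.03, False), (0.00, True)],
                ["move-to", "grasp-object"])
demo_b = encode([(0.42, False), (0.01, False), (0.00, True)],
                ["move-to", "grasp-object"])
graph = merge([demo_a, demo_b])
```

Although the two demonstrations differ numerically, they map to the same symbols and therefore collapse into a single two-link graph, which is the consolidation effect that the symbolic representation was meant to provide.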
Toward the use of machine learning techniques in PbD
The PbD methods presented above still employed direct repetition, which was useful in automation to reproduce an exact copy of the motion. In order to apply this concept to products with different variants or to transfer the programs to new robots, the generalization issue became crucial. To address it, the first attempts at generalizing the skill relied mainly on the help of the user through queries about his/her intentions (Heise, 1989; Friedrich, Muench, Dillmann, Bocionek, & Sassin, 1996). Following this, various levels of abstraction were proposed to resolve the generalization issue, broadly divided into learning methods at a symbolic level and at a trajectory level. A task at a symbolic level is described by the sequential or hierarchical organization of a discrete set of primitives that are pre-determined or extracted with pre-defined rules. A task at a trajectory level, on the other hand, is described by temporally continuous signals representing different configuration properties changing over time (e.g., current, torque, position, orientation). Different levels of abstraction can be utilized: the position of the end-effector, for instance, is a higher-level representation than the joint angle trajectories, which are in turn of a higher level than the current sent to the motors to set the robot to this specific posture.
Machine Learning (ML) appeared to be an appealing solution for dealing with the generalization issue. Robotics provided multiple examples of input/output (sensor/actuator) datasets that fit most of the frameworks developed in ML, so that several ML techniques could be tested on them, while robotics benefited from the ability of ML to cope with multivariate data and from its generalization capabilities. It was thus possible to extend the record/replay process to one of generalization.
In industrial robotics, Muench, Kreuziger, Kaiser, and Dillmann (1994) suggested the use of ML algorithms to analyze Elementary Operators (EOs), defining a discrete set of basic motor skills. Already in this early work, the authors pointed to several key problems of ML in robotics, which are still not completely solved today, in terms of encoding, generalization, reproduction of a skill in new situations, evaluation of a reproduction attempt, and role of the user in the learning paradigm. First, they introduced the problems related to the segmentation and extraction of EOs, which were resolved in their framework through the user’s support. To ease this task, they suggested the use of statistical analysis to retain only the EOs that appeared repeatedly (generalization at a symbolic level).
Then, by observing several sequences of EOs, they pointed to the relevance of extracting a structure from these sequences (generalization at a structured level). In this early work, both generalization processes were supervised by the user, who was asked to indicate whether an observation was example- or task-specific. Muench et al. (1994) admitted that generalizing over a sequence of discrete actions was only one part of the problem, since the controller of the robot also required the learning of continuous functions in order to control the actuators. They proposed to overcome the missing parts of the learning process by delegating them to the user, who took an active role in it. They highlighted the importance of providing a set of examples that are usable by the robot: (1) by constraining the demonstrations to modalities that the robot can understand; and (2) by providing a sufficient number of examples to achieve a desired generality. They also noted the importance of providing an adaptive controller to reproduce the task in new situations, that is, of knowing how to adjust an already acquired program.
The evaluation of a reproduction attempt was also delegated to the user by letting him/her provide additional examples of the skill in the regions of the learning space that had not yet been covered. In this way, the teacher/expert could control the generalization capabilities of the robot. Even though research in PbD was oriented toward fully autonomous learning systems, early results quickly showed the importance and advantages of using an interaction process involving the user and the robot to cope with the deficiencies of the learning systems of the time. This extraction of the dependencies at a symbolic level was also referred to as a functional induction problem in Dufay and Latombe (1984), and is still explored by many researchers today.
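As a rough illustration of the idea of retaining only the EOs that appear repeatedly across demonstrations, the sketch below counts in how many demonstrations each operator occurs and keeps the frequent ones. It is not the procedure of Muench et al. (1994): the function name frequent_operators, the min_fraction parameter and the operator labels are hypothetical, and simple frequency counting is an assumed stand-in for their statistical analysis.

```python
from collections import Counter


def frequent_operators(demonstrations, min_fraction=0.5):
    """Return the operators present in at least min_fraction of the demos.

    demonstrations: list of sequences of operator labels (strings).
    min_fraction: illustrative cut-off, an assumption of this sketch.
    """
    counts = Counter()
    for sequence in demonstrations:
        # Count each operator at most once per demonstration.
        counts.update(set(sequence))
    needed = min_fraction * len(demonstrations)
    return {operator for operator, n in counts.items() if n >= needed}


# Three demonstrations of the same task, two of them with a spurious step.
demos = [
    ["approach", "grasp", "lift"],
    ["approach", "wiggle", "grasp", "lift"],
    ["approach", "grasp", "pause", "lift"],
]
kept = frequent_operators(demos, min_fraction=0.6)
```

Operators that occur in a single demonstration are treated here as example-specific and dropped, while those shared by all demonstrations are kept as task-specific. In the original framework this decision remained under the supervision of the user.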
To summarize, early work in PbD principally adopted a user-guided generalization strategy by querying the user to provide additional sources of information for the induction process. The approach presented in this book consists in alleviating this process (or at least modifying the user’s role to avoid a burden of queries from the robot) by applying and combining several probabilistic ML algorithms, initially developed for large datasets, in a PbD framework where the training dataset is limited to the few demonstrations provided to the robot.