In recent years, peer-to-peer architectures have emerged as a novel architectural paradigm for managing distributed information at a large scale. In peer-to-peer architectures, central points of coordination are avoided and replaced by self-organization principles, thus making the systems robust and scalable. This approach has been adopted by a number of highly successful applications on the Internet, such as peer-to-peer content sharing and Internet telephony. With the growing success of Semantic Web technologies, a natural need has emerged to manage structured data in peer-to-peer settings as well, resulting in the new research area of peer-to-peer data management systems.
Peer-to-peer data management systems can be characterized as networks of structured data collections which are related by schema mappings for overcoming semantic heterogeneity. This suggests a novel approach for addressing semantic heterogeneity. Instead of using globally integrated schemas or ontologies, as done in traditional data integration systems, semantic mismatches are resolved only locally through the available schema mappings. Thus peers form semantic overlay networks of schema mappings.
This book is devoted to the exploration of challenges and solutions in pursuing this approach, and it does so in a novel and very original way. Instead of assuming that local mappings provided by autonomous peers are semantically correct and consistent, inconsistencies and errors are admitted. This opens up a completely new perspective: rather than exploiting mappings only for accessing heterogeneous data stored at different peers and optimizing this process, the question becomes one of first validating the consistency of mappings and reducing inconsistencies where possible. Since this task is performed in a completely distributed environment, one can view it as a mechanism by which peers converge, in a self-organized and decentralized process, towards the best possible semantic agreements they can globally achieve. Such a semantic agreement is called an Emergent Semantics. Emergent Semantics can be viewed as a new approach to dealing with data heterogeneity at a very large scale. But there is more to it. It gives a sense of how, in principle, mutual understanding among autonomous agents can be established without a priori knowledge; in other words, of how meaningful communication can be bootstrapped.
In this very comprehensive book, the author elaborates all the relevant aspects, covering the conceptual framework, the algorithmic solutions, structural aspects of emergent semantic overlay networks, implementation in a peer-to-peer architecture, and applications. While reading this book, one obtains an impression of the rich set of new methodological frameworks that are introduced into the area of data management as a result of considering the large-scale distribution and autonomous collaboration that are typical of peer-to-peer environments. These tools include distributed probabilistic reasoning techniques, graph-theoretic tools, and concepts from information theory. But the work described in this book does not stop at theoretical investigations. By describing the implementation of the ideas within a concrete system, GridVine, it reassures us that the proposed solutions are technically feasible and can in fact result in performant systems.
The range of applications of these techniques is rather wide: today they include popular Web applications such as image sharing, as well as scientific data management, as described in the later chapters. As large companies also increasingly experience data integration problems at a large scale, with hundreds or thousands of in-house databases and other information sources, it is only a matter of time before these techniques enter the domain of business applications as well.
This book provides a first and holistic exploration of the novel concept of Emergent Semantics. This idea has the potential to be at the origin of many subsequent research efforts on how to deal with semantic interoperability by taking a decentralized and self-organized approach. For me there is little doubt that such an approach will become more and more relevant in the years to come to handle the increasingly complex information management problems in a networked world.
Philippe Cudré-Mauroux received his degree in Communication Systems from the EPFL (Swiss Federal Institute of Technology, Lausanne) and an M.Sc. degree from the Eurecom Institute, before earning a Ph.D. at the EPFL in the Distributed Information Systems Lab in 2006.