Schema Mapping

The approaches that P2P systems use to define and create mappings between peers' schemas can be classified as follows:
1- Pairwise schema mapping
2- Mapping based on machine learning techniques
3- Common agreement mapping
4- Schema mapping using information retrieval (IR) techniques

1- Pairwise Schema Mapping:
In this approach, each user defines the mapping between the local schema and the schema of any other peer that holds data of interest.

Relying on the transitivity of the defined mappings, the system tries to derive mappings between schemas that have no directly defined mapping. Piazza follows this approach.

[Figure: An example of pairwise schema mapping in Piazza]

In Piazza, the data are shared as XML documents, and each peer has a schema that defines the peer's terminology and structural constraints. When a new peer (with a new schema) joins the system for the first time, it maps its schema to the schemas of some other peers in the system. Each mapping definition begins with an XML template that matches some path or subtree of an instance of the target schema.
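The transitivity idea above can be illustrated with a small sketch. It assumes, purely for illustration, that a pairwise mapping is a dictionary from attribute names in one peer's schema to attribute names in another's (Piazza's actual mappings are XML templates, not dictionaries), and that the first path found in the mapping graph is good enough. The peer names, attributes, and helper functions are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Tuple

# A pairwise mapping relates attribute names of a source schema to a target schema.
AttrMap = Dict[str, str]

def compose(first: AttrMap, second: AttrMap) -> AttrMap:
    """Chain two mappings (X -> Y, then Y -> Z) into X -> Z, keeping only the
    attributes whose intermediate image is mapped onward."""
    return {x: second[y] for x, y in first.items() if y in second}

def infer_mapping(mappings: Dict[Tuple[str, str], AttrMap],
                  source: str, target: str) -> Optional[AttrMap]:
    """Return a direct mapping if one is defined; otherwise search the mapping
    graph breadth-first and compose the mappings along the first path found."""
    if (source, target) in mappings:
        return mappings[(source, target)]
    queue = deque([(source, None)])  # (peer reached, composed mapping so far)
    visited = {source}
    while queue:
        peer, acc = queue.popleft()
        for (src, dst), m in mappings.items():
            if src != peer or dst in visited:
                continue
            composed = m if acc is None else compose(acc, m)
            if dst == target:
                return composed
            visited.add(dst)
            queue.append((dst, composed))
    return None

# Hypothetical user-defined mappings: A -> B and B -> C, but none from A to C.
MAPPINGS = {
    ("A", "B"): {"title": "name", "writer": "author"},
    ("B", "C"): {"name": "book_title", "author": "creator"},
}
print(infer_mapping(MAPPINGS, "A", "C"))  # {'title': 'book_title', 'writer': 'creator'}
```

Note that composition can drop attributes: an attribute with no counterpart in the intermediate schema has no image in the inferred mapping.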

Elements in the template may be annotated with query expressions that bind variables to XML nodes in the source.

Active XML [Abiteboul et al., 2002, 2008b] also relies on XML documents for data sharing. Its main innovation is that XML documents are active, in the sense that they can include Web service calls; therefore, data and queries can be seamlessly integrated.

Another example of this approach is the Local Relational Model (LRM). LRM assumes that the peers hold relational databases and that each peer knows a set of peers with which it can exchange data and services. This set of peers is called the peer's acquaintances.

Each peer must define semantic dependencies and translation rules between its data and the data shared by each of its acquaintances. The defined mappings form a semantic network, which is used for query reformulation in the P2P system.

[Figure: Query reformulation example in Piazza]

Hyperion [Kementsietsidis et al., 2003] generalizes this approach to deal with autonomous peers that form acquaintances at run time, using mapping tables to define value correspondences among heterogeneous databases. Peers perform local query and update processing, and also propagate queries and updates to their acquainted peers.
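The following sketch illustrates how mapping tables might be used to translate and propagate a selection query across acquaintances. The table contents, peer names, and the `propagate_query` helper are hypothetical, and the propagation strategy (forward once to each reachable peer) is a simplification rather than Hyperion's actual algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping table between two acquainted peers: each row pairs a
# local value with an associated value in the acquaintance's database.
MappingTable = List[Tuple[str, str]]

def translate_values(table: MappingTable, values: Set[str]) -> Set[str]:
    """Return every remote value associated with at least one of the local values."""
    return {remote for local, remote in table if local in values}

def propagate_query(start: str, values: Set[str],
                    tables: Dict[Tuple[str, str], MappingTable]) -> Dict[str, Set[str]]:
    """Starting from a selection 'attribute IN values' at peer start, push the
    query to acquainted peers, translating its constants through each mapping
    table. Returns, for each reached peer, the values its local selection uses."""
    result = {start: set(values)}
    stack = [start]
    while stack:
        peer = stack.pop()
        for (src, dst), table in tables.items():
            if src != peer or dst in result:
                continue  # not an acquaintance of this peer, or already reached
            translated = translate_values(table, result[peer])
            if translated:  # forward only if some value has a correspondence
                result[dst] = translated
                stack.append(dst)
    return result

# Hypothetical flight codes for Airline 'A', Airline 'B', and a third peer 'C'.
TABLES = {
    ("A", "B"): [("AF100", "XB-1"), ("AF100", "XB-7"), ("AF200", "XB-2")],
    ("B", "C"): [("XB-1", "C9")],
}
reached = propagate_query("A", {"AF100"}, TABLES)
print(reached)
```

A single local value may correspond to several remote values (here AF100 maps to both XB-1 and XB-7), which is why the translation returns a set rather than one value.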

[Figure: Tables from Airline 'A' and Airline 'B', and the mapping tables between them]

PGrid [Aberer et al., 2003b] also assumes the existence of pairwise mappings between peers, initially constructed by skilled experts. Relying on the transitivity of these mappings and using a gossip algorithm, PGrid extracts new mappings that relate the schemas of peers between which there is no predefined schema mapping.

2- Mapping Based on Machine Learning Techniques:
This approach is generally used when the shared data are defined based on ontologies and taxonomies, as proposed for the Semantic Web.

It uses machine learning techniques to extract the mappings between the shared schemas automatically. The extracted mappings are stored over the network so that they can be used to process future queries.

GLUE [Doan et al., 2003b] uses this approach as follows:
* Given two ontologies, for each concept in one, GLUE finds the most similar concept in the other. It gives well-founded probabilistic definitions to several practical similarity measures and uses multiple learning strategies, each of which exploits a different type of information, either in the data instances or in the taxonomic structure of the ontologies. To further improve mapping accuracy, GLUE incorporates commonsense knowledge and domain constraints into the schema mapping process.
* The basic idea is to provide classifiers for the concepts. To decide the similarity between two concepts A and B, the data of concept B are classified using A's classifier and vice versa.
* The number of values that can be successfully classified into A and B represents the similarity between A and B.

3- Common Agreement Mapping:

In this approach, the peers that have a common interest agree on a common schema description for data sharing. The common schema is usually prepared and maintained by expert users. APPA [Akbarinia et al., 2006a; Akbarinia and Martins, 2007] assumes that peers wishing to cooperate (e.g., for the duration of an experiment) agree on a Common Schema Description (CSD).
* Given a CSD, a peer schema can be specified using views. This is similar to the local-as-view (LAV) approach in data integration systems, except that queries at a peer are expressed in terms of the local views, not the CSD.

Another difference between this approach and LAV is that the CSD is not a global schema, i.e., it is common to a limited set of peers with a common interest (see Figure). Thus, the CSD does not pose scalability challenges.

[Figure: Common agreement schema mapping in APPA]

When a peer decides to share data, it needs to map its local schema to the CSD. For example, given two CSD relation definitions r1 and r2, a peer mapping at peer p is:

    p: r(A, B, D) ⊆ csd: r1(A, B, C), csd: r2(C, D, E)

In this example, the relation r(A, B, D) shared by peer p is mapped to relations r1(A, B, C) and r2(C, D, E), both of which are part of the CSD.
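The APPA example mapping can be made concrete with a small reformulation sketch. It assumes a hypothetical encoding of the locally stored mapping (a column dictionary plus a join clause on the shared attribute C), handles only projections and equality predicates, and uses an in-memory SQLite database (Python's standard `sqlite3` module) to stand in for CSD data held elsewhere; it is an illustration, not APPA's implementation.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical encoding of the mapping stored locally at peer p:
#   p: r(A, B, D) ⊆ csd: r1(A, B, C), csd: r2(C, D, E)
# Each attribute of r is tied to a CSD column; the shared attribute C
# becomes the join condition between r1 and r2.
CSD_COLUMNS = {"A": "r1.A", "B": "r1.B", "D": "r2.D"}
CSD_FROM = "r1 JOIN r2 ON r1.C = r2.C"

def reformulate(select: List[str], where: Dict[str, object]) -> Tuple[str, List[object]]:
    """Rewrite a local query on r (projection plus equality predicates)
    into a parameterized SQL query over the CSD relations r1 and r2."""
    columns = ", ".join(CSD_COLUMNS[a] for a in select)
    sql = f"SELECT {columns} FROM {CSD_FROM}"
    conditions = [f"{CSD_COLUMNS[a]} = ?" for a in where]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, list(where.values())

# In-memory database standing in for CSD data held by other peers (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r1 (A, B, C)")
conn.execute("CREATE TABLE r2 (C, D, E)")
conn.executemany("INSERT INTO r1 VALUES (?, ?, ?)", [(1, "x", 10), (2, "y", 20)])
conn.executemany("INSERT INTO r2 VALUES (?, ?, ?)", [(10, "d1", "e1"), (20, "d2", "e2")])

# Local query at p, written against r: SELECT A, B, D FROM r WHERE A = 1
sql, params = reformulate(["A", "B", "D"], {"A": 1})
rows = conn.execute(sql, params).fetchall()
print(sql)   # SELECT r1.A, r1.B, r2.D FROM r1 JOIN r2 ON r1.C = r2.C WHERE r1.A = ?
print(rows)  # [(1, 'x', 'd1')]
```

Because the mapping lives only at p, other peers never need to know p's local schema; they only ever see queries phrased over the CSD.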

In APPA, the mappings between the CSD and each peer's local schema are stored locally at the peer. Given a query Q on the local schema, the peer reformulates Q into a query on the CSD using its locally stored mappings.

AutoMed [McBrien and Poulovassilis, 2003] is another system that relies on common agreements for schema mapping. It defines the mappings using primitive bidirectional transformations expressed in terms of a low-level data model.

4- Schema Mapping Using IR Techniques:
This approach extracts the schema mappings at query execution time, using IR techniques to explore the schema descriptions provided by users.

PeerDB [Ooi et al., 2003a] follows this approach for query processing in unstructured P2P networks.
* For each relation that is shared by a peer, the description of the relation and its attributes is maintained at that peer. The descriptions are provided by users when relations are created, and serve as a kind of synonym for relation and attribute names. When a query is issued, a request to find potential matches is produced and flooded to the peers, which return the corresponding metadata.
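A rough sketch of keyword matching over the metadata returned by peers might look like the following. The metadata, the coverage score (fraction of query terms found in a relation's description), and the threshold are all illustrative assumptions; PeerDB's actual scoring function is not given here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical metadata returned by peers in response to the flooded request:
# peer -> {shared relation -> keywords describing the relation and its attributes}.
RETURNED_METADATA: Dict[str, Dict[str, Set[str]]] = {
    "peer1": {"Book": {"book", "publication", "title", "author"}},
    "peer2": {"Pub": {"publication", "paper", "title", "writer"}},
    "peer3": {"Flight": {"flight", "airline", "departure"}},
}

def match_relations(query_terms: Set[str],
                    metadata: Dict[str, Dict[str, Set[str]]],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Score each (peer, relation) by the fraction of query terms that appear in
    its description and return those at or above threshold, best first."""
    if not query_terms:
        return []
    candidates = []
    for peer, relations in metadata.items():
        for relation, keywords in relations.items():
            score = len(query_terms & keywords) / len(query_terms)
            if score >= threshold:
                candidates.append((peer, relation, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)

# The issuer of the query would review these candidates before any remote execution.
for peer, relation, score in match_relations({"publication", "title", "author"}, RETURNED_METADATA):
    print(peer, relation, round(score, 2))
```

Since the descriptions act as synonyms, a relation named Pub can still be found for a query that never mentions that name.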

By matching keywords from the metadata of the relations, PeerDB can find relations that are potentially similar to the query relations. The relations found are presented to the issuer of the query, who decides whether to proceed with executing the query at the remote peer that owns them.

Edutella [Nejdl et al., 2003] also follows this approach for schema mapping in super-peer networks. Resources in Edutella are described using the RDF metadata model, and the descriptions are stored at super-peers.

When a user issues a query at a peer p, the query is sent to p's super-peer, where the stored schema descriptions are explored and the addresses of the relevant peers are returned to the user. If the super-peer does not find relevant peers, it sends the query to other super-peers so that they can search for relevant peers by exploring their own stored schema descriptions. To explore stored schemas, super-peers use the RDF-QEL query language, which is based on Datalog semantics and is therefore compatible with all existing query languages, supporting query functionalities that extend the usual relational query languages.
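The super-peer routing described above can be sketched as follows, under strong simplifying assumptions: descriptions are plain keyword sets rather than RDF, matching is term containment rather than an RDF-QEL query, and the home super-peer forwards directly to every other super-peer rather than over a particular backbone topology. All super-peer names and peer addresses are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical super-peer state: super-peer -> {peer address -> schema terms}.
# Plain term sets stand in for the RDF descriptions Edutella super-peers store;
# matching here is simple term containment, not RDF-QEL evaluation.
SUPER_PEERS: Dict[str, Dict[str, Set[str]]] = {
    "sp1": {"peer-a:4000": {"title", "creator"}, "peer-b:4000": {"course", "level"}},
    "sp2": {"peer-c:4000": {"title", "subject"}},
}

def relevant_peers(descriptions: Dict[str, Set[str]], query_terms: Set[str]) -> List[str]:
    """Addresses of peers whose description contains every query term."""
    return sorted(addr for addr, terms in descriptions.items() if query_terms <= terms)

def route_query(home: str, query_terms: Set[str],
                super_peers: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """Search the home super-peer first; if it finds nothing, forward the
    query to the other super-peers and merge their answers."""
    found = relevant_peers(super_peers[home], query_terms)
    if found:
        return found
    forwarded: List[str] = []
    for name, descriptions in super_peers.items():
        if name != home:
            forwarded.extend(relevant_peers(descriptions, query_terms))
    return sorted(forwarded)

print(route_query("sp1", {"title"}, SUPER_PEERS))    # ['peer-a:4000'] (answered locally)
print(route_query("sp1", {"subject"}, SUPER_PEERS))  # ['peer-c:4000'] (found via sp2)
```

Only peer addresses are returned; as in the text, the user then contacts the relevant peers, so the super-peers never ship the resources themselves.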