Received: October 22, 2011 / Accepted: December 26, 2011 / Published: January 20, 2012.
Abstract: Retrieving relevant geospatial data has become increasingly critical because of the growing volume of geospatial data made available to various users through distributed environments. In this context, semantics of geospatial data is also critical, because it allows the user to understand the meaning of shared data, and the system to automatically identify and resolve semantic heterogeneity of data. However, geospatial data often lack explicit semantics, which can lead to low performance of search engines, and misinterpretation or misuse of retrieved data. In particular, the complexity of geospatial data increases the importance of explicit semantics; we have identified a lack of semantics with respect to contexts of concepts, spatiotemporal semantics, and dependencies between concepts’ features. A solution to poor semantics of geospatial data is semantic enrichment. In this paper, we propose an approach to geospatial data retrieval based on enrichment of geospatial data semantics, which contributes to solving the identified retrieval problems caused by lack of semantics. The proposed approach is based on a semantically augmented representation of the concept. A semantic enrichment system generates enriched concepts with semantic reasoning engines and data mining techniques. Then, a semantic mapping system determines the semantic correspondences between the users’ queries and the enriched concepts of databases’ ontologies. More specifically, this retrieval system is able to compute context-dependent semantic mappings; to consider spatiotemporal semantics when comparing spatiotemporal features of concepts; and to use dependencies between features to identify “missing mappings” that could not be detected otherwise. As a result, and as illustrated in a case study, the identification of relevant data sets by the retrieval system is improved, and the system is able to point out semantic heterogeneity problems that could lead to misinterpretation of data.
Key words: Data mining, geospatial data retrieval, geospatial semantics, knowledge extraction, semantic enrichment.
databases are meant for different purposes and developed independently; therefore, the same reality is abstracted differently. To resolve semantic heterogeneity, this meaning should be available to machines in an explicit representation, so that it can be processed automatically. However, geospatial data often lack explicit semantics [10], which can lead to low performance of search engines, and misinterpretation or misuse of retrieved data. A popular solution to this problem is the development of ontologies, which are explicit and formal specifications of shared conceptualizations [11]. An ontology provides a vocabulary to describe a domain of interest (universe of discourse) and a specification of the meaning of the terms used in that vocabulary [12]. Ontologies are used to provide a formal
2. Running Example to Demonstrate the Problems Caused by Poor Semantics
Geospatial data retrieval aims at finding relevant geospatial data sets over distributed and heterogeneous data sources. The main challenges related to geospatial data retrieval are representing the semantics of data and optimizing the matching between the user’s query and those semantics. We first give an overview of representative approaches in geospatial data retrieval.
The Bremen University Semantic Translator for Enhanced Retrieval (BUSTER) approach of Vögele et al. [1] proposes an information broker middleware. In this approach, each source’s semantics is formalized with a Description Logics (DL) ontology. Each source’s ontology is developed using a common vocabulary defined in a global ontology. The user can select the query concept from one of the ontologies or specify a query with necessary conditions (in terms of properties and ranges of properties). The RACER and FaCT reasoning engines are used to retrieve the concepts that are subsumed by the query concept. While the global ontology makes the different sources’ ontologies comparable to each other, assuming that local ontologies can be developed from a global ontology is not always feasible in an open and dynamic environment, where sources are developed independently.
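The idea of retrieving the concepts subsumed by a query concept can be illustrated with a minimal sketch. It does not use RACER or FaCT, and it is not the BUSTER implementation: it assumes a toy taxonomy in which is-a links are asserted explicitly, whereas a DL reasoner would infer subsumption from concept definitions. The concept names and the `subsumed_by` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: each concept maps to its direct parents.
# A real DL reasoner infers these links; here they are simply asserted.
PARENTS = {
    "HydrographicFeature": [],
    "Watercourse": ["HydrographicFeature"],
    "Lake": ["HydrographicFeature"],
    "River": ["Watercourse"],
    "Canal": ["Watercourse"],
}


def subsumed_by(query, parents=PARENTS):
    """Return every concept subsumed by the query concept (query excluded)."""
    # Invert the parent links so that descendants can be walked top-down.
    children = defaultdict(list)
    for concept, direct_parents in parents.items():
        for parent in direct_parents:
            children[parent].append(concept)

    found, stack = set(), [query]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

Under these assumptions, a query on `Watercourse` retrieves `River` and `Canal` but not `Lake`. The sketch also makes the limitation noted above visible: every source has to express its concepts in the vocabulary of the single shared taxonomy.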
Lutz and Klien [2] proposed a similar approach for the discovery and the retrieval of geographic information in spatial data infrastructures (SDIs). Their approach is also based on annotations of geographic feature types with DL concepts. The DL
of knowledge are not limited to a particular category: they can be linguistic knowledge, thesauri, the web, instances of concepts, documents, metadata, etc.
Within the larger domain of information systems, several approaches for semantic enrichment in support of semantic mapping and semantic interoperability have been proposed. As reported by Su [18], existing semantic enrichment techniques for semantic mapping use a variety of resources, including shared thesauri such as WordNet, linguistic knowledge, and extensional knowledge (instances of the concepts of an ontology). For example, Tun [21] makes the assumption that “the more explicit semantics is specified in ontologies, the feasibility of matching will be greater”. In his approach, the semantics of concepts are enriched by adding concept-level knowledge, which is called “meta-knowledge”, according to a “MetaOntoModel”. The enrichment technique is integrated into a multi-system ontology matching architecture; the enrichment process is user-driven (the user has to provide the knowledge). In another semantic enrichment approach for ontology mapping, Su [18] used instances of the ontology to enrich the original ontology. In this case, instances correspond to documents that are associated with the concept; for each concept, a feature vector composed of the terms extracted from these documents is built using Natural Language Processing (NLP) techniques. The lexical database WordNet plays the role of a global ontology that provides synonyms. The architecture of the approach is composed of a text categorizer, a feature vector constructor, a mapper and a mapping refiner. One of the limitations of feature vectors is that they are only unstructured sets of words; therefore, we cannot distinguish whether one of those words corresponds, for example, to a role of the concept, a spatial relation, or the description of a location. In other words, there is still a lack of knowledge about the nature of those words and how they should be considered when comparing two concepts.
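The feature-vector idea and its limitation can be made concrete with a short sketch. It is not Su's implementation: it assumes plain term frequencies and cosine similarity in place of the NLP pipeline and the WordNet synonym lookup, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def feature_vector(documents):
    """Build a term-frequency vector from the documents attached to a concept."""
    terms = Counter()
    for text in documents:
        terms.update(re.findall(r"[a-z]+", text.lower()))
    return terms


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 if one is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical documents associated with two concepts.
river = feature_vector(["A river flows north of the city and is used for navigation"])
stream = feature_vector(["A stream flows north through the forest"])
score = cosine_similarity(river, stream)
```

In the resulting vectors, `north` (a spatial relation) and `navigation` (a role of the concept) are both plain terms with a count. Nothing in the representation records what kind of feature each word describes, which is the limitation pointed out above.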
performance of geospatial data retrieval engines. Despite the widespread use of ontologies to formalize the semantics of geospatial data, it cannot be assumed, as demonstrated in this paper, that the provided semantics are sufficient to ensure understanding of shared data by users and to resolve more complex semantic heterogeneity problems, such as heterogeneity of spatiotemporal semantics. We have proposed a new approach and architecture for geospatial data retrieval which addresses the specific problems caused by the lack of geospatial data semantics.
The multi-view augmented concept model (MVAC) was used as a basis for the approach. This model enriches the traditional, property-based representation of concepts by adding contexts and the associated contextual views of the concept, spatiotemporal semantics, and dependencies between features of concepts. These additional features were useful to improve geospatial data retrieval by identifying more semantic mappings between the query and the concepts of databases’
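The components that MVAC adds to a property-based concept (contexts, contextual views, spatiotemporal semantics, dependencies between features) could be represented roughly as in the sketch below. This is only an illustration of the structure named in the text, not the formal MVAC definition: all class names, field names and sample values are hypothetical, and a dependency is assumed to be a simple pair of (feature, value) tuples.

```python
from dataclasses import dataclass, field


@dataclass
class ContextualView:
    """Features of a concept that hold in one context (hypothetical layout)."""
    context: str
    properties: dict = field(default_factory=dict)
    spatial: str = ""    # spatial descriptor, e.g. a geometry type
    temporal: str = ""   # temporal descriptor, e.g. a validity period


@dataclass
class AugmentedConcept:
    name: str
    views: list = field(default_factory=list)
    # Each dependency is (antecedent, consequent), both (feature, value) pairs.
    dependencies: list = field(default_factory=list)

    def view_for(self, context):
        """Return the view defined for the given context, or None."""
        return next((v for v in self.views if v.context == context), None)

    def implied_features(self, known):
        """Return the (feature, value) pairs implied by the known pairs."""
        return {cons for ante, cons in self.dependencies if ante in known}


watercourse = AugmentedConcept(
    name="Watercourse",
    views=[
        ContextualView("navigation", {"depth": "deep"}, spatial="line"),
        ContextualView("flood_risk", {"water_level": "high"},
                       spatial="polygon", temporal="spring"),
    ],
    dependencies=[(("water_level", "high"), ("status", "flooded"))],
)
```

With such a structure, a mapping system could compare only the views that match the query's context, and could use a dependency to derive a feature (here `status`) that a database concept does not state explicitly.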