Modelling Dataspace Entity Association Using Set Theorems

Source: Computer Technology and Application
  Abstract: The development of dataspace support systems is still far from complete, even as individuals and enterprises face the huge challenge of data management. Critical to this development is a model that represents the relationships between the entities collaborating in a dataspace. A dataspace is a new abstraction and target architecture for data management that does not require up-front semantic data integration. This paper models a dataspace using set theory with entity mappings. A technique for identity resolution and pay-as-you-go data integration is explained. To provide a strong degree of assurance, the authors apply the model to certain real-world entities that might form part of a global dataspace.
  Key words: Dataspaces, entity collaboration, integration, geo data, data management.
  1. Introduction
  Large volumes of data are continuously being stored in data repositories around the world [1], and the need for effective and efficient data management techniques is growing accordingly. Data appear in a myriad of forms, some in structured sources, e.g., Database Management Systems (DBMS), and some not. There is a demand to provide coherence between these sources, which are becoming part of a dataspace. Ref. [2] describes the dataspace as a new abstraction for data integration.
  In contrast to traditional (enterprise) databases with a given schema, the goal is to manage a rich collection of structured, semi-structured, and unstructured data spread across multiple enterprise repositories and the Web. Managing such a dataspace does not amount to yet another data integration approach. Data in a dataspace rather coexist; semantic integration is not required for parts of the system to operate. Fig. 1, adopted from Ref. [2], categorizes current data management solutions along two dimensions. Administrative proximity indicates how close the various data sources are in terms of administrative control. "Near" means that the sources are under the same, or at least coordinated, control. Semantic integration measures how closely the schemas of the different data sources match [3].
  A complete dataspace should be a plug-and-play architecture that is customizable (can be modeled) to the domain of interest. The concept of a domain is often ignored, though it is very important for both efficiency and clarity.
  A dataspace should contain all of the information relevant to a particular organization regardless of its format and location, and model a rich collection of relationships between data repositories. Hence, the authors model a dataspace as a set of participants and relationships [3].
  The participants in a dataspace are the individual data sources: relational databases, XML repositories, text databases, web services and software packages. Their data can be stored or streamed (managed locally by data stream systems), and participants may even be sensor deployments [3].
  Some participants may support expressive query languages, while others are opaque and offer only limited interfaces for posing queries (e.g., structured files, web services, or other software packages). Participants vary from being very structured (e.g., relational databases) to semi-structured (XML, code collections) to completely unstructured. Some sources will support traditional updates, while others may be append-only (for archiving purposes), and still others may be immutable [4].
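  The description above, a dataspace as a set of participants plus a set of relationships between them, can be illustrated with a minimal Python sketch. It is not the authors' model: the class names, the capability fields and the sample participants (`crm_db`, `mail_archive`, `product_feed`) are all hypothetical, chosen only to mirror the properties listed in the text (degree of structure, update behaviour, query support).

```python
from dataclasses import dataclass, field
from enum import Enum


class Structure(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class UpdateMode(Enum):
    UPDATABLE = "updatable"
    APPEND_ONLY = "append-only"
    IMMUTABLE = "immutable"


@dataclass(frozen=True)
class Participant:
    """One data source in the dataspace (hypothetical fields)."""
    name: str
    structure: Structure
    update_mode: UpdateMode
    supports_queries: bool  # False for opaque sources with limited interfaces


@dataclass(frozen=True)
class Relationship:
    """A named link between two participants, identified by name."""
    source: str
    target: str
    kind: str


@dataclass
class Dataspace:
    participants: set = field(default_factory=set)
    relationships: set = field(default_factory=set)

    def add_participant(self, participant):
        self.participants.add(participant)

    def relate(self, source, target, kind):
        # A relationship may only connect sources already in the dataspace.
        names = {p.name for p in self.participants}
        if source not in names or target not in names:
            raise ValueError("both endpoints must be participants")
        self.relationships.add(Relationship(source, target, kind))

    def related_to(self, name):
        # Names of all participants linked to `name`, in either direction.
        outgoing = {r.target for r in self.relationships if r.source == name}
        incoming = {r.source for r in self.relationships if r.target == name}
        return outgoing | incoming


# Illustrative data only; none of these sources come from the paper.
ds = Dataspace()
ds.add_participant(Participant("crm_db", Structure.STRUCTURED, UpdateMode.UPDATABLE, True))
ds.add_participant(Participant("mail_archive", Structure.UNSTRUCTURED, UpdateMode.APPEND_ONLY, False))
ds.add_participant(Participant("product_feed", Structure.SEMI_STRUCTURED, UpdateMode.IMMUTABLE, False))
ds.relate("crm_db", "mail_archive", "mentions-same-customer")

# Set comprehension selecting the opaque participants.
opaque = {p.name for p in ds.participants if not p.supports_queries}
```

  Keeping both collections as plain sets means that questions such as "which participants are opaque?" reduce to set comprehensions, and adding the same participant twice has no effect.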
  Two of the main services that a Dataspace Support Platform (DSSP) will support are search and query. While DBMSs have excelled at providing support for querying, search has emerged as a primary mechanism for end users to deal with large collections of unfamiliar data. Search is more forgiving than query: it is based on similarity, returns ranked results, and supports interactive refinement, so that users can explore a data set and incrementally improve their results. A DSSP should enable a user to specify a search query and iteratively refine it, when appropriate, to a database-style query. A key tenet of the dataspaces approach is that search should be applicable to all of the contents of a dataspace, regardless of their formats [4].
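  The difference between forgiving, ranked search and exact, database-style querying can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a DSSP interface: Jaccard similarity over word sets is swapped in as a simple stand-in for whatever ranking a real platform would use, and the functions `search` and `refine`, the item fields and the sample records are hypothetical.

```python
def tokenize(text):
    """Lower-case a string and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(items, keywords, limit=10):
    """Return (score, item) pairs ranked by similarity to the keywords.

    Items in any format are searched through their 'text' field, so the
    search covers the whole collection regardless of source type.
    """
    query = tokenize(keywords)
    scored = []
    for item in items:
        score = jaccard(query, tokenize(item["text"]))
        if score > 0:
            scored.append((score, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


def refine(results, **conditions):
    """Narrow ranked results with exact, query-style equality conditions."""
    return [
        item
        for _, item in results
        if all(item.get(key) == value for key, value in conditions.items())
    ]


# Illustrative records only; none of these come from the paper.
ITEMS = [
    {"source": "crm_db", "format": "relational", "text": "customer orders nairobi"},
    {"source": "reports", "format": "text", "text": "quarterly report on customer orders"},
    {"source": "sensor_feed", "format": "stream", "text": "temperature readings"},
]

ranked = search(ITEMS, "customer orders")    # similarity-based, ranked
exact = refine(ranked, format="relational")  # refined to an exact condition
```

  Here the keyword search matches both the relational and the text item with different scores, while the refinement step keeps only items that satisfy the stated condition exactly.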
  Universal search and query should extend to meta-data as well as data. Users should be able to discover relevant data sources and inquire about their completeness, correctness and freshness. In fact, a DSSP should also be aware of gaps in its coverage of the domain [3].
  The paper is organized as follows: Section 2 reviews related literature, particularly on sets and dataspaces; Section 3 introduces dataspace entity relationships; Section 4 presents results and discussions; Section 5 gives conclusions.
  References
  [1] B. Shibwabo, I. Ateya, Repository integration: The disconnect and way forward through repository virtualization supporting business intelligence, International Journal of Current Research 3 (4) (2011) 015-020.
  [2] M. Franklin, A. Halevy, D. Maier, From databases to dataspaces: A new abstraction for information management, ACM SIGMOD Record 34 (4) (2005) 27-33.
  [3] J. Pokorny, Databases in the 3rd millennium: Trends and research directions, Journal of Systems Integration 1 (2010) 3-15.
  [4] M. Franklin, A. Halevy, D. Maier, Principles of dataspace systems, in: Proc. of 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2006), ACM Press, pp. 1-9.
  [5] F. Diego, K. Jónsdóttir, D. Maier, Associative operations on a three-element set, The Montana Mathematics Enthusiast 5 (2&3) (2008) 257-268.
  [6] Available online at: http://www.sgi.com/tech/stl/set.html.
  [7] P. Ziegler, K. Dittrich, Data integration—problems, approaches, and perspectives, in: J. Krogstie, A.L. Opdahl, S. Brinkkemper (Eds.), Conceptual Modelling in Information Systems Engineering, Springer, Berlin Heidelberg, 2007, pp. 39-58.
  [8] S. Stefanov, V. Dragieva, Evolution of sets systems and homotopy groups of spheres, in: Proceedings of the 41st Spring Conference of the Union of Bulgarian Mathematicians, 2012, pp. 202-206.