2013, Proceedings of the 22nd ACM International Conference on Information & Knowledge Management - CIKM '13
3 pages
In this paper, we demonstrate the FusionDB system, an extended relational database engine for managing conflicts in small-science databases. In small sciences, groups, each consisting of a few scientists, may share and exchange parts of their own databases with each other to foster collaboration. The goal of such sharing, especially when done at early stages of the discovery process, is not to build a warehouse or a unified schema; instead, the goal is to compare and verify results, detect and assess conflicts, and possibly modify or redesign the discovery process. FusionDB is designed to meet the requirements and address the challenges of such a sharing model. We will demonstrate the key functionalities of FusionDB, including: (1) detecting conflicts using a rule-based model over heterogeneous schemas, (2) assessing conflicts and providing probabilistic estimates of values' correctness, (3) extended querying capabilities in the presence of conflicts, and (4) curation operations that help scientists resolve and investigate conflicts according to different priorities. FusionDB is realized on top of the PostgreSQL DBMS.
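As a rough illustration of functionalities (1) and (2), the sketch below detects conflicting values across sources whose schemas use different column names, then assigns each conflicting value a naive probability. It is not FusionDB's rule model or estimator, which the abstract does not specify: the rule format, the source names (`lab_a`, `lab_b`, `lab_c`), the synthetic records, and the equal-trust voting estimate are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical mapping rules: for each source, which local column holds the
# entity key and which holds the attribute being compared.
RULES = {
    "lab_a": {"key": "gene", "value": "length"},
    "lab_b": {"key": "gene_id", "value": "len_bp"},
    "lab_c": {"key": "name", "value": "size"},
}

# Synthetic records from three groups with heterogeneous schemas.
SOURCES = {
    "lab_a": [{"gene": "g1", "length": 100}, {"gene": "g2", "length": 250}],
    "lab_b": [{"gene_id": "g1", "len_bp": 100}, {"gene_id": "g2", "len_bp": 260}],
    "lab_c": [{"name": "g1", "size": 100}, {"name": "g2", "size": 250}],
}


def detect_conflicts(sources, rules):
    """Group reported values by entity key; keep keys with more than one distinct value."""
    reported = defaultdict(list)
    for source, rows in sources.items():
        rule = rules[source]
        for row in rows:
            reported[row[rule["key"]]].append((source, row[rule["value"]]))
    return {
        key: claims
        for key, claims in reported.items()
        if len({value for _, value in claims}) > 1
    }


def estimate_correctness(claims):
    """Naive estimate: the share of sources reporting each value (equal trust assumed)."""
    counts = Counter(value for _, value in claims)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


conflicts = detect_conflicts(SOURCES, RULES)
estimates = {key: estimate_correctness(claims) for key, claims in conflicts.items()}
print(estimates)
```

Here only `g2` is flagged, because two sources report 250 and one reports 260. A real system would likely weight sources by reliability rather than counting votes equally.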
Proceedings of the VLDB Endowment, 2009
In CIDR 2009, we presented a collection of requirements for SciDB, a DBMS that would meet the needs of scientific users. These included a nested-array data model, science-specific operations such as regrid, and support for uncertainty, lineage, and named versions. In this paper, we present an overview of SciDB's key features and outline a demonstration of the first version of SciDB on data and operations from one of our lighthouse users, the Large Synoptic Survey Telescope (LSST).
Communications of The ACM, 2010
Needed are generic, rather than one-off, DBMS solutions automating storage and analysis of data from scientific collaborations.
2010
Emerging database systems in support of scientific data (2010). Open access.
Comput. Sci. Eng., 2013
SQLShare is a Web-based application for collaborative data analysis that emphasizes a simple upload-query-share protocol over conventional database design and ad hoc interactive query over general-purpose programming. Here, a case study examines the use of SQLShare as an alternative to script-based scientific workflows for a project in observational biological oceanography.
1993
Scientific databases have recently become a challenging research area for a number of reasons: 1) the amount of data stored in scientific databases is rapidly increasing, with orders of magnitude increases on the horizon, 2) the data are becoming increasingly complex, as more complicated data structures and data relationships must be captured, 3) there is a need to integrate incompatible data formats, commercial databases, and analysis tools into a seamless environment, and 4) scientific databases are becoming distributed, i.e., no single site can archive all the data potentially required to conduct some experiments. Unless these challenges can be met, the scientific researcher will spend an inordinate amount of time manipulating bits and bytes, instead of focusing on the scientific problems of most interest. In this paper we discuss these database issues in more depth and then describe the Gaea system, a spatiotemporal database management system under development at Worcester Polytechnic Institute. Gaea's main objectives include: 1) providing DBMS support to all phases of scientific investigations by management of scientific data and meta-data, 2) extending database technology with an intrinsic class of operators which is extensible and responds to the growing needs of scientific research, 3) integrating spatial and temporal domains involving very large amounts of data, 4) allowing a clean extension to a distributed computing environment containing heterogeneous, specialized database and computing resources.
2013
We consider a case study using SQL-as-a-Service to support "instant analysis" of weakly structured relational data at a multi-investigator science retreat. Here, "weakly structured" means tabular, rows-and-columns datasets that share some common context, but that have limited a priori agreement on file formats, relationships, types, schemas, metadata, or semantics. In this case study, the data were acquired from hundreds of distinct locations during a multi-day oceanographic cruise using a variety of physical, biological, and chemical sensors and assays. Months after the cruise, when preliminary data processing was complete, 40+ researchers from a variety of disciplines participated in a two-day "data synthesis workshop." At this workshop, two computer scientists used a web-based query-as-a-service platform called SQLShare to perform "SQL stenography": capturing the scientific discussion in real time to integrate data, test hypotheses, and populate visualizations to then inform and enhance further discussion. In this "field test" of our technology and approach, we found that it was not only feasible to support interactive science Q&A with essentially pure SQL, but that we significantly increased the value of the "face time" at the meeting: researchers from different fields were able to validate assumptions and resolve ambiguity about each other's fields. As a result, new science emerged from a meeting that was originally just a planning meeting. In this paper, we describe the details of this experiment, discuss our major findings, and lay out a new research agenda for collaborative science database services.
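To make "weakly structured" integration with essentially pure SQL more concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a query service. It does not use SQLShare or its API: the table names (`ctd`, `assay`), the column names, and the values are synthetic assumptions chosen only to show two uploads that disagree on naming being reconciled by a view rather than by redesigning a schema.

```python
import sqlite3

# In-memory database standing in for a hosted query service (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical uploads describing the same stations with different column names.
    CREATE TABLE ctd (station TEXT, depth_m REAL, temp_c REAL);
    CREATE TABLE assay (site TEXT, depth REAL, chlorophyll REAL);
    INSERT INTO ctd VALUES ('S1', 5, 12.5), ('S1', 50, 9.0), ('S2', 5, 13.0);
    INSERT INTO assay VALUES ('S1', 5, 0.8), ('S2', 5, 1.1);

    -- A view reconciles the naming differences without altering either upload.
    CREATE VIEW synthesis AS
        SELECT c.station, c.depth_m, c.temp_c, a.chlorophyll
        FROM ctd AS c
        JOIN assay AS a
          ON a.site = c.station AND a.depth = c.depth_m;
    """
)

# An ad hoc question asked of the integrated view.
rows = conn.execute(
    "SELECT station, temp_c, chlorophyll FROM synthesis ORDER BY station"
).fetchall()
print(rows)
```

Because the integration lives in a view, a new question during discussion only needs another `SELECT` over `synthesis`, not a new loading script.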
Many environmental scientists today need to assemble, use, share and save data from a diverse set of sources. These "synthesis" efforts are often interdisciplinary and blend data from ground-based sensors, satellites, field observations, and the literature. At even moderate scales of both data size and diversity, the cost and time required to find, gather, collate, normalize, and customize data in order to build a synthesis dataset can be daunting at best. By explicitly identifying and addressing the different requirements for each data role (author, curator, data valet, publisher, and consumer), our data management architecture for large-scale shared environmental data enables the creation of such synthesis datasets that continue to grow and evolve with new data, data annotations, participants, and use rules. We show the effectiveness of our approach in the context of the FLUXNET Carbon-Climate Synthesis Dataset, one of the largest ongoing biogeophysical field experiments.
2001
Next-generation problem solving environments (PSEs) promise significant advances over those now available. They will span scientific disciplines and incorporate collaboration capabilities.
2013
From the 1990s onwards, biological and chemical research in both the public and private sectors throughout the world has been transformed to an industrial scale by the creation of databases with large amounts of high-quality, freely available DNA sequence data. These databases have not only enabled the comprehensive cataloging of human genes but have also accelerated the discovery of new forms of cellular regulation, rendering biology and chemistry a discovery science and thus providing the basis for novel experimental approaches. We feel, however, that the potential opportunities, accessibility and power of open source science and publicly available data have not translated into gains and significant impact on scientific discovery. In this paper, we identify many issues with the existing conventional chemical biology and molecular biology databases and propose the development of ChemBank v3.
Proceedings of the eighth ACM international workshop on Web information and data management - WIDM '06, 2006
Increasingly, scientists are seeking to collaborate and share data among themselves. Such sharing can readily be done by publishing data on the World-Wide Web. Meaningful querying and searching on such data depends upon the availability of accurate and adequate metadata that describes the data and the sources of the data. In this paper, we outline the architecture of an implemented cyber-infrastructure for chemistry that provides tools for users to upload datasets and their metadata to a database. Our proposal combines a two-level metadata system with a centralized database repository and analysis tools to create an effective and capable data sharing infrastructure.
2006
2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), 2010
PLOS ONE, 2015
Cluster Computing, 2005
Multiscale Modeling & Simulation - MULTISCALE MODEL SIMUL, 2010
2007
Guide to e-Science, 2011
Journal of physics, 2018