2021, Advances in Methods and Practices in Psychological Science
Containers have become increasingly popular in computing and software engineering and are gaining traction in scientific research. They allow packaging up all code and dependencies to ensure that analyses run reliably across a range of operating systems and software versions. Despite being a crucial component for reproducible science, containerization has yet to become mainstream in psychology. In this tutorial, we describe the logic behind containers, what they are, and the practical problems they can solve. We walk the reader through the implementation of containerization within a research workflow with examples using Docker and R. Specifically, we describe how to use existing containers, build personalized containers, and share containers alongside publications. We provide a worked example that includes all steps required to set up a container for a research project and can easily be adapted and extended. We conclude with a discussion of the possibilities afforded by the large-sc...
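The per-project container described in this tutorial abstract can be sketched in a few lines. The following hypothetical Dockerfile for an R analysis is an illustrative sketch, not the paper's worked example; the package list and script name are placeholders (the `rocker/r-ver` versioned base images are a common choice for reproducible R work):

```dockerfile
# Pin a versioned base image so the R version is fixed
FROM rocker/r-ver:4.2.0

# Install the packages the analysis depends on (placeholder list)
RUN R -e "install.packages(c('dplyr', 'ggplot2'))"

# Copy the project into the image and make it the working directory
COPY . /home/project
WORKDIR /home/project

# Run the analysis script (placeholder name) when the container starts
CMD ["Rscript", "analysis.R"]
```

Building once with `docker build -t myproject .` and running with `docker run myproject` then reproduces the same environment on any machine with Docker installed.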
Meta-Psychology, 2019
Scientific progress relies on the replication and reuse of research. Recent studies suggest, however, that sharing code and data does not suffice for computational reproducibility, defined as the ability of researchers to reproduce “particular analysis outcomes from the same data set using the same code and software” (Fidler and Wilcox, 2018). To date, creating long-term computationally reproducible code has been technically challenging and time-consuming. This tutorial introduces Code Ocean, a cloud-based computational reproducibility platform that attempts to solve these problems. It does this by adapting software engineering tools, such as Docker, for easier use by scientists and scientific audiences. In this article, we first outline arguments for the importance of computational reproducibility, as well as some reasons why this is a nontrivial problem for researchers. We then provide a step-by-step guide to getting started with containers in research using Code Ocean. (Disclai...
2018 IEEE International Conference on Electro/Information Technology (EIT), 2018
Numerical reproducibility has received increased emphasis in the scientific community. One reason scientific research is difficult to repeat is that different computing platforms calculate mathematical operations differently. Software containers have been shown to improve reproducibility in some instances and provide a convenient way to deploy applications in a variety of computing environments. However, there are software patterns or idioms that produce inconsistent results because mathematical operations are performed in different orders in different environments, resulting in reproducibility errors. The performance of software in containers, and of software that improves numeric reproducibility, may be of concern to some scientists. An existing algorithm for reproducible sum reduction was implemented, and the runtime performance of this implementation was found to be between 0.3x and 0.5x the speed of the non-reproducible sum reduction. Finally, to evaluate the impact of using a container on performance, the runtime of the WRF (Weather Research and Forecasting) package was tested and found to be 0.98x of its performance in a native Linux environment.
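The order-dependence problem this abstract describes is easy to demonstrate. The sketch below is not the reproducible-reduction algorithm the paper benchmarks; it simply shows that naive floating-point summation gives different answers in different orders, and that a compensated (Neumaier) summation, one well-known mitigation, is far less order-sensitive:

```python
def neumaier_sum(values):
    """Compensated summation (Neumaier's variant of Kahan's algorithm).

    Carries a running error term so that low-order bits lost by each
    addition are recovered, making the result far less sensitive to
    the order in which the inputs arrive.
    """
    total = 0.0
    compensation = 0.0
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            compensation += (total - t) + v  # low-order bits of v were lost
        else:
            compensation += (v - t) + total  # low-order bits of total were lost
        total = t
    return total + compensation

xs = [1e16, 1.0, -1e16]
print(sum(xs))                      # 0.0 -- the 1.0 is absorbed and lost
print(sum([1e16, -1e16, 1.0]))      # 1.0 -- a different order, a different answer
print(neumaier_sum(xs))             # 1.0 -- compensated result is order-robust
```

A parallel reduction effectively reorders the additions between runs and platforms, which is exactly why reproducible-sum algorithms (at some runtime cost, as the abstract reports) are needed.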
2018
NOTE: Accepted in principle at Meta-Psychology, submission number MP2018.892, link: https://osf.io/ps5ru/. Anyone can participate in peer review by sending the editor an email, or through discussion on social media. The preferred way of open commenting, however, is to use the hypothes.is integration at PsyArXiv and directly comment on this preprint. Editor: Rickard Carlsson, [email protected]. Journal: https://open.lnu.se/index.php/metapsychology. ABSTRACT: Scientific progress relies on the replication and reuse of research. However, despite an emerging culture of sharing code and data in psychology, the research practices needed to achieve computational reproducibility -- the quality of a research project entailing the provision of sufficient code, data and documentation to allow an independent researcher to re-obtain the project's results -- are not widely adopted. Historically, the ability to share and reuse computationally reproducible research was technically challenging...
PloS one, 2016
Reproducibility is vital in science. For complex computational methods, it is often necessary, not just to recreate the code, but also the software and hardware environment to reproduce results. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system. However, workflows that use Graphical User Interfaces (GUIs) remain difficult to replicate on different host systems as there is no high level graphical software layer common to all platforms. GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open source project that provides a container with only the absolutely necessary software dependencies and configures a common X Windows (X11) graphic interface on Lin...
Journal of Neurology Research Review & Reports, 2023
Reproducibility is a key component of scientific research, and its significance has been increasingly recognized in the field of neuroscience. This paper explores the origin, need, and benefits of reproducibility in neuroscience research, as well as the current landscape surrounding this practice, and argues that the boundaries of current reproducibility efforts should be expanded to computing infrastructure. The reproducibility movement stems from concerns about the credibility and reliability of scientific findings in various disciplines, including neuroscience. The need for reproducibility arises from the importance of building a robust knowledge base and ensuring the reliability of research findings. Reproducible studies enable independent verification, reduce the dissemination of false or misleading results, and foster trust and integrity within the scientific community. Collaborative efforts and knowledge sharing are facilitated, leading to accelerated scientific progress and the translation of research into practical applications. On the data front, we have platforms such as OpenNeuro for open data sharing; on the analysis front, we have containerized processing pipelines published in public repositories, which are reusable. There are also platforms such as OpenNeuro, NeuroCAAS, and brainlife that cater to the need for a computing platform. However, along with their benefits, these platforms have limitations, as only certain types of processing pipelines can be run on the data. Moreover, in the world of data integrity and governance, it may not be far in the future that some countries require data to be processed within their borders, limiting the usage of such platforms. To enable customized, scalable neuroscience research, alongside open data and containerized analyses open to all, we need a way to deploy the cloud infrastructure required for an analysis from templates. These templates are a blueprint, in the form of code, of the infrastructure required for reproducible research and analysis.
This will empower anyone to deploy computational infrastructure on the cloud and run data-processing pipelines on infrastructure of their own choice and scale. Just as Dockerfiles are created for any analysis software developed, an infrastructure-as-code (IaC) template accompanying any published analysis pipeline will enable users to deploy the cloud infrastructure required to carry out the analysis on their data.
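One way to read the "IaC template" idea concretely is the following sketch in Terraform. The abstract does not prescribe a tool, and the resource name, AMI, instance size, and pipeline image here are all hypothetical placeholders; the point is only that a machine definition can ship alongside a containerized pipeline as code:

```hcl
# Hypothetical IaC sketch: a cloud machine provisioned to run a
# published, containerized analysis pipeline (all names are placeholders).
resource "aws_instance" "analysis" {
  ami           = "ami-..."      # an image with Docker preinstalled
  instance_type = "c5.4xlarge"   # sized for the pipeline's workload

  # Start the containerized pipeline on boot (hypothetical image name)
  user_data = <<-EOF
    #!/bin/bash
    docker run --rm myorg/fmri-pipeline:1.0 /data
  EOF
}
```

Publishing such a template next to the pipeline's Dockerfile lets a reader reproduce not only the software environment but the compute environment it ran on.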
International Journal of Electrical and Computer Engineering (IJECE), 2018
Cloud-based research collaboration platforms provide scalable, secure, and innovative environments that enable academic and scientific researchers to share research data and applications and to access high-performance computing resources. Dynamic allocation of resources according to the unpredictable needs of the applications used by researchers is a key challenge in collaborative research environments. We propose the design of a Cloud Container based Collaborative Research (CCCORE) framework to address dynamic resource provisioning for the variable workloads of the compute- and data-intensive applications and analysis tools used by researchers. Our approach relies on on-demand, customized containerization and a comprehensive assessment of resource requirements to achieve optimal resource allocation in a dynamic collaborative research environment. We propose algorithms for the dynamic resource allocation problem that aim to minimize finish time, improve throughput, and achieve optimal resource utilization by employing underutilized residual resources.
1. INTRODUCTION
In the mid-1990s, various grid-based cyberinfrastructures (e-infrastructures) were established that integrated high-speed research networks with middleware services and enabled researchers to share distributed resources collaboratively. These firmly unified science gateways served as resource providers for specialized as well as generic research initiatives [1]. However, the restricted interfaces to data and the domain-specific nature of science gateways did not match the requirements of researchers outside those domains [2]. With the advent of cloud computing, easily reconfigurable and adaptive virtual private research environments and science clouds became a preferred alternative to traditional grid- or cluster-based e-infrastructures.
Cloud-based collaborative research platforms provide researchers with the computing and storage resources required to run their applications and let them collaborate and share data and applications while concentrating on their own areas of research. Cloud platforms offer compute environments with far larger resource pools than an individual research organization can afford. Organizations can scale resources up or down and pay according to usage. The multitenancy provided by cloud architectures enables the creation of domain- and requirement-specific virtual private research environments that facilitate collaboration and resource sharing among researchers [3]. Several science clouds, such as the Nectar Research Cloud [4], provide the infrastructure to run compute-intensive scientific applications [5], [6]. Even though a substantial amount of research has been carried out on cloud-based collaborative research platforms, little work exists on dynamic resource allocation in collaborative research cloud frameworks.
EPJ Web of Conferences
The revalidation, reinterpretation and reuse of research data analyses require access to the original computing environment, the experimental datasets, the analysis software, and the computational workflow steps that researchers used to produce the original scientific results in the first place. REANA (Reusable Analyses) is a nascent platform enabling researchers to structure their research data analyses with future reuse in mind. The analysis is described by means of a YAML file that captures sufficient information about the analysis assets, parameters and processes. The REANA platform consists of a set of micro-services for launching and monitoring container-based computational workflow jobs on the cloud. The REANA user interface and command-line client enable researchers to easily rerun analysis workflows with new input parameters. The REANA platform aims at supporting several container technologies (Docker), workflow engines (CWL, Yadage), shared s...
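To illustrate the YAML description mentioned in this abstract, a minimal `reana.yaml` along the following lines declares inputs, a containerized workflow step, and outputs. The file names, parameter, and container image below are hypothetical placeholders, and REANA's exact schema should be checked against its documentation:

```yaml
# Hypothetical reana.yaml sketch: inputs, one serial workflow step
# running in a container, and the outputs to preserve.
inputs:
  files:
    - data/input.csv
  parameters:
    alpha: 0.05
workflow:
  type: serial
  specification:
    steps:
      - environment: 'rocker/r-ver:4.2.0'
        commands:
          - Rscript analysis.R
outputs:
  files:
    - results/figure1.png
```

Because the step names a container image, rerunning the workflow later, possibly with new parameter values, reuses the same pinned environment.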
2016
Container technologies such as Docker [1] are transforming the way distributed systems are deployed onto cloud platforms by providing a simple mechanism for packaging and isolating an application and its dependencies from the host machine on which it is running. The same ideas and technologies can be applied to computational science applications to obtain exceptional ease of installation and reproducibility of results. In this paper, we introduce endofday [2], a workflow engine that orchestrates a directed acyclic graph (DAG) of computational science apps where the nodes of the DAG are Docker containers. The endofday engine enables users to execute entire workflows of science applications without actually installing any of the applications themselves. As an example, we present the Validate [3] system, a suite of software applications for testing the accuracy and precision of Genome Wide Association methods, and illustrate how it can be run using endofday with zero installation. We a...
2023
Containers are becoming essential to supporting the diversity of scientific computing workloads at academic computing centers. Here, we offer perspectives and experiences from the Texas Advanced Computing Center on: the installation, configuration, and support of selected containerization platforms; the incorporation of containers into the module system to improve their discoverability and usability; the facilitation of advanced use cases, including MPI containers, GPU containers, and support for multiple instruction set architectures; and, finally, instruction for end users on best practices through workshops and university courses. We briefly discuss case studies that highlight the importance of supporting containers for research computing.
Practice and Experience in Advanced Research Computing
In the last few years, the web-based interactive computational environment called the Jupyter notebook has been gaining popularity as a platform for collaborative research and data analysis, becoming a de facto standard among researchers. In this paper we present a first implementation of Sophon, an extensible web platform for collaborative research based on JupyterLab. Our aim is to extend the functionality of JupyterLab and improve its usability by integrating it with Django. In the Sophon project, we integrate the deployment of dockerized JupyterLab instances into a Django web server, creating an extensible, versatile and secure environment that is also easy to use for researchers of different disciplines. CCS CONCEPTS • Human-centered computing → Collaborative and social computing systems and tools; Synchronous editors; • Software and its engineering → Collaboration in software development; • Applied computing → Education.