In distributed systems, database migration is not an easy task. Companies encounter challenges when moving data, including legacy data, to a big data platform. This paper shows how to optimize migration from traditional databases to the big data platform and how to evaluate and estimate the cost and efficiency of migration tools.
International Journal of Engineering Technology and Management Sciences
The cloud acts as data storage and is also used to transfer data from one cloud to another, with data exchange taking place among the cloud centers of organizations. Each cloud center stores a huge amount of data, which in turn makes it hard to store and retrieve information. While migrating data, issues such as low data transfer rates, end-to-end latency, and data storage problems occur. Because data is distributed from a single source among many cloud centers, the speed of migration is reduced, and in distributed cloud computing it is very difficult to transfer data quickly and securely. This paper explores MapReduce within a distributed cloud architecture, where MapReduce assists at each cloud and strengthens the data migration process with the help of HDFS. Compared to the existing cloud migration approach, the proposed approach gives better results in terms of speed, time, and efficiency.
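As a minimal illustration of how HDFS can speed up the transfer step, the sketch below pushes exported data chunks into HDFS in parallel using the standard hdfs dfs -put command. This is an assumption-heavy illustration, not the paper's MapReduce-based method; the chunk directory, target path, and degree of parallelism are made up.

# Illustrative only: parallel transfer of exported chunks into HDFS.
# The local directory, HDFS target path, and worker count are hypothetical.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def put_to_hdfs(local_file: Path) -> int:
    # Uses the standard "hdfs dfs -put" command shipped with Hadoop.
    return subprocess.call(
        ["hdfs", "dfs", "-put", "-f", str(local_file), f"/migration/{local_file.name}"])

chunks = sorted(Path("export_chunks").glob("*.csv"))
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(put_to_hdfs, chunks))

print(f"{results.count(0)}/{len(chunks)} chunks transferred successfully")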
In recent years, the issue of large amounts of growing data has gained a lot of attention. Big Data is defined as data that is too big to fit on a single server, too unstructured to fit into a traditional row-and-column database, or too continuously flowing to fit into a static data warehouse. This data provides huge opportunities to uncover new insights. Volume, Velocity, and Variety are the three major characteristics used to define Big Data. Hadoop is a widely adopted open-source tool that implements Google's famous computation model, MapReduce. It is a batch-processing, Java-based programming model that can process large data sets in a distributed environment. Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and a processing unit called YARN. HDFS is a distributed file system that stores large amounts of data on a cluster, and Yet Another Resource Negotiator (YARN) provides distributed processing of data on the cluster. In this paper, we study the journey of the open-source software framework Hadoop. Being an open-source project, Hadoop has evolved tremendously over the years, and every version has improved the capabilities of the platform to help users solve big data challenges.
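As a concrete illustration of the MapReduce model that Hadoop implements, the following minimal word-count mapper and reducer is written for Hadoop Streaming. It is a sketch for illustration only, not code from the paper, and the exact streaming jar path and job options vary by installation.

#!/usr/bin/env python3
# wordcount.py: minimal Hadoop Streaming job; pass "map" or "reduce" as the
# argument when wiring this file up as the -mapper and -reducer of a streaming job.
import sys
from itertools import groupby

def map_phase():
    # Emit a (word, 1) pair for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reduce_phase():
    # Streaming sorts mapper output by key, so identical words arrive together.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    map_phase() if sys.argv[1] == "map" else reduce_phase()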
2019
More than 10,000 enterprises worldwide use the big data stack composed of multiple distributed systems. At Unravel, we build the next-generation APM platform for the big data stack, and we have worked with a representative sample of these enterprises that covers most industry verticals. This sample covers the spectrum of choices for deploying the big data stack across on-premises datacenters, private and public cloud deployments, and hybrid combinations of these. In this paper, we present a solution for assisting enterprises planning the migration of their big data stacks from on-premises deployments to the cloud. Our solution is goal driven and adapts to various migration scenarios. We present the system architecture we built and several cloud mapping options. We also describe a demonstration script that involves practical, real-world use-cases of the path to cloud adoption.
Data migration is a critical process in the evolution of enterprise systems, particularly when transitioning from traditional relational databases like SQL Oracle to NoSQL solutions like MongoDB. This paper explores comprehensive strategies for successful data migration in large-scale environments, emphasizing meticulous planning, efficient execution, and thorough post-migration verification. Drawing on practical experiences, this study provides actionable insights and best practices to mitigate risks and ensure data integrity during migration, with a focus on the limitations and challenges of migrating transactional data to NoSQL databases [1][2][3].

Keywords: Seamless Data Migration, SQL, NoSQL

Introduction
Data migration is an essential aspect of maintaining and evolving large-scale enterprise systems. As organizations strive to adopt more scalable, flexible, and efficient data management solutions, transitioning from traditional relational databases like SQL Oracle to modern NoSQL databases like MongoDB becomes increasingly common. This migration is driven by the need for better performance, higher availability, and enhanced scalability. However, the process is fraught with challenges, including data loss, extended downtime, and integration issues. One critical consideration is the handling of transactional data, as NoSQL databases have limitations that can impact functionality.

Planning Phase
A. Assessing the Current Data Landscape
Before initiating a data migration project, it is crucial to understand the existing data environment thoroughly. This includes assessing the volume, variety, and velocity of data and identifying data dependencies and relationships. A detailed data inventory and analysis help create a comprehensive migration plan that addresses potential risks and challenges.
[1] Database Schema Analysis: Analyze the SQL Oracle schema to understand tables, relationships, indexes, and constraints.
[2] Data Volume and Growth: Assess the current data volume and expected growth to plan infrastructure needs in MongoDB.
[3] Data Dependencies: Identify dependencies, including stored procedures, triggers, and application-level dependencies that need to be re-implemented in MongoDB.
B. Defining Migration Goals and Scope
Clear objectives and a well-defined scope are essential for a successful migration. Goals should align with the organization's broader strategic vision, such as improving data accessibility, enhancing performance, or reducing costs.
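As a small illustration of the post-migration verification this paper emphasizes, the sketch below reconciles row counts between the Oracle source and the MongoDB target. It is an assumption-laden example rather than the paper's procedure: the table-to-collection mapping, credentials, and connection strings are hypothetical.

# Post-migration verification sketch: compare source row counts with target
# document counts. All names and connection details below are hypothetical.
import oracledb                    # pip install oracledb
from pymongo import MongoClient    # pip install pymongo

TABLES = {"CUSTOMERS": "customers", "ORDERS": "orders"}  # Oracle table -> Mongo collection

ora = oracledb.connect(user="app", password="secret", dsn="dbhost/ORCLPDB1")
mongo = MongoClient("mongodb://localhost:27017")["enterprise"]

cur = ora.cursor()
for table, collection in TABLES.items():
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    source_rows = cur.fetchone()[0]
    target_docs = mongo[collection].count_documents({})
    status = "OK" if source_rows == target_docs else "MISMATCH"
    print(f"{table}: source={source_rows} target={target_docs} [{status}]")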
Software updates often involve data migration, especially when converting legacy software implemented in outdated Relational Database Management Systems (RDBMS) or other non-relational database file formats. Moreover, many software applications rely on data migration to import data from a variety of formats. Database migrations are usually time-consuming and error-prone. There are quite a few independent general database conversion products on the market, but what features does such a tool need to provide in order to be successful? Based on our recent experience designing and implementing a custom utility to convert a large number of legacy databases and files in different formats in a project for the United States government, we developed five benchmarks to help practitioners and software development project managers make informed decisions for data conversion, help software developers assess design and implementation considerations for future Database Migration Tool (DMT) products, and help database administrators evaluate a general DMT.
Journal of Software Engineering (JSE), 2024
The migration of legacy data warehouses like Teradata to cloud-native platforms such as Google BigQuery is a transformative step toward modernizing data management and analytics. This paper presents a structured framework to address technical, operational, and strategic challenges, including schema incompatibilities, query translation, and compliance requirements. Leveraging tools like Google’s Schema Conversion Tool, Pub/Sub, and Dataflow, the framework ensures seamless schema transformation, data migration, and query optimization. Emphasis is placed on data security, regulatory compliance, and cost-efficiency. By aligning technical goals with business objectives and fostering user adoption, this guide equips organizations to unlock the full potential of scalable and real-time cloud analytics. Actionable insights and real-world use cases provide a definitive roadmap for successful migrations.
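For a flavor of the target side of such a migration, the sketch below loads exported files from Cloud Storage into a BigQuery table with the google-cloud-bigquery client. It illustrates only a simple batch load, not the paper's full Pub/Sub and Dataflow pipeline, and the project, bucket, and table names are hypothetical.

# Minimal batch load into BigQuery; bucket/table/project names are placeholders.
from google.cloud import bigquery   # pip install google-cloud-bigquery

client = bigquery.Client(project="my-analytics-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,                          # infer the schema from the files
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://my-export-bucket/teradata_orders_*.csv",   # exported source data
    "warehouse.orders",                              # dataset.table in BigQuery
    job_config=job_config,
)
load_job.result()                                    # wait for the load to finish

print(client.get_table("warehouse.orders").num_rows, "rows loaded")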
Cognitive science and technology, 2023
Data is increasing exponentially in the modern world, which demands more from the available technologies for data storage and data processing. This continuous growth in the amount of structured, semi-structured, and unstructured data is called big data. The storage and processing of big data through traditional relational database systems is not possible due to increased complexity and volume. The improved ability of big data solutions such as NoSQL to handle this data led developers over the previous decade to start preferring big data databases such as Apache Cassandra and MongoDB. NoSQL is a modern database technology designed for fast read and write operations that provides horizontal scalability to store large volumes of data. Large organizations face various challenges when shifting their relational database frameworks to NoSQL database frameworks. In this paper, we propose an approach to migrate data from a relational database to a NoSQL database, specifically transforming a MySQL database into Cassandra and MongoDB. The experiments show that the proposed approach successfully transforms the relational database into a big data database, and the performance analysis of the transformed databases shows that Cassandra requires less storage space and offers better performance.
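The sketch below shows the general shape of such a relational-to-document transformation for the MongoDB side: parent rows are read from MySQL, their child rows are embedded, and the result is inserted as documents. It is a simplified assumption about the mapping; the table names, columns, and embedding strategy are hypothetical, and the paper's Cassandra transformation is not shown.

# Illustrative MySQL -> MongoDB transformation; schema and mapping are hypothetical.
import pymysql                     # pip install pymysql
from pymongo import MongoClient    # pip install pymongo

mysql_conn = pymysql.connect(host="localhost", user="root", password="secret",
                             database="shop", cursorclass=pymysql.cursors.DictCursor)
mongo_db = MongoClient("mongodb://localhost:27017")["shop"]

with mysql_conn.cursor() as cur:
    cur.execute("SELECT id, name, email FROM customers")
    for customer in cur.fetchall():
        # Embed each customer's orders instead of keeping a separate joined table.
        with mysql_conn.cursor() as orders_cur:
            orders_cur.execute(
                "SELECT id, total, created_at FROM orders WHERE customer_id = %s",
                (customer["id"],))
            orders = orders_cur.fetchall()
        for order in orders:
            order["total"] = float(order["total"])   # DECIMAL is not BSON-friendly
        customer["orders"] = orders
        mongo_db.customers.insert_one(customer)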
Cloud computing, an important development in information technology, allows users to share Internet-based access to pre-configured systems and services. While there are many benefits, such as cost efficiency and scalability, security remains a big worry for everyone involved. Current authentication practices have been found wanting in upholding the principles of the CIA triad: confidentiality, integrity, and availability. Data transfer to the cloud, also known as data migration, moves data from on-premises databases and other cloud services, and it is normally associated with problems such as preserving data integrity and minimizing downtime. Additional barriers stem from continuously maturing cloud environments and differing levels of compatibility with the given database structures. This paper focuses on the processes involved in data migration and on the different categories of migration, including database migration, data center migration, application migration, business process migration, and so on, stressing the significance of planning and implementing these migrations efficiently. The main issues that drive a shift to the cloud are outlined, along with the main approaches that large cloud suppliers such as AWS, Microsoft Azure, and Google Cloud offer. Additionally, potential risks and challenges, such as vendor selection, security concerns, and resource management, are explored. This comprehensive overview highlights the significance of strategic planning and vendor solutions in ensuring successful cloud data migration, while addressing the inherent risks associated with transitioning to cloud-based infrastructures.
United International Journal for Research & Technology (UIJRT), 2021
With the current trend of organizations moving towards cloud services and hybrid cloud technologies, the objective of this study is to develop a seamless data pipeline to perform data integration as part of a platform migration, i.e. from data centers to a cloud architecture. The proposed methodology implements these jobs by employing Extract-Transform-Load (ETL) procedures to develop interfaces in Talend Open Studio, a data integration tool. First, the data is extracted from multiple sources, such as databases and flat files. Then, multiple transformations such as filtering, sorting, and joining are applied to the data. Finally, the transformed data is loaded into the staging tables of the Enterprise Data Warehouse. This is achieved by migrating the interfaces from the tool currently in use, IBM InfoSphere DataStage, and re-creating their functionality. A comparison of the features of the two tools, Talend and DataStage, identified the pros and cons of each. It was inferred that Talend is equivalent to DataStage in most cases, and with enhancements and tweaks in Talend, the execution time of a few interfaces was reduced by half.
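The extract-filter-join-load flow described above can be sketched in a few lines of Python with pandas; Talend implements the equivalent steps graphically, so this is an illustration of the ETL pattern only, and the file, table, and column names are hypothetical.

# Illustrative ETL sketch mirroring the extract / transform / load steps above.
import sqlite3
import pandas as pd

# Extract: pull rows from a database table and records from a flat file.
source_db = sqlite3.connect("source.db")
orders = pd.read_sql_query("SELECT order_id, customer_id, amount FROM orders", source_db)
customers = pd.read_csv("customers.csv")        # columns: customer_id, region

# Transform: filter, join, and sort, as the interfaces described above do.
orders = orders[orders["amount"] > 0]
staged = (orders.merge(customers, on="customer_id", how="left")
                .sort_values("order_id"))

# Load: write the result into a staging table of the (hypothetical) warehouse.
warehouse = sqlite3.connect("warehouse.db")
staged.to_sql("stg_orders", warehouse, if_exists="replace", index=False)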
Data migration is one of the vital tasks of the data integration process. It is often considered the most tedious, as there is never a systematically defined procedure: each migration must be treated as unique, because the input data sets differ and the required output format depends on the services provided as well as on user and data-handler requirements. In recent years, data migration has become a vital process in various departments of public and private services, driven by technological advancements and the big data handling requirements caused by growth in acquired data volume. This paper discusses data migration requirements, the finalization of a data migration strategy, the various stages of the data migration process, and why complete automation of data migration is not feasible.