In an environment as virtual as the cloud, access control systems are needed to constrain users' direct or indirect actions that could lead to a security breach. In the cloud, apart from the owner's access to confidential data, the...
Smart cities aim to provide more digitalized, equitable, sustainable, and liveable cities. In smart cities, data has emerged as an important asset, and citizens' data in particular is being used to provide data-driven mobility services....
Cloud computing offers a powerful abstraction that provides a scalable, virtualized infrastructure as a service where the complexity of fine-grained resource management is hidden from the end-user. Running data analytics applications in...
Hadoop Distributed File System (HDFS) is the core component of the Apache Hadoop project. In HDFS, the computation is carried out in the nodes where the relevant data is stored. Hadoop also implements a parallel computational paradigm named...
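The paradigm named in that entry is MapReduce. Its map/shuffle/reduce flow can be sketched with a minimal word count in plain Python; this is an illustration of the paradigm only, not Hadoop's actual Java API, and all function names here are made up for the sketch.

```python
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["hadoop stores data", "hadoop computes where data lives"]
pairs = [pair for split in splits for pair in map_phase(split)]
result = reduce_phase(shuffle(pairs))
```

In a real Hadoop job the map and reduce phases run on the DataNodes holding the input splits, which is the data-locality point the abstract makes.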
File-sharing semantics are used by file systems to share data among concurrent client processes in a consistent manner. Session semantics is a widely used file-sharing semantics in Distributed File Systems (DFSs). The main...
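Under session semantics, a client's writes become visible to other clients only when the file is closed; sessions opened earlier keep seeing the old contents. A toy model of this rule (illustrative classes only, not any real DFS API):

```python
class SessionFile:
    """Toy model of session semantics: writes become visible at close()."""
    def __init__(self):
        self.committed = ""   # contents visible to newly opened sessions

    def open(self):
        # Each session gets a private snapshot of the committed contents.
        return Session(self, self.committed)

class Session:
    def __init__(self, file, snapshot):
        self.file = file
        self.buffer = snapshot

    def write(self, data):
        self.buffer += data   # only this session sees the change

    def read(self):
        return self.buffer

    def close(self):
        # Changes propagate to the shared file only on close.
        self.file.committed = self.buffer

f = SessionFile()
s1 = f.open()
s1.write("hello")
s2 = f.open()          # opened before s1 closed: still sees old contents
before = s2.read()
s1.close()
s3 = f.open()          # opened after s1 closed: sees s1's writes
after = s3.read()
```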
Service-based network infrastructure is a new network interface in which the flow of messages is controlled by the class of service that generated it, next by its content, and by an improved shipping address specified by the sender and attached to the...
Optimizing MapReduce Scheduling Using a Parallel Processing Model on Data Nodes in a Hadoop Environment
In recent years, there has been rapid progress in the cloud. With a growing number of organizations relying on resources in the cloud, there is a requirement for securing the data of various...
In the last few years an interest in native XML databases has surfaced. With other authors we argue that such databases need their own provisions for concurrency control since traditional methods are inadequate to capture the complicated...
The main characteristics of five distributed file systems required for big data: A comparative study
In recent years, the amount of data generated by information systems has exploded. It is not only the quantities of information, now measured in exabytes, but also the variety of these data, which is more and more structurally...
Modern-day systems are facing an avalanche of data, and they are being forced to handle more and more data-intensive use cases. These data come in many forms and shapes: Sensors (RFID, Near Field Communication, Weather Sensors),...
With the emergence of Cloud Computing, the amount of data generated in different fields such as physics, medicine, social networks, etc. is growing exponentially. This increase in the volume of data and their large scale make the problem...
In this era of developing technologies, one of the most promising is cloud computing, which has been in operation for years and is used by individuals and large enterprises to provide different kinds of services to the world. Cloud computing...
Resource management is a key factor in the performance and efficient utilization of cloud systems, and many research works have proposed efficient policies to optimize such systems. However, these policies have traditionally managed the...
Applications of the future will need to support large numbers of clients and will require scalable storage systems that allow state to be shared reliably. Recent research in distributed file systems provides technology that increases the...
This paper addresses the design and implementation of an adaptive document version management scheme. Existing schemes typically assume: (i) a priori expectations for how versions will be manipulated and (ii) fixed priorities between...
In the rapidly evolving Cloud market, the amount of data being generated is growing continuously and as a consequence storage as a service plays an increasingly important role. In this paper, we describe and compare two new approaches,...
Active Storage provides an opportunity for reducing the bandwidth requirements between the storage and compute elements of current supercomputing systems, and leveraging the processing power of the storage nodes used by some modern file...
This paper introduces the Sigma algorithm that solves fault-tolerant mutual exclusion problem in dynamic systems where the set of processes may be large and change dynamically, processes may crash, and the recovery or replacement of...
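Quorum-based mutual exclusion, the family of techniques such fault-tolerant algorithms build on, relies on the fact that any two majority quorums intersect, so two processes can never both collect a majority of grants. A minimal sketch of that invariant (illustrative code only, not the Sigma algorithm itself):

```python
def majority_quorum(n):
    # Smallest integer strictly greater than n/2.
    return n // 2 + 1

class Registrar:
    """Each registrar grants its single permission to at most one requester."""
    def __init__(self):
        self.granted_to = None

    def request(self, pid):
        if self.granted_to is None:
            self.granted_to = pid
            return True
        return False

def try_acquire(pid, registrars):
    # A process enters the critical section only with a majority of grants.
    grants = sum(r.request(pid) for r in registrars)
    return grants >= majority_quorum(len(registrars))

registrars = [Registrar() for _ in range(5)]
p1_in = try_acquire("p1", registrars)   # p1 collects all 5 grants
p2_in = try_acquire("p2", registrars)   # p2 collects none: exclusion holds
```

Because any two majorities of the five registrars share at least one member, at most one process can hold a quorum at a time, even if a minority of registrars crash.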
Database systems have been designed to manage business-critical information and provide this information on request to connected clients, a passive model. Increasingly, applications need to share information actively with clients and/or...
We present NeST, a flexible software-only storage appliance designed to meet the storage needs of the Grid. NeST has three key features that make it well-suited for deployment in a Grid environment. First, NeST provides a generic data...
MapReduce-tailored distributed filesystems, such as HDFS for Hadoop MapReduce, and parallel high-performance computing filesystems are tailored for considerably different workloads. The purpose of our work is to examine the performance of...
Grid Datafarm architecture is designed to facilitate reliable file sharing and high-performance distributed and parallel data computing in a Grid across administrative domains by providing a global virtual file system. Gfarm v2 is an...
Whereas traditional Desktop Grids rely on centralized servers for data management, some recent progress has been made to enable the distribution of large input data using peer-to-peer (P2P) protocols and Content Distribution Networks (CDNs)....
This paper addresses the problem of efficiently storing and accessing massive data blocks in a large-scale distributed environment, while providing efficient fine-grain access to data subsets. This issue is crucial in the context of...
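Fine-grain access to massive blocks is typically obtained by striping data into fixed-size chunks and touching only the chunks a requested range overlaps. An illustrative sketch of that idea (a tiny chunk size and made-up helper names, not the system described in the paper):

```python
CHUNK = 4  # chunk size in bytes (deliberately tiny for illustration)

def split_into_chunks(blob):
    # Stripe the blob into fixed-size chunks keyed by chunk index.
    return {i // CHUNK: blob[i:i + CHUNK] for i in range(0, len(blob), CHUNK)}

def read_range(chunks, offset, length):
    # Fetch only the chunks that overlap [offset, offset + length).
    out = b""
    for idx in range(offset // CHUNK, (offset + length - 1) // CHUNK + 1):
        out += chunks[idx]
    start = offset - (offset // CHUNK) * CHUNK
    return out[start:start + length]

chunks = split_into_chunks(b"abcdefghijkl")
piece = read_range(chunks, 5, 4)  # spans chunks 1 and 2 only
```

In a distributed setting the chunk map is the metadata; a client reading a small range contacts only the storage nodes holding the overlapping chunks instead of transferring the whole block.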
The recent explosion in data sizes manipulated by distributed scientific applications has prompted the need to develop specialized storage systems capable of dealing with specific access patterns in a scalable fashion. In this context, a...
Introspection is the prerequisite of autonomic behavior, the first step towards performance improvement and resource-usage optimization for large-scale distributed systems. In grid environments, the task of observing the application...
As grids become more and more attractive for solving complex problems with high computational and storage requirements, the need for adequate grid programming models is considerable. To this end, the GridRPC model has been proposed as...
In this paper, we show that, although P2P systems and DSM systems have been designed in rather different contexts, both can serve as major sources of inspiration for the design of a hybrid system, with intermediate hypotheses and...
Throughout the process of handling data on HPC systems, parallel file systems play a significant role. With more and more applications, the need for high-performance input/output (I/O) is rising. Different possibilities exist: General Parallel...
The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial in the deployment of pre-initialized applications or in moving running applications to different machines, e.g., for debugging...
As more and more large-scale applications need to generate and process very large volumes of data, the need for adequate storage facilities is growing. It becomes crucial to efficiently and reliably store and retrieve large sets of data...
Efficient namespace metadata management is increasingly important as next-generation file systems are designed for peta- and exascale. New schemes have been proposed; however, their evaluation has been insufficient due to a lack of...
People used to carry their documents about on CDs only a few years ago. Many people have recently turned to memory sticks. Cloud computing, in this case, refers to the capacity to access and edit data stored on remote servers from any...
Cloud-based file synchronization services are a worldwide resource for many millions of users. However, individual services often have tight resource limits, suffer from outages or shutdowns, and sometimes silently corrupt or leak user...
Workload consolidation, sharing physical resources among multiple workloads, is a promising technique to save cost and energy in cluster computing systems. This paper highlights a few challenges of workload consolidation for Hadoop as one...
Azure Data Lake Store (ADLS) is a fully-managed, elastic, scalable, and secure file system that supports Hadoop Distributed File System (HDFS) and Cosmos semantics. It is specifically designed and optimized for a broad spectrum of Big...
Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how...
The data link layer in a layered communication network is designed to ensure reliable data transfer over a noisy physical channel. Formal specifications are given for physical channels and data links, in terms of I/O automata. Based on...
An important function of communication networks is to implement reliable data transfer over an unreliable underlying network. Formal specifications are given for reliable and unreliable communication layers, in terms of I/O automata....
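The classic way a reliable layer is built on an unreliable one is stop-and-wait retransmission: the sender repeats each sequence-numbered packet until it gets through. A deterministic toy simulation of that idea (illustrative code, not the papers' I/O-automaton model; acknowledgements are assumed lossless for brevity):

```python
def make_lossy_channel(drop_pattern):
    # Drops the i-th transmission when drop_pattern[i] is True.
    pattern = list(drop_pattern)
    def send(packet):
        dropped = pattern.pop(0) if pattern else False
        return None if dropped else packet
    return send

def transfer(messages, channel):
    """Stop-and-wait sender: retransmit each message until it is delivered."""
    delivered = []
    attempts = 0
    for seq, msg in enumerate(messages):
        while True:
            attempts += 1
            packet = channel((seq, msg))
            if packet is not None:   # receiver got it and (reliably) acks
                delivered.append(packet[1])
                break
    return delivered, attempts

# Second message is dropped twice before getting through.
channel = make_lossy_channel([False, True, True, False])
delivered, attempts = transfer(["a", "b"], channel)
```

The sequence number lets the receiver discard duplicates when an ack, rather than the data packet, is lost; handling lossy acks as well is what the alternating-bit protocol adds on top of this sketch.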
In recent times, geospatial datasets are growing in terms of size, complexity, and heterogeneity. High-performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal, a.k.a. vector,...
In applications ranging from radio telescopes to Internet traffic monitoring, our ability to generate data has outpaced our ability to effectively capture, mine, and manage it. These ultra-high-bandwidth data streams typically contain...
The significant growth of the Internet of Things (IoT) is revolutionizing the way people live by transforming everyday Internet-enabled objects into an interconnected ecosystem of digital and personal information accessible anytime and...