2010, Journal of Physics: Conference Series
This paper presents a web-based Job Monitoring framework for individual Grid sites that allows users to follow their jobs in detail in quasi-real time. The framework consists of several independent components: (a) a set of sensors that run on the site CE and worker nodes and update a database, (b) a simple yet extensible web services framework and (c) an Ajax-powered web interface with a look-and-feel and controls similar to those of a desktop application. The monitoring framework supports LSF, Condor and PBS-like batch systems. This is one of the first monitoring systems where an X.509 authenticated web interface can be seamlessly accessed by both end-users and site administrators. While a site administrator has access to all the available information, a user can only view the jobs for the Virtual Organizations (VO) he/she is a part of. The monitoring framework design supports several possible deployment scenarios. For a site running a supported batch system, the system may be deployed as a whole, or existing site sensors can be adapted and reused with the web services components. A site may even prefer to build the web server independently and choose to use only the Ajax-powered web interface. Finally, the system is being used to monitor a glideinWMS instance. This broadens the scope significantly, allowing it to monitor jobs over multiple sites.
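The access rule described in this abstract (administrators see every job, users see only jobs from their own Virtual Organizations) can be illustrated with a minimal Python sketch. This is not the framework's actual code: the `Job` and `Viewer` types, the `visible_jobs` function and the sample data are all hypothetical names introduced here, and X.509 authentication is assumed to have already produced the viewer's identity and VO membership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record, as a sensor might store it in the database."""
    job_id: str
    owner: str
    vo: str
    status: str


@dataclass(frozen=True)
class Viewer:
    """Hypothetical authenticated viewer; VO membership is assumed known."""
    name: str
    vos: frozenset
    is_admin: bool = False


def visible_jobs(viewer, jobs):
    """Return the jobs this viewer may see.

    Site administrators see everything; other users see only jobs
    that belong to a VO they are a member of.
    """
    if viewer.is_admin:
        return list(jobs)
    return [job for job in jobs if job.vo in viewer.vos]


# Illustrative data only; not taken from the paper.
JOBS = [
    Job("101", "alice", "cms", "running"),
    Job("102", "bob", "atlas", "queued"),
    Job("103", "carol", "cms", "done"),
]
```

Under these assumptions, a viewer who belongs only to the `cms` VO would be shown jobs 101 and 103, while a site administrator would be shown all three.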
2008 9th IEEE/ACM International Conference on Grid Computing, 2008
Scheduling services are core Grid components of paramount importance to support the transparent distribution of tasks to remote shared resources in an efficient way. High availability of these core services is thus of great importance. Given the distributed nature of the system, monitoring the task lifecycle and the aggregate workflow patterns generated by users belonging to various communities is particularly challenging. This paper deals with the problem of Grid workload monitoring by reviewing the related requirements, and illustrates the architecture and implementation of a tool, the WMSMonitor, which is designed to meet the needs of various users categories, such as administrators, developers, advanced Grid users and performance testers.
2004
Grid computing involves the close coordination of many different sites which offer distinct computational and storage resources to the Grid user community. The resources at each site need to be monitored continuously. Static and dynamic site information needs to be presented to the user community in a simple and efficient manner. This paper will present both the design and implementation of the Grid3 monitoring infrastructure and the design details and the functionalities of a new application called the GridCat. The Grid3 monitoring architecture follows a user-oriented design that specifies standard metrics and utilizes underlying monitoring tools to collect them into a diversified framework. Then existing tools can be integrated, their functionality extended and new tools developed. The primary tools used include the ACDC Job Monitoring system from University at Buffalo, Ganglia, a preliminary version of GridCat, Globus MDS, the University of Chicago Grid telemetry MDViewer, and US ...
The R-GMA (Relational Grid Monitoring Architecture) was developed within the EU DataGrid project, to bring the power of SQL to an information and monitoring system for the grid. It provides producer and consumer services to both publish and retrieve information from anywhere within a grid environment. Users within a Virtual Organization may define their own tables dynamically into which to publish data. Within the DataGrid project R-GMA was used for the information system, making details about grid resources available for use by other middleware components. R-GMA has also been used for monitoring grid jobs by members of the CMS and D0 collaborations where information about jobs is published from within a job wrapper, transported across the grid by R-GMA and made available to users. An accounting package for processing PBS logging data and sending it to one or more Grid Operation Centres using R-GMA has been written and is being deployed within LCG. There are many other existing and potential applications. R-GMA is currently being re-engineered to fit into a Web Service environment as part of the EU Enabling Grids for E-science in Europe (EGEE) project. Improvements being developed include fine-grained authorization, an improved user interface and measures to ensure superior scaling behaviour.
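The producer/consumer idea in this abstract, where monitoring data is published into a table and retrieved with SQL, can be sketched in Python with an in-memory SQLite database standing in for the grid-wide relational view. This is only an analogy and does not use the R-GMA API: the `job_status` table and the `make_store`, `publish` and `consume` functions are hypothetical names chosen for illustration.

```python
import sqlite3


def make_store():
    """Create an in-memory database with one user-defined table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_status (job_id TEXT, vo TEXT, state TEXT)"
    )
    return conn


def publish(conn, job_id, vo, state):
    """Producer side: a job wrapper would publish one status row."""
    conn.execute(
        "INSERT INTO job_status VALUES (?, ?, ?)", (job_id, vo, state)
    )


def consume(conn, vo):
    """Consumer side: retrieve the job states for one VO with plain SQL."""
    cursor = conn.execute(
        "SELECT job_id, state FROM job_status WHERE vo = ? ORDER BY job_id",
        (vo,),
    )
    return cursor.fetchall()


# Illustrative rows only; not data from the paper.
conn = make_store()
publish(conn, "j1", "cms", "RUNNING")
publish(conn, "j2", "d0", "DONE")
publish(conn, "j3", "cms", "DONE")
```

In this sketch, `consume(conn, "cms")` would return the two rows published for the `cms` VO. A real deployment differs in the important respect that producers and consumers are distributed across sites rather than sharing one local database.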
Future Generation Computer Systems, 2005
Grid systems follow a new paradigm of distributed computing that enables the coordination of resources and services that are not subject to centralized control, can dynamically join and leave virtual pools, and are assigned to users by means of an explicit assignment functionality. The monitoring of a Grid is a multi-institutional and Virtual Organization (VO)-oriented service. It must deal with the dynamics, diversity, and geographical distribution of the resources available to Virtual Organizations, and the various levels of abstraction for modeling them. This paper presents the requirements, architecture and implementation of GridICE, a monitoring service for Grid systems. The suitability of this tool in real-life scenarios is analyzed and discussed.
Distributed resource property repositories and state monitoring systems are critical components of any Grid Management Architecture, providing Grid scheduler, job/execution manager and state estimation components with accurate information about network, computational and storage resource properties and status. Without an up-to-date information and monitoring service, intelligent scheduling decision making would be a near-impossible task. In this paper we describe a scalable, portable and non-intrusive Grid Information and Monitoring framework. We compare it to well-known Grid information & monitoring systems, and measure its performance against that of the Globus 2.2 Monitoring and Discovery Service (MDS) and the Globus 3.2 web services Information Service (WS-IS).
2007
Processing large data sets with high throughput is one of the major focuses of Grid computing today. Where possible, data are split up into small chunks that are processed independently. Thus, job sets of hundreds or even thousands of individual jobs are possible. For the job submitter or the resource providers such a scenario is currently a nightmare, as it is hard to keep track of so many jobs or to identify the reasons for failures. We present a system that will support gLite users in tracking and monitoring their jobs and their resource usage, in finding and identifying the reasons for failures, and even in steering running applications.
This document presents the features implemented for the automatic deployment and dynamic provision of grid services, and for the scalable cloud-like management of grid site resources. These features, developed largely in Work Package 6 (WP6) are integrated into the StratusLab Toolkit by Work Package 4 (WP4). They involve cloud-like APIs, a service definition language, contextualization, scalable cloud frameworks, monitoring and accounting solutions. Some functionalities developed include TCloud and OCCI implementations, a library to process OVF, the Claudia framework and integration with Ganglia monitoring information.
2006
Contemporary Grids are characterized by a middleware that provides the necessary virtualization of computation and data resources for the shared working environment of the Grid. In a large-scale view, different middleware technologies and implementations have to coexist. The SOA approach provides the needed architectural backbone for interoperable environments, where different providers can offer their solutions without restricting users to just one specific implementation. The WMProxy (Workload Manager Proxy) is a new service providing access to the gLite Workload Management System (WMS) functionality through a simple Web Services-based interface. The WMProxy was designed to efficiently handle a large number of requests for job submission and control to the WMS, and the service interface addresses the Web Services and SOA architecture standards, in particular adhering to the WS-Interoperability basic profile. In this paper we describe the WMProxy service: from its architecture, which is independent of the Web Services container used, up to the provided functionality, together with the rationale behind the decisions made during both the design and implementation phases. In particular, we provide a description of how the WMProxy is integrated with the gLite Workload Management System; the technologies used, focusing on the Web Services features; the mechanisms adopted to improve performance while maintaining high reliability and fault tolerance; the changes in the job submission operation chain with respect to the previous generation of Workload Management Systems; and the new operations provided in order to support bulk submission and improve Client-Server interaction capabilities.
Proceedings 11th IEEE International Symposium on High Performance Distributed Computing, 2002
The research described in this paper is performed as part of the Globus Project. It introduces a new Grid service called InfoGram that combines the ability of serving as information service and as a job execution service. Previously, both services were architected and implemented within the Globus Toolkit as two different services with different wire protocols. Our service demonstrates a significant simplification of the architecture while treating job submissions and information queries alike. The advantage of our service is that it provides backwards compatibility to existing Grid services, while at the same time providing forwards compatibility to the emerging Web services world. Part of the work conducted within this effort is already reused by the current Open Grid Services Architecture prototype implementation.
Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006
Experience with generating simulation data of high energy physics experiments has shown that a job monitoring system (JMS) is essential to understand failures of jobs within the Grid. Such a system can give information about the status of the user job as well as the worker node in parallel while a user job is running. It should support the user directly by allowing the user to interact with the running job and should be able to make an automatic error correction. Furthermore, such a system can be extended for an automatic classification of errors which can improve the stability and performance of the Grid environment. To increase the acceptance of the Grid, a graphical user interface (GUI) has been developed and integrated with the job monitoring system. Both components are currently integrated in the computing environment for generating data for the DØ Experiment. In this paper we want to describe the basic components of the job monitoring software.
The grid is emerging
Cluster Computing and the Grid, 2006
Job monitoring in Grid systems presents an important challenge because Grid environments are volatile, heterogeneous and unreliable, and are managed by different middleware and monitoring tools. We present the infrastructure that we have designed and implemented in the HPC-Europa European project that allows uniform access to job monitoring information from different virtual organizations. The presented system abstracts user to
2004
Abstract: This paper summarizes research on monitoring GRID resources, which resulted in the implementation of the JIMS system. It contains an overview of the most important architectural and software concepts that make the constructed system flexible and user-friendly. The paper evaluates JMX and Web Service technologies as foundations for implementing monitoring systems. Particular attention has been paid to system adaptability, autoconfiguration and interoperability. Keywords: JMX, Grid, Monitoring, Distributed System, SOA, Web Services 1. Introduction Monitoring distributed computer system resources is an integral part of any management activity. The grid computing concept [1] addresses issues related to accessibility and transparent sharing of distributed computational, storage and communication resources among groups of users, putting the management aspects at the forefront of grid research. Recently, the grid research community has been inspired by a new approach based on SOA (Servic...
2007
Grid systems must provide their users with precise and reliable information about the status and usage of available resources. The efficient distribution of this information enables Virtual Organizations (VOs) to optimize the utilization strategies of their resources and to complete the planned computations. In this paper, we describe the recent evolution of GridICE, a monitoring tool for Grid systems. These changes are targeted at satisfying the requirements of the main categories of users: Grid operators, site administrators, Virtual Organization (VO) managers and Grid users.
2007
We propose a multi-tiered architecture for middleware-independent Grid job management. The architecture consists of a number of services for well-defined tasks in the job management process, offering complete user-level isolation of service capabilities, multiple layers of abstraction, control, and fault tolerance. The middleware abstraction layer comprises components for targeted job submission, job control and resource discovery.
Computing Research Repository, 2003
WorldGRID is an intercontinental testbed spanning Europe and the US integrating architecturally different Grid implementations based on the Globus toolkit. The WorldGRID testbed has been successfully demonstrated during the WorldGRID demos at SuperComputing 2002 (Baltimore) and IST2002 (Copenhagen) where real HEP application jobs were transparently submitted from US and Europe using "native" mechanisms and run where resources were available, independently of their location. To monitor the behavior and performance of such testbed and spot problems as soon as they arise, DataTAG has developed the EDT-Monitor tool based on the Nagios package that allows for Virtual Organization centric views of the Grid through dynamic geographical maps. The tool has been used to spot several problems during the WorldGRID operations, such as malfunctioning Resource Brokers or Information Servers, sites not correctly configured, job dispatching problems, etc. In this paper we give an overview of the package, its features and scalability solutions and we report on the experience acquired and the benefit that a GRID operation center would gain from such a tool.
Proceedings ITCC 2003. International Conference on Information Technology: Coding and Computing, 2003
The architecture and the current implementation of the grid portal GENIUS (Grid Enabled web eNvironment for site Independent User job Submission), jointly developed by INFN and NICE within the context of INFN Grid and DataGrid Projects, is presented and discussed. Particular care is devoted to the description of job submission and monitoring and to transparent access to user's data and applications.
Journal of Physics: Conference Series, 2010
Instrumenting jobs throughout their life cycle is not straightforward, as they are quite independent after being submitted, crossing multiple environments and locations until landing on a worker node. In order to measure correctly the resources used at each step, and to compare them with the view from a Fabric Infrastructure, a solution is proposed using Messaging System for the Grids (MSG) for integrating information coming from different sources.
UK e-Science All Hands meeting
R-GMA is a realization of the Grid Monitoring Architecture (GMA) that also exploits the power of the relational data model and the SQL query language. The biggest challenge during the development of R-GMA was to ensure that it could be scaled to operate in a large grid reliably. The system is being used in areas as diverse as resource discovery, job logging and bookkeeping, network monitoring and accounting. A version of R-GMA is being developed within the follow-on European project EGEE. Work continues within GGF to define information services for OGSA on the basis of experience with R-GMA.
2011
This document presents the features implemented for the automatic deployment and dynamic provision of grid services, and for the scalable cloud-like management of grid site resources. These features, developed largely in Work Package 6 (WP6) are integrated into the StratusLab Toolkit by Work Package 4 (WP4). They involve cloud-like APIs, a service definition language, contextualization, scalable cloud frameworks, monitoring and accounting solutions. Some functionalities developed include TCloud and OCCI implementations, a library to process OVF, the Claudia framework and integration with Ganglia monitoring information.