научная статья по теме AN AGENT-BASED APPROACH FOR DYNAMIC ADJUSTMENT OF SCHEDULED JOBS IN COMPUTATIONAL GRIDS Кибернетика

Текст научной статьи на тему «AN AGENT-BASED APPROACH FOR DYNAMIC ADJUSTMENT OF SCHEDULED JOBS IN COMPUTATIONAL GRIDS»

ИЗВЕСТИЯ РАН. ТЕОРИЯ И СИСТЕМЫ УПРАВЛЕНИЯ, 2010, № 5, с. 87-94

== КОМПЬЮТЕРНЫЕ МЕТОДЫ =

УДК 62-40

AN AGENT-BASED APPROACH FOR DYNAMIC ADJUSTMENT OF SCHEDULED JOBS IN COMPUTATIONAL GRIDS

© 2010 г. T. Altameem, M. Amoon*

Dept. of Computer Science, RCC, King Saud University, P.O. Box: 28095—11437Riyadh-Saudi Arabia

Received November 10.2009

Grid computing is a newly developed technology for complex systems with large-scale resource sharing, wide-area communication, and multi-institutional collaboration. Grid scheduling is an important infrastructure in the grid computing environment. Most of the existing grids scheduling methods focus on maximizing processor utilization without taking grid load into consideration. This may lead to significant inefficiencies in performance such as large job queues and processing delays.

In this paper, we propose a multiagent-based scheduling system for computational grids with a new approach. Agent technology is suitable for a computational grid because of the dynamic, heterogeneous, and autonomous nature of the grid. The main idea of the proposed system is a combination of a static scheduling using a fixed scheduling algorithm and a dynamic adjustment through the autonomous behavior of agents. The superiority of the proposed system, in reducing the load of the grid and minimizing the response time for executing user applications, is demonstrated by simulation experiments.

Introduction. Grid technologies have enabled the development of novel applications that require close and potentially sophisticated interaction between resources that may belong to different organizations [1]. Grid can be defined as a set of resources distributed over wide-area networks that can support large-scale distributed applications [2]. The resources my involve CPU cycles, memory, bandwidth, disk, applications, databases on remote systems, scientific data in organizations for research purposes, etc. These heterogeneous resources are distributed and owned geographically by different organizations.

Grids can be classified as computational grids or data grids. A computational grid is any distributed computational network, enabled with software that allows cooperation and the sharing of computational cycles and resources. A data grid is a grid computing system that deals with data and the controlled sharing and management of large amounts of distributed data. These are often, but not always, combined with computational grid systems.

Grid computing is a newly developed technology for solving large-scale computational and data intensive problems in science, engineering, and commerce [3]. It is distinct from conventional distributed computing by autonomic heterogeneous resources and large scale resource sharing [4]. It uses resources of many separate computers connected by a network (usually internet). The tremendously large number and the heterogeneous potential of grid resources cause scheduling of user jobs to these resources is the key technology in grid computing [3, 5].

As network speeds grow, it is possible to imagine global grids that contain multiple heterogeneous resources, and allow users to submit jobs from anywhere in the world to these grids. A job is an atomic unit to be scheduled by the scheduler and assigned to a resource to be executed. In order to obtain good performance, it is necessary to select resources for use in such a manner that minimizes the computation and communication time,

K. Ranganathan and I. Foster [6] had stated that effective scheduling of jobs to resources of a grid is complicated by a number of factors. The first factor is the geographical distance. This factor means that grid resources may be connected by slow WANs compared to high-speed LAN connections. Moreover, the input datasets for jobs may be large (perhaps many Gigabytes). In these conditions, data transfer times and network latency can be significant, compelling job-scheduling decisions to consider the amount of network traffic they generate. Thus, intelligent management of data plays an important role in grid scheduling.

The second factor is the different administrative domains with different local policies which may take precedence over global scheduler policies. Distributed control is an inherent property of grids, and the scheduling framework should enable this.

The third factor is the size and heterogeneity of the grid. The grid may contain many hundreds of sites, thousands of users, and millions of heterogeneous resources. Thus, scalability and reliability are important concerns for any scheduling paradigm.

The fourth factor is the grid dynamism which means that some resources can fail at any time without advance notice. Hence, batch scheduling techniques that assume constant availability and map out the entire schedule before running jobs may not be successful. Scheduling algorithms have to work with the assumption that the information used for mapping jobs to resources will quickly be out of date. These complications lead to a need to an effective and efficient scheduling algorithm to be employed [6].

There are several solutions that currently address issues of resource scheduling in grid environments. These include Globus, Legion, NetSolve, Condor, Ninf and Nimrod/G [7]. While many of these projects utilize query-based mechanisms for resource discovery and advertisement [8], our work adopts a multi-agent-based approach. This allows agents to control the query process and to make resource discovery decisions based on its own internal logic as opposed to relying on a fixed-function query engine.

Agents have features that make them ideally qualified to be deployed for job scheduling in computational grids. Among of these features: autonomy, proactiv-ness, socialability, and interactivity. Mobile agent technology has been considered an enhancement of grid technologies as it provides powerful and efficient mechanisms to develop applications for computational grids. There is an increasing amount of data/resources available in grids, and the large number ofjobs which have to be performed to manipulate these data/resources. Mobile agent technology offers the possibility of executing these jobs in an automated way, with minimal human intervention [9]. Computational jobs are assigned to an agent and that agent can be sent to remote hosts to execute the assignment. After completing the assignment, the results will be carried back to the host (sender) by the agent.

In this paper, we introduce a multiagent-based system for scheduling jobs in computational grids a new approach. Agent technology is suitable for a computational grid because of the dynamic, heterogeneous, and autonomous nature of the grid. The main idea of the proposed system is a combination of a static scheduling using a fixed scheduling algorithm and a dynamic adjustment through the autonomous behavior of agents.

1. Grid Scheduling. Scheduling can occur prior to execution, assuming a fixed state of the grid engine (static scheduling), or can be done during execution (dynamic scheduling) [10]. In static scheduling, there is a priori fully knowledge of the execution times of jobs, the communication times between nodes in the grid, and the precedence relations between tasks before the run-time. Thus, static scheduling techniques can be applied at compile time. Each task has a static assignment to a particular processor, and each time that task is submitted for execution, it is assigned to that processor. This type of scheduling has a major drawback when applied for computational grids due to

the dynamism of the grids. This dynamism allows some resources to leave the grid or change their locations in the grid without any advance notes. Thus, the system can fail at any time. Also, some new resources may be added to the grid that may provide better performance when used by the applications. But these new resources will not be useful due to static scheduling of tasks.

On the other hand, dynamic scheduling assumes no priori knowledge about the tasks to be scheduled. Thus, all scheduling decisions are made at run-time and the overhead is excessive for most real systems. The main disadvantage of dynamic scheduling is the overhead incurred to determine the schedule while the program is running. In general, dynamic scheduling problems are NP-hard problems and generally there is no any optimal scheduling algorithm for dynamic scheduling problems. The advantage of dynamic scheduling over static scheduling is that the system need not be aware of the run-time behavior of the application before execution. It is particularly useful in a system where the primary performance goal is maximizing resource utilization and minimizing the effect of idle and faulty resources, rather than minimizing runtime for jobs [11].

Dynamic environments like computational grids require efficient techniques to assign jobs to the suitable resources for executing these jobs. Therefore, it is necessary to develop a scheduling mechanism that adapts to a dynamic grid computing environment and the heterogeneity of resources in that environment with minimal overheads.

The process of allocation a resource to a job involves three basic steps [12]. In the first step, the resources currently available in the grid are located and identified. All the required information of the resources must be collected in a database. The information includes resources speed, bandwidth between the scheduler and the resource, cache memory size of the resource, the current load of the resource, and the current availability of the resource.

In the second step, a resource from the available resources is chosen by using a scheduling algorithm. A job is assigned to the chosen resource to be executed. This step is called job scheduling. The scheduling algorithm tries to maximize some optimization criterion specified by the user. Most of the grid systems use performance-guided schedulers in which the

Для дальнейшего прочтения статьи необходимо приобрести полный текст. Статьи высылаются в формате PDF на указанную при оплате почту. Время доставки составляет менее 10 минут. Стоимость одной статьи — 150 рублей.

Показать целиком