The SimGrid Framework for Research on Large-Scale Distributed Systems
Martin Quinson (Nancy University, France), Arnaud Legrand (CNRS, Grenoble University, France), Henri Casanova (University of Hawaii at Manoa, USA)
Presented by: Pedro Velho (Grenoble University, France)
simgrid-dev@[Link]
Large-Scale Distributed Systems Research
Large-scale distributed systems are in production today
Grid platforms for e-Science applications
Peer-to-peer file sharing
Distributed volunteer computing
Distributed gaming
Researchers study a broad range of systems
Data lookup and caching algorithms
Application scheduling algorithms
Resource management and resource sharing strategies
They want to study several aspects of their system performance
Response time; Throughput; Scalability; Robustness; Fault-tolerance; Fairness
Main question: comparing several solutions in relevant settings
Large-Scale Distributed Systems Science?
Requirements for a Scientific Approach
Reproducible results
You can read a paper, reproduce a subset of its results, and improve on them
Standard methodologies and tools
Grad students can learn their use and become operational quickly
Experimental scenarios can be compared accurately
Current practice in the field: quite different
Few common methodologies and tools
Experimental settings rarely detailed enough in the literature (test source code?)
Purpose of this workshop
Present emerging methodologies and tools Show how to use some of them in practice Discuss open questions and future directions
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
    Real-world experiments
    Simulation
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Analytical or Experimental?
Analytical works?
Some purely mathematical models exist
They allow a better understanding of principles, in spite of dubious applicability
impossibility theorems, parameter influence, . . .
Theoretical results are difficult to achieve
Everyday practical issues (routing, scheduling) become NP-hard problems
Most of the time, only heuristics whose performance has to be assessed are proposed
Models are often too simplistic, relying on ultimately unrealistic assumptions
One must run experiments
Most published research in the area is experimental
Running real-world experiments
Eminently believable way to demonstrate the applicability of the proposed approach
Very time- and labor-consuming:
The entire application must be functional
Parameter sweeps and design alternatives multiply the runs
Choosing the right testbed is difficult
My own little testbed?
Well-behaved, controlled, stable
Rarely representative of production platforms
Real production platforms?
Not everyone has access to them; CS experiments are disruptive for users
Experimental settings may change drastically during the experiment (components fail; other users load resources; administrators change the configuration)
Results remain limited to the testbed
Impact of testbed specificities hard to quantify, even with a collection of testbeds...
Extrapolations and explorations of "what if" scenarios difficult (what if the network were different? what if we had a different workload?)
Experiments are uncontrolled and unrepeatable
No way to test alternatives back-to-back (even if disruption is part of the experiment)
Difficult for others to reproduce results
even though this is the basis for scientific advances!
Simulation
Simulation solves these difficulties
No need to build a real system, nor the full-fledged application
Ability to conduct controlled and repeatable experiments
(Almost) no limits to experimental scenarios
Possible for anybody to reproduce results
Simulation in a nutshell
Predict aspects of the behavior of a system using an approximate model of it
Model: set of objects defined by a state + rules governing the state evolution
Simulator: program computing the evolution according to the rules
Wanted features:
Accuracy: correspondence between simulation and real world
Scalability: actually usable by computers (fast enough)
Tractability: actually usable by human beings (simple enough to understand)
Instantiability: can actually describe real settings (no magical parameters)
Simulation in Computer Science
Microprocessor Design
A few standard cycle-accurate simulators are used extensively [Link]
Possible to reproduce simulation results
Networking
A few established packet-level simulators: ns-2, DaSSF, OMNeT++, GTNetS
Well-known datasets for network topologies
Well-known generators of synthetic topologies
SSF standard: [Link]
Possible to reproduce simulation results
Large-Scale Distributed Systems?
No established simulator up until a few years ago
Most people build their own ad-hoc solutions
Naicken, Stephen et al., Towards Yet Another Peer-to-Peer Simulator, HET-NETs'06.
Of 141 P2P [Link], 30% use a custom tool and 50% don't report the tool used
Simulation in Parallel and Distributed Computing
Used for decades, but under drastic assumptions in most cases
Simplistic platform model
Fixed computation and communication rates (Flops, Mb/s)
Topology either fully connected or bus (no interference, or simple ones)
Communication and computation are perfectly overlappable
Simplistic application model
All computations are CPU intensive (no disk, no memory, no user)
Clear-cut communication and computation phases
Computation times even ignored in the Distributed Computing community
Communication times sometimes ignored in the HPC community
Straightforward simulation in most cases
Fill in a Gantt chart or count messages with a computer rather than by hand
No need for a simulation standard
Large-Scale Distributed Systems Simulations?
Simple models justifiable at small scale
Cluster computing (matrix multiply application on a switched dedicated cluster)
Small-scale distributed systems
Hardly justifiable for Large-Scale Distributed Systems
Heterogeneity of components (hosts, links)
Quantitative: CPU clock, link bandwidth and latency
Qualitative: Ethernet vs. Myrinet vs. Quadrics; Pentium vs. Cell vs. GPU
Dynamicity
Quantitative: availability varies with resource sharing
Qualitative: resources come and go (churn)
Complexity
Hierarchical systems: grids of clusters of multi-processors being multi-cores
Resource sharing: network contention, QoS, batches
Multi-hop networks, non-negligible latencies
Middleware overhead (or optimizations)
Interference of computation and communication (and disk, memory, etc.)
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
    Possible designs
    Experimentation platforms: Grid5000 and PlanetLab
    Emulators: ModelNet and MicroGrid
    Packet-level simulators: ns-2, SSFNet and GTNetS
    Ad-hoc simulators: ChicagoSim, OptorSim, GridSim, . . .
    Peer-to-peer simulators
    SimGrid
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Models of Large-Scale Distributed Systems
Model = set of objects defined by a state + set of rules governing the state evolution
Model objects:
Evaluated application: does actions, stimuli to the platform
Resources (network, CPU, disk): constitute the platform, react to stimuli
The application is blocked until its actions are done
Resources can sometimes do actions too, to represent external load
Expressing interaction rules
From more abstract to less abstract:
Mathematical simulation: based solely on equations
Discrete-event simulation: system = set of dependent actions & events
Emulation: trapping and virtualization of low-level application/system actions
Real execution: no modification
Boundaries are blurred
Tools can combine several paradigms for different resources
Emulators may use a simulator to compute resource availabilities
Simulation options to express rules
Network
Macroscopic: flows in pipes (mathematical & coarse-grain d.e. simulation); data sizes are liquid amounts, links are pipes
Microscopic: packet-level simulation (fine-grain d.e. simulation)
Emulation: actual flows through some network, timing + time expansion
CPU
Macroscopic: flows of operations in the CPU pipelines
Microscopic: cycle-accurate simulation (fine-grain d.e. simulation)
Emulation: virtualization via another CPU / Virtual Machine
Applications
Macroscopic: application = analytical flow
Less macroscopic: set of abstract tasks with resource needs and dependencies
  Coarse-grain d.e. simulation; application specification or pseudo-code API
Virtualization: emulation of actual code, trapping application-generated events
Large-Scale Distributed Systems Simulation Tools
A lot of tools exist
Grid5000, PlanetLab, MicroGrid, ModelNet, Emulab, DummyNet
ns-2, GTNetS, SSFNet
ChicagoSim, GridSim, OptorSim, SimGrid, . . .
PeerSim, P2PSim, . . .
How do they compare?
How do they work?
Components taken into account (CPU, network, application)
Options used for each component (direct execution; emulation; d.e. simulation)
What are their relative qualities?
Accuracy (correspondence between simulation and real world)
Technical requirements (programming language, specific hardware)
Scale (tractable size of systems at reasonable speed)
Experimental settings configurable and repeatable, or not
Experimental tools comparison
Tool       | CPU         | Disk        | Network     | Application   | Requirement     | Settings     | Scale
Grid5000   | direct      | direct      | direct      | direct        | access          | fixed        | < 5,000
PlanetLab  | virtualize  | virtualize  | virtualize  | virtualize    | none            | uncontrolled | hundreds
ModelNet   | -           | -           | emulation   | emulation     | lot of material | controlled   | dozens
MicroGrid  | emulation   | -           | fine d.e.   | emulation     | none            | controlled   | hundreds
ns-2       | -           | -           | fine d.e.   | coarse d.e.   | C++ and tcl     | controlled   | < 1,000
SSFNet     | -           | -           | fine d.e.   | coarse d.e.   | Java            | controlled   | < 100,000
GTNetS     | -           | -           | fine d.e.   | coarse d.e.   | C++             | controlled   | < 177,000
ChicSim    | coarse d.e. | -           | coarse d.e. | coarse d.e.   | C               | controlled   | few 1,000
OptorSim   | coarse d.e. | amount      | coarse d.e. | coarse d.e.   | Java            | controlled   | few 1,000
GridSim    | coarse d.e. | coarse d.e. | coarse d.e. | coarse d.e.   | Java            | controlled   | few 1,000
P2PSim     | -           | -           | cste time   | state machine | C++             | controlled   | few 1,000
PlanetSim  | -           | -           | cste time   | coarse d.e.   | Java            | controlled   | 100,000
PeerSim    | -           | -           | -           | state machine | Java            | controlled   | 1,000,000
SimGrid    | math/d.e.   | (underway)  | math/d.e.   | d.e./emul     | C or Java       | controlled   | few 100,000
Direct execution: no experimental bias (?)
Experimental settings fixed (between hardware upgrades), but not controllable
Virtualization allows sandboxing, but no control over experimental settings
Emulation can have high overheads (but captures the overhead)
Discrete-event simulation is slow, but hopefully accurate
To scale, you have to trade speed for accuracy
Grid5000 (INRIA consortium)
French experimental platform
1500 nodes (3000 CPUs, 4000 cores) over 9 sites
Nation-wide dedicated 10Gb interconnection [Link]
Scientific tool for computer scientists
Nodes are deployable: install your own OS image
Allows studies at any level of the stack:
Network (TCP improvements)
Middleware (scalability, scheduling, fault-tolerance)
Programming (components, code coupling, GridRPC)
Applications
Applications not modified, direct execution
Environment controlled, experiments repeatable
Relative scalability (only 1500-4000 nodes)
PlanetLab (consortium)
Open platform for developing, deploying, and accessing planetary-scale services
Planetary scale: 852 nodes, 434 sites, >20 countries
Distributed virtualization: each user can get a slice of the platform
Unbundled management: local behavior defined per node; network-wide behavior: services
Multiple competing services run in parallel (shared, unprivileged interfaces)
As unstable as the real world
Demonstrates the feasibility of P2P applications or middleware
No reproducibility!
ModelNet (UCSD/Duke)
Applications
Emulation and virtualization: actual code executed on virtualized resources
Key tradeoff: scalability versus accuracy
Resources: system calls intercepted
gethostname, sockets
CPU: direct execution on CPU
Slowdown not taken into account!
Network: emulation through:
one emulator (running on FreeBSD)
a gigabit LAN
hosts + IP aliasing for virtual nodes
emulation of heterogeneous links
Similar ideas used in other projects (Emulab, DummyNet, Panda, . . . )
Amin Vahdat et al., Scalability and Accuracy in a Large-Scale Network Emulator, OSDI'02.
MicroGrid (UCSD)
Application supported by emulation and virtualization: actual application code is executed on virtualized resources
Accounts for CPU and network
Resources: wraps syscalls & grid tools (gethostname, sockets, GIS, MDS, NWS)
CPU: direct execution on a fraction of the CPU (finds the right mapping)
Network: packet-level simulation (parallel version of MaSSF)
Time: synchronizes real and virtual time (finds the right execution rate)
[Figure: applications run on MicroGrid virtual resources, mapped onto the physical resources]
Andrew Chien et al., The MicroGrid: a Scientific Tool for Modeling Computational Grids, SuperComputing 2002.
More recent emulation projects
CPU/OS emulation
Lately, there have been lots of efforts on OS emulation / virtualization / para-virtualization / . . .
Evolution both on the hardware and the OS/software side: VMWare, Xen, kvm/Qemu, VirtualPC, VirtualBox, ...
The effort is put on portability, efficiency, isolation
Network emulation
Probably a lot of ongoing projects, but my information is not up to date
Two interesting projects developed in the Grid5000 community:
Wrekavoc (Emmanuel Jeannot): very lightweight and easy to set up (CPU burn, suspend/resume process, tc/IProute2)
P2PLab (Lucas Nussbaum): an emulation framework specifically designed for the study of P2P systems, where the core of the network is not the bottleneck
Packet-level simulators
ns-2: the most popular one
Several protocols (TCP, UDP, . . . ), several queuing models (DropTail, RED, . . . )
Several application models (HTTP, FTP), wired and wireless networks
Written in C++, configured using Tcl. Limited scalability (< 1,000)
SSFNet: implementation of SSF standard
Scalable Simulation Framework: unified API for d.e. simulation of distributed systems
Written in Java, usable on 100,000 nodes
GTNetS: Georgia Tech Network Simulator
Design close to real networks' protocol philosophy (stacked layers)
C++, reported usable with 177,000 nodes
Simulation tools of / for the networking community
Topic: study network behavior, routing protocols, QoS, . . .
Goal: improve network protocols
Microscopic simulation of packet movements
Inadequate for us (long simulation times, CPU not taken into account)
Latest news about packet-level simulators
ns-3 is the promising new project
ns-2 was written in Tcl and C++ and was... dirty
It had many, many contributors though, and the largest available protocol code base
There is an international effort to rewrite it (cleaner, more efficient, better support for contributions, . . . )
Packet-level simulation and emulation have never been that close
Among the recent interesting features:
Use of real TCP stacks (e.g., from the Linux kernel) instead of TCP models
The Distributed Client Extension (available in ns-2): one computer (EmuHost) runs ns-2, a set of computers run the real application (RWApps)
(De)multiplexing and UDP tunneling of traffic between the EmuHost and the RWApps using TAP
ChicagoSim, OptorSim, GridSim, . . .
Network simulators are not adapted, and emulation solutions are too heavy
PhD students just need a simulator to plug their algorithm into:
Data placement/replication
Grid economy
Many simulators. Most are home-made and short-lived; some are released
ChicSim: designed for the study of data replication (Data Grids), built on ParSec
Ranganathan, Foster, Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications, HPDC'02.
OptorSim: developed for the European DataGrid
DataGrid, CERN. OptorSim: Simulating data access optimization algorithms
GridSim focused on Grid economy
Buyya et al., GridSim: A Toolkit for the Modeling and Simulation of Global Grids, CCPE'02.
every [sub-]community seems to have its own simulator
PeerSim, P2PSim, . . .
The peer-to-peer community also has its own private collection of simulators, focused on P2P protocols; the main challenge is scale
P2PSim: multi-threaded discrete-event simulator. Constant communication time. Alpha release (April 2005)
[Link]
PlanetSim: multi-threaded discrete-event simulator. Constant communication time. Last release in 2006
[Link]
PeerSim: designed for epidemic protocols; processes = state machines. Two simulation modes: cycle-based (time is discrete) or event-based. Resources are not modeled. 1.0.3 release (December 2007)
[Link]
OverSim: a recent one, based on OMNeT++ (April 2008)
[Link]
SimGrid
History
(Hawaii, Grenoble, Nancy)
Created just like other home-made simulators (only a bit earlier ;)
Original goal: scheduling research
Need for speed (parameter sweeps); accuracy not negligible (HPC community concerned by performance)
SimGrid in a Nutshell
Simulation of communicating processes performing computations
Key feature: blend of mathematical simulation and coarse-grain d.e. simulation
Resources: defined by a rate (MFlop/s or Mb/s) + latency
Also allows dynamic traces and failures
Tasks can use multiple resources explicitly or implicitly
Transfer over multiple links, computation using disk and CPU
Simple API to easily specify a heuristic or an application
Casanova, Legrand, Quinson, SimGrid: A Generic Framework for Large-Scale Distributed Experiments, EUROSIM'08.
Experimental tools comparison
Tool       | CPU         | Disk        | Network     | Application   | Requirement     | Settings     | Scale
Grid5000   | direct      | direct      | direct      | direct        | access          | fixed        | < 5,000
PlanetLab  | virtualize  | virtualize  | virtualize  | virtualize    | none            | uncontrolled | hundreds
ModelNet   | -           | -           | emulation   | emulation     | lot of material | controlled   | dozens
MicroGrid  | emulation   | -           | fine d.e.   | emulation     | none            | controlled   | hundreds
ns-2       | -           | -           | fine d.e.   | coarse d.e.   | C++ and tcl     | controlled   | < 1,000
SSFNet     | -           | -           | fine d.e.   | coarse d.e.   | Java            | controlled   | < 100,000
GTNetS     | -           | -           | fine d.e.   | coarse d.e.   | C++             | controlled   | < 177,000
ChicSim    | coarse d.e. | -           | coarse d.e. | coarse d.e.   | C               | controlled   | few 1,000
OptorSim   | coarse d.e. | amount      | coarse d.e. | coarse d.e.   | Java            | controlled   | few 1,000
GridSim    | coarse d.e. | coarse d.e. | coarse d.e. | coarse d.e.   | Java            | controlled   | few 1,000
P2PSim     | -           | -           | cste time   | state machine | C++             | controlled   | few 1,000
PlanetSim  | -           | -           | cste time   | coarse d.e.   | Java            | controlled   | 100,000
PeerSim    | -           | -           | -           | state machine | Java            | controlled   | 1,000,000
SimGrid    | math/d.e.   | (underway)  | math/d.e.   | d.e./emul     | C or Java       | controlled   | few 100,000
Direct execution: no experimental bias (?)
Experimental settings fixed (between hardware upgrades), but not controllable
Virtualization allows sandboxing, but no control over experimental settings
Emulation can have high overheads (but captures the overhead)
Discrete-event simulation is slow, but hopefully accurate
To scale, you have to trade speed for accuracy
So what simulator should I use?
It really depends on your goal / resources
Grid5000 experiments: very good... if you have access, and plenty of time
PlanetLab does not enable reproducible experiments
ModelNet, ns-2, SSFNet, GTNetS: meant for networking experiments (no CPU)
ModelNet requires some specific hardware setup
MicroGrid simulations take a lot of time (although they can be parallelized)
SimGrid's models have clear limitations (e.g., for short transfers)
SimGrid simulations are quite easy to set up (but a rewrite is needed)
SimGrid does not require that a full application be written
Ad-hoc simulators are easy to set up, but their validity is still to be shown, i.e., the results obtained may be plainly wrong
Ad-hoc simulators are obviously not generic (difficult to adapt to your own needs)
Key trade-off seems to be accuracy vs. speed
The more abstract the simulation, the faster; the less abstract, the more accurate
Does this trade-off really hold?
Simulation Validation
Crux of simulation works
Validation is difficult; almost never done convincingly (not specific to CS: other sciences have the same issue)
How to validate a model (and obtain scientific results)?
Claim that it is plausible (justification = argumentation)
Show that it is reasonable:
Some validation graphs in a few special cases, at best
Validation against another validated simulator
Argue that trends are respected (absolute values may be off), which is useful to compare algorithms/designs
Conduct an extensive validation campaign against real-world settings
Simulation Validation: the FLASH example
FLASH project at Stanford
Building large-scale shared-memory multiprocessors Went from conception, to design, to actual hardware (32-node) Used simulation heavily over 6 years
Authors compared simulation(s) to the real world
Error is unavoidable (30% error was not rare in their case), negating the impact of "we got 1.5% improvement" claims
Complex simulators do not ensure better simulation results:
Simple simulators worked better than sophisticated ones (which were unstable)
Simple simulators predicted trends as well as slower, sophisticated ones
One should focus on simulating the important things
Calibrating simulators on real-world settings is mandatory
For FLASH, the simple simulator was all that was needed. . .
Gibson, Kunz, Ofelt, Heinrich, FLASH vs. (Simulated) FLASH: Closing the Simulation Loop, Architectural Support for Programming Languages and Operating Systems, 2000
Conclusion
Large-Scale Distributed System Research is Experimental
Analytical models are too limited
Real-world experiments are hard & limited
Most of the literature relies on simulation
Simulation for distributed applications still taking baby steps
Compared, for example, to the hardware design or networking communities (but more advanced for HPC/Grids than for P2P)
Lots of home-made tools, no standard methodology
Very few simulation projects even try to:
Publish their tools for others to use
Validate their tools
Support other people's use: genericity, stability, portability, documentation, . . .
Conclusion
Claim: SimGrid may prove helpful to your research
User community much larger than the contributor group
Used in several communities (scheduling, GridRPC, HPC infrastructure, P2P)
Model limits known thanks to validation studies
Easy to use, extensible, fast to execute
Around for almost 10 years
Remainder of this talk: present SimGrid in detail
Under the cover:
Models used
Implementation overview
SimGrid architecture and features
Main limitations
Tool performance and scalability
Hands on:
Scheduling algorithm experiments
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
    Modeling a Single Resource
    Multi-hop Networks
    Resource Sharing
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Analytic Models underlying the SimGrid Framework
Main challenges for SimGrid design
Simulation accuracy:
Designed for the HPC scheduling community: don't mess with the makespan! At the very least, understand the validity range
Simulation speed:
Users conduct large parameter-sweep experiments over alternatives
Microscopic simulator design
Simulate the packet movements and the routers' algorithms
Simulate the CPU actions (or micro-benchmark classical basic operations)
Hopefully very accurate, but very slow (simulation time >> simulated time)
Going faster while remaining reasonable?
Need to come up with macroscopic models for each kind of resource
Main issue: resource sharing. It emerges naturally in the microscopic approach:
Packets of different connections are interleaved by routers
CPU cycles of different processes get slices of the CPU
Modeling a Single Resource
Basic model: Time = L + size/B
Resources work at a given rate (B, in MFlop/s or Mb/s)
Each use has a given latency (L, in s)
Application to processing elements (CPU/cores)
Very widely used (latency usually neglected)
No cache effects nor other specific software/hardware interactions
No better analytical model (reality too complex and changing)
Sharing is easy in steady state: a fair share for each process
Application to networks
Turns out to be inaccurate for TCP
B is not constant, but depends on RTT, packet loss ratio, window size, etc.
Several models were proposed in the literature
Modeling TCP performance (single flow, single link)
Padhye, Firoiu, Towsley, Kurose. Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation. IEEE/ACM Transactions on Networking, Vol. 8, Num. 2, 2000.
\[
B = \min\left(\frac{W_{max}}{RTT},\
\frac{1}{RTT\sqrt{\frac{2bp}{3}} + T_0\,\min\!\left(1,\,3\sqrt{\frac{3bp}{8}}\right)p\,(1+32p^2)}\right)
\]
W_max: receiver advertised window
RTT: round-trip time
p: loss indication rate
b: number of packets acknowledged per ACK
T_0: TCP average retransmission timeout value
Model discussion
Captures TCP congestion control (fast retransmit and timeout mechanisms)
Assumes steady state (no slow start)
Accuracy shown to be good over a wide range of values
p and b not known in general (model hard to instantiate)
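To make the formula concrete, here is a small C helper (our own sketch, not taken from any simulator; the parameter names are ours) that evaluates the model:

#include <math.h>

/* Steady-state TCP Reno throughput of Padhye et al. (sketch).
 * Returns packets/s if w_max is in packets, bytes/s if in bytes. */
double tcp_reno_bandwidth(double w_max, double rtt, double p,
                          double b, double t0)
{
    double loss_term = rtt * sqrt(2.0 * b * p / 3.0)
        + t0 * fmin(1.0, 3.0 * sqrt(3.0 * b * p / 8.0))
             * p * (1.0 + 32.0 * p * p);
    return fmin(w_max / rtt, 1.0 / loss_term);
}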
SimGrid model for a single TCP flow on a single link
Definition of the link l
L_l: physical latency; B_l: physical bandwidth
Time to transfer size bytes over the link: Time = L_l + size / B'_l
Empirical bandwidth: B'_l = min(B_l, W_max / RTT)
Justification: the sender emits W_max then waits for the ACK (i.e., waits RTT)
Upper limit: first member of the min in the previous model
RTT assumed to be twice the physical latency; router queuing time assumed to be included in this value
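As an illustration with values of our own choosing (not from the deck): take L_l = 5 ms (so RTT = 10 ms), B_l = 100 Mb/s and W_max = 64 KiB ≈ 0.524 Mb. Then:
\[
B'_l = \min\left(100,\ \frac{0.524}{0.010}\right) = 52.4\ \text{Mb/s},
\qquad
\text{Time} = 0.005 + \frac{100}{52.4} \approx 1.91\ \text{s for a 100 Mb transfer.}
\]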
Modeling Multi-hop Networks: Store & Forward
[Figure: source S sending over a chain of links l1, l2, l3]
First idea, quite natural
Pay the price of going through link 1, then go through link 2, etc.
Analogy with the time to go from one city to another: sum of the times on each road
Unfortunately, things don't work this way
Whole message not stored on each router
Data split in packets over TCP networks (surprise, surprise)
Transfers on each link occur in parallel
Modeling Multi-hop Networks: WormHole
[Figure: packets p_{i,j} of size MTU flowing from source S through links l1, l2, l3]
Remember Networking classes?
Links packetize the stream according to the MTU (Maximum Transmission Unit)
Easy to simulate (SimGrid until 2002; GridSim 4.0 & most ad-hoc tools do)
Unfortunately, things don't work this way
IP packet fragmentation algorithms are complex (when MTUs differ)
TCP contention mechanisms:
The sender only emits cwnd packets before the ACK
Timeouts, fast retransmit, etc.
⇒ as slow as packet-level simulators, yet not quite as accurate
Macroscopic TCP modeling is a field
TCP bandwidth sharing studied by several authors
Data streams modeled as fluids in pipes
Same model for single stream/multiple links or multiple streams/multiple links
[Figure: linear network; flow 0 crosses links 1 to L, flow i uses only link i]
Notations
L: set of links
C_l: capacity of link l (C_l > 0)
n_l: number of flows using link l
F: set of flows; a flow f ∈ P(L) is the set of links it uses
λ_f: transfer rate of flow f
Feasibility constraint
Links deliver at most their capacity: ∀l ∈ L, Σ_{f∋l} λ_f ≤ C_l
Max-Min Fairness
Objective function: maximize min_{f∈F}(λ_f)
Equilibrium reached if increasing any λ_f requires decreasing some λ_{f'} with λ_{f'} ≤ λ_f
Very reasonable goal: gives a fair share to everyone
Optionally, one can add priorities w_f for each flow f, maximizing min_{f∈F}(w_f λ_f)
Bottleneck links
For each flow f, one of its links is the limiting one, l (with more capacity on link l, flow f would get more overall)
The objective function gives that l is saturated, and that f gets the biggest share on it:
∀f ∈ F, ∃l ∈ f : Σ_{f'∋l} λ_{f'} = C_l and λ_f = max{λ_{f'}, f' ∋ l}
L. Massoulié and J. Roberts, Bandwidth sharing: objectives and algorithms, IEEE/ACM Trans. Netw., vol. 10, no. 3, pp. 320-328, 2002.
Implementation of Max-Min Fairness
Bucket-lling algorithm
Set the bandwidth of all flows to 0
Increase the bandwidth of every flow by ε. And again, and again, and again.
When a link is saturated, all flows using it are limited (removed from the set)
Loop until all flows have found a limiting link
Ecient Algorithm
1. Search for the bottleneck link l such that: C_l / n_l = min_{k∈L} (C_k / n_k)
2. ∀f ∋ l, set λ_f = C_l / n_l; update all n_k and C_k to remove these flows
3. Loop until all λ_f are fixed
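As a concrete sketch (ours, not SimGrid source code; the flow/link matrix encoding is ours), the following program applies this algorithm to the homogeneous linear network of the next slide and prints C/2 = 50 for every flow:

#include <stdio.h>

#define NLINKS 2
#define NFLOWS 3

/* uses[f][l] = 1 iff flow f crosses link l:
 * flow 0 crosses both links, flows 1 and 2 use one link each */
int uses[NFLOWS][NLINKS] = { {1, 1}, {1, 0}, {0, 1} };

int main(void)
{
    double C[NLINKS] = {100.0, 100.0};  /* remaining capacity per link */
    double lambda[NFLOWS] = {0};        /* allocated rate per flow */
    int fixed[NFLOWS] = {0};

    for (;;) {
        /* 1. find the bottleneck link: smallest fair share C_l / n_l */
        int bottleneck = -1, n[NLINKS];
        double best = -1;
        for (int l = 0; l < NLINKS; l++) {
            n[l] = 0;
            for (int f = 0; f < NFLOWS; f++)
                if (!fixed[f] && uses[f][l]) n[l]++;
            if (n[l] && (bottleneck < 0 || C[l] / n[l] < best)) {
                bottleneck = l;
                best = C[l] / n[l];
            }
        }
        if (bottleneck < 0) break;      /* 3. all flows are fixed */

        /* 2. give that share to every flow crossing it, remove the flows */
        for (int f = 0; f < NFLOWS; f++) {
            if (fixed[f] || !uses[f][bottleneck]) continue;
            lambda[f] = best;
            fixed[f] = 1;
            for (int l = 0; l < NLINKS; l++)
                if (uses[f][l]) C[l] -= best;
        }
    }
    for (int f = 0; f < NFLOWS; f++)
        printf("flow %d: %g\n", f, lambda[f]);
    return 0;
}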
Max-Min Fairness on Homogeneous Linear Network
[Figure: flow 0 crosses links 1 and 2; flow 1 uses only link 1; flow 2 uses only link 2]
C_1 = C, C_2 = C; n_1 = 2, n_2 = 2
All links have the same capacity C, and each of them is limiting. Let's choose link 1:
λ_0 = C/2 and λ_1 = C/2
Remove flows 0 and 1; update the links' capacities
Link 2 then sets λ_2 = C/2
We're done computing the bandwidth allocated to each flow: λ_0 = λ_1 = λ_2 = C/2
Max-Min Fairness on Backbone
[Figure: backbone topology; flows 1 and 2 share link 2, flow 2 also crosses the backbone link 0]
C_0 = 1, C_1 = C_2 = C_3 = C_4 = 1000; n_0 = 1, n_1 = 1, n_2 = 2, n_3 = 1, n_4 = 1
The limiting link is link 0, since 1/1 = min(1/1, 1000/1, 1000/2, 1000/1, 1000/1)
This fixes λ_2 = 1. Update the links: link 2 now has C_2 = 999 and n_2 = 1
The limiting link is now link 2; this fixes λ_1 = 999
Done. We know λ_1 = 999 and λ_2 = 1
Side note: OptorSim 2.1 on Backbone
OptorSim (developed at CERN for DataGrid)
One of the rare ad-hoc simulators not using wormhole
Unfortunately, strange resource sharing:
1. For each link l, compute the share that each flow may get: C_l / n_l
2. For each flow f, compute what it gets: λ_f = min_{l∈f} (C_l / n_l)
[Figure: same backbone topology as on the previous slide]
C_0 = 1, C_1 = C_2 = C_3 = C_4 = 1000; n_0 = 1, n_1 = 1, n_2 = 2, n_3 = 1, n_4 = 1
Per-link shares: 1, 1000, 500, 1000, 1000
λ_1 = min(1000, 500, 1000) = 500!!
λ_2 = min(1, 500, 1000) = 1
λ_1 is limited by link 2, yet 499 remain unused on link 2
This unwanted feature is even listed in the README file...
Proportional Fairness
Max-Min validity limits
MaxMin gives a fair share to everyone
Reasonable, but TCP does not do so
Congestion mechanism: Additive Increase, Multiplicative Decrease (AIMD)
This complicates modeling, as shown in the literature
Proportional Fairness
MaxMin gives more to long flows (resource-eager); TCP is known to do the opposite
Objective function: maximize Σ_{f∈F} w_f log(λ_f)  (instead of min_{f∈F} w_f λ_f for MaxMin)
log favors short flows
Kelly, Charging and rate control for elastic traffic, European Transactions on Telecommunications, vol. 8, 1997, pp. 33-37.
Implementing Proportional Fairness
Karush-Kuhn-Tucker conditions:
The solution {λ_f}_{f∈F} is unique; any other feasible solution {λ'_f}_{f∈F} satisfies:
Σ_{f∈F} (λ'_f − λ_f) / λ_f ≤ 0
Compute the point {λ_f} where the derivative is zero (convex optimization)
Use Lagrange multipliers and a steepest gradient descent
Proportional Fairness on Homogeneous Linear Network
[Figure: linear network; flow 0 crosses links 1 to L, flow i uses only link i]
The maths give:
λ_0 = C/(n+1)  and  ∀l ≠ 0, λ_l = C·n/(n+1)
I.e., for C = 100 Mb/s and n = 3: λ_0 = 25 Mb/s, λ_1 = λ_2 = λ_3 = 75 Mb/s
Closer to practitioners' expectations
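A quick way to check these values (our derivation, following the objective above): by symmetry all single-link flows get the same rate, and every link is saturated, so the problem reduces to one variable:
\[
\max_{0<\lambda_0<C}\ \log\lambda_0 + n\log(C-\lambda_0)
\ \Rightarrow\ \frac{1}{\lambda_0}=\frac{n}{C-\lambda_0}
\ \Rightarrow\ \lambda_0=\frac{C}{n+1},\quad \lambda_l=\frac{nC}{n+1}.
\]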
Recent TCP implementation
More protocol refinement, more model complexity
Every agent changes its window size according to its neighbors' ones (selfish net-utility maximization)
Computing a distributed gradient for Lagrange multipliers ⇒ same updates
TCP Vegas converges to a weighted proportional fairness
Objective function: maximize Σ_{f∈F} L_f log(λ_f)  (L_f being the latency)
TCP Reno is even worse
Objective function: maximize Σ_{f∈F} arctan(λ_f)
Low, S.H., A Duality Model of TCP and Queue Management Algorithms, IEEE/ACM Transactions on Networking, 2003.
Efficient implementation: possible, but not so trivial
Computing a distributed gradient for Lagrange multipliers: useless in our setting
Lagrange multipliers computable with an efficient optimal-step gradient descent
So, what is the model used in SimGrid?
--cfg=network_model command-line argument
CM02: MaxMin fairness
Vegas: TCP Vegas fairness (Lagrange approach)
Reno: TCP Reno fairness (Lagrange approach)
Default in SimGrid v3.3: CM02
Example: ./my_simulator --cfg=network_model:Vegas
CPU sharing policy
The default MaxMin is sufficient for most cases
cpu_model:ptask_L07: model specific to parallel tasks
Want more?
network_model:gtnets: use the Georgia Tech Network Simulator for the network
Accuracy of a packet-level network simulator without changing your code (!)
Plug your own model into SimGrid!
(usable as a scientific instrument in the TCP modeling field, too)
How are these models used in practice?
Simulation kernel main loop
Data: a set of resources, each with a working rate
1. Some actions get created (by the application) and assigned to resources
2. Compute everyone's share (resource sharing algorithms)
3. Compute the earliest finishing action; advance simulated time to that time
4. Remove finished actions
5. Loop back to 2
(see the sketch below)
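The following program (our sketch, not SimGrid source) runs this loop on two CPU-bound actions sharing one resource of rate 1; the even split used here stands in for the sharing algorithms above:

#include <stdio.h>

typedef struct { double remaining; double share; int done; } action_t;

/* Resource sharing would go here (e.g., the max-min algorithm above);
 * this stub just splits a fixed rate evenly among pending actions. */
static void compute_shares(action_t *a, int n, double rate)
{
    int pending = 0;
    for (int i = 0; i < n; i++) pending += !a[i].done;
    for (int i = 0; i < n; i++) a[i].share = pending ? rate / pending : 0;
}

int main(void)
{
    action_t actions[] = {{10, 0, 0}, {25, 0, 0}};   /* work amounts */
    int n = 2, left = n;
    double now = 0.0;

    while (left > 0) {
        compute_shares(actions, n, 1.0);             /* step 2 */
        double dt = -1;                              /* step 3: earliest end */
        for (int i = 0; i < n; i++) {
            if (actions[i].done) continue;
            double t = actions[i].remaining / actions[i].share;
            if (dt < 0 || t < dt) dt = t;
        }
        now += dt;                                   /* advance simulated time */
        for (int i = 0; i < n; i++) {                /* step 4 */
            if (actions[i].done) continue;
            actions[i].remaining -= dt * actions[i].share;
            if (actions[i].remaining <= 1e-9) {
                actions[i].done = 1; left--;
                printf("action %d done at t=%g\n", i, now);
            }
        }
    }
    return 0;
}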
[Figure: Gantt chart of actions sharing resources over simulated time]
Adding Dynamic Availabilities to the Picture
Trace definition
List of discrete events where the maximal availability changes: t0 100%, t1 50%, t2 80%, etc.
Adding traces doesn't change the kernel main loop
Availability changes are simulation events, just like action ends
[Figure: Gantt chart where the resources' availability varies according to the trace]
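For illustration, such an availability trace could look like the following. This plain-text layout, with a PERIODICITY header (making the trace loop) followed by (timestamp, available fraction) pairs, follows SimGrid's documented trace format, but the exact syntax may vary across versions:

PERIODICITY 24.0
0.0  1.00
6.0  0.50
12.0 0.80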
SimGrid also accepts state changes (on/off)
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
User-visible SimGrid Components
SimDag: framework for DAGs of parallel tasks
MSG: simple application-level simulator
GRAS: framework to develop distributed applications
AMOK: toolbox of distributed tools on top of GRAS
SMPI: library to run MPI applications on top of a virtual environment
XBT: grounding features (logging, etc.), usual data structures (lists, sets, etc.) and portability layer
SimGrid user APIs
SimDag: model applications as DAGs of (parallel) tasks
MSG: model applications as Concurrent Sequential Processes
GRAS: develop real applications, studied and debugged in the simulator
AMOK: set of distributed tools (bandwidth measurement, failure detector, . . . )
SMPI: simulate MPI codes
XBT: grounding toolbox
Which API should I choose?
Your application is a DAG ⇒ SimDag
You have an MPI code ⇒ SMPI
You study concurrent processes, or distributed applications:
  You need performance graphs about several heuristics for a paper ⇒ MSG
  You develop a real application (or want experiments on a real platform) ⇒ GRAS
Most popular API (for now): MSG
Argh! Do I really have to code in C?!
No, not necessarily
Some bindings exist: Java bindings to the MSG interface (new in v3.3)
More bindings planned: C++, Python, and any scripting language; SimDag interface
Well, sometimes yes, but...
SimGrid itself is written in C for speed and portability (no dependency)
All components naturally usable from C (most of them only accessible from C)
XBT eases some difficulties of C (see the sketch below):
Full-featured logs (similar to log4j), exception support (in ANSI C)
Popular abstract data types (dynamic arrays, hash tables, . . . )
Easy string manipulation, configuration, unit testing, . . .
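A small taste of XBT (our sketch; the names follow SimGrid 3.x documentation, where the logging macros INFO0/INFO1 require a default log category, but details may vary across versions):

#include <xbt.h>
#include <xbt/dynar.h>

XBT_LOG_NEW_DEFAULT_CATEGORY(demo, "Messages of this example");

int main(int argc, char **argv)
{
    xbt_init(&argc, argv);
    xbt_dynar_t ints = xbt_dynar_new(sizeof(int), NULL);  /* dynamic array */
    for (int v = 0; v < 3; v++)
        xbt_dynar_push_as(ints, int, v);
    INFO1("Stored %lu integers", xbt_dynar_length(ints));
    xbt_dynar_free(&ints);
    return 0;
}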
What about portability?
Regularly tested under: Linux (x86, amd64), Windows and MacOSX
Supposed to work under any other Unix system (including AIX and Solaris)
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
SimDag: Comparing Scheduling Heuristics for DAGs
[Figure: example task DAG (from Root to End) and the resulting Gantt chart over time]
Main functionalities
1. Create a DAG of tasks
   Vertices: tasks (either communication or computation)
   Edges: precedence relations
2. Schedule tasks on resources
3. Run the simulation (respecting precedences) ⇒ compute the makespan
The SimDag interface
DAG creation
Creating tasks: SD_task_create(name, data)
Creating dependencies: SD_task_dependency_{add/remove}(src, dst)
Scheduling tasks
SD_task_schedule(task, workstation_number, *workstation_list, double *comp_amount, double *comm_amount, double rate)
Tasks are parallel by default; simply set workstation_number to 1 if not
Communications are regular tasks; comm_amount is a matrix
Both computation and communication are possible in the same task
rate: to slow down non-CPU (resp. non-network) bound applications
SD_task_unschedule, SD_task_get_start_time
Running the simulation
SD_simulate(double how_long) (how_long < 0 ⇒ run until the end)
SD_task_{watch/unwatch}: simulation stops as soon as the task's state changes
Full API in the doxygen-generated documentation
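Putting these calls together, a minimal SimDag program could look as follows (our sketch; the prototypes follow the SimGrid 3.x documentation and the names above, but exact signatures may differ across versions — in particular SD_task_create also takes a work amount here):

#include <stdio.h>
#include <simdag/simdag.h>

int main(int argc, char **argv)
{
    SD_init(&argc, argv);
    SD_create_environment("my_platform.xml");

    SD_task_t parent = SD_task_create("parent", NULL, 1e9);  /* 1 GFlop */
    SD_task_t child  = SD_task_create("child",  NULL, 1e9);
    SD_task_dependency_add(NULL, NULL, parent, child);

    /* Schedule each task alone on the first workstation of the platform */
    const SD_workstation_t *ws = SD_workstation_get_list();
    double comp = 1e9, comm = 0.0;        /* 1x1 communication matrix */
    SD_task_schedule(parent, 1, &ws[0], &comp, &comm, -1.0);
    SD_task_schedule(child,  1, &ws[0], &comp, &comm, -1.0);

    SD_simulate(-1.0);                    /* run until the end */
    printf("Makespan: %g\n", SD_get_clock());
    SD_exit();
    return 0;
}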
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
    Motivations, Concepts and Example of Use
    Java bindings
    A Glance at SimGrid Internals
    Performance Results
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
MSG: Heuristics for Concurrent Sequential Processes
(historical) Motivation
Centralized scheduling does not scale
SimDag (and its predecessor) not adapted to the study of decentralized heuristics
MSG not strictly limited to scheduling, but particularly convenient for it
Main MSG abstractions
Agent: some code, some private data, running on a given host
  set of functions + XML deployment file for arguments
Task: amount of work to do and of data to exchange
  MSG_task_create(name, compute_duration, message_size, void *data)
  Communication: MSG_task_{put,get}, MSG_task_Iprobe
  Execution: MSG_task_execute; MSG_process_sleep, MSG_process_{suspend,resume}
Host: location on which agents execute
Mailbox: similar to MPI tags
The MSG master/workers example: the worker
The master has a large number of tasks to dispatch to its workers for execution.

int worker(int argc, char *argv[]) {
  m_task_t task;
  int id = atoi(argv[1]);
  char mailbox[80];
  int errcode;

  sprintf(mailbox, "worker-%d", id);
  while (1) {
    errcode = MSG_task_receive(&task, mailbox);
    xbt_assert0(errcode == MSG_OK, "MSG_task_receive failed");
    if (!strcmp(MSG_task_get_name(task), "finalize")) {
      MSG_task_destroy(task);
      break;
    }
    INFO1("Processing %s", MSG_task_get_name(task));
    MSG_task_execute(task);
    INFO1("%s done", MSG_task_get_name(task));
    MSG_task_destroy(task);
  }
  INFO0("I'm done. See you!");
  return 0;
}
The MSG master/workers example: the master
int master(int argc, char *argv[]) {
  int number_of_tasks = atoi(argv[1]);
  double task_comp_size = atof(argv[2]);
  double task_comm_size = atof(argv[3]);
  int workers_count = atoi(argv[4]);
  char mailbox[80], buff[64];
  m_task_t task;
  int i;

  /* Dispatching (dumb round-robin algorithm) */
  for (i = 0; i < number_of_tasks; i++) {
    sprintf(buff, "Task_%d", i);
    task = MSG_task_create(buff, task_comp_size, task_comm_size, NULL);
    sprintf(mailbox, "worker-%d", i % workers_count);
    INFO2("Sending %s to mailbox %s", MSG_task_get_name(task), mailbox);
    MSG_task_send(task, mailbox);
  }

  /* Send finalization message to workers */
  INFO0("All tasks dispatched. Let's stop workers");
  for (i = 0; i < workers_count; i++) {
    sprintf(mailbox, "worker-%d", i);
    MSG_task_send(MSG_task_create("finalize", 0, 0, NULL), mailbox);
  }
  INFO0("Goodbye now!");
  return 0;
}
The MSG master/workers example: deployment le
Specifying which agent must be run on which host, and with which arguments: the XML deployment file
<?xml version="1.0"?>
<!DOCTYPE platform SYSTEM "[Link]">
<platform version="2">
  <!-- The master process (with some arguments) -->
  <process host="Tremblay" function="master">
    <argument value="6"/>        <!-- Number of tasks -->
    <argument value="50000000"/> <!-- Computation size of tasks -->
    <argument value="1000000"/>  <!-- Communication size of tasks -->
    <argument value="3"/>        <!-- Number of workers -->
  </process>
  <!-- The worker processes (argument: mailbox number to use) -->
  <process host="Jupiter" function="worker"><argument value="0"/></process>
  <process host="Fafard"  function="worker"><argument value="1"/></process>
  <process host="Ginette" function="worker"><argument value="2"/></process>
</platform>
The MSG master/workers example: the main()
Putting things together:

int main(int argc, char *argv[]) {
  /* Declare all existing agents, binding their names to their functions */
  MSG_function_register("master", &master);
  MSG_function_register("worker", &worker);
  /* Load a platform instance */
  MSG_create_environment("my_platform.xml");
  /* Load a deployment file */
  MSG_launch_application("my_deployment.xml");
  /* Launch the simulation (until its end) */
  MSG_main();
  INFO1("Simulation took %g seconds", MSG_get_clock());
}
The MSG master/workers example: raw output
[Tremblay:master:(1) 0.000000] [example/INFO] Got 3 workers and 6 tasks to process
[Tremblay:master:(1) 0.000000] [example/INFO] Sending Task_0 to worker-0
[Tremblay:master:(1) 0.147613] [example/INFO] Sending Task_1 to worker-1
[Jupiter:worker:(2) 0.147613] [example/INFO] Processing Task_0
[Tremblay:master:(1) 0.347192] [example/INFO] Sending Task_2 to worker-2
[Fafard:worker:(3) 0.347192] [example/INFO] Processing Task_1
[Tremblay:master:(1) 0.475692] [example/INFO] Sending Task_3 to worker-0
[Ginette:worker:(4) 0.475692] [example/INFO] Processing Task_2
[Jupiter:worker:(2) 0.802956] [example/INFO] Task_0 done
[Tremblay:master:(1) 0.950569] [example/INFO] Sending Task_4 to worker-1
[Jupiter:worker:(2) 0.950569] [example/INFO] Processing Task_3
[Fafard:worker:(3) 1.002534] [example/INFO] Task_1 done
[Tremblay:master:(1) 1.202113] [example/INFO] Sending Task_5 to worker-2
[Fafard:worker:(3) 1.202113] [example/INFO] Processing Task_4
[Ginette:worker:(4) 1.506790] [example/INFO] Task_2 done
[Jupiter:worker:(2) 1.605911] [example/INFO] Task_3 done
[Tremblay:master:(1) 1.635290] [example/INFO] All tasks dispatched. Let's stop workers.
[Ginette:worker:(4) 1.635290] [example/INFO] Processing Task_5
[Jupiter:worker:(2) 1.636752] [example/INFO] I'm done. See you!
[Fafard:worker:(3) 1.857455] [example/INFO] Task_4 done
[Fafard:worker:(3) 1.859431] [example/INFO] I'm done. See you!
[Ginette:worker:(4) 2.666388] [example/INFO] Task_5 done
[Tremblay:master:(1) 2.667660] [example/INFO] Goodbye now!
[Ginette:worker:(4) 2.667660] [example/INFO] I'm done. See you!
[2.667660] [example/INFO] Simulation time 2.66766
The MSG master/workers example: colorized output
$ ./my_simulator | MSG_visualization/[Link]
[ 0.000][ Tremblay:master ] Got 3 workers and 6 tasks to process
[ 0.000][ Tremblay:master ] Sending Task_0 to worker-0
[ 0.148][ Tremblay:master ] Sending Task_1 to worker-1
[ 0.148][ Jupiter:worker  ] Processing Task_0
[ 0.347][ Tremblay:master ] Sending Task_2 to worker-2
[ 0.347][ Fafard:worker   ] Processing Task_1
[ 0.476][ Tremblay:master ] Sending Task_3 to worker-0
[ 0.476][ Ginette:worker  ] Processing Task_2
[ 0.803][ Jupiter:worker  ] Task_0 done
[ 0.951][ Tremblay:master ] Sending Task_4 to worker-1
[ 0.951][ Jupiter:worker  ] Processing Task_3
[ 1.003][ Fafard:worker   ] Task_1 done
[ 1.202][ Tremblay:master ] Sending Task_5 to worker-2
[ 1.202][ Fafard:worker   ] Processing Task_4
[ 1.507][ Ginette:worker  ] Task_2 done
[ 1.606][ Jupiter:worker  ] Task_3 done
[ 1.635][ Tremblay:master ] All tasks dispatched. Let's stop workers.
[ 1.635][ Ginette:worker  ] Processing Task_5
[ 1.637][ Jupiter:worker  ] I'm done. See you!
[ 1.857][ Fafard:worker   ] Task_4 done
[ 1.859][ Fafard:worker   ] I'm done. See you!
[ 2.666][ Ginette:worker  ] Task_5 done
[ 2.668][ Tremblay:master ] Goodbye now!
[ 2.668][ Ginette:worker  ] I'm done. See you!
[ 2.668][                 ] Simulation time 2.66766
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
    Motivations, Concepts and Example of Use
    Java bindings
    A Glance at SimGrid Internals
    Performance Results
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
MSG bindings for Java: master/workers example
import [Link].*;

public class BasicTask extends [Link] {
  public BasicTask(String name, double computeDuration, double messageSize)
      throws JniException {
    super(name, computeDuration, messageSize);
  }
}

public class FinalizeTask extends [Link] {
  public FinalizeTask() throws JniException {
    super("finalize", 0, 0);
  }
}

public class Worker extends [Link] {
  public void main(String[] args) throws JniException, NativeException {
    String id = args[0];
    while (true) {
      Task t = [Link]("worker-" + id);
      if (t instanceof FinalizeTask)
        break;
      BasicTask task = (BasicTask) t;
      [Link]("Processing " + [Link]());
      [Link]();
      [Link]([Link]() + " done");
    }
    [Link]("Received Finalize. I'm done. See you!");
  }
}
MSG bindings for Java: master/workers example
import [Link].*;

public class Master extends [Link] {
  public void main(String[] args) throws JniException, NativeException {
    int numberOfTasks = [Link](args[0]).intValue();
    double taskComputeSize = [Link](args[1]).doubleValue();
    double taskCommunicateSize = [Link](args[2]).doubleValue();
    int workerCount = [Link](args[3]).intValue();
    [Link]("Got " + workerCount + " workers and " + numberOfTasks + " tasks.");

    for (int i = 0; i < numberOfTasks; i++) {
      BasicTask task = new BasicTask("Task_" + i, taskComputeSize, taskCommunicateSize);
      [Link]("worker-" + (i % workerCount));
      [Link]("Send completed for the task " + [Link]()
               + " on the mailbox worker-" + (i % workerCount));
    }
    [Link]("Goodbye now!");
  }
}
MSG bindings for Java: master/workers example
Rest of the story
XML files (platform, deployment) not modified
No need for a main() function gluing things together
The Java introspection mechanism is used for this; [Link] contains an adapted main() function
Names of the XML files must be passed as command-line arguments
Output very similar too
What about performance loss?

#tasks           #workers:  100    500    1,000  5,000  10,000
1,000      native           .16    .19    .21    .42    0.74
           java             .41    .59    .94    7.6    27.
10,000     native           .48    .52    .54    .83    1.1
           java             1.6    1.9    2.38   13.    40.
100,000    native           3.7    3.8    4.0    4.4    4.5
           java             14.    13.    15.    29.    77.
1,000,000  native           36.    37.    38.    41.    40.
           java             121.   130.   134.   163.   200.

Small platforms: OK. Larger ones: not quite. . .
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
    Motivations, Concepts and Example of Use
    Java bindings
    A Glance at SimGrid Internals
    Performance Results
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
Implementation of CSPs on top of simulation kernel
Idea
Each process is implemented in a thread
Blocking actions (execution and communication) are reported to the kernel
A maestro thread unlocks the runnable threads (when their action is done); see the sketch after the example below
Example
Thread A: send "toto" to B; receive something from B
Thread B: receive something from A; send "blah" to A
Maestro (simulation kernel): decides who runs next
The maestro schedules the threads in the order given by the simulation kernel
Execution is mutually exclusive (don't fear)
[Figure: the maestro alternately unblocks thread A and thread B as their actions complete]
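The following program (our sketch, not SimGrid source) shows one way to implement this pattern with one mutex/condition pair per simulated process; the maestro wakes exactly one thread at a time and waits until it blocks again:

#include <pthread.h>
#include <stdio.h>

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    int scheduled;            /* 1 while this process may run */
    const char *name;
} process_t;

static void *process_body(void *arg)
{
    process_t *p = arg;
    for (int step = 0; step < 2; step++) {
        pthread_mutex_lock(&p->mtx);
        while (!p->scheduled)                 /* wait to be unlocked by maestro */
            pthread_cond_wait(&p->cond, &p->mtx);
        printf("%s: doing blocking action %d\n", p->name, step);
        p->scheduled = 0;                     /* action issued: yield back */
        pthread_cond_signal(&p->cond);
        pthread_mutex_unlock(&p->mtx);
    }
    return NULL;
}

static void schedule(process_t *p)            /* run p until it blocks again */
{
    pthread_mutex_lock(&p->mtx);
    p->scheduled = 1;
    pthread_cond_signal(&p->cond);
    while (p->scheduled)
        pthread_cond_wait(&p->cond, &p->mtx);
    pthread_mutex_unlock(&p->mtx);
}

int main(void)
{
    process_t a = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, "A"};
    process_t b = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, "B"};
    pthread_t ta, tb;
    pthread_create(&ta, NULL, process_body, &a);
    pthread_create(&tb, NULL, process_body, &b);

    /* The "simulation kernel" decides the order: A, B, A, B */
    schedule(&a); schedule(&b); schedule(&a); schedule(&b);

    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}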
A Glance at SimGrid Internals
[Architecture diagram, top to bottom:]
User APIs: SimDag, MSG, GRAS, SMPI (through SMURF, the SimIX network proxy)
SimIX: POSIX-like API on a virtual platform
SURF: virtual platform simulator
XBT
SURF: simulation kernel, grounding the simulation
Contains all the models (uses GTNetS on demand)
SimIX: eases the writing of user APIs based on CSPs
Provided semantics: threads, mutexes and conditions on top of the simulator
SMURF: allows distributing the simulation over a cluster (still to do)
Not for speed but for the memory limit (at least for now)
More on SimGrid internals
Agenda
Distributed Systems Experiments
  Methodological Issues
  Main Methodological Approaches
  Tools for Experimentation in Large-Scale Distributed Systems
Resource Models
  Analytic Models Underlying SimGrid
SimGrid Architecture and Features
  Overview of the SimGrid Components
  SimDag: Comparing Scheduling Heuristics for DAGs
  MSG: Comparing Heuristics for Concurrent Sequential Processes
    Motivations, Concepts and Example of Use
    Java bindings
    A Glance at SimGrid Internals
    Performance Results
  GRAS: Developing and Debugging Real Applications
  SMPI: Running MPI applications on top of SimGrid
Conclusion
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (75/96)
Some Performance Results
Master/Workers on amd64 with 4GB (timings in seconds)

                                         #Workers
#tasks      Context      100    500    1,000   5,000   10,000   25,000
1,000       ucontext     0.16   0.19   0.21    0.42    0.74     1.66
            pthread      0.15   0.18   0.19    0.35    0.55     (*)
            java         0.41   0.59   0.94    7.6     27.      --
10,000      ucontext     0.48   0.52   0.54    0.83    1.1      1.97
            pthread      0.51   0.56   0.57    0.78    0.95     (*)
            java         1.6    1.9    2.38    13.     40.      --
100,000     ucontext     3.7    3.8    4.0     4.4     4.5      5.5
            pthread      4.7    4.4    4.6     5.0     5.23     (*)
            java         14.    13.    15.     29.     77.      --
1,000,000   ucontext     36.    37.    38.     41.     40.      41.
            pthread      42.    44.    46.     48.     47.      (*)
            java         121.   130.   134.    163.    200.     --

(*): #semaphores reached system limit (2 semaphores per user process, system limit = 32k semaphores); --: not available
Extensibility with UNIX contexts (ucontext, timings in seconds)

                                       #tasks
#Workers    Stack size   1,000   10,000   100,000   1,000,000   5,000,000
25,000      128KB        1.6     2        5.5       41          206
            12KB         0.5     0.8      3.7       33          161
50,000      12KB         0.9     1.2      4.1       33.6        167
100,000     12KB         1.7     2        4.8       33.7        161
200,000     12KB         3.2     3.5      6.7       35.5        165
(128KB stacks beyond 25,000 workers: out of memory)

Scalability limit of GridSim: 1 user process = 3 java threads (code, input, output);
system limit = 32k threads, hence at most 10,922 user processes
SimGrid Architecture and Features (76/96)
SimGrid for Research on Large-Scale Distributed Systems
Agenda
Distributed Systems Experiments Methodological Issues Main Methodological Approaches Tools for Experimentations in Large-Scale Distributed Systems Resource Models Analytic Models Underlying SimGrid SimGrid Architecture and Features Overview of the SimGrid Components SimDag: Comparing Scheduling Heuristics for DAGs MSG: Comparing Heuristics for Concurrent Sequential Processes GRAS: Developing and Debugging Real Applications
Motivation and project goals Functionalities Experimental evaluation (performance and simplicity) Conclusion and Perspectives
SMPI: Running MPI applications on top of SimGrid Conclusion
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (77/96)
Goals of the GRAS project (Grid Reality And Simulation) Ease development of large-scale distributed apps
Development of real distributed applications using a simulator
(Figure: without GRAS, the research code must be rewritten into development code; with GRAS, one code targets a single API with two implementations: GRDK on top of SimGrid for research, GRE for in-situ deployment)
Framework for Rapid Development of Distributed Infrastructure
Develop and tune on the simulator; deploy in situ without modification. How: one API, two implementations
Efficient Grid Runtime Environment (result = application = prototype)
Performance concern: efficient communication of structured data. How: efficient wire protocol (avoid data conversion). Portability concern: because of grid heterogeneity. How: ANSI C + autoconf + no dependency
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (78/96)
Agenda
Distributed Systems Experiments Methodological Issues Main Methodological Approaches Tools for Experimentations in Large-Scale Distributed Systems Resource Models Analytic Models Underlying SimGrid SimGrid Architecture and Features Overview of the SimGrid Components SimDag: Comparing Scheduling Heuristics for DAGs MSG: Comparing Heuristics for Concurrent Sequential Processes GRAS: Developing and Debugging Real Applications
Motivation and project goals Functionalities Experimental evaluation (performance and simplicity) Conclusion and Perspectives
SMPI: Running MPI applications on top of SimGrid Conclusion
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (79/96)
Main concepts of the GRAS API
Agents (acting entities)
Code (C function) Private data Location (hosting computer)
Sockets (communication endpoints)
Server socket: to receive messages Client socket: to contact a server (and receive answers)
Messages (what gets exchanged between agents)
Semantics: message type; payload described by a data type description (fixed for a given type)
Callbacks (code to execute when a message is received)
Also possible to explicitly wait for given messages
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (80/96)
Emulation and Virtualization
Same code runs without modification both in simulation and in situ
In simulation, agents run as threads within a single process; in situ, each agent runs within its own process
Emulation issues
How to get the process sleeping? How to get the current time?
System calls are virtualized: gras_os_time; gras_os_sleep
How to report computation time into the simulator?
Asked explicitly by the user, using provided macros; the time to report can be benchmarked automatically
What about global data?
Agent status placed in a specific structure, with an ad-hoc manipulation API
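For instance, an agent that polls periodically relies only on these virtualized calls (gras_os_time and gras_os_sleep are quoted from the slide; the surrounding monitor agent is a hypothetical sketch assuming gras_os_sleep takes seconds as a double):

  #include "gras.h"

  /* Hypothetical agent body: the gras_os_* calls are the virtualized
   * system calls named above; everything else is illustrative. */
  int monitor(int argc, char *argv[]) {
    gras_init(&argc, argv);
    double start = gras_os_time();    /* simulated or real clock */
    while (gras_os_time() - start < 60.0) {
      /* ... do some monitoring work here ... */
      gras_os_sleep(5.0);             /* simulated or real sleep */
    }
    gras_exit();
    return 0;
  }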
SimGrid for Research on Large-Scale Distributed Systems
SimGrid Architecture and Features
(81/96)
Example of code: ping-pong (1/2)
Code common to client and server
#include "gras.h" XBT_LOG_NEW_DEFAULT_CATEGORY(test,"Messages specific to this example" ); static void register_messages(void) { gras_msgtype_declare("ping", gras_datadesc_by_name("int" )); gras_msgtype_declare("pong", gras_datadesc_by_name("int" )); }
Client code
int client(int argc, char *argv[]) {
  gras_socket_t peer = NULL, from;
  int ping = 1234, pong;

  gras_init(&argc, argv);
  gras_os_sleep(1); /* Wait for the server startup */
  peer = gras_socket_client("[Link]", 4000);
  register_messages();
  gras_msg_send(peer, "ping", &ping);
  INFO3("PING(%d) -> %s:%d", ping, gras_socket_peer_name(peer), gras_socket_peer_port(peer));
  gras_msg_wait(6000, "pong", &from, &pong);
  gras_exit();
  return 0;
}
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (82/96)
Example of code: ping-pong (2/2)
Server code
typedef struct { /* Global private data */
  int endcondition;
} server_data_t;

int server_cb_ping_handler(gras_msg_cb_ctx_t ctx, void *payload_data); /* forward declaration */

int server(int argc, char *argv[]) {
  server_data_t *globals;

  gras_init(&argc, argv);
  globals = gras_userdata_new(server_data_t);
  globals->endcondition = 0;
  gras_socket_server(4000);
  register_messages();
  gras_cb_register("ping", &server_cb_ping_handler);

  while (!globals->endcondition) { /* Handle messages until our state changes */
    gras_msg_handle(600.0);        /* Actually, one ping is enough for that   */
  }
  free(globals);
  gras_exit();
  return 0;
}

int server_cb_ping_handler(gras_msg_cb_ctx_t ctx, void *payload_data) {
  server_data_t *globals = (server_data_t*)gras_userdata_get(); /* Get the globals */
  globals->endcondition = 1;
  int msg = *(int*)payload_data;                       /* What's the content? */
  gras_socket_t expeditor = gras_msg_cb_ctx_from(ctx); /* Who sent it?        */
  /* Send data back as payload of a pong message to the ping's expeditor */
  gras_msg_send(expeditor, "pong", &msg);
  return 0;
}
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (83/96)
Exchanging structured data
GRAS wire protocol: NDR (Native Data Representation)
Avoid data conversion when possible: the sender writes data on the socket as they are in memory; if the receiver's architecture matches, no conversion; the receiver is able to convert from any architecture
GRAS message payload can be any valid C type
Structure, enumeration, array, pointer, . . . A classical garbage-collection algorithm deep-copies it; cycles in pointed structures are detected & recreated
Describing a data type to GRAS
Automatic description of vector:

  GRAS_DEFINE_TYPE(s_vect,
    struct s_vect {
      int cnt;
      double *data GRAS_ANNOTE(size,cnt);
    }
  );

Manual description (excerpt):

  gras_datadesc_type_t gras_datadesc_struct(name);
  gras_datadesc_struct_append(struct_type, name, field_type);
  gras_datadesc_struct_close(struct_type);
C declaration stored into a char* variable to be parsed at runtime
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (84/96)
Agenda
Distributed Systems Experiments Methodological Issues Main Methodological Approaches Tools for Experimentations in Large-Scale Distributed Systems Resource Models Analytic Models Underlying SimGrid SimGrid Architecture and Features Overview of the SimGrid Components SimDag: Comparing Scheduling Heuristics for DAGs MSG: Comparing Heuristics for Concurrent Sequential Processes GRAS: Developing and Debugging Real Applications
Motivation and project goals Functionalities Experimental evaluation (performance and simplicity) Conclusion and Perspectives
SMPI: Running MPI applications on top of SimGrid Conclusion
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (85/96)
Assessing communication performance
Only communication performance is studied since computations are not mediated. Experiment: timing a ping-pong of structured data (a message of Pastry)
typedef struct {
  int which_row;
  int row[COLS][MAX_ROUTESET];
} row_t;

typedef struct {
  int id, row_count;
  double time_sent;
  row_t *rows;
  int leaves[MAX_LEAFSET];
} welcome_msg_t;
Tested solutions
GRAS PBIO (uses NDR) OmniORB (classical CORBA solution) MPICH (classical MPI solution) XML (Expat parser + handcrafted communication)
Platform: x86, PPC, sparc (all under Linux)
SimGrid for Research on Large-Scale Distributed Systems
SimGrid Architecture and Features
(86/96)
Performance on a LAN
(Figure: 3x3 grid of log-scale bar charts, one per sender/receiver pair among ppc, sparc and x86, plotting ping-pong time (from 10^-4 to 10^-2 s) for GRAS, MPICH, OmniORB, PBIO and XML; measured times range from 0.5 ms to 55.7 ms, XML being consistently the slowest and several combinations n/a)
MPICH twice as fast as GRAS, but cannot mix little- and big-endian Linux
PBIO broken on PPC
XML much slower (extra conversions + verbose wire encoding)
GRAS is the better compromise between performance and portability
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (87/96)
Assessing API simplicity
Experiment: ran code complexity measurements on code for previous experiment
           McCabe Cyclomatic Complexity   Number of lines of code
GRAS                8                             48
MPICH              10                             65
PBIO               10                             84
OmniORB            12                             92
XML                35                            150
Results discussion
XML complexity may be an artefact of the Expat parser (but fastest)
MPICH: manual marshaling/unmarshaling
PBIO: automatic marshaling, but manual type description
OmniORB: automatic marshaling, IDL as type description
GRAS: automatic marshaling & type description (the IDL is C)
Conclusion: GRAS is the least demanding solution from the developer's perspective
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (88/96)
Conclusion: GRAS eases infrastructure development
(Architecture diagram: SimDag, MSG, SMPI and SMURF (SimIX network proxy), plus GRAS with GRE (GRAS in situ), over SimIX (POSIX-like API on a virtual platform), SURF (virtual platform simulator) and XBT; with GRAS, one code serves both research & development)
GRDK: Grid Research & Development Kit
API for (explicitly) distributed applications Study applications in the comfort of the simulator
GRE: Grid Runtime Environment
Efficient: twice as slow as MPICH, faster than OmniORB, PBIO, XML. Portable: Linux (11 CPU archs); Windows; Mac OS X; Solaris; IRIX; AIX. Simple and convenient:
API simpler than classical communication libraries (+XBT tools). Easy to deploy: ANSI C; no dependency; autotools; <400kB
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (89/96)
GRAS perspectives Future work on GRAS
Performance: type precompilation, communication taming and compression GRASPE (GRAS Platform Expender) for automatic deployment Model-checking as third mode along with simulation and in-situ execution
Details
Ongoing applications
Comparison of P2P protocols (Pastry, Chord, etc) Use emulation mode to validate SimGrid models Network mapper (ALNeM): capture platform descriptions for simulator Large scale mutual exclusion service
Future applications
Platform monitoring tool (bandwidth and latency) Group communications & RPC; Application-level routing; etc.
SimGrid for Research on Large-Scale Distributed Systems
SimGrid Architecture and Features
(90/96)
Agenda
Distributed Systems Experiments Methodological Issues Main Methodological Approaches Tools for Experimentations in Large-Scale Distributed Systems Resource Models Analytic Models Underlying SimGrid SimGrid Architecture and Features Overview of the SimGrid Components SimDag: Comparing Scheduling Heuristics for DAGs MSG: Comparing Heuristics for Concurrent Sequential Processes GRAS: Developing and Debugging Real Applications SMPI: Running MPI applications on top of SimGrid Conclusion
SimGrid for Research on Large-Scale Distributed Systems
SimGrid Architecture and Features
(91/96)
SMPI: Running MPI applications on top of SimGrid
Motivations
Reproducible experimentation of MPI code (debugging) Test MPI code on still-to-build platform (dimensioning)
How it works
smpicc changes MPI calls into SMPI ones (gettimeofday is also intercepted); smpirun starts a classical simulation obeying -hostfile and -np; runs unmodified MPI code after recompilation
Implemented calls
Isend; Irecv. Recv; Send. Wait; Waitall; Waitany. Barrier; Bcast; Reduce; Allreduce (command-line option to choose binary or flat tree). Comm_size; Comm_rank; Comm_split. Wtime. Init; Finalize; Abort.
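As an illustration (this program is ours, not from the slides), the following MPI code uses only calls from the list above and should therefore run under smpirun once recompiled with smpicc:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[]) {
    int rank, size, token = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
      MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);       /* pass a token */
    } else if (rank == 1) {
      MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("[%d/%d] got token %d\n", rank, size, token);
    }
    MPI_Barrier(MPI_COMM_WORLD);                                /* sync everyone */
    MPI_Finalize();
    return 0;
  }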
Future Work
Implement the rest of the API; test it more thoroughly; use it to validate SimGrid at the application level (with NAS et al.)
SimGrid for Research on Large-Scale Distributed Systems SimGrid Architecture and Features (92/96)
Agenda
Distributed Systems Experiments Methodological Issues Main Methodological Approaches Tools for Experimentations in Large-Scale Distributed Systems Resource Models Analytic Models Underlying SimGrid SimGrid Architecture and Features Overview of the SimGrid Components SimDag: Comparing Scheduling Heuristics for DAGs MSG: Comparing Heuristics for Concurrent Sequential Processes GRAS: Developing and Debugging Real Applications SMPI: Running MPI applications on top of SimGrid Conclusion
SimGrid for Research on Large-Scale Distributed Systems
Conclusion
(93/96)
Conclusions on Distributed Systems Research
Research on Large-Scale Distributed Systems
Reflection about common methodologies needed (reproducible results needed); purely theoretical works limited (simplistic settings, NP-complete problems); real-world experiments time- and labor-consuming, limited representativity; simulation appealing, if results remain validated
Simulating Large-Scale Distributed Systems
Packet-level simulators too slow for large-scale studies; large number of ad-hoc simulators, but questionable validity; coarse-grain modeling of TCP flows possible (cf. networking community); model instantiation (platform mapping or generation) remains challenging
SimGrid provides interesting models
Implements non-trivial coarse-grain models for resources and sharing; validity results encouraging with regard to packet-level simulators; several orders of magnitude faster than packet-level simulators; several models available, with the ability to plug new ones or use a packet-level simulator
SimGrid for Research on Large-Scale Distributed Systems Conclusion (94/96)
SimGrid provides several user interfaces
SimDag: Comparing Scheduling Heuristics for DAGs of (parallel) tasks
Declare tasks, their precedences, schedule them on resources, get the makespan
MSG: Comparing Heuristics for Concurrent Sequential Processes
Declare independent agents running a given function on a host; let them exchange and execute tasks; easy interface, rapid prototyping, Java bindings; new in SimGrid v3.3.1: trace-driven simulations
GRAS: Developing and Debugging Real Applications
Develop once, run in simulation or in situ (debug; test on non-existing platforms); resulting application twice as slow as MPICH, faster than OmniORB; highly portable and easy to deploy
SMPI: Running MPI applications on top of SimGrid (new in 3.3.1)
Runs unmodified MPI code after recompilation (still partial implementation)
Other interfaces possible: OpenMP, BSP-like (any volunteer?)
SimGrid for Research on Large-Scale Distributed Systems Conclusion (95/96)
SimGrid is an active and exciting project
Future Plans
Improve usability
(statistics tools, campaign management)
Extreme Scalability for P2P; Model-checking of GRAS applications; Emulation solution à la MicroGrid
(Architecture diagram as before: SimDag, MSG, SMPI and SMURF (SimIX network proxy), plus GRAS with GRE (GRAS in situ), over SimIX (POSIX-like API on a virtual platform), SURF (virtual platform simulator) and XBT)
Large community
[Link] 130 subscribers to the user mailing list (40 to -devel) 40 scientific publications using the tool for their experiments
15 co-signed by one of the core-team members 25 purely external
LGPL, 120,000 lines of code (half for examples and regression tests) Examples, documentation and tutorials on the web page
Use it in your work!
SimGrid for Research on Large-Scale Distributed Systems Conclusion (96/96)
Detailed agenda
Distributed Systems Experiments Methodological Issues Main Methodological Approaches
Real-world experiments Simulation
Appendix (extra material)
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work SimGrid Internals SURF
Big Picture Models How Models get used Actions and Resources Writing your own model Adding new kind of models
Tools for Experimentations in Large-Scale Distributed Systems
Possible designs Experimentation platforms: Grid5000 and PlanetLab Emulators: ModelNet and MicroGrid Packet-level Simulators: ns-2, SSFNet and GTNetS Ad-hoc simulators: ChicagoSim, OptorSim, GridSim, . . . Peer to peer simulators SimGrid
Resource Models Analytic Models Underlying SimGrid
Modeling a Single Resource Multi-hop Networks Resource Sharing
Simix
Big picture
Global Elements Simix Process
SimGrid Architecture and Features Overview of the SimGrid Components SimDag: Comparing Scheduling Heuristics for DAGs MSG: Comparing Heuristics for Concurrent Sequential Processes
Motivations, Concepts and Example of Use Java bindings A Glance at SimGrid Internals Performance Results
GRAS: Developing and Debugging Real Applications
Motivation and project goals Functionalities Experimental evaluation (performance and simplicity) Conclusion and Perspectives
SMPI: Running MPI applications on top of SimGrid Conclusion
SimGrid for Research on Large-Scale Distributed Systems
Conclusion
(97/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(98/96)
Model-checking GRAS application (ongoing work)
Executive Summary
Motivation
GRAS allows debugging an application on the simulator and deploying it once it works. Problem: when to decide that it works?
Demonstrating a theorem: conversion to C is difficult. Testing: some cases may still fail on other cases
Model-checking
Given an initial situation (we have three nodes), test all possible executions (A gets the first message first, B does, C does, . . . ); combinatorial search in the tree of possibilities; fight combinatorial explosion: cycle detection, symmetry, abstraction
Model-checking in GRAS
First difficulty: checkpoint simulated processes (to rewind the simulation). Induced difficulty: decide when to checkpoint processes. Second difficulty: fight against combinatorial explosion
SimGrid for Research on Large-Scale Distributed Systems Bonus: Model-Checking in SimGrid
Back
(99/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(100/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(101/96)
Formal methods
Goal: Develop safe software using automated methods
Strong mathematical background; safe = respects some given properties
Kind of properties shown
Safety: the car does not start without the key. Liveness: if I push the brake pedal, the car will eventually stop
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(102/96)
Existing Formal Methods
Algorithmic Verification Proof ModelChecking
Proof of programs
In theory, applicable to any class of program. In practice, quite tedious to use; often limited to helping a specialist do the actual work (system state explosion)
Model-checking
Shows that a system:
(safety) never evolves to a faulty state from a given initial state; (liveness) always evolves to the wanted state (stopping) from a given state (braking)
Less generic than proof: lack of faulty states for all initial states? Usable by non-specialists (at least, by less-specialists)
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(103/96)
Example of problem to detect: Race Condition
x is a shared variable; Alice adds 2, Bob adds 5; correct result: x = 7
a. Read the value of shared variable x and store it locally
b. Modify the local value (add 5 or 2)
c. Propagate the local value into the shared one
Execution of Alice then Bob (or the opposite): result = 7
Interleaved execution: result = 2 or 5 (depending on the last propagator)
Model-checking: traverse the graph of executions, checking for properties
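Written as C threads, the example looks like the hypothetical sketch below (the add() helper and variable names are ours):

  #include <pthread.h>
  #include <stdio.h>

  int x = 0;                               /* shared variable           */

  static void *add(void *arg) {
    int inc = *(int *)arg;
    int lv = x;                            /* a. read shared x          */
    lv = lv + inc;                         /* b. modify local value     */
    x = lv;                                /* c. propagate back         */
    return NULL;
  }

  int main(void) {
    pthread_t a, b;
    int two = 2, five = 5;
    pthread_create(&a, NULL, add, &two);   /* Alice adds 2 */
    pthread_create(&b, NULL, add, &five);  /* Bob adds 5   */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("x = %d\n", x);  /* 7 if serialized; 2 or 5 if interleaved */
    return 0;
  }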
(State graph: all interleavings of Alice's and Bob's steps a-c starting from x = 0; final states are x = 7 when the executions serialize, and x = 2 or x = 5 when they interleave)
Safety: assertions on each node
SimGrid for Research on Large-Scale Distributed Systems
Liveness by studying graph (cycle?)
Bonus: Model-Checking in SimGrid
Back
(104/96)
Model-Checking Big Picture
1. User writes the Model (formal writing of the algorithm) and the Specification (set of properties)
2. Each decision point in the model (if, input data) becomes a branch in the model state space
3. Check safety properties on each encountered node (state)
4. Store encountered nodes (to avoid looping) and transitions (to check liveness)
5. Process until:
State space completely traversed (=> model verified against this specification); one of the properties does not hold (the path to here is a counter-example); we run out of resources (state space explosion)
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(105/96)
Classical eld of application
Concurrent systems (i.e., multithreaded; shared memory)
Race condition: the result depends on execution order; Deadlock: infinite wait between processes; Starvation: one process prevented from using a free resource
Project goal: Extend to distributed systems
Very little work done in this area Collaboration with Mosel team of INRIA
SimGrid for Research on Large-Scale Distributed Systems Bonus: Model-Checking in SimGrid
Back
(106/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(107/96)
Adding Model-Checking to SimGrid
Diculties in Distributed System
Race conditions, deadlock and starvation, just as in concurrent algorithms
Lack of global state: only local information available
Asynchronism: no bound on communication time, so failures are hard to detect
A model-checker for distributed algorithms is thus appealing
But wait a minute...
Wasn't the simulator meant to test distributed algorithms already?!
Simulation is better than real deployment because it is deterministic But possibly very low code coverage Model-Checking improves this, and provides counter-examples Simulation to assess performance, Model-checking to assess correctness
Do not merge 2 tools in 1 and KISS instead!
Avoid manual translation between formalisms to avoid introduction of errors Simulator and model-checker both need to:
Simulation of the environment (processes, network, messages); control over the scheduling of the processes; interception of the communications
SimGrid for Research on Large-Scale Distributed Systems Bonus: Model-Checking in SimGrid
Back
(108/96)
Identifying state transitions in SimGrid
Main MC challenge
Concurrent systems lead to many interleavings Rapidly lead to state space explosion
(don't even dream of model-checking a system with as many as 10 processes)
Good news in our context:
Processes' state spaces are isolated; interaction only through message passing
(partial history; think of Lamport's clocks)
No need to consider all possible instruction interleavings: massive state-space reduction is possible (but an open research question)
Considered transitions in SimGrid
Messages send/receive only Coincide with simulation scheduling points!
See it again
(Same sequence diagram as before: the simulation kernel decides "who's next?"; the sends and receives complete in the scheduled order)
Model-Checking logic can go in the maestro
SimGrid for Research on Large-Scale Distributed Systems Bonus: Model-Checking in SimGrid
Back
(109/96)
Traversing the State Graph
for (i = 0; i < slave_count; i++)
  if (power[i] > 12)
    send(big_task);
  else
    send(small_task);
How to follow several execution paths?
Need to save & restore the state in order to rewind the application. Most existing tools work by somehow interpreting a DSL; others are state-less (rerun the app from the start). In C, we need system support
Checkpointing threads
Intercept memory allocation functions; use a special dynamic memory manager
#define malloc(a) my_malloc(a) /* and friends */ Reroute the heap into separate memory segments (with shm ops)
Also save/restore the stack (memcpy) and registers (setjmp) of the processes
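A rough sketch of the stack-and-registers part, assuming a hypothetical checkpoint_t helper (the real mechanism also reroutes the heap as described above; note that the setjmp frame must still be alive when longjmp fires, and restoring a stack in use requires running on a separate stack):

  #include <setjmp.h>
  #include <string.h>

  /* Hypothetical per-process checkpoint: registers via setjmp,
   * stack via memcpy; assumes stack_size <= sizeof(stack). */
  typedef struct {
    jmp_buf regs;                /* saved registers          */
    char    stack[128 * 1024];   /* saved copy of the stack  */
    void   *stack_bottom;
    size_t  stack_size;
  } checkpoint_t;

  int checkpoint_save(checkpoint_t *c) {
    if (setjmp(c->regs) == 0) {                         /* save registers */
      memcpy(c->stack, c->stack_bottom, c->stack_size); /* save stack     */
      return 0;                                         /* just saved     */
    }
    return 1;                                           /* just restored  */
  }

  void checkpoint_restore(checkpoint_t *c) {
    memcpy(c->stack_bottom, c->stack, c->stack_size);   /* rewind stack   */
    longjmp(c->regs, 1);                                /* rewind regs    */
  }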
SimGrid for Research on Large-Scale Distributed Systems Bonus: Model-Checking in SimGrid
Back
(110/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: Model-Checking in SimGrid
Back
(111/96)
Current Status and Future Work
We implemented a rst prototype
Works for simple unmodified C GRAS programs; checks safety properties (C boolean functions; limited by C scoping rules); simple state saving/restoring mechanism (process-wide); simple depth-first search exploration (no search heuristic yet)
Current Work
Isolating each simulated process's address space; separating the network into its own address space; supporting the other SimGrid APIs
Future Work
Exploit the heap symmetries (heap canonicalization); implement partial-order reductions; verification of LTL properties
SimGrid for Research on Large-Scale Distributed Systems Bonus: Model-Checking in SimGrid
Back
(112/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(113/96)
SimGrid Internals
Some Numbers
v3.3 is 120k sloc (w/o blanks; with comments) (Core lib: 47k; Tests: 69k; Doc: 3.5k; +Build)
v3.2 was 55k sloc (Core lib: 30k; Tests: 21k; Doc: 3k; +Build)
(Architecture diagram as before: SimDag, MSG, SMPI and SMURF (SimIX network proxy), plus GRAS / GRE, over SimIX (POSIX-like API on a virtual platform), SURF (virtual platform simulator) and XBT)
SimGrid is quite strictly layered (and built bottom-up)
XBT: Extensive toolbox
Logs, exceptions, backtraces, ADT, strings, configuration, testing; portability
SURF: Simulation kernel, grounding simulation
Main concepts: Resources (providing power) and actions (using resources)
SimIX: Eases the writing of user APIs based on CSPs
Adds concepts: Processes (running user code), plus mutex and conditions for synchro
User interfaces: adds API-specic concepts
(GRAS data description or MSG mailboxes; SimDag is different: directly on top of SURF)
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(114/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work SimGrid Internals SURF
Big Picture Models How Models get used Actions and Resources Writing your own model Adding new kind of models
Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(115/96)
SURF
Big picture
Resources provide power; created through the XML platform file
Actions are consumptions of this power; created by upper layers
Designed with model extensibility and simulation efficiency in mind
object oriented, but not in a canonical way
Models
They express how resources get consumed by actions; act as a class for action objects (heavy use of function pointers), and as a factory. Several kinds of models exist (one of each kind is in use during a simulation)
CPU model: factory of compute and sleep actions; Link model: factory of communicate actions; Workstation model: aggregation pattern of CPU and link; Routing: not exactly a model; provides the list of links between two hosts
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(116/96)
Models class diagram
class surf_model_t:
  xbt_dict_t resource_set; char *name
  xbt_swag_t states.ready_action_set / running_action_set / failed_action_set / done_action_set
  action_state_get/set(), action_get_start_time/finish_time(), action_use/free(),
  action_cancel/recycle(), action_data_set(), action_suspend()
  static: resource_used(), share_resources(), update_actions_state(), update_resource_state()
  finalize()

class surf_routing_t:
  get_route(int src, int dst)

class surf_model_timer_t (extends surf_model_t):
  set(double date, void *function, void *arg); get(void **function, void **arg)

class surf_model_cpu_t (extends surf_model_t):
  static new_compute(), new_sleep(); get_cpu_speed(void *cpu); get_cpu_state(void *cpu)

class surf_model_network_t (extends surf_model_t):
  surf_routing_t routing
  static new_communicate(); get_link_bandwidth/latency/state(void *link)

class surf_model_workstation_t (extends surf_model_cpu_t, surf_model_network_t):
  static new_parallel_task()
Existing models (implement the abstract methods)
Timer: only a basic one
CPU: Cas01 (regular maxmin)
Link: CM02, LV08, Reno, Vegas; Constant, GTNetS
Workstation: workstation, L07
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(117/96)
How are these models used in practice?
Simulation kernel main loop
Data: set of resources with working rate
1. Some actions get created (by the application) and assigned to resources
2. Compute the share of everyone (resource-sharing algorithms)
3. Compute the earliest finishing action, advance simulated time to that time
4. Remove finished actions
5. Loop back to 2
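A toy, self-contained illustration of this loop (names and the fair-sharing rule are ours, not SURF's): two actions fairly share one resource of rate 1, and the loop advances simulated time from action end to action end.

  #include <stdio.h>

  #define N 2

  int main(void) {
    double remaining[N] = {3.0, 5.0};  /* 1. work left per action        */
    int alive[N] = {1, 1};
    int left = N;
    double now = 0.0;
    while (left > 0) {
      double share = 1.0 / left;       /* 2. fair share of the resource  */
      double dt = 1e30;                /* 3. earliest action end         */
      for (int i = 0; i < N; i++)
        if (alive[i] && remaining[i] / share < dt)
          dt = remaining[i] / share;
      now += dt;                       /*    advance simulated time      */
      for (int i = 0; i < N; i++) {    /* 4. consume work, drop finished */
        if (!alive[i]) continue;
        remaining[i] -= dt * share;
        if (remaining[i] <= 1e-9) {
          alive[i] = 0; left--;
          printf("action %d done at t=%g\n", i, now);
        }
      }
    }                                  /* 5. loop back to 2              */
    return 0;
  }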
(Gantt chart: each action's share of the resources along the simulated-time axis)
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals Back
(118/96)
Adding Dynamic Availabilities to the Picture
Trace denition
List of discrete events where the maximal availability changes: t0 100%, t1 50%, t2 80%, etc.
Adding traces doesn't change the kernel main loop
Availability changes: simulation events, just like action ends
(Gantt chart as before; availability trace events shrink the resource capacity over simulated time)
SimGrid also accepts state changes (on/off)
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(119/96)
SURF main function
double surf_solve(void) {
  /* Search next action to end (look for min date over all models) */
  xbt_dynar_foreach(model_list, iter, model) {
    model_next_action_end = model->model_private->share_resources(NOW);
    if (min < 0.0 || model_next_action_end < min)
      min = model_next_action_end;
  }
  if (min < 0.0) return -1.0; /* no planned action end => simulation's over */

  /* Handle every event occurring before min */
  while ((next_event_date = tmgr_history_next_date(history)) != -1.0) {
    if (next_event_date > NOW + min) break; /* no further event before min */
    /* apply event by updating models */
    while ((evt = tmgr_history_get_next_event_leq(history, next_event_date, &value, &resource))) {
      if (resource->model->model_private->resource_used(resource))
        min = next_event_date - NOW; /* evt changes a resource currently used: change min */
      /* update state of the model according to event */
      resource->model->model_private->update_resource_state(resource, evt, value, NOW + min);
    }
  }
  NOW = NOW + min; /* Increase the simulation clock (NOW is returned by SURF_get_clock()) */

  /* Ask models to update the state of actions they are responsible for, according to the clock */
  xbt_dynar_foreach(model_list, iter, model)
    model->model_private->update_actions_state(NOW, min);

  return min;
}
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(120/96)
More on model implementations
All the solving logic of models lies in 4 functions
share_resources: compute the sharing between actions on every resource
resource_used: whether an action is actively using the resource (sleep actions never do)
update_resource_state: apply trace events
update_actions_state: reduce the actions' needs by what they just got
The hard thing is the sharing. Most models use a Linear MaxMin solver (lmm) to compute it
Others bind to an external tool (GTNetS), or have no sharing (constant, timer). It comes down to a linear system where actions are variables and resources constraints. Disclaimer: I only partially understand lmm internals for now
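Condensed as a C interface, these four hooks could look like the hypothetical sketch below (the real private structure has more members; signatures follow the surf_solve excerpt on the previous slide):

  /* Hypothetical condensed view of a SURF model's private interface. */
  typedef struct surf_model_private {
    /* compute the sharing between actions; return date of next action end */
    double (*share_resources)(double now);
    /* is this action actively using the resource? (sleep actions never do) */
    int    (*resource_used)(void *resource);
    /* apply a trace event (availability/state change) to a resource */
    void   (*update_resource_state)(void *resource, void *event,
                                    double value, double date);
    /* reduce each action's remaining work by what it got during delta */
    void   (*update_actions_state)(double now, double delta);
  } s_surf_model_private_t;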
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(121/96)
Actions and Resources
lmm resources
lmm constraint representing that resource in the system
state (on/off) & handler of the next state trace event
power (+latency for network) & handler of the next power trace event
lmm actions
lmm variable allowing the system to return the share gained by that action
boolean indicating whether it's currently suspended
But you are free to do otherwise
Example: constant network
When you start a communication, it will last a constant amount of seconds. No sharing, so no need for link resources at all! update_actions_state simply deducts the time delta from each action's remaining time
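For the constant model, update_actions_state can be as simple as this hypothetical sketch (list handling and names are ours):

  /* Hypothetical sketch: constant network model, no sharing involved.
   * Each action lasts a fixed duration; just consume the elapsed delta. */
  typedef struct constant_action {
    double remaining;                 /* seconds left before completion */
    struct constant_action *next;
  } constant_action_t;

  static constant_action_t *running_actions;   /* list of live actions */

  static void constant_update_actions_state(double now, double delta) {
    for (constant_action_t *a = running_actions; a; a = a->next) {
      a->remaining -= delta;          /* deduct elapsed simulated time  */
      if (a->remaining <= 0.0) {
        /* ... mark the action as done and notify upper layers ... */
      }
    }
  }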
Other example: GTNetS
Simply wrap the calls to the external solver No internal intelligence
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(122/96)
Parsing platform les (instantiating the resources)
We use FleXML as XML parser generator
Write a flex file from the DTD, converted into a fast, dependency-free C parser. Drawback: cannot cope with stuff not in the DTD
Generated parsers are SAX-oriented (not DOM-oriented)
You add callbacks to event lists. Naming schema:
<tag> -> STag_surfxml_tag_cb_list; </tag> -> ETag_surfxml_tag_cb_list. Ex: <host> -> STag_surfxml_host_cb_list
This is done (usually during init) with surfxml_add_callback. Ex: surfxml_add_callback(STag_surfxml_host_cb_list, &parse_host). Attributes are accessible through globals: A_surfxml_tag_attrname. Ex: sscanf(A_surfxml_host_power, "%lf", &host->power)
Using such callbacks, models should:
Parse their resources and store them in the [Link] set dictionary. But that's optional: the constant network model ignores links; GTNetS doesn't store them itself
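Putting the quoted names together, a model's initialization might register a host callback as in the following sketch (parse_host and its body are illustrative; the snippet assumes the FleXML-generated declarations for surfxml_add_callback, STag_surfxml_host_cb_list and A_surfxml_host_power are in scope through a generated header whose name may differ):

  /* Illustrative callback registration for <host> tags; only the
   * surfxml_* and A_surfxml_* names are quoted from the slide. */
  static void parse_host(void) {
    double power;
    sscanf(A_surfxml_host_power, "%lf", &power);  /* attribute -> double */
    /* ... create the CPU resource and store it in the model's dict ... */
  }

  void my_model_init(void) {
    surfxml_add_callback(STag_surfxml_host_cb_list, &parse_host);
  }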
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(123/96)
Writing your own model
What you should write (recap)
Action and resource datastructures Models methods:
Actions constructors (depending on whether a network or cpu model) Actions getter/setters The 4 methods of the sharing logic Finalize
Parser callbacks Plug your model in surf config.c so that users can select it from cmd line Possibly update the DTD to add the new info you need at instanciation
Guidelines
Reusing the existing is perfectly ne (everything is in there for lmm based models more to come) Please do not dupplicate code (as too often done till now) (at least if you want to get your code integrated in the SVN)
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(124/96)
Object Orientation of Actions and Resources
typedef struct {
  surf_model_t model;
  char *name;
  xbt_dict_t properties;
} s_surf_resource_t, *surf_resource_t;

typedef struct {
  double current;
  double max;
  tmgr_trace_event_t event;
} s_surf_metric_t;

typedef struct {
  s_surf_resource_t generic_resource;
  lmm_constraint_t constraint;
  e_surf_resource_state_t state_current;
  tmgr_trace_event_t state_event;
  s_surf_metric_t power;
} s_surf_resource_lmm_t, *surf_resource_lmm_t;

typedef struct {
  s_surf_resource_lmm_t lmm_resource;
  double lat_current;
  tmgr_trace_event_t lat_event;
} s_link_CM02_t, *link_CM02_t;
link_CM02_t are pointers to struct
(like every SimGrid datatype ending with _t and not beginning with s_)
They can be cast to lmm resources or generic ones
=> we can reuse the generic implementation of services
Warning: there is no security here
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(125/96)
More on prehistoric OOP in C
(purists, please forgive)
(struct definitions repeated from the previous slide)
surf_resource_t surf_resource_new(size_t childsize, surf_model_t model,
                                  char *name, xbt_dict_t props) {
  surf_resource_t res = xbt_malloc0(childsize);
  res->model = [...]
  return res;
}

surf_resource_lmm_t surf_resource_lmm_new(size_t childsize, /* for superclass */
                                          surf_model_t model, char *name, xbt_dict_t props,
                                          /* our args */ [...]) {
  surf_resource_lmm_t res = (surf_resource_lmm_t)surf_resource_new(childsize, model, name, props);
  res->constraint = [...]
  return res;
}

link_CM02_t CM02_link_new([...]) {
  link_CM02_t res = (link_CM02_t)surf_resource_lmm_new(sizeof(s_link_CM02_t), [...]);
  [...]
}
Back
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
(126/96)
Adding new kind of models
Motivation
Disk is still missing in SimGrid; you may want to model memory (although that's probably a bad idea); everybody loves getting higher in abstraction (at least at university)
Sorry, no easy path for that. What's to do?
Add a new specific section in surf_model_t (easy); update the workstation model (because it aggregates other models; quite easy)
careful, there are 2 workstation models: the raw aggregator and L07 for parallel tasks
Allow the upper level to create actions specific to your model kind
That will be quite difficult and very long: you need to change almost everything (at least augment it). MSG_task_disk(disk, 50) -> SIMIX_action_create_diskuse(...) -> surf_model_disk->[Link] disk()
We may be able to ease this task too (but is there a real need?)
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(127/96)
Conclusion on SURF internals
Big picture
Resources (XML-created) provide power to actions (created by upper layers). Design goals include effectiveness and extensibility (conceptual purity? Erm, no)
Extending it
It is fairly easy to add new models for existing resource kinds; it is quite long (and somewhat difficult) to add new resource kinds
Future work
Ongoing cleanups not completely finished
Still some duplicated code; resource power not handled consistently across models
New models:
Highly scalable ones in the USS-SimGrid project; compound ones where CPU load reduces communication abilities; multi-core (pick your favorite one)
SimGrid for Research on Large-Scale Distributed Systems Bonus: SimGrid Internals
Back
(128/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix
Big picture
Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(129/96)
Simix
SimIX provides a POSIX-like interface for writing user-level APIs
Virtualized processes that run user-level code; virtualized hosts that run the processes; synchronization primitives that rely on the SURF layer
mutexes, condition variables
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(130/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(131/96)
Global Elements
Simix state is contained in the following global data structure:

typedef struct SIMIX_Global {
  smx_context_factory_t context_factory;
  xbt_dict_t host;
  xbt_swag_t process_to_run;
  xbt_swag_t process_list;
  xbt_swag_t process_to_destroy;
  smx_process_t current_process;
  smx_process_t maestro_process;
  ...
};
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(132/96)
Simix Main Loop
Simix main loop
Pick a process (from the process_to_run list) and execute it
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(133/96)
Agenda
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work
SimGrid Internals SURF Simix Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(134/96)
Simix Process 1
The process is the central element in Simix. It is represented by the following data structure:

struct s_smx_process {
  ...
  char *name;
  smx_host_t smx_host;
  smx_context_t context;
  ex_ctx_t *exception;
  int blocked : 1;
  int suspended : 1;
  int iwannadie : 1;
  smx_mutex_t mutex;
  smx_cond_t cond;
  xbt_dict_t properties;
  void *data;
};
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(135/96)
Simix Process 2
Simix keeps for each process:
an execution context
an exception container
its running state (blocked, suspended, killed)
pointers to the mutex or condition variables where the process is waiting
user-level data provided by the user
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(136/96)
Context
The Simix contexts are an abstraction of the execution state of a process plus an interface for controlling them.
Each context is composed of:
a pointer to the code (the main function of the associated process)
an execution stack
the state of the registers
some functions for scheduling/unscheduling the context
There are 3 implementations of the context interface (factories): a pthread-based one, a ucontext-based one, and a Java-based one (uses jprocess)
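In struct form, the interface just described might look like this hypothetical condensed view (field and function names are ours):

  /* Hypothetical condensed view of a Simix context. */
  typedef struct s_smx_context *smx_context_t;

  struct s_smx_context {
    void (*code)(int argc, char **argv); /* main function of the process */
    void *stack;                         /* its execution stack          */
    /* register state lives in the implementation (ucontext_t, pthread,
       or a Java jprocess) behind this opaque pointer: */
    void *impl;
    void (*suspend)(smx_context_t self); /* yield back to the maestro    */
    void (*resume)(smx_context_t ctx);   /* schedule this context        */
  };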
SimGrid for Research on Large-Scale Distributed Systems
Bonus: SimGrid Internals
Back
(137/96)
Model-Checking within SimGrid Introduction to Model-Checking Adding Model-Checking to SimGrid Current Status and Future Work SimGrid Internals SURF
Big Picture Models How Models get used Actions and Resources Writing your own model Adding new kind of models
Simix
Big picture
Global Elements Simix Process
SimGrid for Research on Large-Scale Distributed Systems
Distributed Systems Experiments
(138/96)