Distributed Operating Systems
1. Issues in Distributed Operating Systems
Distributed Operating Systems (DOS) manage a group of independent computers and make them appear to
the users as a single coherent system. Key issues include:
- Transparency: Hiding the complexity of the distributed system from users.
* Access Transparency: Local and remote resources are accessed with the same operations.
* Location Transparency: Users don't need to know where a resource is located.
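* Replication Transparency: Users are unaware that a resource may exist as multiple copies.
* Concurrency Transparency: Multiple users can share resources at the same time without interfering with each other.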
- Fault Tolerance: The system should recover gracefully from partial failures.
- Scalability: The system must maintain acceptable performance as the number of nodes and users grows.
- Resource Management: Efficient use of CPU, memory, and storage across all nodes.
- Security: Secure communication, authentication, and data protection.
2. Communication Primitives
Communication is central to a DOS because its components do not share memory and may be geographically separated, so they must cooperate by exchanging messages.
- Message Passing: Processes communicate by sending and receiving messages.
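* Synchronous (blocking): The sender waits until the receiver has received the message.
* Asynchronous (non-blocking): The sender continues without waiting.
- Remote Procedure Call (RPC): Makes a call to a procedure on a remote machine look like a local procedure call; client and server stubs marshal arguments and results into messages.
- Remote Method Invocation (RMI): Java's object-oriented counterpart of RPC, which invokes methods on remote objects.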
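A minimal sketch of RPC layered on message passing, assuming an in-process "network" built from Python `queue.Queue` objects instead of real sockets. The names `server_loop`, `ClientStub`, and the `"add"` procedure are hypothetical. Putting a message on a queue is an asynchronous send; the client stub becomes synchronous by blocking until the reply arrives.

```python
import json
import queue
import threading
from typing import Any, Callable, Dict


def server_loop(request_channel: queue.Queue, procedures: Dict[str, Callable[..., Any]]) -> None:
    """Server stub: unmarshal each request, call the local procedure, send the reply."""
    while True:
        item = request_channel.get()  # receive blocks until a message arrives
        if item is None:  # sentinel used in this sketch to stop the server
            return
        payload, reply_channel = item
        call = json.loads(payload)  # unmarshal the procedure name and arguments
        result = procedures[call["proc"]](*call["args"])
        reply_channel.put(json.dumps({"result": result}))  # marshal and send the reply


class ClientStub:
    """Client stub: makes a remote procedure look like a local function call."""

    def __init__(self, request_channel: queue.Queue) -> None:
        self.request_channel = request_channel

    def call(self, proc: str, *args: Any) -> Any:
        reply_channel: queue.Queue = queue.Queue()
        payload = json.dumps({"proc": proc, "args": list(args)})  # marshal
        self.request_channel.put((payload, reply_channel))  # put() on an unbounded queue never waits
        reply = reply_channel.get(timeout=5)  # synchronous: block until the reply arrives
        return json.loads(reply)["result"]


request_channel: queue.Queue = queue.Queue()
server = threading.Thread(target=server_loop,
                          args=(request_channel, {"add": lambda a, b: a + b}))
server.start()

stub = ClientStub(request_channel)
result = stub.call("add", 2, 3)
print(result)  # 5

request_channel.put(None)  # asynchronous send: returns without waiting for the server
server.join()
```

The caller of `stub.call` never sees queues or JSON, which is exactly the illusion RPC aims for; a real system would also have to handle lost messages and server crashes.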
3. Lamport's Logical Clocks
In distributed systems, no global clock exists. Lamport introduced a logical clock mechanism to order events.
- Logical clocks assign a numerical timestamp to events.
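- If event A happens before event B (A -> B), then LC(A) < LC(B). The converse does not hold: LC(A) < LC(B) does not imply A -> B.
- Timestamps therefore order events consistently with causality; breaking ties by process ID yields a total order of all events.
- Captures the 'happened-before' relation with two rules: increment the clock before each event, and on receiving a message set the clock to max(local clock, message timestamp) + 1.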
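The two update rules above can be sketched as follows. The `Process` class, its method names, and the event labels are hypothetical, and message transport is reduced to passing the timestamp directly.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    pid: int
    clock: int = 0
    log: List[Tuple[str, int]] = field(default_factory=list)  # (event name, timestamp)

    def local_event(self, name: str) -> int:
        # Rule 1: increment the clock before every event.
        self.clock += 1
        self.log.append((name, self.clock))
        return self.clock

    def send(self, name: str) -> int:
        # Sending is an event; its timestamp travels with the message.
        return self.local_event(name)

    def receive(self, name: str, msg_timestamp: int) -> int:
        # Rule 2: on receipt, move past both the local clock and the message timestamp.
        self.clock = max(self.clock, msg_timestamp) + 1
        self.log.append((name, self.clock))
        return self.clock


p1, p2 = Process(pid=1), Process(pid=2)
a = p1.local_event("a")  # LC(a) = 1
t = p1.send("b")         # LC(b) = 2, sent to p2
c = p2.local_event("c")  # LC(c) = 1, concurrent with a and b
d = p2.receive("d", t)   # LC(d) = max(1, 2) + 1 = 3

# Total order: sort by (timestamp, pid); the pid breaks ties between concurrent events.
events = sorted((ts, p.pid, name) for p in (p1, p2) for name, ts in p.log)
print(events)  # [(1, 1, 'a'), (1, 2, 'c'), (2, 1, 'b'), (3, 2, 'd')]
```

Note that LC(c) = 1 < LC(b) = 2 even though c and b are concurrent, which illustrates why a smaller timestamp alone does not prove a causal relationship.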
4. Deadlock Handling Strategies
Deadlock in distributed systems is more complex due to the lack of centralized control.
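- Prevention: Design the system so that at least one of the four Coffman conditions (mutual exclusion, hold and wait, no preemption, circular wait) can never hold.
- Avoidance: Grant a resource request only if the resulting state is safe, as checked by an algorithm such as the Banker's algorithm.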
- Detection and Recovery: Allow deadlock to occur and then detect it using global wait-for graphs or
probe-based algorithms.
* Recovery: Break the deadlock by terminating (aborting) or rolling back one or more of the deadlocked processes.
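A minimal sketch of detection and recovery on a global wait-for graph, assuming a coordinator has already collected a consistent graph from all sites. An edge P -> Q means P waits for a resource held by Q, and a cycle means deadlock. The function names and the per-process `cost` values (standing in for resource usage and execution time) are hypothetical.

```python
from typing import Dict, List, Optional, Set

WaitForGraph = Dict[str, Set[str]]  # process -> processes it waits for


def find_cycle(wfg: WaitForGraph) -> Optional[List[str]]:
    """Depth-first search; return one cycle as a list of processes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {p: WHITE for p in wfg}
    path: List[str] = []

    def dfs(p: str) -> Optional[List[str]]:
        color[p] = GREY
        path.append(p)
        for q in wfg.get(p, ()):
            if color.get(q, WHITE) == GREY:  # back edge: q is on the current path
                return path[path.index(q):]
            if color.get(q, WHITE) == WHITE:
                cycle = dfs(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wfg):
        if color[p] == WHITE:
            cycle = dfs(p)
            if cycle:
                return cycle
    return None


def choose_victim(cycle: List[str], cost: Dict[str, float]) -> str:
    # Hypothetical policy: abort the process in the cycle that is cheapest to lose.
    return min(cycle, key=lambda p: cost[p])


wfg = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}, "P4": {"P1"}}
cycle = find_cycle(wfg)
print(cycle)  # ['P1', 'P2', 'P3']

victim = choose_victim(cycle, {"P1": 5.0, "P2": 1.0, "P3": 3.0})
# Aborting the victim releases its resources, so its node and incoming edges disappear.
wfg = {p: qs - {victim} for p, qs in wfg.items() if p != victim}
print(victim, find_cycle(wfg))  # P2 None
```

P4 waits on the cycle but is not part of it, so aborting it would not help; only a process on the cycle is a useful victim.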
5. Issues in Deadlock Detection and Resolution
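- Without centralized control, no single site has a complete, up-to-date view of the global wait-for graph, which makes detection complex.
- Communication delays can make a detector report a deadlock that does not actually exist (a phantom deadlock), for example when a process has already released a resource but the detector still sees the old wait-for edge.
- A consistent global state, such as one captured by a distributed snapshot, is needed for consistent deadlock detection.
- When resolving a deadlock, the choice of which process to terminate should weigh factors such as its resource usage and execution time so far.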
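Probe-based (edge-chasing) detection sidesteps the lack of centralized control: no site builds the global graph. The sketch below simulates the idea behind the Chandy-Misra-Haas algorithm for the AND model under simplifying assumptions: every wait-for edge is treated as crossing a site boundary, the graph does not change while probes are in flight, and message delivery is a FIFO queue. The function name `detect_by_probes` is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple

Probe = Tuple[str, str, str]  # (initiator, sender, receiver)


def detect_by_probes(initiator: str, waits_for: Dict[str, Set[str]]) -> bool:
    """Return True if a probe started by `initiator` comes back to it (deadlock)."""
    in_flight: Deque[Probe] = deque(
        (initiator, initiator, q) for q in waits_for.get(initiator, ()))
    forwarded: Set[str] = set()  # processes that already propagated this initiator's probe
    while in_flight:
        init, sender, receiver = in_flight.popleft()
        if receiver == init:
            return True  # the probe travelled around a cycle back to its initiator
        # A blocked receiver forwards the probe along each of its own wait-for edges, once.
        if receiver not in forwarded and waits_for.get(receiver):
            forwarded.add(receiver)
            for nxt in waits_for[receiver]:
                in_flight.append((init, receiver, nxt))
    return False


waits_for = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}, "P4": {"P1"}}
print(detect_by_probes("P1", waits_for))  # True: P1 lies on a cycle
print(detect_by_probes("P4", waits_for))  # False: P4 waits on the cycle but is not in it
```

In a real system the wait-for edges change while probes travel, which is where the delay-related problems listed above arise.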
6. Distributed File Systems (DFS) and Design Issues
A DFS allows files to be stored on multiple machines and accessed transparently, as if they were local.
- Design Issues:
* Transparency (location, access, replication).
* Fault Tolerance: Ensures data recovery and service continuation.
* Concurrency: Maintains consistency during concurrent accesses.
* Security: Protects against unauthorized access and corruption.
- Typically uses a client-server architecture, and clients may cache file data to reduce server load and network traffic.
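One common way to provide location and access transparency is a client-side mount table that maps path prefixes to servers. The sketch below assumes hypothetical server names and export paths; the user-visible path never names a server, so moving data only requires updating the table. Forwarding the actual request to the resolved server is not shown.

```python
from typing import Dict, Tuple

# Hypothetical mount table: client path prefix -> (server, exported directory).
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/home": ("fileserver1", "/export/home"),
    "/projects": ("fileserver2", "/vol/projects"),
}


def resolve(path: str) -> Tuple[str, str]:
    """Map a client path to (server, server-side path) using the longest matching prefix."""
    best = ""
    for prefix in MOUNT_TABLE:
        # Match whole path components only, so "/homework" does not match "/home".
        if (path == prefix or path.startswith(prefix + "/")) and len(prefix) > len(best):
            best = prefix
    if not best:
        return ("localhost", path)  # not under any mount point: a local file
    server, export = MOUNT_TABLE[best]
    return (server, export + path[len(best):])


print(resolve("/home/alice/notes.txt"))  # ('fileserver1', '/export/home/alice/notes.txt')
print(resolve("/tmp/scratch"))           # ('localhost', '/tmp/scratch')
```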
7. Case Studies
a. Sun Network File System (NFS):
- Developed by Sun Microsystems.
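- Uses RPC; servers in NFS versions 2 and 3 are stateless, so each request carries all the information needed to serve it (NFSv4 introduced server-side state).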
- Supports client-side caching and file sharing.
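A simplified sketch of the timestamp-based cache validation that NFS clients are commonly described as using: a cached entry is trusted without contacting the server if it was validated within a freshness interval t; otherwise the client compares its recorded modification time with the server's. The class name, the callables standing in for GETATTR and READ requests, and the 3-second interval are assumptions for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CacheEntry:
    data: bytes
    t_validated: float  # when this entry was last checked with the server
    mtime: float        # server modification time recorded at that check


class NfsStyleCache:
    """Client cache with freshness-interval validation (hypothetical, simplified)."""

    def __init__(self, get_mtime: Callable[[str], float],
                 fetch: Callable[[str], bytes], freshness: float = 3.0) -> None:
        self.get_mtime = get_mtime  # stands in for a GETATTR request
        self.fetch = fetch          # stands in for READ requests
        self.freshness = freshness  # t: assumed freshness interval in seconds
        self.entries: Dict[str, CacheEntry] = {}

    def read(self, path: str, now: Optional[float] = None) -> bytes:
        now = time.monotonic() if now is None else now
        entry = self.entries.get(path)
        if entry is not None and now - entry.t_validated < self.freshness:
            return entry.data  # recently validated: no server contact at all
        server_mtime = self.get_mtime(path)
        if entry is not None and entry.mtime == server_mtime:
            entry.t_validated = now  # unchanged on the server: refresh the check time
            return entry.data
        data = self.fetch(path)  # miss or stale: fetch fresh contents
        self.entries[path] = CacheEntry(data, now, server_mtime)
        return data


server_files = {"/f": (b"v1", 100.0)}  # fake server: path -> (contents, mtime)
calls: List[str] = []


def get_mtime(path: str) -> float:
    calls.append("getattr")
    return server_files[path][1]


def fetch(path: str) -> bytes:
    calls.append("read")
    return server_files[path][0]


cache = NfsStyleCache(get_mtime, fetch, freshness=3.0)
cache.read("/f", now=0.0)  # miss: getattr + read
cache.read("/f", now=1.0)  # within the interval: served from the cache
server_files["/f"] = (b"v2", 200.0)  # another client updates the file
r_stale = cache.read("/f", now=2.0)  # still b'v1': stale data inside the interval
r_new = cache.read("/f", now=5.0)    # b'v2': validation notices the new mtime
print(r_stale, r_new)
```

The stale read at `now=2.0` shows the trade-off: fewer server round trips in exchange for weaker consistency between clients.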
b. Coda File System:
- Emphasizes high availability and disconnected operation.
- Uses server replication with version vectors to detect conflicting updates, and whole-file caching on clients.
- Designed for mobile computing and fault tolerance.
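A toy sketch of disconnected operation, loosely modeled on Coda's idea of logging updates while disconnected and replaying them on reconnection (reintegration). Real Coda tracks replicated servers with version vectors; here a single integer version per file stands in for that, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Server:
    files: Dict[str, Tuple[bytes, int]] = field(default_factory=dict)  # path -> (data, version)


@dataclass
class CodaStyleClient:
    server: Server
    connected: bool = True
    cache: Dict[str, Tuple[bytes, int]] = field(default_factory=dict)
    # Pending updates made while disconnected: path -> (latest data, version it was based on).
    log: Dict[str, Tuple[bytes, int]] = field(default_factory=dict)

    def open(self, path: str) -> bytes:
        if self.connected:
            self.cache[path] = self.server.files[path]  # whole-file caching
        return self.cache[path][0]  # while disconnected, serve the cached copy

    def write(self, path: str, data: bytes) -> None:
        _, base = self.cache[path]
        if self.connected:
            self.server.files[path] = (data, base + 1)
            self.cache[path] = (data, base + 1)
        else:
            self.cache[path] = (data, base)  # version unchanged until reintegration
            self.log[path] = (data, base)    # later writes to the same file overwrite the entry

    def reintegrate(self) -> List[str]:
        """Replay logged updates; return paths whose server copy changed meanwhile."""
        self.connected = True
        conflicts = []
        for path, (data, base) in self.log.items():
            _, current = self.server.files[path]
            if current == base:
                self.server.files[path] = (data, base + 1)
                self.cache[path] = (data, base + 1)
            else:
                conflicts.append(path)  # left for manual or application-specific repair
        self.log.clear()
        return conflicts


srv = Server({"/doc": (b"draft", 1), "/todo": (b"a", 1)})
alice = CodaStyleClient(srv)
alice.open("/doc")
alice.open("/todo")
alice.connected = False  # e.g., a laptop leaves the network
alice.write("/doc", b"edited offline")
alice.write("/todo", b"a, b")
srv.files["/todo"] = (b"a, c", 2)  # another client updates /todo in the meantime
conflicts = alice.reintegrate()
print(conflicts)          # ['/todo']
print(srv.files["/doc"])  # (b'edited offline', 2)
```

This optimistic approach keeps the client usable while offline, at the cost of occasionally having to resolve conflicts after reconnection.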