UNIT 2
DISTRIBUTED FILE SYSTEM
INTRODUCTION
• File systems were first developed for centralized computer systems and desktop computers.
• They provided a convenient programming interface to disk storage.
• Access control → restricts who can read/write a file.
• File-locking mechanisms → prevent conflicts when multiple users/processes access the same file.
• These features made file systems useful for sharing data and programs.
• Distributed file systems (DFS) allow sharing of both:
• Information (files, programs)
• Hardware resources (persistent storage across an intranet).
• A good DFS should give performance and reliability close to, or sometimes better than, local disk
storage.
• This is achieved by designing the DFS around the characteristics of local networks (speed, reliability).
• DFS are most effective in intranets (private organizational networks).
• They provide shared persistent storage across the network.
• First file servers: Developed by researchers in the 1970s.
• Examples: Birrell & Needham (1980), Mitchell & Dion (1982), Leach et al. (1983).
• Sun’s Network File System (NFS): Became available in the early 1980s and became very popular.
• A file service enables programs to store and access remote files exactly
as they do local ones.
• It allows users to access their files from any computer in an intranet.
• Other services, such as the name service, the user authentication
service and the print service, can be more easily implemented when
they can call upon the file service to meet their needs for persistent
storage.
A file system governs how files are stored, organized, accessed, and
managed on disks. It is divided into several modules, each handling a specific
responsibility.
Characteristics of File system
• File systems are responsible for the organization, storage, retrieval, naming,
sharing and protection of files.
• File systems provide a programming interface that defines the file abstraction,
freeing programmers from dealing with storage allocation and layout details
• Files are stored on disks or other non-volatile storage media.
• Files contain both data and attributes
• The data consist of a sequence of data items (typically, 8-bit bytes), accessible
by operations to read and write any portion of the sequence.
• The attributes are held as a single record containing information such as the
length of the file, timestamps, file type, owner’s identity and access control
lists.
• The attributes shaded in the accompanying figure are managed by the file system and are not
normally updatable by user programs.
• File systems are designed to store and manage large numbers of files, with
facilities for creating, naming and deleting files.
File system operations
• Programs use system calls like open, read, write, close, etc.
• Programmers often don’t use these directly—they use libraries (like C Standard I/O fopen, fread,
etc., or Java’s file classes) which call these system calls internally.
• When a program opens a file, the system keeps some file state information which includes:
• The file descriptor (a number identifying the open file).
• A read-write pointer (shows the current position in the file for the next read/write).
• Every file has permissions (read, write, execute).
• When you try to open a file, the file system checks:
• Who you are (your user ID).
• What rights you have for that file.
• Whether those rights match what you are asking (e.g., read mode, write mode).
• If it matches → file is opened.
• If not → access is denied.
• Once opened, the system records the mode (read/write) in the open file state, so later
operations (read/write) are properly checked.
• UNIX provides system calls to manage files.
• When a file is opened, the system tracks it with a file descriptor and read-write pointer.
• It also checks permissions before allowing access.
• This ensures programs can safely read and write files without worrying about the low-level details.
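The open-file state described above can be observed through Python's `os` module, which wraps the UNIX system calls. The following is a minimal sketch, assuming a POSIX-like system; the scratch file name and its contents are made up for illustration.

```python
import os
import tempfile

# Create a scratch file to work on (path and contents are illustrative only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "wb") as f:
    f.write(b"hello, file system")

# os.open() returns a file descriptor: a small integer naming the open file.
fd = os.open(path, os.O_RDONLY)

# Each read advances the read-write pointer kept in the open-file state.
first = os.read(fd, 5)                          # bytes 0..4
pos_after_read = os.lseek(fd, 0, os.SEEK_CUR)   # query the current position

# lseek() moves the pointer explicitly; the next read starts from there.
os.lseek(fd, 7, os.SEEK_SET)
second = os.read(fd, 4)

# The mode recorded at open time is enforced on later operations:
# writing through a read-only descriptor is rejected.
try:
    os.write(fd, b"x")
    write_allowed = True
except OSError:
    write_allowed = False

os.close(fd)
```

Here the descriptor was opened read-only, so the later write fails even though no new permission check against the user ID takes place; the check uses the mode stored in the open-file state.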
Distributed file system Requirements
1. Transparency
• Access Transparency – Same operations for local and remote files; no code changes needed.
• Location Transparency – Files keep the same name/path even if moved across servers.
• Mobility Transparency – Files can move without changing client programs or admin settings.
• Performance Transparency – System should perform well even when load varies.
• Scaling Transparency – DFS should grow easily to handle more users, data, and servers.
2. Concurrent File Updates – Multiple clients can safely update files using locking
(advisory/mandatory) to avoid conflicts.
3. File Replication – Same file can exist at multiple servers → improves scalability (load
sharing) and fault tolerance (backup if one server fails).
4. Heterogeneity – DFS should work across different hardware and operating systems
(openness).
5. Fault Tolerance – System should keep running even if clients/servers fail → achieved via
replication, retries, idempotent operations, stateless servers, and suitable invocation semantics.
6. Consistency – All users should see the same file contents as if only one copy existed
(one-copy semantics). With replication or caching, updates take time to reach every copy, so
some deviation from one-copy semantics may occur.
7. Security – Must provide authentication, access control, encryption, and digital
signatures to protect files and communication.
8. Efficiency – DFS should be as powerful and fast as local file systems, while still
supporting sharing and scalability.
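Item 5 lists retries and idempotent operations together. A small sketch of why they go together: if a reply is lost and the client retries, a write at an explicit position leaves the file in the same state the second time, whereas an append does not. The `UnreliableServer` class, its methods and `call_with_retry` are hypothetical names used only for this illustration.

```python
class UnreliableServer:
    """Toy in-memory file server whose first reply to each request is lost.

    Hypothetical model: the operation IS executed, but the client sees a
    timeout and has to retry (at-least-once invocation semantics).
    """

    def __init__(self):
        self.data = bytearray()
        self._drop_next_reply = True

    def _reply(self):
        if self._drop_next_reply:
            self._drop_next_reply = False
            raise TimeoutError("reply lost")
        self._drop_next_reply = True  # the next fresh request loses its first reply too

    def write_at(self, i, chunk):
        # Idempotent: repeating the call leaves the file in the same state.
        end = i + len(chunk)
        if end > len(self.data):
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[i:end] = chunk
        self._reply()

    def append(self, chunk):
        # Not idempotent: every repetition adds the data again.
        self.data.extend(chunk)
        self._reply()


def call_with_retry(operation, *args, attempts=3):
    """Repeat an operation until a reply arrives (at-least-once)."""
    for _ in range(attempts):
        try:
            return operation(*args)
        except TimeoutError:
            continue
    raise TimeoutError("server unreachable")


positional = UnreliableServer()
call_with_retry(positional.write_at, 0, b"abc")   # file ends up as b"abc"

appending = UnreliableServer()
call_with_retry(appending.append, b"abc")         # file ends up as b"abcabc"
```

With idempotent operations the server needs no record of earlier requests to detect duplicates, which is one reason they pair well with stateless servers.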
File Service Architecture
• A separation of the main concerns in providing access to files is obtained by structuring
the file service as three components: a flat file service, a directory service
and a client module.
Flat File Service
• Deals only with file contents (not names).
• Uses Unique File Identifiers (UFIDs) to identify files.
• Provides low-level operations like Read, Write, Create, Delete, GetAttributes, SetAttributes.
• UFIDs are unique across the distributed system, ensuring no two files share the same ID.
• When a new file is created, a new UFID is generated and returned.
Directory Service
• Provides a mapping between human-readable text names (like report.doc) and UFIDs.
• Allows clients to:
• Create directories.
• Add file names to directories.
• Lookup a file name to get its UFID.
• Uses the flat file service to store directory data.
• Supports hierarchical naming (like UNIX: /home/user/file.txt).
• Essentially works as the “phonebook” of files, translating names into unique IDs.
Client Module
• Runs on the client machine.
• Integrates the flat file service + directory service into a single API for user
programs.
• Provides standard file operations similar to OS-level APIs (like open, read,
write in UNIX).
• Handles:
• Translating names into UFIDs (via directory service).
• Sending file operations (via flat file service).
• Caching recently used file blocks for performance.
• Storing server locations (to know where directory and file servers are).
• Example: In UNIX, the client module would support multi-part file names
(/home/user/file) using iterative directory lookups.
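The iterative lookup mentioned in the example can be sketched as follows. This is only an illustration: the `DIRECTORIES` table, the UFID strings and the `lookup`/`resolve` functions are hypothetical stand-ins, with a dictionary playing the role of the directory service.

```python
# Hypothetical directory store: directory UFID -> {name: UFID of file or subdirectory}.
ROOT = "ufid-root"
DIRECTORIES = {
    "ufid-root": {"home": "ufid-home"},
    "ufid-home": {"user": "ufid-user"},
    "ufid-user": {"file": "ufid-file"},
}


def lookup(dir_ufid, name):
    """Stand-in for one directory-service lookup (one request per path component)."""
    try:
        return DIRECTORIES[dir_ufid][name]
    except KeyError:
        raise FileNotFoundError(name) from None


def resolve(path):
    """Translate a multi-part path name into a UFID, one component at a time."""
    ufid = ROOT
    for component in path.strip("/").split("/"):
        if component:  # skip empty components, e.g. for the path "/"
            ufid = lookup(ufid, component)
    return ufid


resolved = resolve("/home/user/file")   # three lookups: home, user, file
```

Each component costs one request to the directory service, which is one reason the client module may cache results and remember server locations.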
Flat file service interface
• This is the RPC (Remote Procedure Call) interface used by client modules to talk to the file server.
• It is not normally used directly by user-level programs.
• A FileId is invalid if the file that it refers to is not present in the server processing the request or if
its access permissions are inappropriate for the operation requested.
• All of the procedures in the interface except Create throw exceptions if the FileId argument
contains an invalid UFID or the user doesn’t have sufficient access rights.
• Both the Read and the Write operation require a parameter i specifying a position in the file.
• The Read operation copies the sequence of n data items beginning at item i from the specified file
into Data, which is then returned to the client.
• The Write operation copies the sequence of data items in Data into the specified file beginning at
item i, replacing the previous contents of the file at the corresponding position and extending the
file if necessary.
• Create creates a new, empty file and returns the UFID that is generated. Delete removes the
specified file.
• GetAttributes is normally available to any client that is allowed to read the file. Access to the
SetAttributes operation would normally be restricted to the directory service that provides access
to the file.
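The operations described above can be modelled in memory as a rough sketch. Several things here are assumptions made for illustration: the Python method names, the `InvalidFileId` exception, the use of `uuid` to generate UFIDs, and 0-based positions. Access-rights checks are left out to keep the sketch short.

```python
import uuid


class InvalidFileId(Exception):
    """Raised when a UFID does not refer to a file held by this server."""


class FlatFileService:
    def __init__(self):
        self._files = {}  # UFID -> {"data": bytearray, "attrs": dict}

    def _get(self, file_id):
        try:
            return self._files[file_id]
        except KeyError:
            raise InvalidFileId(file_id) from None

    def create(self):
        """Create a new, empty file and return its freshly generated UFID."""
        file_id = uuid.uuid4().hex
        self._files[file_id] = {"data": bytearray(), "attrs": {}}
        return file_id

    def read(self, file_id, i, n):
        """Return up to n items starting at position i."""
        data = self._get(file_id)["data"]
        return bytes(data[i:i + n])

    def write(self, file_id, i, data):
        """Write data starting at position i, extending the file if necessary."""
        contents = self._get(file_id)["data"]
        end = i + len(data)
        if end > len(contents):
            contents.extend(b"\0" * (end - len(contents)))
        contents[i:end] = data

    def delete(self, file_id):
        self._get(file_id)
        del self._files[file_id]

    def get_attributes(self, file_id):
        f = self._get(file_id)
        # The length is maintained by the service and overrides any client-set value.
        return {**f["attrs"], "length": len(f["data"])}

    def set_attributes(self, file_id, attrs):
        self._get(file_id)["attrs"].update(attrs)


service = FlatFileService()
fid = service.create()
service.write(fid, 0, b"hello world")
service.write(fid, 6, b"there!")   # overwrites "world" and extends the file by one byte
middle = service.read(fid, 6, 5)
length = service.get_attributes(fid)["length"]
```

Because Read and Write take the position i explicitly, the server keeps no read-write pointer for the client, unlike the UNIX open-file state.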
Directory service interface:
• The primary purpose of the directory service is to provide a service for
translating text names to UFIDs.
• Each directory is stored as a conventional file with a UFID, so the directory
service is a client of the file service.
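To illustrate the last point, the sketch below stores a directory as an ordinary file whose contents are a name-to-UFID table. The `flat_files` dictionary stands in for the flat file service, and the JSON encoding and all function names are assumptions made for this example, not part of the interface described here.

```python
import json
import uuid

# Minimal stand-in for the flat file service: UFID -> file contents.
flat_files = {}


def create():
    ufid = uuid.uuid4().hex
    flat_files[ufid] = b""
    return ufid


def read_all(ufid):
    return flat_files[ufid]


def write_all(ufid, data):
    flat_files[ufid] = data


def create_directory():
    """A directory is just a file whose contents are a name -> UFID table."""
    dir_ufid = create()
    write_all(dir_ufid, json.dumps({}).encode())
    return dir_ufid


def add_name(dir_ufid, name, ufid):
    table = json.loads(read_all(dir_ufid))
    table[name] = ufid
    write_all(dir_ufid, json.dumps(table).encode())


def lookup(dir_ufid, name):
    return json.loads(read_all(dir_ufid))[name]


root = create_directory()        # the directory itself has a UFID
notes = create()                 # an ordinary file
add_name(root, "notes.txt", notes)
```

The directory functions only ever call `create`, `read_all` and `write_all`, which is what it means for the directory service to be a client of the file service.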