Computer System Organization & Architecture
Thursday, September 8, 2016
12:03 AM
Computer Organization
- CPUs:
  Move data from/to main mem, local buffer
- Device controller:
  In charge of a device
  Has local buffer
  Uses interrupt → Informs CPU it finished operations
- Bus:
  Connects CPUs, device controllers
  Provides access to shared mem
- Concurrent exec of CPUs, devices:
  Compete for mem cycles
  Incoming interrupts disabled while another is processed → Prevent loss
Interrupts
- Signal sent to CPU → Stop currently running prog → Allow OS to do immediate jobs
- Interrupt Service Routine (ISR)/Interrupt Handler: Code run in response to an interrupt
  Interrupt vector: Stores all ISR addresses
  Polling: Test all devices in turn → Determine which interrupt occurred
- Alter current prog's control flow:
  Interrupt → Hardware execs instruction at ISR address
- Types:
  External:
    From I/O devices: keyboard, mouse, …
    Timer: Interrupts CPU every few ms
  Internal/Exception:
    Generated when hardware detects errors in prog:
    arithmetic (division by zero), addressing (bad pointer), invalid instruction, …
  Prog-generated/SuperVisor Call (SVC):
    Used to transfer control to OS
    Means for user prog to call OS funcs
Direct Memory Access (DMA)
- Problem with interrupt-based I/O:
  Processor involved throughout I/O processing
  Large data transferred → Many interrupts
- Solution: DMA:
  Device controller transfers data block from buffer directly to main mem without CPU intervention
  Only 1 interrupt/block → Doesn't disturb CPU too much
Computer System Architecture
- Multiprocessor system:
  ≥2 processors in close comm, sharing bus, clock, mem, peripheral devices
  Advantages:
    Throughput (more work in less time)
    Reliability (1 fails, others still run)
    Econ of scale (cheaper than multiple single-processor sys)
  Types:
    Asym multiprocessing: Master-slave CPUs
    Sym multiprocessing (SMP): Each processor has own reg, cache but shares mem
- Uniform Mem Access (UMA):
  All processors access mem through shared bus
  Any CPU accesses any RAM in same time interval
- Non-uniform Mem Access (NUMA):
  Each processor has local mem, accesses mem of other processors through shared bus
- Clustered Systems:
  ≥2 individual sys loosely coupled
  Share storage via storage-area network (SAN)
  Types:
    Asym clustering: 1 machine on standby monitors the other running the app
    Sym clustering: Multiple nodes run apps, monitor each other

Storage Hierarchy Structure

  Storage device         Type          Feature
  Register               Volatile      In CPU, high-speed
  Cache                  Volatile      In CPU/Module
  Main mem (RAM)         Volatile      CPU can access directly
  Second mem (SSD, HDD)  Non-volatile  Extension of RAM, large storage
  Tertiary storage       Non-volatile  For backup/archival data

Caching
- Idea: Store copies of data from slower to faster storage on temp basis
- Assumption: Data will be used again soon
- Mechanism: Faster storage first checked to see whether info is there
  HIT: Info used directly from cache
  MISS: Copy data from slower storage & use
- Trait: Cache smaller than storage being cached
Fundamentals of Operating System
Friday, September 9, 2016
11:33 PM
Operating System
- Prog intermediary between user & hardware:
  Kernel: Main prog running at all times
  Others: sys/app progs
- Goals:
  Control, coordinate use of sys resources
  Use hardware in efficient, protected manner
  Make comp sys convenient to users (services)
- Overview Workflow:
  Comp startup → boot up OS:
    CPU jumps to bootstrapping code in firmware (ROM/EPROM)
    Bootstrapping code initializes all sys parts: registers, device controllers, mem
    Load & start OS kernel, system processes/daemons

OS Implementation
- Multiprogramming:
  Organize jobs (code + data) → Keep CPU busy at all times
  Job scheduling:
    Select & run a job
    When job has to wait (e.g. I/O), switch to another
- Multitasking/Time sharing:
  CPU switches jobs frequently → Users can interact with each job while it runs → Interactive platform
- Dual-mode Operation:
  User mode:
    Certain machine instructions, regs, mem locs not avail
    I/O can't be accessed
  Kernel mode: No restriction
  Mode bit: Provided by hardware, helps distinguish user/kernel mode
  Transfer user → kernel mode via sys call
  Wait to handle events: Interrupts, sys calls, …
  Purpose: Protect OS from errant users, and users from one another
- Timer:
  Interrupts CPU after specific interval
  Helps prevent user prog from getting stuck/hogging resources/never returning control back to OS
Operating System Structures
Saturday, September 10, 2016
12:53 AM
OS Services
- User interface:
  CLI (Command Line Interface)
  GUI (Graphical User Interface)
  Batch interface: Exec commands, directives stored in file
  Touchscreen
- Prog exec: Load prog into mem & run
- I/O operations
- File-sys manipulation: Read, write, create, delete, list, search files & dirs
- Comm: Exchange info between procs on same/different computer
- Error detection
- Resource alloc: Alloc resources to users/jobs concurrently running
- Accounting: Keep track of which users use how much resources
- Protection & Security:
  Protection: Ensure all access to sys resources is under control
  Security: Defend against invalid access by external users/devices

System call
- Sys call: Prog requests services from OS kernel
- Implementation:
  Libraries in programming lang serve as link to OS sys calls
  A number associated with each sys call, maintained by sys call interface through index table
  Callers know nothing about implementation of the sys call
- Param passing:
  Pass in regs (limits no of params passed)
  Pass addr of mem block storing params in a reg
  Push onto stack by program, pop by OS
- Types:
  Process control: Create, terminate, wait, …
  File management
  Device management: Request, release device, …
  Info maintenance: Get/set time, …
  Comm: Send/receive msg
  Protection: Control access to resource, get/set perm, …
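A minimal C sketch of this flow: the prog requests an OS service through the C library's write() wrapper, which hands the params to the kernel's write sys call (fd 1 = stdout):

  #include <unistd.h>

  int main(void) {
      const char msg[] = "hello via sys call\n";
      /* The library wrapper forwards these params to the kernel
         (e.g. in registers, per the platform's convention) */
      write(1, msg, sizeof(msg) - 1);
      return 0;
  }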
Separation of Policy vs Mechanism
- Policy: Defines what needs to be done
- Mechanism: Defines how it is actually implemented
- Separation → ↓Changes needed when policy/mechanism changes
OS System Design
- General principle: Modularity:
  Divide funcs into modules → Simplify debug & sys verification

Approach: Monolithic
- Def:
  Limited structuring
  Typically: Kernel = Everything below sys-call interface, above physical hardware
- Pros: Little overhead → ↑Performance
- Cons: Difficult to implement & maintain

Approach: Layered
- Def:
  OS = n layers
  Layer 0 = Hardware, Layer n = UI
  Each layer only interacts with 2 neighbors:
    Built on top of lower
    Provides services for upper
- Pros:
  Info hiding: Upper need not know how lower is implemented, only what it does & how to use it
  Simple to construct & debug
- Cons:
  Difficult to appropriately define layers
  Performance: ↑Layers → ↑Overhead

Approach: Microkernel
- Def:
  Remove all non-essential components from kernel, implement them as user-lvl progs
  Minimal proc & mem management kept in kernel
  Comm between client progs & services through msg passing
- Pros:
  Easy to extend OS: New services added in user space → No need to modify kernel
  Easy to port OS to new hardware
  Reliable: Services run in user space → Their failure doesn't affect kernel
- Cons:
  Performance: Overhead of user space ↔ kernel comm
  Difficult to design: Which funcs in core, which in sys-func modules?

Approach: Modular
- Def: Core services + (dynamically) loadable kernel modules
  Like Layered: Each kernel section has defined, protected interface
    BUT no hierarchy: Any module can call any other
  Like Microkernel: Core module has core funcs, knowledge of how to load & comm with sub-modules
    BUT: Sub-modules need not invoke msg passing to comm
- Pros: Flexibility: New funcs added/removed from kernel without need to recompile/reboot kernel
- Cons: Comm between sub-modules: Complex implementation

Virtualization
- Virtualization: Run OS as app within another OS
- VM (Virtual machine):
  Creates illusion of multiple procs, each exec on own processor with own (virtual) mem
  Benefits:
    Programming simplicity: Each proc thinks it has all mem/CPU time
    Fault isolation:
      Procs unable to directly impact others
      Bugs don't crash whole machine
    Protection & portability
- Emulation:
  Used when source CPU type ≠ target CPU type
  Allows apps compiled for source CPU to run on target CPU
  Generally slow (translation of source sys instructions to equiv target sys instructions)
Process
Wednesday, September 21, 2016
1:18 PM
Overview
- Def: Prog in exec
  Prog = Passive entity
    File containing list of instructions
  Proc = Active entity (with prog counter specifying next instruction, …)
  Exec file loaded into mem: Prog → Proc
  1 prog can correspond to ≥1 procs
- Components:
  Text: Prog code
  Prog counter, Processor regs
  Stack: Temp data
    Func params, local vars, …
  Data: Global vars
  Heap: Dynamically allocated mem

Process Control Block (PCB)
- Each proc represented by PCB in OS
- PCB Components (single thread of exec):
  State: Run, wait, ready, halt, …
  ID: Unique no for each proc
  Prog counter: Loc of next instruction to exec
  CPU regs
  Mem-management info
  I/O status: I/O devices allocated, list of open files
  CPU scheduling info: Priorities, scheduling queue pointers
  Accounting info:
    CPU used, time limits, clock time elapsed since start

Process Scheduling
- Purpose: Max CPU use by selecting suitable, avail proc for next exec
- How:
Maintain scheduling queues of proc:
Job queue: All procs in sys
Ready queue: Procs in main mem, waiting for exec
Device queues: Proc waiting for I/O
Migrate proc among queues
- Types of proc:
  I/O bound:
    Spend more time on I/O than computations
    Many short CPU bursts
  CPU-bound:
    Spend more time on computations
    Few long CPU bursts
- States:
  New: Created
  Running: Instructions exec
  Waiting: Wait for some event to occur (I/O)
  Ready: Wait to be assigned to CPU
  Terminated: Finished exec
- Types of scheduler:

  Long-term (Job scheduler):
    Function: Selects which procs should be brought into the ready queue
    Controls deg of multiprog
    Existence in time-sharing sys: Almost absent/Minimal
    Speed: Lowest (invoked every sec, min)
  Short-term (CPU scheduler):
    Function: Selects which proc should be exec next & allocs CPU
    Provides lesser control of deg of multiprog
    Existence in time-sharing sys: Minimal
    Speed: Highest (invoked every millisec)
  Medium-term (Process-swapping scheduler):
    Function: Removes proc from mem, stores on disk, brings back later to continue exec
    Reduces deg of multiprog
    Existence in time-sharing sys: Part of time-sharing sys
    Speed: Middle
Context Switch
- Switch of CPU from 1 proc/thread to another
  Proc context stored in PCB:
    Save state of old proc
    Load state of new proc
- Properties:
  Overhead: Sys does no useful work while switching
  Highly dependent on hardware support
    SUN UltraSPARC: Multiple sets of regs per CPU → Multiple contexts loaded at once
Process Creation
- What OS does:
  Create new PCB (inexpensive)
  Setup new page tables for addr space (more expensive)
  Copy data from parent proc (very expensive)
  Copy I/O state (medium expensive)
- UNIX sys calls:
  fork(): Create new proc:
    Makes an identical copy of the addr space: 1 for parent, 1 for new child proc, BUT:
      In parent: fork() returns child's pid if successfully created, -1 otherwise
      In child: fork() returns 0
    Both procs start exec at the next statement following the fork() sys call
  execl(), execlp(), execle(), execv(), execvp(), execvpe():
    Child overwrites its remaining proc image with a new one, starting a completely different prog
  wait(pid):
    Parent waits for child to complete
    PID of child & its entry in proc table then released
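A minimal runnable sketch of this fork()/exec()/wait() pattern ("ls -l" is just an arbitrary prog to exec):

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  int main(void) {
      pid_t pid = fork();                 /* duplicate address space */
      if (pid < 0) {                      /* fork failed */
          perror("fork");
          exit(1);
      } else if (pid == 0) {              /* child: fork() returned 0 */
          execlp("ls", "ls", "-l", (char *)NULL); /* overwrite child with new prog */
          perror("execlp");               /* only reached if exec fails */
          exit(1);
      } else {                            /* parent: fork() returned child's pid */
          wait(NULL);                     /* wait for child to complete */
          printf("child %d finished\n", pid);
      }
      return 0;
  }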
Process Termination
- UNIX sys calls:
  exit(): Proc execs last statement, asks OS to delete it, dealloc resources
  abort(): Parent terminates exec of child proc, e.g. when:
    Child exceeds usage of resources
    Task assigned to child no longer required
- Zombie: Child proc terminated without parent (yet) waiting on it
- Orphan:
  Child of a parent that terminated without calling wait()
  UNIX: Assigns init proc as new parent

Interprocess Communication (IPC)
- Why IPC:
  Info sharing
  Computation speedup (subtasks exec in parallel, …)
  Modularity (sys funcs divided into multiple procs, threads, …)
  Convenience (users multitask, …)
- Shared Mem (Implicit: no msg exchanged):
  How:
    Map addr to shared-mem region
    Access mem through read(), write()
  Mem: bounded (fixed size), unbounded (no limit practically)
  Issues:
    Multicore: Multi-cache coherency
    Producer-Consumer: Make sure
      Producer doesn't add data into full buffer
      Consumer doesn't remove data from empty buffer

Communication in Client-Server Systems
- Sockets: Comm endpoint, = IP + Port
  E.g. 161.25.19.8:1625
- Pipes: Act as conduit for 2 procs to exchange info, coordinate activities
  Ordinary pipes (UNIX):
    Comm in producer-consumer style
    Require parent-child rel
    Unidirectional: Parent = Producer, Child = Consumer
    Removed after procs finish comm & terminate
    Sys call: pipe(int fd[2])
      Return: 0 = Success, -1 = Failure
      fd[0] for reading, fd[1] for writing
  Named pipes:
    Bidirectional, no parent-child rel
    Can be used by ≥2 procs
    Still exist after procs finish comm
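A minimal runnable sketch of an ordinary pipe in the producer-consumer style above (parent writes, child reads):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/wait.h>

  int main(void) {
      int fd[2];
      if (pipe(fd) == -1) {               /* 0 = success, -1 = failure */
          perror("pipe");
          exit(1);
      }
      pid_t pid = fork();
      if (pid == 0) {                     /* child = consumer */
          close(fd[1]);                   /* close unused write end */
          char buf[64];
          ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
          if (n > 0) {
              buf[n] = '\0';
              printf("child read: %s\n", buf);
          }
          close(fd[0]);
      } else {                            /* parent = producer */
          close(fd[0]);                   /* close unused read end */
          const char *msg = "hello";
          write(fd[1], msg, strlen(msg));
          close(fd[1]);
          wait(NULL);
      }
      return 0;
  }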
- Msg Passing (Explicit):
  Direct Comm:
    Mechanism:
      Procs must explicitly name the recipient/sender they want to comm with
      Link established automatically
      Only 1 link between 2 procs in comm
    Comm modes supported: 1-to-1
    Operations:
      send(P, msg): Send msg to proc P
      receive(Q, msg): Receive msg from proc Q
    Issues:
      Limited modularity: Proc name changed → OS has to re-match every pair of sender & receiver
      Need to know proc name before comm
  Indirect Comm:
    Mechanism:
      Msgs sent/received through mailbox (port)
      Procs comm through shared mailbox
      Each mailbox has unique ID
      1 link can be associated with ≥2 procs
      ≥2 procs can comm on 1 link
    Comm modes supported:
      1-to-1
      1-to-Many
      Many-to-1
      Many-to-Many
    Operations:
      Create/Destroy mailbox
      send(A, msg): Send msg to mailbox A
      receive(A, msg): Receive msg from mailbox A
    Issues:
      Complex implementation
      Latency, Overhead
- Sync (Blocking) vs Async (Non-blocking) comm:
  Sync:
    Mechanism:
      Sender: Blocked until msg received by receiver/mailbox
      Receiver: Blocked until msg avail
    Pros:
      Sender can effectively confirm receipt
      No buffer space problem: only 1 outstanding msg
      Easy to implement
      Low overhead
    Cons:
      Sender can't do extra work while waiting for receipt confirmation
      Receiver can't do extra work besides waiting for msg
  Async:
    Mechanism:
      Sender: Sends msg and resumes operation immediately
      Receiver: Retrieves valid msg or null
    Pros:
      Sender & Receiver don't have to wait → Can do extra work
      Concurrency
      Avoids some awkward kernel decisions (e.g. swap a waiting proc out to disk?)
    Cons:
      Buffer space problem: Sender & Receiver must buffer msgs until receipt confirmed
      Receiver must test very often whether a new msg is waiting → Difficult in interrupt & signal prog
- Buffering: Queue of msgs attached to link (direct/indirect), implemented as:
  Zero cap: Sender must wait for receiver
  Bounded cap: Sender must wait if link full
  Unbounded cap: Sender never waits
Multithreaded Programming
Thursday, September 29, 2016
5:16 PM
Thread Overview
- Def: Portion of code executed indept of main prog
- Benefits:
  Responsiveness: Allows continuous exec if part of process blocked, especially important for user interfaces
  Resource sharing:
    Threads within proc share resources by default
    Easier than shared mem, msg passing (must be explicitly arranged by programmer)
  Economy:
    Thread creation much cheaper than process creation
    Thread switching much lower overhead than proc's context switching
  Scalability: Multi-threaded proc can take advantage of multiprocessor architecture
- Thread params:
  Private: Thread Control Block (TCB)
    Exec info: CPU regs, program counter, stack
    Scheduling info: state, priority, CPU time
    Accounting info
    Pointers (for scheduling queue implementation)
  Public: Shared by all threads in proc:
    Mem content (global vars, heap)
    I/O state (file sys, network, …)
- Thread Lifecycle:
  New: Created
  Ready: Waiting to run (for CPU)
  Running: Exec (in CPU)
  Waiting: Waiting for some event to occur
  Terminated: Finished exec

Parallelism vs Concurrency
- Parallelism: Sys can perform >1 tasks simultaneously
  Data parallelism: Distribute different subsets of same data across multiple cores, same operation on each
  Task parallelism: Distribute threads across cores, each thread performs unique operation
- Concurrency: Sys supports >1 tasks making progress
  Single processor: Round-robin scheduling → Concurrent exec

Amdahl's Law
- Identifies performance gains from adding cores to an app that has serial & parallel components:
  Speedup ≤ 1 / (S + (1 - S)/N)
    S: Proportion of serial component
    N: No of processing cores
  E.g. App: 75% parallel, 2 cores → S = 1 - 0.75 = 0.25, N = 2
    → Speedup ≤ 1 / (0.25 + 0.75/2) = 1.6
- Serial portion has disproportionate effect on performance gained by adding cores

User Threads vs Kernel Threads
- User threads: Managed by user-lvl thread library, no kernel support
  POSIX Pthread, Win32 Thread, Java Thread
- Kernel threads: Supported by kernel
- Relationship between user & kernel threads:
  Many-to-One:
    Many user threads mapped to single kernel thread
    One user thread blocking causes all to block
    Multiple user threads may not run in parallel on multicore sys because only 1 may be in kernel at a time
    E.g. Solaris Green threads, GNU Portable threads
  One-to-One:
    Each user thread mapped to a kernel thread
    More concurrency than Many-to-One
    E.g. Windows NT, Linux
  Many-to-Many:
    Many user threads mapped to many kernel threads
    Allows OS to create sufficient no of kernel threads
    E.g. Solaris 8.x
  Two-lvl Model = Many-to-Many + One-to-One
    E.g. HP-UX, IRIX, …
Threading Issues
1. Semantics of fork(), clone(), exec() in multithreaded environment:
- fork() has 2 versions:
  Duplicate all threads of proc
  Duplicate only the thread that invoked the fork() sys call
- clone():
  Similar to fork(), but new proc shares addr space with calling proc (instead of using a completely new addr space like fork())
  Means of creating a thread in Linux
- exec(): Replaces the whole proc (as normal)
  → Calling exec() immediately after forking makes duplicating all threads unnecessary
2. UNIX Signals:
- Used to notify a proc that a particular event has occurred
- Can be received sync/async
- Follow pattern:
  Signal generated (by event)
  Signal delivered to proc
  Once delivered, signal must be handled
- Every signal has a default handler run by kernel, which can be overridden by a user-defined handler
- Signal handling func:
  void handler_func_name(int sig)
- Register user-defined handler with kernel:
  signal(sig_value, handler_func_name)
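A minimal sketch of overriding a default handler with the signal() call just described (handler_func is a hypothetical name):

  #include <stdio.h>
  #include <signal.h>
  #include <unistd.h>

  /* User-defined handler: follows the void handler(int sig) shape */
  void handler_func(int sig) {
      /* NOTE: printf is not async-signal-safe; fine for a demo only */
      printf("caught signal %d\n", sig);
  }

  int main(void) {
      signal(SIGINT, handler_func);   /* register handler with kernel */
      while (1)
          pause();                    /* wait; Ctrl-C now runs handler_func */
  }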
3. Thread Cancellation: Involve terminating thread before completion
- Async cancellation: Terminate immediately
- Deferred cancellation: Allow target thread to periodically check if
it should be cancelled
Thread-Local Storage (TLS)
- Allows each thread to have its own copy of data
- Unique to each thread
- Useful when user doesn't have control over the thread creation process (e.g. thread pools)
- Different from local vars:
  Local vars only visible during a single func invocation
  TLS visible across all func invocations
Process Scheduling
Thursday, September 29, 2016
5:16 PM
CPU Scheduling
- Def: Act of selecting next proc for CPU to service once current proc leaves CPU idle
  Have K jobs ready to run, N ≥ 1 CPUs
  → Assign which jobs to which CPUs?
- Criteria:
  Max CPU utilization: Keep CPU as busy as possible
  Max Throughput: No of procs completing exec per time unit
  Min Turnaround time: Time to execute a proc (from time of submitting proc to ready queue to time of completion)
  Min Waiting time: Time proc spends waiting in ready queue
  Min Response time: Time from when proc submitted until its first response produced (in time-sharing env)
  NOTE: Waiting Time = Turnaround Time - Total Execution Time
                     = Turnaround Time - Total CPU Burst Time
- Circumstances to manipulate queues:
  1: Proc switches from running → waiting state
     I/O request, wait() called
  2: Proc switches from running → ready state
     By interrupts
  3: Proc switches from waiting → ready
     I/O complete
  4: Proc terminates
- Scheduling types:
  Preemptive (2 & 3): Currently running proc may be interrupted & moved to ready state by OS
  Non-preemptive (1 & 4): Once proc in running state, it continues to execute until termination / I/O required / syscall required

Dispatcher
- Module that gives CPU control to proc selected by short-term scheduler
- Steps:
  Switch context
  Switch to user mode
  Jump to proper loc in user prog to restart that prog
- Dispatcher latency: Time taken for dispatcher to stop 1 proc & start another

Next CPU Burst Time Prediction (Using Exponential Avg)
  τ(n+1) = α·t(n) + (1 - α)·τ(n)
    τ(n+1): Predicted length of (n+1)th CPU burst
    τ(n): Predicted length of nth CPU burst
    t(n): Actual length of nth CPU burst
    α: Relative weight of recent vs past history, 0 ≤ α ≤ 1
  Commonly α = 0.5, with an initial guess such as τ(0) = 10
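A small C sketch of the prediction formula above, using α = 0.5 and τ(0) = 10 from the notes (the actual burst lengths are assumed sample values):

  #include <stdio.h>

  int main(void) {
      double alpha = 0.5, tau = 10.0;          /* tau(0) = 10 */
      double t[] = {6, 4, 6, 4, 13, 13, 13};   /* assumed actual burst lengths */
      for (int n = 0; n < 7; n++) {
          printf("predicted tau(%d) = %.2f, actual t(%d) = %.0f\n", n, tau, n, t[n]);
          tau = alpha * t[n] + (1 - alpha) * tau; /* tau(n+1) = a*t(n) + (1-a)*tau(n) */
      }
      printf("next prediction = %.2f\n", tau);
      return 0;
  }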
Scheduling Algorithms

First-Come, First-Serve (FCFS): Non-preemptive
- Details: Dispatch procs according to arrival time
- Pros: Simple, no starvation
- Cons:
  Short proc behind long proc → ↓CPU utilization
  CPU-bound proc before many I/O-bound procs → Throughput not emphasized

Priority Scheduling: Both preemptive & non-preemptive
- Details: Dispatch proc with highest prio (smallest integer) first
- Pros: Good response time for high-prio procs
- Cons: Starvation (low-prio procs may never be executed)
- Solution: Aging: As time progresses, proc prio ↑

Shortest-Job-First (SJF): Non-preemptive
- Form of non-preemptive prio scheduling
- Details:
  Dispatch proc with shortest CPU burst first
  Same CPU burst → dispatch proc arriving first
- Pros:
  Optimal waiting time
  ↓Response time for short procs
  High throughput
- Cons: Starvation

Shortest-Remaining-Time-First (SRTF): Preemptive
- Form of preemptive prio scheduling
- Details:
  At each time instance: Calculate remaining CPU burst time of all procs
  If a proc with shorter remaining CPU burst time is in ready queue: Preempt currently running proc with this proc
- Pros: Good response time for short procs arriving at a time instance
- Cons: Starvation

Round-Robin (RR): Preemptive
- Details:
  Each proc gets small unit of CPU time (time quantum)
  After time quantum elapses / when proc requests I/O, syscall, …: Proc preempted & added to end of ready queue
- Pros:
  Better response time overall
  No starvation
- Cons: High turnaround time
- Issues:
  Time quantum should be >> context switch time
  ↑Quantum does not necessarily ↓turnaround time
  Quantum too big → becomes FCFS
  n procs in ready queue, time quantum q → each proc waits ≤ (n - 1)q before its next quantum
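A small C sketch computing per-proc waiting & turnaround time under FCFS (burst times 24, 3, 3 are assumed example values, all procs arriving at time 0):

  #include <stdio.h>

  int main(void) {
      int burst[] = {24, 3, 3};                 /* assumed CPU burst times */
      int n = 3, clock = 0;
      double total_wait = 0, total_turnaround = 0;

      for (int i = 0; i < n; i++) {             /* dispatch in arrival order */
          int wait = clock;                     /* time spent in ready queue */
          clock += burst[i];                    /* run to completion (non-preemptive) */
          int turnaround = clock;               /* completion - arrival (arrival = 0) */
          printf("P%d: wait=%d turnaround=%d\n", i + 1, wait, turnaround);
          total_wait += wait;
          total_turnaround += turnaround;
      }
      printf("avg wait=%.2f avg turnaround=%.2f\n",
             total_wait / n, total_turnaround / n);  /* 17.00 and 27.00 */
      return 0;
  }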
Multilevel Queue
- Ready queue partitioned into separate queues
  Foreground (interactive) / Background (batch), …
- Procs permanently assigned to 1 queue when entering sys, based on their props
  Mem size, prio, proc type, …
- Each queue has own scheduling algorithm
- Scheduling must be done among queues:
  Fixed-prio preemptive:
    Serve all from foreground, then background
  Time slice: Each queue gets certain amount of CPU time for its own scheduling
    80% foreground RR
    20% background FCFS

Multilevel Feedback Queue
- Similar to Multilvl Queue, but procs can move between the various queues
- Defined params:
  No of queues
  Scheduling algorithm for each queue
  Methods to determine when to upgrade a proc
  Methods to determine when to downgrade a proc
  Methods to determine which queue a proc will enter when it needs service
- Example (3 queues Q0, Q1, Q2):
  Inter-queue scheduling: Preemptive (finish all jobs in Q0, then Q1, then Q2)
  Proc downgrade:
    New proc enters Q0; if it doesn't finish in 8ms (Q0 quantum), move to Q1
    If still not complete within 16ms (Q1 quantum), move to Q2
  Proc upgrade: If proc doesn't consume all of its pre-allocated CPU time
    I/O request, syscall, …
Multicore Processor Scheduling
(multiple processors on same chip)
- Mem stall: When processor accesses mem, it spends significant amount of time waiting for data to be avail (due to cache misses, …)
- Scheduling takes advantage of mem stall to make progress on another (hardware) thread:
  Dual-thread processor core = 2 logical processors
  Dual-thread, dual-core sys = 4 logical cores

Multiprocessor Scheduling
(for homogeneous sys (all CPUs of same type))
- Approaches:
  Asym:
    1 master processor, controlling all activities & running all kernel code
    Others run only user code
  Sym (more common):
    Each processor schedules its own jobs
    Ready queue: Can be common/separate
- Processor affinity: Proc has affinity for processor where it is currently running, especially for its cache content
  Soft affinity: OS attempts to keep proc running on same processor, but doesn't guarantee it
  Hard affinity: Allows proc to specify subset of processors where it may run
Thread Scheduling
- OS schedules kernel-lvl threads, not procs
- User-lvl threads must be mapped to associated kernel-lvl threads, although this mapping can be done indirectly through lightweight procs (LWP):
  1 User-lvl thread → 1 LWP
  1 LWP → 1 Kernel thread
  LWP appears as a virtual processor on which the proc schedules user threads to run

Unbound Threads / Process-Contention Scope (PCS):
  Thread models applied: Many-to-1, Many-to-Many
  Association of user-lvl threads to LWPs managed by thread library
Bound Threads / Sys-Contention Scope (SCS):
  Thread model applied: 1-to-1
  Bound thread permanently attached to 1 LWP
- Non-Uniform Mem Access (NUMA):
  Main mem architecture can affect processor affinity, if particular CPUs have faster access to mem on the same chip than to mem in other locs
  If proc has affinity for a particular CPU → Should preferentially be assigned mem storage with "local" fast access
- Load Balancing: Balance job load between processors
  Achieved naturally by sys using common ready queue
  Otherwise maintained through push & pull migration:
    Push migration: A task runs periodically & moves procs from heavily loaded processors to less loaded ones
    Pull migration: Idle processors take procs from ready queues of others
    Push & pull migration:
      Can run in parallel (not mutually exclusive)
      Work against principle of processor affinity
Scheduling Algorithm Evaluation
- Deterministic model: Take particular pre-determined workload, determine performance of each algorithm on that workload (e.g. avg waiting time)
- Queueing model:
  Suppose distribution of CPU, I/O bursts known (based on measurement, simple estimation)
  Consider computer sys as network of servers, each with a queue of waiting procs
  Based on these, calculate certain performance characteristics of individual waiting queues
  Little's Formula:
    n = λ × W
    n: Avg queue length
    λ: Avg arrival rate of new jobs
    W: Avg waiting time in queue
    E.g. λ = 7 procs/sec, W = 2 sec → n = 14 procs in queue on avg
- Simulation
- Real implementation
Process Synchronization
Wednesday, October 19, 2016
9:19 AM

Race Condition
- >1 procs access & manipulate shared data concurrently
- Outcome of exec depends on the particular order in which accesses/exec occur

  Proc  C++ code   Proc instructions
  P0    Counter++  Register1 = Counter
                   Register1 = Register1 + 1
                   Counter = Register1
  P1    Counter--  Register2 = Counter
                   Register2 = Register2 - 1
                   Counter = Register2

  Initially Counter = 5. P0, P1 exec interleavingly:

  Proc  Hardware Instruction        Value
  P0    Register1 = Counter         Register1 = 5
  P0    Register1 = Register1 + 1   Register1 = 6
  P1    Register2 = Counter         Register2 = 5
  P1    Register2 = Register2 - 1   Register2 = 4
  P0    Counter = Register1         Counter = 6
  P1    Counter = Register2         Counter = 4

Critical Section Problem
- Situation: N cooperating procs p0, p1, …, pn-1, each with structure:
  do {
      { Entry Section }
      { Critical Section }
      { Exit Section }
      { Remainder Section }
  } while (true)
  Entry section: Request entry into critical section
  Critical section: Access & update shared vars, …
  Exit section: Release shared-var locks, allow other procs in
- Assumption: Procs exec at non-zero speed
- Problem: Design protocol ensuring that when 1 proc is in its Critical Section, no others can be in theirs
- Solution requirements:
  Mutual Exclusion: Only 1 proc in Critical Section at any time
  Progress: No proc running outside its Critical Section can block others from entering their Critical Sections
  Bounded Waiting: Upon request, no proc should wait arbitrarily long to enter its Critical Section
Critical Section Algorithmic Solution - Peterson's Algorithm
1. Assumption: LOAD, STORE instructions are atomic (non-interruptible)
2. Implementation:

Initialization:
  int intendToEnter[] = {0, 0};
  // turn = i: It's proc i's turn to enter Critical Section
  int turn;

P0:
  while (1) {
      /* Entry section */
      intendToEnter[0] = 1;
      turn = 1;
      while (intendToEnter[1] && turn == 1);

      {CRITICAL SECTION}

      /* Exit section */
      intendToEnter[0] = 0;

      {REMAINDER SECTION}
  }

P1:
  while (1) {
      /* Entry section */
      intendToEnter[1] = 1;
      turn = 0;
      while (intendToEnter[0] && turn == 0);

      {CRITICAL SECTION}

      /* Exit section */
      intendToEnter[1] = 0;

      {REMAINDER SECTION}
  }

3. Proof of Correctness:
- Mutual exclusion:
  For both P0 & P1 to be in Critical Section:
    intendToEnter[0] = intendToEnter[1] = 1
    turn = 0 & turn = 1 at same time
  → Impossible
- Progress:
  P0 can be prevented from entering Critical Section only if intendToEnter[1] = 1 & turn = 1
  If P1 not in Critical Section:
    P1 has finished Critical Section: Resets intendToEnter[1] = 0 → P0 can enter Critical Section
    P1 is in Entry Section: Sets turn = 0 → P0 can enter Critical Section
  → P0 can't be prevented indefinitely
- Bounded-Waiting:
  If P0 is waiting in Entry Section: intendToEnter[1] = 1 & turn = 1
  After P1 leaves Critical Section: Resets intendToEnter[1] = 0 → P0 can enter Critical Section
  → P0 only needs to wait at most 1 turn to enter Critical Section

Critical Section Hardware Solutions
General approaches:
- Disable interrupts:
  Make currently running code execute without being preempted by others
  Doesn't work for multiprocessor env
- Protect critical regions via locks:
  Implemented by special atomic hardware instructions
  Test/Set/Swap mem words

Lock-based Solution
1. Test-and-Set:
a. Assumption: Func TestAndSet() atomic
b. Implementation:

  // Initialization:
  // lock = 1: A proc is still in Critical Section
  int lock = 0;
  // waiting[i] = 1: Proc i is waiting to enter Critical Section
  int waiting[MAX_N];
  memset(waiting, 0, sizeof(waiting));

  // Acquire lock
  int TestAndSet(int* target) {
      int retVal = *target;
      *target = 1;
      return retVal;
  }

  // Pi's code
  do {
      /* Entry Section */
      waiting[i] = 1;
      while (waiting[i] && TestAndSet(&lock));
      waiting[i] = 0;

      {CRITICAL SECTION}

      /* Exit Section */
      // Scan waiting[] in cyclic order i+1, i+2, …, n-1, 0, …, i-1
      // Find first proc j with waiting[j] = 1
      int j = (i + 1) % n;
      while (j != i && !waiting[j]) j = (j + 1) % n;
      if (j == i)  // No more procs waiting: release lock
          lock = 0;
      else waiting[j] = 0;  // Hand entry to Pj

      {REMAINDER SECTION}
  } while (1)

c. Sketch Proof of Correctness:
- Mutual exclusion: Pi enters Critical Section only if waiting[i] = 0 or lock = 0
  waiting[i] = 0 only if:
    Pj just left Critical Section & changed waiting[i] from 1 → 0 (other waiting[j] = 1 remain unchanged)
    lock = 1 kept unchanged
  lock = 0 only if: Pi is the first to enter Critical Section; after entering Critical Section, lock = 1
  → Only 1 proc in Critical Section at any time
- Progress: Pi exiting Critical Section will set lock = 0 or waiting[j] = 0
  → Allows another proc to enter Critical Section
- Bounded-Waiting:
  A proc leaves Critical Section
  → Scan waiting[] in cyclic order → Find first waiting[j] = 1 (if avail)
  → Allow Pj to enter Critical Section
  → Waiting proc only needs to wait at most (n - 1) turns to enter Critical Section

Spinlock
- Proc wastes CPU cycles due to busy waiting (using a loop to repeatedly check a condition) for entry to Critical Section
- Advantage: No context switch needed for waiting phase
  → Suitable for short waiting times
- Often used in multiprocessor sys:
  1 thread "spins" on 1 processor
  1 thread performs Critical Section on another processor
2. Mutex lock
a. Assumption: Funcs Acquire(), Release() atomic
b. Implementation:

  // Initialization
  bool available = true;

  // Support funcs
  Acquire() {
      // Busy Waiting
      while (!available);
      available = false;
  }
  Release() {
      available = true;
  }

  // Proc code
  do {
      Acquire();
      {CRITICAL SECTION}
      Release();
      {REMAINDER SECTION}
  } while (true)

Semaphore
- General sync tool
- Components:
  Shared integer var S
  2 atomic operations:

    void wait(int& S) {
        while (S <= 0);
        S--;
    }
    void signal(int& S) {
        S++;
    }

- Types:
  Counting: No restriction on S's domain
  Binary: S ∈ {0, 1}
- Non-busy-waiting implementation:

  struct Semaphore {
      int value;
      ProcessWaitingQueue Q;
  };

  void wait(Semaphore& S) {
      S.value--;
      if (S.value < 0) {
          /* Push proc invoking wait() to waiting queue */
          S.Q.push(GetInvokingProc());
          /* Syscall: Suspend invoking proc */
          block();
      }
  }

  void signal(Semaphore& S) {
      S.value++;
      if (S.value <= 0) {
          Process P = S.Q.pop();
          /* Syscall: Resume exec of previously blocked proc */
          wakeup(P);
      }
  }
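For comparison with the busy-waiting lock above, a minimal sketch of the same acquire/release pattern using a real sleeping lock, the POSIX pthreads mutex (compile with -pthread):

  #include <pthread.h>
  #include <stdio.h>

  pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  int counter = 0;                      /* shared var: the critical data */

  void *worker(void *arg) {
      for (int i = 0; i < 100000; i++) {
          pthread_mutex_lock(&lock);    /* Entry section: Acquire() */
          counter++;                    /* Critical section */
          pthread_mutex_unlock(&lock);  /* Exit section: Release() */
      }
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %d\n", counter);  /* always 200000 with the lock */
      return 0;
  }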
Deadlock & Starvation
- Deadlock: 2 waiting procs wait indefinitely for an event that can be caused by only 1 of them

  S, Q: Semaphores, initialized to 1

        P0           P1
  (1)   wait(S);     wait(Q);
  (2)   wait(Q);     wait(S);
  (3)   signal(S);   signal(Q);
  (4)   signal(Q);   signal(S);

  After (2): Q held by P1, P0 waiting in Q's queue
             S held by P0, P1 waiting in S's queue
  To resume P0: P1 must release Q's lock (by calling signal(Q))
  But P1 is stuck at (2) waiting on S → P1 can't reach (3)
  Similarly for P1 → Deadlock
- Starvation (Indefinite blocking): Proc never removed from the semaphore queue where it is suspended
  E.g. semaphore queue uses LIFO (last-in first-out, i.e. stack) order

Priority Inversion
- Lower-prio proc holds lock needed by higher-prio proc
- Issue: Low-prio proc preempted by another proc with higher prio (that doesn't need the lock) → Lock-requiring high-prio proc has to wait longer
  E.g. 3 procs: L, M, H (prio: L < M < H)
    H needs resource R, currently locked by L
    M doesn't need R
    H waits for L to finish using R
    During that time: M arrives, preempts L
    → H has to wait longer to get access to R
- Solution: Prio Inheritance Protocol:
  Procs accessing resources needed by a higher-prio proc inherit the higher prio until they finish using the resource
  → Prevents them being preempted by higher-prio procs not needing the resource
  After using resource, prio reverted to original
Synchronization Problems
Sunday, October 23, 2016
4:20 PM
Bounded-Buffer
- Statement: Sync producer & consumer on a buffer of size n
- Solution:

  Semaphore      Init  Usage
  mutex          1     Ensure accesses mutually excluded
  haveData       0     Lock access until buffer has data
  haveEmptySlot  n     Lock access until buffer has empty slot

  Producer:
  do {
      {Produce item}
      wait(haveEmptySlot);
      wait(mutex);
      {Add item to buffer}
      signal(mutex);
      signal(haveData);
  } while (true)

  Consumer:
  do {
      wait(haveData);
      wait(mutex);
      {Remove item from buffer}
      signal(mutex);
      signal(haveEmptySlot);
      {Consume item}
  } while (true)

Readers-Writers
- Statement:
  Data set shared among ≥1 writers, multiple readers
  Readers: Only read data set
  Writers: Read + Write data set
- Solution:

  Initialization:
  // Ensure writer/reader access mutually excluded
  // No separate access control among readers
  Semaphore rw_mutex = 1;
  // Lock readers' access to shared var read_count
  Semaphore mutex = 1;
  // Number of readers currently accessing data
  int read_count = 0;

  Writer:
  do {
      wait(rw_mutex);
      {Perform writing}
      signal(rw_mutex);
  } while (true)

  Reader:
  do {
      wait(mutex);
      read_count++;
      /* Because multiple readers can work simultaneously, only the FIRST
         incoming one needs to acquire the rw_mutex lock */
      if (read_count == 1)
          wait(rw_mutex);
      signal(mutex);

      {Perform reading}

      wait(mutex);
      read_count--;
      /* LAST reader that finishes accessing releases rw_mutex lock */
      if (read_count == 0)
          signal(rw_mutex);
      signal(mutex);
  } while (true)
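A runnable sketch of the bounded-buffer solution using POSIX semaphores & pthreads (buffer size 8 and 32 items are arbitrary choices; compile with -pthread):

  #include <stdio.h>
  #include <pthread.h>
  #include <semaphore.h>

  #define N 8
  int buffer[N], in = 0, out = 0;
  sem_t mutex, haveData, haveEmptySlot;

  void *producer(void *arg) {
      for (int item = 0; item < 32; item++) {
          sem_wait(&haveEmptySlot);               /* wait(haveEmptySlot) */
          sem_wait(&mutex);                       /* wait(mutex) */
          buffer[in] = item; in = (in + 1) % N;   /* add item to buffer */
          sem_post(&mutex);                       /* signal(mutex) */
          sem_post(&haveData);                    /* signal(haveData) */
      }
      return NULL;
  }

  void *consumer(void *arg) {
      for (int i = 0; i < 32; i++) {
          sem_wait(&haveData);
          sem_wait(&mutex);
          int item = buffer[out]; out = (out + 1) % N;  /* remove item */
          sem_post(&mutex);
          sem_post(&haveEmptySlot);
          printf("consumed %d\n", item);
      }
      return NULL;
  }

  int main(void) {
      sem_init(&mutex, 0, 1);
      sem_init(&haveData, 0, 0);
      sem_init(&haveEmptySlot, 0, N);
      pthread_t p, c;
      pthread_create(&p, NULL, producer, NULL);
      pthread_create(&c, NULL, consumer, NULL);
      pthread_join(p, NULL);
      pthread_join(c, NULL);
      return 0;
  }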
Deadlocks
Wednesday, November 9, 2016
10:20 AM

Deadlock
- Set of blocked procs, each holding resources & waiting to acquire a resource held by another proc in the set
- Happens if 4 conditions hold simultaneously (necessary, but NOT sufficient):
  Mutual exclusion: Only 1 proc uses 1 resource instance at a time
  Hold & Wait: Proc holding ≥1 resources waiting to acquire resources held by others
  No preemption: Resource can only be voluntarily released by proc holding it
  Circular wait:
    Waiting proc set {P0, P1, …, Pn}
    P0 waiting for resource held by P1
    P1 waiting for resource held by P2
    …
    Pn waiting for resource held by P0

System Model for Deadlocks
- Resource types: R1, R2, …, Rm
- Each Ri has Wi instances
- Each proc utilizes a resource as follows:
  1: Request resource
  2: Use resource
  3: Release resource
- Resource Allocation Graph:
  Vertices:
    Procs: P = {P1, P2, …, Pn}
    Resource types: R = {R1, R2, …, Rm}
  Edges:
    Request edge: Pi → Rj (Pi requests 1 instance of Rj)
    Assignment edge: Rj → Pi (1 instance of Rj assigned to Pi)
  Observation:
    No cycle → No deadlock
    Cycle:
      1 instance/resource → Deadlock
      >1 instance/resource → Possibly deadlock
      E.g. cycle without deadlock: P4 can run & terminate normally, then release 1 instance of R2;
      P3 then takes that R2 instance, breaking the cycle

Deadlock Prevention
Ensure ≥1 necessary condition can't hold by restraining the ways requests can be made
- Mutual exclusion:
  Can be hard to eliminate because some resources are inherently non-sharable
- Hold & Wait:
  Whenever a proc requests a resource, it must not hold any others:
    Proc can only request & get allocated all resources before beginning exec, OR
    Proc can only request resources when it holds none
  → Low resource utilization, starvation possible
- No preemption:
  When proc (currently holding resources) requests another resource not immediately allocable:
    All held resources released
    Preempted resources given to waiting procs
    Proc holding released resources only restarted when it can regain its old resources
  Can only be applied to resources whose state can easily be saved & restored
    Registers, mem space
    Can't be applied to locks, semaphores
- Circular wait:
  Impose total ordering of all resource types: R1 < R2 < … < Rm
  Require procs to request resources in increasing order of enumeration

Deadlock Recovery
- Terminate procs:
  Abort all deadlocked procs: Not desirable
  Abort procs 1 by 1 until deadlock cycle eliminated:
    Considerable overhead (must run detection algorithm every time 1 proc aborted)
- Preempt resources:
  Successively preempt resources from procs & give them to other procs until deadlock cycle broken
  Possible starvation (same proc always picked as victim)
Deadlock Avoidance
Friday, December 16, 2016
11:58 AM

Deadlock avoidance
- Requires in advance additional info about which resources a proc will request
  → OS can decide for each request whether the proc should wait
  → Avoid unsafe states
- Approach:
  Proc declares max no of resources of each type it may need
  Deadlock avoidance algorithm dynamically examines resource-allocation state → Ensure circular wait can never exist

Resource-Allocation Graph Scheme
- Claim edge Pi → Rj (dashed line): Pi MAY request Rj
  E.g. P1 → R2, P2 → R2
- Request edge Pi → Rj: Converted from claim edge when Pi DEFINITELY requests Rj
  E.g. P2 → R1
- Assignment edge Rj → Pi: Converted from request edge when Rj allocated to Pi
  E.g. R1 → P1
- When Pi releases Rj, assignment edge reconverted to claim edge
- Resource-Allocation Graph Algorithm:
  Request Pi → Rj can be granted only if converting the request edge to an assignment edge doesn't form a cycle

Safe State
- Sys in safe state if ∃ sequence P1, P2, …, Pn:
  Resources needed by Pi can be satisfied by currently avail resources + resources held by all Pj (j < i)
- Implication: If Pi's resource needs not immediately available: Wait until all/some Pj (j < i) finish
- Sys in safe state: No deadlock
- Sys in unsafe state: Possibly deadlock
  Why only "possibly": Not every proc will actually require its MAX pieces of resource to finish
- Safe-state example (10 instances of 1 resource type):

  Proc  Max need  Current allocation
  P1    6         4
  P2    3         2
  P3    9         3

  Exec sequence to safe state: P2, P1, P3
  Free instances = 10 - 4 - 2 - 3 = 1 → P2 gets 1 more, runs, releases
  Free instances = 1 - 1 + 3 = 3 → P1 gets 2 more, runs, releases
  Free instances = 3 - 2 + 6 = 7 → P3 gets 6 more, runs, releases
- Unsafe-state example (10 instances; P1: max 10, allocated 4; P2: allocated 5):
  Unsafe state: Only 10 - 4 - 5 = 1 piece of resource left
  Deadlock happens if ≥1 proc requests its max amount of resource
  In reality:
    P1 may finish its job with its current allocation, then release resources
    P2 will then have enough resources to finish
  → No deadlock

Banker's Algorithm
- Assumptions:
  Resources have multiple instances
  Proc must declare a priori its max use
  When proc requests resources, it may have to wait (first check whether the allocation results in a safe state)
  When proc gets all its resources, it must return them in a finite amount of time
- Data structures:
  n = No of procs, m = No of resource types
  Available[1…m]: Available[j] = k: Rj has k avail instances
  Max[1…n, 1…m]: Max[i, j] = k: Pi may request at most k instances of Rj
  Allocation[1…n, 1…m]: Allocation[i, j] = k: Pi currently allocated k instances of Rj
  Need[1…n, 1…m]: Need[i, j] = Max[i, j] - Allocation[i, j]
- Safety Algorithm: Find a sequence of procs showing sys in safe state
  Initialization:
    Work ← Available
    Finish[i] = false, i = 1, 2, …, n
  Loop until Finish[i] = true ∀i:
    Find i with:
      Finish[i] = false
      Need[i] ≤ Work
    If no such i: Sys not in safe state, terminate
    Add Pi to sequence
    Work ← Work + Allocation[i]
    Finish[i] ← true
- Resource-request algorithm:
  Request[i, 1…m]: Pi suddenly wants k more instances of Rj (in addition to Allocation[i])
  1: If Request[i] ≤ Need[i]: Go to step 2
     Otherwise raise error (Pi exceeds its max claim)
  2: If Request[i] ≤ Available: Go to step 3
     Otherwise Pi must wait (resources currently not avail)
  3: Pretend to allocate resources to Pi:
     Avail ← Avail - Request[i]
     Allocation[i] ← Allocation[i] + Request[i]
     Need[i] ← Need[i] - Request[i]
  4: Run safety algorithm to determine whether resulting state is safe:
     If safe: Additional resources can be allocated to Pi
     If unsafe:
       Pi must wait
       Restore Avail, Allocation[i], Need[i]

Example of Banker's Algorithm
- Safe state: P0, P3, P2, P1, P4
- Request from P4 arrives: Request[4] = (0, 0, 2, 0)
  Request[4] ≤ Need[4] ((0, 0, 2, 0) ≤ (2, 2, 3, 3))
  Request[4] ≤ Available ((0, 0, 2, 0) ≤ (3, 3, 2, 1))
  Pretend to allocate resources to P4:
    Avail ← (3, 3, 2, 1) - (0, 0, 2, 0) = (3, 3, 0, 1)
    Allocation[4] ← (1, 4, 3, 2) + (0, 0, 2, 0) = (1, 4, 5, 3)
    Need[4] ← (2, 2, 3, 3) - (0, 0, 2, 0) = (2, 2, 1, 3)
  Available not enough to execute any Pi (as resource C = 0)
  → P4's request (0, 0, 2, 0) can't be granted immediately
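A minimal C sketch of the Safety Algorithm above, run on the 10-instance safe-state example (1 resource type; the matrix values follow that example):

  #include <stdio.h>

  #define N 3   /* procs */
  #define M 1   /* resource types */

  int is_safe(int available[M], int max[N][M], int allocation[N][M]) {
      int work[M], finish[N] = {0}, need;
      for (int j = 0; j < M; j++) work[j] = available[j]; /* Work <- Available */

      for (int count = 0; count < N; ) {
          int found = 0;
          for (int i = 0; i < N && !found; i++) {
              if (finish[i]) continue;
              int ok = 1;
              for (int j = 0; j < M; j++) {
                  need = max[i][j] - allocation[i][j];    /* Need = Max - Allocation */
                  if (need > work[j]) ok = 0;
              }
              if (ok) {                                   /* Pi can finish */
                  for (int j = 0; j < M; j++)
                      work[j] += allocation[i][j];        /* Work <- Work + Allocation[i] */
                  finish[i] = 1;
                  found = 1;
                  count++;
                  printf("run P%d\n", i + 1);
              }
          }
          if (!found) return 0;                           /* no such i: unsafe */
      }
      return 1;
  }

  int main(void) {
      int available[M] = {1};                  /* 10 - 4 - 2 - 3 = 1 free */
      int max[N][M] = {{6}, {3}, {9}};
      int allocation[N][M] = {{4}, {2}, {3}};
      printf("safe = %d\n", is_safe(available, max, allocation)); /* P2, P1, P3 */
      return 0;
  }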
Deadlock Detection
Friday, December 16, 2016
12:25 PM

Single Instance of Each Resource Type
- Wait-for graph:
  Vertices: Procs
  Edge Pi → Pj: Pi waiting for Pj
- Periodically search for a cycle in the graph:
  Cycle → Deadlock
- Complexity: O(n²)

Multiple Instances of Each Resource Type
- Data structures:
  Available[1…m]: Available[i] = k: Ri still has k avail instances
  Allocation[1…n, 1…m]: Allocation[i, j] = k: Pi currently allocated k instances of Rj
  Request[1…n, 1…m]: Request[i, j] = k: Pi requests k more instances of Rj
- Detection algorithm:
  1: Initialize:
     Work ← Available
     Finish[i] = False if Allocation[i] ≠ 0, True otherwise (i = 1, 2, …, n)
  2: Find i with:
     Finish[i] = False
     Request[i] ≤ Work
     If no such i: Go to step 4
  3: Work ← Work + Allocation[i]
     Finish[i] = True
     Go to step 2
  4: If ∃i: Finish[i] = False: Pi deadlocked, sys in deadlock state
Memory Management
Thursday, December 15, 2016
9:11 PM

Overview
- Prog must be brought into mem & placed within a proc to run
  Done by long-term scheduler
- Memory usually divided into 2 partitions:
  Low mem: Resident OS, interrupt vector
  High mem: User procs
- Physical addr: Addr seen by memory unit
- Logical (Virtual) addr:
  Generated by CPU
  User progs deal with logical addrs, never see the real physical ones
- Symbolic addr: Addr represented by symbols in source code
  x, y, count
- Relative (Relocatable) addr: Addr represented as offset from a location

Multistep Address Binding of User Program
- Compile time:
  Compiler translates symbolic → relative addrs
  Same logical & physical addr
- Loading time:
  Loader translates relative → absolute addrs
  Same logical & physical addr
- Exec time:
  Memory Management Unit translates absolute → physical addrs
  Different logical & physical addrs

Address protection & Dynamic relocation
- Addr protection:
  Addr space of proc/user defined by base & limit registers
  CPU checks every mem access generated in user mode, ensuring it lies within base & limit for that proc/user
  Implementation: Limit register: Contains range of logical addrs
- Dynamic loading:
  Routines kept on disk in relocatable format & not loaded into mem until called
  → ↑Utilization of mem space:
    Unused routine never loaded
    Large code portions handling infrequently occurring cases rarely loaded
  Implementation:
    Prog designed to support dynamic loading
    OS provides library implementing dynamic loading
- Dynamic relocation:
  Relocation register: Contains value of smallest physical addr

Swapping
- Temporarily swap proc from mem to backing store, then bring it back into mem to continue exec
  Supports: Physical mem space < Logical mem space
  → ↑Degree of multiprogramming
- Ready-queue: Manages ready-to-run procs having mem images on disk

Contiguous Allocation
- Hole: Block of avail mem
- When proc arrives: Allocated a hole large enough to accommodate it
- When proc exits: Free its partition; adjacent free partitions combined
- Allocation strategies:
  First fit: Allocate first hole that is big enough
  Best fit:
    Allocate smallest hole that is big enough
    Produces smallest leftover hole
    Must search through hole list
  Worst fit:
    Allocate largest hole
    Produces largest leftover hole
    Must search through hole list
Fragmentation
- External fragmentation: Mem space exists to satisfy a request, but it is not contiguous
- Internal fragmentation:
  Allocated mem slightly larger than requested mem
  Remaining mem is inside the allocated partition → Can't be used by other procs
- Solution:
  Compaction:
    Shuffle mem contents to place all free mem together in 1 large block
    Possible only if addr relocation is dynamic, done at exec time
    Can be expensive

Segmentation
- Program is a collection of segments
- Logical addr format: (Seg no, Offset)
- Segment table:
  Entry: (Base, Limit, Validation bit):
    Base: Starting physical addr where segment resides in mem
    Limit: Length of segment
    Validation bit: Illegal? Read/Write privileges
  Referenced by 2 hardware registers:
    Segment-table base register (STBR): Table's location in mem
    Segment-table length register (STLR): No of segments used by prog
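A minimal C sketch of segment-table translation as described above (the base/limit values are made-up illustrative entries):

  #include <stdio.h>

  struct SegEntry { unsigned base, limit; int valid; };

  /* Translate (seg no, offset) -> physical addr; -1 = violation (trap to kernel) */
  long translate(struct SegEntry *table, unsigned stlr, unsigned seg, unsigned off) {
      if (seg >= stlr) return -1;              /* seg no beyond STLR: trap */
      if (!table[seg].valid) return -1;        /* invalid segment: trap */
      if (off >= table[seg].limit) return -1;  /* offset beyond limit: trap */
      return (long)table[seg].base + off;
  }

  int main(void) {
      struct SegEntry table[] = {
          {1400, 1000, 1},   /* seg 0: base 1400, limit 1000 */
          {6300,  400, 1},   /* seg 1: base 6300, limit 400  */
      };
      printf("%ld\n", translate(table, 2, 1, 53));   /* -> 6353 */
      printf("%ld\n", translate(table, 2, 1, 500));  /* -> -1 (limit violation) */
      return 0;
  }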
Paging
Thursday, December 15, 2016
10:24 PM

Overview
- Divide physical mem into fixed-size blocks → Frames
  Frame size = 2^n (512B - 16MB)
- Divide logical mem into fixed-size blocks → Pages
  Usually: Page size = Frame size
- Program of size N pages needs N free frames
  E.g. Page size = 2048 B, Process size = 72766 B
    72766 / 2048 = 35, remainder 1086 → Need 36 pages
    Internal fragmentation = 2048 - 1086 = 962 B
- Logical addr format: (Page no, Offset)
  Page no (p): Index into page table
  Page size = 2^n, Logical space size = 2^m → No of page-table entries = 2^(m-n)

Page Table
- Registers:
  Page-table base register (PTBR): Page table location in mem
  Page-table length register (PTLR): Indicates page table's no of entries
- Every data/instruction access requires 2 mem accesses:
  Page table → Data/Instruction
- Translation look-aside buffer (TLB):
  Portion of page table, stored in cache → ↑Efficiency
  Small (64 - 1024 entries)
  Some entries wired down for permanent access
    Entries for key kernel code
- Effective Access Time (EAT): Avg time needed to access a physical mem addr
  EAT = α × (TLB Search Time + Mem Access Time)
      + (1 - α) × (TLB Search Time + 2 × Mem Access Time)
  α: Hit ratio: Prob of finding page no in TLB
  E.g. α = 0.8, TLB search = 20 ns, mem access = 100 ns
    → EAT = 0.8 × 120 + 0.2 × 220 = 140 ns

Memory Protection
- Protection bit:
  Associated with each frame
  Indicates read/write access
- Valid-invalid bit:
  Associated with each page table entry
  Meaning:
    Valid: Page in proc's logical addr space → Legal page
    Invalid: Page not in proc's logical addr space
- Violation → Trap to kernel

Hierarchical Paging
- Break up logical addr space into multiple-lvl page table
- Logical addr format: (p1, p2, d)
- Motivation: Don't always need all logical addr space → Don't need to create all intermediate-lvl page tables at the same time

Page Size
- ↑Page Size → ↓Page faults, ↑Internal fragmentation
- Multiple page sizes: Allow apps to use the page size suitable to them

Real-life Paging Architectures
- Intel IA-32
- Intel x86-64:
  Only implements 48-bit addressing
  4-lvl paging hierarchy
- ARM:
  1-lvl paging for sections
  2-lvl paging for smaller pages
  2 lvls of TLBs:
    Outer lvl: 2 micro TLBs (1 data + 1 instruction)
    Inner: 1 TLB
Virtual Memory Management
Thursday, December 15, 2016
11:13 PM

Virtual Memory Overview
- Motivation: Separation of logical mem from physical mem
  Supports logical addr space > Physical addr space
  Prog size not constrained by physical mem
  Allows more programs to run concurrently
- Proc's virtual addr space:
  Heap: Grows upward, used for dynamic mem allocation
  Stack: Grows downward, used for successive function calls
  Blank space between heap & stack:
    Part of virtual addr space
    Only requires actual physical space if heap/stack grow
- Shared libraries: Each proc considers shared libraries part of its virtual addr space

Demand Paging
- Lazy swapper principle: Page only brought from disk to mem when needed
  → ↓Physical mem consumption
  → More procs can run concurrently
  → No unnecessary I/O
- Valid-invalid bit: Indicates whether page in mem
  v (Valid): In mem
  i (Invalid): Not in mem
- Pager: Responsible for swapping pages in/out of mem
- Handling page fault (page not currently in mem):
  OS looks at another table (usually in PCB) to decide whether reference is valid/invalid:
    Invalid: Abort
    Valid & not in mem: Continue
  Get empty frame
  Swap page into frame via scheduled disk operation
  Reset tables to indicate page now in mem
  Set validation bit = v
  Restart instruction that caused the page fault
- Page replacement:
  Occurs when no free frame avail
  Swap out a page in mem not really in use
  Modify (dirty) bit: Indicates whether page modified → Avoid unnecessary page transfers
- Demand paging performance:
  p = Page fault rate
  EAT = (1 - p) × Mem Access Time + p × Avg Page-Fault Service Time
  Avg Page-Fault Service Time = Page Fault Overhead + Swap Page Out + Swap Page In + Restart Overhead
  E.g. Mem access = 200 ns, service time = 8 ms → EAT = (1 - p) × 200 + p × 8,000,000 ns

Prepaging
- Prepage all/some pages before proc actually references them
  → ↓Large burst of page faults at proc startup
- However: Prepaged but unused pages → wasted I/O + mem:
  s pages prepaged, α of these pages used (0 < α < 1)
  Saving from avoided page faults ∝ s × α
  Cost of prepaging unnecessary pages ∝ s × (1 - α)
  α near 0: Prepaging loses
Page Replacement
Thursday, December 15, 2016
12:01 PM
FIFO (First-in-first-out)
- Replace oldest page
- FIFO queue can be used to track page ages
- Belady's Anomaly: ↑Available frames can → ↑Page faults

OPT (Optimal)
- Replace page that won't be used for the longest period of time
- No Belady's anomaly
- Requires future knowledge → Impossible to implement

LRU (Least-Recently Used)
- Replace page that has not been used for the longest amount of time
- Implementation:
  Counter:
    Each page entry has a counter
    Page referenced: Counter ← Clock time
    Replacement needed: Find page with min counter value
    Expensive search, inexpensive update
  Stack:
    Keep doubly-linked list of page numbers
    Page referenced: Move to beginning of list
    Expensive update, no search needed
- No Belady's anomaly

Second chance (LRU Approximation)
- Each page has reference bit:
  Initially 0
  When page gets referenced: Set to 1
- When replacement needed: Iterate through all pages:
  If reference bit = 0: Replace
  If reference bit = 1: Set to 0, continue
- Implementation: FIFO queue + hardware-provided reference bit

Counting
- Keep counter of no of references made to each page:
  LFU (Least Frequently Used) Algorithm: Replace page with min count
  MFU (Most Frequently Used) Algorithm: Replace page with max count
- Not commonly used
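A small C sketch of FIFO replacement that also demonstrates Belady's anomaly on the classic reference string (9 faults with 3 frames, 10 with 4):

  #include <stdio.h>

  int fifo_faults(const int *refs, int n, int nframes) {
      int frames[16], next = 0, used = 0, faults = 0;
      for (int i = 0; i < n; i++) {
          int hit = 0;
          for (int j = 0; j < used; j++)
              if (frames[j] == refs[i]) hit = 1;
          if (!hit) {                            /* page fault */
              faults++;
              if (used < nframes) frames[used++] = refs[i];
              else { frames[next] = refs[i];     /* replace oldest page */
                     next = (next + 1) % nframes; }
          }
      }
      return faults;
  }

  int main(void) {
      int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
      int n = sizeof(refs) / sizeof(refs[0]);
      printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
      printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
      return 0;
  }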
Thrashing
- Proc doesn't have enough pages
  → Has to frequently swap pages in/out
  → High page-fault rate
- Reason:
  Proc migrates between localities
  Size of locality > Total mem size
- Can limit effects with local/priority page replacement: Prevents a proc from stealing others' frames & causing them to thrash
  If proc P gets a page fault:
    Replace 1 of its own frames, OR
    Replace a frame taken from a proc with lower priority

Frame Allocation Overview
- Purpose: Allocate frames to procs when needed
- Approaches:
  Equal allocation: Each proc gets equal no of frames
    E.g. 100 frames, 5 procs → Each proc gets 20
  Proportional allocation: Allocate proportionally to proc size:
    si = Size of proc i, S = Σ si, m = Total no of frames
    Allocation for pi: ai = (si / S) × m
    E.g. m = 62, s1 = 10, s2 = 127 → a1 = 10/137 × 62 ≈ 4, a2 = 127/137 × 62 ≈ 57
  Priority allocation: Allocate proportionally based on proc priority
- Global vs Local replacement:
  Global:
    Proc selects replacement frame from set of all frames
    May provide better sys throughput
    Exec time can vary greatly: Proc can't control its own page-fault rate
  Local:
    Each proc selects replacement frame from its own frame set
    Per-proc performance more consistent
    Can underutilize mem: Frames pre-allocated to a proc may go unused

Working-Set Model
- m: Max no of frames provided by physical mem
- Δ = Working-set window
  E.g. 10000 instructions
- WSSi (Working set size of proc i) = No of pages referenced in most recent Δ
  Δ too small: Won't encompass entire locality
  Δ too large: Encompasses several localities
  Δ = ∞: Encompasses entire prog
- D = Σ WSSi: Total demand frames (approx of current locality of whole sys)
- D > m: Thrashing (≥1 proc lacks mem)
- Working-set strategy: If D > m: Suspend/Swap out 1 proc
  → Prevents thrashing
  → Maintains high degree of multiprogramming
- Implementation: Approx working set with interval timer + reference bits
  E.g. Δ = 10000:
    Timer interrupts after every 5000 units
    2 ref bits for each page: x1, x2
    Page referenced: Set x1 = 1
    Whenever timer interrupts: Copy all x1's to x2's, set all x1's to 0
    If x1 = 1 or x2 = 1: Page in working set
  ↑Bits used → ↑Accuracy, ↑Cost
- TLB Reach = TLB Size × Page Size:
  Ideally: Working set of each proc stored in TLB
  Otherwise: High page-fault rate

Page-Fault Frequency
- Use local replacement policy
- Establish "acceptable" page-fault rate:
  Current rate too low: Proc loses frames
  Current rate too high: Proc gains frames
→ Prevents thrashing
File System
Friday, December 16, 2016
12:31 AM

File Overview
- Attributes:
  Name, Type, Size
  Time
  User identification
  Identifier: Unique tag (number) to identify file within file sys
- Structure:
  None: Just byte/word sequence
  Simple record:
    Lines, fixed/variable-length records
  Complex:
    Formatted documents, relocatable exec files
- Operations:
  Create/Read/Write/Delete/Truncate
  Seek (reposition within file)
  Open(F):
    Search directory structure on disk for entry F
    Move F's content to mem
    Prepare F for subsequent access
  Close(F): Move F's content from mem to directory structure on disk
- OS's open file management:
  Open file table (system-wide + per-proc): Tracks open files
  File pointer (per-proc): Pointer to last read/write location
  File open count:
    Counts no of procs currently using the file
    Allows removal of entry from open-file table when last proc closes the file
  Disk location of file
  Access rights (per proc)
- Open file locking:
  Shared reader lock, exclusive writer lock
  Mandatory locking: OS denies access depending on locks held & requested
    E.g. Windows
  Advisory locking: Procs can find lock status & decide what to do (programmer decides)
    E.g. UNIX

File Access Method
- Sequential access:
  Read next
  Write next
  Reset
  No read after last write (rewrite)
- Direct access (consider file as set of fixed-length records):
  Read n
  Write n
  Position to n
    Read next
    Write next
  Rewrite n
  (n = Relative block number)
- Others:
  Built on top of direct access
  File indexing: Fast determination of data location

File Sharing
- Multi-user system: Per-user, per-group permissions
- Remote file system:
  Uses networking to access files:
    Manually: FTP
    Automatically: Distributed file sys
  Distributed Information System: Implements unified access to info needed for remote computing
    Active Directory, NIS, LDAP, …

Access List and Group on UNIX
- Access modes: Read/Write/Execute
- User classes: Owner, Group, Others
- Ask manager to create group, then add some users to the group
- Attach group to file:
  chgrp G game
  (G: group name, game: filename)
- Define access for file, subdirectory

Disk structure
- Disk divided into partitions; each partition can be protected against failure using RAID
- Volume:
  Entity containing file sys
  Tracks file sys info in: Device directory, Volume's table of contents

File System Mounting
- File sys must be mounted before it can be accessed
Directory Structure
- Collection of nodes containing file info
- Resides on disk
- Operations supported:
  Search/Create/Delete/Rename file
  List directory
  Traverse file sys

Single-lvl:
- Single directory for all users
- Pros: Very simple
- Cons:
  Naming problem: Different users can't create files with same name
  Grouping problem: No subdirectories
Two-lvl:
- Pros:
  Can have same filename under different users
  More efficient search than single-lvl directory
- Cons: Limited grouping capability
Tree-Structured:
- Pros:
  Efficient search
  Grouping capability
Acyclic-Graph:
- Can share subdirectories & files
- Link: Another name (pointer) to an existing file
  Resolve link: Follow pointer to file
- Cons:
  Deletion might lead to dangling pointers (pointing to empty/wrong files)
  Need to ensure no cycles:
    Allow links to files, not subdirectories, OR
    When link added: Use cycle-detection algorithm to check for cycles
General Graph:
- Allows cycles
File System Implementation
Friday, December 16, 2016
8:38 PM
File System Organization
- In-mem structure:
Mount table: Info about mounted volumes
Role of Disk
Directory-structure cache: Info about recently access directories
- Disk provide most of secondary storage where file sys
maintained:
Disk can rewritten in place (each single block
support read/write)
Disk can directly access any block
Sys-wide open-file table: FCB copy of open files
Per-proc open-file table: Pointers to entries in sys-wide open-file table
Buffer: File block read/write to disk
- On-disk structure
Simple to access file sequentially, randomly
- I/O mem Disk transfer performed in blocks I/O
efficiency
On-disk structures (Management unit | Info stored | UFS | NTFS):
- Boot control block (per volume):
  Info needed by sys to boot OS from volume
  Usually first block on volume (if it exists)
  UFS: Boot block; NTFS: Partition boot sector
- Volume control block (per volume):
  Total no of blocks, no of free blocks, block size, free-block pointers, free FCB count & pointers
  UFS: Superblock; NTFS: Stored in master file table
- Directory structure (per file sys):
  File organization: File names + associated inode (FCB) nos
  NTFS: Stored in master file table
- File control block (FCB) (per file):
  File details: Permission, size, dates
  UFS: inode; NTFS: Stored in master file table using relational database (1 row/file)
Layers (high → low) | Roles:
- App prog
- Logical file sys:
  Metadata: All file sys structure except file content
  File Control Block (FCB): Info about file (ownership, permission, file content location on disk)
  Manage directory structure
  Provide info needed by file organization module
  Translate file name → file identifier/handle/location through FCB
- File organization module:
  Translate logical → physical block addr
  Pass physical block addr to basic file sys for data transfer
  Manage free disk space, disk block allocation
- Basic file sys:
  Issue commands to device driver to read/write physical blocks on disk (e.g. "Retrieve block 123")
  Manage mem buffers: Hold data during mem ↔ disk transit; buffer mem allocated before the real disk block transfer occurs
  Cache frequently used file sys metadata, file blocks
- I/O control:
  Manage I/O devices through device driver, interrupt handler
  Transfer data between mem ↔ disk (e.g. "Read drive 1, cylinder 72, track 2, sector 10")
- Device
File Operation Implementation
- Open file (a sketch follows):
  Pass filename to logical file sys
  If file already in use (by another proc): Per-proc open-file table entry created, pointing to existing sys-wide open-file table entry
  Otherwise: Search directory structure:
    File found → Copy FCB into sys-wide open-file table in mem
- Read file:
  Open call returns pointer to entry in per-proc open-file table
  All file operations performed via this pointer
  UNIX: File descriptor; Windows: File handle
  Data read & copied to specified user proc mem addr
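A simplified C sketch of the two open-file tables; the struct layouts and table sizes are assumptions for illustration, not any real kernel's:

#include <stddef.h>
#include <string.h>

typedef struct {          /* FCB: info about one file */
    char name[32];
    size_t size;
} FCB;

typedef struct {          /* sys-wide entry: one per open file */
    FCB fcb;              /* in-mem copy of the FCB */
    int open_count;       /* procs currently using the file */
} SysOpenFile;

typedef struct {          /* per-proc entry: one per open() */
    SysOpenFile *sys;     /* pointer into the sys-wide table */
    size_t offset;        /* per-proc file pointer */
} ProcOpenFile;

SysOpenFile sys_table[64];
ProcOpenFile proc_table[16];

/* Returns a "file descriptor": an index into the per-proc table. */
int my_open(const char *name) {
    SysOpenFile *s = NULL;
    for (int i = 0; i < 64; i++)        /* file already in use? */
        if (sys_table[i].open_count > 0 &&
            strcmp(sys_table[i].fcb.name, name) == 0) { s = &sys_table[i]; break; }
    if (!s) {                           /* else "search directory, copy FCB" */
        for (int i = 0; i < 64; i++)
            if (sys_table[i].open_count == 0) { s = &sys_table[i]; break; }
        if (!s) return -1;
        strncpy(s->fcb.name, name, 31); /* stand-in for the directory lookup */
    }
    s->open_count++;
    for (int fd = 0; fd < 16; fd++)
        if (proc_table[fd].sys == NULL) {
            proc_table[fd].sys = s;
            proc_table[fd].offset = 0;  /* fresh per-proc file pointer */
            return fd;
        }
    return -1;
}

Close would do the reverse: clear the per-proc entry and decrement open_count, freeing the sys-wide entry when the count reaches 0.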
Directory Management
- Structure management:
  Linear list:
    List of filenames, with pointers to data blocks
    Simple to implement
    Search file in linear time (slow)
  Hash table (a sketch follows this list):
    Linear list stores directory entries; hash table speeds up lookup
    Entry: Key = Filename, Value = Pointer to data blocks
    Decreases time to search file
    Collision: 2 different filenames hash to same location
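A minimal C sketch of a hash-table directory, assuming chained buckets and an illustrative entry layout; collisions are resolved by walking the chain.

#include <stdio.h>
#include <string.h>

#define BUCKETS 8

typedef struct Entry {
    char name[32];
    long first_block;        /* pointer to the file's data blocks */
    struct Entry *next;      /* chain for colliding names */
} Entry;

static Entry *table[BUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Expected O(1) lookup instead of the linear list's O(n) scan. */
Entry *dir_lookup(const char *name) {
    for (Entry *e = table[hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

void dir_add(Entry *e) {
    unsigned b = hash(e->name);
    e->next = table[b];
    table[b] = e;
}

int main(void) {
    static Entry a = {"notes.txt", 42, NULL};
    dir_add(&a);
    Entry *hit = dir_lookup("notes.txt");
    if (hit) printf("%s starts at block %ld\n", hit->name, hit->first_block);
    return 0;
}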
- Disk block allocation:
  Contiguous:
    Description: Each file occupies a contiguous set of disk blocks
    Pros: Simple to manage (FCB only needs to save start location & length)
    Cons:
      File size: Difficult to grow; must be known at time of file creation; overestimated → internal fragmentation
      Finding free space for new file: Time consuming; common strategies: best fit, first fit (more efficient than worst fit)
      High cost of compaction for large disks
  Extent-based (modified contiguous allocation):
    Description: Extent = contiguous set of disk blocks; file = linked list of extents (each extent has link to next one)
    Cons: Internal fragmentation (extent too large); external fragmentation (extents have various sizes)
  Linked:
    Description: File = linked list of disk blocks, ending at null pointer
    Pros: No external fragmentation; no compaction needed; file size can grow easily
    Cons: Direct access inefficient; extra disk space required for pointers; reliability (pointers lost/damaged)
    FAT (File Allocation Table) variant (a sketch follows this list):
      Disk section at beginning of volume set aside for FAT
      Directory entry contains block no of first block of file
      Table entry (indexed by block no) contains block no of next block in file
      Table entry for last block: Special end-of-file value
      Cache FAT in mem → improved random access
  Indexed:
    Description: Each file has own index block: special block storing array of pointers
    Types:
      Direct (normal): 1 index block; pointers refer directly to real data blocks
      Indirect, linked scheme: Index blocks linked together
      Indirect, multi-lvl scheme: 1st-lvl index block → 2nd-lvl index blocks → file blocks
      Combined scheme: Direct blocks for small files, indirect blocks for large files
    Pros: No external fragmentation (pointer can point to any free disk block); supports direct access
    Cons: Wasted space (index pointer overhead > linked allocation's overhead); e.g. small file of 1-2 disk blocks: indexed still needs an entire index block, linked only needs space for 2 pointers
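A minimal C sketch of FAT-style chaining with illustrative block numbers: the table entry for block b holds the file's next block.

#include <stdio.h>

#define FAT_EOF (-1)          /* special end-of-file value */
#define NUM_BLOCKS 16

int fat[NUM_BLOCKS];

/* Follow the chain from the file's first block (stored in its
 * directory entry) and print each data block number in order. */
void walk_file(int first_block) {
    for (int b = first_block; b != FAT_EOF; b = fat[b])
        printf("block %d\n", b);
}

int main(void) {
    for (int i = 0; i < NUM_BLOCKS; i++) fat[i] = FAT_EOF;
    /* a 3-block file: 2 -> 9 -> 5 */
    fat[2] = 9; fat[9] = 5; fat[5] = FAT_EOF;
    walk_file(2);
    return 0;
}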
Free Space Management
- Bit vector (a sketch follows this list):
  Description: 1 bit per disk block (e.g. bit = 1 if block free, 0 if allocated)
  Pros: Simple; efficient to find first free block or first n consecutive free blocks
  Cons: Inefficient if entire vector can't be kept in mem
- Linked-list:
  Description: Linked list of free disk blocks; cache pointer to first free block in mem
  Pros: Easy to locate a single free block
  Cons: Uneasy to obtain a contiguous set of free blocks; inefficient to traverse entire list (but this is not a frequent action)
- Grouping (modified linked-list):
  Description: Store addrs of n free blocks in first free block; first (n - 1) of these blocks actually free; last block holds addrs of another n free blocks
  Pros: Can find a large no of free blocks quickly
- Counting:
  Description: Keep entries of (addr of first free block, count of following contiguous free blocks)
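A minimal C sketch of the bit vector, assuming the convention bit = 1 means free (some systems invert it):

#include <stdint.h>
#include <stdio.h>

#define NUM_BLOCKS 64
static uint8_t bitmap[NUM_BLOCKS / 8];   /* 1 bit per disk block */

int is_free(int b)   { return (bitmap[b / 8] >> (b % 8)) & 1; }
void set_free(int b) { bitmap[b / 8] |=  (uint8_t)(1 << (b % 8)); }
void set_used(int b) { bitmap[b / 8] &= (uint8_t)~(1 << (b % 8)); }

/* Find first free block: the operation the bit vector makes cheap. */
int first_free(void) {
    for (int b = 0; b < NUM_BLOCKS; b++)
        if (is_free(b)) return b;
    return -1;                            /* disk full */
}

int main(void) {
    set_free(5); set_free(6);
    int b = first_free();                 /* 5 */
    if (b >= 0) { set_used(b); printf("allocated block %d\n", b); }
    return 0;
}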
Mass Storage System
Sunday, December 11, 2016
1:49 AM
Solid-State Disk
- Non-volatile mem, used like a hard drive
- Faster access speed (no moving parts)
- More reliable than HDDs
- Storage cost per MB more expensive
- May have shorter life span
Magnetic Tape
- Relatively permanent
- Can hold large quantities of data
- Mainly used for backup, storage of infrequently-used data
- Very slow access time (~1000× slower than disk)
Magnetic Disk
- Structure:
  Platter: Circular plate storing data
  Track: Circular path on platter (tracks are concentric on each platter surface)
  Sector: Subdivision of track on platter, stores fixed amount of data
  Cylinder: Vertical collection of corresponding tracks on all platters
  Arm: Holds read-write head
  Read-write head: Device for reading/writing data on disk
Disk structure
- Disk drive addressed as large 1D array of logical blocks
- Logical blocks mapped onto sectors sequentially (sector 0 = 1st sector of 1st track on outermost cylinder)
- Mapping order: Through that track, then rest of tracks in that cylinder, then cylinders from outermost → innermost
Disk Attachment
- SCSI:
  Bus, max 16 devices/cable
  SCSI initiator (controller card) requests operation
  SCSI targets (devices) perform task
- Storage Array:
  Attach array of disks
  Ports to connect hosts to array
  Memory, controlling software (NVRAM, ...)
- Storage Area Network (SAN):
  Multiple hosts attached to multiple storage arrays
  Physical medium: Fibre Channel switches
  Storage made available via LUN (Logical Unit Number) masking from specific arrays to specific servers
  Easy to add/remove storage
- Network-Attached Storage (NAS):
  Storage made available over network rather than local connection
  Implemented via remote procedure calls (RPCs) between host and storage over TCP/UDP
  Protocols: NFS, CIFS
  iSCSI protocol: Carry SCSI protocol over IP
Metrics
- Seek time: Move disk arm to desired cylinder
- Rotational latency: Rotate desired sector to under disk head
  1 rotation = 60 / RPM (s)
  Avg rotational latency = 1/2 × 60 / RPM (s)
- Random-access (positioning) time = Seek time + Rotational latency
- Transfer rate: Rate of data flow between drive & computer mem
- Avg I/O time = Avg access time + (Transfer amount / Transfer rate) + Controller overhead
- Example: Transfer 4KB on a 7200RPM disk with 5ms avg seek time, 1GB/s transfer rate, 0.1ms controller overhead:
  Avg I/O time = 5ms + (1/2 × 60/7200)s + 4KB/(1GB/s) + 0.1ms = 5ms + 4.17ms + 0.004ms + 0.1ms ≈ 9.27 ms
- Head crash: Results from disk head making contact with the disk surface
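A quick C check of the 9.27 ms figure, plugging the example's numbers into the formula above:

#include <stdio.h>

int main(void) {
    double seek_ms = 5.0;                          /* avg seek time */
    double rot_ms  = 0.5 * 60.0 / 7200.0 * 1000.0; /* 1/2 * 60/RPM, in ms */
    double xfer_ms = 4096.0 / 1e9 * 1000.0;        /* 4 KB at 1 GB/s */
    double ctrl_ms = 0.1;                          /* controller overhead */
    printf("avg I/O time = %.2f ms\n", seek_ms + rot_ms + xfer_ms + ctrl_ms);
    /* prints: avg I/O time = 9.27 ms */
    return 0;
}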
Disk Management
- Low-lvl (physical) format:
  Divide disk into sectors that disk controller can read/write
  Each sector holds header info, data, error correction code (ECC)
- Logical (file sys) format:
  Partition disk into ≥1 groups of cylinders, each treated as a logical disk
- Efficiency: Group blocks into clusters:
  Disk I/O in blocks
  File I/O in clusters
- RAID (Redundant Array of Inexpensive Disks):
  ≥2 disk drives, provide reliability via redundancy
  Helps recover from disk failure, but does not prevent/detect data corruption
  Techniques:
    Parity: Re-generate lost content from parity info saved elsewhere (an XOR sketch follows this section)
    Striping: Split data across multiple disks (no single disk holds the full data)
    Mirroring: Store copies of data on multiple disks
  Hot-spare disk: Disk left unallocated; automatically replaces a failed disk & has data rebuilt onto it
  RAID lvls (illustrations omitted):
    Lvl 0: Blocks striped → excellent performance; no mirror, no parity → no redundancy
    Lvl 1: Blocks mirrored → excellent redundancy; no striping, no parity; good performance
    Lvl 2: Bit-lvl striping; 1 data disk group + 1 ECC disk group
    Lvl 3: Byte-lvl striping; 2 data disks + 1 parity disk
    Lvl 4: Block-lvl striping; 2 data disks + 1 parity disk
    Lvl 5: Blocks striped → good performance; distributed parity → good redundancy
    Lvl 6: Blocks striped; 2 distributed parities (2 parity blocks for each data block)
    Lvl 0+1: Mirror of stripes
    Lvl 1+0: Stripe of mirrors
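A minimal C sketch of RAID parity via XOR, with two tiny illustrative data blocks: the parity block is the XOR of the data blocks, so any one lost block can be regenerated from the rest.

#include <stdio.h>

#define BLOCK 4   /* tiny block size for illustration */

int main(void) {
    unsigned char d0[BLOCK] = {1, 2, 3, 4};     /* data disk 0 */
    unsigned char d1[BLOCK] = {9, 8, 7, 6};     /* data disk 1 */
    unsigned char parity[BLOCK], rebuilt[BLOCK];

    for (int i = 0; i < BLOCK; i++)
        parity[i] = d0[i] ^ d1[i];              /* write the parity disk */

    /* Suppose disk 1 fails: regenerate it from d0 and parity. */
    for (int i = 0; i < BLOCK; i++)
        rebuilt[i] = d0[i] ^ parity[i];

    printf("rebuilt[0] = %d (expect 9)\n", rebuilt[0]);
    return 0;
}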
Disk Scheduling
- Example used for all algorithms below (illustrations omitted): 200 cylinders (0-199); Queue = 98, 183, 37, 122, 14, 124, 65, 67; Head starts at 53
- FCFS (First come, first served):
  Service requests in arrival order
  Head movement: 53, 98, 183, 37, 122, 14, 124, 65, 67
  Total distance = (183 - 53) + (183 - 37) + (122 - 37) + (122 - 14) + (124 - 14) + (124 - 65) + (67 - 65) = 640
- SSTF (Shortest Seek Time First):
  Select request with min seek time from current head pos
  Issues: Starvation; calculation overhead
  Head movement: 53, 65, 67, 37, 14, 98, 122, 124, 183
  Total distance = (67 - 53) + (67 - 14) + (183 - 14) = 236
- SCAN:
  Head starts at 1 end, services requests until it gets to the other end; then head movement is reversed & servicing continues
  Performs well for systems placing heavy load on disk
  Assume head is moving toward 0 (coming from 199):
  Head movement: 53, 37, 14, 0, 65, 67, 98, 122, 124, 183
  Total distance = (53 - 0) + (183 - 0) = 236
- LOOK:
  Similar to SCAN, except head only goes as far as last request in each direction, then reverses immediately
  Assume head is moving away from 0:
  Head movement: 53, 65, 67, 98, 122, 124, 183, 37, 14
  Total distance = (183 - 53) + (183 - 14) = 299
- C-SCAN:
  Head moves from 1 end to the other, servicing requests as it goes; when reaching the other end, it immediately returns to the beginning without servicing any requests on the return trip
  Provides more uniform wait time than SCAN; performs well for systems placing heavy load on disk
  Assume head is moving away from 0:
  Head movement: 53, 65, 67, 98, 122, 124, 183, 199, 0, 14, 37
  Total distance = (199 - 53) + (199 - 0) + (37 - 0) = 382
- C-LOOK:
  C-SCAN version of LOOK: head only goes as far as last request in each direction, then immediately returns to the lowest request without servicing on the return trip
  Assume head is moving away from 0:
  Head movement: 53, 65, 67, 98, 122, 124, 183, 14, 37
  Total distance = (183 - 53) + (183 - 14) + (37 - 14) = 322
(A small FCFS/SSTF simulation sketch follows.)
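A minimal C simulation of FCFS and SSTF on the example queue (the SCAN variants follow the same pattern); it reproduces the totals 640 and 236.

#include <stdio.h>
#include <stdlib.h>

#define N 8

int fcfs(int head, const int *q) {
    int total = 0;
    for (int i = 0; i < N; i++) { total += abs(q[i] - head); head = q[i]; }
    return total;
}

int sstf(int head, const int *q) {
    int total = 0, done[N] = {0};
    for (int served = 0; served < N; served++) {
        int best = -1;
        for (int i = 0; i < N; i++)   /* pick the nearest pending request */
            if (!done[i] && (best < 0 || abs(q[i] - head) < abs(q[best] - head)))
                best = i;
        total += abs(q[best] - head);
        head = q[best];
        done[best] = 1;
    }
    return total;
}

int main(void) {
    int q[N] = {98, 183, 37, 122, 14, 124, 65, 67};
    printf("FCFS: %d\n", fcfs(53, q));   /* 640 */
    printf("SSTF: %d\n", sstf(53, q));   /* 236 */
    return 0;
}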
- ZFS:
  Adds checksums to all data & metadata
  Checksum stored not with the block, but with the pointer to that block
  → Detect/correct block corruption
  No partitions: Disks allocated in pools