Unit 1 - Lecture Notes
Virtualization
Introduction to Cloud Computing
• Vision of Computing Utilities: Computing is being transformed into a model where services
are commoditized and delivered like utilities such as water, electricity, gas, and telephony.
Users access services based on requirements, regardless of hosting location. Cloud computing is
the latest paradigm aiming to make "computing utilities" a reality.
• Accessibility and Flexibility: Anyone with a credit card can subscribe to cloud services, deploy
and configure servers in hours, grow and shrink infrastructure based on demand, and pay only for
resource usage time. Users don't need to invest heavily or maintain complex IT infrastructure.
• Core Principle: Cloud computing turns IT services into utilities. This is enabled by the maturity
of technologies like Web 2.0 (Internet as a rich application platform), service orientation (familiar
abstractions), and virtualization (customization, control, flexibility).
• Key Advantage: It allows integrating additional capacity or features into existing systems,
which is more attractive than buying new infrastructure that is hard to size in advance and may
be needed only for a limited time. This has made it a popular phenomenon.
• Long-Term Vision: The long-term vision is an open environment where computing, storage,
and other services are traded as computing utilities in a global digital market. This "cloud
marketplace" would enable automated discovery and integration of services into existing
software, reducing barriers between consumers and providers.
Cloud computing is computing delivered as a utility (like electricity or water). It allows users to
access computing resources on-demand with pay-as-you-go pricing. The vision originated from
Leonard Kleinrock’s ARPANET utility computing idea (1969).
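The pay-as-you-go idea can be made concrete with a small cost comparison. The sketch below is illustrative only: the hourly demand figures and the price per server-hour are invented assumptions, not real provider rates. It compares keeping peak-sized capacity switched on all day with paying only for the servers used in each hour.

```python
# Hypothetical demand: number of servers needed in each hour of one day.
# These numbers are made up for illustration.
HOURLY_DEMAND = [2, 2, 2, 2, 2, 3, 5, 8, 10, 10, 9, 8,
                 8, 9, 10, 10, 8, 6, 5, 4, 3, 2, 2, 2]

PRICE_PER_SERVER_HOUR = 0.10  # assumed price, not a real provider rate


def fixed_capacity_cost(demand, price):
    """Cost when capacity is sized for the peak and kept on for every hour."""
    return max(demand) * len(demand) * price


def pay_per_use_cost(demand, price):
    """Cost when servers are paid for only in the hours they are used."""
    return sum(demand) * price


def utilization(demand):
    """Fraction of the peak-sized capacity that is actually used."""
    return sum(demand) / (max(demand) * len(demand))


if __name__ == "__main__":
    print("fixed capacity:", fixed_capacity_cost(HOURLY_DEMAND, PRICE_PER_SERVER_HOUR))
    print("pay per use:   ", pay_per_use_cost(HOURLY_DEMAND, PRICE_PER_SERVER_HOUR))
    print("utilization:   ", utilization(HOURLY_DEMAND))
```

The gap between the two costs grows as demand becomes more bursty, which is the situation where sizing owned infrastructure is hardest.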
• Internet-Centric: The term "cloud" historically represented networks, and in cloud computing,
it signifies an Internet-centric way of computing, as the Internet is the medium and platform for
service delivery.
• Armbrust et al. Definition: Cloud computing refers to "both the applications delivered as
services over the Internet and the hardware and system software in the datacenters that provide
those services". This encompasses the entire stack, from hardware to high-level applications,
introducing "Everything as a Service" (XaaS).
• NIST Definition: Cloud computing is "a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned and released with minimal
management effort or service provider interaction".
• Buyya et al. Definition: "A cloud is a type of parallel and distributed system consisting of a
collection of interconnected and virtualized computers that are dynamically provisioned and
presented as one or more unified computing resources based on service-level agreements
established through negotiation between the service provider and consumers".
Diagram: Cloud Vision (viewpoints of different cloud users and providers)
◦ "I cannot invest in infrastructure, I just started my business."
◦ "I have infrastructure and middleware and I can host applications."
◦ "I want to focus on application logic and not maintenance and scalability issues."
◦ "I have infrastructure and provide application services."
◦ "I want to access and edit my documents and photos from everywhere."
• Large enterprises: Can offload activities. The New York Times converted its digital library
using Amazon EC2 and S3 for a short period, then relinquished resources with no additional
costs.
• Small enterprises and start-ups: Can translate ideas into business results quickly without
excessive up-front costs. Animoto scaled from 70 to 8,500 servers in one week using Amazon
Web Services, owning no servers.
• System developers: Can focus on business logic rather than infrastructure management. Little
Fluffy Toys developed a widget on Google AppEngine and was on the market in one week.
• End users: Can access documents and data anytime, anywhere, from any device. Apple iCloud
stores documents in the cloud, allowing seamless access and editing across devices (smartphone,
laptop, tablet) without physical connections.
Deployment Models
• Public Clouds: Most common, where IT infrastructure (e.g., virtualized datacenters) is
established by a third-party provider and made available to any consumer on a subscription
basis. Users' data and applications are deployed on the vendor's premises.
• Private/Enterprise Clouds: Large organizations replicate the cloud IT service delivery
model in-house. Driven by the need to keep confidential information within premises,
preferred by governments and banks with high security and privacy concerns.
• Hybrid Clouds: Composed of public cloud resources and privately owned infrastructures.
Used when private cloud resources cannot meet QoS requirements, offering a common way
to explore cloud possibilities.
Diagram: Infrastructure as a Service (virtualized servers, storage, and networking). Examples:
Amazon EC2, S3, RightScale, vCloud.
The reference model classifies cloud computing service offerings into three major categories,
forming a layered view of the computing stack:
◦ Virtual storage: Raw disk space or object store. Examples: Amazon S3.
◦ Use Case: Users building dynamically scalable computing systems requiring specific
software stacks, scalable websites, or background processing.
◦ User Responsibility: Users focus on application logic, leveraging provider's APIs and
libraries.
• Software-as-a-Service (SaaS): Provides applications and services on demand at the top of the
stack.
◦ Use Case: Targets end users benefiting from elastic scalability without software
development, installation, configuration, or maintenance. Suitable when existing SaaS fits needs
with minimal customization.
• Economic Benefits:
◦ Reduced Maintenance Costs: Responsibility shifts to the cloud service provider, who
benefits from economies of scale.
• Operational Benefits:
◦ Reduced Capacity Planning: Organizations can react rapidly to unplanned surges in demand
(e.g., adding/dismissing servers for workload spikes).
◦ Ease of Scalability: Leverage potentially huge cloud capacity to extend IT capability across
the entire computing stack (IaaS, PaaS, SaaS offerings).
• End-User Benefits: Data and processing capabilities are always available from anywhere,
anytime, through multiple devices, via Web-based interfaces. Eliminates the need for
considerable software investments for tasks like office automation or photo editing.
• New Opportunities: Service orientation and on-demand access allow creating new service
offerings by aggregating existing ones and focusing on added value with limited costs.
Challenges Ahead
Challenges include: security, privacy, compliance, legal issues due to geo-distribution,
interoperability, standardization, and technical provisioning.
• Dynamic Provisioning Challenges: Determining how many resources to provision and for how
long to maximize benefit.
• Integration Issues: Integrating real and virtual infrastructure, especially concerning security
and legislation.
• Security:
◦ Weak Point: Data needs to be decrypted in memory for processing, and virtualization allows
malicious providers to capture memory pages transparently.
• Legal Issues:
Mainframes
• Distributed Systems: Clouds are essentially large distributed computing facilities. A distributed
system is a "collection of independent computers that appears to its users as a single coherent
system". They share resources and utilize them better. Clouds exhibit properties like scalability,
concurrency, and continuous availability.
◦ Mainframes (1950s): First large computational facilities using multiple processing units,
highly reliable, "always on". Used for bulk data processing; evolved versions still used for
transaction processing. Offered computing power as a service by providers like IBM.
• Relationship to Cloud: Cloud computing embodies aspects of all three: deployed in large
datacenters by single organizations (like mainframes), virtually infinite capacity, fault-tolerant,
always on. Uses commodity machines (like clusters). Services consumed on a pay-per-use basis,
fully implementing the utility vision introduced by grids.
• Virtualization: Core technology allowing abstraction of hardware, runtime environments,
storage, and networking. Overcame past limitations on efficiency, now fundamental for cloud
computing. Confers customization and control for users, sustainability for providers.
• Web 2.0 (around 2004): The primary interface for cloud services. Transformed the Web into a
rich platform for application development, facilitating interactive information sharing,
collaboration, user-centered design, and application composition. Brings interactivity, flexibility,
enhanced user experience (e.g., AJAX, Web Services, XML). Enabled dynamic applications with
continuous updates and features without client-side deployments. Promotes loose coupling and
composition of services. Examples: Google Documents, Flickr, Facebook. Made people
accustomed to using the Internet for everyday life and accepting IT infrastructure delivery via
Web interface.
• Service-Oriented Computing (SOC): Core reference model for cloud systems, using services
as main building blocks. Supports rapid, low-cost, flexible, interoperable, and evolvable
applications.
◦ Web Services (WS): Popular expression of service orientation, making Web consumable by
applications, not just humans. Expose functionalities via HTTP, interface inferred by WSDL
(XML for service characteristics), interaction via SOAP (XML for method invocation/results).
Platform independent and accessible to WWW.
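To make the SOAP part of the Web Services description concrete, here is a minimal sketch that builds and parses a SOAP-style request using only Python's standard XML library. The envelope namespace is the SOAP 1.1 one; the service namespace, the operation name, and the parameter are hypothetical placeholders, and no real service or WSDL is involved.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/stock"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap a method invocation (operation + parameters) in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_request(xml_text):
    """Recover the operation name and parameters from a SOAP envelope."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    call = list(body)[0]
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    # "GetQuote" and "symbol" are invented names for illustration.
    print(build_soap_request("GetQuote", {"symbol": "ACME"}))
```

In a real deployment the message would be sent over HTTP to the service endpoint, and the operation and parameter names would come from the service's WSDL description.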
• → 1950s – Mainframes
• → 1980s – Clusters
• → 1990s – Grids
• → 2000s – Clouds
◦ Web Applications: Performance influenced by varying user demands, often complex multi-
tier applications susceptible to inappropriate infrastructure sizing or workload variability.
◦ How Cloud Enables This: Provides methods for renting compute/storage/networking; offers
scalable runtime environments; provides application services mimicking desktop apps but hosted
by provider. Leverages service orientation, accessible via simple Web interfaces (often REST
Web services).
◦ Distributed Systems: Cloud systems are distributed systems, with the major challenge being
the extreme dynamism (new nodes/services provisioned on demand). IaaS offers resource
addition/removal, PaaS embeds control algorithms for provisioning. Integration with existing
systems is a concern.
◦ Web 2.0 and Service Orientation: Web 2.0 technologies are the interface for cloud services.
Web services are primary access points programmatically. Cloud computing is summarized as
XaaS (Everything-as-a-Service), highlighting service orientation's central role.
◦ Design Considerations: Dynamism, scale, and volatility of components should guide the
design of cloud systems. Cloud computing provides mechanisms to address demand surges by
replicating components under stress.
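Replicating components under stress can be sketched as a simple sizing rule. The example below is a minimal illustration, assuming each replica handles a fixed number of requests per second; the capacity figure, the replica limits, and the load trace are hypothetical, and real provisioning policies are considerably more involved.

```python
import math


def replicas_needed(load_rps, capacity_per_replica_rps, min_replicas=1, max_replicas=20):
    """Number of replicas needed to serve the load, clamped to the allowed range."""
    needed = math.ceil(load_rps / capacity_per_replica_rps)
    return max(min_replicas, min(max_replicas, needed))


def scaling_plan(load_trace, capacity_per_replica_rps, start_replicas=1):
    """For each observed load, return (load, target replicas, change vs. previous)."""
    plan = []
    current = start_replicas
    for load in load_trace:
        target = replicas_needed(load, capacity_per_replica_rps)
        plan.append((load, target, target - current))
        current = target
    return plan


if __name__ == "__main__":
    # Hypothetical load trace in requests per second.
    for load, target, delta in scaling_plan([100, 450, 120], 100):
        print(f"load={load} rps -> {target} replicas ({delta:+d})")
```

A positive change means replicas are added during a surge; a negative change means they are released once demand drops, so that only the capacity in use is paid for.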
◦ Elastic Compute Cloud (EC2): Customizable virtual hardware instances for base
infrastructure, various configurations (GPU, cluster instances), deployed via Web portal or Web
services API. Allows saving running instances as images (templates).
◦ Simple Storage Service (S3): Delivers persistent storage on demand, organized into buckets
containing objects (files, disk images) accessible globally.
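The bucket/object organization described for S3 can be pictured with a toy in-memory model. This is only a sketch of the data model: the class and method names are hypothetical and are not the real S3 API, and nothing here talks to Amazon.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket/object store (not the real S3 API)."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {object key -> bytes}

    def create_bucket(self, bucket):
        if bucket in self._buckets:
            raise ValueError(f"bucket already exists: {bucket}")
        self._buckets[bucket] = {}

    def put_object(self, bucket, key, data):
        """Store (or overwrite) an object under a key inside a bucket."""
        self._buckets[bucket][key] = bytes(data)

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

    def list_objects(self, bucket, prefix=""):
        """Keys are flat strings; a prefix gives a folder-like listing."""
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.create_bucket("course-notes")  # hypothetical bucket name
    store.put_object("course-notes", "unit1/notes.txt", b"Virtualization")
    store.put_object("course-notes", "unit1/disk.img", b"\x00\x01")
    print(store.list_objects("course-notes", prefix="unit1/"))
```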
• Google AppEngine: Scalable runtime environment primarily for Web applications, leveraging
Google's large infrastructure for dynamic scaling.
◦ Development: SDK for local development/testing, easy migration to AppEngine, cost quotas.
Supports Python, Java, Go.
• Microsoft Azure: Cloud operating system and platform for developing cloud applications.
◦ Roles: Applications organized around roles (distribution units embodying logic): Web role
(hosts Web app), Worker role (generic container for workload processing), Virtual Machine role
(fully customizable virtual environment including OS).
• Apache Hadoop: Open-source framework for processing large data sets on commodity
hardware.
◦ Usage: Developers provide input data and specify map/reduce functions. Yahoo! is a major
sponsor, uses Hadoop for its cloud infrastructure and business processes, manages the world's
largest Hadoop cluster.
• Force.com and Salesforce.com: Force.com is a cloud computing platform for social enterprise
applications.
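The map/reduce usage mentioned for Hadoop above can be illustrated with a single-process sketch. This is not Hadoop code: it only mimics the programming model in plain Python, with the developer supplying a map function and a reduce function, and a small driver (a stand-in for the framework) grouping intermediate values by key.

```python
from collections import defaultdict


def map_words(line):
    """Map function: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce function: sum all counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy driver: apply map, group values by key, then apply reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    lines = ["the cloud", "the grid the cloud"]
    print(run_mapreduce(lines, map_words, reduce_counts))
```

In a real framework the map and reduce calls run in parallel across many machines and the grouping step moves data over the network; the developer-facing contract stays the same.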
Virtualization
Virtualization abstracts hardware, storage, network, and runtime environments. It enables
multiple virtual machines (VMs) on a single physical machine.
Introduction to Virtualization
• Definition: A broad umbrella of technologies and concepts providing an abstract environment
(virtual hardware, operating system) to run applications. It creates a secure, customizable, and
isolated execution environment, even for untrusted applications, without affecting others.
• Scope: While often synonymous with hardware virtualization (crucial for IaaS), virtualization
applies to operating system level, programming language level, application level, and also
to storage, memory, and networking.
◦ Increased Performance & Computing Capacity: Modern PCs and supercomputers have
ample resources to host virtual machines with acceptable performance.
◦ Lack of Space: Data centers are growing rapidly, and companies seek ways to accommodate
additional capacity without building new centers. This led to server consolidation, where
virtualization is fundamental.
◦ Greening Initiatives: Data centers are major power consumers. Server consolidation through
virtualization reduces the number of active servers, significantly cutting cooling and power
consumption, thus reducing carbon footprint.
◦ Rise of Administrative Costs: Power and cooling costs exceed IT equipment costs. More
servers mean higher administrative costs (monitoring, setup, updates, backups). Virtualization
reduces server count, lowering labor costs.
◦ Maturity of VM-based Programming Languages: The popularity of Java (1995) and .NET
Framework (2002), both based on virtual machine models, demonstrated that technology could
support virtualized solutions without significant performance overhead, paving the way for more
radical forms of virtualization.
Virtualization Layer
Software Emulation
• Components:
• Increased Security:
◦ Controlled Execution: Virtual machine manager controls and filters guest activity,
preventing harmful operations. Resources from the host can be hidden or protected.
◦ Isolation: Sensitive host information is naturally hidden. Essential for untrusted code (e.g.,
Java applets running in a sandboxed JVM with limited resource access) [162, 163, 166n].
Hardware virtualization solutions (VMware Desktop, VirtualBox) provide completely separated
file systems for guest OS from host.
◦ Sharing: Creates separate computing environments within the same host, fully exploiting
powerful host capabilities that would otherwise be underutilized. Important in virtualized data
centers for reducing active servers and power consumption.
◦ Aggregation: Groups separate hosts to be represented as a single virtual host to guests (e.g.,
cluster management software).
◦ Emulation: Controls and tunes the environment exposed to guests. Can emulate a completely
different environment from the host, useful for testing across platforms or running legacy
software on emulated hardware.
◦ Performance Tuning: Fine-tunes resource properties exposed via the virtual environment,
enabling effective Quality of Service (QoS) infrastructure and fulfilling Service-Level
Agreements (SLAs).
◦ State Capturing and Migration: Allows capturing the state of a guest program, persisting it,
and resuming execution. Virtual machine migration enables moving a virtual image to another
machine and resuming execution transparently. Live migration moves a running instance without
interruption.
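State capturing and migration can be pictured with a toy example: a running "guest" is reduced to a serializable snapshot, the snapshot is handed to another host, and execution resumes from where it stopped. The guest class below is hypothetical and its whole state is a single counter; real VM snapshots capture memory, CPU registers, and device state.

```python
import json


class CounterGuest:
    """Hypothetical guest program whose entire state is one counter."""

    def __init__(self, steps_done=0):
        self.steps_done = steps_done

    def step(self):
        """Do one unit of work and return the total work done so far."""
        self.steps_done += 1
        return self.steps_done


def capture_state(guest):
    """Persist the guest's state as a snapshot that can be stored or shipped."""
    return json.dumps({"steps_done": guest.steps_done})


def resume_from(snapshot):
    """Rebuild the guest on the destination host from a snapshot."""
    return CounterGuest(**json.loads(snapshot))


if __name__ == "__main__":
    guest = CounterGuest()
    guest.step()
    guest.step()
    snapshot = capture_state(guest)   # taken on the source host
    moved = resume_from(snapshot)     # restored on the destination host
    print(moved.step())               # continues from the captured state
```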
• Portability:
◦ Hardware Virtualization: Guest packaged into a virtual image that can be moved and
executed on different virtual machines, similar to picture files. Proprietary formats often require
specific VMM.
◦ General: Allows users to carry their own system ready to use, as long as the VMM is
available.
Diagram: Virtual resources provided on top of physical resources.
Diagram: Taxonomy of virtualization techniques
◦ Execution environment, process level: emulation (application), high-level VM (programming
language), multiprogramming (operating system).
◦ Execution environment, system level: hardware-assisted virtualization, full virtualization,
paravirtualization, partial virtualization (hardware).
◦ Other targets: storage, network, ....
Virtualization enables elasticity, multitenancy, and dynamic provisioning – the foundation of
cloud computing.
Virtualization techniques
Virtualization techniques are classified by the service/entity emulated and how it's done.
◦ Machine Reference Model: Modern computing systems have layers: Hardware (ISA),
Operating System (ABI), Applications/Libraries (API). Virtualization techniques replace one of
these layers and intercept calls.
Diagram: Machine reference model. Applications → API calls (API) → Libraries → system calls
(ABI) → Operating System → ISA → Hardware. Applications and libraries can also reach the
hardware directly through the user ISA.
▪ Instruction Set Architecture (ISA): Defines instruction set for processor, registers,
memory, interrupt management. Interface between hardware and software.
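Diagram: Hardware virtualization reference model. Guest: in-memory representation of a virtual
image loaded from storage. Virtual machine layer: binary translation, instruction mapping,
interpretation, .... Host: the underlying physical machine.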
• Type II (Hosted): Requires an operating system, runs as a program managed by the OS,
emulates ISA for guests. Examples: VMware Workstation, VirtualBox.
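Diagram: Hypervisor placement (VMs above a virtual machine manager that exposes an ISA, above
the hardware) and hypervisor reference architecture (instructions (ISA) reach a dispatcher, which
invokes the allocator or the interpreter routines).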
▪ Popek and Goldberg Criteria (1974): VMM must satisfy Equivalence (guest behaves
same as on physical host), Resource Control (VMM has complete control of virtualized
resources), and Efficiency (statistically dominant fraction of instructions execute without VMM
intervention). Theorem 3.1 states VMM construction is possible if sensitive instructions are a
subset of privileged ones. Theorem 3.2 defines recursive virtualizability. Theorem 3.3 describes
hybrid VMM construction.
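The condition in Theorem 3.1 is a subset test, so it can be checked mechanically. The sketch below is a simplified illustration: the toy instruction set is invented, and the x86 sets are a small excerpt of classic (pre hardware-assisted) x86 instructions rather than a complete list. "MOV_TO_CR3" is a label for writing control register CR3, not a real mnemonic.

```python
def is_virtualizable(sensitive, privileged):
    """Theorem 3.1 condition: every sensitive instruction must be privileged."""
    return set(sensitive) <= set(privileged)


def offending_instructions(sensitive, privileged):
    """Sensitive instructions that would NOT trap when run in user mode."""
    return sorted(set(sensitive) - set(privileged))


# Invented toy ISA that satisfies the condition.
TOY_PRIVILEGED = {"LOAD_PAGE_TABLE", "HALT", "SET_TIMER"}
TOY_SENSITIVE = {"LOAD_PAGE_TABLE", "SET_TIMER"}

# Simplified excerpt of classic x86: POPF, SGDT, SIDT and SMSW are sensitive
# but execute in user mode without trapping.
X86_PRIVILEGED = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3"}
X86_SENSITIVE = {"LGDT", "LIDT", "MOV_TO_CR3", "POPF", "SGDT", "SIDT", "SMSW"}


if __name__ == "__main__":
    print("toy ISA virtualizable:", is_virtualizable(TOY_SENSITIVE, TOY_PRIVILEGED))
    print("x86 excerpt virtualizable:", is_virtualizable(X86_SENSITIVE, X86_PRIVILEGED))
    print("offending:", offending_instructions(X86_SENSITIVE, X86_PRIVILEGED))
```

The offending instructions are exactly the ones a trap-and-emulate VMM cannot intercept, which is why techniques such as binary translation, paravirtualization, or hardware assistance are needed on such an architecture.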
▪ Hardware Virtualization Techniques:
▪ Examples: Wine (runs Windows apps on Unix-like systems), CrossOver (runs Windows apps
on Mac OS X), VMware ThinApp (packages installed apps into isolated executable images).
▪ External: Aggregates physical networks into single logical network (e.g., Virtual LAN -
VLAN).
◦ Server Consolidation: Aggregates virtual machines over fewer, fully utilized physical
resources, reducing active resources and saving energy.
◦ Virtual Machine Migration: Movement of virtual machine instances for consolidation. Live
migration (moving while running) is more complex but more efficient, causing no service
disruption [252, 253n].
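Server consolidation is essentially a packing problem: place VM loads onto as few physical servers as possible. The sketch below uses the first-fit decreasing heuristic, which is a choice made here for illustration and not a method named in these notes. VM names, loads, and the server capacity are hypothetical, with load expressed as an integer percentage of one server.

```python
def consolidate(vm_loads, server_capacity):
    """Pack VMs onto servers with first-fit decreasing; returns a list of VM-name lists."""
    servers = []  # servers[i] is the list of VM names placed on server i
    free = []     # free[i] is the remaining capacity of server i
    # Place the largest VMs first; ties are broken by name for determinism.
    for name, load in sorted(vm_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        if load > server_capacity:
            raise ValueError(f"{name} does not fit on any server")
        for i in range(len(servers)):
            if free[i] >= load:
                servers[i].append(name)
                free[i] -= load
                break
        else:
            # No running server has room: power on another one.
            servers.append([name])
            free.append(server_capacity - load)
    return servers


if __name__ == "__main__":
    loads = {"vm1": 30, "vm2": 20, "vm3": 25, "vm4": 10, "vm5": 15, "vm6": 20}
    placement = consolidate(loads, server_capacity=100)
    print(len(placement), "active servers:", placement)
```

Servers left without VMs after such a placement can be switched off, and migration is the mechanism that moves the VMs to their new hosts.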
• Revamped Concepts:
◦ Storage Virtualization: Vendors with huge storage facilities can harness them into
partitionable, dynamic virtual storage services.
◦ Desktop Virtualization: Cloud computing revamps this concept, enabling a complete virtual
computer hosted by a provider and accessed by a thin client over the Internet.
Diagram: Server consolidation through VM migration. Before migration: VMs are spread across
Server A (running) and Server B (running). After migration: all VMs are hosted on Server A
(running) and Server B is inactive.
• Managed Execution and Isolation: Allows building secure and controllable computing
environments. Virtual environments configured as sandboxes prevent harmful operations.
Simplified resource allocation and partitioning, enabling fine-tuning for server consolidation and
QoS.
• Portability: Virtual machine instances are typically files, easily transported and self-contained
(few dependencies beyond VMM). Java programs "compiled once and run everywhere". Enables
migration techniques in server consolidation.
• Reduced Maintenance Costs: Fewer physical hosts mean lower maintenance burden, as guest
programs have limited ability to damage underlying hardware.
• Efficient Resource Use: Multiple systems can securely coexist and share host resources without
interference. Prerequisite for server consolidation, dynamically adjusting active physical
resources to current load, leading to energy savings and environmental benefits.
Disadvantages
• Performance Degradation:
◦ Increased Latencies: Virtualization interposes an abstraction layer between guest and host.
◦ Scheduling: If VMM runs on host OS (Type II), it shares resources with other applications,
causing performance degradation.
◦ Inaccessible Host Features: Abstraction layer may not expose all specific host features (e.g.,
device drivers, graphics card capabilities).
◦ Limited Features: Early Java had limited graphics support compared to native applications.
Xen: Paravirtualization
• Architecture:
◦ Xen Hypervisor: Runs in the highest privileged mode (Ring 0, or Ring -1 with hardware-
assisted virtualization). Controls guest OS access to hardware.
◦ Domain U (User Domain): Other domains running guest OS (typically in Ring 1 or 0 with
hardware assistance). User applications run in Ring 3, maintaining ABI unchanged.
• Limitations: Requires modified OS codebase; thus, not all OS can be guests in Xen-based
environments without hardware-assisted virtualization. Legacy hardware/OS cannot be modified
or run safely in Ring 1. Open-source OS (Linux) are easily modified; Windows generally not
supported without hardware-assisted virtualization.
Diagram: Xen architecture and x86 privilege rings (Ring 3 to Ring 0). Layers, top to bottom:
◦ User applications (unmodified ABI), Ring 3.
◦ Management Domain (Domain 0): VM management, HTTP interface, access to the Xen
hypervisor.
◦ User Domains (Domain U): guest OS, modified codebase, hypercalls into Xen VMM.
◦ Xen Hypervisor (VMM): memory management, CPU state registers, device I/O. Privileged
instructions cause a hardware trap.
◦ Hardware (x86).
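Diagram: Full virtualization with binary translation. Sensitive instructions cause a hardware
trap to the hypervisor, which performs binary translation and instruction caching; the dynamic
or cached translation of the sensitive instructions then runs on the hardware (x86).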
◦ x86 Virtualization: x86 architecture did not originally satisfy Popek and Goldberg's first
theorem (sensitive instructions not subset of privileged). Older VMware products (before
hardware-assisted virtualization in 2006) used dynamic binary translation to run unmodified
x86 guest OS.
◦ Mechanism: When sensitive instructions cause a trap, they are translated into an equivalent
set that avoids exceptions, and the translated instructions are cached for performance.
◦ Advantages: Guests run unmodified, crucial for OS without source code (e.g., Windows).
More portable solution for full virtualization.
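The translate-once, reuse-afterwards mechanism can be sketched with a toy interpreter. Everything below is a simplified illustration: instructions are plain strings, the "translation" is a made-up replacement sequence, and the set of sensitive instructions is a small assumed example, so this shows the caching idea rather than how any VMware product is implemented.

```python
SENSITIVE = {"POPF", "SGDT"}  # assumed example set of sensitive instructions


class ToyBinaryTranslator:
    """Runs a guest instruction stream, translating sensitive instructions once."""

    def __init__(self):
        self.cache = {}        # sensitive instruction -> translated sequence
        self.translations = 0  # how many times translation actually ran

    def _translate(self, instr):
        """Produce an equivalent sequence that is safe to run (placeholder)."""
        self.translations += 1
        return [f"VMM_EMULATE_{instr}"]

    def execute(self, instr):
        if instr not in SENSITIVE:
            return [instr]  # harmless instructions run unchanged
        if instr not in self.cache:
            self.cache[instr] = self._translate(instr)  # first encounter
        return self.cache[instr]  # later encounters reuse the cached result

    def run(self, program):
        executed = []
        for instr in program:
            executed.extend(self.execute(instr))
        return executed


if __name__ == "__main__":
    translator = ToyBinaryTranslator()
    print(translator.run(["ADD", "POPF", "MOV", "POPF", "SGDT"]))
    print("translations performed:", translator.translations)
```

The second POPF in the example is served from the cache, which is the reason caching offsets part of the translation overhead.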
• Memory and I/O Virtualization: Achieves full virtualization of memory and I/O devices.
Memory virtualization is challenging due to MMU emulation; TLB (translation look-aside buffer)
direct mapping reduces impact. Provides full virtualization of network controllers, peripherals
(keyboard, mouse, disks, USB).
• Virtualization Solutions:
◦ Server Virtualization:
▪ VMware GSX Server: Replicates desktop approach for servers, adding remote
management. Daemon serverd controls VMware app processes, connected to VM instances via
VMware driver.
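Diagram: VMware GSX Server architecture. The serverd daemon and a Web server sit alongside
VMware application processes, each attached to a VM instance through the VMware driver, on
top of the hardware (x86).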
▪ VMware ESX Server & ESXi Server: Type I hypervisors installed on bare metal. ESX
embeds modified Linux (service console for hypervisor access), ESXi has a very thin OS layer
with remote management interfaces.
Diagram: VMware ESXi Server architecture. Components shown include the distributed VM file
system and the virtual Ethernet adapter and switch, above the hardware.
◦ VMware vCloud: Turns virtualized data centers into an IaaS cloud, allowing providers to
offer on-demand virtual computing environments on a pay-per-use basis. Web portal for self-
provisioning VMs and setting up virtual networks.
◦ VMware vFabric: Platform for application development in the cloud, components for
scalable Web applications on virtualized infrastructure (monitoring, data management, Java Web
app execution/provisioning).
◦ Zimbra: SaaS solution for office automation, messaging, collaboration, hosted in the cloud.
• Observations: VMware started with full x86 virtualization but integrated paravirtualization
features (e.g., VMware Tools, VMI - vendor-independent Virtual Machine Interface).
Diagram: VMware cloud solution stack. Application virtualization: Zimbra. Platform
virtualization: vFabric. Infrastructure virtualization: vCloud, vCenter, and vSphere.
Microsoft Hyper-V
◦ Parent Partition: Direct access to hardware, runs virtualization stack, hosts drivers for guest
OS, creates child partitions via hypervisor. Always hosts a Windows Server 2008 R2 instance.
Manages child partition creation/execution/destruction via Virtualization Infrastructure Driver
(VID). Instantiates a Virtual Machine Worker Process (VMWP) for each child partition.
Accessible remotely via WMI provider.
◦ Child Partitions: Host guest OS, no direct hardware access. Interaction controlled by parent
partition or hypervisor. Two types: Enlightened (Hypervisor-aware, benefit from Enlightened
I/O) and Unenlightened (Hypervisor-unaware, rely on less efficient device driver emulation).
◦ Memory Service Routines (MSRs): Control memory access from partitions, leverage
hardware-assisted virtualization (I/O MMU or IOMMU) for fast device access.
• Enlightened I/O and Synthetic Devices: Optimized I/O method allowing hypervisor-aware
guests to use an interpartition communication channel (VMBus) instead of hardware emulation
stack.
◦ Benefit: Enhanced performance for I/O (storage, networking, graphics, input). Also improves
child-to-child I/O via virtual networks.
Diagram: Hyper-V architecture. Root/parent partition (VMMS, WMI, VMWPs) alongside two
enlightened child partitions and one unenlightened child partition, each running user applications
in Ring 3.
◦ Windows Server Core: Reduced version of Windows Server 2008 with fewer features (no
GUI, .NET Framework) for reduced maintenance, attack surface, management, and disk space.
Managed remotely via PowerShell and WMI.