Scheduling in OpenStack
Workshop CCR-INFN Eric Frizziero (INFN - Padova)
29 May 2014
The nova-scheduler
● Nova-scheduler is the component responsible for deciding which compute node should be used when a VM instance is launched
● it interacts with the other components through the queue
● for scheduling, the queue is the essential communications hub
● nova-scheduler makes decisions by collecting information about the compute resources
● it has a number of configuration options that can be accessed and modified in the configuration file “[Link]” (see the sketch after this list)
● FilterScheduler is the default scheduler driver
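A minimal sketch of how the scheduler-related options might look in that configuration file (the file name is elided above; the option names below follow the Havana-era nova defaults and are shown only as an illustration, not taken from the talk):

    [DEFAULT]
    # Scheduler driver in use; FilterScheduler is the default
    scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
    # Filters applied by the FilterScheduler, in order
    scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter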
The nova-scheduler
The scheduling process is divided into the following phases:
● Getting the current state of all the compute nodes: this generates a list of hosts
● The filtering phase generates a list of suitable hosts by applying filters
● The weighting phase sorts the hosts according to their weighted cost scores, which are obtained by applying cost functions (see the sketch after this list)
● The sorted list of hosts is the candidate set used to fulfill the user's request.
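The following is a minimal, self-contained sketch of this filter + weight pattern; the host attributes, filter functions and weigher used here are illustrative and are not the actual nova classes:

    # Illustrative sketch of the filtering + weighting phases (not the nova code).
    from dataclasses import dataclass

    @dataclass
    class HostState:
        name: str
        free_ram_mb: int
        free_vcpus: int

    def ram_filter(host, req):
        # Keep only hosts with enough free RAM for the requested flavor.
        return host.free_ram_mb >= req["memory_mb"]

    def cpu_filter(host, req):
        return host.free_vcpus >= req["vcpus"]

    def ram_weigher(host):
        # Higher score for hosts with more free RAM (spreading behaviour).
        return host.free_ram_mb

    def schedule(hosts, req, filters=(ram_filter, cpu_filter), weighers=(ram_weigher,)):
        # Filtering phase: drop the hosts that fail any filter.
        suitable = [h for h in hosts if all(f(h, req) for f in filters)]
        # Weighting phase: sort by the summed weigher scores, best host first.
        return sorted(suitable, key=lambda h: sum(w(h) for w in weighers), reverse=True)

    hosts = [HostState("cn01", 4096, 8), HostState("cn02", 1024, 2), HostState("cn03", 8192, 16)]
    print([h.name for h in schedule(hosts, {"memory_mb": 2048, "vcpus": 1})])  # ['cn03', 'cn01']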
The user request
Debug log excerpt: the raw AMQP message read by nova-scheduler (qpid), whose JSON payload carries the "run_instance" RPC call with the request context, the request_spec (flavor, image, instance properties) and the filter_properties:

2013-12-17 [Link].895 3032 DEBUG [Link] [-] READ[3e8ea28] readable /usr/lib/python2.6/site-packages/qpid/messaging/[Link]
JSON payload of the frame:
{"_context_roles": ["_member_", "admin"], "_context_request_id": "req-066522b7-e5f2-4af2-845f-d13380270324", "_context_quota_class": null, "_context_service_catalog": [], "_context_tenant": "e70bb1af044648e9b1ccc62836c101b0",
"args": {"legacy_bdm_in_spec": false,
"request_spec": {"instance_type": {"root_gb": 20, "name": "[Link]", "ephemeral_gb": 0, "memory_mb": 2048, "vcpus": 1, "extra_specs": {}, "swap": 0, "rxtx_factor": 1.0, "flavorid": "2", "vcpu_weight": null, "id": 5}, "num_instances": 1,
"block_device_mapping": [{"instance_uuid": "41b812a9-2bb7-4d7c-b1b2-ef1f6a5e9ee2", "guest_format": null, "boot_index": 0, "no_device": null, "connection_info": null, "image_id": "38a64756-827a-4c03-a22d-f601ea060019", "volume_id": null, "device_name": null, "disk_bus": null, "volume_size": null, "source_type": "image", "device_type": "disk", "snapshot_id": null, "destination_type": "local", "delete_on_termination": true}],
"instance_properties": {"vm_state": "building", "availability_zone": "nova", "terminated_at": null, "ephemeral_gb": 0, "instance_type_id": 5, "user_data": null, "cleaned": false, "vm_mode": null, "deleted_at": null, "reservation_id": "r-c3dojj8b", "id": 5, "security_groups": {"objects": []}, "disable_terminate": false, "root_device_name": null, "display_name": "centos", "uuid": "41b812a9-2bb7-4d7c-b1b2-ef1f6a5e9ee2", "default_swap_device": null, "info_cache": {"instance_uuid": "41b812a9-2bb7-4d7c-b1b2-ef1f6a5e9ee2", "network_info": []}, "hostname": "centos", "launched_on": null, "display_description": "centos", "key_data": null, "deleted": false, "config_drive": "", "power_state": 0, "default_ephemeral_device": null, "progress": 0, "project_id": "e70bb1af044648e9b1ccc62836c101b0", "launched_at": null, "scheduled_at": null, "node": null, "ramdisk_id": "6580f36e-2f88-4f01-ad4e-a7696098dd3c", "access_ip_v6": null, "access_ip_v4": null, "kernel_id": "745d2e2b-9996-4a5a-a33e-4df75f6c691b", "key_name": null, "updated_at": null, "host": null, "user_id": "ccdc7c1d0d114b8a972e3e7f1a032c99", "system_metadata": {"image_kernel_id": "745d2e2b-9996-4a5a-a33e-4df75f6c691b", "image_min_disk": 20, "instance_type_memory_mb": 2048, "instance_type_swap": 0, "instance_type_vcpu_weight": null, "instance_type_root_gb": 20, "instance_type_name": "[Link]", "image_ramdisk_id": "6580f36e-2f88-4f01-ad4e-a7696098dd3c", "instance_type_id": 5, "instance_type_ephemeral_gb": 0, "instance_type_rxtx_factor": 1.0, "instance_type_flavorid": "2", "image_container_format": "ami", "instance_type_vcpus": 1, "image_min_ram": 0, "image_disk_format": "ami", "image_base_image_ref": "38a64756-827a-4c03-a22d-f601ea060019"}, "task_state": "scheduling", "shutdown_terminate": false, "cell_name": null, "root_gb": 20, "locked": false, "name": "instance-00000005", "created_at": "2013-12-17T[Link].788072", "locked_by": null, "launch_index": 0, "memory_mb": 2048, "vcpus": 1, "image_ref": "38a64756-827a-4c03-a22d-f601ea060019", "architecture": null, "auto_disk_config": false, "os_type": null, "metadata": {}},
"security_group": ["default"],
"image": {"status": "active", "name": "centos", "deleted": false, "container_format": "ami", "created_at": "2013-12-17T[Link].000000", "disk_format": "ami", "updated_at": "2013-12-17T[Link].000000", "properties": {"kernel_id": "745d2e2b-9996-4a5a-a33e-4df75f6c691b", "ramdisk_id": "6580f36e-2f88-4f01-ad4e-a7696098dd3c"}, "min_disk": 0, "min_ram": 0, "checksum": "5eba5009290eb1534f8f22a3245e7c48", "owner": "e70bb1af044648e9b1ccc62836c101b0", "is_public": false, "deleted_at": null, "id": "38a64756-827a-4c03-a22d-f601ea060019", "size": 1073741824},
"instance_uuids": ["41b812a9-2bb7-4d7c-b1b2-ef1f6a5e9ee2"]},
"is_first_time": true,
"filter_properties": {"instance_type": {"disabled": false, "root_gb": 20, "name": "[Link]", "flavorid": "2", "deleted": 0, "created_at": null, "ephemeral_gb": 0, "updated_at": null, "memory_mb": 2048, "vcpus": 1, "extra_specs": {}, "swap": 0, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": null, "id": 5}, "scheduler_hints": {}},
"admin_password": "8xsGpfUPKBh9", "injected_files": [], "requested_networks": [["f6d1fb29-f212-4640-afb8-748879462eb8", null, null]]},
"_unique_id": "da1731c3e72947fab6ec39670b9bfe01", "_context_timestamp": "2013-12-17T[Link].484714", "_context_user_id": "ccdc7c1d0d114b8a972e3e7f1a032c99", "_context_project_name": "admin", "_context_read_deleted": "no", "_context_auth_token": "0ecf459e8dcf5bd35e4ec1e4d167db10", "namespace": null, "_context_instance_lock_checked": false, "_context_is_admin": true, "version": "2.9", "_context_project_id": "e70bb1af044648e9b1ccc62836c101b0", "_context_user": "ccdc7c1d0d114b8a972e3e7f1a032c99", "_context_user_name": "admin", "method": "run_instance", "_context_remote_address": "[Link]"}
The nova-scheduler
● User requests are processed sequentially (FIFO scheduling)
➢ nova-scheduler doesn't provide any dynamic priority strategy
● User requests that cannot be satisfied (e.g. resources not available) fail and are lost
➢ in that scenario, nova-scheduler doesn't provide any queuing of the requests
● OpenStack simply provides a partitioning of the resources among the projects / experiments (use of quotas)
➢ if a project has free quota (its resources are underutilized) while another project has consumed its quota, the only solution is for the Cloud Administrator to change the related quotas
Nova-scheduler is mainly missing:
✔ queuing of the requests
✔ a fair-share algorithm for resource provisioning
The nova-scheduler
Resources unavailable: any new request fails and is lost.
Fair-share algorithm
The priority is an integer: the larger the number, the higher the job is positioned in the queue and the sooner it will be scheduled
The fair-share is the component of the job's priority that influences the order in which a user's queued jobs are scheduled to run
It guarantees that the usage of the resources is equally distributed among users and groups, by considering both the portion of the resources allocated to them (i.e. the share) and the resources they have already consumed
● historical resource utilization information is incorporated into the priority decisions
We analyzed the fair-share algorithms implemented by the most relevant LRMSs
● SLURM's Multifactor Priority strategy, a sophisticated and complete fair-share algorithm, was selected
● [Link]
The SLURM fair-share formula
F = 2**(-Ue/S)
Ue: the user's effective usage
S: the user's normalized share
Here we consider account = tenant
The share values must be defined in the [Link]
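As a purely illustrative check of how the formula behaves (this is not the SLURM or FairShareScheduler code), a user who has already consumed twice their share gets a much lower factor than a user who has consumed nothing:

    # Simplified SLURM fair-share formula: F = 2**(-Ue/S)
    def fairshare_factor(effective_usage, normalized_share):
        """Return the fair-share factor in (0, 1]; higher means higher priority."""
        return 2 ** (-effective_usage / normalized_share)

    # A tenant entitled to 25% of the resources (S = 0.25):
    print(fairshare_factor(0.00, 0.25))  # 1.00 -> no usage yet, full priority
    print(fairshare_factor(0.25, 0.25))  # 0.50 -> has consumed exactly its share
    print(fairshare_factor(0.50, 0.25))  # 0.25 -> has consumed twice its share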
FairShareScheduler
FairShareScheduler, a pluggable scheduler that extends the existing OpenStack scheduler with a (batch-like) dynamic priority algorithm, has been developed by INFN-PD
● the SLURM "Multifactor Priority" algorithm was selected
● FairShareScheduler dynamically assigns the proper priority to every user request
● the priority at any given time is a weighted sum of the (configurable) factors age and fair-share:
➢ priority = (PriorityWeightAge) * (age_factor) +
(PriorityWeightVCPUFairshare) * (fair-share-vcpu_factor) +
(PriorityWeightMemoryFairshare) * (fair-share-memory_factor)
● The weight expresses the interest in a specific factor
➢ Example: you can configure fair-share-vcpu to be the dominant factor (say 80%), set the age factor to contribute 20%, and set the influence of fair-share-memory to zero in the priority decision (see the sketch after this list)
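A minimal sketch of this weighted sum with the weights of the example above (illustrative only; the weight names mirror the formula and are not taken from the FairShareScheduler configuration):

    # Weighted-sum priority: fair-share-vcpu dominant (80%), age 20%, memory 0%.
    PRIORITY_WEIGHT_AGE = 20
    PRIORITY_WEIGHT_VCPU_FAIRSHARE = 80
    PRIORITY_WEIGHT_MEMORY_FAIRSHARE = 0

    def priority(age_factor, fairshare_vcpu_factor, fairshare_memory_factor):
        """All factors are assumed to be normalized in [0, 1]."""
        return (PRIORITY_WEIGHT_AGE * age_factor
                + PRIORITY_WEIGHT_VCPU_FAIRSHARE * fairshare_vcpu_factor
                + PRIORITY_WEIGHT_MEMORY_FAIRSHARE * fairshare_memory_factor)

    # A fresh request from a user with little past vCPU usage beats an
    # older request from a heavy user:
    print(priority(0.0, 0.9, 0.4))  # 72.0
    print(priority(0.5, 0.2, 0.1))  # 26.0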
FairShareScheduler
● all user requests are inserted in a (persistent) priority queue and then processed asynchronously by a dedicated process (filtering + weighting phases) when compute resources are available
● from the client point of view, the queued requests remain in the "Scheduling" state until compute resources become available
● no new states are added: this prevents any possible interaction issue with the OpenStack clients
● user requests are dequeued by a (configurable) pool of WorkerThreads (see the sketch after this list)
➢ no sequential processing of the requests
● requests that fail the filtering + weighting phase may be re-inserted in the queue up to n times (configurable)
● the priority of the queued requests is recalculated periodically (see the age_factor)
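A minimal, in-memory sketch of the queue + worker-pool pattern described above (illustrative only: the real FairShareScheduler persists the queue and runs the actual nova filtering + weighting code):

    import queue
    import threading

    request_queue = queue.PriorityQueue()

    def enqueue(priority, request_id):
        # PriorityQueue pops the smallest item first, so negate the priority
        # to serve the highest-priority request first.
        request_queue.put((-priority, request_id))

    def worker(name):
        while True:
            neg_priority, request_id = request_queue.get()
            # Here the real scheduler would run the filtering + weighting phases
            # and re-enqueue the request (up to n times) if no host is available.
            print(f"{name}: scheduling {request_id} (priority {-neg_priority})")
            request_queue.task_done()

    # A configurable pool of worker threads dequeues the requests concurrently.
    for i in range(2):
        threading.Thread(target=worker, args=(f"worker-{i}",), daemon=True).start()

    enqueue(72, "req-A")
    enqueue(26, "req-B")
    request_queue.join()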
FairShareScheduler: high-level architecture
(Architecture diagram.) The main components shown are: the AMQP client that receives the user request (nova api); the PriorityQueue Manager with its persistent PQueue(s); the FairShare Manager, which interacts with nova and keystone; the filtering + weighting phases; and the delivery of the scheduled request to nova-compute via AMQP.
FairShareScheduler
Resources unavailable: new requests remain in the Scheduling state (no host, no IP address assigned).
FairShareScheduler
Resources unavailable: one instance is terminated.
FairShareScheduler
The previously queued request goes to the Running state (both host and IP address assigned).
FairShareScheduler: Use Case 1
QUACK (Queues in OpenStack)
CCR Cloud Working Group task (Bari, CNAF, Padova)
Objective: integrate the Cloud into an existing Grid environment (LRMS)
● use the same resources for Cloud computing and Grid computing (no partitioning)
● all requests for resources (batch jobs, Cloud) result in requests to OpenStack for the allocation of VMs
● FairShareScheduler provides both the queuing of the requests and the fair-share algorithm for resource provisioning
FairShareScheduler: Use Case 2
Assessing a possible evolution of the CREAM architecture to submit jobs directly to the IaaS (with the FairShareScheduler)
● CREAM is responsible for the VMs' life-cycle management
➢ jobs are executed in the VMs
● the batch system is no longer necessary
➢ fair-share and queuing are provided by the FairShareScheduler
● nothing changes for the Grid users
➢ no updates of the WS interfaces are foreseen
The Virgo experiment has shown interest in this type of architecture
● for further details, please see yesterday's Virgo presentation
Conclusion and next steps
● FairShareScheduler source code (for Havana) is available on GitHub:
➢ [Link]
➢ Authors: Eric Frizziero (INFN-PD), Lisa Zangrando (INFN-PD)
● Testing is in progress on Bari's Cloud testbed
● Installation of the FairShareScheduler in the "Cloud Area Padovana"
● The integration of the FairShareScheduler into the OpenStack Icehouse release has started
● Evaluate how to integrate this scheduler into the official OpenStack distribution
● FairShareScheduler live demo in the coffee-break area
FairShareScheduler
Thank you for your attention!
Questions?