A fully configurable HPC web
portal for managing Slurm jobs
Patrice Calegari
Slurm User Group SLUG’19
Salt Lake City, USA - September 18, 2019
© Atos
We will talk about…
Context of the projects
XCS - eXtreme factory Computing Studio
BEM - Bull Efficiency Manager
Conclusion and future work
2 | 18-09-2019 | Patrice Calegari | © Atos
HPC & AI R&D Software
Context of the projects
Bull/Atos HPC & AI Software R&D
▶ Our division, Atos BDS (Big Data & Security), is in charge of developing
supercomputing hardware and middleware.
▶ Our domains of interest: HPC, AI and Quantum simulation.
▶ User experience (UX) is extremely important
▶ Security is critical in all our activities (and those of our clients)
▶ We have contributed to the Slurm community and integrated Slurm into our HPC stack for
more than 10 years
XCS
eXtreme factory
Computing Studio
eXtreme factory Computing Studio v3 (XCS3)
Introduction
▶ Modular HPC, AI & Quantum portal
– as-a-Service cornerstone application
– supports Slurm (and other schedulers)
– Role Based Access Control (RBAC)
– supports AD, LDAP (with Kerberos)
– XCS = REST API service + GUI
▶ Fully customizable user interface
– Responsive Web Design (RWD) GUI
– Single Page Application (SPA) with
configurable dashboards: layout,
components, languages, themes
Latest release: XCS 3.8.0 (April 5, 2019)
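The Role Based Access Control bullet above can be illustrated with a minimal sketch. The role names, permission strings and the `Account` type below are hypothetical and chosen only for illustration; the slides do not describe how XCS actually models roles.

```python
from dataclasses import dataclass, field

# Hypothetical role -> permission mapping (not taken from XCS).
ROLE_PERMISSIONS = {
    "user": {"job:submit", "job:view_own"},
    "app_admin": {"job:submit", "job:view_own", "app:edit_form"},
    "admin": {"job:submit", "job:view_own", "job:view_all", "user:manage"},
}


@dataclass(frozen=True)
class Account:
    """A portal account holding a set of role names (illustrative type)."""
    login: str
    roles: frozenset = field(default_factory=frozenset)


def is_allowed(account: Account, permission: str) -> bool:
    """Return True if any of the account's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in account.roles
    )


alice = Account("alice", frozenset({"user"}))
bob = Account("bob", frozenset({"user", "app_admin"}))

print(is_allowed(alice, "app:edit_form"))
print(is_allowed(bob, "app:edit_form"))
```

In such a scheme the check is made on the server side, so that the GUI only hides or shows components and never decides on access by itself.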
XCS REST API
https://public.extremefactory.com/demo/api/doc/api-full.html
XCS REST API
https://public.extremefactory.com/demo/app/api/doc/api-full.html
XCS user dashboard
Example 1: 8 components
XCS user dashboard
Example 2: 1 component
XCS user dashboard
Example 3: 6 components with edited theme
XCS dashboard main menu
import/export dashboards
XCS dashboard main menu
REST API documentation
XCS Fundamental concepts
Key software product for HPCaaS solutions
Give users and admins access to resources through web services
• Use of a GUI in a web browser that relies on a REST API
Be compatible with “all possible” environments
• Software, frameworks, middleware
Never be intrusive
• The solution should be used in existing environments without modifying them
Keep all the intelligence in the REST API server
• The goal of the GUI is only to be the HMI (Human Machine Interface)
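The principle of a thin GUI over a REST API can be sketched as a client that only packages user input into an HTTPS request and leaves every decision to the server. Everything specific below is an assumption made for illustration: the host `portal.example.org`, the `/jobs` path, the JSON field names and the bearer-token header are hypothetical and are not taken from the XCS API documentation.

```python
import json
import urllib.request

# Hypothetical base URL; a real deployment would use its own portal address.
BASE_URL = "https://portal.example.org/api"


def build_job_request(token: str, application: str, parameters: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request describing a job submission.

    The client performs no validation or scheduling logic: it only
    serializes the form content and lets the REST API server decide.
    """
    body = json.dumps(
        {"application": application, "parameters": parameters}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/jobs",  # hypothetical endpoint
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


req = build_job_request("dummy-token", "Appli", {"arg1": "input.dat"})
print(req.get_method(), req.full_url)
```

Sending the request would be a single call to `urllib.request.urlopen(req)`; it is left out here so that the sketch runs without a server.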
XCS architecture
current v3
[Architecture diagram; recoverable components and links:]
▶ XCS web User Interface, served by the XCS GUI web server
• Dashboards
• Web Design
▶ XCS DCs (Job submission DC, Data mngmt DC, …), connected over HTTPS to the REST API
▶ XCS API web server, with the XCS Data base, a Security service and a Directory service
▶ HPC cluster, reached over SSH, with the XCS integration layer
• Slurm
• HPC applications
DC = Dashboard Component
Slurm job submission workflow with XCS
sbatch … Appli.sh $arg1 …
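As a rough sketch of the last step of this workflow, the values entered in a web form can be turned into an `sbatch` command line whose trailing arguments are passed to the application script. The function name and the form fields are hypothetical; only the `sbatch` options `--job-name`, `--ntasks` and `--time` are standard Slurm options, and the way XCS itself assembles the command is not described in the slides.

```python
import shlex


def build_sbatch_command(script, script_args, job_name, ntasks, time_limit):
    """Return the sbatch invocation as an argument list.

    Using a list (rather than a concatenated string) avoids shell
    injection when the values come from a user-facing form.
    """
    cmd = [
        "sbatch",
        f"--job-name={job_name}",
        f"--ntasks={ntasks}",
        f"--time={time_limit}",
        script,
    ]
    # Form values become positional arguments of the script ($arg1, ...).
    cmd.extend(str(arg) for arg in script_args)
    return cmd


cmd = build_sbatch_command(
    script="Appli.sh",
    script_args=["input file.dat"],
    job_name="demo",
    ntasks=4,
    time_limit="00:10:00",
)
# shlex.join quotes arguments so the line can be logged or sent over SSH.
print(shlex.join(cmd))
```

The argument list can be handed to `subprocess.run(cmd)` on a submission host, while the quoted string form is convenient when the command has to travel through an SSH session.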
XCS application administrator dashboard
HPC application general information
XCS application administrator dashboard
HPC application form definition
BEM
Bull Efficiency Manager
Bull Efficiency Manager (BEM)
Introduction
▶ Slurm has been enhanced by Bull/Atos to provide additional functionality,
including topology-aware resource allocation and advanced placement policies
▶ Bull Efficiency Manager (BEM) is a web application that runs on top of the
Slurm workload manager and shows cluster details interactively
▶ BEM dashboards show both current and archived (historical) data about
cluster resources in graphs and tables
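To give an idea of the kind of data behind a "current resource usage" table, the sketch below aggregates node counts per state from text in the layout produced by `sinfo -h -o "%P %D %T"` (partition, node count, state). The sample text is made up, and the slides do not say how BEM actually collects its data, so this is only an illustration of the aggregation step.

```python
from collections import Counter

# Made-up sample in the "partition node_count state" layout.
SAMPLE = """\
compute* 12 idle
compute* 30 allocated
compute* 2 down
gpu 4 mixed
gpu 4 idle
"""


def nodes_by_state(text: str) -> dict:
    """Sum node counts per state across all partitions."""
    totals = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _partition, count, state = line.split()
        totals[state] += int(count)
    return dict(totals)


usage = nodes_by_state(SAMPLE)
print(usage)
```

Storing such a snapshot with a timestamp at regular intervals would be one simple way to build the archived series that a historical view needs.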
XCS architecture
current v3
[Architecture diagram; recoverable components and links:]
▶ BEM web User Interface, served by the XCS GUI web server
• Dashboards
• Web Design
▶ BEM DCs (Switch Topology DC, Slurm usage history DC), connected over HTTPS to the REST API
▶ BEM API web server, with the BEM Data base, a Security service and a Directory service
▶ HPC cluster, reached over SSH, with the BEM integration layer
• Slurm
DC = Dashboard Component
BEM
Login Page
BEM
Current resource usage 1/3
BEM
Current resource usage 2/3
BEM
Current resource usage 3/3
BEM
Historical resource usage
BEM
Topology resource allocation 1/3
BEM
Topology resource allocation 2/3
BEM
Topology resource allocation 3/3
Conclusion & Future Work
Conclusions
▶ XCS has been used successfully in production at many sites for several years and
continues to evolve
▶ BEM is still under development and the first Minimum Viable Product (MVP) is
very promising
▶ Mobile devices are becoming the standard way of doing “everything”, so
such a web portal approach will soon be mandatory for new users
(inexperienced users, young scientists of the new generation, non-technical
managers, etc.)
Ongoing and future work
▶ Unify both interfaces (XCS & BEM) and have them share a single security service
▶ Add new features to administer Slurm
▶ We are developing a new web portal framework to federate all our HPC, AI & Quantum
tools/microservices. It is an evolution of our current XCS solution with:
– a generic web GUI framework
– a security service (with flexible identity, authentication with SSO and
authorization management)
– global services (reverse proxy, gateway, discovery service, etc.)
XCS and BEM architecture
Complete solution to be developed in 2020
[Architecture diagram; recoverable components and links:]
▶ NEW unified web User Interface, served by the Unified GUI web server
• Dashboards
• Web Design
▶ XCS DCs (Job submission DC, Data mngmt DC), connected over HTTPS to the REST API
of the XCS API web server, with the XCS Data base
▶ BEM DCs (Switch Topology DC, Slurm usage history DC), connected over HTTPS to the REST API
of the BEM API web server, with the BEM Data base
▶ Security service and Directory service
▶ HPC cluster, reached over SSH
• XCS integration layer: Slurm, HPC applications
• BEM integration layer: Slurm
DC = Dashboard Component
XCS and Slurm native REST service architecture
Possible evolution…
[Architecture diagram; recoverable components and links:]
▶ NEW unified web User Interface, served by the Unified GUI web server
• Dashboards
• Web Design
▶ XCS DCs (Job submission DC, Data mngmt DC), connected over HTTPS to the REST API
of the XCS API web server, with the XCS Data base
▶ Slurm DCs (Slurm job specific DC, Slurm admin specific DC), connected over HTTPS to the REST API
of the Slurm API daemon (slurm.restd) on the Slurm server, with the Slurm Data base
▶ Security service and Directory service
▶ HPC cluster, reached over SSH, with the XCS integration layer
• Slurm
• HPC applications
DC = Dashboard Component
Thank you
For more information please contact:
Mathis Clayer for Slurm topics
Patrice Calegari for GUI topics
Atos, the Atos logo, Atos Syntel, Unify, and Worldline are registered trademarks of the
Atos group. May 2019. © 2019 Atos. Confidential information owned by Atos, to be used
by the recipient only. This document, or any part of it, may not be reproduced, copied,
circulated and/or distributed nor quoted without prior written approval from Atos.
More on HPC web portals
▶ Web Portals for High-performance Computing: A Survey
– 36-page journal paper published by ACM
– https://dl.acm.org/citation.cfm?id=3197385
▶ Democratization of HPC through the Use of Web Portals: Different
Strategies
– Panel at SC’19 in Denver, November 20th, 3:30pm-5pm
– https://sc19.supercomputing.org/presentation/?id=pan102&sess=sess223