Relational Cloud
A Database-as-a-Service for the Cloud

Paper by Carlo Curino et al. @mit.edu

Presentation by Antonio Severien
severien@kth.se
  
Overview

➢ Relational Databases
➢ Database-as-a-Service (DBaaS)
➢ Problems Attacked
   ➢ Efficient Multi-tenancy
   ➢ Elastic Scalability
   ➢ Database Privacy
➢ Relational Cloud
➢ Experiments
➢ Conclusion
  

                                                2	
  
Relational Databases

➢ 1970 by Edgar Codd, IBM Research San Jose
➢ Tables
   ➢ Rows → Tuples
   ➢ Columns → Attributes
➢ Relational Database Management Systems (RDBMS)
  




                                                                        4	
  
Database-as-a-Service (DBaaS)

➢ Cloud
➢ Reduce management, operational and energy costs
➢ Elasticity and economies of scale
➢ Pay-per-use

Example: Amazon RDS
  




                                                                     6	
  
Problems Attacked

➢ Efficient Multi-tenancy
➢ Elastic Scalability
➢ Privacy
  




                                                       8	
  
Efficient Multi-tenancy

➢ Reduce costs
➢ Efficient usage of resources
➢ Maximize hardware utilization
➢ Single database server on each machine
➢ Maintain application query performance
  




                                                               10	
  
Efficient Multi-tenancy

➢ Reduce costs
➢ Efficient usage of resources
➢ Maximize hardware utilization
➢ Single database server on each machine?
➢ Maintain application query performance

Virtual Machine
                     11	
  
Efficient Multi-tenancy

➢ Problems
   ➢ Monitoring resource requirements for workloads
   ➢ Predicting the load generated
   ➢ Assigning workloads to physical machines
   ➢ Migrating workloads between nodes
   ➢ Live migration
  




                                                                   12	
  
Efficient Multi-tenancy

➢ Kairos (monitoring and consolidation engine)
   ➢ Resource Monitor
       Disk activity and RAM requirements
   ➢ Combined Load Predictor
       CPU, RAM and disk model that predicts the combined resource requirements
   ➢ Consolidation Engine
       Non-linear optimization techniques to…
         … minimize the number of machines needed
         … balance load between back-end machines
  

                                                                                  13	
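
The consolidation step can be read as a packing problem. The sketch below is not the Kairos non-linear optimizer; it swaps in a plain first-fit-decreasing heuristic to illustrate the goal of placing workloads on as few machines as possible without exceeding CPU, RAM or disk limits. The workload figures, the `capacity` default and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Resources = Tuple[float, float, float]  # (cpu, ram, disk) as fractions of one machine


def consolidate(workloads: Dict[str, Resources],
                capacity: Resources = (1.0, 1.0, 1.0)) -> List[List[str]]:
    """Assign workloads to machines with first-fit decreasing (a stand-in heuristic)."""
    machines: List[List[str]] = []   # workload names per machine
    used: List[List[float]] = []     # resources already committed per machine
    # Place the most demanding workloads first, ranked by their largest dimension.
    ordered = sorted(workloads.items(), key=lambda kv: max(kv[1]), reverse=True)
    for name, demand in ordered:
        for i, load in enumerate(used):
            if all(l + d <= c for l, d, c in zip(load, demand, capacity)):
                machines[i].append(name)
                used[i] = [l + d for l, d in zip(load, demand)]
                break
        else:
            # No existing machine has room, so open a new one.
            machines.append([name])
            used.append(list(demand))
    return machines


# Hypothetical tenants: (cpu, ram, disk) demand as a fraction of one machine.
tenants = {
    "db_a": (0.50, 0.30, 0.20),
    "db_b": (0.40, 0.30, 0.10),
    "db_c": (0.30, 0.60, 0.20),
    "db_d": (0.10, 0.05, 0.50),
}
placement = consolidate(tenants)
print(placement)
```

A greedy heuristic like this only addresses the "minimize the number of machines" goal; balancing load across the back-end machines would need an additional objective.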
  
Elastic Scalability

➢ Workload exceeds single machine capacity

  ➢ Scale a single database to multiple nodes
  ➢ Scale out by partitioning query processing
  ➢ Granular placement and load balancing on the backend
  




                                                                        15	
  
Elastic Scalability

➢ Strategy well suited for OLTP and Web workloads… but can extend to OLAP
➢ Minimize cross-node distributed transactions
➢ Workload-aware partitioner
   ➢ Partition data to minimize multi-node transactions
   ➢ Front-end analyses execution traces represented as a graph
  

                                                                       16	
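
How a front end might turn execution traces into a graph can be sketched as follows. The trace format (each transaction recorded as the set of tuple ids it touched) and the names `build_coaccess_graph` and `traces` are assumptions for illustration; the slides do not specify them.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def build_coaccess_graph(traces: Iterable[Set[str]]) -> Dict[FrozenSet[str], int]:
    """Nodes are tuples; an edge's weight counts the transactions touching both ends."""
    weights: Counter = Counter()
    for touched in traces:
        for a, b in combinations(sorted(touched), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


# Hypothetical trace: each entry is the set of tuples one transaction accessed.
traces = [
    {"t1", "t2"},
    {"t1", "t2"},
    {"t2", "t3"},
    {"t1", "t2", "t3"},
]
graph = build_coaccess_graph(traces)
for edge, weight in sorted(graph.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(edge), weight)
```

Tuples that frequently appear in the same transaction end up joined by heavy edges, which is the signal a partitioner needs in order to keep them on the same node.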
  
Graph Partitioning

[Figure: three nodes, each labeled (id, name, age, salary), connected by edges with weights we=2, we=1 and we=10]

we: weight of edge
                                                                                                                            17	
  
Graph Partitioning

[Figure: three nodes, each labeled (id, name, age, salary), with no edge weights shown]
  




                                                                                                        19	
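
A heavy edge means two tuples are often used by the same transaction, so cutting it turns those transactions into multi-node ones. The sketch below brute-forces the cheapest two-way split of a three-node graph using the edge weights from the figure (2, 1 and 10). Which pair of nodes each weight connects is an assumption, as are the node names; exhaustive search is used only because the graph is tiny, and it does not scale to real workloads.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Split = Tuple[Set[str], Set[str], int]


def cut_weight(edges: Dict[FrozenSet[str], int], part: Set[str]) -> int:
    """Total weight of edges with exactly one endpoint inside `part`."""
    return sum(w for e, w in edges.items() if len(e & part) == 1)


def best_two_way_split(nodes: List[str],
                       edges: Dict[FrozenSet[str], int]) -> Optional[Split]:
    """Try every non-trivial split and keep the one that cuts the least weight."""
    best: Optional[Split] = None
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            part = set(subset)
            cost = cut_weight(edges, part)
            if best is None or cost < best[2]:
                best = (part, set(nodes) - part, cost)
    return best


# Assumed wiring of the three weighted edges from the figure.
nodes = ["a", "b", "c"]
edges = {
    frozenset(("a", "b")): 2,
    frozenset(("b", "c")): 1,
    frozenset(("a", "c")): 10,
}
left, right, cost = best_two_way_split(nodes, edges)
print(left, right, cost)
```

With this wiring the split isolates `b` and keeps `a` and `c` together, so the edge of weight 10 is never cut.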
  
Privacy

➢ Adjustable security
   ➢ Onion ring encryption design
       2 onion layers and 1 homomorphic encryption of integers
   ➢ SQL queries on encrypted data
   ➢ Security level dynamically adaptive
       Converges to an overall security level
  




                                                                                          21	
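
Homomorphic encryption of integers means the server can compute on ciphertexts without decrypting them. The slides do not name a scheme; the sketch below uses textbook Paillier, a well-known additively homomorphic scheme, as an illustrative choice. The primes are tiny, hard-coded and insecure, and the function names are hypothetical.

```python
import math
import secrets

# Toy parameters: far too small for real use.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+); this form relies on G = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields a ciphertext of the sum of the plaintexts."""
    return (c1 * c2) % N2


salary_a, salary_b = encrypt(1200), encrypt(34)
print(decrypt(add_encrypted(salary_a, salary_b)))
```

The server only ever calls `add_encrypted`; the keys needed by `decrypt` stay with the client.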
  
Onion Layers of Encryption

Layers are listed from outermost (strong) to innermost (weak).

First onion:
   6. RND: no functionality
   4. DET: equality selection
   2. DET: equality join
   Value

Second onion:
   5. RND: no functionality
   3. OPE: inequality select, min, max, sort, group-by
   1. OPE: inequality join
   Value

Separate from the two onions:
   HOM: addition (int value)
   or
   String search (string value)

RND = Randomized Encryption (no operations allowed)
DET = Deterministic Encryption
OPE = Order-preserving Encryption
HOM = Homomorphic Encryption (operations over encrypted data)
  
                                                                                                                   22	
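
The idea of peeling an onion can be sketched with two layers. The code below is a toy built from `hashlib` and `hmac` (a hash-derived XOR keystream), not the ciphers used by the system and not secure; the function names and keys are hypothetical. It shows that the outer RND layer hides whether two values are equal, while removing it exposes a DET layer on which equality can be tested.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def det_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Deterministic: the nonce is derived from the plaintext, so equal inputs match."""
    nonce = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def rnd_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Randomized: a fresh nonce makes every ciphertext different."""
    nonce = os.urandom(16)
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Both toy layers use the same nonce-prefixed layout, so one routine strips either."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


det_key, rnd_key = os.urandom(32), os.urandom(32)


def wrap(value: bytes) -> bytes:
    """Build a two-layer onion: RND on the outside, DET underneath."""
    return rnd_encrypt(rnd_key, det_encrypt(det_key, value))


a, b = wrap(b"42"), wrap(b"42")
print(a == b)                                      # outer RND layer hides equality
print(decrypt(rnd_key, a) == decrypt(rnd_key, b))  # peeled to DET: equality is visible
print(decrypt(det_key, decrypt(rnd_key, a)))       # fully decrypted value
```

Handing the server only `rnd_key` lowers that column to DET-level security, which is the adjustable trade-off between functionality and strength.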
  
Relational Cloud Architecture

[Figure: architecture diagram; the recoverable labels are listed below]

Trusted Platform (Private/Secured)
   Client Nodes: Users, User Application, JDBC-client (CryptoDB enabled)

Between the platforms: Privacy-preserving Queries, Privacy-preserving Results

Untrusted Platform (Public)
   Admin Nodes: Partitioning Engine, Placement and Migration Engine
   Frontend Nodes: Router, Distributed Transactional Coordination
   Backend Nodes: CryptoDB, Encryption Engine

Arrow labels: Database load stats, Partitions, Placement
  




                                                                                                                                   24	
  
Experiments

Bad results?
Tradeoff for better privacy
  
                                         27	
  
Experiments

Scaling TPC-C
  




                           28	
  
Conclusion

➢ Presented Relational Cloud
➢ Efficient Multi-tenancy
   ➢ Novel resource estimation
   ➢ Non-linear optimization-based consolidation technique
➢ Scalability
   ➢ Graph-based partitioning
➢ Privacy
   ➢ Adjustable privacy
   ➢ SQL queries on encrypted data
➢ DBaaS is a viable cloud service
  
	
  
                                                                              30	
  
References

➢ "Relational Cloud: A Database-as-a-Service for the Cloud", Carlo Curino, Evan Jones, Raluca Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich
➢ http://relationalcloud.com
  	
  




                                                                                            32	
  
Privacy

CryptoDB Example

  SELECT i_price, ... FROM item WHERE i_id = N

[Figure: the query above annotated with three labels]
   Return to JDBC client decrypted RND ciphertexts
   DET-encrypted ciphertext
   JDBC client decrypts DET level 4


                                                                                                                      33	
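
One way to picture the example: the client replaces the constant N with its DET ciphertext, the untrusted server matches it against DET-level `i_id` values without seeing plaintext, and the client decrypts the returned price. The sketch below uses an in-memory SQLite table, an HMAC token as a stand-in for DET and a hash-derived XOR pad as a stand-in for RND. All of these, the helper names and the sample rows are assumptions for illustration rather than the system's actual implementation, and the `i_id` column is assumed to be already peeled to its DET layer.

```python
import hashlib
import hmac
import os
import sqlite3

DET_KEY, RND_KEY = os.urandom(32), os.urandom(32)  # held by the trusted client only


def det_token(value: int) -> bytes:
    """Stand-in for DET: equal plaintexts give equal tokens (not decryptable here)."""
    return hmac.new(DET_KEY, str(value).encode(), hashlib.sha256).digest()


def rnd_encrypt(value: int) -> bytes:
    """Stand-in for RND: fresh nonce, then XOR with a hash-derived 8-byte pad."""
    nonce = os.urandom(16)
    pad = hashlib.sha256(RND_KEY + nonce).digest()[:8]
    return nonce + bytes(a ^ b for a, b in zip(value.to_bytes(8, "big"), pad))


def rnd_decrypt(blob: bytes) -> int:
    nonce, body = blob[:16], blob[16:]
    pad = hashlib.sha256(RND_KEY + nonce).digest()[:8]
    return int.from_bytes(bytes(a ^ b for a, b in zip(body, pad)), "big")


# Untrusted server: stores only ciphertexts.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE item (i_id BLOB PRIMARY KEY, i_price BLOB)")
for item_id, price_cents in [(1, 999), (2, 2450), (3, 120)]:  # hypothetical rows
    server.execute("INSERT INTO item VALUES (?, ?)",
                   (det_token(item_id), rnd_encrypt(price_cents)))


def select_price(n: int) -> int:
    """Client side of: SELECT i_price FROM item WHERE i_id = N."""
    row = server.execute("SELECT i_price FROM item WHERE i_id = ?",
                         (det_token(n),)).fetchone()
    return rnd_decrypt(row[0])


print(select_price(2))
```

The server sees only an opaque token in the `WHERE` clause and an opaque blob in the result; both keys stay on the client side.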
  


Editor's Notes

  • #5 Talk about the importance of relational databases and their legacy
  • #7 Talk about the market and the viability of relational databases as a service in the cloud
  • #11 Make this slide better
  • #12 Make this slide better
  • #16 Challenge: workload exceeds capacity of single machine
  • #17 The way to scale the workloads is to minimize the number of multi-node transactions. Why? The overhead of holding locks on the backend.
  • #20 Detail how the privacy works and follow up with an example on the next slide. Know homomorphism well. Uses symmetric encryption.
  • #25 Comparison between consolidated DBs on one machine versus DBs on virtual machines. Explained the difference between UNIFORM and SKEWED: uniform load and skewed load (50% of the requests go to one of the 20 DBs). Consolidated 20 databases onto one physical machine.
  • #27 Explain what TPC-C is (benchmarks for databases, etc.)