Course Introduction
Lecture 1: Course Introduction &
History of Database Systems
Welcome!
• This course focuses on the design and implementation of database management
systems (DBMSs).
• We will study the internals of modern database management systems.
• We will cover the core concepts and fundamentals of the components that are used in
high-performance transaction processing systems (OLTP) and large-scale analytical
systems (OLAP).
Today’s Agenda
• Course Outline & Logistics
• History of Database Systems
Course Outline & Logistics
Why should you take this course?
• You want to learn how to make database systems scalable, for example, to support
web or mobile applications with millions of users.
• You want to make applications that are highly available (i.e., minimizing downtime)
and operationally robust.
• You have a natural curiosity for the way things work and want to know what goes on
inside major websites and online services.
• You are looking for ways of making systems easier to maintain in the long run, even as
they grow and as requirements and technologies change.
• If you are good enough to write code for a database system, then you can write code
for almost anything else.
Course Objectives
• Learn about modern practices in database internals and systems programming.
• Students will become proficient in:
▶ Writing correct + performant code
▶ Proper documentation + testing
▶ Working on a systems programming project
Course Topics
• Logging & Recovery Methods
• Concurrency Control
• Query Optimization, Compilation
• New Hardware (NVM, FPGA, GPU)
Background
• I assume that you have already taken an intro course on database systems (e.g., GT
4400).
• We will discuss modern variations of classical algorithms that are designed for today’s
hardware.
• Things that we will not cover: SQL, Relational Algebra, Basic Algorithms + Data
Structures.
• All programming assignments will be written in C++11.
• You will learn how to debug and profile multi-threaded programs.
• Assignment 1 will help get you caught up with C++.
Course Logistics
• Course Web Page
▶ Schedule: https://www.cc.gatech.edu/~jarulraj/courses/8803-s21/
• Discussion Tool: Piazza
▶ https://www.piazza.com/gatech/spring2021/cs8803dsi
▶ For all technical questions, please use Piazza. Don’t email me directly.
▶ All non-technical questions should be sent to me
• Grading Tool: Gradescope
▶ You will get immediate feedback on your assignment.
▶ You can iteratively improve your score over time.
• Virtual Office Hours
▶ Will be posted on Piazza.
• Course Policies
▶ The programming assignments and exercise sheets must be your own work.
▶ They are not group assignments.
▶ You may not copy source code from other people or the web.
▶ Plagiarism will not be tolerated.
• Academic Honesty
▶ Refer to Georgia Tech Academic Honor Code.
▶ If you are not sure, ask me.
Late Policy
• You are allowed ten total slip days (for programming assignments and exercise sheets).
• You lose 25% of an assignment’s points for every 24 hours it is late.
• Mark on your submission (1) how many days you are late and (2) how many late days
you have left.
Teaching Assistants
• Gaurav Tarlok Kakkar
▶ M.S. (Computer Science)
▶ Worked at Adobe (2 years).
▶ Research Topic: Video analytics using deep learning.
• If you are acing the assignments, you might want to hack on the video
analytics system (codenamed EVA) that we are building.
• Drop me a note if you are interested!
Course Rubric
• Project (20%)
• Programming Assignments (45%)
• Exercise Sheets (15%)
• Mid-term Exam (20%)
Project – Outline
• A key component of this course will be an original research project.
• Students will organize into groups and choose a project that:
▶ Is relevant to the topics discussed in class.
▶ Requires a significant programming effort from all team members.
• You don’t have to pick a topic until midway through the course.
• We will provide sample project topics.
• This project can be a conversation starter in job interviews.
Project – Deliverables
• Proposal: 2-page report + presentation
• Status Update: 3-page report + presentation
• Final: 4-page report + presentation
Project – Proposal
• Five-minute presentation to the class that discusses the high-level topic.
• Each proposal must discuss:
▶ What is the problem being addressed by the project?
▶ Why is this problem important?
▶ How will the team solve this problem?
Project – Status Update
• Five-minute presentation to update the class about the current status of your project.
• Each presentation should include:
▶ Current development status.
▶ Whether anything in your plan has changed.
▶ Anything that surprised you.
Project – Final Presentation
• Ten-minute presentation on the final status of your project during finals week.
• You’ll want to include any performance measurements or benchmarking numbers for
your implementation.
• Demos are always hot too.
Programming Assignments
• Five assignments based on the BuzzDB academic DBMS.
• Goal is to familiarize you with the internals of database management systems.
• We will use Gradescope for giving you immediate feedback on programming
assignments and Piazza for providing clarifications.
• We will provide you with test cases and scripts for the programming assignments.
• If you have not yet received an invite from Gradescope, you can use the entry code
that will be shared on Piazza.
Exercise Sheets
• Three pencil-and-paper tasks.
• You will need to upload the sheets to Gradescope.
• We will share the grading rubric for exercise sheets via Gradescope.
Exercise Sheet #1
• Hand in one page with the following information:
▶ Digital picture of your face (ideally 2x2 inches)
▶ Name and interests (more details on Gradescope)
• The purpose of this sheet is to help me:
▶ know more about your background for tailoring the course, and
▶ recognize you in class
History of Database Systems
History Repeats Itself
• Reference
• Design decisions in early database systems are still relevant today.
• The “SQL vs. NoSQL” debate is reminiscent of the “Relational vs. CODASYL” debate.
• Old adage: he who does not understand history is condemned to repeat it.
• Goal: ensure that future researchers avoid replaying history.
1960s – IBM IMS
• Information Management System
• Early database system developed to keep track of purchase orders for the Apollo moon
mission.
▶ Hierarchical data model.
▶ Programmer-defined physical storage format.
▶ Tuple-at-a-time queries.
Hierarchical Data Model
students
sno   sname  scity     sstate  parts
1001  Maria  New York  NY      part-1
1002  Rahul  rahul@cs  MA      part-2

part-1
pno  pname      psize  qty  price
999  Batteries  Large  10   100

part-2
pno  pname      psize  qty  price
999  Batteries  Large  14   99
• Advantages
▶ No need to reinvent the wheel for every application.
▶ Logical data independence: New record types can be added as the logical requirements
of an application change over time.
• Limitations
▶ Information is repeated.
▶ The tree-structured data model is very restrictive: a tuple’s existence depends on its
parent tuple.
▶ No physical data independence: The storage organization cannot be freely changed to tune
a database application, because there is no guarantee that the applications will continue
to run.
▶ Optimization: A tuple-at-a-time user interface forces the programmer to do manual query
optimization, and this is often hard.
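A minimal Python sketch of the two limitations that are easiest to see in code: repeated information and tuple-at-a-time access. The nested dictionaries are only a stand-in for a hierarchical store (this is not how IMS is actually programmed), the values are taken from the example table, and the function names are hypothetical.

```python
# Hypothetical in-memory stand-in for a hierarchical database:
# every supplier record owns its own copies of its part records.
suppliers = [
    {"sno": 1001, "sname": "Maria",
     "parts": [{"pno": 999, "pname": "Batteries", "psize": "Large",
                "qty": 10, "price": 100}]},
    {"sno": 1002, "sname": "Rahul",
     "parts": [{"pno": 999, "pname": "Batteries", "psize": "Large",
                "qty": 14, "price": 99}]},
]


def suppliers_of_part(db, pno):
    """Tuple-at-a-time navigation: the programmer chooses the access path."""
    names = []
    for supplier in db:                  # visit the next parent record
        for part in supplier["parts"]:   # visit the next child of that parent
            if part["pno"] == pno:
                names.append(supplier["sname"])
                break                    # one match per supplier is enough
    return names


def rename_part(db, pno, new_name):
    """Part details are repeated under each parent, so every copy must change."""
    updated = 0
    for supplier in db:
        for part in supplier["parts"]:
            if part["pno"] == pno:
                part["pname"] = new_name
                updated += 1
    return updated
```

Both functions hard-code the parent-to-child traversal order, so a change to how records are nested would require rewriting them.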
1960s – IDS
• Integrated Data Store
• Developed internally at GE in the early 1960s.
• GE sold their computing division to Honeywell in 1969.
• One of the first DBMSs:
▶ Network data model.
▶ Tuple-at-a-time queries.
1960s – CODASYL
• COBOL people got together and proposed a standard for how programs will access a
database. Led by Charles Bachman.
▶ Network data model.
▶ Tuple-at-a-time queries.
Network Data Model
• Advantages
▶ Graph-structured data models are less restrictive.
• Limitations
▶ Poorer physical and logical data independence: Cannot freely change storage
organizations or change application schema.
▶ Slow loading and recovery: Data is typically stored in one large network. This large
object has to be bulk-loaded all at once, leading to very long load times.
1970s – Relational Data Model
• Ted Codd was a mathematician working at IBM Research.
• He saw developers spending their time rewriting IMS and CODASYL programs every
time the database’s schema or layout changed.
• Database abstraction to avoid this maintenance:
▶ Store database in simple data structures.
▶ Access data through a high-level declarative language.
▶ Physical storage left up to implementation.
Relational Data Model
• Advantages
▶ Set-at-a-time languages are good, regardless of the data model, since they offer physical
data independence.
▶ Logical data independence is easier with a simple data model than with a complex one.
▶ Query optimizers can beat all but the best tuple-at-a-time DBMS application
programmers.
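To make the physical data independence point concrete, here is a small sketch using Python's built-in sqlite3 module. The three-table schema (supplier, part, supply), the index name, and the function names are assumptions chosen for illustration, with values borrowed from the supplier/parts example.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory relational version of the supplier/parts data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT);
        CREATE TABLE part (pno INTEGER PRIMARY KEY, pname TEXT, psize TEXT);
        CREATE TABLE supply (sno INTEGER, pno INTEGER, qty INTEGER, price INTEGER);
        INSERT INTO supplier VALUES (1001, 'Maria'), (1002, 'Rahul');
        INSERT INTO part VALUES (999, 'Batteries', 'Large');
        INSERT INTO supply VALUES (1001, 999, 10, 100), (1002, 999, 14, 99);
    """)
    return conn


def suppliers_of_part_sql(conn, pno):
    """Set-at-a-time: state what is wanted; the DBMS picks the access path."""
    rows = conn.execute(
        "SELECT s.sname FROM supplier AS s "
        "JOIN supply AS sp ON sp.sno = s.sno "
        "WHERE sp.pno = ? ORDER BY s.sno",
        (pno,),
    ).fetchall()
    return [name for (name,) in rows]


def add_index(conn):
    """Change the physical organization without touching any query text."""
    conn.execute("CREATE INDEX supply_by_pno ON supply (pno)")
```

The query text is identical before and after add_index is called; whether the new index is actually used is left to the query optimizer. Each part is also stored once, so a rename touches a single row.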
1970s – Relational Data Model
• Early implementations of relational DBMS:
▶ System R – IBM Research
▶ INGRES – U.C. Berkeley
▶ Oracle – Larry Ellison
1980s – Relational Data Model
• The relational model wins.
▶ IBM comes out with DB2 in 1983.
▶ “SEQUEL” becomes the standard (SQL).
• Many new “enterprise” DBMSs, but Oracle wins the marketplace.
• Examples: Teradata, Informix, Tandem, etc.
1980s – Object-Oriented Data Model
• Avoid the relational-object impedance mismatch by tightly coupling objects and
database.
• Analogy: Gluing an apple onto a pancake
• Objects are treated as first-class citizens.
• Objects may have many-to-many relationships and are accessed using pointers.
• Few of these original DBMSs from the 1980s still exist today, but many of the
technologies exist in other forms (e.g., JSON, XML).
• Examples: ObjectStore, MarkLogic, etc.
1990s – Boring Days
• No major advancements in database systems or application workloads.
▶ Microsoft forks Sybase and creates SQL Server.
▶ MySQL is written as a replacement for mSQL.
▶ Postgres gets SQL support.
▶ SQLite started in early 2000.
2000s – Internet Boom
• All the big players were heavyweight and expensive.
• Open-source databases were missing important features.
• Many companies wrote their own custom middleware to scale out databases across
single-node DBMS instances.
2000s – Data Warehouses
• Rise of the special-purpose OLAP DBMSs.
▶ Distributed / Shared-Nothing
▶ Relational / SQL
▶ Usually closed-source.
• Significant performance benefits from using the Decomposition Storage Model (i.e.,
columnar storage).
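A rough sketch of why columnar storage helps analytical queries, assuming a toy table with the hypothetical attributes sno, pno, qty, and price held in plain Python lists. Real systems add compression, vectorized execution, and on-disk layouts that are not modeled here.

```python
COLUMNS = ("sno", "pno", "qty", "price")

# Row-oriented layout: all attributes of one tuple are stored together.
rows = [
    (1001, 999, 10, 100),
    (1002, 999, 14, 99),
]


def to_columns(row_store):
    """Decompose the tuples into one contiguous list per attribute."""
    return {name: [row[i] for row in row_store]
            for i, name in enumerate(COLUMNS)}


def total_qty_row_store(row_store):
    # Every full tuple is visited even though only qty is needed.
    return sum(row[COLUMNS.index("qty")] for row in row_store)


def total_qty_column_store(column_store):
    # Only the qty column is read; the other attributes are never touched.
    return sum(column_store["qty"])
```

An OLAP query such as SUM(qty) reads one attribute across many tuples, so the column layout scans far less data than the row layout for the same answer.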
2000s – NoSQL Systems
• Focus on high-availability & high-scalability:
▶ Schema-less (i.e., “Schema Last”)
▶ Non-relational data models (document, key/value, etc.)
▶ No ACID transactions
▶ Custom APIs instead of SQL
▶ Usually open-source
2010s – NewSQL
• Provide the same performance for OLTP workloads as NoSQL DBMSs without giving up
ACID:
▶ Relational / SQL
▶ Distributed
▶ Usually closed-source
2010s – Hybrid Systems
• Hybrid Transactional-Analytical Processing.
• Execute fast OLTP like a NewSQL system while also executing complex OLAP queries
like a data warehouse system.
▶ Distributed / Shared-Nothing
▶ Relational / SQL
▶ Mixed open/closed-source.
2010s – Cloud Systems
• First database-as-a-service (DBaaS) offerings were containerized versions of existing
DBMSs.
• There are new DBMSs that are designed from scratch explicitly for running in a cloud
environment.
2010s – Specialized Systems
• Shared-disk DBMSs
• Embedded DBMSs
• Time-Series DBMSs
• Multi-Model DBMSs
• Blockchain DBMSs
Conclusion
Parting Thoughts
• There are many innovations that come from both industry and academia.
▶ Lots of ideas start in academia but few build complete DBMSs to verify them.
▶ IBM was the vanguard during the 1970s and 1980s, but now there is no single trendsetter.
▶ The era of cloud systems has begun.
• The relational model has won for operational databases.
Next Class
• Recap of topics covered in the introductory database course.
• Submit exercise sheet #1 via Gradescope.