What types of storage do you know ?
neuefi[Link] 2
Types of Data
Unstructured Semi- structured Structured
Schema and types of
Data does not conform to Data conforms to a
data are known and
a rigid structure. schema but deviations
encoded in the system
are possible
neuefi[Link] 3
Unstructured
● Free text - ‘lorem ipsum какой прекрасный сегодня
день!’
● Images
● Audio files
● Can be stored in “object store” - AWS S3
neuefi[Link] 4
Semi-structured
● Stored in document databases - “MongoDB”, “Elastic Search”
● CSV - “comma separated values” ● JSON - JavaScript object notation
neuefi[Link] 5
Structured
● Standardised format
● Can be processed efficiently by software and humans alike
● Rows and columns that define data attributes
○ Definable attributes - same attributes for all records
○ Relational attributes - common values that link
different datasets together
● Lends itself easily to mathematical analysis
● Easy to store and manage
● The structure is encoded and enforced
classroom_id classroom_capacity classroom_number bootcamp_id
1 10 100 1
2 15 101 2
3 20 102 3
neuefi[Link] 6
Introduction to
Databases
Why learn about Databases?
Why?
● Most of (large) data is stored in databases
● An analyst needs to be able to connect to a database and
access its data to unlock insights
What?
● Understand the basics of databases
● Be able to connect to a database
How?
● Learn about database types and how they are structured
● Connect to a database and explore its content
neuefi[Link] 8
Databases
A systematic collection of data
Data is either stored on disk or in-memory (faster)
Support electronic storage and manipulation of data
neuefi[Link] 9
Types of Databases
RDBMS: Relational Database Management Systems
● use SQL to query RDBMS
● have a predefined schema
● data is stored in tabular form of columns and rows
● the relationship between data is relational
● Examples: Postgres, MySql, Oracle, SQLite
NoSQL Databases (“Not only SQL Databases”)
● don’t use SQL as the primary language
● have no predefined schema
● Examples: Neo4j, Elasticsearch, MongoDB
neuefi[Link] 10
RDBMS
● Many different types of databases exist and each uses a different
flavour of SQL
● Their syntax can differ, but the core concepts are the same
● Some databases will implement a subset of the functionality
● Some DB will be optimized for speed of read, others for speed of
write
neuefi[Link] 11
Database Structure
● A database consists of one/multiple schemas
● Schemas consist of tables
● Tables consist of columns and rows
● A column is a variable and has a unique name
● A row is an observation
● Every cell is a single value
neuefi[Link] 12
Entity-Relationship model
● Data consists of entities of type object, class,
person or place
● The property of an entity is described
through their attribute(s)
● Relationships describe the relation between
entities
● Different types of relationship exist
neuefi[Link] 13
Entity-Relationship types
● One-to-one (1:1)
● One-to-many (1:n) / Many-to-one (n:1)
● Many-to-many (n:n)
neuefi[Link] 14
Entity-Relationship types
● One-to-one (1:1)
● One-to-many (1:n) / Many-to-one (n:1)
● Many-to-many (n:n)
neuefi[Link] 15
Entity-Relationship types
● One-to-one (1:1)
● One-to-many (1:n) / Many-to-one (n:1)
● Many-to-many (n:n)
neuefi[Link] 16
Entity-Relationship types
● One-to-one (1:1)
● One-to-many (1:n) / Many-to-one (n:1)
● Many-to-many (n:n)
neuefi[Link] 17
Relational Databases
● Tables are related via primary and
foreign keys
● Each table has one primary key that
is unique for each record
● A foreign key is a field (or collection
of fields) in one table, that refers to
the primary key in another table
neuefi[Link] 18
ERM Exercise
Group Work
1. Split up in 4 groups Topic
2. Open the Miro Board we send in Slack
Airport
3. Create between 4 and 8 entities
4. Write down properties (columns) of entities
Chocolate Factory
5. Model dependencies (1:1, 1:m, m:n)
6. Present your final diagram and explain Roller Coaster Park
decisions
7. You have 30 Minutes Restaurant
neuefi[Link] 20
Connecting to a Database
SQL Client / Database IDE
● IDE = Integrated Development Environment
● Powerful software that can be used to connect to a database
and retrieve and visualise data (and more!)
● Local or in the cloud
● Collection of open-source, free and paid software available
neuefi[Link] 22
Local SQL Clients Cloud SQL Clients
● Installed and run locally on your ● Deployed in the cloud and
machine accessed via a web-interface
Examples: Examples:
neuefi[Link] 23
DBeaver
● In this course we will use DBeaver
Why?
● Free
● Easy to use
● Works for many different types of databases
● Cross platform (Windows, Linux, Mac OS, Solaris)
neuefi[Link] 24
Installing DBeaver
Choose one:
1. Run the following command in your console:
brew install --cask dbeaver-community
OR
2. Download from their official website DBeaver Mac OS X
(dmg) and install to your Applications folder
neuefi[Link] 25
Setting up DBeaver
Open DBeaver > Preferences > Editors
● Enable upper case: SQL Editor > Formatting >
Keyword Case > Set to: Upper
● Add line numbers: Text Editors > Show line
numbers > Tick box
neuefi[Link] 26
Connect to a PostgreSQL database
1. Click on “New Database Connection”
2. Search for and select PostgreSQL
3. Enter the connection details below
Host [Link]
Port 5432
Database postgres
Username Will be posted in Slack/Zoom Chat
Password Will be posted in Slack/Zoom Chat
neuefi[Link] 27
Exploring the database
Find the database connection in your “Database Navigator” pane on the left
Expand it to postgres > postgres > Schemas > introduction > Tables
Check out the introduction’s tables and answer the following questions:
1. What happens if you double click on a table?
2. What is the first and last entry listed in the records table?
3. What data type is the column ‘record_date’ in the record table?
neuefi[Link] 28