
IT601 – System and Network Administration

Database Services
Arif Husen

Department of Computer Science and Information Technology,


Virtual University of Pakistan
Database Services
➢ A database is essential to any web-based application for saving records and user data. A database is an
organized collection of data, structured so that it can be easily accessed. Database Management
Systems (DBMS) are used to manage these databases.

➢ Types of DBMS
In general, there are two common types of databases:
▪ Non-Relational
▪ Relational
➢ Non-Relational Database Management System (Non-RDBMS)
▪ In Non-RDBMS, data is stored in key-value pairs. For example:
Customers = [ {id: 1, name: "furqan", age: 40}, {id: 2, name: "aisha", age: 22} ]

▪ Commonly used Non-RDBMS: MongoDB, Amazon DynamoDB, Redis, Elasticsearch (ES), etc.


➢ Relational Database Management System (RDBMS)
▪ In RDBMS, data is stored in tabular format. For example:
ID | Name   | Age
1  | Furqan | 40
2  | Aisha  | 22

▪ Commonly used RDBMS: MySQL, PostgreSQL, MSSQL, Oracle etc.
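As a minimal sketch, the Customers table above could be created and populated with standard SQL (the table and column names follow the example; the data types are assumptions):

CREATE TABLE Customers (
    ID INT PRIMARY KEY,       -- uniquely identifies each customer
    Name VARCHAR(50),
    Age INT
);

INSERT INTO Customers (ID, Name, Age) VALUES (1, 'Furqan', 40);
INSERT INTO Customers (ID, Name, Age) VALUES (2, 'Aisha', 22);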


Elements of Relational Databases
➢ Relational databases are based on the relational model.
▪ The relational model is a group of rules set forth by E. F. Codd based on mathematical principles (relational
algebra), and it defines how database management systems should function.
▪ The basic structures of a relational database (as defined by the relational model) are tables,
columns (or fields), rows (or records), and keys.

(Diagram: elements of a relational database — tables of three types (kernel, association, and
characteristic), with columns, rows, keys, and indexes as their elements.)
Types of Tables

Tables are generally grouped into three types:

Kernel tables
• Tables that are independent entities. Kernel tables often represent or model things that exist in the
real world.
• Some example kernel tables are customers, vendors, employees, parts, goods, and equipment.

Association tables
• Tables that represent a relationship among entities.
• For example, an order represents an association between a customer and goods.

Characteristic tables
• Tables whose purpose is to qualify or describe some other entity. Characteristics only have meaning
in relation to the entity they describe.
• For example, order-lines might describe orders; without an order, an order-line is useless.
Elements of Table: Row

➢ A row is a single occurrence of the data contained in a table; each row is treated as a single unit. In the
Customer table image in Tables, there are four rows, and each row contains information about an
individual customer.
Elements of Table: Column

➢ Rows are organized as a set of columns (or fields). All rows in a table comprise the same set of columns.
In the Customer table image in Tables, the columns are Cust Number, Name, and Street.

Elements of Table: Keys
➢ Keys identify a unique row in a table.

Primary key

▪ A primary key is a column (or group of columns) whose value uniquely identifies each row in a table.
Because the key value is always unique, you can use it to detect and prevent duplicate rows.
▪ A good primary key has the following characteristics:
❑ mandatory
❑ unique
❑ stable
❑ short

Foreign key

▪ A foreign key is a column value in one table that is required to match the column value of the
primary key in another table.
▪ If the foreign key value is not null, then the primary key value in the referenced table must exist.
▪ It is this relationship of a column in one table to a column in another table that provides the
relational database with its ability to join tables (a minimal SQL sketch follows).
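As a minimal sketch of both key types, a hypothetical Orders table could reference the Customers table from the earlier example (the Orders columns are illustrative assumptions):

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,      -- primary key: mandatory, unique, stable, short
    CustomerID INT NOT NULL,
    OrderDate DATE,
    FOREIGN KEY (CustomerID)      -- foreign key: must match an existing Customers.ID
        REFERENCES Customers (ID)
);

With this constraint in place, the database rejects any order whose CustomerID does not exist in Customers, and the two tables can be joined on that column.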
Elements of Table: Indexes

➢ An index in a database operates like the index tab on a file folder. It points out one identifying column, such
as a customer's name, that makes it easier and quicker to find the information you want.
▪ A single column can be used to define a simple index, or a combination of columns to define a
composite or compound index.

▪ To decide which columns to use, you first need to determine how the data in the table is accessed.

▪ If users frequently look up customers by last name, then the last name is a good choice for an index.
It is typical to base indexes on primary keys. A couple of example index definitions follow.
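For instance, simple and composite indexes on the hypothetical Customers table from earlier could be defined as follows (the index names are illustrative):

CREATE INDEX idx_name ON Customers (Name);           -- simple index on one column
CREATE INDEX idx_name_age ON Customers (Name, Age);  -- composite (compound) index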
Database Schema
➢ A database schema defines how data is organized within a relational database.

▪ It covers logical constraints such as table names, fields, data types, and the relationships
between these entities.
➢ Schemas commonly use visual representations to communicate the architecture of the database,
becoming the foundation for an organization’s data management discipline.

➢ Types of database schemas

Conceptual schema
• Provides a big-picture view of what the system will contain, how it will be organized, and which
business rules are involved.
• Conceptual models are usually created as part of the process of gathering initial project
requirements.

Logical schema
• Less abstract than conceptual schemas.
• Clearly defines schema objects with information such as table names, field names, entity
relationships, and integrity constraints.

Physical schema
• Provides the technical information that the logical schema lacks, in addition to the contextual
information such as table names, field names, and entity relationships.
Structured Query Language
➢ Structured Query Language (SQL) is a standard query language that is used to work with relational
databases.
▪ SQL is used to perform several operations.
▪ SQL queries are generally divided into five classes (a representative statement for each class is
sketched at the end of this section):
❑ Data Definition Language (DDL)
❑ Data Manipulation Language (DML)
❑ Data Control Language(DCL)
❑ Transaction Control Language(TCL)
❑ Data Query Language (DQL)

▪ Example:

SELECT first_name, last_name FROM Customers;

➢ SQL is generally used with relational databases; there is no comparable standard query language for
non-relational databases.
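As a hedged illustration, one representative statement per class might look like this (the table, database, and user names are hypothetical):

CREATE TABLE Customers (ID INT PRIMARY KEY, Name VARCHAR(50));   -- DDL: define structure
INSERT INTO Customers (ID, Name) VALUES (1, 'Furqan');           -- DML: change data
GRANT SELECT ON shopdb.Customers TO 'webapp'@'localhost';        -- DCL: control access
COMMIT;                                                          -- TCL: end a transaction
SELECT Name FROM Customers WHERE ID = 1;                         -- DQL: query data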
MySQL

➢ MySQL is a very popular open-source relational database management system (RDBMS).

▪ MySQL is a relational database management system


▪ MySQL is open-source
▪ MySQL is free
▪ MySQL is ideal for both small and large applications
▪ MySQL is very fast, reliable, scalable, and easy to use
▪ MySQL is cross-platform
▪ MySQL is compliant with the ANSI SQL standard
▪ MySQL was first released in 1995
▪ MySQL is developed, distributed, and supported by Oracle Corporation
▪ MySQL is named after co-founder Monty Widenius's daughter: My

➢ Applications

▪ Huge websites like Facebook, Twitter, Airbnb, Booking.com, Uber, GitHub, YouTube, etc.
▪ Content Management Systems like WordPress, Drupal, Joomla!, Contao, etc.
▪ Used by a very large number of web developers around the world
Common RDBMS

• MySQL,
• Microsoft SQL Server
• Oracle
• Microsoft Access.
Installing MySQL Server
➢ Install the MySQL Server

sudo apt update

sudo apt install mysql-server

mysql --version

➢ Secure MySQL Service

sudo mysql_secure_installation

▪ Add a Dedicated MySQL User

sudo mysql

mysql> CREATE USER '<username>'@'<hostname>' IDENTIFIED WITH authentication_plugin BY 'password';

mysql> CREATE USER '<username>'@'<hostname>' IDENTIFIED BY 'password';
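For example, a hypothetical dedicated user for a web application, using MySQL 8's default caching_sha2_password plugin (the user name and password here are illustrative):

mysql> CREATE USER 'webapp'@'localhost' IDENTIFIED WITH caching_sha2_password BY 'S3cure!Pass';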


Installing MySQL Server
▪ Grant Privileges to Secure MySQL

mysql> GRANT PRIVILEGE ON database.table TO 'username'@'host';

GRANT CREATE, ALTER, DROP, INSERT, UPDATE, DELETE, SELECT, REFERENCES, RELOAD
ON *.* TO 'username'@'localhost' WITH GRANT OPTION;

FLUSH PRIVILEGES;
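A narrower, hypothetical grant for the web-application user created earlier, limited to a single application database:

mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON shopdb.* TO 'webapp'@'localhost';
mysql> FLUSH PRIVILEGES;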

▪ Managing MySQL Service

systemctl status mysql.service

systemctl start|restart|enable mysql.service

▪ Log in to your MySQL Server


sudo mysql -u root
Use of mysqladmin
➢ The mysqladmin utility is used by database administrators to easily perform basic tasks in MySQL.

➢ It provides several valuable tools which can be used for:

▪ Server maintenance
▪ Server configuration
▪ Server monitoring
▪ Database backup
▪ Resource management
▪ MySQL service management
▪ Managing user roles
▪ SQL CRUD operations
▪ Many more
Common Tasks Performed with mysqladmin
➢ mysqladmin has a controlled set of procedures and workflows. It can perform database operations and
queries with the help of the standard and easy-to-use Structured Query Language (SQL).

▪ It assigns users permissions to work on the Database Server Management and Maintenance
activities.
▪ Here are some of the important tasks that can be performed with mysqladmin.

▪ Monitor MySQL Server processes
▪ Create and drop databases in the MySQL Server
▪ Flush information logs, statistics, status variables, and tables
▪ Reload/reset MySQL privileges
▪ Kill running queries
▪ Start and stop the server with backups
▪ Start and stop replicas
▪ Check server configuration and status
Benefits of mysqladmin
➢ Below are some of the advantages of mysqladmin.

▪ Provides a defined and improved structure of settings vital to the performance of the MySQL Server.

▪ Visualizes the flow by displaying a graphical representation, making it easy for users to read, interpret,
and fine-tune the settings of the MySQL Server.

▪ Takes care of security risk management, so you can work with your data safely.

▪ Easily imports and exports data files from the MySQL Server, subject to file-size limits.

▪ Maintains user accounts and their passwords, and is also capable of locking or unlocking users whenever
needed.

▪ Provides open-source flexibility and secure transactional support with high scalability and continuous
uptime.
Administering MYSQL with mysqladmin
Task | Command
Create a new database | mysqladmin create db_name
Delete a database | mysqladmin drop db_name
Check the status of all MySQL Server variables | mysqladmin extended-status
Flush all information in the host cache | mysqladmin flush-hosts
Flush all tables | mysqladmin refresh
Set a new password | mysqladmin password new_password
Stop the server | mysqladmin shutdown
Display the server variables | mysqladmin variables
Change the MySQL root password | mysqladmin -u root password [New_Password]
Change the MySQL root password (supplying the old one) | mysqladmin -u root -pOld_Password password 'New_Password'
See if your MySQL server is up and running | mysqladmin -u root -pPassword ping
Check MySQL Server uptime | mysqladmin -u root -pPassword status
Check the version of the MySQL server | mysqladmin -u root -pPassword version
Check the status of a MySQL Server | mysqladmin -u root -pPassword status
Extended status of a MySQL Server | mysqladmin -u root -pPassword extended-status
Check MySQL Server variables | mysqladmin -u root -pPassword variables
Check the MySQL process list | mysqladmin -u root -pPassword processlist
Kill a MySQL client process | mysqladmin -u root -pPassword kill 195003
phpMyAdmin

➢ phpMyAdmin is a web-based alternative to mysqladmin. Its main characteristics are:

▪ Security
▪ It has some built-in security features, but a publicly exposed web interface requires additional hardening.

▪ Stability
▪ It is generally stable.

▪ Compatibility
▪ It may require additional steps to install and configure phpMyAdmin alongside the web server and PHP.

▪ Familiarity
▪ Its web interface may require some learning if you are not already familiar with it.
Installing phpMyAdmin
➢ Update APT Repositories

sudo apt update

➢ Install phpMyAdmin and PHP Support

sudo apt install phpmyadmin php-mbstring php-zip php-gd php-json php-curl

➢ Enable the mbstring PHP Extension

sudo phpenmod mbstring

➢ Restart Web service

sudo systemctl restart apache2

➢ Access via Web Interface http://<hostname>/phpmyadmin


Non-Relational Databases

➢ Relational databases are not well suited to very large, unstructured, or semi-structured data.

▪ NoSQL databases are non-relational databases that address these issues.

➢ There are four major types of NoSQL Databases

Document databases
• Store data as semi-structured documents, such as JSON or XML, and can be queried using
document-oriented query languages.

Key-value stores
• Store data as key-value pairs, and are optimized for simple and fast read/write operations
(see the redis-cli sketch after this list).

Column stores
• These databases store data as column families, which are sets of columns that are treated as a
single entity. Optimized for fast and efficient querying of large amounts of data.

Graph databases
• Store data as nodes and edges, and are designed to handle complex relationships between data.
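As a minimal sketch of the key-value model mentioned above, the following redis-cli session stores and reads back one customer attribute (assumes a Redis server running locally; the key naming scheme is an illustrative convention):

redis-cli SET customer:1:name "furqan"    # store a value under a key
redis-cli GET customer:1:name             # returns "furqan"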
Key Characteristics of NoSQL
➢ Key Characteristics of NoSQL databases are

Dynamic schema
• Do not have a fixed schema and can accommodate changing data structures without the need for
migrations or schema alterations.

Horizontal scalability
• Designed to scale out by adding more nodes to a database cluster, making them well suited for
handling large amounts of data and high levels of traffic.

Document-based
• Some NoSQL databases, such as MongoDB, use a document-based data model, where data is stored
in a semi-structured format such as JSON or BSON.

Key-value-based
• Key-value stores such as Redis use a key-value data model, where data is stored as a collection of
key-value pairs.

Column-based
• Some NoSQL databases, such as Cassandra, use a column-based data model, where data is
organized into columns instead of rows.

Distributed and highly available
• NoSQL databases are often designed to be highly available and to automatically handle node
failures and data replication across multiple nodes in a database cluster.

Flexibility
• NoSQL databases allow developers to store and retrieve data in a flexible and dynamic manner,
with support for multiple data types and changing data structures.

Performance
• NoSQL databases are optimized for high performance and can handle a high volume of reads and
writes, making them suitable for big data and real-time applications.
Merits/Demerits
➢ NoSQL has the following Benefits

High scalability
• Use sharding for horizontal scaling; vertical scaling is complex.
• MongoDB and Cassandra are examples of horizontally scaling databases.
• Handle a huge amount of data; as the data grows, the database scales out to handle it in an
efficient manner.

Flexibility
• Designed to handle unstructured or semi-structured data, which means they can accommodate
dynamic changes to the data model.
• This makes them suitable for applications that need to handle changing data requirements.

High availability
• The auto-replication feature in NoSQL databases makes them highly available because, in case of
any failure, data replicates itself to the previous consistent state.

Performance
• Designed to handle large amounts of data and traffic, which means they can offer improved
performance compared to traditional relational databases.

Cost-effectiveness
• More cost-effective than traditional relational databases, as they are typically less complex and do
not require expensive hardware or software.

Agility
• Ideal for agile development.
Merits/Demerits

➢ NoSQL has the following demerits

▪ No standardization
▪ No ACID compliance
▪ Narrow focus
▪ No complex queries
▪ Not mature
▪ Complex management


Applications of NoSQL Databases
➢ NoSQL databases are often used in applications where there is a high volume of data that needs to be
processed and analyzed in real-time, such as social media analytics, e-commerce, and gaming.

➢ They can also be used for other applications, such as content management systems, document
management, and customer relationship management.

➢ NoSQL databases may not be suitable for all applications, as they may not provide the same level of data
consistency and transactional guarantees as traditional relational databases.

➢ It is important to carefully evaluate the specific needs of an application when choosing a database
management system.
Available NoSQL Databases

➢ Following are commonly available NoSQL based databases.

➢ Graph databases: Amazon Neptune, Neo4j
➢ Key-value stores: Memcached, Redis, Coherence
➢ Tabular: HBase, Bigtable, Accumulo
➢ Document-based: MongoDB, CouchDB, Cloudant, Elasticsearch
Elasticsearch
➢ To some people, Elasticsearch is known as "an index," "a search engine," "an analytics database," "a big
data solution," "quick and scalable," or "like Google."

▪ All of the above are correct, which is part of Elasticsearch's appeal.

➢ Elasticsearch is a distributed, free and open search and analytics engine for all types of data,
including textual, numerical, geospatial, structured, and unstructured.

▪ Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now
known as Elastic).

➢ Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the
central component of the Elastic Stack, a set of free and open tools for data ingestion, enrichment,
storage, analysis, and visualization.

▪ Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack
now includes a rich collection of lightweight shipping agents known as Beats for sending data to
Elasticsearch.
Operations of ES
➢ Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web
applications.
▪ Data ingestion is the process by which this raw data is parsed, normalized, and enriched before
it is indexed in Elasticsearch.

➢ Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to
retrieve complex summaries of their data.
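As a hedged illustration of such a query, a full-text search combined with an aggregation could be sent to a local cluster like this (the index name and fields are hypothetical):

curl -X GET "localhost:9200/customers/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "name": "furqan" } },
  "aggs": { "avg_age": { "avg": { "field": "age" } } }
}'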
Elasticsearch index
➢ An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch
stores data as JSON documents.
▪ Each document correlates a set of keys (names of fields or properties) with their corresponding values
(strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).

➢ Elasticsearch uses a data structure called an inverted index, which is designed to allow very fast
full-text searches. An inverted index lists every unique word that appears in any document and
identifies all of the documents each word occurs in.

▪ An inverted index is a mapping of each specific 'word' (token) to the list of documents (locations)
containing that word, allowing users to easily find documents containing given keywords. Index data is
contained in one or more partitions, also defined as shards. Elasticsearch also automatically distributes
and allocates shards to cluster nodes.

➢ During the indexing process, Elasticsearch stores documents and builds an inverted index to make
the document data searchable in near real-time. Indexing is initiated with the index API, through
which you can add or update a JSON document in a specific index.
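A minimal sketch of the index API, adding one JSON document to a hypothetical customers index on a local cluster:

curl -X PUT "localhost:9200/customers/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "furqan",
  "age": 40
}'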
Architecture of Elasticsearch

(Diagram: data ingestion pipelines deliver documents into an Elasticsearch cluster consisting of a
master node and multiple data nodes; each index is partitioned into shards stored on the data nodes,
and the resulting document store is queried by visualization tools.)
Basic Terms in ES
➢ Some basic terminology related to ES are

JVM
• The JVM allows running Java programs on specified servers; Elasticsearch runs on the JVM.

Shard
• Shards are instances of the Apache Lucene engine itself, which provides indexing of data within
nodes.

Index
• Each record in Elasticsearch is a JSON document, and an Elasticsearch index is a collection of
JSON documents. In short, each index is a kind of database.

Segment
• The Lucene index is split into smaller parts, which are smaller directories. A segment is a subset
of the Lucene index.

Mapping
• Mapping is the method of specifying how a document and the fields contained inside it will be
stored and indexed. Each index has a single mapping that defines how the text is indexed.

Node
• Any single instance (a machine with Elasticsearch installed) is defined as a node.

Document
• In Elasticsearch, a document represents a basic unit of information that can be indexed.

Replica
• Elasticsearch sends a copy of each piece of data to other machines, thus preventing data loss if one
of the machines goes down. These replicated machines or shards are defined as replicas.

Cluster
• A cluster in Elasticsearch is a set of nodes with the same cluster.name attribute. As nodes join or
exit a cluster, the cluster reorganizes itself to spread data equally over the available nodes.

Type
• In Elasticsearch, a type represents a class of related documents and is identified by a name such
as customer or product.
Why use Elasticsearch?

➢ Elasticsearch is fast.
▪ Built on top of Lucene, which excels at full-text search.
▪ A near real-time search platform with very low latency, typically around one second from indexing
to searchability.
❑ A requirement for time-sensitive use cases such as security analytics and infrastructure monitoring

➢ Elasticsearch is distributed by nature.


▪ The documents stored in Elasticsearch are distributed across different containers known as shards
▪ Shards are duplicated to provide redundant copies of the data in case of hardware failure
▪ scale out to hundreds (or even thousands) of servers and handle petabytes of data.
➢ A wide set of features
▪ In addition to its speed, scalability, and resiliency, it has a number of powerful built-in features such
as data rollups and index lifecycle management.
➢ Simplified data ingest, visualization, and reporting
▪ Integration with Beats and Logstash makes it easy to process data before indexing into Elasticsearch.
▪ Kibana provides real-time visualization of Elasticsearch data as well as UIs for quickly accessing
application performance monitoring (APM), logs, and infrastructure metrics data.
ES Applications
➢ Application search

➢ Website search

➢ Infrastructure metrics and container monitoring

➢ Application performance monitoring

➢ Enterprise search

➢ Geospatial data analysis and visualization

➢ Security analytics

➢ Logging and log analytics

➢ Business analytics
ES Node Roles
➢ The main Node roles of ES are as follows

Master
• A node that has the master role is eligible to be elected as the master node, which controls the
cluster.

Remote_cluster_client
• A node that has the remote_cluster_client role is eligible to act as a remote client.

ml
• Allows the use of machine learning features; there must be at least one machine learning node in
your cluster.

Ingest
• Ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich
the document before indexing.
• With a heavy ingest load, it makes sense to use dedicated ingest nodes and to exclude the ingest
role from nodes that have the master or data roles.

Data
• Data nodes hold data and perform data-related operations such as CRUD, search, and
aggregations. A node with the data role can fill any of the specialised data node roles:
• data_content
• data_hot
• data_warm
• data_cold
• data_frozen

Transform
• Used to transform data; there must be at least one transform node in your cluster.
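As a brief sketch, roles are assigned per node in elasticsearch.yml via the node.roles setting; the combination shown here is an illustrative assumption, not a recommended layout:

node.roles: [ master, data, ingest ]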
Data Ingestion
➢ Data ingestion refers to the tools & processes used to collect data from various sources and move
it to a target site, either in batches or in real-time.
▪ The data ingestion layer is critical to your downstream data science, BI, and analytics systems which
depend on timely, complete, and accurate data.
➢ ES provides the following tools for data ingestion:

Elastic Beats
• A set of lightweight data shippers that allow you to conveniently send data to Elasticsearch
Service.

Logstash
• A powerful and flexible tool to read, process, and ship data of any kind.

Language clients
• Official client libraries for Python, Ruby, and other languages.

Kibana Dev Tools
• A console in Kibana for interacting with the Elasticsearch REST APIs.
Elastic Beats

Filebeat
▪ Used to read, preprocess, and ship data from sources that come in the form of log files.
▪ Filebeat further supports a number of other data sources including TCP/UDP, containers, Redis, and
Syslog.
▪ A large number of modules ease the collection and parsing of log formats for applications such as
Apache, MySQL, and Kafka.

Metricbeat
▪ Collects and preprocesses system and service metrics.
▪ System metrics include information about running processes, as well as CPU / memory / disk / network
utilization numbers.
▪ Can collect data from many different services including Kafka, Palo Alto Networks, Redis, and many more.

Packetbeat
▪ Collects and preprocesses live networking data, thereby enabling application monitoring as well as
security and network performance analytics.
▪ Among others, Packetbeat supports the following protocols: DHCP, DNS, HTTP, MongoDB, NFS, and TLS.

Winlogbeat
▪ Captures event logs from Windows operating systems, including application events, hardware events,
and security and system events.
▪ The vast information available from the Windows event log is of much interest for many use cases.

Auditbeat
▪ Detects changes to critical files and collects events from the Linux Audit Framework.
▪ Different modules ease its deployment; it is mostly used in security analytics use cases.

Heartbeat
▪ Uses probing to monitor the availability of systems and services.
▪ Heartbeat is useful in a number of scenarios such as infrastructure monitoring and security analytics.
ICMP, TCP, and HTTP are supported protocols.

Functionbeat
▪ Collects logs and metrics from within a serverless environment such as AWS Lambda.
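As a hedged example, enabling Filebeat's MySQL module on the database server from earlier might look like this (assumes Filebeat is already installed and Elasticsearch is reachable from the host):

sudo filebeat modules enable mysql    # collect MySQL error and slow-query logs
sudo filebeat setup                   # load index templates and example dashboards
sudo systemctl start filebeat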
Logstash
➢ Logstash is a powerful and flexible tool to read, process, and
ship data of any kind.
▪ Logstash provides several capabilities that are not
currently available or too costly to perform with Beats,
such as enriching documents by performing lookups
against external data sources.
➢ However, this functionality and flexibility of Logstash come at a price: the hardware requirements
for Logstash are significantly higher than for Beats.

▪ Logstash should generally not be deployed on low-resource devices. Logstash is therefore used as
an alternative to Beats when the functionality of the latter is insufficient for a specific use case.

➢ A common architectural pattern is to combine Beats and Logstash: use Beats to collect data and use
Logstash to perform any data processing that Beats are not capable of doing.
Logstash Pipeline
➢ Logstash works by executing event processing pipelines, whereby each pipeline consists of at
least one of each of the following:

Inputs
▪ Read from data sources such as files, http, imap, jdbc, kafka, syslog, tcp, and udp.

Filters
▪ Process and enrich the data in various ways.
▪ Filters can parse CSV, JSON, key/value pairs, delimited unstructured data, and complex
unstructured data on the basis of regular expressions (grok filters).
▪ Enrich data by performing DNS lookups, adding geoinformation about IP addresses, or by
performing lookups against a custom dictionary or an Elasticsearch index.
▪ Additional filters allow for diverse transformations of the data, for example, to rename, remove, or
copy data fields and values (mutate filter).

Outputs
▪ Write the parsed and enriched data to data sinks; they are the final stage of the Logstash
processing pipeline.
▪ While many output plugins are available, here we focus on ingestion into Elasticsearch Service
using the Elasticsearch output.
Logstash Example Pipeline

The pipeline below reads an RSS feed (input), transforms the entries (filter), and ships them to
Elasticsearch while printing a dot per event to stdout (output):

input {
  rss {
    url => "/blog/feed"
    interval => 120
  }
}

filter {
  mutate {
    rename => [ "message", "blog_html" ]
    copy => {
      "blog_html" => "blog_text"
      "published" => "@timestamp"
    }
  }
  mutate {
    gsub => [
      "blog_text", "<.*?>", "",
      "blog_text", "[\n\t]", " "
    ]
    remove_field => [ "published", "author" ]
  }
}

output {
  stdout {
    codec => dots
  }
  elasticsearch {
    hosts => [ "https://<your-elasticsearch-url>" ]
    index => "elastic_blog"
    user => "elastic"
    password => "<your-elasticsearch-password>"
  }
}
Installing ES on Linux Server

• Refer to LAB Session, Week 15.
