EVEREST COLLEGE OF ENGINEERING AND TECHNOLOGY

[Link] CSE

ADVANCED DATABASE MANAGEMENT SYSTEM LAB


Syllabus

Module 1: Distributed Database (2 Hrs)
    Implementation of Partitions: Range, List.
    Self-Learning Topics: Hash Partition, Composite Partition

Module 2: ORDBMS
    Implementation of: Abstract Data Type, Reference
    Self-Learning Topics: Nested ADT, Inheritance

Module 3: Pre-processing in R (2 Hrs)
    Data pre-processing techniques in R
    Self-Learning Topics: Sorting, Date Conversion

Module 4: Data Mining - Classification using R-Programming (2 Hrs)
    Implementation and Analysis of Regression, Classification Models

Module 5: Data Mining - Clustering and Association using R-Programming
    Implementation of Market Basket Analysis and Clustering.

Module 6:

EXPERIMENT NO. 1
DISTRIBUTED DATABASE

RANGE Partitioning in MySQL

Problem Statement:

1a. Implementation of Data partitioning through Range.

Concept:

A table that is partitioned by range is partitioned in such a way that each partition contains rows for which the partitioning expression value lies within a given range.

Ranges should be contiguous but not overlapping, and are defined using the VALUES LESS THAN operator.

The example below demonstrates range partitioning in a MySQL database.

Program:
mysql> CREATE DATABASE mca;
mysql> USE mca;

mysql> CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
       PARTITION BY RANGE (YEAR(purchased)) (
           PARTITION p0 VALUES LESS THAN (1990),
           PARTITION p1 VALUES LESS THAN (1995),
           PARTITION p2 VALUES LESS THAN (2000),
           PARTITION p3 VALUES LESS THAN (2005),
           PARTITION p4 VALUES LESS THAN (2010),
           PARTITION p5 VALUES LESS THAN (2015)
       );


The code above creates a table named tr with three columns and six partitions, whose boundary values are defined inside the parentheses.

Now insert rows into the tr table with the following query.

mysql> INSERT INTO tr VALUES
       (1, 'desk organiser', '2003-10-15'),
       (2, 'alarm clock', '1997-11-05'),
       (3, 'chair', '2009-03-10'),
       (4, 'bookcase', '1989-01-10'),
       (5, 'exercise bike', '2014-05-09'),
       (6, 'sofa', '1987-06-05'),
       (7, 'espresso maker', '2011-11-22'),
       (8, 'aquarium', '1992-08-04'),
       (9, 'study desk', '2006-09-16'),
       (10, 'lava lamp', '1998-12-25');

Now check the inserted data with a SELECT statement.

mysql> SELECT * FROM tr;


We can also check which rows were placed in each range partition of table tr with the following queries.

mysql> SELECT * FROM tr PARTITION (p2);

mysql> SELECT * FROM tr PARTITION (p5);

mysql> SELECT * FROM tr PARTITION (p4);

We can also check the number of rows stored in each partition with the following query:

mysql> SELECT PARTITION_NAME, TABLE_ROWS
       FROM INFORMATION_SCHEMA.PARTITIONS
       WHERE TABLE_NAME = 'tr';


Drop a MySQL partition

If some data in a partitioned table is no longer needed, you can empty or drop one or more partitions. To delete all rows from partition p0 of tr, use the following statement:

mysql> ALTER TABLE tr TRUNCATE PARTITION p0;

mysql> SELECT PARTITION_NAME, TABLE_ROWS
       FROM INFORMATION_SCHEMA.PARTITIONS
       WHERE TABLE_NAME = 'tr';
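For comparison (a sketch, not part of the original steps): TRUNCATE PARTITION only deletes the rows, while DROP PARTITION removes the partition definition itself along with all of its rows.

mysql> ALTER TABLE tr DROP PARTITION p0;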

1b. Implementation of Data partitioning through List partition.

MySQL LIST Partitioning

List partitioning is similar to range partitioning. Here, each partition is defined and selected based on a column matching one of a set of discrete values rather than a contiguous range of values. It is set up with the PARTITION BY LIST(expr) clause, where expr is a column value or an expression based on a column value that returns an integer. Each partition is then defined using the VALUES IN (value_list) clause.

In the example below, suppose we have 12 stores distributed among four franchises based on their region. The table explains it more clearly:

Region   Store ID Number
East     101, 103, 105
West     102, 104, 106
North    107, 109, 111
South    108, 110, 112

We can partition the above table so that rows for stores belonging to the same region are stored in the same partition. The following statement arranges the stores by region using LIST partitioning, as shown below:

mysql> CREATE TABLE Stores (
           cust_name VARCHAR(40),
           bill_no VARCHAR(20) NOT NULL,
           store_id INT PRIMARY KEY NOT NULL,
           bill_date DATE NOT NULL,
           amount DECIMAL(8,2) NOT NULL
       )
       PARTITION BY LIST (store_id) (
           PARTITION pEast VALUES IN (101, 103, 105),
           PARTITION pWest VALUES IN (102, 104, 106),
           PARTITION pNorth VALUES IN (107, 109, 111),
           PARTITION pSouth VALUES IN (108, 110, 112)
       );
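As a usage sketch (the two sample rows below are illustrative values, not taken from the manual), rows can be inserted and then read back from an individual partition, or counted per partition:

mysql> INSERT INTO Stores VALUES
       ('Ajay', 'B-1001', 101, '2023-01-15', 1500.00),
       ('Meera', 'B-1002', 104, '2023-01-16', 2300.50);

mysql> SELECT * FROM Stores PARTITION (pEast);

mysql> SELECT PARTITION_NAME, TABLE_ROWS
       FROM INFORMATION_SCHEMA.PARTITIONS
       WHERE TABLE_NAME = 'Stores';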




EXPERIMENT NO. 2

ORDBMS

Aim: Implementation of Abstract Data Type & Reference

Creating Tables Under the Relational Model

The relational approach normalizes everything into tables. The table names are Customer_reltab, PurchaseOrder_reltab, and Stock_reltab.

Each part of an address becomes a column in the Customer_reltab table. Structuring telephone numbers as columns sets an arbitrary limit on the number of telephone numbers a customer can have.

The relational approach separates line items from their purchase orders and puts each into its own table, named PurchaseOrder_reltab and LineItems_reltab.

The relational approach results in the tables described in the following sections.

Customer_reltab

The Customer_reltab table has the following definition:

CREATE TABLE Customer_reltab (
  CustNo   NUMBER NOT NULL,
  CustName VARCHAR2(200) NOT NULL,
  Street   VARCHAR2(200) NOT NULL,
  City     VARCHAR2(200) NOT NULL,
  State    CHAR(2) NOT NULL,
  Zip      VARCHAR2(20) NOT NULL,
  Phone1   VARCHAR2(20),
  Phone2   VARCHAR2(20),
  Phone3   VARCHAR2(20),
  PRIMARY KEY (CustNo));


This table, Customer_reltab, stores all the information about customers, which means that it contains both information that is intrinsic to the customer (defined with the NOT NULL constraint) and information that is not as essential. According to this definition of the table, the application requires that every customer have a shipping address.

PurchaseOrder_reltab
The PurchaseOrder_reltab table has the following definition:

CREATE TABLE PurchaseOrder_reltab (
  PONo      NUMBER,                            /* purchase order no */
  Custno    NUMBER REFERENCES Customer_reltab, /* foreign key referencing customer */
  OrderDate DATE,                              /* date of order */
  ShipDate  DATE,                              /* date to be shipped */
  ToStreet  VARCHAR2(200),                     /* ship-to address */
  ToCity    VARCHAR2(200),
  ToState   CHAR(2),
  ToZip     VARCHAR2(20),
  PRIMARY KEY (PONo));

PurchaseOrder_reltab manages the relationship between the customer and the purchase order by means of the foreign key (FK) column CustNo, which references the CustNo key of Customer_reltab. The PurchaseOrder_reltab table contains no information about related line items. The line items table (next section) uses the purchase order number to relate a line item to its parent purchase order.

Stock_reltab

The Stock_reltab table has the following definition:

CREATE TABLE Stock_reltab (
  StockNo NUMBER PRIMARY KEY,
  Price   NUMBER,
  TaxRate NUMBER);


LineItems_reltab

The LineItems_reltab table has the following definition:

CREATE TABLE LineItems_reltab (
  LineItemNo NUMBER,
  PONo       NUMBER REFERENCES PurchaseOrder_reltab,
  StockNo    NUMBER REFERENCES Stock_reltab,
  Quantity   NUMBER,
  Discount   NUMBER,
  PRIMARY KEY (PONo, LineItemNo));

The table name is in the plural form LineItems_reltab to emphasize to someone reading the code that the table holds a collection of line items. It has two foreign keys: PONo, which references the PONo column in PurchaseOrder_reltab, and StockNo, which references the StockNo column in Stock_reltab.

Inserting Values Under the Relational Model


In our application, statements like these insert data into the tables:
INSERT INTO Stock_reltab VALUES(1004, 6750.00, 2);
INSERT INTO Stock_reltab VALUES(1011, 4500.23, 2);
INSERT INTO Stock_reltab VALUES(1534, 2234.00, 2);
INSERT INTO Stock_reltab VALUES(1535, 3456.23, 2);
INSERT INTO Customer_reltab
VALUES (1, 'Jean Nance', '2 Avocet Drive',
'Redwood Shores', 'CA', '95054',
'415-555-1212', NULL, NULL);

INSERT INTO Customer_reltab


VALUES (2, 'John Nike', '323 College Drive',
'Edison', 'NJ', '08820',
'609-555-1212', '201-555-1212', NULL);
INSERT INTO PurchaseOrder_reltab
VALUES (1001, 1, SYSDATE, '10-MAY-1997',
NULL, NULL, NULL, NULL);
INSERT INTO PurchaseOrder_reltab
VALUES (2001, 2, SYSDATE, '20-MAY-1997',
'55 Madison Ave', 'Madison', 'WI', '53715');
INSERT INTO LineItems_reltab VALUES(01, 1001, 1534, 12, 0);
INSERT INTO LineItems_reltab VALUES(02, 1001, 1535, 10, 10);
INSERT INTO LineItems_reltab VALUES(01, 2001, 1004, 1, 0);
INSERT INTO LineItems_reltab VALUES(02, 2001, 1011, 2, 1);

Querying Data Under the Relational Model

The application can execute queries like these. Get the customer and line item data for purchase order 1001:

SELECT C.CustNo, C.CustName, C.Street, C.City, C.State,
       C.Zip, C.Phone1, C.Phone2, C.Phone3,
       P.PONo, P.OrderDate,
       L.StockNo, L.LineItemNo, L.Quantity, L.Discount
FROM   Customer_reltab C,
       PurchaseOrder_reltab P,
       LineItems_reltab L
WHERE  C.CustNo = P.Custno
AND    P.PONo   = L.PONo
AND    P.PONo   = 1001;

Get the Total Value of Purchase Orders

SELECT P.PONo, SUM(S.Price * L.Quantity)
FROM   PurchaseOrder_reltab P,
       LineItems_reltab L,
       Stock_reltab S
WHERE  P.PONo    = L.PONo
AND    L.StockNo = S.StockNo
GROUP BY P.PONo;

Get the Purchase Order and Line Item Data for Stock Item 1004

SELECT P.PONo, P.Custno,
       L.StockNo, L.LineItemNo, L.Quantity, L.Discount
FROM   PurchaseOrder_reltab P,
       LineItems_reltab L
WHERE  P.PONo    = L.PONo
AND    L.StockNo = 1004;

Updating Data Under the Relational Model

The application can execute statements like these to update the data:

UPDATE LineItems_reltab
SET    Quantity = 20
WHERE  PONo    = 1001
AND    StockNo = 1534;

Deleting Data Under the Relational Model

DELETE
FROM LineItems_reltab
WHERE PONo = 1001;

DELETE
FROM PurchaseOrder_reltab
WHERE PONo = 1001;
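For reference, a minimal sketch of how the same design can be expressed with an abstract data type and a REF in Oracle; the type and table names (Address_objtyp, Customer_objtyp, Customer_objtab, PurchaseOrder_objtab) are illustrative assumptions, not definitions from this manual:

-- An abstract data type (object type) for addresses
CREATE TYPE Address_objtyp AS OBJECT (
  Street VARCHAR2(200),
  City   VARCHAR2(200),
  State  CHAR(2),
  Zip    VARCHAR2(20));
/

-- A customer object type that nests the address type
CREATE TYPE Customer_objtyp AS OBJECT (
  CustNo      NUMBER,
  CustName    VARCHAR2(200),
  Address_obj Address_objtyp);
/

-- An object table of customers; each row is a customer object
CREATE TABLE Customer_objtab OF Customer_objtyp (CustNo PRIMARY KEY)
  OBJECT IDENTIFIER IS PRIMARY KEY;

-- A purchase-order table that stores a REF to a customer object
-- instead of repeating the customer's columns
CREATE TABLE PurchaseOrder_objtab (
  PONo      NUMBER PRIMARY KEY,
  Cust_ref  REF Customer_objtyp,
  OrderDate DATE);

-- Populate the REF column and follow it back with DEREF
INSERT INTO Customer_objtab VALUES (1, 'Jean Nance',
  Address_objtyp('2 Avocet Drive', 'Redwood Shores', 'CA', '95054'));

INSERT INTO PurchaseOrder_objtab
  SELECT 1001, REF(c), SYSDATE
  FROM   Customer_objtab c
  WHERE  c.CustNo = 1;

SELECT p.PONo, DEREF(p.Cust_ref) FROM PurchaseOrder_objtab p;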




EXPERIMENT NO.3

Aim : Data preprocessing in R.

Objective: What is data preprocessing? Different steps involved in data


preprocessing.

Theory:
• Data Preprocessing
Data preprocessing is the process of preparing raw data to make it suitable for a machine learning model. It is the first and crucial step when building any machine learning model.

When creating a machine learning model, we do not always come across clean and formatted data. It is mandatory to clean the data and put it in a formatted way before using it for any model, and that is what data preprocessing is for.

Why do we need Data Preprocessing?

Real-world data generally contains noise and missing values, and may be in an unusable format that cannot be directly used for machine learning models. Data preprocessing cleans the data and makes it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.

• Data Preprocessing in R

(.csv dataset file)


• Importing the Dataset

Here, first we will change the working directory to E:/datasets (where the .csv file is stored) and then display all 7 rows from the csv file.

This dataset consists of four features. The dependent variable is the 'Purchased' column.

If the above dataset is to be used for a machine learning model, the idea will be to predict whether an item got purchased or not depending on the Country, Age and Salary of a person. The cells with the value 'NA' denote missing values in the dataset.
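A minimal sketch of this import step (the file name Data.csv is an assumption; substitute the file actually used in the lab):

# Set the working directory where the dataset is stored
setwd("E:/datasets")

# Read the file into a data frame (file name assumed for illustration)
dataset <- read.csv("Data.csv")

# Display the rows of the dataset, including the NA cells
print(dataset)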


• Dealing with Missing Values

dataset$Age = ifelse(is.na(dataset$Age), ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)), dataset$Age)

dataset$Salary = ifelse(is.na(dataset$Salary), ave(dataset$Salary, FUN = function(x) mean(x, na.rm = TRUE)), dataset$Salary)

The above code checks for missing values in the Age and Salary columns and updates the missing cells with the column-wise average.

• dataset$column_header:
  Selects the column in the dataset specified after $ (Age and Salary).
• is.na(dataset$column_header):
  Returns TRUE for all the cells in the specified column that have no values.
• ave(dataset$column_header, FUN = function(x) mean(x, na.rm = TRUE)):
  Calculates the average of the column passed as an argument, ignoring NA values.
Output:

Since we don't want decimal places for Age, we will round it using the following code.

dataset$Age = as.numeric(format(round(dataset$Age, 0)))


The argument 0 in the round function means no decimal places.

After executing the above code block, the dataset would look like what's shown below:

• Dealing with Categorical Data

Categorical variables represent types of data which may be divided into groups. Examples of categorical variables are race, sex, age group, educational level, etc.

In our dataset, we have one categorical feature, 'Purchased'. In R we can use the factor method to convert text labels into numerical codes.

dataset$Purchased = factor(dataset$Purchased, levels = c('No','Yes'), labels = c(0,1))

• factor(dataset$column_header, levels = c(), labels = c()):
  The factor method converts the categorical values in the specified column to factors or numerical codes.
• levels:
  The categories in the column, passed as a vector. Example: c('No','Yes')
• labels:
  The numerical codes for the specified categories, in the same order. Example: c(0,1)


Output:
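As a quick check after these preprocessing steps (a sketch, not part of the original program), the transformed data frame can be inspected with:

str(dataset)       # Age is now numeric and Purchased is a factor with levels 0/1
summary(dataset)   # no NA values should remain in Age and Salary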

Questions:
1. Incorrect or invalid data is known as ________.
[Link] data   [Link]   [Link] data   [Link] data

2. What will be the output of the following R code?

> x <- c(2, 6, NaN, NA, 4)
> is.na(x)

a) FALSE FALSE TRUE TRUE FALSE
b) FALSE TRUE TRUE TRUE FALSE
c) TRUE FALSE TRUE TRUE FALSE
d) TRUE FALSE TRUE FALSE FALSE

3. ________ is used for cleaning the data and making it suitable for a machine learning model.
a. Data preprocessing
b. Saving the data
c. Data repairing
d. Data removing




EXPERIMENT NO. 4
Aim: To implement and analyse linear regression.

Objective: To understand linear regression, a statistical model used to study the relationship that could exist between two variable quantities: one of the variables is called the independent variable (x) and the other is known as the dependent variable (y).

Theory:

LINEAR REGRESSION

The independent and dependent variables are assumed to have a linear relationship in linear regression. This implies that a line can be drawn between the two.

The relation between two quantitative variables is estimated using simple linear regression. When you need to know the following, you can apply simple linear regression:

1) What is the extent of the association between the two variables?

2) The value of the dependent variable at a given value of the independent variable.

Program:

In this program, the ages and heights of people are recorded in an excel file "[Link]" and the relationship between ages (independent variable) and heights (dependent variable) is studied.

This relationship between heights and ages can be expressed as a linear equation:

heights = m*ages + c

where m is the slope of the line and c is the intercept.

([Link] file )
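A minimal sketch of a program along these lines, assuming the data sits in a CSV file with columns named ages and heights (the file name, path, and column names are assumptions):

# Read the recorded ages and heights (file name assumed for illustration)
data <- read.csv("E:/datasets/ages_heights.csv")

# Fit the simple linear regression model: heights = m*ages + c
model <- lm(heights ~ ages, data = data)

# Show residuals, coefficients (intercept c and slope m), p-values and R-squared
summary(model)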


Output:

Residuals: The intention is for the sum of the residuals to be close to zero or as low as possible. Most cases will not follow a completely straight line in real life, hence residuals are always to be expected.

Coefficients: The values of the intercept (the "c" value) and the slope (the "m" value) for age can be seen. These "c" and "m" values are used to draw the line through the points of the data.

So in this case, if there is a person whose age is 18, then the height can be calculated as 18*2.0500 + 131.0778, which is approximately 167.98.

The p-value for age is 0.000001155. The smaller the value, the better 'ages' is as a determinant of 'heights'.

The R² value is close to 1 for models that fit well, while models that fit the data poorly have an R² value near 0. In this output the R-squared value is 0.9715, which explains almost 97% of the variability.
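The same prediction can be computed with predict(), continuing the sketch above (the column name ages is an assumption):

predict(model, data.frame(ages = 18))   # roughly 167.98, given the coefficients reported above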


EXPERIMENT NO. 5
Aim: Implementation of market basket analysis.

Objective:- To implement market basket analysis in R

Theory:

Market Basket Analysis is a form of frequent itemset mining that examines consumer purchasing patterns by identifying relationships between the many goods in customers' "shopping baskets." By gaining insight into which goods are commonly purchased together by customers, businesses may build marketing strategies based on the discovery of these relationships.

Market Basket Analysis is a method of determining the value of a market basket.

MBA is most often used to help in cross-selling and up-selling. If you know that customers who buy trousers also buy belts, for example, you may advertise the belts on the same page or offer them as part of a bundle to try to boost sales. You may also advertise one product while seeing an increase in the other.

Customers' purchase patterns are depicted using "Association Rules" in Market Basket Analysis. A rule's interestingness is determined by two metrics: support and confidence.

Example:

Tea_powder => sugar [support = 4%, confidence = 70%]

a. A support of 4% for the above rule states that 4% of all the transactions under analysis show that tea powder and sugar are purchased together.

   support(B => C) = P(B ∪ C)

b. A confidence of 70% means that 70% of the customers who purchased tea powder also bought sugar.

   confidence(B => C) = P(C | B)

c. Lift is a metric that helps us figure out whether combining two products increases our chances of making a sale; a lift greater than 1 means the two items are bought together more often than would be expected if they were independent.

   lift(B => C) = P(B ∪ C) / (P(B) * P(C))

Packages/functions used:
1) arules
It is used for displaying, manipulating, and analysing transaction data and patterns (frequent item sets and association rules).
2) arulesViz
Extends package 'arules' with various visualization techniques for association rules and item sets.
3) inspect()
It displays transactions, item sets and association rules in a readable form.
4) is.redundant()
It finds redundant rules.
5) apriori()
From a given collection of transaction data, apriori() creates the most relevant set of rules. It also reports each rule's support, confidence, and lift. The relative strength of the rules may be determined using these three criteria.
6) plot()
It is used to visualize association rules and item sets. It has several popular visualization methods implemented, such as scatter plots.
Algorithm:
The Apriori algorithm seeks out frequently recurring item sets. An itemset is a collection of related items (such as products in a basket) whose frequency of co-occurrence is determined by a user-defined "support" level.
1) Read through the entire set of transactions.
2) Calculate the value of support for each item.
3) If an item's support is less than the minimum, discard it. Otherwise, add it to the frequent itemset.
4) Determine the level of confidence for each non-empty subset. If the confidence is less than the minimum, the subset is discarded.

Program:
1) library(arules)

2) library(arulesViz)

3) inspect(Groceries)   # if Groceries is not found, load the built-in dataset first with data("Groceries")


4) grules <- apriori(Groceries)


5) grules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))


6) inspect(grules[1:5])

7) grules <- sort(grules, by = "support", decreasing = TRUE)

inspect(grules[1:10])

8) redundant_grules <- is.redundant(grules)

9) summary(redundant_grules)


10) grules <- grules[!redundant_grules]

11) inspect(grules)

12) grules_whole_milk <- apriori(Groceries, parameter = list(supp = 0.0001, conf = 0.08),
    appearance = list(default = "rhs", lhs = "whole milk"))


13) inspect(grules_whole_milk[1:7])

14) plot(grules_whole_milk, method = "graph", engine = "interactive")


Graph:
