DMDW Lab6 Kirtan

Uploaded by

Kirtan

Data Mining and Data Warehousing
Laboratory
(CSPC-328)

B.Tech VIth Semester
(January–June 2024)

Submitted by
Kirtan Gohil (21103073), Group-G3

Submitted to
Dr. Samayveer Singh

Department of Computer Science & Engineering
Dr. B. R. Ambedkar National Institute of Technology Jalandhar
-144008, Punjab, India

Table of Content
Sr. No.  Practical Name                                                         Date
1.       Designing Database Using ER Modelling                                  29-01-2024
         a) Hospital Management System
         b) Library Management System
2.       Normalizing a Database                                                 5-02-2024
3.       Programs to implement Procedures, Cursors and Triggers in a database   12-02-2024
4.       Write programs to implement and understand usage of Data marts         19-02-2024
5.       Feature Selection and Variable Filtering
6.       Perform Associative Mining in Weka and Orange on large datasets
Practical 1
Aim:- Designing Database Using ER Modelling
Que 1 Create database design for Hospital Management System using ER Modelling
The hospital management system is made up of five entities: patient, doctor, department, room,
and appointment.
These entities are related as follows:

• An appointment is for one patient and one doctor. A patient may have one or more appointments. A
doctor may schedule many appointments with various patients.
• A doctor is assigned to one department.
• A department may employ several doctors.
• A patient is assigned to one room, and a room can house one or more patients.
• Each room is supervised by one doctor, but a doctor can supervise more than one room.

These relationships lead to the following ER model:
1. Entities:
• Patient with attributes: PatientID, Name, Age, RoomNumber.
• Doctor with attributes: DoctorID, Name, Specialty, DepartmentID.
• Department with attributes: DepartmentID, DepartmentName.
• Room with attributes: RoomNumber, BedCount, SupervisingDoctorID.
• Appointment with attributes: AppointmentID, PatientID, DoctorID, Date, Time.

2. Relationships:
• A patient is linked to an appointment by a "has" relationship.
• A doctor is linked to an appointment by a "conducts" relationship.
• A doctor is linked to a department by an "assigned to" relationship.
• A department is linked to its doctors by an "employs" relationship.
• A patient is linked to a room by an "assigned to" relationship.
• A room is linked to its patients by a "houses" relationship.
• A room is linked to a doctor by a "supervised by" relationship.

The ER model is drawn as a diagram in which entities are boxes and relationships are lines linking
the boxes, often with additional symbols that indicate the kind and cardinality of each relationship.
The relationships and entities within the hospital management system are shown in Fig. 1.1.

The model has five main entities: patient, doctor, department, room, and appointment. A patient may
have many appointments, and each appointment involves one doctor and one patient. Doctors are
assigned to departments, and a department may have more than one doctor on staff. Patients are
assigned to rooms, and each room can accommodate several patients under a single doctor's
supervision. The ER diagram also shows that one doctor can supervise many rooms. The entities are
linked by the relationships "has," "conducts," "assigned to," "employs," "houses," and
"supervised by," which capture the interactions that occur in a hospital. The diagram serves as a
visual representation of the data model.

Fig. 1.1: ER diagram for Hospital Management System
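
As an illustration, the ER model above could be mapped to tables roughly as follows. This is only a sketch: it uses Python's built-in sqlite3 module rather than the MySQL server used later in this report, and the table names, column names, and types are assumptions derived from the entity list, not taken from Fig. 1.1.

```python
import sqlite3

# Hypothetical schema: one table per entity, one foreign key per
# one-to-many relationship in the ER model.
SCHEMA = """
CREATE TABLE department (
    department_id   INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL
);
CREATE TABLE doctor (
    doctor_id     INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    specialty     TEXT,
    -- "assigned to": each doctor belongs to exactly one department
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
CREATE TABLE room (
    room_number           INTEGER PRIMARY KEY,
    bed_count             INTEGER NOT NULL,
    -- "supervised by": one doctor per room, many rooms per doctor
    supervising_doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id)
);
CREATE TABLE patient (
    patient_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    age         INTEGER,
    -- "assigned to": a patient occupies at most one room
    room_number INTEGER REFERENCES room(room_number)
);
CREATE TABLE appointment (
    appointment_id INTEGER PRIMARY KEY,
    -- "has" and "conducts": one patient and one doctor per appointment
    patient_id     INTEGER NOT NULL REFERENCES patient(patient_id),
    doctor_id      INTEGER NOT NULL REFERENCES doctor(doctor_id),
    appt_date      TEXT NOT NULL,
    appt_time      TEXT NOT NULL
);
"""


def build_hospital_db():
    """Create the sketch schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = build_hospital_db()
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Each "many" side carries the foreign key, so no separate junction table is needed for this model.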

Que 2 Create database design for Library Management System using ER Modelling
The library management system includes the following entities: book, author, borrower, genre,
and loan. These entities are related as follows:
• A book is written by one or more authors.
• An author can write one or more books.
• A borrower may check out many books, but a book may be checked out by only one borrower at a time.
• A book belongs to a specific genre.
• A genre can be associated with more than one book.
• A loan records when a book was checked out and when it must be returned.

These relationships lead to the following ER model:

1. Entities:
• Book with attributes: BookID, Title, ISBN, GenreID.
• Author with attributes: AuthorID, Name, BirthDate.
• Borrower with attributes: BorrowerID, Name, Address, Phone.
• Genre with attributes: GenreID, GenreName.
• Loan with attributes: LoanID, BookID, BorrowerID, BorrowDate, DueDate.
2. Relationships:
• A book is linked to its author(s) by a "written by" relationship.
• An author is linked to one or more books by a "writes" relationship.
• A borrower is linked to books by a "borrows" relationship.
• A book is linked to its borrower by an "is borrowed by" relationship.
• A book is linked to a genre by a "belongs to" relationship.
• A loan is linked to a borrower and a book by an "issued for" relationship.
• A genre is linked to many books by an "encompasses" relationship.

In the ER diagram, entities are shown as boxes and the relationships between them as lines or
arrows. The type and cardinality of each relationship are indicated by annotations or symbols.

Fig. 1.2: ER diagram for Library Management System

Figure 1.2 illustrates the entities and relationships in the Library Management System. It has five
main entities: Book, Author, Borrower, Genre, and Loan. A book is linked to one or more authors by a
"written by" relationship, so several authors can contribute to a single work. Authors are linked to
books by a "writes" relationship, so an author can write more than one book. The "borrows"
relationship links borrowers to books: one borrower may check out several books at once, but a book
can be checked out by only one borrower at a time. Books are grouped by genre through a "belongs to"
relationship; each book belongs to one genre, and a genre can include more than one book. The
"issued for" relationship binds a loan to both a borrower and a book and records the date the book
was borrowed and the return deadline.
Practical 2
Aim:- Normalising a Database Using Griffith Normalisation Tool
Que 1 Understand the functional dependencies and normalize each relation up to 2NF, 3NF, and BCNF
using the normalization tool from Griffith University.
For each question:
• Find the minimal cover.
• Identify the candidate key(s) or primary key.
• Check for partial dependencies to determine if the relation is in 2NF.
• Check for transitive dependencies to assess if the relation is in 3NF.
• Check that the determinant of every non-trivial dependency is a superkey to assess if the relation is in BCNF.

A. Student Database:
Given the relation:
StudentCourses(StudentID, CourseName, Instructor, CourseCredits)
and the functional dependencies:
StudentID, CourseName → Instructor
CourseName → CourseCredits
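
The key and BCNF checks listed for each question can also be worked by hand with attribute closures. The following is a minimal sketch of that idea applied to the StudentCourses relation; it is not the Griffith tool's implementation, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already covered
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]


R = frozenset({"StudentID", "CourseName", "Instructor", "CourseCredits"})
FDS = [
    (frozenset({"StudentID", "CourseName"}), frozenset({"Instructor"})),
    (frozenset({"CourseName"}), frozenset({"CourseCredits"})),
]

keys = candidate_keys(R, FDS)
violations = bcnf_violations(R, FDS)
print("candidate keys:", [sorted(k) for k in keys])
print("BCNF violations:", [(sorted(l), sorted(r)) for l, r in violations])
```

Here the only candidate key is {StudentID, CourseName}, and CourseName → CourseCredits is reported as a violation because CourseName alone is not a superkey. The same dependency is a partial dependency on the key, so the relation also fails 2NF.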

Fig 1.A.1: Previous functional dependencies

Fig 1.A.2: Modified functional dependencies

Fig 1.A.3: Result

Fig 1.A.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.A.2 shows the new
functional dependencies: if you know a student's ID and the name of the course they are taking, you
can determine the instructor who teaches that course and how many credits the course carries.
Fig 1.A.3 shows the result: the new FDs are in BCNF.

B. Employee Management:
Given the relation:
EmployeeProjects(EmployeeID, ProjectName, Manager, Department)
with the functional dependencies:
EmployeeID → Department
ProjectName → Manager
Department → Manager
Fig 1.B.1: Previous functional dependencies

Fig 1.B.2: Modified functional dependencies

Fig 1.B.3: Result

Fig 1.B.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.B.2 shows the new
functional dependencies: given an EmployeeID, we can determine the ProjectName and Department
associated with that employee; given a Department, we can determine the Manager and EmployeeID
associated with that department. Fig 1.B.3 shows the result: the new FDs are in BCNF.

C. Library System:
Consider the relation:
BookLending(BookID, MemberID, BorrowDate, DueDate, MemberAddress)
and the functional dependencies:
BookID → DueDate
MemberID → MemberAddress
Fig 1.C.1: Previous functional dependencies

Fig 1.C.2: Modified functional dependencies

Fig 1.C.3: Result

Fig 1.C.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.C.2 shows the new
functional dependencies: if you know which book is borrowed by which member, you can determine the
member's address, the due date of the book, and the date it was borrowed. Fig 1.C.3 shows the
result: the new FDs are in BCNF.
D. Hospital Management:
For the relation:
PatientTreatment(PatientID, Treatment, Doctor, DoctorSpecialization)
with the functional dependencies:
Doctor → DoctorSpecialization
PatientID, Treatment → Doctor

Fig 1.D.1: Previous functional dependencies

Fig 1.D.2: Modified functional dependencies

Fig 1.D.3: Result

Fig 1.D.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.D.2 shows the new
functional dependencies: if you know the PatientID and the treatment the patient is undergoing, you
can determine which doctor is responsible for providing that treatment, along with the doctor's
specialization. Fig 1.D.3 shows the result: the new FDs are in BCNF.

E. Airline Reservation System:
Given the relation:
FlightReservations(FlightNumber, Date, PassengerID, SeatNumber, ClassType,
Price, DepartureTime, ArrivalTime, DepartureCity, ArrivalCity)
Functional dependencies are:
FlightNumber, Date → DepartureTime, ArrivalTime, DepartureCity, ArrivalCity
SeatNumber, Date, FlightNumber → PassengerID, ClassType, Price
ClassType → Price
PassengerID → DepartureCity

Fig 1.E.1: Previous functional dependencies

Fig 1.E.2: Modified functional dependencies

Fig 1.E.3: Result

Fig 1.E.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.E.2 shows the new
functional dependencies: if you know the flight number, date, and seat number, you can determine the
details of that specific booking, including the departure and arrival times, the cities, the
passenger ID, the class type, and the price. Fig 1.E.3 shows the result: the new FDs are in BCNF.

F. University Enrolment System:
Given the relation:
Enrollments(StudentID, CourseCode, Semester, Grade, InstructorID, CourseName,
CourseCredits, Department)
Functional dependencies are:
StudentID, CourseCode, Semester → Grade, InstructorID
CourseCode → CourseName, CourseCredits, Department
InstructorID, CourseCode → Department
InstructorID → Department
Fig 1.F.1: Previous functional dependencies

Fig 1.F.2: Modified functional dependencies

Fig 1.F.3: Result

Fig 1.F.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.F.2 shows the new
functional dependencies: for a given student, a specific course in a particular semester uniquely
determines the grade received by the student, the instructor teaching the course, and the number of
credits associated with the course. It also means that for a given student taking a specific course
in a particular semester with a particular instructor, there is only one department to which the
course belongs and one specific name for the course. Fig 1.F.3 shows the result: the new FDs are in
BCNF.

G. Music Streaming Platform:
For the relation:
UserPlays(UserID, SongID, Date, ArtistName, Album, Genre, PlayCount,
SubscriptionType)
Functional dependencies are:
UserID, SongID, Date → PlayCount
SongID → ArtistName, Album, Genre
UserID → SubscriptionType
ArtistName, Album → Genre

Fig 1.G.1: Previous functional dependencies

Fig 1.G.2: Modified functional dependencies

Fig 1.G.3: Result

Fig 1.G.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.G.2 shows the new
functional dependencies: for a given user, listening to a specific song on a particular date
uniquely determines the attributes of that listening event, such as how many times the song was
played (PlayCount), the type of subscription the user has (SubscriptionType), the name of the
artist, the album, and the genre of the song. Fig 1.G.3 shows the result: the new FDs are in BCNF.

H. Real Estate System:
For the relation:
PropertyListings(PropertyID, OwnerID, AgentID, Price, Location, HouseType,
NumberOfRooms, AgentName, CommissionRate)
Functional dependencies are:
PropertyID → Price, Location, HouseType, NumberOfRooms, OwnerID, AgentID
AgentID → AgentName, CommissionRate
HouseType → NumberOfRooms

Fig 1.H.1: Previous functional dependencies

Fig 1.H.2: Modified functional dependencies

Fig 1.H.3: Result

Fig 1.H.1 shows the previous functional dependencies, which are not in BCNF. Fig 1.H.2 shows the new
functional dependencies: each property in the table is uniquely identified by its PropertyID, and
for each PropertyID there is a fixed price, location, house type, owner ID, and agent ID associated
with it.
It also means that each agent assigned to a specific property is uniquely identified by their
AgentID, and for each combination of AgentID and PropertyID there is a fixed agent name and a fixed
commission rate associated with that agent's involvement in that property transaction.
Finally, the number of rooms in a property is uniquely determined by the combination of its
PropertyID and HouseType. Fig 1.H.3 shows the result: the new FDs are in BCNF.

Que 2 Design a BCNF Normalized Database and verify using Griffith Tool.
Ans Database is Flight Reservation System.

Fig 2.1
Fig 2.1 shows the design of the airline reservation system database.
Functional Dependencies are:
Flights Table:
• Flight_ID -> Source_Airport_ID
• Flight_ID -> Destination_Airport_ID
• Flight_ID -> Departure_Time
• Flight_ID -> Arrival_Time
• Flight_ID -> Airplane_Type
Airports Table:
• Airport_Code -> Airport_Name
• Airport_Code -> Airport_City
• Airport_Code -> Airport_Country
Passenger Table:
• Customer_ID -> Name
• Customer_ID -> Email
• Customer_ID -> Phone_No
• Customer_ID -> Address
Bookings Table:
• Booking_ID -> Flight_ID
• Booking_ID -> Passenger_ID
• Booking_ID -> Date_of_Booking
Payments Table:
• Payment_ID -> Booking_ID
• Payment_ID -> Amount_Paid
• Payment_ID -> Payment_Date
Verification Using Griffith Tool

Fig 2.2
Result
Fig 2.2 shows that each table is in BCNF.
Practical-3
Aim:- Create Procedures, Triggers and Cursors
Que 1 Write a stored procedure named UpdateCountryPopulation that updates the population of a given
country based on a provided country code and new population value. Additionally, the procedure
should log the old and new population values to a population_change_log table.
Ans
DELIMITER //
CREATE PROCEDURE UpdateCountryPopulation(IN CountryCode CHAR(3), IN NewPopulation INT)
BEGIN
    DECLARE OldPopulation INT;

    -- Get the old population
    SELECT Population INTO OldPopulation
    FROM country
    WHERE Code = CountryCode;

    -- Update the population
    UPDATE country
    SET Population = NewPopulation
    WHERE Code = CountryCode;

    -- Log the population change (NOW() gives the current timestamp in MySQL)
    INSERT INTO population_change_log (CountryCode, OldPopulation, NewPopulation, ChangeDate)
    VALUES (CountryCode, OldPopulation, NewPopulation, NOW());
END //
DELIMITER ;

CALL UpdateCountryPopulation('USA', 350000000);

Fig 3.1
Fig 3.1 shows the population_change_log table, which holds the old population, the new population,
and the date of change.
Que 2 Develop a trigger named after_country_insert that checks if the inserted country's population
exceeds 1 million. If it does, insert a record into a high_population_countries table.
Ans
DELIMITER //
CREATE TRIGGER after_country_insert
AFTER INSERT ON country
FOR EACH ROW
BEGIN
    -- NEW holds the inserted row, so the population can be read from it directly
    -- Check if population exceeds 1 million
    IF NEW.Population > 1000000 THEN
        -- Insert into high_population_countries table
        INSERT INTO high_population_countries (CountryCode, Population)
        VALUES (NEW.Code, NEW.Population);
    END IF;
END //
DELIMITER ;

INSERT INTO country (Code, Population) VALUES ('ABC', 1500000);

SELECT * FROM high_population_countries;

Fig 3.2
Fig 3.2 shows the high_population_countries table with country code and population.
Que 3 Develop a procedure AdjustCityPopulations using a cursor that decreases the population by 10%
for all cities in a given country code, provided the current population is between 500,000 and
1 million. Additionally, log these changes to a city_population_adjustments table with city ID, old
population, and new population.
Ans
DELIMITER //
CREATE PROCEDURE AdjustCityPopulations(IN p_country_code CHAR(3))
BEGIN
    -- Parameter and variables are prefixed so they cannot shadow column names:
    -- with identical names, WHERE CountryCode = CountryCode is always true
    DECLARE done INT DEFAULT FALSE;
    DECLARE v_city_id INT;
    DECLARE v_old_population INT;
    DECLARE v_new_population INT;

    -- Declare cursor
    DECLARE city_cursor CURSOR FOR
        SELECT CityID, Population
        FROM city
        WHERE CountryCode = p_country_code
          AND Population BETWEEN 500000 AND 1000000;

    -- Declare handler for no more rows
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    -- Open the cursor
    OPEN city_cursor;

    -- Start looping through the cursor
    adjust_loop: LOOP
        -- Fetch the row
        FETCH city_cursor INTO v_city_id, v_old_population;

        -- Check if no more rows
        IF done THEN
            LEAVE adjust_loop;
        END IF;

        -- Calculate new population (decrease by 10%)
        SET v_new_population = ROUND(v_old_population * 0.9, 0);

        -- Update city population
        UPDATE city
        SET Population = v_new_population
        WHERE CityID = v_city_id;

        -- Log population adjustment
        INSERT INTO city_population_adjustment (CityID, OldPopulation, NewPopulation, AdjustmentDate)
        VALUES (v_city_id, v_old_population, v_new_population, NOW());
    END LOOP adjust_loop;

    -- Close the cursor
    CLOSE city_cursor;
END //
DELIMITER ;

CALL AdjustCityPopulations('USA');
SELECT * FROM city_population_adjustment;

Fig 3.3
Fig 3.3 shows the city_population_adjustment table, which records the population figures and the
date of change.

Practical-4
Aim:- Write programs to implement and understand usage of Data marts.

Question 1: Design a data mart for a bank to store the credit history of customers in a bank. Use
this credit profiling to process future loan applications. (Suggestive tables: Customer Profile,
accounts, loans, credit cards, payment history table, inquiries, Collections, Credit Score History).
Ans
CREATE DATABASE bank;
USE bank;

CREATE TABLE customer_profile (customer_id INT PRIMARY KEY, first_name VARCHAR(25),
    last_name VARCHAR(25), d_o_b DATE, address VARCHAR(50), phone_no INT,
    email VARCHAR(25), income INT);

CREATE TABLE accounts (account_id INT PRIMARY KEY, customer_id INT, accounttype VARCHAR(25),
    dateofopen DATE, accountstatus VARCHAR(25), balance INT,
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id));

CREATE TABLE loans (loan_id INT PRIMARY KEY, customer_id INT, loantype VARCHAR(25),
    loanamount INT, term INT, interest_rate DECIMAL(4,2), loanstatus VARCHAR(25),
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id));

CREATE TABLE creditcards (card_id INT PRIMARY KEY, customer_id INT, cardtype VARCHAR(25),
    creditlimit DECIMAL(10,2), cardissuedate DATE, currentbalance DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id));

CREATE TABLE paymenthistory (payment_id INT PRIMARY KEY, customer_id INT, account_id INT,
    paymentamount DECIMAL(10,2), paymentdate DATE,
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id));

CREATE TABLE inquiries (inquiry_id INT PRIMARY KEY, customer_id INT, inquirydate DATE,
    inquirytype VARCHAR(25),
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id));

CREATE TABLE collections (collection_id INT PRIMARY KEY, customer_id INT, collectiondate DATE,
    collectiontype VARCHAR(25), amount INT,
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id));

CREATE TABLE credit_score_history (creditscore_id INT PRIMARY KEY, customer_id INT,
    creditscore INT, scoredate DATE,
    FOREIGN KEY (customer_id) REFERENCES customer_profile(customer_id));

-- DATA MART:

CREATE TABLE customerrisk (customer_id INT PRIMARY KEY, riskcategory VARCHAR(25));

-- Classify each customer by income and total balance across all accounts
INSERT INTO customerrisk (customer_id, riskcategory)
SELECT c.customer_id,
       CASE
           WHEN c.income > 75000 AND SUM(a.balance) > 100000 THEN 'lowrisk'
           WHEN c.income > 50000 AND SUM(a.balance) > 60000 THEN 'moderaterisk'
           ELSE 'highrisk'
       END AS riskcategory
FROM customer_profile c
JOIN accounts a ON c.customer_id = a.customer_id
GROUP BY c.customer_id;

Fig 4.1
Fig 4.1 shows the customers divided into risk categories based on their income and account balance.

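-- Note: this query reads a collectionstatus column, which the collections
-- table definition above does not declare
CREATE TABLE loanassessment AS
SELECT c.customer_id AS customer_id,
       c.collectionstatus AS collectionstatus,
       l.loanstatus AS loanstatus
FROM collections c
JOIN loans l ON c.customer_id = l.customer_id
WHERE c.collectionstatus = 'ontime' AND l.loanstatus = 'paid_off';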

Fig 4.2
Fig 4.2 shows the customers whose loan status is paid off and whose collection status is on time.

CREATE TABLE loanpass AS
SELECT l.customer_id
FROM loanassessment l
JOIN customerrisk c ON l.customer_id = c.customer_id
JOIN credit_score_history ch ON ch.customer_id = c.customer_id
WHERE c.riskcategory = 'lowrisk' AND ch.creditscore > 750;

Fig 4.3
Fig 4.3 shows the customers who are in the low risk category, have a paid-off loan with on-time
collection status, and have a credit score greater than 750.

DELIMITER //
CREATE PROCEDURE LOAN_PASS_RESULT(IN CUSTOMERID INT)
BEGIN
    -- A customer passes if they appear in the loanpass data mart table
    IF EXISTS (
        SELECT 1 FROM loanpass
        WHERE customer_id = CUSTOMERID
    ) THEN
        SELECT CUSTOMERID, 'PASSED' AS LOAN_ELIGIBILITY;
    ELSE
        SELECT CUSTOMERID, 'REJECTED' AS LOAN_ELIGIBILITY;
    END IF;
END //
DELIMITER ;

CALL LOAN_PASS_RESULT(1);

Output 1

Fig 4.4

CALL LOAN_PASS_RESULT(2);

Output 2

Fig 4.5

RESULT: Successfully implemented and learnt the usage of data marts.

PRACTICAL#5
Objective: Feature Selection and Variable Filtering.
Question#:

A) Select a dataset that has a minimum of 150 features.


B) Apply 3 Feature Selection Techniques
C) For each feature selection technique apply 3 machine learning models on it.
D) Compare the results.

TOOL USED: Weka


Feature Selection Technique->Gain Ratio->The gain ratio of an attribute is its information gain
divided by its intrinsic (split) information. The division reduces the bias of plain information gain
towards attributes with many distinct values. Here it is used to rank attributes and keep the best ones.
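
The ranking can be illustrated on a toy dataset. The sketch below computes gain ratio for nominal attributes as information gain divided by the entropy of the attribute itself; the attribute names and values are made up for illustration and are not from the dataset used in this practical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """Information gain of `attribute` about `labels`, divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the attribute
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: four instances, three nominal attributes
labels = ["yes", "yes", "no", "no"]
attributes = {
    "perfect_split": ["a", "a", "b", "b"],   # separates the classes with 2 values
    "row_id":        ["a", "b", "c", "d"],   # also separates them, but with 4 values
    "uninformative": ["a", "b", "a", "b"],   # tells nothing about the class
}

ranking = sorted(attributes, key=lambda name: gain_ratio(attributes[name], labels), reverse=True)
for name in ranking:
    print(name, round(gain_ratio(attributes[name], labels), 3))
```

Both perfect_split and row_id have the same information gain, but row_id is penalised for having one value per instance, which is the effect the gain ratio is designed to produce.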
No. of selected attributes-> 20

Algorithm: Naive Bayes->Probabilistic classification algorithm based on Bayes' theorem with an
assumption of independence between features.

Fig 5.1 Naive Bayes with 20 attributes


Algorithm: Random Tree->Weka's RandomTree builds a single decision tree that considers a randomly
chosen subset of the attributes at each node. Training many such trees, each on a random sample of
the training data, and letting them vote on the final classification or regression output gives a
random forest, which helps reduce overfitting and improve accuracy, especially when dealing with
noisy or high-dimensional data.

Fig 5.2 Random Tree with 20 attributes

Algorithm: AdaBoost->It works by combining multiple weak learners (typically decision trees) to
create a strong learner. It begins by assigning equal weights to all training samples. Then, it
iteratively trains weak learners, increasing the weights of incorrectly classified samples so that
each new learner focuses on them. The predictions of the weak learners are combined through weighted
voting, where more accurate learners have higher weights. This process continues until a
predetermined number of iterations is reached or until perfect predictions are achieved.
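
The reweighting loop described above can be sketched with one-dimensional threshold stumps as the weak learners. This is a generic discrete AdaBoost written for illustration, not Weka's implementation; the data and function names are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` left of the threshold, its opposite right of it."""
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with equal weights
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # more accurate => larger vote
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalise
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single threshold can separate
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
print([predict(one_round, x) for x in xs])
print([predict(three_rounds, x) for x in xs])
```

A single stump misclassifies part of this data, while the weighted vote of three stumps fits all six training points.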

Fig 5.3 AdaBoost with 20 attributes

Feature Selection Technique->Gain Ratio

No. of selected attributes-> 40


Algorithm: Naive Bayes

Fig 5.4 Naive Bayes with 40 attributes

Algorithm: Random Tree

Fig 5.5 Random Tree with 40 attributes

Algorithm: AdaBoost

Fig 5.6 AdaBoost with 40 attributes

Feature Selection Technique->Gain Ratio

No. of selected attributes-> 50
Algorithm: Naive Bayes

Fig 5.7 Naive Bayes with 50 attributes

Algorithm: Random Tree

Fig 5.8 Random Tree with 50 attributes

Algorithm:AdaBoost

Fig 5.9 AdaBoost with 50 attributes

TOOL USED:- ORANGE

Orange is an open-source data visualization, analysis, and machine learning toolkit. It provides a
user-friendly interface for data preprocessing, exploration, visualization, and predictive modeling.
Orange offers a wide range of machine learning algorithms for classification, regression, clustering,
and other tasks. Users can easily compare and evaluate different algorithms using built-in evaluation
widgets.

KNN->K-Nearest Neighbors (KNN) is a simple yet effective supervised machine learning algorithm
used for both classification and regression tasks. It is based on the idea that similar data points
tend to belong to the same class or have similar values. When making a prediction for a new data
point, KNN calculates the distance between that point and every point in the training dataset, then
predicts from the k closest points. Common distance metrics include Euclidean distance, Manhattan
distance, and cosine similarity.
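
The prediction step can be sketched in a few lines. This is a plain-Python illustration of KNN classification by majority vote, not Orange's widget; the training points and labels are made up.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 4.8), distance=manhattan))
```

The distance function is passed in as a parameter, so switching between the metrics mentioned above does not change the voting logic.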

Fig 5.10

Feature Selection Technique->Gain Ratio

No. of selected attributes-> 20

No. of selected attributes-> 40

No. of selected attributes-> 50

PRACTICAL#6

Aim :- Perform Associative Mining In Weka and Orange on large datasets

Theory :- To perform association mining on large datasets, algorithms such as Apriori or FP-
growth are employed. These algorithms efficiently extract frequent itemsets by iteratively
identifying patterns within transactional data. With the support of these algorithms, associations
between items can be discovered, aiding in tasks such as market basket analysis or
recommendation systems. Efficient implementation and optimization are crucial for handling the
computational complexity posed by large datasets, ensuring scalability and practical applicability
in real-world scenarios.
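
To make the roles of support and confidence concrete, the following is a small sketch of Apriori on a handful of made-up baskets. It is not Weka's or Orange's implementation, and the transactions are hypothetical, chosen only to resemble a market basket dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    current = [frozenset([item]) for item in sorted({i for b in baskets for i in b})]
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that have
        # an infrequent (k-1)-subset (downward closure property)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]  # support(lhs and rhs) / support(lhs)
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


# Hypothetical baskets
transactions = [
    {"bread", "cake", "biscuits"},
    {"bread", "cake", "fruit"},
    {"bread", "cake"},
    {"bread", "milk"},
    {"cake", "fruit"},
]

frequent = apriori(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.7)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), "support", sup, "confidence", conf)

# Very strict thresholds can leave no frequent itemsets and hence no rules
print(apriori(transactions, min_support=0.9))
```

Raising the minimum support shrinks the set of frequent itemsets, and raising the minimum confidence filters the rules built from them, which is the behaviour explored in the scenarios below.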

PROCEDURE:

USING WEKA TOOL:

Scenario#1:WITH VALUE OF SUPPORT = 0.3 AND CONFIDENCE = 0.5

Fig 6.1 The rules found based on Support = 0.3 and Confidence = 0.5

In Figure 6.1, the Apriori method was run with a minimum support of 0.3 and a minimum confidence
of 0.5; it completed 11 cycles and reported 2082 instances. It generated the large itemsets L(1),
containing 13 itemsets, and L(2), containing 7. Notable rules include biscuits leading to cake and
bread, and fruit leading to cake and bread.

Scenario#2:WITH VALUE OF SUPPORT = 0.5 AND CONFIDENCE = 0.7

Fig 6.2 The rules found based on Support = 0.5 and Confidence = 0.7

In Figure 6.2, the Apriori method was run with a minimum support of 0.5 and a minimum confidence
of 0.7; it completed 10 cycles and reported 2314 instances. It generated the large itemsets L(1),
containing 10 itemsets, and L(2), containing 2.
Some of the associations found are milk-cream leading to bread and cake, and fruit leading to bread
and cake.

Scenario#3:WITH VALUE OF SUPPORT = 0.7 AND CONFIDENCE = 0.9

Fig 6.3 The rules found based on Support = 0.7 and Confidence = 0.9

In Figure 6.3, the Apriori method found no rules with a minimum support of 0.7 and a minimum
confidence of 0.9. The likely explanation is that these thresholds are strict enough that too few
instances satisfy them to form any association. The lack of rules implies that the dataset may not
contain frequent itemsets that meet the chosen support and confidence requirements.