
Excel to CSV Conversion for Data Mining

The document demonstrates how to convert an Excel file to CSV format and then analyze transactional data using frequent pattern mining and sequential pattern mining algorithms. It loads two datasets, applies preprocessing steps, then uses the Apriori algorithm to find frequent patterns in Dataset 1 and PrefixSpan algorithm to find sequential patterns in Dataset 2. The results are analyzed to compare the algorithms and understand the effect of minimum support.

[Link]?usp=sharing

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

import openpyxl

def to_csv(filename):
  ## opening the xlsx file
  xlsx = openpyxl.load_workbook('/content/drive/My Drive/Colab Notebooks/' + filename)

  ## opening the active sheet
  sheet = xlsx.active

  ## getting the rows of cells from the sheet
  data = sheet.rows

  ## creating a csv file with the same name
  csv_path = '/content/drive/My Drive/Colab Notebooks/' + filename.replace('.xlsx', '.csv')
  csv = open(csv_path, "w+")

  for row in data:
      l = list(row)
      for i in range(len(l)):
          if i == len(l) - 1:
              csv.write(str(l[i].value))
          else:
              csv.write(str(l[i].value) + ',')
      csv.write('\n')

  ## close the csv file
  csv.close()

  return csv_path
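A caveat with the manual `write` loop above: joining cell values with a bare `','` breaks as soon as a cell value itself contains a comma. A sketch of the same row-writing step using the standard `csv` module, which quotes such values automatically (the function name and sample values are hypothetical, not from the notebook):

```python
import csv
import io

def rows_to_csv_text(rows):
    # route each row through csv.writer so values containing commas get quoted
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([str(v) for v in row])
    return buf.getvalue()

text = rows_to_csv_text([["citrus fruit", "ready soups, canned"], ["whole milk", None]])
```

In a real conversion the rows would come from `sheet.rows` (taking each cell's `.value`) and the text would be written to the target `.csv` path.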

import pandas as pd

df1 = pd.read_csv(to_csv('[Link]'))
df2 = pd.read_csv(to_csv('[Link]'))

df1 = df1.drop(['User'], axis=1).to_numpy()
df2 = df2.drop(['User'], axis=1)

print(df1[:5])
print(df2.head())

[['citrus fruit' 'semi-finished bread' 'margarine' 'ready soups' 'None'
  'None' 'None' 'None' 'None' 'None' 'None' 'None' 'None']
 ['tropical fruit' 'yogurt' 'coffee' 'None' 'None' 'None' 'None' 'None'
  'None' 'None' 'None' 'None' 'None']
 ['whole milk' 'None' 'None' 'None' 'None' 'None' 'None' 'None' 'None'
  'None' 'None' 'None' 'None']
 ['pip fruit' 'yogurt' 'cream cheese ' 'meat spreads' 'None' 'None'
  'None' 'None' 'None' 'None' 'None' 'None' 'None']
 ['other vegetables' 'whole milk' 'condensed milk'
  'long life bakery product' 'None' 'None' 'None' 'None' 'None' 'None'
  'None' 'None' 'None']]

            Trans 1              Trans 2  ...  Trans 12  Trans 13
0      citrus fruit  semi-finished bread  ...      None      None
1    tropical fruit               yogurt  ...      None      None
2        whole milk                 None  ...      None      None
3         pip fruit               yogurt  ...      None      None
4  other vegetables           whole milk  ...      None      None

[5 rows x 13 columns]
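Note that the padding cells in the sample above come back from the CSV round-trip as the literal string 'None', which a miner will treat as an ordinary item. A small preprocessing sketch that strips this padding (the toy rows below mirror the printed sample and are hypothetical):

```python
# Toy rows mirroring the df1 sample above; real data would come from df1.
rows = [
    ['citrus fruit', 'semi-finished bread', 'None'],
    ['tropical fruit', 'yogurt', 'None'],
    ['whole milk', 'None', 'None'],
]

# Keep only the real items in each transaction.
transactions = [[item for item in row if item != 'None'] for row in rows]
```

Without this step, 'None' ends up as the most frequent "item" in the dataset, which shows up later in the Apriori output.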

1. Determine which dataset is suited to FP mining and which to SP mining

Dataset 1 is suited to Frequent Pattern Mining because each row represents the items bought in a single transaction.
Dataset 2 is suited to Sequential Pattern Mining because each row represents the order in which items were bought across transactions.

2. Summarize the characteristics of data suited to FP mining and to SP mining

Frequent pattern mining suits data consisting of sets of items involved in a single occurrence/event.
Sequential pattern mining suits data consisting of ordered, consecutive sequences of events.
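The two data shapes described above can be sketched as Python values (toy, hypothetical items):

```python
# Frequent pattern mining: one row = one basket, an unordered set of items.
fp_row = {'citrus fruit', 'margarine', 'ready soups'}

# Sequential pattern mining: one row = one customer's ordered history of baskets.
sp_row = [{'yogurt'}, {'yogurt', 'coffee'}, {'whole milk'}]
```

FP mining asks which items co-occur within a basket; SP mining additionally asks in what order baskets (events) follow each other.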

!pip install apyori

from apyori import apriori

results = list(apriori(df1, min_support=0.075))

for item in results:
    pair = item[0]  # the frequent itemset
    if len(pair) >= 2:
        items = [x for x in pair]
        # note: the 'None' padding strings count as items here, which is why
        # "None -> x" rules dominate the output below
        print("Rule: " + items[0] + " -> " + items[1])
        print("Support: " + str(item[1]))
        print("Confidence: " + str(item[2][0][2]))
        print("Lift: " + str(item[2][0][3]))
        print("=====================================")

Support: 0.10137264870360956
Confidence: 0.10137264870360956
Lift: 1.0
=====================================
Rule: None -> canned beer
Support: 0.0753431621759024
Confidence: 0.0753431621759024
Lift: 1.0
=====================================
Rule: None -> other vegetables
Support: 0.1708185053380783
Confidence: 0.1708185053380783
Lift: 1.0
=====================================
Rule: None -> pastry
Support: 0.07839349262836807
Confidence: 0.07839349262836807
Lift: 1.0
=====================================
Rule: None -> rolls/buns
Support: 0.1721403152008134
Confidence: 0.1721403152008134
Lift: 1.0
=====================================
Rule: None -> root vegetables
Support: 0.09364514489069649
Confidence: 0.09364514489069649
Lift: 1.0
=====================================
Rule: None -> sausage
Support: 0.08378240976105744
Confidence: 0.08378240976105744
Lift: 1.0
=====================================
Rule: None -> shopping bags
Support: 0.09100152516522624
Confidence: 0.09100152516522624
Lift: 1.0
=====================================
Rule: None -> soda
Support: 0.1610574478901881
Confidence: 0.1610574478901881
Lift: 1.0
=====================================
Rule: None -> tropical fruit
Support: 0.09049313675648195
Confidence: 0.09049313675648195
Lift: 1.0
=====================================
Rule: None -> whole milk
Support: 0.23202846975088967
Confidence: 0.23202846975088967
Lift: 1.0
=====================================
Rule: None -> yogurt
Support: 0.12191154041687849
Confidence: 0.12191154041687849
Lift: 1.0
=====================================
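To make the support, confidence, and lift numbers above concrete, here is a hand computation of the three metrics for a single candidate rule on toy transactions (all item names and values below are hypothetical, not from the datasets):

```python
# Toy transactions for illustration only.
transactions = [
    {'whole milk', 'yogurt'},
    {'whole milk', 'yogurt', 'coffee'},
    {'yogurt'},
    {'whole milk'},
]
n = len(transactions)

# support(X) = fraction of transactions containing X
sup_milk   = sum('whole milk' in t for t in transactions) / n              # 0.75
sup_yogurt = sum('yogurt' in t for t in transactions) / n                  # 0.75
sup_both   = sum({'whole milk', 'yogurt'} <= t for t in transactions) / n  # 0.5

# confidence(milk -> yogurt) = support(both) / support(milk)
confidence = sup_both / sup_milk

# lift = confidence / support(yogurt); lift == 1 means independence
lift = confidence / sup_yogurt
```

A lift of exactly 1.0, as in every rule printed above, means the antecedent gives no information about the consequent; here that is an artifact of 'None' appearing in (nearly) every padded row.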

!pip install spmf
from spmf import Spmf

# spmf = Spmf("GSP",
#             input_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
#             spmf_bin_location_dir="/content/drive/My Drive/Colab Notebooks/",
#             output_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
#             arguments=[0.7, 100])

spmf = Spmf("PrefixSpan",
            input_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
            spmf_bin_location_dir="/content/drive/My Drive/Colab Notebooks/",
            output_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
            arguments=[0.7, 100])
spmf.run()
print(spmf.to_pandas_dataframe(pickle=True))
spmf.to_csv("/content/drive/My Drive/Colab Notebooks/[Link]")

Requirement already satisfied: spmf in /usr/local/lib/python3.7/dist-packages (1.4)

>/content/drive/My Drive/Colab Notebooks/[Link]

============= PREFIXSPAN 0.99-2016 - STATISTICS =============
 Total time ~ 8 ms
 Frequent sequences count : 0
 Max memory (mb) : 0.0
 minsup = 1 sequences.
 Pattern count : 0
===================================================

Empty DataFrame
Columns: []
Index: []
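A likely reason PrefixSpan reports 0 frequent sequences here is the input file: SPMF expects sequences encoded as space-separated integer item ids, with `-1` closing each itemset and `-2` ending each sequence, not a CSV of item names. A converter sketch under that assumption (the function name and id mapping are hypothetical):

```python
def to_spmf_lines(sequences, item_ids):
    # sequences: list of sequences; each sequence is a list of itemsets (lists of item names)
    # item_ids: mapping from item name to integer id
    lines = []
    for seq in sequences:
        parts = []
        for itemset in seq:
            parts.extend(str(item_ids[item]) for item in itemset)
            parts.append('-1')  # end of itemset
        parts.append('-2')      # end of sequence
        lines.append(' '.join(parts))
    return lines

ids = {'yogurt': 1, 'coffee': 2}
lines = to_spmf_lines([[['yogurt'], ['yogurt', 'coffee']]], ids)
```

Each line of the resulting file is one customer's sequence; writing these lines to a `.txt` file and passing that to `Spmf(...)` should give non-empty results if the support threshold is reachable.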

3. Analyze the differences between the SP algorithms used

Published results indicate that the GSP algorithm performs better at a high minimum support, while the PrefixSpan algorithm performs better at a low minimum support.

4. Analyze the effect of minimum support on the mining results

For the GSP algorithm:

The smaller the minimum support, the more information/conclusions are successfully mined, but the information is more general and less specific.
The larger the minimum support, the less information/conclusions are successfully mined, but the information given is more specific.

For the PrefixSpan algorithm:

The opposite holds.
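The pattern-count side of this trade-off can be shown on toy data: the same counts survive or drop out depending on the threshold. A minimal sketch for single items (all names and values below are hypothetical):

```python
from collections import Counter

# Toy transactions for illustration only.
transactions = [
    ['whole milk', 'yogurt'],
    ['whole milk'],
    ['yogurt', 'soda'],
    ['whole milk', 'yogurt'],
]
n = len(transactions)
counts = Counter(item for t in transactions for item in t)

def frequent_items(min_support):
    # items whose relative frequency meets the threshold
    return {item for item, c in counts.items() if c / n >= min_support}

low = frequent_items(0.25)   # more patterns survive: broader, more general output
high = frequent_items(0.75)  # fewer patterns survive: narrower, more specific output
```

Here lowering the threshold from 0.75 to 0.25 admits the rare item 'soda'; the same effect, compounded over itemsets and sequences, drives the behavior described above.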