from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
import openpyxl
def to_csv(filename):
    ## open the xlsx file
    xlsx = openpyxl.load_workbook('/content/drive/My Drive/Colab Notebooks/' + filename)
    ## open the active sheet
    sheet = xlsx.active
    ## get the rows from the sheet
    data = sheet.rows
    ## create a csv file
    csv = open('/content/drive/My Drive/Colab Notebooks/' + filename.replace('.xlsx', '.csv'), "w+")
    for row in data:
        l = list(row)
        for i in range(len(l)):
            if i == len(l) - 1:
                csv.write(str(l[i].value))
            else:
                csv.write(str(l[i].value) + ',')
        csv.write('\n')
    ## close the csv file
    csv.close()
    return '/content/drive/My Drive/Colab Notebooks/' + filename.replace('.xlsx', '.csv')
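One caveat with the manual string join above: any cell value that itself contains a comma would corrupt the resulting CSV. The standard-library csv module handles the quoting automatically; a minimal sketch, using made-up rows rather than the assignment's data:

```python
import csv
import io

# Toy rows standing in for sheet values; note the embedded comma.
rows = [["whole milk", "rolls, buns"], ["yogurt", None]]

buf = io.StringIO()
writer = csv.writer(buf)
for row in rows:
    # csv.writer quotes any field containing a comma, e.g. "rolls, buns"
    writer.writerow(str(v) for v in row)

print(buf.getvalue())
```

In to_csv, the same idea would mean opening the output file with open(..., "w", newline="") and passing each row of stringified cell values to writer.writerow instead of concatenating by hand.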
import pandas as pd
df1 = pd.read_csv(to_csv('[Link]'))
df2 = pd.read_csv(to_csv('[Link]'))
df1 = df1.drop(['User'], axis=1).to_numpy()
df2 = df2.drop(['User'], axis=1)
print(df1[:5])
print(df2.head())
[['citrus fruit' 'semi-finished bread' 'margarine' 'ready soups' 'None'
'None' 'None' 'None' 'None' 'None' 'None' 'None' 'None']
['tropical fruit' 'yogurt' 'coffee' 'None' 'None' 'None' 'None' 'None'
'None' 'None' 'None' 'None' 'None']
['whole milk' 'None' 'None' 'None' 'None' 'None' 'None' 'None' 'None'
'None' 'None' 'None' 'None']
['pip fruit' 'yogurt' 'cream cheese ' 'meat spreads' 'None' 'None'
'None' 'None' 'None' 'None' 'None' 'None' 'None']
['other vegetables' 'whole milk' 'condensed milk'
'long life bakery product' 'None' 'None' 'None' 'None' 'None' 'None'
'None' 'None' 'None']]
Trans 1 Trans 2 ... Trans 12 Trans 13
0 citrus fruit semi-finished bread ... None None
1 tropical fruit yogurt ... None None
2 whole milk None ... None None
3 pip fruit yogurt ... None None
4 other vegetables whole milk ... None None
[5 rows x 13 columns]
1. Determine which dataset suits frequent pattern (FP) mining and which suits sequential pattern (SP) mining
Dataset 1 suits Frequent Pattern Mining because each row represents the items bought in a single transaction.
Dataset 2 suits Sequential Pattern Mining because each row represents the order in which items were bought across transactions.
2. Summarize what data characteristics suit FP mining and SP mining
Frequent Pattern mining suits data consisting of a collection of items involved in a single occurrence/event.
Sequential Pattern mining suits data consisting of an ordered, continuous sequence of occurrences/events.
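The two data shapes can be sketched side by side; the records below are toy examples, not rows from the actual datasets:

```python
# Frequent-pattern input: each record is one unordered basket (one event),
# so a set is a natural representation.
fp_data = [
    {"citrus fruit", "margarine"},
    {"yogurt", "coffee"},
]

# Sequential-pattern input: each record is an ordered series of baskets
# (successive transactions by one user), so order between baskets matters.
sp_data = [
    [{"citrus fruit"}, {"yogurt"}, {"whole milk"}],
]

print(type(fp_data[0]).__name__)  # order inside a basket is irrelevant
print(len(sp_data[0]))            # number of transactions in one sequence
```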
!pip install apyori
from apyori import apriori
results = list(apriori(df1, min_support=0.075))
for item in results:
    pair = item[0]
    if len(pair) >= 2:
        items = [x for x in pair]
        print("Rule: " + items[0] + " -> " + items[1])
        print("Support: " + str(item[1]))
        print("Confidence: " + str(item[2][0][2]))
        print("Lift: " + str(item[2][0][3]))
        print("=====================================")
Support: 0.10137264870360956
Confidence: 0.10137264870360956
Lift: 1.0
=====================================
Rule: None -> canned beer
Support: 0.0753431621759024
Confidence: 0.0753431621759024
Lift: 1.0
=====================================
Rule: None -> other vegetables
Support: 0.1708185053380783
Confidence: 0.1708185053380783
Lift: 1.0
=====================================
Rule: None -> pastry
Support: 0.07839349262836807
Confidence: 0.07839349262836807
Lift: 1.0
=====================================
Rule: None -> rolls/buns
Support: 0.1721403152008134
Confidence: 0.1721403152008134
Lift: 1.0
=====================================
Rule: None -> root vegetables
Support: 0.09364514489069649
Confidence: 0.09364514489069649
Lift: 1.0
=====================================
Rule: None -> sausage
Support: 0.08378240976105744
Confidence: 0.08378240976105744
Lift: 1.0
=====================================
Rule: None -> shopping bags
Support: 0.09100152516522624
Confidence: 0.09100152516522624
Lift: 1.0
=====================================
Rule: None -> soda
Support: 0.1610574478901881
Confidence: 0.1610574478901881
Lift: 1.0
Rule: None -> tropical fruit
Support: 0.09049313675648195
Confidence: 0.09049313675648195
Lift: 1.0
=====================================
Rule: None -> whole milk
Support: 0.23202846975088967
Confidence: 0.23202846975088967
Lift: 1.0
=====================================
Rule: None -> yogurt
Support: 0.12191154041687849
Confidence: 0.12191154041687849
Lift: 1.0
=====================================
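Nearly every rule above involves "None", because the padded empty columns were written out as the literal string 'None' during the CSV conversion and then counted as an item. One option (a sketch with toy rows, not the assignment's code) is to strip that padding per row before calling apriori:

```python
# Toy rows mimicking the padded data: short baskets are filled with 'None'.
raw = [
    ["citrus fruit", "margarine", "None", "None"],
    ["whole milk", "None", "None", "None"],
]

# Drop the 'None' padding from each row so the placeholder never
# appears inside a mined itemset or rule.
transactions = [[item for item in row if item != "None"] for row in raw]
print(transactions)
```

With the cleaned list of lists passed to apriori instead of the raw array, the resulting rules would only relate real items.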
!pip install spmf
from spmf import Spmf
# spmf = Spmf("GSP",
#             input_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
#             spmf_bin_location_dir="/content/drive/My Drive/Colab Notebooks/",
#             output_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
#             arguments=[0.7, 100])
spmf = Spmf("PrefixSpan",
            input_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
            spmf_bin_location_dir="/content/drive/My Drive/Colab Notebooks/",
            output_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
            arguments=[0.7, 100])
spmf.run()
print(spmf.to_pandas_dataframe(pickle=True))
spmf.to_csv("/content/drive/My Drive/Colab Notebooks/[Link]")
Requirement already satisfied: spmf in /usr/local/lib/python3.7/dist-packages (1.4)
>/content/drive/My Drive/Colab Notebooks/[Link]
============= PREFIXSPAN 0.99-2016 - STATISTICS =============
Total time ~ 8 ms
Frequent sequences count : 0
Max memory (mb) : 0.0
minsup = 1 sequences.
Pattern count : 0
===================================================
Empty DataFrame
Columns: []
Index: []
3. Analyze the differences between the SP algorithms used
Published results indicate that the GSP algorithm performs better at a high minimum support, while the PrefixSpan algorithm performs better at a low minimum support.
4. Analyze the effect of minimum support on the mining results
For the GSP algorithm:
The smaller the minimum support, the more information/conclusions are successfully mined, but the information is more general and less specific.
The larger the minimum support, the less information/conclusions are successfully mined, but the information is more specific.
For the PrefixSpan algorithm, the opposite holds.
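The effect of the threshold can be shown with a toy frequent-item count (a sketch on invented baskets, not the assignment's data): lowering min_support admits more items, raising it keeps only the most common ones.

```python
from collections import Counter

# Four toy baskets; min_support is a fraction of the basket count.
transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["milk", "bread", "eggs"],
    ["bread"],
]
counts = Counter(item for t in transactions for item in t)
n = len(transactions)

results = {}
for min_support in (0.25, 0.75):
    # Keep items whose relative frequency meets the threshold.
    frequent = sorted(i for i, c in counts.items() if c / n >= min_support)
    results[min_support] = frequent
    print(min_support, frequent)
```

At 0.25 all three items survive; at 0.75 only milk and bread (3 of 4 baskets each) remain, illustrating the fewer-but-stronger trade-off described above.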