
Excel to CSV Conversion for Data Mining

The document demonstrates how to convert an Excel file to CSV format and then analyze transactional data using frequent pattern mining and sequential pattern mining algorithms. It loads two datasets, applies preprocessing steps, then uses the Apriori algorithm to find frequent patterns in Dataset 1 and PrefixSpan algorithm to find sequential patterns in Dataset 2. The results are analyzed to compare the algorithms and understand the effect of minimum support.

[Link]?usp=sharing

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

import openpyxl

def to_csv(filename):
  ## opening the xlsx file
  xlsx = openpyxl.load_workbook('/content/drive/My Drive/Colab Notebooks/' + filename)

  ## opening the active sheet
  sheet = xlsx.active

  ## getting the rows of cells from the sheet
  data = sheet.rows

  ## creating a csv file with the same name
  csv_path = '/content/drive/My Drive/Colab Notebooks/' + filename.replace('.xlsx', '.csv')
  csv = open(csv_path, "w+")

  for row in data:
      l = list(row)
      for i in range(len(l)):
          if i == len(l) - 1:
              csv.write(str(l[i].value))
          else:
              csv.write(str(l[i].value) + ',')
      csv.write('\n')

  ## close the csv file
  csv.close()

  return csv_path
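A caveat with the manual `write` loop above: joining cell values with a bare `','` breaks as soon as a cell value itself contains a comma. A sketch of the same row-writing step using the standard `csv` module, which quotes such values automatically (the function name and sample values are hypothetical, not from the notebook):

```python
import csv
import io

def rows_to_csv_text(rows):
    # route each row through csv.writer so values containing commas get quoted
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([str(v) for v in row])
    return buf.getvalue()

text = rows_to_csv_text([["citrus fruit", "ready soups, canned"], ["whole milk", None]])
```

In a real conversion the rows would come from `sheet.rows` (taking each cell's `.value`) and the text would be written to the target `.csv` path.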

import pandas as pd

df1 = pd.read_csv(to_csv('[Link]'))
df2 = pd.read_csv(to_csv('[Link]'))

df1 = df1.drop(['User'], axis=1).to_numpy()
df2 = df2.drop(['User'], axis=1)

print(df1[:5])
print(df2.head())

[['citrus fruit' 'semi-finished bread' 'margarine' 'ready soups' 'None'
  'None' 'None' 'None' 'None' 'None' 'None' 'None' 'None']
 ['tropical fruit' 'yogurt' 'coffee' 'None' 'None' 'None' 'None' 'None'
  'None' 'None' 'None' 'None' 'None']
 ['whole milk' 'None' 'None' 'None' 'None' 'None' 'None' 'None' 'None'
  'None' 'None' 'None' 'None']
 ['pip fruit' 'yogurt' 'cream cheese ' 'meat spreads' 'None' 'None'
  'None' 'None' 'None' 'None' 'None' 'None' 'None']
 ['other vegetables' 'whole milk' 'condensed milk'
  'long life bakery product' 'None' 'None' 'None' 'None' 'None' 'None'
  'None' 'None' 'None']]

            Trans 1              Trans 2  ...  Trans 12  Trans 13
0      citrus fruit  semi-finished bread  ...      None      None
1    tropical fruit               yogurt  ...      None      None
2        whole milk                 None  ...      None      None
3         pip fruit               yogurt  ...      None      None
4  other vegetables           whole milk  ...      None      None

[5 rows x 13 columns]
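Note that the padding cells in the sample above come back from the CSV round-trip as the literal string 'None', which a miner will treat as an ordinary item. A small preprocessing sketch that strips this padding (the toy rows below mirror the printed sample and are hypothetical):

```python
# Toy rows mirroring the df1 sample above; real data would come from df1.
rows = [
    ['citrus fruit', 'semi-finished bread', 'None'],
    ['tropical fruit', 'yogurt', 'None'],
    ['whole milk', 'None', 'None'],
]

# Keep only the real items in each transaction.
transactions = [[item for item in row if item != 'None'] for row in rows]
```

Without this step, 'None' ends up as the most frequent "item" in the dataset, which shows up later in the Apriori output.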

1. Determine which dataset is suited to FP mining and which to SP mining

Dataset 1 is suited to Frequent Pattern Mining because each row represents the items bought in a single transaction.
Dataset 2 is suited to Sequential Pattern Mining because each row represents the order in which items were bought across transactions.

2. Summarize the characteristics of data suited to FP mining and to SP mining

Frequent pattern mining suits data consisting of sets of items involved in a single occurrence/event.
Sequential pattern mining suits data consisting of ordered, consecutive sequences of events.
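The two data shapes described above can be sketched as Python values (toy, hypothetical items):

```python
# Frequent pattern mining: one row = one basket, an unordered set of items.
fp_row = {'citrus fruit', 'margarine', 'ready soups'}

# Sequential pattern mining: one row = one customer's ordered history of baskets.
sp_row = [{'yogurt'}, {'yogurt', 'coffee'}, {'whole milk'}]
```

FP mining asks which items co-occur within a basket; SP mining additionally asks in what order baskets (events) follow each other.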

!pip install apyori

from apyori import apriori

results = list(apriori(df1, min_support=0.075))

for item in results:
    pair = item[0]  # the frequent itemset
    if len(pair) >= 2:
        items = [x for x in pair]
        # note: the 'None' padding strings count as items here, which is why
        # "None -> x" rules dominate the output below
        print("Rule: " + items[0] + " -> " + items[1])
        print("Support: " + str(item[1]))
        print("Confidence: " + str(item[2][0][2]))
        print("Lift: " + str(item[2][0][3]))
        print("=====================================")

Support: 0.10137264870360956
Confidence: 0.10137264870360956
Lift: 1.0
=====================================
Rule: None -> canned beer
Support: 0.0753431621759024
Confidence: 0.0753431621759024
Lift: 1.0
=====================================
Rule: None -> other vegetables
Support: 0.1708185053380783
Confidence: 0.1708185053380783
Lift: 1.0
=====================================
Rule: None -> pastry
Support: 0.07839349262836807
Confidence: 0.07839349262836807
Lift: 1.0
=====================================
Rule: None -> rolls/buns
Support: 0.1721403152008134
Confidence: 0.1721403152008134
Lift: 1.0
=====================================
Rule: None -> root vegetables
Support: 0.09364514489069649
Confidence: 0.09364514489069649
Lift: 1.0
=====================================
Rule: None -> sausage
Support: 0.08378240976105744
Confidence: 0.08378240976105744
Lift: 1.0
=====================================
Rule: None -> shopping bags
Support: 0.09100152516522624
Confidence: 0.09100152516522624
Lift: 1.0
=====================================
Rule: None -> soda
Support: 0.1610574478901881
Confidence: 0.1610574478901881
Lift: 1.0
=====================================
Rule: None -> tropical fruit
Support: 0.09049313675648195
Confidence: 0.09049313675648195
Lift: 1.0
=====================================
Rule: None -> whole milk
Support: 0.23202846975088967
Confidence: 0.23202846975088967
Lift: 1.0
=====================================
Rule: None -> yogurt
Support: 0.12191154041687849
Confidence: 0.12191154041687849
Lift: 1.0
=====================================
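To make the support, confidence, and lift numbers above concrete, here is a hand computation of the three metrics for a single candidate rule on toy transactions (all item names and values below are hypothetical, not from the datasets):

```python
# Toy transactions for illustration only.
transactions = [
    {'whole milk', 'yogurt'},
    {'whole milk', 'yogurt', 'coffee'},
    {'yogurt'},
    {'whole milk'},
]
n = len(transactions)

# support(X) = fraction of transactions containing X
sup_milk   = sum('whole milk' in t for t in transactions) / n              # 0.75
sup_yogurt = sum('yogurt' in t for t in transactions) / n                  # 0.75
sup_both   = sum({'whole milk', 'yogurt'} <= t for t in transactions) / n  # 0.5

# confidence(milk -> yogurt) = support(both) / support(milk)
confidence = sup_both / sup_milk

# lift = confidence / support(yogurt); lift == 1 means independence
lift = confidence / sup_yogurt
```

A lift of exactly 1.0, as in every rule printed above, means the antecedent gives no information about the consequent; here that is an artifact of 'None' appearing in (nearly) every padded row.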

!pip install spmf
from spmf import Spmf

# spmf = Spmf("GSP",
#             input_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
#             spmf_bin_location_dir="/content/drive/My Drive/Colab Notebooks/",
#             output_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
#             arguments=[0.7, 100])

spmf = Spmf("PrefixSpan",
            input_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
            spmf_bin_location_dir="/content/drive/My Drive/Colab Notebooks/",
            output_filename="/content/drive/My Drive/Colab Notebooks/[Link]",
            arguments=[0.7, 100])
spmf.run()
print(spmf.to_pandas_dataframe(pickle=True))
spmf.to_csv("/content/drive/My Drive/Colab Notebooks/[Link]")

Requirement already satisfied: spmf in /usr/local/lib/python3.7/dist-packages (1.4)

>/content/drive/My Drive/Colab Notebooks/[Link]

============= PREFIXSPAN 0.99-2016 - STATISTICS =============
 Total time ~ 8 ms
 Frequent sequences count : 0
 Max memory (mb) : 0.0
 minsup = 1 sequences.
 Pattern count : 0
===================================================

Empty DataFrame
Columns: []
Index: []
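A likely reason PrefixSpan reports 0 frequent sequences here is the input file: SPMF expects sequences encoded as space-separated integer item ids, with `-1` closing each itemset and `-2` ending each sequence, not a CSV of item names. A converter sketch under that assumption (the function name and id mapping are hypothetical):

```python
def to_spmf_lines(sequences, item_ids):
    # sequences: list of sequences; each sequence is a list of itemsets (lists of item names)
    # item_ids: mapping from item name to integer id
    lines = []
    for seq in sequences:
        parts = []
        for itemset in seq:
            parts.extend(str(item_ids[item]) for item in itemset)
            parts.append('-1')  # end of itemset
        parts.append('-2')      # end of sequence
        lines.append(' '.join(parts))
    return lines

ids = {'yogurt': 1, 'coffee': 2}
lines = to_spmf_lines([[['yogurt'], ['yogurt', 'coffee']]], ids)
```

Each line of the resulting file is one customer's sequence; writing these lines to a `.txt` file and passing that to `Spmf(...)` should give non-empty results if the support threshold is reachable.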

3. Analyze the differences between the SP algorithms used

Published results indicate that the GSP algorithm performs better at a high minimum support, while the PrefixSpan algorithm performs better at a low minimum support.

4. Analyze the effect of minimum support on the mining results

For the GSP algorithm:

The smaller the minimum support, the more information/conclusions are successfully mined, but the information is more general and less specific.
The larger the minimum support, the less information/conclusions are successfully mined, but the information given is more specific.

For the PrefixSpan algorithm:

The opposite holds.
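The pattern-count side of this trade-off can be shown on toy data: the same counts survive or drop out depending on the threshold. A minimal sketch for single items (all names and values below are hypothetical):

```python
from collections import Counter

# Toy transactions for illustration only.
transactions = [
    ['whole milk', 'yogurt'],
    ['whole milk'],
    ['yogurt', 'soda'],
    ['whole milk', 'yogurt'],
]
n = len(transactions)
counts = Counter(item for t in transactions for item in t)

def frequent_items(min_support):
    # items whose relative frequency meets the threshold
    return {item for item, c in counts.items() if c / n >= min_support}

low = frequent_items(0.25)   # more patterns survive: broader, more general output
high = frequent_items(0.75)  # fewer patterns survive: narrower, more specific output
```

Here lowering the threshold from 0.75 to 0.25 admits the rare item 'soda'; the same effect, compounded over itemsets and sequences, drives the behavior described above.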