Lazy Classification

The document outlines a data processing and machine learning workflow using Python libraries, focusing on customer satisfaction in the airline industry. It includes data preprocessing, model training with various algorithms, and evaluation metrics such as accuracy and F1 score. Additionally, it demonstrates the implementation of a lazy pipeline for model predictions and performance visualization.

Uploaded by

vxq5yrjf5z

Importing libraries:

In [1]: import lazy_pipeline as lpipe

import matplotlib.pyplot as plt

import pandas as pd
import numpy as np
import time

# preprocessing of numeric features
from sklearn.preprocessing import KBinsDiscretizer

# metrics used
from sklearn.metrics import accuracy_score, f1_score

Optimizing the notebook display:

In [2]: from IPython.display import display, HTML

display(HTML("<style>.container { width:75% !important; }</style>"))

Library versions used:

In [3]: from platform import python_version

import sklearn

print(python_version())
print([Link].__version__)
print(sklearn.__version__)
print(pd.__version__)
print(np.__version__)

3.7.6
1.0.1
0.22.1
1.0.1
1.18.1

Running the baseline

In [4]: def process_data(df):
    # clean the dataset: fill missing numeric values with the column mean
    # and missing categorical values with 'unknown';
    # keep at most the 10 most frequent values of each categorical feature
    for col in df.select_dtypes(['number']).columns:
        df[col] = df[col].fillna(df[col].mean())

    for col in df.select_dtypes(['object']).columns:
        df[col] = df[col].fillna('unknown')
        use_values = df[col].value_counts().index[0:10]
        df[col] = df[col].apply(lambda x: x if x in use_values else 'other')

    return df

def discretize_data(df):
    # discretize numeric features: split each into 5 intervals of equal width
    est = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')

    for col in df.select_dtypes(['number']).columns:
        df[col] = est.fit_transform(df[[col]])

    return df

def get_scores(y_preds, y_preds_fixedtrain):
    # wrap the metric computation from the original notebook in a function;
    # accuracy and F-score are the metrics used from here on
    # (y_test is the notebook-level list of true test labels)
    score_vals = {}
    for score_f in [accuracy_score, f1_score]:
        score_name = score_f.__name__
        preds = y_preds
        score_vals[score_name] = [score_f(y_test[:i], preds[:i]) for i in range(1, len(preds))]

        score_name = score_f.__name__ + '_fixedtrain'
        preds = y_preds_fixedtrain
        score_vals[score_name] = [score_f(y_test[:i], preds[:i]) for i in range(1, len(preds))]

    return score_vals

def get_scores_info(score_vals, t_preds, t_preds_fixedtrain):
    # average each metric and each prediction time over all steps
    return {'accuracy_score': np.mean(score_vals['accuracy_score']),
            'accuracy_score_fixedtrain': np.mean(score_vals['accuracy_score_fixedtrain']),
            'f1_score': np.mean(score_vals['f1_score']),
            'f1_score_fixedtrain': np.mean(score_vals['f1_score_fixedtrain']),
            't_preds': np.mean(t_preds),
            't_preds_fixedtrain': np.mean(t_preds_fixedtrain)}

def plot_metrics(score_vals, t_preds, t_preds_fixedtrain):
    # plot the metrics and the prediction time
    # (relies on the notebook-level variables n_train and X)
    plt.rcParams['figure.facecolor'] = (1, 1, 1, 1)

    fig, axs = plt.subplots(2, 2, figsize=(12, 8))

    for ax, t in zip(axs[0], ['accuracy_score', 'f1_score']):
        ax.set_ylim(0 - 0.05, 1 + 0.05)
        ax.plot(range(n_train + 1, len(X)), score_vals[t], label='baseline clf.')
        ax.plot(range(n_train + 1, len(X)), score_vals[t + '_fixedtrain'], label='baseline clf. (fixed train)')

    axs[1, 0].plot(range(n_train, len(X)), t_preds, label='baseline clf.')
    axs[1, 0].plot(range(n_train, len(X)), t_preds_fixedtrain, label='baseline clf. (fixed train)')

    for (ax, t_verb, dim) in zip(axs.flatten(), ['Accuracy score', 'F1 score', 'Prediction time'], ['', '', '(secs.)']):
        ax.set_title('\n'.join([f"{t_verb} progression", "w.r.t. the number of train examples"]), loc='left', size=18)
        ax.set_xlabel('# of train examples', size=14)
        ax.set_ylabel(f"{t_verb} {dim}".strip(), size=14)
        ax.legend()

    axs[1, 1].set_axis_off()
    plt.tight_layout()
    plt.subplots_adjust()

    plt.show()
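The progressive metrics in get_scores are computed on growing prefixes of the test predictions. As a minimal sketch of that idea, the hypothetical helper below (running_accuracy is not part of the notebook or of lazy_pipeline) computes the accuracy over the first i predictions using only the standard library. Unlike get_scores, it also includes the final prefix.

```python
def running_accuracy(y_true, y_pred):
    """Accuracy over the first i predictions, for i = 1 .. len(y_pred)."""
    scores = []
    correct = 0
    for i, (t, p) in enumerate(zip(y_true, y_pred), start=1):
        correct += (t == p)          # True counts as 1, False as 0
        scores.append(correct / i)   # share of correct answers so far
    return scores


# Toy labels, invented for illustration only.
y_true = [True, False, True, True]
y_pred = [True, True, True, False]
print(running_accuracy(y_true, y_pred))
```

Averaging such a list, as get_scores_info does with np.mean, weights early predictions more heavily than a single accuracy over the whole test set would.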

Data preparation

An open dataset on airline customer satisfaction is used:

[Link]

In [5]: # read the dataset, convert the target to boolean, drop the service columns

data = pd.read_csv('data/[Link]')
y_name = 'satisfaction'
data[y_name] = (data[y_name] == 'satisfied')
data = data.iloc[:, 2:]

print(data.shape)
data.head(5)

(25976, 23)

Out[5]:
column \ row                       0                1                2                  3                4
Gender                             Female           Female           Male               Male             Female
Customer Type                      Loyal Customer   Loyal Customer   disloyal Customer  Loyal Customer   Loyal Customer
Age                                52               36               20                 44               49
Type of Travel                     Business travel  Business travel  Business travel    Business travel  Business travel
Class                              Eco              Business         Eco                Business         Eco
Flight Distance                    160              2863             192                3377             1182
Inflight wifi service              5                1                2                  0                2
Departure/Arrival time convenient  4                1                0                  0                3
Ease of Online booking             3                3                2                  0                4
Gate location                      4                1                4                  2                3
...                                ...              ...              ...                ...              ...
Inflight entertainment             5                4                2                  1                2
On-board service                   5                4                4                  1                2
Leg room service                   5                4                1                  1                2
Baggage handling                   5                4                3                  1                2
Checkin service                    2                3                2                  3                4
Inflight service                   5                4                2                  1                2
Cleanliness                        5                5                2                  4                4
Departure Delay in Minutes         50               0                0                  0                0
Arrival Delay in Minutes           44.0             0.0              0.0                6.0              20.0

5 rows × 23 columns

Limit the sample under study to 500 objects and process the data:

In [6]: data = data.sample(500, random_state=1)

data = process_data(data)
data = discretize_data(data)
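To make the equal-width binning in discretize_data concrete, here is a small sketch on a made-up column (the values are illustrative and not taken from the airline dataset). It uses the same KBinsDiscretizer settings as the notebook and prints the bin edges together with the bin index assigned to each value.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical toy column, one feature with six values.
values = np.array([[0.0], [10.0], [20.0], [30.0], [40.0], [50.0]])

# Same settings as in discretize_data: 5 ordinal bins of equal width.
est = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')
bins = est.fit_transform(values).ravel()

print(est.bin_edges_[0])  # edges spread evenly between the min and the max
print(bins)               # bin index for each input value
```

With strategy='uniform' the edges depend only on the column's minimum and maximum, so a single extreme delay can push almost all rows into bin 0.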

Binarizing the data: one-hot encoding

In [7]: y = data[y_name]
X = lpipe.binarize_X(data.drop(y_name, axis=1))
print(X.shape)
X.head(2)

(500, 99)

Out[7]:
column \ row                       21362  11437
Gender: Female                     True   False
Gender: Male                       False  True
Customer Type: Loyal Customer      True   False
Customer Type: disloyal Customer   False  True
Age: 0.0                           False  False
Age: 1.0                           False  True
Age: 2.0                           False  False
Age: 3.0                           False  False
Age: 4.0                           True   False
Type of Travel: Business travel    False  True
...                                ...    ...
Departure Delay in Minutes: 0.0    True   True
Departure Delay in Minutes: 1.0    False  False
Departure Delay in Minutes: 2.0    False  False
Departure Delay in Minutes: 3.0    False  False
Departure Delay in Minutes: 4.0    False  False
Arrival Delay in Minutes: 0.0      True   True
Arrival Delay in Minutes: 1.0      False  False
Arrival Delay in Minutes: 2.0      False  False
Arrival Delay in Minutes: 3.0      False  False
Arrival Delay in Minutes: 4.0      False  False

2 rows × 99 columns
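The internals of lpipe.binarize_X are not shown in the notebook. As a rough sketch of how a comparable one-hot table with "feature: value" column names could be produced, the example below uses pandas.get_dummies on a made-up two-row frame. Treat it as an assumption about the general technique, not as the library's implementation.

```python
import pandas as pd

# Hypothetical toy frame: one categorical and one discretized numeric feature.
df = pd.DataFrame({'Gender': ['Female', 'Male'], 'Age': [4.0, 1.0]})

# Cast everything to strings so that bin indices are treated as categories,
# then one-hot encode with ': ' between the feature name and its value.
X_toy = pd.get_dummies(df.astype(str), prefix_sep=': ').astype(bool)

print(list(X_toy.columns))
print(X_toy.iloc[0].tolist())
```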

Representing the feature matrix as a list of sets:

In [8]: X_bin = [set(X.columns[x]) for idx, x in X.iterrows()]

X_bin[0]

Out[8]: {'Age: 4.0',

'Arrival Delay in Minutes: 0.0',
'Baggage handling: 3.0',
'Checkin service: 4.0',
'Class: Eco',
'Cleanliness: 3.0',
'Customer Type: Loyal Customer',
'Departure Delay in Minutes: 0.0',
'Departure/Arrival time convenient: 4.0',
'Ease of Online booking: 3.0',
'Flight Distance: 1.0',
'Food and drink: 2.0',
'Gate location: 0.0',
'Gender: Female',
'Inflight entertainment: 3.0',
'Inflight service: 3.0',
'Inflight wifi service: 3.0',
'Leg room service: 3.0',
'On-board service: 3.0',
'Online boarding: 3.0',
'Seat comfort: 3.0',
'Type of Travel: Personal Travel'}
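The conversion above keeps, for every row, the names of the columns whose value is True. A small self-contained illustration on a made-up boolean frame (the column names mimic the notebook's, the data is invented):

```python
import pandas as pd

# Hypothetical binarized frame with three attributes and two objects.
X_toy = pd.DataFrame(
    {
        'Gender: Female': [True, False],
        'Gender: Male': [False, True],
        'Class: Eco': [True, True],
    },
    index=[21362, 11437],
)

# Boolean indexing of the column index selects the attributes each object has.
X_toy_bin = [set(X_toy.columns[row.to_numpy()]) for _, row in X_toy.iterrows()]

print(X_toy_bin)
```

Each object becomes the set of attributes it possesses, which is the form a set-intersection classifier operates on.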

Converting the target variable to a list:

In [9]: y = y.tolist()

We assume that initially only 10% of the whole sample is available:

In [10]: n_train = int(len(X) * 0.1)

n_test = len(X) - n_train
y_test = y[n_train:]

n_train, n_test

Out[10]: (50, 450)

Applying the model

In [11]: %%time
gen = lpipe.predict_array(X_bin, y, n_train, use_tqdm=True)
y_preds, t_preds = lpipe.apply_stopwatch(gen)

# the training set is updated after each prediction

Predicting step by step: 100%|███████████████████████████████████████████████████████| 500/500 [00:07<00:00, 71.22it/s]

Wall time: 7.02 s

In [12]: %%time
# list() runs every prediction up front, so apply_stopwatch only iterates over stored results
gen = list(lpipe.predict_array(X_bin, y, n_train, use_tqdm=True, update_train=False))
y_preds_fixedtrain, t_preds_fixedtrain = lpipe.apply_stopwatch(gen)

# the training set is not updated

Predicting step by step: 100%|█████████████████████████████████████████████████████| 500/500 [00:00<00:00, 2110.59it/s]

Wall time: 241 ms
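The two runs above differ only in whether each test object joins the training set once it has been classified. The sketch below illustrates that difference with a hypothetical step-by-step loop and a toy overlap rule on sets. Both functions are invented for illustration and are not the code of lpipe.predict_array or of its baseline classifier.

```python
def predict_by_overlap(x, X_train, y_train):
    """Toy rule: choose the class whose training sets share more attributes with x in total."""
    pos = sum(len(x & xt) for xt, yt in zip(X_train, y_train) if yt)
    neg = sum(len(x & xt) for xt, yt in zip(X_train, y_train) if not yt)
    return pos > neg


def predict_stepwise(X, y, n_train, update_train=True):
    """Yield one prediction per test object, optionally growing the training set."""
    X_train, y_train = list(X[:n_train]), list(y[:n_train])
    for x, label in zip(X[n_train:], y[n_train:]):
        yield predict_by_overlap(x, X_train, y_train)
        if update_train:
            # The true label becomes available after the prediction is made.
            X_train.append(x)
            y_train.append(label)


# Invented toy data: four objects described by attribute sets.
X_toy = [{'a'}, {'b'}, {'c', 'a'}, {'c'}]
y_toy = [True, False, True, True]

print(list(predict_stepwise(X_toy, y_toy, n_train=2, update_train=True)))
print(list(predict_stepwise(X_toy, y_toy, n_train=2, update_train=False)))
```

In this toy run the last object is classified correctly only when the training set is updated, because the attribute 'c' first appears in a test object.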

In [13]: scores = get_scores(y_preds, y_preds_fixedtrain)

plot_metrics(scores, t_preds, t_preds_fixedtrain)

In [14]: get_scores_info(scores, t_preds, t_preds_fixedtrain)

Out[14]: {'accuracy_score': 0.7562200315830624,

'accuracy_score_fixedtrain': 0.7212799822789326,
'f1_score': 0.7625785966832718,
'f1_score_fixedtrain': 0.7376621185160739,
't_preds': 0.01560578982035319,
't_preds_fixedtrain': 0.0}

Modifying the algorithm

In [16]: # algorithms used

import lightgbm as lgb

from lazy_pipeline import predict_with_generators
from sklearn.tree import DecisionTreeClassifier

Instead of set intersections, we use NumPy's matrix multiplication. We also make it possible to plug other algorithms (a decision tree and gradient boosting) into the lazy pipeline.
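The reason a dot product can stand in for a set intersection: for 0/1 indicator vectors, the dot product counts the positions where both vectors equal 1, which is exactly the size of the intersection of the corresponding attribute sets. A short check on made-up attribute sets (the helper to_vector is hypothetical):

```python
import numpy as np

# Fixed attribute order shared by all vectors (illustrative names).
features = ['Class: Eco', 'Gender: Female', 'Gender: Male', 'Age: 4.0']

a = {'Class: Eco', 'Gender: Female'}
b = {'Class: Eco', 'Gender: Male', 'Age: 4.0'}


def to_vector(attrs):
    """Encode an attribute set as a 0/1 vector over `features`."""
    return np.array([f in attrs for f in features], dtype=int)


overlap_sets = len(a & b)
overlap_dot = int(np.dot(to_vector(a), to_vector(b)))

print(overlap_sets, overlap_dot)
```

Stacking many training vectors into a matrix lets one np.dot call compute all overlaps at once, which is where the speed-up over per-pair set intersections comes from.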

In [17]: def predict_with_dot(x, X_train, Y_train):
    # split the training rows by class
    X_pos = np.array([x_train for x_train, y in zip(X_train, Y_train) if y])
    X_neg = np.array([x_train for x_train, y in zip(X_train, Y_train) if not y])

    # total attribute overlap of x with each class
    pos_dot = np.dot(x, X_pos.T).sum()
    neg_dot = np.dot(x, X_neg.T).sum()

    return pos_dot > neg_dot

def predict_with_model(x, X_train, Y_train, use_model):
    # refit the given model on the current training set and predict a single object
    model = use_model
    model.fit(X_train, Y_train)
    res = model.predict([x])

    return res[0]

# for testing we use a decision tree and gradient boosting (without any tuning of the model parameters)

def predict_with_tree(x, X_train, Y_train):
    return predict_with_model(x, X_train, Y_train, DecisionTreeClassifier(max_depth=6))

def predict_with_boosting(x, X_train, Y_train):
    return predict_with_model(x, X_train, Y_train, lgb.LGBMClassifier(max_depth=4, n_estimators=100))

def train_lpipe(X, y, n_train, predict_function, update_train):
    gen = lpipe.predict_array(X=X, Y=y, n_train=n_train, use_tqdm=False, predict_func=predict_function, update_train=update_train)
    y_preds, t_preds = lpipe.apply_stopwatch(gen)

    return y_preds, t_preds

def get_results(predict_function, X, y, n_train):
    results = {}

    results['model_name'] = predict_function.__name__

    # run with the training set updated after each prediction
    t_start = time.time()
    y_preds, t_preds = train_lpipe(X=X, y=y, n_train=n_train, predict_function=predict_function, update_train=True)
    t_stop = time.time()

    results['upd_time'] = t_stop - t_start

    # run with the training set kept fixed
    t_start = time.time()
    y_preds_fixedtrain, t_preds_fixedtrain = train_lpipe(X=X, y=y, n_train=n_train, predict_function=predict_function, update_train=False)
    t_stop = time.time()

    results['fixed_time'] = t_stop - t_start

    scores = get_scores(y_preds, y_preds_fixedtrain)

    scores_info = get_scores_info(scores, t_preds, t_preds_fixedtrain)

    for key in scores_info:
        results[key] = scores_info[key]

    return results

In [18]: # convert all data to numeric format

X_int = X.astype(int).values

In [19]: df_results = pd.DataFrame([get_results(predict_function, X_int, y, n_train) for predict_function in [predict_with_dot, predict_with_tree, predict_with_boosting]])

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1515: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  average, "true nor predicted", 'F-score is', len(true_sum)
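This warning appears when a scored prefix contains neither true nor predicted positives, which can happen for the very short prefixes evaluated in get_scores. A minimal sketch of how the zero_division parameter named in the warning can set the value explicitly (toy integer labels, chosen for illustration):

```python
from sklearn.metrics import f1_score

# A prefix with no positive labels and no positive predictions.
y_true = [0, 0]
y_pred = [0, 0]

# zero_division=0 returns 0.0 for the undefined F-score instead of warning.
score = f1_score(y_true, y_pred, zero_division=0)
print(score)
```

Whether 0 or 1 is the more sensible fill value depends on how such prefixes should count toward the averaged F1 score; the notebook keeps the default.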

In [20]: # final comparison table

df_results

Out[20]:
              model_name   upd_time  fixed_time  accuracy_score  accuracy_score_fixedtrain  f1_score  f1_score_fixedtrain   t_preds  t_preds_fixedtrain
0       predict_with_dot   0.148907    0.037977        0.747410                   0.817870  0.581582             0.741756  0.000331            0.000084
1      predict_with_tree   0.727553    0.263840        0.836064                   0.785045  0.803344             0.728630  0.001617            0.000584
2  predict_with_boosting  10.372624    4.307353        0.862641                   0.807403  0.823205             0.762033  0.023050            0.009572
