Data Refinery

Uploaded by Linh Nguyen

Hi, I'm Sonali Surange Dev.

Data scientists often end up spending a lot of time on mundane tasks like
cleansing, shaping and preparing data. Typically these tasks are roadblocks to
starting the more enjoyable part: analyzing the data sets or building and
training machine learning models. This is because data sets are typically not in
a format that can be readily used. They first need to be cleansed and refined
before they are usable by a data scientist. IBM Data Refinery addresses this
issue and simplifies the task of refining data and its workflows. It provides a
self-service data preparation environment where you can quickly analyze, cleanse
and prepare data sets. Data Refinery is available with Watson Studio on public
cloud, private cloud and desktop.

In the rest of the video we will walk through a scenario and see Data Refinery
in action. In this scenario we will use Data Refinery to find the best deals
using data about discounts offered over time. We will then automate the analysis
to run on a regular schedule.

Before the data
scientist starts, she looks at the data distribution and notices that the inSale
column is missing data. She visualizes the offer column and notices that it
contains valuable information about discounts. Many fields contain percent-off
information; some contain references to a previous price, indicating that a new
reduced price is available. She decides to derive the sale status from the
offer. She uses a conditional replace operation to derive whether the product is
on sale. Next she uses a filter operation to eliminate deals that are not on
sale.

She then wants to pick out the bargains. She uses the replace substring
operation and provides a pattern that extracts the discounts from the offer.
After converting the discount values to decimal, she can visually see the
discounts that were available.

She needs to find the months that offered the best deals. She
visualizes the dateUpdated column and notices that the date field has a variety
of formats: some with dashes, some with slashes and some with months as text.
She hopes that Data Refinery can normalize the data and extract a month. She
uses the convert column operation to convert to date and selects ymd. Next she
extracts the month and creates a derived column called discountMonth. The data
now represents all brands and products offering sales, along with the month the
offer was available.

The data scientist is only interested in her preferred brands. Over time she has
built a list of preferred brands and has imported the data into her project.
Data Refinery provides relational transformations such as left, right, inner,
full, semi and anti joins. To ensure that the data only contains her preferred
brands, she uses a semi-join operation, which narrows the brands to match her
preferences. She then selects the keys for the join and the resulting fields.
The visual results now confirm that the brands match the preferences.

To find the best possible deals she needs to perform some aggregations. Several
features determine a good deal. She is
interested in the best offer and the duration for which the discounts are
active. Aggregating the sale data will help her understand the deals. She groups
by the brand and discountMonth columns and calculates the maximum discount.
Finally she sorts the result in descending order. Data Refinery now displays the
best deals by brand preference and the duration for which the offer is
available.

The last step is to execute the analysis on the full dataset. She starts the
full analysis, which she can monitor for completion status.

It's time to automate the analysis so that it runs on a regular basis. The data
in the database can grow over time. She uses a personalized runtime to match the
larger data volumes and sets a schedule for automation. The hourly schedule
reads updated data from the database and writes to the target table.

Data Refinery has helped her uncover deals in the raw data through a small set
of operations and transformations, with the bulk of the work done for her. Thank
you for watching.
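
For readers who want to see the logic behind the operations in the walkthrough,
the steps can be roughly sketched in plain Python. This is only an illustration
and not how Data Refinery implements them. The column names inSale, offer,
dateUpdated and discountMonth come from the walkthrough; the brand column name,
the sample rows, the preferred_brands set, the regular expression, the list of
date layouts and the helper names parse_ymd and best_deals are all assumptions
made for this sketch. It also treats only percent-off offers as sales and
ignores the previous-price case.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: the values and the "brand" column name are made up.
rows = [
    {"brand": "Acme", "offer": "20% off", "dateUpdated": "2018-03-14"},
    {"brand": "Acme", "offer": "35% off", "dateUpdated": "2018/03/20"},
    {"brand": "Bolt", "offer": "50% off", "dateUpdated": "2018-04-02"},
    {"brand": "Acme", "offer": "Free shipping", "dateUpdated": "2018-04-09"},
    {"brand": "Crest", "offer": "10% off", "dateUpdated": "2018 May 05"},
]
preferred_brands = {"Acme", "Crest"}  # stands in for the imported brand list

# Assumed pattern for a percent discount inside the offer text.
PERCENT_OFF = re.compile(r"(\d+(?:\.\d+)?)\s*%")
# Assumed year-month-day layouts: dashes, slashes, month written as text.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y %B %d")


def parse_ymd(text):
    """Try each known year-month-day layout; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def best_deals(rows, preferred):
    """Return ((brand, discountMonth), max discount) pairs, best first."""
    best = defaultdict(float)
    for row in rows:
        match = PERCENT_OFF.search(row["offer"])
        in_sale = match is not None          # derive inSale from offer
        if not in_sale:                      # filter: keep only rows on sale
            continue
        if row["brand"] not in preferred:    # semi-join on preferred brands
            continue
        updated = parse_ymd(row["dateUpdated"])
        if updated is None:                  # skip dates we cannot normalize
            continue
        discount = float(match.group(1))     # extracted discount as a decimal
        key = (row["brand"], updated.month)  # group by brand and discountMonth
        best[key] = max(best[key], discount)
    # Sort by maximum discount in descending order.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


deals = best_deals(rows, preferred_brands)
for (brand, month), discount in deals:
    print(brand, month, discount)
```

The semi-join is expressed here as a membership test: rows are kept when their
brand appears in the preferred list, and no columns from that list are added to
the result.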
