Testing in Data Science
Here's what you need to know about testing in data science.
In data science, two kinds of tests come up in addition to the usual unit
tests written with the pytest library:
1. For Data Analysis
2. For Machine Learning
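For contrast, an ordinary pytest-style unit test asserts an exact, known value (a minimal sketch; `add` is a made-up example function, not from these notes):

```python
# A plain unit test in pytest style: call the code, assert an exact value.
# `add` is a made-up example function for illustration.
def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
```

The data-science tests below differ in that they assert properties of the output instead of exact values.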
In data analysis, you need to test your code against previously unseen data
(essentially data validation).
You do that by checking properties of the outcome rather than its exact
value. There are libraries for this:
I found four of them; there are obviously more:
1. En garde
2. Hypothesis
3. Feature Forge
4. Voluptuous
These libraries check properties of the output data rather than the exact values.
In addition, NumPy and pandas ship built-in testing utilities (`numpy.testing`,
`pandas.testing`) that you can use for this.
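A minimal sketch of those built-in helpers: `numpy.testing.assert_allclose` and `pandas.testing.assert_frame_equal` compare values with a tolerance, which plain `==` does not give you:

```python
import numpy as np
import pandas as pd

# assert_allclose tolerates floating-point error that exact comparison would not.
a = np.array([0.1 + 0.2, 1.0])  # 0.1 + 0.2 != 0.3 exactly
b = np.array([0.3, 1.0])
np.testing.assert_allclose(a, b)  # passes despite the rounding error

# assert_frame_equal does the same for whole DataFrames
# (float columns are compared with a default tolerance).
df1 = pd.DataFrame({"x": [1, 2], "y": [0.1 + 0.2, 0.3]})
df2 = pd.DataFrame({"x": [1, 2], "y": [0.3, 0.3]})
pd.testing.assert_frame_equal(df1, df2)
```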
For example, Hypothesis (which seems to be the most useful in our case)
generates random data matching a specification you give it, runs it through
your code, and asserts the properties you want to check. It also hunts for
edge cases on its own and reports minimal failing examples.
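The core idea can be sketched without Hypothesis itself, using only the stdlib `random` module: generate many arbitrary inputs and assert a property of the output rather than a fixed value (`clip_outliers` is a hypothetical function under test):

```python
import random

def clip_outliers(xs, low=-1.0, high=1.0):
    # Hypothetical function under test: clamp every value into [low, high].
    return [min(max(x, low), high) for x in xs]

# Property-based check: for many random inputs, every output value must lie
# inside the bounds, and the length must be preserved. We never assert a
# specific output list.
random.seed(0)
for _ in range(100):
    data = [random.uniform(-10, 10) for _ in range(random.randrange(20))]
    out = clip_outliers(data)
    assert all(-1.0 <= x <= 1.0 for x in out)
    assert len(out) == len(data)
```

Hypothesis does the generation for you via strategies (`@given`) and additionally shrinks any failing input to a minimal counterexample.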
This blog basically confirms your doubts
An example of Hypothesis
These talks would help:
1. Testing for Properties
2. Data Validation
NumPy builtin data validation
Testing ML models involves a couple of steps. First, test all the
non-machine-learning code with pytest.
Since the models themselves cannot be tested directly, there are ways around that:
1. Blackbox Testing for Machine Learning
2. QA for ML Models
You can still do the property checks on the output data; Feature Forge is
aimed specifically at ML features.
Then there are the metrics we talked about in class yesterday, which are used to
check the quality of the model.
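Whatever the specific metrics from class were, each is just a function of predictions versus labels; here's accuracy sketched by hand, no ML library needed:

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    assert len(y_true) == len(y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# 3 of 4 predictions correct.
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```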
In our specific problem, we could use the Hypothesis library to get a random
DataFrame, pass it through our function, and check whether any pair of columns
still has correlation above a certain threshold. Since the data is random but
its parameters can be specified, we get exactly the kind of test we want.
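A sketch of that test using plain NumPy random data rather than Hypothesis strategies; `drop_correlated` is a hypothetical stand-in for our function, and the assertion is on a property of the output, not its values:

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.9):
    # Hypothetical stand-in for our function: greedily drop any column whose
    # absolute correlation with an already-kept column exceeds the threshold.
    corr = df.corr().abs()
    keep = []
    for col in df.columns:
        if all(corr.loc[col, k] <= threshold for k in keep):
            keep.append(col)
    return df[keep]

# Random DataFrame with controlled parameters: six independent columns plus
# one near-duplicate of column "a".
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 6)), columns=list("abcdef"))
df["g"] = df["a"] * 2 + rng.normal(scale=0.01, size=200)

out = drop_correlated(df, threshold=0.9)

# Property: no pair of remaining columns correlates above the threshold.
corr = out.corr().abs()
off_diag = corr.values[~np.eye(len(out.columns), dtype=bool)]
assert (off_diag <= 0.9).all()
```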
I'll write a test for this later. I'll share the code once it works.
Hope this helps.