FSDL Berkeley Lecture8 Data Management

The document provides an overview of data management in deep learning, emphasizing the importance of data sources, storage solutions, and processing techniques. Key points include the necessity of exploring data extensively, the benefits of data augmentation, and the use of various storage options like databases, data lakes, and object storage. It also discusses the role of frameworks and tools for managing data workflows and processing tasks efficiently.

Data Management

Full Stack Deep Learning - UC Berkeley Spring 2021 - Sergey Karayev, Josh Tobin, Pieter Abbeel
https://veekaybee.github.io/2019/02/13/data-science-is-different/

[Figure: Data Sources (images, text corpus, logs, DB records) feed Training (local filesystem + GPU). Different for every project / company!]

Countless possibilities

Key Points

Let the data flow through you

• Spend 10x as much time exploring the data as you would like to

• Adding/augmenting data is the best way to improve performance

• KISS

[Figure: ML tooling landscape, with columns Data, Training/Evaluation, and Deployment. Labels: Sources, Data Lake / Warehouse, Processing, Exploration, Versioning, Labeling, Compute, Resource Management, Software Engineering, Frameworks & Distributed Training, Experiment Management, Hyperparameter Tuning, CI / Testing, Edge, Web, Monitoring, Feature Store, and “All-in-one”.]

Sources

• Most DL applications require lots of proprietary data

• Exceptions: RL, GANs, GPT-3

• Publicly available datasets = No competitive advantage

• But can serve as starting point

Usually: spend $$$ and time to label own data

https://cdn-sv1.deepsense.ai/wp-content/uploads/2017/04/sample_image_from_the_training_set.jpg
Data flywheel
Enables rapid improvement with user labels

Semi-supervised learning

Use parts of data to label other parts

Very important idea!

Fig. 1. A great summary of how self-supervised learning tasks can be constructed (Image source: LeCun’s talk)

Fig. 4. Illustration of self-supervised learning by predicting the relative position of two random patches. (Image source: Doersch et al., 2015)

https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence
https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html

Semi-supervised learning

• Trained on 1B random images

• Achieved SOTA accuracy on ImageNet top-1 prediction

• Open-source library

https://ai.facebook.com/blog/seer-the-start-of-a-more-powerful-flexible-and-accessible-era-for-computer-vision
Image data augmentation

• Must do for training vision models

• Frameworks (e.g. torchvision) provide functions that do this
• Done on the CPU, in parallel with GPU training

https://towardsdatascience.com/1000x-faster-data-augmentation-b91bafee896c
Other data augmentation

• Tabular
• Delete some cells to simulate missing
data
• Text
• No well-established techniques, but you can replace words with synonyms or change the order of things
• Speech/video
• Change speed, insert pauses, etc.

https://github.com/makcedward/nlpaug
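The tabular and text ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not how nlpaug or any other library implements augmentation: the function names, the `drop_prob` and `replace_prob` parameters, and the tiny `SYNONYMS` table are all assumptions made for the example.

```python
import random

def drop_cells(rows, drop_prob, rng):
    """Return a copy of `rows` where each cell becomes None with probability `drop_prob`."""
    return [
        {key: (None if rng.random() < drop_prob else value) for key, value in row.items()}
        for row in rows
    ]

# Hypothetical, hand-written synonym table; a real project would use a larger resource.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}

def replace_synonyms(sentence, replace_prob, rng):
    """Swap words found in SYNONYMS for a random synonym with probability `replace_prob`."""
    words = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < replace_prob:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)

rng = random.Random(0)  # fixed seed so the augmentation is reproducible
rows = [{"age": 31, "city": "Oakland"}, {"age": 45, "city": "Berkeley"}]
augmented_rows = drop_cells(rows, drop_prob=0.3, rng=rng)
augmented_text = replace_synonyms("the quick dog looks happy", replace_prob=1.0, rng=rng)
```

Passing the random generator in explicitly keeps each augmentation run repeatable, which helps when comparing experiments.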
Synthetic data
Underrated idea that is often worth starting with

https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/

This can get pretty deep!

Andrew Moffat - https://github.com/amoffat/metabrite-receipt-tests

Especially for driving and robotics

https://microsoft.github.io/AirSim/ https://openai.com/blog/ingredients-for-robotics-research/
Questions?

Data Storage
1. Building blocks

- Filesystem

- Object Storage

- Database

- Data Lake / Data Warehouse

2. What goes where

3. Where to learn more

Filesystem
• Foundational layer of storage.

• Fundamental unit is a "file", which can be text or binary, is not versioned, and is easily overwritten.

• Can be as simple as a locally mounted disk containing all the files you need.

• Can be networked (e.g. NFS): accessible over network by multiple machines.

• Can be distributed (e.g. HDFS): stored and accessed over multiple machines

• Fastest option

Hard Drive Speeds

https://www.pcworld.com/article/2899351/everything-you-need-to-know-about-nvme.html

Local Data Format
• Binary data: just files

• TFRecord batches files -- doesn't seem necessary with NVMe drives

• For large tabular / text data, have choices:

• HDF5 is powerful, but bloated and declining

• Parquet is widespread and recommended

• Feather is powered by Apache Arrow, up-and-coming

• Try to use native TensorFlow and PyTorch dataset classes

Object Storage
• An API over the filesystem. GET, PUT, DELETE files to a service, without
worrying where they are actually stored.

• Fundamental unit is an "object". Usually binary: image, sound file, etc.

• Versioning, redundancy can be built into the service.

• Not as fast as local, but fast enough within the cloud
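To make the "API over the filesystem" idea concrete, here is a toy sketch of a PUT/GET/DELETE interface backed by a local directory. It is an illustration only and not how S3 or any real object store works: the `ToyObjectStore` class and its method names are hypothetical, and versioning and redundancy are left out.

```python
import hashlib
import tempfile
from pathlib import Path

class ToyObjectStore:
    """Toy PUT/GET/DELETE interface backed by a local directory (illustration only)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so callers never depend on where the bytes actually live.
        return self.root / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def put(self, key, data):
        self._path(key).write_bytes(data)

    def get(self, key):
        return self._path(key).read_bytes()

    def delete(self, key):
        self._path(key).unlink()

store = ToyObjectStore(tempfile.mkdtemp())
store.put("images/cat.jpg", b"\xff\xd8 fake jpeg bytes")
fetched = store.get("images/cat.jpg")
store.delete("images/cat.jpg")
```

The caller only ever sees keys and bytes; the storage layout stays an implementation detail of the service.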

Database
• Persistent, fast, scalable storage and retrieval of structured data that will be accessed repeatedly.

• AKA Online Transaction Processing (OLTP)

• Mental model: everything is actually in RAM, but software ensures that everything is logged to
disk and never lost.

• Not for binary data! Store references instead.

• Postgres is the right choice most of the time. Supports unstructured JSON.

• SQLite is perfectly good for small projects.

• "NoSQL" was a big craze in the 2010s. Mostly avoid.

• Redis is very useful when you need a simple key-value store.
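As a small sketch of "store references instead", the example below keeps labels and an object key in SQLite while the image bytes are assumed to live elsewhere (for example in object storage). The table name, column names, and keys are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE photos (
        id INTEGER PRIMARY KEY,
        object_key TEXT NOT NULL,   -- reference to the binary in object storage
        label TEXT,
        posted_at TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO photos (object_key, label, posted_at) VALUES (?, ?, ?)",
    [
        ("photos/0001.jpg", "cat", "2021-03-01T09:00:00"),
        ("photos/0002.jpg", "dog", "2021-03-01T09:05:00"),
        ("photos/0003.jpg", "cat", "2021-03-02T14:30:00"),
    ],
)

# Query the metadata, then fetch only the referenced binaries that are needed.
cat_keys = [
    row[0]
    for row in conn.execute(
        "SELECT object_key FROM photos WHERE label = ? ORDER BY id", ("cat",)
    )
]
```

The database stays small and fast because it holds only structured metadata; the heavy files are looked up by key when needed.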

Data Warehouse
• Structured aggregation of data for analysis

• AKA Online Analytical Processing (OLAP)

• Another acronym: ETL

https://addepto.com/implement-data-warehouse-business-intelligence/
SQL and DataFrames
• Most data solutions use SQL. Some, like Databricks, use DataFrames.

• SQL is the standard interface for structured data.

• Pandas is the main DataFrame in the Python ecosystem.

• Our advice: become fluent in both
https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html
Data Lake
• Unstructured aggregation of data from multiple sources, e.g. databases,
logs, expensive data transformations.

• ELT: dump everything in, then transform for specific needs later.

https://medium.com/data-ops/throw-your-data-in-a-lake-32cd21b6de02

Trend: Lake House

For now

• Binary data (images, sound files, compressed texts) is stored as objects.

• Metadata (labels, user activity) is stored in a database.

• If you need features that are not obtainable from the database (e.g. logs), set up a data lake and a process to aggregate the needed data.

• At training time, copy the data that is needed to a filesystem on a fast drive.

There's a lot more to the story

https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/

Blueprint for AI and ML

https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/

If you're truly interested

https://dataintensive.net

Questions?

Motivational Example
• We have to train a photo popularity predictor every night.

• For each photo, training data must include:

• Metadata such as posting time, title, location

• Some features of the user, such as how many times they logged in today

• Outputs of photo classifiers (content, style)

Task Dependencies
• Some tasks can't be started until other tasks are finished.

• Finishing a task should "kick off" the tasks that depend on it.

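One way to picture task dependencies is as a directed acyclic graph that is walked in topological order. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the task names are hypothetical and loosely follow the photo popularity example, and a real workflow manager would run tasks rather than just record their names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract_metadata": set(),
    "compute_user_features": set(),
    "run_photo_classifiers": set(),
    "build_training_table": {
        "extract_metadata",
        "compute_user_features",
        "run_photo_classifiers",
    },
    "train_predictor": {"build_training_table"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

execution_order = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies are all finished
    for task in ready:
        execution_order.append(task)    # a real system would run the task here
        sorter.done(task)               # finishing a task unblocks its dependents
```

The three independent tasks become ready together, so a scheduler could run them in parallel before building the training table.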
Desiderata

• Re-computation should depend on content

• Dependencies are not files, but programs and databases

• Work needs to be spread over many machines

• Many dependency graphs are executing all at once
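The first point, re-computation driven by content, can be sketched by keying cached results on a hash of the input bytes rather than on file names or timestamps. This is a minimal in-memory illustration; the `run_if_changed` helper and the `_cache` dictionary are assumptions made for the example, not part of any workflow tool.

```python
import hashlib

_cache = {}  # content hash -> previously computed result

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def run_if_changed(data: bytes, compute):
    """Run `compute(data)` only if this exact content has not been processed before."""
    key = content_hash(data)
    if key in _cache:
        return _cache[key], False  # reused, nothing recomputed
    result = compute(data)
    _cache[key] = result
    return result, True

calls = []

def count_bytes(data):
    calls.append(1)  # track how many times real work happens
    return len(data)

first = run_if_changed(b"log line A\n", count_bytes)
second = run_if_changed(b"log line A\n", count_bytes)              # same content: cache hit
third = run_if_changed(b"log line A\nlog line B\n", count_bytes)   # changed content: recompute
```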

Hadoop/Spark
• Map/Reduce implementations

• Running data processing operations and simple ML on commodity hardware, with tricks to speed things up

https://data-flair.training/blogs/spark-vs-hadoop-mapreduce/
Airflow

https://www.slideshare.net/PyData/how-i-learned-to-time-travel-or-data-pipelining-and-scheduling-with-airflow-67650418

Distributing work
• The workflow manager has a queue for the tasks, and manages workers
that pull from it, restarting jobs if they fail.

http://site.clairvoyantsoft.com/making-apache-airflow-highly-available/
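The queue-and-workers pattern can be sketched on a single machine with threads standing in for worker machines. This is a simplified illustration, not Airflow's implementation: `run_jobs`, `MAX_ATTEMPTS`, and the job names are hypothetical, and a job that keeps failing is recorded as `None` after the assumed retry limit.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumed retry limit

def run_jobs(jobs, num_workers=2):
    """Run callables from a shared queue; re-queue a job when it raises."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    for name, fn in jobs.items():
        tasks.put((name, fn, 1))

    def worker():
        while True:
            item = tasks.get()
            if item is None:              # sentinel: no more work
                tasks.task_done()
                return
            name, fn, attempt = item
            try:
                value = fn()
                with lock:
                    results[name] = value
            except Exception:
                if attempt < MAX_ATTEMPTS:
                    tasks.put((name, fn, attempt + 1))  # restart the failed job
                else:
                    with lock:
                        results[name] = None            # gave up
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    tasks.join()                          # wait until every job and retry is done
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results

attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

results = run_jobs({"stable": lambda: 42, "flaky": flaky})
```

The retry is queued before the failed attempt is marked done, so the manager never concludes that all work is finished while a restart is pending.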

TensorFlow Datasets + Apache Beam

For example, the 7TB Colossal Clean Corpus

https://www.tensorflow.org/datasets/beam_datasets
Prefect

dbt

Dagster

Feature Store

https://eng.uber.com/michelangelo-machine-learning-platform/

https://www.tecton.ai

Try to keep things simple

• Don't overengineer

• For example, UNIX has powerful parallelism, streaming, highly optimized tools

https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
Questions?

Pandas

• The workhorse of Python data science

• Definitely do a few projects using it if you haven't used it before

https://projectcodeed.blogspot.com/2019/08/setting-up-jupyter-notebooks-for-data.html

Dask

Data Labeling

1. User Interfaces

2. Sources of labor

3. Service companies

Standard set of features:

- bounding boxes, segmentations, keypoints, cuboids

- set of applicable classes

Training the annotators is crucial

Quality assurance is key

Sources of Labor
• Hire own annotators, promote best ones to quality control
• Pros: secure, fast (once hired), less QC needed
• Cons: expensive, slow to scale, admin overhead
• ...or, crowdsource (Mechanical Turk)
• Pros: cheaper, more scalable
• Cons: not secure, significant QC effort required
• ...or, full-service data labeling companies
Service Companies

• Data labeling requires a separate software stack, temporary labor, and quality assurance. It makes sense to outsource.

• Dedicate several days to selecting the best one for you:

• Label gold standard data yourself

• Sales calls with several contenders, ask for work sample on same data

• Ensure agreement with your gold standard, and evaluate on value
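Checking a vendor's work sample against your own gold standard can be as simple as computing the fraction of matching labels. The sketch below assumes one categorical label per item; the item ids, labels, and vendor names are invented for illustration, and bounding boxes or segmentations would need a different comparison.

```python
def agreement_rate(gold, candidate):
    """Fraction of gold-standard items where the vendor's label matches ours."""
    shared = [item for item in gold if item in candidate]
    if not shared:
        raise ValueError("no overlapping items to compare")
    matches = sum(gold[item] == candidate[item] for item in shared)
    return matches / len(shared)

# Labels we produced ourselves on a small sample.
gold = {"img1": "cat", "img2": "dog", "img3": "cat", "img4": "bird"}

# Work samples returned by two hypothetical vendors on the same data.
vendor_a = {"img1": "cat", "img2": "dog", "img3": "dog", "img4": "bird"}
vendor_b = {"img1": "cat", "img2": "cat", "img3": "dog", "img4": "bird"}

scores = {
    name: agreement_rate(gold, labels)
    for name, labels in {"vendor_a": vendor_a, "vendor_b": vendor_b}.items()
}
```

Agreement is only one input; the slide's second criterion, value, still has to be weighed against price.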

FigureEight is the original AI data labeling company

Scale.ai is a dominant up-and-comer

And there are a ton of others

Software

• Full-service data labeling is always pricey

• But some companies offer their software without labor

Label Studio
• Open-source edition to run yourself
• Enterprise edition for managed hosting
• Using in lab!

Prodigy

Aquarium

https://www.aquariumlearning.com

Weak supervision

• Snorkel

• Open-source project: snorkel.org

• Commercial platform: snorkel.ai

Conclusions

• Outsource to a full-service company if you can afford it

• If not, then at least use existing software

• Hiring part-time makes more sense than trying to make crowdsourcing work

Questions?

Data Versioning

Level 0: unversioned

Level 1: versioned via snapshot at training time

Level 2: versioned as a mix of assets and code

Level 3: specialized data versioning solution

Level 0

• Data lives on the filesystem/S3 and in a database

• Problem: Deployments must be versioned. Deployed machine learning models are part code, part data. If data is not versioned, deployed models are not versioned.

• Problem you will face: inability to get back to a previous level of performance

Level 1

• Data is versioned by storing a snapshot of everything at training time

• This allows you to version deployed models, and to get back to past
performance, but is super hacky.

• Would be far better to be able to version data just as easily as code.

Level 2

• Data is versioned as a mix of assets and code.

• Heavy files stored in S3, with unique ids. Training data is stored as JSON or similar, referring to these ids and including relevant metadata (labels, user activity, etc).

• JSON files can get big, but using git-lfs lets us store them just as easily as code

• Can improve further with "lazydata": only syncing files that are needed.

• The git signature + of the raw data file defines the version of the dataset

• Often helpful to add timestamp
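The "assets plus code" idea can be pictured as a small JSON manifest that points at heavy files by id, with a hash of the manifest identifying the dataset version. In the slide's setup git (with git-lfs) plays that role; the hashing below is only a stand-in to show the principle. The bucket name, ids, and the `dataset_version` helper are assumptions made for the example.

```python
import hashlib
import json

# Hypothetical manifest: heavy files live in object storage, referenced by id.
manifest = [
    {"id": "a1f3", "uri": "s3://example-bucket/images/a1f3.jpg", "label": "cat"},
    {"id": "9c2e", "uri": "s3://example-bucket/images/9c2e.jpg", "label": "dog"},
]

def dataset_version(records):
    """Hash a canonical JSON encoding so identical manifests get identical versions."""
    canonical = json.dumps(
        sorted(records, key=lambda r: r["id"]), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

v1 = dataset_version(manifest)

# Changing a single label produces a different version identifier.
relabeled = [dict(manifest[0], label="kitten"), manifest[1]]
v2 = dataset_version(relabeled)
```

Recording the version string next to each trained model makes it possible to get back to the exact data behind a past result.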

Level 3

• Specialized solutions for versioning data.

• Avoid these until you can fully explain how they will improve your project.

• Leading solutions are DVC, Pachyderm, and Quilt.

Data Versioning Solutions

https://dagshub.com/blog/data-version-control-tools/

DVC

Dolt
A nice, simple solution for versioning databases that speaks SQL.

Questions?

Privacy
• Federated Learning: training a global model from data on local devices, without ever having access to the data

• Differential privacy: aggregating data such that individual points cannot be identified

• Another topic: Learning on encrypted data

• Let us know about the best resources!

https://blog.ml.cmu.edu/2019/11/12/federated-learning-challenges-methods-and-future-directions/
https://blogs.nvidia.com/blog/2019/10/13/what-is-federated-learning/
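One standard way to get a differentially private aggregate is the Laplace mechanism: add Laplace noise scaled to the query's sensitivity divided by the privacy budget epsilon. The sketch below applies it to a simple count, whose sensitivity is 1. It is an illustration under stated assumptions, not a vetted privacy implementation: the data, the `epsilon` value, and the helper names are made up, and Python's `random` module is not suitable for real privacy guarantees.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng):
    """Noisy count of items matching `predicate`; a count has sensitivity 1."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 38, 47, 31]  # hypothetical records
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives a more accurate but less private answer.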

Thank you!
