APIS
sequential model api
- lets you create models layer by layer; suits most problems
functional model api
- alternative way of creating models with more complexity (multiple inputs/outputs, branching); see the sketch below
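A minimal sketch contrasting the two Keras APIs (the 10-feature input and 3-class output are just placeholders):

```python
import tensorflow as tf

# Sequential API: a simple linear stack of layers.
seq_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Functional API: layers are called on tensors, allowing branches and multiple inputs/outputs.
inputs = tf.keras.Input(shape=(10,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(3, activation="softmax")(x)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)

seq_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
func_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```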
AutoSxS
- evaluation tool that facilitates A/B testing for LLMs
- core component is the autorater
model garden
- quick and easy way to find and apply the right model
- foundation models (pretrained multitask models that can be fine-tuned using Vertex AI)
- task-specific models (pretrained to solve specific problems)
- fine-tunable models (open source models that can be fine-tuned using a custom notebook or pipeline)
Feature Store (feature repository)
- fully managed solution
- batch and streaming feature ingestion
- share and reuse ML features across use cases
- serve ML features at scale with low latency (offloads the operational overhead of handling infra)
- manages and scales the underlying infrastructure for you, such as storage and compute resources
- alleviates training-serving skew
Data Catalog
- catalogs native metadata on data assets
Dataprep
- can handle unstructured/structured datasets
- built on top of Dataflow
- autoscalable
- a flow is a sequence of recipes
- recipes are preprocessing steps from a library called Wranglers
- combine flows and their recipes to create your Dataflow pipeline
Dataplex
- enables organizations to centrally manage, monitor and govern their data across data lakes, data warehouses and data marts with consistent controls, thus providing access to trusted data and empowering analytics at scale
BigQuery
- serverless, fully managed data warehouse (not a NoSQL store)
- supports SQL syntax
- storage at scale with reduced latency
- efficient processing and seamless integration with other Google Cloud services
- ideal for analytics (visualisation?)
- needs to be at the end of the pipeline to store and analyse results (?)
- z-score normalization is easy
- minimises computational overhead
- can import TF models via BigQuery ML (see the sketch below)
- good for analytics and dashboards
- no hyperparameter tuning
- not end to end; needs other tools like Vertex AI (to serve models)
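A hedged sketch of importing an exported TensorFlow SavedModel into BigQuery ML and scoring it with SQL; the project, dataset, table, and bucket names are placeholders, not from the notes:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Register the exported SavedModel from Cloud Storage as a BigQuery ML model.
client.query("""
    CREATE OR REPLACE MODEL `my_dataset.imported_tf_model`
    OPTIONS (MODEL_TYPE = 'TENSORFLOW',
             MODEL_PATH = 'gs://my-bucket/exported_model/*')
""").result()

# Batch prediction directly over a table with ML.PREDICT.
rows = client.query("""
    SELECT *
    FROM ML.PREDICT(MODEL `my_dataset.imported_tf_model`,
                    (SELECT * FROM `my_dataset.input_table`))
""").result()
```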
Vertex AI
- model creation
- provides flexibility and scalability
- training and development
- lower infrastructure overhead
- don't need to refactor code much
- can train TensorFlow Estimator code
- distributed training (automatically handles distributing training jobs across many machines)
- automatic scaling of resources, saving costs compared to plain VMs
- helps solve
1.
Vertex AI custom containers
- use ML frameworks or non-ML dependencies that are not supported by the prebuilt Vertex containers (see the sketch below)
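A hedged sketch of submitting a custom-container training job with the Vertex AI Python SDK; project, region, image URI, and machine settings are assumed placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

# The container image bundles the training code plus any custom/non-ML dependencies.
job = aiplatform.CustomContainerTrainingJob(
    display_name="custom-train-job",
    container_uri="us-docker.pkg.dev/my-project/my-repo/trainer:latest",
)

# replica_count > 1 lets Vertex AI distribute the job across machines for you.
job.run(
    replica_count=1,
    machine_type="n1-standard-8",
    args=["--epochs", "10"],
)
```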
Vertex AI pipelines
- run modular containerised AI pipeline steps
Vertex AI model monitoring
- fully managed monitoring with minimal maintenance
Vertex AI Tensorboard
- compact and complete overview of training metrics over time
Dataflow
- data transformation and processing
- unified stream and batch data processing that's serverless, fast, and cost-effective
- useful for evaluating a model on a large dataset
- uses Apache Beam (see the sketch below)
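A minimal Apache Beam sketch; the same pipeline runs serverless on Dataflow by passing DataflowRunner options (the paths and parsing logic are placeholders):

```python
import apache_beam as beam

def parse_and_score(line: str):
    # Placeholder transform: parse a CSV row and compute a per-record value.
    fields = line.split(",")
    yield {"id": fields[0], "score": float(fields[1]) * 2.0}

# Add options=PipelineOptions(runner="DataflowRunner", project=..., region=...) to run on Dataflow.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
        | "Transform" >> beam.FlatMap(parse_and_score)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/results")
    )
```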
tabular workflow
- sequential attention
- integrated, managed, scalable pipelines
- end to end ML with tabular data for regression and classification
automl
- AutoML Tables requires no code
- handles training/validation/test splits automatically when you specify a time column
automl nlp
- use when you have to build custom models for NLP
automl tables
- automates building of ML models from tabular data
cloud filestore
- faster data access than Cloud Storage (managed NFS file shares)
Kubeflow
- used to form end-to-end ML architectures
pipelines sdk
- best practice for orchestrating AI pipelines with modular steps (see the sketch below)
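A hedged sketch of modular pipeline steps with the Kubeflow Pipelines (KFP v2) SDK, compiled so it could be submitted to Vertex AI Pipelines; the component logic and names are placeholders:

```python
from kfp import dsl, compiler

@dsl.component
def preprocess(message: str) -> str:
    # Placeholder preprocessing step.
    return message.upper()

@dsl.component
def train(data: str) -> str:
    # Placeholder training step consuming the previous step's output.
    return f"model trained on: {data}"

@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(message: str = "hello"):
    step1 = preprocess(message=message)
    train(data=step1.output)

# The compiled spec can then be submitted, e.g. via aiplatform.PipelineJob.
compiler.Compiler().compile(demo_pipeline, "demo_pipeline.json")
```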
cloud composer
- not cost efficient for a single pipeline because the environment is always active
- fully managed workflow orchestration
- used to automate machine learning workflows
- not as flexible and scalable as Kubeflow Pipelines
cloud vision api
- confidently detects large objects within an image
natural language api
- does sentiment analysis
- the NLP API gives you sentiment analysis out of the box
- AutoML vs NLP API
  - AutoML NLP requires custom training
automl nlp
- works well with small datasets
- uses transfer learning
cloud data fusion
- fully managed
- cloud native data integration service
- codeless interface
cloud function
- not good for computationally expensive/heavy data workflows
dataprep
- data preparation like cleaning, new column creation
cloud storage
- managed service for storing unstructured data (binary large objects)
- secured with data encryption
preemptible VMs
- VMs purchased at a steep discount (can be preempted at any time)
Responsible AI
7 Principles and 4 Areas not to pursue
- Be socially beneficial
- Avoid creating or reinforcing unfair bias
- Be built and tested for safety
- Be accountable to people
- Incorporate privacy design principles
- Uphold high standards of scientific excellence
- Be made available for uses that follow these principles
4 Areas
- likely to cause harm
- main purpose is to cause injury
- tech that gathers or uses information for surveillance
- purpose contradicts widely accepted laws and human rights
SHAP lib
- con is computational cost on large feature sets such as images
Language Interpretability Tool (LIT)
- mainly NLP but preliminary support for image and tabular
TCAV
- focuses on associating predictions with broader human-understandable concepts
XRAI
- a technique designed to highlight the specific pixels or regions of an image that are most influential in the model's decision
ACE
- about extracting high-level concepts from the model's decision-making process
Vertex Explainable AI
- used for model understanding
crossentropy
- sparse categorical crossentropy
  - use when classes are mutually exclusive (each sample belongs exclusively to one class)
  - requires labels to be integer encoded in a single vector
- categorical crossentropy
  - also assumes mutually exclusive classes, but labels are one-hot encoded (e.g. [1, 0, 0]) or soft probability distributions
  - for samples that can carry multiple labels, use binary crossentropy with sigmoid outputs instead
  - see the sketch below
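A minimal sketch of the two losses on toy 3-class data (the values are made up); both give the same result when the labels encode the same classes:

```python
import numpy as np
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
probs = tf.nn.softmax(logits)

# Sparse categorical crossentropy: integer class labels in a single vector.
int_labels = np.array([0, 1])
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(int_labels, probs)

# Categorical crossentropy: one-hot (or soft probability) label vectors.
onehot_labels = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
cat_loss = tf.keras.losses.CategoricalCrossentropy()(onehot_labels, probs)

print(float(sparse_loss), float(cat_loss))  # identical values here
```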
precision
- increasing the threshold:
  1. increases precision
  2. reduces the number of false positives (predict car but no car)
  3. might reduce recall (ability to detect all cars)
recommendations
- use "frequently bought together" to increase revenue while following best practices
- overfitting
  indication: very high AUC ROC on training (relative to validation)
  1. dropout and L2 regularisation help (see the sketch below)
  2. increasing the size of the network (more neurons) makes it more complex; does not help
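A minimal Keras sketch of the dropout + L2 fix above (the layer sizes, rates, and 20-feature input are illustrative):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64, activation="relu", input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 weight penalty
    tf.keras.layers.Dropout(0.3),                            # randomly drop 30% of units
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```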
situations
- model trained long before, and the model's accuracy has decreased
  why? lack of retraining as the market changes
- streaming files which may contain PII, using the Cloud Data Loss Prevention API
  - make 3 buckets: quarantine, sensitive, non-sensitive; write all data to quarantine, do periodic scans using the API, and move the data to one of the other buckets (see the sketch below)
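A hedged sketch of the quarantine-bucket pattern with the Cloud DLP and Cloud Storage clients; the bucket names, project ID, and infoType list are placeholders, and error handling is omitted:

```python
from google.cloud import dlp_v2, storage

PROJECT = "my-project"  # placeholder
QUARANTINE, SENSITIVE, NONSENSITIVE = "quarantine-bkt", "sensitive-bkt", "nonsensitive-bkt"

dlp = dlp_v2.DlpServiceClient()
gcs = storage.Client()

def classify_and_route(blob_name: str) -> None:
    # Read the object from the quarantine bucket.
    blob = gcs.bucket(QUARANTINE).blob(blob_name)
    text = blob.download_as_bytes().decode("utf-8", errors="ignore")

    # Scan it for PII with the DLP API (infoTypes here are examples).
    response = dlp.inspect_content(
        request={
            "parent": f"projects/{PROJECT}",
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"},
                                              {"name": "PHONE_NUMBER"}]},
            "item": {"value": text},
        }
    )

    # Route to the sensitive bucket if any findings, otherwise non-sensitive.
    target = SENSITIVE if response.result.findings else NONSENSITIVE
    gcs.bucket(QUARANTINE).copy_blob(blob, gcs.bucket(target), blob_name)
    blob.delete()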
datasets
- [Link] for input data in memory
- TFRecord (most efficient format for TensorFlow) for input data in a file / non-memory storage (see the sketch below)
[Link] is a hybrid of Apache Beam on Dataflow and TensorFlow
- the preprocessing function is a logical description of a transformation of the dataset
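A minimal tf.data sketch of the two input options above (the TFRecord file pattern and feature spec are assumed):

```python
import tensorflow as tf

# In-memory data -> build the dataset directly from tensors/arrays.
features = tf.random.uniform((100, 10))
labels = tf.random.uniform((100,), maxval=2, dtype=tf.int32)
mem_ds = tf.data.Dataset.from_tensor_slices((features, labels)).shuffle(100).batch(32)

# On-disk data -> store as TFRecord and parse serialized tf.train.Example records.
def parse_example(serialized):
    spec = {"feat": tf.io.FixedLenFeature([10], tf.float32),
            "label": tf.io.FixedLenFeature([], tf.int64)}
    parsed = tf.io.parse_single_example(serialized, spec)
    return parsed["feat"], parsed["label"]

file_ds = tf.data.TFRecordDataset(tf.io.gfile.glob("gs://my-bucket/data/*.tfrecord"))
file_ds = file_ds.map(parse_example).batch(32)
```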
regulated insurance company
- build a model that accepts or rejects insurance applications
- factors for the build?
  1. traceability (maintaining records of data for regulatory compliance)
  2. reproducibility (vital for validating reliability)
  3. explainability (model decisions can be easily explained)
TPU reduces bottlenecks and speeds up training
- interleave for reading data (helps parallelize data reading)
- set the prefetch option equal to the training batch (preload the data); see the sketch below
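A minimal sketch of the interleave + prefetch pattern (the file pattern, cycle_length, and batch size are assumed):

```python
import tensorflow as tf

BATCH_SIZE = 64
files = tf.data.Dataset.list_files("gs://my-bucket/train/*.tfrecord")

dataset = (
    files
    # Read several shard files in parallel instead of one after another.
    .interleave(tf.data.TFRecordDataset,
                cycle_length=4,
                num_parallel_calls=tf.data.AUTOTUNE)
    .batch(BATCH_SIZE)
    # Keep the next batch preloaded while the accelerator trains on the current one.
    .prefetch(1)
)
```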
model skews 6 months later due to a change in input data distribution, how to address?
- create alerts to monitor for skew, retrain the model
time series predictions
- always split by time (see the sketch below)
- randomly splitting will artificially increase accuracy (in production you can't borrow information from the future to make predictions)
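A minimal pandas sketch of a time-based split (the file, column names, and cutoff date are placeholders):

```python
import pandas as pd

df = pd.read_csv("sales.csv", parse_dates=["date"]).sort_values("date")

cutoff = pd.Timestamp("2023-01-01")
train_df = df[df["date"] < cutoff]   # train only on the past
test_df = df[df["date"] >= cutoff]   # evaluate on strictly later data, no future leakage
```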
aggregated data sent at the end of each day
- batch prediction
cnn vs rnn
cnn
- used for computer vision
rnn
- time series predictions
tips
- sigmoid activation: binary classification
- softmax activation: multi-class classification
- larger batch sizes require a smaller learning rate
- resource tagging/labelling: best way to manage ML resources for medium/big teams
- use nested cross-validation to avoid data leakage in time series data
feature crosses
- continuous features need to be binned (bucketized) before crossing
[Link] pipelines-and-cloud-build#cicd_architecture
- GCP recommends using Cloud Build when building Kubeflow pipelines
epochs
- fewer epochs mean less training, which affects model accuracy
learning rate
- converges faster with a higher learning rate, but might cause exploding gradients
batch size
- reducing it reduces the amount of memory required for each iteration
shape (-1, 2): any number of rows, 2 columns (2 elements per row); quick check below
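Quick NumPy check of that shape note:

```python
import numpy as np

a = np.arange(6)        # [0, 1, 2, 3, 4, 5]
b = a.reshape(-1, 2)    # -1 = infer the row count -> shape (3, 2)
print(b.shape)          # (3, 2)
```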
detect car or no car
true positive: predict car got car
true negative: predict no car no car
false positive: predict car no car
false negative: predict no car got car
precision= tp/(tp+fp)
- proportion of positive identifications that are actually correct
recall = tp/(tp+fn)
- proportion of actual positives that correctly identified
accuracy = (tp+tn)/(tp+fp+fn+tn)
- proportion of all correct predictions
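A quick scikit-learn check of the formulas above on toy labels (1 = car, 0 = no car):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]  # TP=3, FP=1, FN=1, TN=3

print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print(accuracy_score(y_true, y_pred))   # (3 + 3) / 8 = 0.75
```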
auc pr vs auc roc
auc pr
- useful for imbalanced datasets like fraud detection
- considers both precision and recall
auc roc
- less informative for imbalanced datasets because it weighs true and false positives equally
imbalanced dataset
- oversample minority, downsample majority
- upweight the minority class
too large to fit in a single machine - distributed training
a lot of dependencies not supported - custom containers
training data split into multiple files, reduce execution time of the input pipeline - parallel interleave
quickly test, build, deploy - AutoML
train from scratch if the model needs to adhere to PII regulations
use key-based hashes to tokenize (PII)
post-training quantization
- minimally decreases model performance
- reduces model latency when retraining is impossible (see the sketch below)
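A minimal sketch of post-training quantization with the TFLite converter (the SavedModel path is a placeholder):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables post-training quantization
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```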
adam optimiser
- good for large datasets
- a lot of parameters to adjust