Dasf (En)

The Databricks AI Security Framework (DASF), oh what a treasure trove of wisdom it is, bestows upon us the grand illusion of control in the wild west of AI systems. It's a veritable checklist of 53 security risks that could totally happen, but you know, only if you're unlucky or something.

Uploaded by Snarky Security

Read more: Boosty | Sponsr | TG

Abstract – This document provides an in-depth analysis of the DASF, exploring its structure, recommendations, and the practical applications it offers to organizations implementing AI solutions. This analysis not only serves as a quality examination but also highlights its significance and practical benefits for security experts and professionals across different sectors. By implementing the guidelines and controls recommended by the DASF, organizations can safeguard their AI assets against emerging threats and vulnerabilities.

I. INTRODUCTION

The Databricks AI Security Framework (DASF) is a comprehensive guide designed to address the evolving risks associated with the widespread integration of AI globally. The framework was created by the Databricks Security team and aims to provide actionable defensive control recommendations for AI systems, covering the entire AI lifecycle and facilitating collaboration between business, IT, data, AI, and security teams. The DASF is not limited to securing models or endpoints but adopts a holistic approach to mitigating cyber risks in AI systems, based on real-world evidence indicating that attackers employ simple tactics to compromise ML-driven systems.

The DASF identifies 55 technical security risks across 12 foundational components of a generic data-centric AI system: raw data, data prep, datasets, data catalog governance, machine learning algorithms, evaluation, machine learning models, model management, model serving and inference, inference response, machine learning operations (MLOps), and data and AI platform security. Each risk is mapped to a set of mitigation controls, ranked in prioritized order from perimeter security to data security.

The Databricks Data Intelligence Platform is highlighted as a key component of the DASF, offering a unified foundation for all data and governance. The platform includes Mosaic AI, Databricks Unity Catalog, the Databricks platform architecture, and Databricks platform security. Mosaic AI covers the end-to-end AI workflow, while Unity Catalog provides a unified governance solution for data and AI assets. The platform architecture is a hybrid PaaS that is data-agnostic, and platform security is based on the principles of trust, technology, and transparency.

The DASF is intended for security teams, ML practitioners, governance officers, and DevSecOps engineering teams. It provides a structured conversation on new threats and mitigations without requiring deep expertise crossover, and it includes a detailed guide for understanding the security and compliance of specific ML systems, offering insights into how ML impacts system security and how security engineering principles apply to ML.

The DASF concludes with Databricks' final recommendations on how to manage and deploy AI models safely and securely, consistent with the core tenets of machine learning adoption: identify the ML business use case, determine the ML deployment model, select the most pertinent risks, enumerate threats for each risk, and choose which controls to implement. It also provides further reading to enhance knowledge of the AI field and of the frameworks reviewed as part of the analysis.

The DASF document serves as a guide for how organizations can effectively utilize the framework to enhance the security of their AI systems, promoting a collaborative and comprehensive approach to AI security across various teams and AI model types:

• Collaborative Use: The DASF is designed for collaborative use by data and AI teams along with their security counterparts. It emphasizes the importance of these teams working together throughout the AI lifecycle to ensure the security and compliance of AI systems.

• Applicability Across Teams: The concepts in the DASF are applicable to all teams, regardless of whether they use Databricks to build their AI solutions. This inclusivity ensures that the framework can be utilized by a broad audience to enhance AI security.

• Guidance on AI Model Types: The document suggests that organizations first identify what types of AI models are being built or used. It categorizes models broadly into predictive ML models, state-of-the-art open models, and external models, providing a framework for understanding the specific security considerations for each type.

• Understanding AI System Components: Organizations are encouraged to review the 12 foundational components of a generic data-centric AI system as outlined in the document.

• Risk Identification and Mitigation: The DASF guides organizations to identify relevant risks and determine applicable controls from a comprehensive list provided in the document. This structured approach helps in
prioritizing security measures based on the specific needs of the organization.

• Documentation and Features in Databricks Terminology: While the document refers to documentation or features in Databricks terminology, it aims to be accessible to those who do not use Databricks. This approach makes the document useful for a wider audience while maintaining its practicality for Databricks users.

II. AUDIENCE

• Security Teams: This includes Chief Information Security Officers (CISOs), security leaders, DevSecOps, Site Reliability Engineers (SREs), and others responsible for the security of systems. They can use the DASF to understand how machine learning (ML) will impact system security and to grasp some of the basic mechanisms of ML.

• ML Practitioners and Engineers: This group comprises data engineers, data architects, ML engineers, and data scientists. The DASF helps them understand how security engineering and the "secure by design" mentality can be applied to ML.

• Governance Officers: These individuals are responsible for ensuring that data and AI practices within an organization comply with relevant laws, regulations, and policies. The DASF provides guidance on how ML impacts system security and compliance.

• DevSecOps Engineering Teams: These teams focus on integrating security into the development and operations processes. The DASF offers a structured way for these teams to have conversations about new threats and mitigations without requiring deep expertise crossover.

III. BENEFITS AND DRAWBACKS

The Databricks AI Security Framework (DASF) offers a comprehensive and actionable guide for organizations looking to understand and mitigate AI security risks. However, its complexity and Databricks-centric guidance may present challenges for some organizations.

A. Benefits

• Holistic approach: The DASF takes a holistic approach to AI security, addressing risks across the entire AI lifecycle and all components of a generic data-centric AI system. This comprehensive approach helps organizations identify and mitigate security risks more effectively.

• Collaboration: The framework is designed to facilitate collaboration between business, IT, data, AI, and security teams. This encourages a unified approach to AI security and helps bridge the gap between different disciplines.

• Actionable recommendations: The DASF provides actionable defensive control recommendations for each identified risk, which can be updated as new risks emerge and additional controls become available. This ensures that organizations can stay current with evolving AI security threats.

• Applicability: The DASF is applicable to organizations using various AI models, including predictive ML models, generative AI models, and external models. This broad applicability makes it a valuable resource for a wide range of organizations.

• Integration with Databricks Data Intelligence Platform: For organizations using the Databricks Data Intelligence Platform, the DASF offers specific guidance on leveraging the platform's AI risk mitigation controls. This helps organizations maximize the security benefits of the platform.

B. Drawbacks

• Complexity: The DASF covers a wide range of AI security risks and mitigation controls, which may be overwhelming for organizations new to AI security or with limited resources. Implementing the framework may require a significant investment of time and effort.

• Databricks-centric guidance: While the DASF offers valuable guidance for organizations using the Databricks Data Intelligence Platform, some of the recommendations may be less applicable or actionable for organizations using different AI platforms or tools.

• Evolving landscape: As the AI security landscape continues to evolve, organizations may need to continually update their security controls and practices to stay current.

• Lack of specific examples: The DASF provides a high-level overview of AI security risks and mitigation controls, but it may lack specific examples or case studies to illustrate how these risks and controls apply in real-world scenarios.

• Focus on technical risks: The DASF primarily focuses on technical security risks and mitigation controls. While this is an essential aspect of AI security, organizations should also consider non-technical risks, such as the ethical, legal, and social implications of AI, which are not extensively covered in the DASF.

IV. FRAMEWORK ALIGNMENT

The Databricks AI Security Framework (DASF) is designed to complement and integrate with other security frameworks, such as NIST, HITRUST, ISO/IEC 27001 and 27002, and the CIS Critical Security Controls. The DASF takes a holistic approach to mitigating AI security risks instead of focusing only on the security of models or model endpoints. This approach aligns with the principles of these frameworks, which provide a structured process for identifying, assessing, and mitigating cybersecurity risks.

V. DATABRICKS AI SECURITY FRAMEWORK

The framework categorizes the AI system into 12 primary components, each associated with specific security risks identified through extensive analysis. This analysis includes
predictive ML models, generative foundation models, and external models, informed by customer inquiries, security assessments, workshops with Chief Information Security Officers (CISOs), and surveys on AI risks. The identified risks are then mapped to corresponding mitigation controls within the Databricks Data Intelligence Platform, with links to detailed product documentation for each risk.

The document outlines the AI system components and their associated risks as follows:

• Data Operations: This stage encompasses the initial handling of raw data, including ingestion, transformation, and ensuring data security and governance. It is crucial for the development of reliable ML models and a secure DataOps infrastructure. A total of 19 specific risks are identified in this category, ranging from insufficient access controls to lack of end-to-end ML lifecycle management.

• Model Operations: This stage involves the creation of ML models, whether through building predictive models, acquiring models from marketplaces, or utilizing APIs like OpenAI. It requires a series of experiments and tracking mechanisms to compare various conditions and outcomes. There are 14 specific risks identified, including issues like lack of experiment reproducibility and model drift.

• Model Deployment and Serving: This stage focuses on securely deploying model images, serving models, and managing features such as automated scaling and rate limiting. It also includes the provision of high-availability services for structured data in RAG applications. A total of 15 specific risks are highlighted, including prompt injection and model inversion.

• Operations and Platform: This final stage includes platform vulnerability management, patching, model isolation, and ensuring authorized access to models with security built into the architecture. It also involves operational tooling for CI/CD to maintain secure MLOps across development, staging, and production environments. Seven specific risks are identified, such as lack of MLOps standards and vulnerability management.

VI. RAW DATA

• Importance of Raw Data: Raw data is the foundation of AI systems, encompassing enterprise data, metadata, and operational data in various forms such as semi-structured or unstructured data, batch data, or streaming data.

• Data Security: Securing raw data is paramount for the integrity of machine learning algorithms and any technical deployment particulars. It presents unique challenges, and all data collections in an AI system are subject to standard data security challenges as well as new ones.

• Risk Mitigation Controls: The document outlines specific risks associated with raw data and provides detailed mitigation controls for each. These controls include effective access management, data classification, data quality enforcement, storage and encryption, data versioning, data lineage, data trustworthiness, legal considerations, handling stale data, and data access logs.

• Access Management: Ensuring that only authorized individuals or groups can access specific datasets is fundamental to data security. This involves authentication, authorization, and finely tuned access controls.

• Data Classification: Classifying data is critical for governance, enabling organizations to sort and categorize data by sensitivity, importance, and criticality, which is essential for implementing appropriate security measures and governance policies.

• Data Quality: High data quality is crucial for reliable data-driven decisions and is a cornerstone of data governance. Organizations must rigorously evaluate key data attributes to ensure analytical accuracy and cost-effectiveness.

• Storage and Encryption: Encrypting data at rest and in transit is vital to protect against unauthorized access and to comply with industry-specific data security regulations.

• Data Versioning and Lineage: Versioning data and tracking change logs are important for rolling back or tracing back to the original data in case of corruption. Data lineage helps with compliance and audit-readiness by providing a clear understanding and traceability of data used for AI.

• Trustworthiness and Legal Aspects: Ensuring the trustworthiness of data and compliance with legal mandates such as GDPR and CCPA is essential. This includes the ability to "delete" specific data from machine learning systems and retrain models using clean and ownership-verified datasets.

• Stale Data and Access Logs: Addressing the risks of stale data and the lack of data access logs is important for maintaining the efficiency and security of business processes. Proper audit mechanisms are critical for data security and regulatory compliance.

VII. DATA PREP

• Definition and Importance: Data preparation is defined as the process of transforming raw input data into a format that machine learning algorithms can interpret. This stage is crucial as it directly impacts the security and explainability of an ML system.

• Security Risks and Mitigations: The section outlines various security risks associated with data preparation and provides detailed mitigation controls for each. These risks include preprocessing integrity, feature manipulation, raw data criteria, and adversarial partitions.
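The data versioning and lineage control described above lends itself to a short sketch. The following is a hypothetical illustration, not code from the DASF or from Databricks: the `VersionedDataset` class, its method names, and the SHA-256 fingerprinting scheme are assumptions chosen for the example. It shows how recording a digest per dataset version makes silent corruption detectable and rollback to a known-good snapshot possible.

```python
import hashlib
import json

def fingerprint(records):
    """Return a stable SHA-256 digest over a list of JSON-serializable records."""
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

class VersionedDataset:
    """Keeps an append-only log of (version, digest) pairs as a lineage record."""

    def __init__(self):
        self.versions = []      # list of (version_number, digest)
        self.snapshots = {}     # version_number -> stored records

    def commit(self, records):
        version = len(self.versions) + 1
        digest = fingerprint(records)
        self.versions.append((version, digest))
        self.snapshots[version] = [dict(r) for r in records]
        return version, digest

    def verify(self, version):
        """Recompute the digest and compare it with the recorded lineage entry."""
        _, recorded = self.versions[version - 1]
        return fingerprint(self.snapshots[version]) == recorded

ds = VersionedDataset()
v1, _ = ds.commit([{"id": 1, "label": "ok"}])
assert ds.verify(v1)

# Simulate silent corruption of the stored snapshot:
ds.snapshots[v1][0]["label"] = "tampered"
assert not ds.verify(v1)
```

In a real pipeline the digest would be written to a separate, access-controlled audit store, so that an attacker who can modify the data cannot also rewrite its fingerprint.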
• Preprocessing Integrity: Ensuring the integrity of preprocessing involves numerical transformations, data aggregation, text or image data encoding, and new feature creation. Mitigation controls include setting up Single Sign-On (SSO) with an Identity Provider (IdP) and Multi-Factor Authentication (MFA), restricting access using IP access lists, and implementing private links to limit the sources of inbound requests.

• Feature Manipulation: This risk involves the potential for attackers to manipulate how data is annotated into features, which can compromise the integrity and accuracy of the model. Controls include securing model features to prevent unauthorized updates and employing data-centric MLOps and LLMOps to promote models as code.

• Raw Data Criteria: Understanding the selection criteria for raw data is essential to prevent attackers from introducing malicious input that compromises system integrity. Controls include using access control lists and data-centric MLOps for unit and integration testing.

• Adversarial Partitions: This involves the risk of attackers influencing the partitioning of datasets used in training and evaluation, potentially controlling the ML system indirectly. Mitigation involves tracking and reproducing the training data used for ML model training and identifying ML models and runs derived from a particular dataset.

• Comprehensive Mitigation Strategies: The section emphasizes the importance of a comprehensive approach to securing the data preparation process, including the use of stringent security measures to safeguard against manipulations that can undermine the integrity and reliability of ML systems.

VIII. DATASETS

• Significance of Datasets: Datasets are crucial for training, validating, and testing machine learning models. They must be carefully managed to ensure the integrity and effectiveness of the AI systems.

• Security Risks: The section outlines various security risks associated with datasets, including data poisoning, ineffective storage and encryption, and label flipping. These risks can compromise the reliability and performance of machine learning models.

• Data Poisoning: This risk involves attackers manipulating training data to affect the model's output at the inference stage. Mitigation strategies include robust access controls, data quality checks, and monitoring data lineage to prevent unauthorized data manipulation.

• Ineffective Storage and Encryption: Proper data storage and encryption are critical to protect datasets from unauthorized access and breaches. The framework recommends encryption of data at rest and in transit, along with stringent access controls.

• Label Flipping: This specific type of data poisoning involves changing the labels in training data, which can mislead the model during training and degrade its performance. Encryption and secure access to datasets are recommended to mitigate this risk.

• Mitigation Controls: For each identified risk, the DASF provides detailed mitigation controls. These controls include the use of Single Sign-On (SSO) with Identity Providers (IdP), Multi-Factor Authentication (MFA), IP access lists, private links, and data encryption to enhance the security of datasets.

• Comprehensive Risk Management: The section emphasizes the importance of a comprehensive approach to managing dataset security, from the initial data collection to the deployment of machine learning models. This includes regular audits, updates to security protocols, and continuous monitoring of data integrity.

IX. DATA CATALOG GOVERNANCE

• Comprehensive Governance Approach: Data catalog and governance involve managing an organization's data assets throughout their lifecycle, which includes principles, practices, and tools for effective management.

• Centralized Access Control: Managing governance for data and AI assets enables centralized access control, auditing, lineage, and data and model discovery capabilities, which limits the risk of data or model duplication, improper use of classified data for training, loss of provenance, and model theft.

• Data Privacy and Security: When dealing with datasets that may contain sensitive information, it is crucial to ensure that personally identifiable information (PII) and other sensitive data are adequately secured to prevent breaches and leaks. This is particularly important in sectors with stringent regulatory requirements.

• Audit Trails and Transparency: Proper data catalog governance allows for audit trails and tracing the origin and transformations of data used to train AI models. This transparency encourages trust and accountability, reduces the risk of biases, and improves AI outcomes.

• Regulatory Compliance: Ensuring that sensitive information in datasets is adequately secured is essential for compliance with regulations such as GDPR and CCPA. This includes the ability to demonstrate data security and maintain audit trails.

• Collaborative Dashboard: For computer vision projects involving multiple stakeholders, having an easy-to-use labeling tool with a collaborative dashboard is essential to keep everyone on the same page in real time and avoid mission creep.

• Automated Data Pipelines: For projects with large volumes of data, automating data pipelines by connecting datasets and models using APIs can streamline the process and make it faster to train ML models.
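The label-flipping risk described above can be made concrete with a small data-quality gate. This is a hypothetical sketch, not a control prescribed by the DASF: the `flag_label_shift` function, its 0.1 tolerance, and the spam/ham example are illustrative assumptions. Comparing the label distribution of a new dataset snapshot against a trusted baseline is one simple way to surface suspicious wholesale label changes before training.

```python
from collections import Counter

def label_distribution(labels):
    """Map each label to its share of the dataset."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}

def flag_label_shift(baseline_labels, candidate_labels, tolerance=0.1):
    """Return labels whose share moved by more than `tolerance` vs. the baseline."""
    base = label_distribution(baseline_labels)
    cand = label_distribution(candidate_labels)
    flagged = []
    for label in set(base) | set(cand):
        if abs(base.get(label, 0.0) - cand.get(label, 0.0)) > tolerance:
            flagged.append(label)
    return sorted(flagged)

baseline = ["spam"] * 50 + ["ham"] * 50
poisoned = ["spam"] * 20 + ["ham"] * 80   # many spam labels flipped to ham

assert flag_label_shift(baseline, baseline) == []
assert flag_label_shift(baseline, poisoned) == ["ham", "spam"]
```

A flagged shift is not proof of poisoning on its own; it is a trigger for human review of the affected labels, alongside the access-control and encryption measures the framework recommends.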
• Quality Control Workflows: It is important to have customizable and manageable quality control workflows to validate labels and annotations, reduce errors and bias, and fix bugs in datasets. Automated annotation tools can help in this process.

X. MACHINE LEARNING ALGORITHMS

• Technical Core of ML Systems: Machine learning algorithms are described as the technical core of any ML system, crucial for the functionality and security of the system.

• Lesser Security Risk: It is noted that attacks against machine learning algorithms generally present significantly less security risk compared to the data used for training, testing, and eventual operation.

• Offline and Online Systems: The section distinguishes between offline and online machine learning algorithms. Offline systems are trained on a fixed dataset and then used for predictions, while online systems continuously learn and adapt through iterative training with new data.

• Security Advantages of Offline Systems: Offline systems are said to have certain security advantages due to their fixed, static nature, which reduces the attack surface and minimizes exposure to data-borne vulnerabilities over time.

• Vulnerabilities of Online Systems: Online systems are constantly exposed to new data, which increases their susceptibility to poisoning attacks, adversarial inputs, and manipulation of learning processes.

• Careful Selection of Algorithms: The document emphasizes the importance of carefully considering the choice between offline and online learning algorithms based on the specific security requirements and operating environment of the ML system.

XI. EVALUATION

• Critical Role of Evaluation: Evaluation is essential for assessing the effectiveness of machine learning systems in achieving their intended functionalities. It involves using dedicated datasets to systematically analyze the performance of a trained model on its specific task.

• Evaluation Data Poisoning: There is a risk of upstream attacks against data, where the data is tampered with before it is used for machine learning, significantly complicating the training and evaluation of ML models. These attacks can corrupt or alter the data in a way that skews the training process, leading to unreliable models.

• Insufficient Evaluation Data: Evaluation datasets can also be too small or too similar to the training data to be useful. Poor evaluation data can lead to biases, hallucinations, and toxic output. It is difficult to effectively evaluate large language models (LLMs), as these models rarely have objectively labeled ground truth.

• Mitigation Controls:

o Implementing Single Sign-On (SSO) with an Identity Provider (IdP) and Multi-Factor Authentication (MFA) to limit who can access your data and AI platform.

o Using IP access lists to restrict the IP addresses that can authenticate to Databricks.

o Encrypting data at rest and in transit.

o Monitoring data and AI systems from a single pane of glass for changes, and taking action when changes occur.

• Importance of Robust Evaluation: Effective evaluation is crucial for ensuring the reliability and accuracy of machine learning models. It helps in identifying discrepancies or anomalies in the model's decision-making process and provides insights into the model's performance.

XII. MACHINE LEARNING MODELS

• Model Security: Machine learning models are the core of AI systems, and their security is crucial to ensure the integrity and reliability of the system. The section discusses various risks associated with machine learning models and provides mitigation controls for each risk.

• Backdoor Machine Learning/Trojaned Model: This risk involves an attacker embedding a backdoor in the model during training, which can be exploited later to manipulate the model's behavior. Mitigation controls include monitoring model performance, using robust training data, and implementing access controls.

• Model Asset Leak: This risk involves the unauthorized disclosure of model assets, such as model architecture, weights, and training data. Mitigation controls include encryption, access control, and monitoring for unauthorized access.

• ML Supply Chain Vulnerabilities: This risk arises from vulnerabilities in the ML supply chain, such as third-party libraries and dependencies. Mitigation controls include regular vulnerability assessments, using trusted sources, and implementing secure development practices.

• Source Code Control Attack: This risk involves an attacker gaining unauthorized access to the source code repository and modifying the code to introduce vulnerabilities or backdoors. Mitigation controls include access control, code review, and monitoring for unauthorized access.

• Model Attribution: This risk involves the unauthorized use of a model without proper attribution to its original creators. Mitigation controls include using digital watermarking, maintaining proper documentation, and enforcing licensing agreements.

• Model Theft: This risk involves an attacker stealing a model by reverse-engineering its behavior or directly accessing its code. Mitigation controls include
encryption, access control, and monitoring for unauthorized access.

• Model Lifecycle without HITL: This risk arises from the lack of human-in-the-loop (HITL) involvement in the model lifecycle, which can lead to biased or incorrect predictions. Mitigation controls include regular model validation, human review, and continuous monitoring.

• Model Inversion: This risk involves an attacker inferring sensitive information about the training data by analyzing the model's behavior. Mitigation controls include using differential privacy, access control, and monitoring for unauthorized access.

XIII. MODEL MANAGEMENT

• Model Management Overview: Model management is the process of organizing, tracking, and maintaining machine learning models throughout their lifecycle, from development to deployment and retirement.

• Security Risks: The section outlines various security risks associated with model management, including model attribution, model theft, model lifecycle without human-in-the-loop (HITL), and model inversion.

• Model Attribution: This risk involves the unauthorized use of a model without proper attribution to its original creators. Mitigation controls include using digital watermarking, maintaining proper documentation, and enforcing licensing agreements.

• Model Theft: This risk involves an attacker stealing a model by reverse-engineering its behavior or directly accessing its code. Mitigation controls include encryption, access control, and monitoring for unauthorized access.

• Model Lifecycle without HITL: This risk arises from the lack of human-in-the-loop (HITL) involvement in the model lifecycle, which can lead to biased or incorrect predictions. Mitigation controls include regular model validation, human review, and continuous monitoring.

• Model Inversion: This risk involves an attacker inferring sensitive information about the training data by analyzing the model's behavior. Mitigation controls include using differential privacy, access control, and monitoring for unauthorized access.

• Mitigation Controls: For each identified risk, the DASF provides detailed mitigation controls. These controls include the use of Single Sign-On (SSO) with Identity Providers (IdP), Multi-Factor Authentication (MFA), IP access lists, private links, and data encryption to enhance the security of model management.

• Comprehensive Risk Management: The section emphasizes the importance of a comprehensive approach to managing model security, from initial development to the deployment and retirement of machine learning models. This includes regular audits, updates to security protocols, and continuous monitoring of model integrity.

XIV. MODEL SERVING AND INFERENCE REQUESTS

• Model Serving: Model serving is the process of deploying a trained machine learning model in a production environment to generate predictions on new data.

• Inference Requests: Inference requests are the input data sent to the deployed model for generating predictions.

• Security Risks: The section outlines various security risks associated with model serving and inference requests, including prompt injection, model inversion, model breakout, looped input, inferring training data membership, discovering ML model ontology, denial of service (DoS), LLM hallucinations, input resource control, and accidental exposure of unauthorized data to models.

• Prompt Injection: This risk involves an attacker injecting malicious input into the model to manipulate its behavior or extract sensitive information.

• Model Inversion: This risk involves an attacker attempting to reconstruct the original training data or sensitive features by observing the model's output.

• Model Breakout: This risk involves an attacker exploiting vulnerabilities in the model serving environment to gain unauthorized access to the underlying system or data.

• Looped Input: This risk involves an attacker submitting repeated or looped input to the model to cause resource exhaustion or degrade the system's performance.

• Inferring Training Data Membership: This risk involves an attacker attempting to determine whether a specific data point was used in the model's training data.

• Discovering ML Model Ontology: This risk involves an attacker attempting to extract information about the model's internal structure or functionality.

• Denial of Service (DoS): This risk involves an attacker submitting a large volume of inference requests to overwhelm the model serving infrastructure and cause service disruption.

• LLM Hallucinations: This risk involves the model generating incorrect or misleading output due to the inherent uncertainty or limitations of the underlying algorithms.

• Input Resource Control: This risk involves an attacker manipulating the input data to consume excessive resources during the inference process.

• Accidental Exposure of Unauthorized Data to Models: This risk involves unintentionally exposing sensitive or unauthorized data to the model during the inference process.
XV. MODEL SERVING AND INFERENCE RESPONSE

• Model Serving: Model serving is the process of deploying a trained machine learning model in a production environment to generate predictions on new data.

• Inference Response: Inference response refers to the output generated by the deployed model in response to the input data sent for prediction.

• Security Risks: The section outlines various security risks associated with model serving and inference response, including lack of audit and monitoring of inference quality, output manipulation, discovering ML model ontology, discovering ML model family, and black-box attacks.

• Lack of Audit and Monitoring of Inference Quality: This risk involves the absence of proper monitoring and auditing mechanisms to ensure the quality and accuracy of the model's predictions.

• Output Manipulation: This risk involves an attacker manipulating the model's output to cause incorrect or misleading predictions.

• Discovering ML Model Ontology: This risk involves an attacker attempting to extract information about the model's internal structure or functionality by analyzing the output.

• Discovering ML Model Family: This risk involves an attacker attempting to identify the specific type or family of the model used in the system by analyzing the output.

• Black-Box Attacks: This risk involves an attacker exploiting the model's vulnerabilities by treating it as a black box and manipulating the input data to generate desired outputs.

• Mitigation Controls: For each identified risk, the DASF provides detailed mitigation controls. These controls include the use of Single Sign-On (SSO) with Identity Providers (IdP), Multi-Factor Authentication (MFA), IP access lists, private links, and data encryption to enhance the security of model serving and inference response.

XVI. MACHINE LEARNING OPERATIONS (MLOPS)

• MLOps Definition: MLOps is the practice of combining Machine Learning (ML), DevOps, and Data Engineering to automate and standardize the process of deploying, maintaining, and updating ML models in production environments.

• Security Risks: The section outlines various security risks associated with MLOps, including lack of MLOps, lack of repeatable enforced standards, and lack of compliance.

• Lack of MLOps: This risk involves the absence of a standardized and automated process for deploying, maintaining, and updating ML models, which can lead to inconsistencies, errors, and security vulnerabilities.

• Repeatable Enforced Standards: Enforcing repeatable standards is crucial for ensuring the security and reliability of ML models in production environments. This includes implementing version control, automated testing, and continuous integration and deployment (CI/CD) pipelines.

• Lack of Compliance: This risk involves the failure to comply with relevant regulations and industry standards, which can result in legal and financial consequences for the organization.

• Mitigation Controls: For each identified risk, the DASF provides detailed mitigation controls. These controls include the use of Single Sign-On (SSO) with Identity Providers (IdP), Multi-Factor Authentication (MFA), IP access lists, private links, and data encryption to enhance the security of MLOps.

XVII. DATA AND AI PLATFORM SECURITY

• Inherent Risks and Rewards: The choice of platform used for building and deploying AI models carries inherent risks and rewards. Real-world evidence suggests that attackers often use simple tactics to compromise ML-driven systems.

• Lack of Incident Response: AI/ML applications are mission-critical for businesses, and platform vendors must address security issues quickly and effectively. A combination of automated monitoring and manual analysis is recommended to address general and ML-specific threats (DASF 39 Platform security — Incident Response Team).

• Unauthorized Privileged Access: Malicious internal actors, such as employees or contractors, can pose a significant security threat. They might gain unauthorized access to private training data or ML models, leading to data breaches, leakage of sensitive information, business process abuses, and potential sabotage of ML systems. Implementing stringent internal security measures and monitoring protocols is crucial to mitigating insider risks (DASF 40 Platform security — Internal access).

• Poor Security in the Software Development Lifecycle (SDLC): Software platform security is an important part of any progressive security program. Hackers often exploit bugs in the platform where AI is built; the security of AI depends on the security of the platform itself (DASF 41 Platform security — secure SDLC).

• Lack of Compliance: As AI applications become more prevalent, they are increasingly subject to scrutiny and regulations such as GDPR and CCPA. Utilizing a compliance-certified platform can be a significant advantage for organizations, as these platforms are specifically designed to meet regulatory standards and provide essential tools and resources to help organizations build and deploy AI applications that comply with these laws.
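The "lack of audit and monitoring of inference quality" risk in section XV is commonly addressed by wrapping the serving endpoint so that every prediction leaves a structured audit record. A minimal sketch of that pattern (the wrapper, the field names, and `predict_fn` are illustrative assumptions, not a Databricks API):

```python
import hashlib
import json
import time

def audited_predict(predict_fn, model_version: str, user_id: str, payload: dict, log: list):
    """Call the model and append a structured audit record for later review.

    The input is stored as a SHA-256 hash rather than verbatim, so the audit
    trail itself does not become a source of sensitive-data exposure.
    """
    input_hash = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode()
    ).hexdigest()
    result = predict_fn(payload)
    log.append({
        "ts": time.time(),            # when the prediction was served
        "model_version": model_version,  # which model produced it
        "user": user_id,              # who requested it
        "input_sha256": input_hash,   # tamper-evident reference to the input
        "output": result,             # what was returned
    })
    return result
```

Usage is a drop-in substitution at the call site, e.g. `audited_predict(model.predict, "v1.2", caller_id, payload, audit_log)`; in a real deployment the list would be replaced by an append-only log store.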
XVIII. DATABRICKS DATA INTELLIGENCE PLATFORM

The Databricks Data Intelligence Platform is a comprehensive solution for AI and data management.

• Mosaic AI: This component of the platform covers the end-to-end AI workflow, from data preparation to model deployment and monitoring.

• Databricks Unity Catalog: This is a unified governance solution for data and AI assets. It provides data discovery, data lineage, and fine-grained access control.

• Databricks Platform Architecture: The platform architecture is a hybrid PaaS that is data-agnostic, supporting a wide range of data types and sources.

• Databricks Platform Security: The security of the platform is based on trust, technology, and transparency principles. It includes features like encryption, access control, and monitoring.

• AI Risk Mitigation Controls: Databricks has identified 55 technical security risks across 12 foundational components of a generic data-centric AI system. For each risk, the platform provides a guide to the AI and ML mitigation control, its shared responsibility between Databricks and the organization, and the associated Databricks technical documentation.

XIX. DATABRICKS AI RISK MITIGATION CONTROLS

• Databricks AI Risk Mitigation Controls: Databricks has identified 55 technical security risks across 12 foundational components of a generic data-centric AI system. For each risk, the DASF provides a guide to the AI and ML mitigation control, its shared responsibility between Databricks and the organization, and the associated Databricks technical documentation.

• Shared Responsibility: The responsibility for implementing the mitigation controls is shared between Databricks and the organization using the platform. Databricks provides the tools and resources needed to implement the controls, while the organization is responsible for configuring and managing them according to their specific needs.

• Comprehensive Approach: The Databricks AI risk mitigation controls cover a wide range of security risks, from data security and access control to model deployment and monitoring. This comprehensive approach helps organizations reduce overall risk in their AI system development and deployment processes.

• Applicability: The Databricks AI risk mitigation controls are applicable to all types of AI models, including predictive ML models, generative AI models, and external models. This ensures that organizations can implement the appropriate controls based on the specific AI models they are using.

• Effort Estimation: Each control is tagged as "Out-of-the-box," "Configuration," or "Implementation," helping teams estimate the effort involved in implementing the control on the Databricks Data Intelligence Platform. This allows organizations to prioritize their security efforts and allocate resources effectively.
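The effort tags described above lend themselves to simple triage tooling: group controls by tag so "Out-of-the-box" items can be verified first and "Implementation" items budgeted as projects. A sketch under that assumption; the control names and tags below are illustrative examples, not the official DASF control list.

```python
from collections import defaultdict

# Illustrative subset of a control inventory; names and tags are examples only.
CONTROLS = [
    {"id": "SSO with IdP", "effort": "Configuration"},
    {"id": "Data encryption at rest", "effort": "Out-of-the-box"},
    {"id": "IP access lists", "effort": "Configuration"},
    {"id": "Custom inference audit pipeline", "effort": "Implementation"},
]

def group_by_effort(controls):
    """Bucket mitigation controls by effort tag to support prioritization."""
    buckets = defaultdict(list)
    for control in controls:
        buckets[control["effort"]].append(control["id"])
    return dict(buckets)
```

Running `group_by_effort(CONTROLS)` yields one bucket per effort level, which maps directly onto the prioritization workflow the DASF suggests.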