CDMP Fundamentals Notes
Introduction to the CDMP Fundamentals Exam
Thank you for your purchase of the CDMP Fundamentals Notes. We hope you find this guide useful in your
Data Strategy journey.
All CDMP exams are based on the Data Management Body of Knowledge (DMBOK). The CDMP
Fundamentals Notes walks you through the first 14 chapters of the DMBOK at the level of detail required to
serve as an aid to your studying, helping you become familiar with these concepts more quickly.
The Fundamentals Exam is required for all CDMP certification levels. It consists of 100 questions that you
will have 90 minutes¹ to answer. In addition to these Notes, Data Strategy Professionals offers a Study Plan
that may assist with your preparation. It’s available as an email series sent over the course of 90 days
or all at once through the ‘immediate access’ option. You may also be interested in the Guided Study
Sessions on each DMBOK chapter offered as part of the Community & Events Membership.
After completing the CDMP Fundamentals Exam, you’ll receive your score immediately. If you scored 80%
or above, congrats, you’re done! If you scored 60-69%, you’re set for the Associate certification, but you’ll
need to retake the exam if you want to proceed to the Practitioner (at least 70%) and/or Master level (at
least 80%).
Beyond helping you ace the Fundamentals Exam, the CDMP Fundamentals Notes are also useful for the
Specialist Exams. Specialist Exams are a deep dive on a specific chapter of the DMBOK. After the
Fundamentals Exam, you must take two Specialist Exams in order to gain recognition at the Practitioner or
Master level. There are seven options for the Specialist Exams, and Data Strategy Professionals offers a
guide to each one.
In addition to reading (or thoroughly skimming) the DMBOK before you take the Fundamentals and/or
Specialist Exams, you may choose to study additional reading materials. We have a list of recommended
reading on our website.
This study guide is not a replacement for the DMBOK. We still recommend the purchase of this book and
its use on the CDMP Fundamentals Exam and CDMP Specialist Exams. Note that only one book can be
used on the CDMP Exam (we strongly recommend the DMBOK). You can use either the hardcopy or
electronic version, but not both.
In terms of choosing which version to use, some members of the CDMP Study Group have enjoyed the
ability to use ctrl + f to find information in the ebook during the exam. Either way, you’re encouraged to
take notes, highlight, and put sticky notes in your copy of the book.
Because CDMP exams are now either open book or open notes, if you choose to use the DMBOK as your one
book, you cannot also use this document as a reference during the test.
¹ If you purchase the English as a Second Language (ESL) version of the exam, you’ll have 110 minutes to complete it.
There’s no downside to taking the ESL version, so you should definitely take it if English is not your native language.
Data Management
chapter 1 | page 17 | 2% of exam
Summary: An organization controls how it obtains and creates data. If data is viewed as an asset as well
as a potential source of risk, then better decisions will be made throughout the data lifecycle. Data
Management requires a collaborative approach to governance, architecture, modeling, and other functions.
Notes:
More data exists today than at any time in history, and understanding how to use data is key to an
organization’s success. Data Management allows an organization to capitalize on its competitive
advantage.
Data Management refers to the development, execution, and supervision of plans, policies, programs, and
practices that deliver, control, protect, and enhance the value of data and information assets throughout the
data lifecycle
Here are some paradoxical, quasi-mystical statements from the DMBOK about the nature of data:
— Data is both an interpretation of the objects it represents and an object that must be interpreted
— Data is not consumed when it is used
— Data begets more data
— Data is easy to copy and transport, but it is not easy to reproduce if it is lost or destroyed
— Data can be stolen without being gone
Data Management is made more complicated by the fact that different types of data have different lifecycle
management requirements
Different departments may have different ways of representing the same concept – subtle or blatant
differences can create significant challenges in managing data. This challenge is covered in Data
Integration & Interoperability (ch. 8).
Reliable metadata (i.e., data about data) is required to manage the organization's data assets; types of
metadata include:
— Business metadata
— Technical metadata
— Operational metadata
— Data Architecture metadata
— Data models
— Data security requirements
— Data integration standards
— Data operations processes
Data Ethics
chapter 2 | page 49 | 2% of exam
Summary: Data Ethics refers to a set of standards to manage data assets properly. It includes data integrity
standards and data privacy practices.
Notes:
Ethical principles for Data Management stem from the Belmont Principles (1979):
1. Respect for Persons
2. Beneficence
3. Justice
In 2015, the European Data Protection Supervisor (EDPS) set out an opinion on digital ethics, specifically
focused on Big Data. It called for:
— Future-oriented regulation of data processing and respect for the rights to privacy and to data protection
— Accountable controllers who determine personal information processing
— Privacy conscious engineering and design of data processing products and services
— Empowered individuals
The European Union set forth the General Data Protection Regulation (GDPR) principles in 2016 and
implemented the groundbreaking regulation in 2018. The principles of GDPR are as follows:
— Fairness, Lawfulness, Transparency
— Purpose Limitation
— Data Minimization
— Accuracy
— Storage Limitation
— Integrity and Confidentiality
— Accountability
Data Governance
chapter 3 | page 67 | 11% of exam
Summary: Data Governance refers to the exercise of authority and control (planning, monitoring, and
enforcement) over the management of data assets. It provides direction and oversight for Data
Management by establishing a system of decision rights over data that accounts for the needs of the
organization.
Notes:
Data Governance should include strategy, policy, standards and quality, oversight, compliance, issue
management, Data Management projects, and data asset valuation. Change management is required for
success.
What makes for an effective Data Governance process?
— Sustainable: given that governance is a process and not a project with a defined end, it must be "sticky"
— Embedded: efforts need to be incorporated into existing aspects of the organization, including software
development methods, data ownership, master data management, and risk management
— Measured: employing governance has a positive financial impact, but articulating this benefit requires
understanding the baseline and capturing measurable improvements
Generally Accepted Information Principles can be used to calculate the value of information as an asset:
1. Accountability
2. Asset
3. Audit
4. Due Diligence
5. Going Concern
6. Level of Valuation
7. Liability
8. Quality
9. Risk
10. Value
Data Stewards should create a business glossary that associates important metadata (such as synonyms,
metrics, lineage, business rules, owner, etc.) with business terms
Given that it is a process and not a project with a defined end date, implementing Data Governance requires
flexibility, such as…
— Updates to the mapping between Data Governance outcomes and business needs
— Continual adjustments to roadmap for creating Data Governance
— Ongoing enhancement of the business case for Data Governance
— Frequent assessment of Data Governance metrics
Data Governance also supports regulatory compliance; it requires working with business and technical
leadership to find the best answers to a standard set of regulatory compliance questions (such as how, why,
when, etc.)
Primary focus of Data Governance must be on improving data (quality, accessibility, security, privacy,
retention, etc.) over time, not simply monitoring for issues
Data Architecture
chapter 4 | page 97 | 6% of exam
Summary: Data Architecture presents data assets in a structured and easy-to-understand format. It defines
the blueprint for managing data assets in alignment with organizational strategy through the establishment
of strategic data requirements and the development of architectural designs to meet these requirements.
Notes:
Most organizations have more data than a single individual can comprehend — therefore, it’s necessary to
be able to represent data at various levels of abstraction so that it can be understood for decision-making
purposes
Data flow design is a diagram that defines requirements and master blueprint for storage and processing
across databases, applications, platforms, and networks; it illustrates where data originated, where it’s
stored and used, and how it is transformed as it moves inside and between diverse processes and systems
CRUD stands for create, read, update, and delete, the four basic operations of persistent storage
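A minimal sketch of the four operations using Python's built-in sqlite3 module; the table and column names are illustrative only, not from the DMBOK:

```python
# CRUD against an in-memory SQLite database (illustrative table name)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO customer (name) VALUES (?)", ("Acme Corp",))        # Create
rows = conn.execute("SELECT id, name FROM customer").fetchall()               # Read
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Acme Inc", 1))    # Update
conn.execute("DELETE FROM customer WHERE id = ?", (1,))                       # Delete
conn.commit()
```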
Operational data enables up-time services for internet of things (IOT) objects like manufacturing equipment,
healthcare equipment, and consumer goods
Outcomes:
— More accurate project data requirements
— Review project data designs
— Determine data lineage impact
— Data replication control
— Enforce data architecture standards
— Guide data technology and renewal decisions
Roadmap covers 3-5 years of the development path; move from least dependent at top of diagram to most
dependent at bottom
Agile methodology requires learning, constructing, and testing in discrete delivery packages (called
“sprints”) that are small enough that if work needs to be discarded, not much is lost
Lifecycle stages:
— Current
— Deployment period
— Strategic period
— Retirement
— Preferred
— Containment
— Emerging
— Reviewed
Metrics:
— Architecture standard compliance rate
— Implementation trends:
   — Use / reuse / replace / retire measurements
   — Project execution efficiency measurements
— Business value measurements:
   — Business agility improvements
   — Business quality
   — Business operation quality
   — Business environment improvements
Data Modeling & Design
chapter 5 | page 121 | 11% of exam
Summary: Data Modeling is the process of taking unstructured information and turning it into structured
information through the creation of Conceptual, Logical, and Physical Data Models.
Notes:
Move from Conceptual (taxonomy) to Logical (entity-relationship diagram) to Physical Model (plan for
storage and operations)
Once database architecture is complete, compare a reverse-engineered version of the Physical Model to the
Conceptual Model to ensure initial business requirements have been met
Data modeling schemes include Relational, Dimensional, Object-Oriented, Fact-Based, Time-Based, and NoSQL
Mid-size to large organizations usually have an application landscape with multiple schemes and models that
have evolved over time
Cardinality refers to the relationships between the data in two database tables; defines how many instances
of one entity are related to instances of another entity (e.g., zero, one, many)
Unary relationship involves only one entity (i.e., multiple instances of the same type); for example, the
relationship between a pre-requisite and an academic course (both are courses); it is also known as a
recursive or self-referencing relationship
When diagramming, square-cornered boxes are used to represent independent entities (primary key contains only
the entity's own attributes), and round-cornered boxes are used for dependent entities (primary key includes a
foreign key from another entity)
Construction keys:
— Surrogate: simple counter that provides unique id within a table
— Compound: 2+ attributes to uniquely id instance (e.g., phone number is composed of area code +
exchange + local number)
— Composite: contains one compound key plus at least one other simple or compound key (or non-key attribute)
Function keys:
— Super key: any set of attributes that uniquely id an entity instance
— Candidate key: a minimal set of one or more attributes that id entity instance
— Natural key: business key
— Primary key: candidate key chosen as unique id (versus alternate keys)
Often, primary key is surrogate key, and alternate keys are business keys
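To make the key types concrete, here is a rough sketch using Python's built-in sqlite3 module; the tables and columns are made-up examples showing a surrogate primary key, a natural/business key kept as an alternate (unique) key, and a compound key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- surrogate key (simple counter, no business meaning)
    tax_number  TEXT NOT NULL UNIQUE   -- natural/business key kept as an alternate key
);
CREATE TABLE phone (
    area_code    TEXT NOT NULL,
    exchange     TEXT NOT NULL,
    local_number TEXT NOT NULL,
    PRIMARY KEY (area_code, exchange, local_number)   -- compound key
);
""")
```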
When primary key of parent is migrated as foreign key to child’s primary key, this is known as the identifying
relationship
Domain refers to the complete set of possible values an attribute can be assigned; it can be restricted with
constraints (i.e., rules on format and/or logic), and is typically characterized in the following ways:
— Data type (e.g., Character(30))
— Data format (e.g., phone number template)
— List
— Range
— Rule-based (e.g., ItemPrice > ItemCost)
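A small validation sketch corresponding to the characterizations above; the field names, formats, and thresholds are assumptions for illustration only:

```python
import re

def in_domain(record: dict) -> bool:
    """Return True if every attribute falls within its defined domain."""
    checks = [
        isinstance(record["name"], str) and len(record["name"]) <= 30,    # data type / length
        re.fullmatch(r"\d{3}-\d{3}-\d{4}", record["phone"]) is not None,  # format (phone template)
        record["status"] in {"ACTIVE", "INACTIVE", "PENDING"},            # list of values
        0 <= record["discount_pct"] <= 100,                               # range
        record["item_price"] > record["item_cost"],                       # rule-based
    ]
    return all(checks)

print(in_domain({"name": "Widget", "phone": "555-123-4567", "status": "ACTIVE",
                 "discount_pct": 10, "item_price": 12.0, "item_cost": 7.5}))  # True
```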
Denormalize to…
— Combine data and avoid run-time joins
— Create smaller, pre-filtered copies of data to reduce table scans of large tables
— Pre-calculate and store expensive calcs
Snowflaking refers to the normalization of a dimensional model; this is not recommended because it
degrades performance
In Kimball data model, conformed facts / dimensions are built with entire organization’s needs in mind; they
are standardized definitions, and can be used across data marts
Fact-based modeling is based on forming plausible sentences a business person might use; it’s also referred
to as Object-Role Modeling (ORM)
Data vault is a type of time-based model in which a normalized data store is composed of hubs, links, and
satellites
Partitioning refers to splitting a table to facilitate archiving or to improve retrieval performance – can be
either vertical or horizontal
A canonical model is used for data in motion between systems; it describes the structure that sending and
receiving services should use, as described in more detail in Data Integration and Interoperability (ch. 8)
Data Storage & Operations
chapter 6 | page 165 | 6% of exam
Summary: The focus of Data Storage & Operations is to maintain data integrity and ensure availability
throughout the operational lifecycle. Data Storage encapsulates the design, implementation, and support of
stored data to maximize its value. Data Operations provides support throughout the data lifecycle from
planning for collection to designing appropriate strategies for the disposal of data.
Notes:
Goals for Data Storage & Operations:
— Manage availability of data through lifecycle
— Ensure integrity
— Manage performance of data transactions
Best practices:
— Automation opportunities
— Build with reuse in mind
— Connect database standards to support requirements
— Set expectations for database administrators (DBAs) in project work
Information lifecycle:
— Plan: governance, policies, procedures
— Specify: architecture (conceptual, logical, physical modeling)
— Enable: install / provision servers, networks, storage, databases; put access controls into place
— Create & Acquire
— Maintain & Use: validate, edit, cleanse, transform, review, report, analyze
— Archive & Retrieve
— Purge
Activities:
— Database support
— Performance tuning, monitoring, error reporting
— Failover for 24/7 data availability
— Backup and recovery
— Archiving
— Data technology management
Database support:
— Implement and control database environment
— Acquire externally sourced data (optional); metadata very important
— Marketing and demographics
— Industry standards
— Elections data
— Geographic / geospatial data (e.g., images, infrared, etc.)
— Dun & Bradstreet company hierarchies
— Linkage refers to the relationship between different active business entities or specific sites
within a corporate family
— Linkage occurs in Dun & Bradstreet’s database when one business location has financial and
legal responsibility for another business location
— Percentage of financial and legal responsibility determines the type of linkage relationship
— 19.7M active records in Dun & Bradstreet’s global database
— Plan for data recovery: backup and recover data
— Database backup schedule
— Maintain logs
— Provide continuity of data to the organization
— Set database performance service levels
— Monitor and tune database performance
— Archive, retrieve, and purge data
— Test that archives can be retrieved
— Just clicking delete isn’t sufficient; need to follow processes to ensure data is actually removed
— Manage specialized databases
— Geospatial, graph, computer-aided design (CAD), Extensible Markup Language (XML), object etc.
CAP theorem (i.e., Brewer’s theorem) states that a distributed data store can provide at most two of the following three guarantees:
— Consistency
— Availability
— Partition tolerance
Open Database Connectivity (ODBC) is an application programming interface (API) that enables database
abstraction
Clustering refers to the practice of connecting more than one server or instance to a single
database
Columnar storage reduces input / output (I/O) bandwidth by storing column data using compression
Environments:
— Prod
— Pre-Prod
— Test
— Quality Assurance (QA)
— Integration
— User Acceptance Testing (UAT) with realistic use cases
— Performance: high volume / high complexity
— Development
Sandbox allows only a read-only connection to production; it is used for experiments by users (not DBAs)
Replication through:
— Mirroring: updates to the primary database are replicated immediately to the secondary database as
part of a two-phase commit process
— Log shipping: a secondary server receives and applies copies of the primary database transaction logs
at regular intervals
Sharding refers to the process by which small chunks of the database are isolated so that they can be
updated independently of other shards; because each shard is small, replication is merely a file copy
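A minimal sketch of hash-based shard routing; the shard count and routing key are assumptions for illustration, not a description of any particular product:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(customer_id: str) -> int:
    # Hash the key and map it to a shard; the same key always lands on the same shard,
    # so each shard can be updated independently of the others
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("CUST-1001"))  # always returns the same shard number for this key
```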
Physical Data Model includes storage objects, indexing objects, and any encapsulated code objects
required to enforce data quality rules, connect database objects, and achieve performance
Data Security
chapter 7 | page 209 | 6% of exam
Summary: Data Security refers to the set of policies and procedures designed to reduce legal and/or
financial risks and to grow and protect the business. It ensures that data privacy and confidentiality are
maintained, that data is not breached, and that data is accessed appropriately.
Notes:
Security motivations:
— Protect stakeholders (e.g., clients, patients, employees, suppliers, partners, etc.)
— Comply with government regulations
— Protect proprietary business concerns (to protect business competitive advantage)
— Provide legitimate data access
— Meet contractual obligations (e.g., Payment Card Industry [PCI] Standard mandates encryption of user
passwords, etc.)
Steps:
— Identify and classify sensitive data assets depending on industry and organization
— Locate sensitive data throughout the enterprise
— Determine how each asset needs to be protected
— Identify how information interacts with business processes
Large businesses may have a Chief Information Security Officer (CISO) who reports to CIO or CEO
National Institute of Standards and Technology (NIST) provides a Risk Management Framework that
categorizes all enterprise information to locate sensitive info
Sarbanes-Oxley regulations are mostly concerned with protecting financial information integrity by
identifying rules for how financial information can be created and edited.
Methods of encryption:
— Hash: algorithm that converts data into a fixed-length mathematical representation (digest); the original
data cannot be recovered from the hash
— Symmetric / private-key: sender and recipient share the same key to read the original data
— Public-key: sender and recipient have different keys; sender uses public key that is freely available and
receiver uses a private key to reveal original data (e.g., for a clearinghouse)
— Obfuscation or masking
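A short sketch contrasting hashing (one-way) with symmetric encryption (reversible with the shared key); the symmetric example assumes the third-party Python 'cryptography' package:

```python
import hashlib
from cryptography.fernet import Fernet  # third-party package assumption

# Hash: one-way digest; the original value cannot be recovered from it
digest = hashlib.sha256(b"my password").hexdigest()

# Symmetric / private-key: sender and recipient share the same key
key = Fernet.generate_key()
cipher = Fernet(key)
token = cipher.encrypt(b"confidential payload")
assert cipher.decrypt(token) == b"confidential payload"
```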
Backdoor refers to an overlooked or hidden entry into a computer system or application (e.g., accidentally
keeping default password)
Bot / zombie is a workstation that’s been taken over by a trojan, virus, or phishing attack
Cookie refers to a small data file that a website installs on a computer’s hard drive to id returning visitors and
store their preferences; often used for Internet commerce
Firewall is software or hardware that filters network traffic to protect an individual computer or an entire
network from unauthorized attempts to access or attack the system; may scan both incoming and outgoing
communications for restricted or regulated info to prevent it from passing without permission (i.e., data loss
prevention)
Demilitarized Zone (DMZ) is an area between two firewalls that is used to pass or temporarily store data
between organizations
Super User Account has administrator or root access to a system to be used only in an emergency
Virtual private network (VPN) connection creates an encrypted tunnel into organization’s environment,
allowing communication between users and internal network
Confidentiality levels:
— For general audiences
— Internal use only
— Confidential
— Restricted confidential
— Registered confidential
Family Educational Rights and Privacy Act (FERPA) protects educational records
Vulnerabilities:
— Abuse of excessive privilege: user with privileges that exceed the requirements of their job
— Query-level access control restricts database privileges to minimum-required SQL operations and
data (triggers, row-level security, table security, views)
— Abuse of legitimate privilege (e.g., healthcare worker prying into patient records)
— Typically, apps restrict viewer to accessing one record at a time
— Unauthorized privilege elevation (i.e., taking on privileges of administrator)
— Vulnerabilities may occur in stored procedures, built-in functions, protocol implementations, and SQL
statements
— Intrusion Prevention Systems (IPS)
— Query-level access control intrusion prevention
— Inspect database traffic to id patterns that correspond to known vulnerabilities
— Service accounts (i.e., batch IDs for specific processes) and shared accounts (i.e., generic IDs created
when an app can’t handle total user accounts) create risk of data security breach, complicating ability to
trace breach to source
— Platform intrusion attacks
— Software updates (i.e., patches)
— Implementation of Intrusion Prevention System (IPS) and Intrusion Detection System (IDS)
— SQL injection attack: attacker inserts unauthorized statements into a vulnerable SQL data channel (e.g.,
stored procedures or web application input fields); execution in the database gives the attacker
unrestricted access to the database
— To prevent, sanitize all inputs before passing them to the server (see the parameterized-query sketch after this list)
— Change default passwords
— Encrypt backup data
— Social engineering / phishing
— Malware: malicious software (including viruses, worms, spyware, key loggers, adware)
— Adware: installed via downloads; captures buying behaviors to sell to marketing firms or to enable id theft
— Spyware: can store credit card info, etc.
— Trojan horse: destructive program disguised as a benign application
— Virus: attaches to an executable or vulnerable app
— Worm: built to reproduce and spread across network to send out a continuous stream of infected
messages
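As referenced in the SQL injection item above, here is a minimal sketch of input sanitization via parameterized queries, using Python's built-in sqlite3 module; the table and query are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"   # a classic injection attempt

# Unsafe: string concatenation would let the attacker's SQL reach the database
# query = "SELECT role FROM users WHERE username = '" + user_input + "'"

# Safe: the driver binds the value as data, never as executable SQL
rows = conn.execute("SELECT role FROM users WHERE username = ?", (user_input,)).fetchall()
print(rows)  # [] -- the injection string matches no user
```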
Data Integration & Interoperability
chapter 8 | page 257 | 6% of exam
Summary: Data Integration refers to the process of merging data from various datasets into unified data
using both technical and business processes. This process improves communication and efficiency in an
organization. Data Interoperability refers to the process of designing data systems so that data will be easy
to integrate. These fields involve processes related to the movement and consolidation of data within and
between data stores, applications, and organizations.
Notes:
Fundamental concepts:
— Integration: data exchange; process of sending and receiving data
— Interoperability: data sharing; includes metadata
— Hub: system of record
The Mars Climate Orbiter (1999) represents an example of Data Integration gone wrong. The mission failed
due to a navigation error caused by a failure to translate English units to metric. Commands from Earth
were sent in English units (in this case, pound-seconds) without being converted into the metric standard
(Newton-seconds).
Extract Transform Load (ETL), Extract Load Transform (ELT), and Change Data Capture (CDC) are approaches to
Data Integration:
— ETL and ELT are about batch distribution: scheduling, parallel processing, complex data transformation,
cross-referencing, and data mapping
— Typically run overnight
— CDC is event-driven and delivers real-time incremental replication
— Data moves from database to database
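A toy batch ETL sketch (extract, transform, load) using only the Python standard library; the table names and the unit-conversion transform (echoing the Mars Climate Orbiter example above) are assumptions for illustration:

```python
import sqlite3

# Extract: pretend these rows came from a source system
source_rows = [
    {"burn_id": 1, "impulse_lbf_s": 10.0},
    {"burn_id": 2, "impulse_lbf_s": 4.5},
]

LBF_S_TO_N_S = 4.44822  # 1 pound-force second is about 4.44822 newton-seconds

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE burns (burn_id INTEGER, impulse_n_s REAL)")

for row in source_rows:
    impulse_n_s = row["impulse_lbf_s"] * LBF_S_TO_N_S            # Transform: convert units
    target.execute("INSERT INTO burns VALUES (?, ?)",            # Load into the target store
                   (row["burn_id"], impulse_n_s))
target.commit()
```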
Document & Content Management
chapter 9 | page 287 | 6% of exam
Summary: Document & Content Management promotes efficient asset retrieval from various platforms and
systems, ensures the availability of semi-structured and unstructured data assets, and aids in compliance
and audit practices. This field includes planning, implementation, and control activities used to manage the
lifecycle of data and information in a range of unstructured media, especially documents needed to support
legal and regulatory compliance requirements.
Notes:
Controlled vocabularies are a type of Reference Data (ch. 10) and records are a subset of documents
Record refers to evidence that actions were taken and decisions were made in keeping with procedures
Information architecture:
— Controlled vocabularies
— Taxonomies and ontologies
— Navigation maps
— Metadata maps
— Search functionality and specifications
— Use cases
— User flows
Semantic modeling is a type of knowledge modeling that describes a network of concepts and their
relationships
Policies:
— Scope and compliance with audits
— Identification and protection of records
— Purpose and schedule for retaining records
— How to respond to information hold orders
— Requirements for onsite and offsite storage
— Use and maintenance of hard drive and shared network drives
— Email management, addressed from content management perspective
— Proper destruction methods for records
Extensible Markup Language (XML) represents both structured and unstructured data (see the parsing sketch after this list)
— Resource Description Framework (RDF): standard model for data interchange on the web
— SPARQL: used for semantic querying
— Simple Knowledge Organization System (SKOS)
— OWL (W3C Web Ontology Language): vocabulary extension of RDF; used when information
contained in documents needs to be processed by application
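As noted above, a single XML document can carry both structured fields and unstructured text; a minimal parsing sketch using Python's standard library follows (the element names are made up for illustration):

```python
import xml.etree.ElementTree as ET

doc = """
<record>
  <title>Retention Policy</title>
  <owner>Records Management</owner>
  <body>Financial records are retained for seven years.</body>
</record>
"""

root = ET.fromstring(doc)
print(root.find("title").text)   # structured field: Retention Policy
print(root.find("body").text)    # unstructured free text
```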
E-discovery is the process of finding electronic records that might serve as evidence in a legal action
Master & Reference Data Management
chapter 10 | page 327 | 10% of exam
Summary: Master & Reference Data Management supports the organization of enterprise-level data through
ongoing reconciliation and maintenance of core critical shared data that is used to enable consistency
across systems. Master & Reference Data should represent the most accurate, timely, and relevant
information about essential business entities. As such, Master & Reference Data should be considered
infrastructure for the organization.
Notes:
Master Data Management (MDM) provides control over master values and identifiers that enable consistent
use across systems; it provides a single version of customers, accounts, materials, products, etc.
Reference Data Management provides control over defined domain values and their definitions, which may include:
— Codes and descriptions
— Classifications
— Mappings
— Hierarchies
Terms associated with Master Data Management: golden record, system of truth, master values
Terms associated with Reference Data: list of values, taxonomy, cross reference
Both Master Data and Reference Data are forms of Data Integration (ch. 8)
Golden Record combines data from multiple source systems through matching and merging processes to
formulate the final “record”
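A rough sketch of matching source records on a business key and merging them with a simple survivorship rule (most recently updated non-empty value wins); the field names and the rule are assumptions, not a prescribed method:

```python
from datetime import date

sources = [  # records describing the same customer in two source systems
    {"tax_id": "12-345", "name": "Acme Corp",        "phone": "",             "updated": date(2022, 1, 5)},
    {"tax_id": "12-345", "name": "Acme Corporation", "phone": "555-123-4567", "updated": date(2023, 6, 1)},
]

def golden_record(records):
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated"]):   # oldest first
        for field, value in rec.items():
            if value not in ("", None):
                merged[field] = value                          # later non-empty values win
    return merged

print(golden_record(sources))
# {'tax_id': '12-345', 'name': 'Acme Corporation', 'phone': '555-123-4567', 'updated': datetime.date(2023, 6, 1)}
```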
Metrics:
— Data Quality and compliance
— Data change activity
— Data ingestion and consumption
— Service level agreements (SLAs)
— Data Steward coverage
— Total cost of ownership (TCO)
— Data sharing volume and usage
Data Warehousing & Business Intelligence
chapter 11 | page 359 | 10% of exam
Summary: Data Warehousing & Business Intelligence involves the planning, implementation, and control
processes to manage decision support and to enable knowledge workers to get value from data through
analysis and reporting. The Data Warehouse stores data from various databases and supports strategic
decisions.
Notes:
Warehousing:
— Stores data from other systems
— Storage includes organization that increases value
— Makes data accessible and usable for analysis
Kimball’s Data Warehouse Bus represents shared or conformed dimensions unifying multiple datamarts
Update methods:
— Trickle feeds: source accumulation
— Messaging: bus accumulation
— Streaming: target accumulation
Implementation considerations:
— Conceptual data model
— Data Quality feedback loop
— End-to-end metadata
— End-to-end verifiable data lineage
Metrics:
— Usage
— Subject area coverage percentages
— Response and performance
Metadata Management
chapter 12 | page 393 | 11% of exam
Summary: Metadata Management refers to the process of ensuring the quality of metadata (i.e., data about
data). This work involves planning, implementation, and control activities to enable access to high quality,
integrated metadata, including definitions, models, data flows, and other information critical to
understanding data and the systems through which it is created, maintained, and accessed.
Notes:
Metadata is data about data:
— Info about technical and business processes
— Data rules and constraints
— Logical and physical data structures
Metadata describes the data itself as well as concepts the data represents, and connections between data
and concepts; it’s important for the organization to standardize access to metadata
Types of metadata:
— Business / Descriptive (e.g., title, owner, business area)
— Technical / Structural (e.g., number of rows / columns)
— Operational / Administrative (e.g., version number, archive date, service level agreement [SLA]
requirements)
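Illustrative only: one way to capture the three metadata types for a single dataset as a simple Python structure (the names and values are made up):

```python
dataset_metadata = {
    "business": {                    # descriptive
        "title": "Monthly Sales",
        "owner": "Finance",
        "business_area": "Revenue reporting",
    },
    "technical": {                   # structural
        "row_count": 120_000,
        "column_count": 14,
        "source_system": "ERP",
    },
    "operational": {                 # administrative
        "version": "2.3",
        "archive_date": "2025-01-31",
        "sla": "Refreshed by 06:00 daily",
    },
}
```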
Storing metadata:
— Metadata repo
— Business glossary
— Business intelligence
— Configuration management database (CMDB) for IT assets
— Data dictionary
— Data integration tools
— Database management / system catalogs
— Data mapping mgmt tools
— Data Quality tools
— Event messaging tools: move data between diverse systems
— Modeling tools
— Master & Reference Data repos
— Service registries: service-oriented architecture (SOA) perspective enables reuse of services
Activities:
— Readiness / risk assessment
— Cultural analysis and change management
— Create metadata governance
— Data lineage
— Impact analysis
— Apply tags when ingesting data into a data lake
Some less obvious business drivers of Metadata Management:
— Provide context to increase confidence
— Make it easier to identify redundant data and processes
— Prevent the use of out of date or improper data
— Reduce data-oriented research time
— Improve communications between data consumers and IT
— Create accurate impact analysis
— Reduce training costs associated with data use by improving documentation of data context, history,
and origin
— Support regulatory compliance
“Metadata guides the use of data assets. It supports business intelligence, business decisions, and
business semantics.”
Data Quality
chapter 13 | page 423 | 11% of exam
Summary: Data Quality assures that data is fit for consumption and meets the organization’s needs. Quality
management techniques should be applied in the planning, implementing, and controlling stages to
measure, assess, and improve the degree to which data is fit for use within an organization.
Notes:
Fundamental frameworks:
— Strong-Wang framework – Intrinsic, Contextual, Representational, Accessibility
— Shewhart / Deming cycle – "plan, do, check, act"
Activities:
— Maturity assessment
— Profiling
Data profiling is a form of data analysis used to inspect data and assess quality using statistical techniques:
— Count of nulls
— Min / max value
— Min / max length
— Frequency distribution
— Data type and format
Profiling also includes cross-column analysis to identify overlapping or duplicate columns and expose
embedded value dependencies
— Inter-table analysis explores overlapping value sets and helps identify foreign key relationships
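A small profiling sketch covering the measures above; it assumes the third-party pandas package, and the column names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "state": ["NY", "CA", None, "NY"],
    "order_total": [120.5, 80.0, 15.25, 980.0],
})

print(df.isna().sum())                                    # count of nulls per column
print(df["order_total"].min(), df["order_total"].max())   # min / max value
print(df["state"].str.len().max())                        # max length of a text column
print(df["state"].value_counts())                         # frequency distribution
print(df.dtypes)                                          # data type per column
```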
Data enhancement:
— Date / time stamps
— Audit data (e.g., data lineage)
— Reference vocabularies (i.e., business specific terminology, ontologies, and glossaries)
— Contextual information (i.e., adding context such as location, environment, or access methods, and
tagging data for review and analysis)
— Geographic information (e.g., geocoding)
— Demographic information
— Psychographic information (i.e., customer segmentation based on behaviors and preferences)
— Valuation information (e.g., asset valuation, inventory, and sale)
Data parsing is used to analyze data using predetermined rules to define content or value
Provide continuous monitoring by incorporating control and measurement processes into information
processing flow
Data Quality incident tracking requires staff to be trained on how issues should be classified, logged, and
tracked for root cause remediation
Preventative actions:
— Establish data entry controls
— Train data producers
— Define and enforce rules
— Demand high quality data from data suppliers
— Implement governance and stewardship
— Institute formal change control
Corrective actions:
— Automated
— Manually-directed
— Manual
Big Data
chapter 14 | page 469 | 2% of exam
Summary: Big Data refers to advanced analytics, data mining, and data science.
Notes:
Extract Load Transform (ELT) is typically used for data lakes; metadata is particularly valuable
Abate Information Triangle shows context added to data and distinction between Business Intelligence and
Data Science
The Vs of Big Data: volume, velocity, variety, viscosity (i.e., how difficult to integrate), volatility, veracity
Data Science:
— Predictive analytics based on probability estimates
— Smooth data with a moving average after regression analysis
— Use unsupervised learning to tag unstructured data
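A minimal moving-average smoother, as mentioned in the list above; the window size is an arbitrary choice for illustration:

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values to smooth the series."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

print(moving_average([10, 12, 9, 14, 13, 15]))  # approximately [10.33, 11.67, 12.0, 14.0]
```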
Operational analytics:
— Segmentation
— Sentiment analysis
— Geocoding
— Psychological profiling
Technology enabling data science: Moore’s law, Graphical Processing Units (GPUs), hand-held devices,
internet of things (IOT)
Considerations:
— Relevance
— Readiness
— Economic viability
— Prototype
Next Steps
Congrats on finishing your review of the CDMP Fundamentals Notes! As a next step, we suggest
purchasing the CDMP Fundamentals Exam if you’ve not done so already. When you do, you’ll get access to
the official CDMP test bank of 200 practice questions. You’ll also receive a free three-year membership to
DAMA International.
If you’d like additional structure for your studies, Data Strategy Professionals offers a CDMP Study Plan that
can be sent to you as emails over the course of 90 days or all at once through the ‘immediate access’
option. If you’d like additional practice questions, you can purchase those here. You may also be interested
in the Guided Study Sessions on each DMBOK chapter offered as part of the Community & Events
Membership.
You have an unlimited amount of time between purchasing a CDMP exam and actually taking the test.
When you feel ready, you can take the CDMP Fundamentals Exam using Google Chrome. Your test will be
proctored via the Honorlock browser system. Make sure to have your copy of the DMBOK close at hand
given that the exam is open book.
You’ll receive your score immediately after completing the exam. If you scored a 60-69%, you’re set for the
Associate certification, but you’ll need to retake the exam if you want to proceed to the Practitioner (70%+)
and/or Master (80%+) levels.
The CDMP awards badges through the openbadges standard at badgr.com. You can share this credential
through LinkedIn and other social platforms of your choosing.
If you do choose to proceed with the Specialist Exams, make sure you sign up using the same email you
used for the Fundamentals Exam. You won’t automatically receive your certification unless you took the
three required exams in the same Canvas account.
To activate your free three-year membership to DAMA International, contact their team at [email protected]
at the end of the month following your purchase of the CDMP Fundamentals Exam. Provide your order
number and date, and they will activate your membership for you.
If you have any issues or remaining questions, you can contact DAMA (the organization that runs the CDMP
exams) here.