White Paper
Test Data Management (TDM): an overview of process, challenges, and
solutions.
By Steve Anderson
Efficient management of the data used for testing is essential to maximizing return on investment and
supporting testing efforts so they achieve the highest levels of success and coverage. If test data is
hard to use and adapt, poorly represents the source it was sampled from, or consumes excessive
resources for preparation and maintenance, the negative impact on the desired outcome appears
quickly and continues to degrade the quality of results. To tip the balance toward positive results and
improved returns, consider the process, potential challenges, and possible solutions involved in TDM.
The TDM Process
A tester cannot simply claim “there are probably defects” in a system without attempting to identify
and report them. They must interact with the system and reproduce the defects they find. Similarly, a
tester cannot provide adequate results without access to the relevant systems and an appropriate
sample of the data those systems use. For data to return the most value, it must be managed using
quality processes. The key phases involved in a TDM process are:
Planning
Analysis
Design
Build
Maintenance
Table of the test data management process:
Phase Steps Involved
Planning 1. Assign a Test Data Manager
2. Define data requirements and templates for data management
3. Prepare documentation including list of tests and data landscape reference
4. Establish a service level agreement
5. Set up the test data management team
6. Obtain sign-off on appropriate plans and papers
Analysis 1. Initial setup and synch exercises, including data profiling for each individual
data store and assignment/recording of version numbers for existing data in all
environments
2. Collection/consolidation of data requirements
3. Update project lists
4. Analyze data requirements and latest distribution log
5. Assess for gaps and impact of data modification
6. Define data security, backup, storage, and access policy
7. Prepare reports
Design 1. Decide strategy for data preparation
2. Identify regions needing data to be loaded/refreshed
3. Identify appropriate methods
4. Identify data sources and providers
5. Identify tools
6. Data distribution plans
7. Coordination/communication plan
8. Test activities plan
9. Document the data plan
Build 1. Execute plans
2. Execute masking/de-identification where applicable
3. Back up data
4. Update logs
Maintenance 1. Support change requests, unplanned data needs, problems/incidents
2. Prioritize requests where applicable
3. Analyze requirements and consider if they can be met from existing/modified
current data including data assigned to other projects
4. Perform required data modification
5. Back up new data
6. Assign version markers and log with appropriate description
7. Review status of ongoing projects
8. Data profile exercises
9. Assess/address gaps
10. Refresh data where needed
11. Schedule and communicate maintenance
12. If necessary, redirect requests
13. Documentation and reports
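The Build and Maintenance steps that back up data, assign version markers, and log each version with a description can be partly automated. The following is a minimal sketch, assuming test data lives in ordinary files and the log is a JSON file; the function name back_up_and_log, the file naming scheme, and the log fields are hypothetical choices for illustration, not part of any particular TDM tool.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def back_up_and_log(data_file, backup_dir, log_file, description):
    """Copy data_file into backup_dir under a new version marker and log it."""
    data_file, backup_dir, log_file = Path(data_file), Path(backup_dir), Path(log_file)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Load the existing version log, or start a new one.
    log = json.loads(log_file.read_text()) if log_file.exists() else []

    # The version marker is simply the next sequence number (an assumption).
    version = len(log) + 1
    target = backup_dir / f"{data_file.stem}.v{version}{data_file.suffix}"
    shutil.copy2(data_file, target)

    entry = {
        "version": version,
        "file": target.name,
        # A checksum makes it possible to confirm later that a backup is intact.
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "created": datetime.now(timezone.utc).isoformat(),
        "description": description,
    }
    log.append(entry)
    log_file.write_text(json.dumps(log, indent=2))
    return entry
```

Calling back_up_and_log("customers.csv", "backups", "backups/version_log.json", "Refreshed from new extract") after each data change would keep a running record that can be consulted when a reversion is requested.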
Tools
The use of quality tools promotes quality results in any line of work, and it is no different when it comes
to TDM. Links to useful tools are provided below.
[Link]
[Link]
[Link]
Challenges
Many challenges can complicate the TDM process, such as masking sensitive data and resource
consumption. An overlooked challenge can cause major setbacks. Several common challenges to
consider are listed below.
Challenges of Test Data Management include:
Additional time for data set up/management instead of actual testing
Additional administrative efforts in test data management
Additional expense including personnel and hardware
Inaccurate or difficult-to-access data negatively impacts testing
Sensitivity of private information (credit cards, medical records, etc.)
Storage required for test data
Potential for data loss
Use of real data versus fake data generated from scratch
Poorly communicated data requests result in inadequate data returns
Identification of data anomalies
Conflicting test priorities
Timely data reversions
Data masking and de-identification
Data masking and de-identification are essential for complying with privacy laws and standards. There
are several approaches to using realistic data without compromising the confidentiality of sensitive
data:
One option is to remove all sensitive information, such as credit card or Social Security
numbers, but this may not always satisfy test requirements accurately.
Another method is to generate fake data from scratch that fits the appropriate format. This can
be time-consuming for personnel; however, an automated script can be used to quickly generate
the required data.
If the data must be returned to its original form, a reversible algorithm can be used in some
circumstances to alter it. However, if the algorithm is known or discovered, the private data
could be compromised.
A numeric variance, such as +/- 10%, can be used to change information (finance,
demographics, etc.) just enough to make it untrue but still valid enough for appropriate use.
Data encryption is a thorough approach, but it may not be as effective as it appears if access
rights are carelessly given out.
Masking out, in which displayed values are replaced with characters such as XX or **, lets
systems still use the data without making it easily accessible.
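Three of the approaches above (masking out, numeric variance, and generating fake data from scratch) can be sketched in a short script. This is an illustrative sketch only: the function names are hypothetical, the choice to keep the last four characters visible is an assumption, and treating "appropriate format" for a card number as 16 digits with a valid Luhn check digit is also an assumption. It uses Python's general-purpose random module, which is suitable for producing test values but not for security purposes.

```python
import random


def mask_out(value, visible=4, mask_char="X"):
    """Replace all but the last `visible` characters with mask_char."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


def apply_variance(amount, pct=0.10, rng=random):
    """Shift a numeric value by a random factor within +/- pct."""
    return round(amount * (1 + rng.uniform(-pct, pct)), 2)


def fake_card_number(rng=random):
    """Generate a 16-digit number with a valid Luhn check digit.

    The result is format-valid only; it is not tied to any real account.
    """
    digits = [rng.randint(0, 9) for _ in range(15)]
    # Luhn: starting from the rightmost payload digit, double every second digit.
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    check = (10 - total % 10) % 10
    return "".join(map(str, digits)) + str(check)
```

For example, mask_out("4111111111111111") returns "XXXXXXXXXXXX1111", and apply_variance(100.0) returns a value between 90.0 and 110.0. Passing a seeded random.Random instance as rng makes the generated data repeatable between test runs.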
Solutions
Once the challenges have been reviewed, consider solutions that help mitigate their impact.
Considerations for TDM improvement are listed below:
Solutions to reduce challenge impact include:
Ensure connectivity of relevant parties before data set up
Testing environments and data requirements are well-defined
Smaller data sets that accurately sample full data coverage
Involved parties meet and confirm requirements are fully addressed
Back up data and assign versions
Log the versions with relevant details for quick reference and conversions
Data partitions are assigned to entire teams/projects, not to individual members
Maintain records of data distribution
Unused data/partitions made available for other relevant projects
Masking and de-identification of sensitive information
Scope of project defines masking tools for complete and consistent masking with realistic
representation
Masking tools jointly decided by relevant parties
Standard request and documentation templates
Refresh test data as needed, including periodic updates with new extracts, to accurately cover
customer data
Subset of metadata to accommodate changes
Regular scheduled maintenance
Insert row and database editing changes with multilevel undo capabilities
Cloud storage (may violate privacy protection)
Outsourcing of processes to expert companies
Networking with other professionals
Automation can be used to expedite processes and lower resource cost, including:
-Masking/de-identification of sensitive information
-Comparisons between baseline and successive test runs
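The automated comparison between a baseline and a successive test run can be sketched as follows. This is a minimal illustration, assuming each run's results are available as a mapping from a test identifier to its recorded result; the function name compare_runs and the shape of the returned report are hypothetical.

```python
def compare_runs(baseline, current):
    """Return the differences between two {test_id: result} mappings."""
    diff = {"changed": {}, "missing": [], "new": []}
    for test_id, expected in baseline.items():
        if test_id not in current:
            # Present in the baseline but absent from the current run.
            diff["missing"].append(test_id)
        elif current[test_id] != expected:
            # Record both values so the change can be reviewed.
            diff["changed"][test_id] = (expected, current[test_id])
    # Tests that appear only in the current run.
    diff["new"] = [t for t in current if t not in baseline]
    return diff
```

An empty report (no changed, missing, or new entries) indicates the successive run matches the baseline; anything else points to a data or behavior change worth investigating.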
Summary
Efficient Test Data Management (TDM) improves the quality of testing results. Improved results lead to
an improved product and a higher return on investment. A process that understands and meets
requirements, coupled with quality solutions to relevant challenges, will help provide the efficiency
desired in TDM. Once TDM is optimized, increases in productivity, results, and profitability should
quickly follow, allowing more resources and focus to be devoted to delivering quality products and
services.