AZ CDISC Implementation
A brief history of CDISC implementation
Stephen Harrison
Overview
Background
CDISC Implementation Strategy
First steps
Business as usual
ADaM or RDB?
Lessons learned
Summary
Background
Seven R&D sites all operating in their own environments
Creating and maintaining similar tools across the R&D sites
Continuous duplication of effort across regions
A&RT Initiative
Project initiation April 2003
Objective: Harmonise the A&R process and environment across ALL R&D sites within AZ
Multiple workstreams looking at technology, process and standards
Reporting Database (RDB) workstream
  Deliver standardized reusable code or macros to automate production of analysis and report-ready datasets
Data Flow Process
[Diagram: CRFs -> Module Package (RAW data) -> Analysis datasets/RDB -> CSR/HLD output]
Previous data flow process was a simple route from existing CRFs to Clinical Study Report (CSR)/Higher Level Document (HLD) outputs
Reporting Database is created directly from the Module Package
Remit of project was to use existing internal data standards
Opportunity to implement CDISC standards
CDISC Implementation Strategy
[Diagram: CRFs -> Module Package (RAW data) -> SDTM -> Analysis datasets/RDB -> CSR/HLD output]
RDB completely described in terms of SDTM source: good for reviewer
No need to construct SDTM at the end of the process
Linear process fulfils the requirement of traceability
Longer term strategy
[Diagram: CRFs and New CRFs/CDASH -> Module Package (RAW data) -> SDTM -> Analysis datasets/RDB and ADaM -> CSR/HLD output]
Modified CRFs
  Underlying RAW data standards are SDTM friendly
  Transformation process is simplified
  CDASH: Clinical Data Acquisition Standards Harmonization
ADaM
  Adopt ADaM model, replacing internal data standards
  Utilise industry standard transformation and derivation processes
First steps
Global team set up August 2005 to specify AZ business rules
Application of SDTM Implementation Guide v3.1.1 from an AZ point of view
Two team members also part of CDISC SDS team
  Inside track to SDTM
Scope: all corporate and TA standard modules (>200)
Mapping exercise took nearly 18 months to complete!
Manual mapping document
Business as usual
Web Interface developed
Metadata driven process
RAW to SDTM and SDTM to RDB mapping function
Inherit Corporate data standards and maps down to project or study level
Metadata used by code builder to create executable code
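The idea of metadata feeding a code builder can be pictured with a small sketch. It is an illustration only, not the A&RT code builder: the metadata layout, the `build_sas_step` function and the RAW variable names (SUBJECT, SEX_RAW) are all hypothetical.

```python
# Hypothetical mapping metadata: one entry per target SDTM variable.
# The RAW variable names (SUBJECT, SEX_RAW) are invented for illustration.
DM_MAP = {
    "source": "raw.dem",
    "target": "sdtm.dm",
    "variables": [
        {"target": "USUBJID", "expression": "catx('-', STUDYID, SUBJECT)"},
        {"target": "SEX", "expression": "upcase(SEX_RAW)"},
    ],
}


def build_sas_step(mapping):
    """Return the text of a SAS DATA step generated from mapping metadata."""
    lines = [f"data {mapping['target']};", f"  set {mapping['source']};"]
    # One assignment statement per mapped variable.
    for var in mapping["variables"]:
        lines.append(f"  {var['target']} = {var['expression']};")
    # Keep only the mapped target variables in the output dataset.
    keep = " ".join(var["target"] for var in mapping["variables"])
    lines.append(f"  keep {keep};")
    lines.append("run;")
    return "\n".join(lines)


sas_code = build_sas_step(DM_MAP)
print(sas_code)
```

Because the mapping lives in metadata rather than in hand-written programs, the same builder can be rerun whenever a map is inherited or changed at a lower level.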
RAW Data Metadata
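[Diagram: RAW data metadata import. Dataset and variable metadata is exported to a CSV file (labels: Windows, PMPL) and imported through the A&RT Web Interface into the A&RT Application Database, where datasets and variables are held at project and study level alongside the data standards.]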
Standards and Reuse of Code
[Diagram: standards and mappings inherited from level to level]
Corporate: data standards, mappings
  passed down: locked dataset definitions, locked Corporate map
Therapy Area: data standards, mappings
  passed down: locked dataset definitions, locked TA map
Project: data standards, mappings
  passed down: locked dataset definitions, locked Project map
Study: data standards, mappings
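One way to picture locked definitions being passed down the hierarchy is the sketch below. It is an assumption about the behaviour, not a description of the A&RT tool: the `resolve` function, the dictionary layout, and the pairing of SDTM datasets with RAW source datasets are hypothetical.

```python
# Levels from most general to most specific, as in the hierarchy above.
LEVELS = ["corporate", "therapy_area", "project", "study"]


def resolve(definitions):
    """Merge dataset maps from corporate level down to study level.

    A lower level may add new entries; an entry locked at a higher
    level is inherited unchanged and cannot be redefined.
    """
    effective = {}
    for level in LEVELS:
        for name, entry in definitions.get(level, {}).items():
            inherited = effective.get(name)
            if inherited is not None and inherited.get("locked", False):
                raise ValueError(
                    f"{name} is locked at {inherited['level']} level"
                )
            effective[name] = {**entry, "level": level}
    return effective


# Hypothetical content: SDTM dataset name -> RAW source dataset.
definitions = {
    "corporate": {
        "DM": {"source": "DEM", "locked": True},
        "AE": {"source": "AELOG", "locked": True},
    },
    "therapy_area": {"PF": {"source": "PULM", "locked": True}},
    "study": {"XX": {"source": "STUDYFORM", "locked": False}},
}
study_map = resolve(definitions)
print(sorted(study_map))
```

A study then only has to define what is genuinely study-specific; everything else arrives already agreed and locked.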
Inheritance SDTM
[Diagram: RAW metadata and SDTM metadata, linked by mapping, inherited from Corporate and TA level down to project and study level]
Level             RAW metadata                 SDTM metadata
Corporate         DEM, AELOG, HISM             DM, AE, MH
TA (Respiratory)  PULM, RESPHIS                PF, CF
Project           DEM, AELOG, PULM, RESPHIS    DM, AE, MH, PF, CF
Study 1           DEM, AELOG, PULM             DM, AE, PF
Study 2           DEM, AELOG, RESPHIS          DM, AE, MH, CF
Example RAW to SDTM map
Define Simple Mapping
Define Macro Mapping
Transposition Groups
A&RT Mapping Process
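[Diagram: A&RT mapping process]
Web Interface (Oracle), backed by the A&RT Application Database
  Import RAW data metadata
  Create mapping metadata, RAW to SDTM
  Create mapping metadata, SDTM to RDB
Program Builder
  Generates SAS code from the mapping metadata
UNIX (SAS)
  Load RAW data into the RAW Database
  Execute job (SAS code): RAW Database to SDTM Database
  Execute job (SAS code): SDTM Database to Reporting Database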
ADaM or RDB?
Well established reporting requirements
AZ Reporting Database standards defined and in use before CDISC considered
Perception that ADaM model still quite unstable and subject to significant change
Unlike SDTM, no regulatory pressure to implement ADaM
Reporting Database
[Diagram: Study Database (RAW data from sources such as WBDC, LAB, CRO, AMOS, GRand) -> RAW Module Package datasets -> mapping to SDTM -> SDTM data domains -> Reporting Database]
Reporting Database superset datasets (R_AE, R_DM, R_VS, etc.): key ID variables, unaltered source data in SDTM format, supplemental qualifiers, derived variables, derived observations
New datasets (RD_xx, RH_xx, etc.)
Reporting Datasets (R_)
Datasets must remain fundamentally unchanged from the SDTM source data.
An R_ dataset is a superset of the SDTM dataset
[Diagram: SDTM datasets VS and SuppVS combine into RDB dataset R_VS (superset), which extends VS in both variables and observations]
Original SDTM dataset name retained, but prefixed with R_
All information from SUPP-- datasets re-attached to parent RDB dataset
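Re-attaching supplemental qualifiers to the parent dataset could look roughly like the sketch below. It assumes the standard SUPP-- structure (USUBJID, IDVAR, IDVARVAL, QNAM, QVAL); the `reattach_supp` function, the VSCLSIG qualifier and all data values are invented for illustration.

```python
def reattach_supp(parent_rows, supp_rows):
    """Return copies of the parent rows with SUPP-- qualifiers as columns."""
    result = [dict(row) for row in parent_rows]  # leave the SDTM rows untouched
    for supp in supp_rows:
        idvar = supp.get("IDVAR") or ""
        for row in result:
            same_subject = row["USUBJID"] == supp["USUBJID"]
            # A blank IDVAR means the qualifier applies to the whole subject.
            same_record = not idvar or str(row[idvar]) == str(supp["IDVARVAL"])
            if same_subject and same_record:
                row[supp["QNAM"]] = supp["QVAL"]
    # Give every row every qualifier column, blank where nothing was supplied.
    for qnam in {supp["QNAM"] for supp in supp_rows}:
        for row in result:
            row.setdefault(qnam, "")
    return result


# Invented example data.
vs = [
    {"USUBJID": "D06-608-123", "VSSEQ": 1, "VSTESTCD": "SYSBP", "VSORRES": "120"},
    {"USUBJID": "D06-608-123", "VSSEQ": 2, "VSTESTCD": "DIABP", "VSORRES": "80"},
]
suppvs = [
    {"USUBJID": "D06-608-123", "RDOMAIN": "VS", "IDVAR": "VSSEQ",
     "IDVARVAL": "1", "QNAM": "VSCLSIG", "QVAL": "N"},
]
r_vs = reattach_supp(vs, suppvs)
print(r_vs)
```

The original VS rows are copied rather than modified, in line with the rule that the SDTM source stays fundamentally unchanged.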
RDB General Conventions
All reporting must take place directly from Reporting Database defined at study level
All variables used for reporting must be created in relevant reporting dataset
Subject datasets must have at least 1 observation per randomized subject
All SDTM data must be present in Reporting Database
Original SDTM data cannot be amended, but new variables or observations can be created as needed (e.g., imputing dates)
All naming conventions defined by SDTM must be followed when generating additional variables
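Two of these conventions lend themselves to automated checks. The sketch below is one possible way to test them, not an AZ tool: the function names, the AESTDTI imputed-date variable, the second subject identifier and the data values are hypothetical.

```python
def check_sdtm_preserved(sdtm_rows, rdb_rows, key_vars):
    """List cases where SDTM data is missing from, or altered in, the RDB."""
    rdb_by_key = {tuple(r[k] for k in key_vars): r for r in rdb_rows}
    problems = []
    for row in sdtm_rows:
        key = tuple(row[k] for k in key_vars)
        rdb_row = rdb_by_key.get(key)
        if rdb_row is None:
            problems.append(f"{key}: observation missing from RDB")
            continue
        for var, value in row.items():
            if var not in rdb_row:
                problems.append(f"{key}: variable {var} missing from RDB")
            elif rdb_row[var] != value:
                problems.append(f"{key}: variable {var} altered")
    return problems


def subjects_without_rows(randomized_subjects, rdb_rows):
    """Return randomized subjects with no observation in a subject dataset."""
    present = {r["USUBJID"] for r in rdb_rows}
    return sorted(set(randomized_subjects) - present)


# Invented data: the partial SDTM date is kept as-is and the imputed
# date goes into a new, hypothetical variable (AESTDTI).
ae = [{"USUBJID": "D06-608-123", "AESEQ": 1, "AESTDTC": "2006-08"}]
r_ae = [{"USUBJID": "D06-608-123", "AESEQ": 1, "AESTDTC": "2006-08",
         "AESTDTI": "2006-08-01"}]
rd_subj = [{"USUBJID": "D06-608-123"}]

problems = check_sdtm_preserved(ae, r_ae, ["USUBJID", "AESEQ"])
missing = subjects_without_rows(["D06-608-123", "D06-608-124"], rd_subj)
print(problems, missing)
```

Extra variables or observations in the reporting dataset are not flagged; only lost or changed SDTM content is.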
RDB Common Dataset Features
Datasets taken from source database: name prefixed with R_ (e.g., DM becomes R_DM)
New derived datasets: name prefixed with RD_ (e.g., RD_SUBJ)
Transposed datasets: name prefixed with RH_ (e.g., R_LB becomes RH_LB)
Datasets must contain Key variables to uniquely identify every observation
Duplication of variables across multiple datasets should be avoided (except for Key and Cross variables)
Duplication of source (SDTM) variables should be avoided
Variables defined at a higher level must not have attributes changed, except in the following circumstances:
  Length may be increased
  Algorithm may be project-specific
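The naming prefixes and the key-variable rule are simple enough to express in a few lines. This is a sketch under assumptions: the `rdb_name` and `duplicate_keys` helpers and the sample rows are hypothetical, and only the three prefixes named above are covered.

```python
from collections import Counter

# Prefixes as listed in the conventions above.
PREFIXES = {"source": "R_", "derived": "RD_", "horizontal": "RH_"}


def rdb_name(name, kind="source"):
    """Build a Reporting Database dataset name from a base dataset name."""
    return PREFIXES[kind] + name


def duplicate_keys(rows, key_vars):
    """Return key values that identify more than one observation."""
    counts = Counter(tuple(row[k] for k in key_vars) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Invented rows: the second and third share the same key values.
r_ae = [
    {"USUBJID": "D06-608-123", "AESEQ": 1},
    {"USUBJID": "D06-608-123", "AESEQ": 2},
    {"USUBJID": "D06-608-123", "AESEQ": 2},
]
print(rdb_name("DM"), rdb_name("SUBJ", "derived"), rdb_name("LB", "horizontal"))
print(duplicate_keys(r_ae, ["USUBJID", "AESEQ"]))
```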
RDB Use of Codes and Decodes
Historically, codes and decodes used widely
  Associated using SAS formats
  Loses all meaning outside of SAS
SDTM does not use codes and decodes
  Variables defined using explicit text values to describe observations
  Clear, unambiguous and interpretable irrespective of the tools or software used
RDB based on SDTM
  Codes and decodes not used in final reporting datasets
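Replacing a coded value with explicit text can be sketched as follows. The severity codelist (1, 2, 3 for mild, moderate, severe) follows the example used elsewhere in this deck; the `decode_variable` function and the RAW variable name SEVCODE are hypothetical.

```python
# Codelist as in the CRF example: 1, 2, 3 vs. mild, moderate, severe.
SEVERITY = {1: "MILD", 2: "MODERATE", 3: "SEVERE"}


def decode_variable(rows, raw_var, target_var, codelist):
    """Return new rows where a coded RAW variable becomes explicit text."""
    decoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != raw_var}
        # An unknown code raises KeyError rather than passing through silently.
        new_row[target_var] = codelist[row[raw_var]]
        decoded.append(new_row)
    return decoded


# SEVCODE is an invented RAW variable name.
raw_ae = [
    {"USUBJID": "D06-608-123", "SEVCODE": 1},
    {"USUBJID": "D06-608-123", "SEVCODE": 3},
]
ae = decode_variable(raw_ae, "SEVCODE", "AESEV", SEVERITY)
print(ae)
```

The resulting text values carry their meaning with them, so no SAS format catalogue is needed to interpret the dataset.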
Transposed Datasets
RAW datasets may be transposed to contain re-structured RAW data (e.g., RH_dataset = horizontal structure, RV_dataset = vertical)
Normally only considered for Findings domains
Original dataset must still exist as R_dataset
May make reporting easier (e.g., lab parameters reported as columns)
Transposed Datasets
Carefully consider whether transposed data is essential and/or appropriate
Duplicates data
Variable names driven by --TESTCD can be meaningless, e.g.:
  Variable label                          Variable name
  Unique Subject Identifier               USUBJID
  Visit name                              VISIT
  Alanine Aminotransferase (ukat/L)       L01101
  Albumin (g/L)                           L01118
  Alkaline Phosphatase (ukat/L)           L01104
  Aspartate Aminotransferase (ukat/L)     L01102
Significant loss of information, e.g., original results, units, reference ranges, analysis flags, etc.
Contravenes CDISC SDTM convention to store units as a separate variable qualifier to the test result
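The loss of information is easy to see in a small sketch of a vertical-to-horizontal transpose. The `transpose` function is hypothetical and the result values are invented; only the test codes and units come from the example above.

```python
def transpose(rows, id_vars, testcd_var, result_var):
    """Turn one row per test into one row per subject and visit."""
    wide = {}
    for row in rows:
        key = tuple(row[v] for v in id_vars)
        record = wide.setdefault(key, dict(zip(id_vars, key)))
        # The test code becomes the column name; only the result is carried.
        record[row[testcd_var]] = row[result_var]
    return list(wide.values())


# Invented results for two of the lab tests listed above.
r_lb = [
    {"USUBJID": "D06-608-123", "VISIT": "VISIT 1", "LBTESTCD": "L01101",
     "LBSTRESN": 0.45, "LBSTRESU": "ukat/L"},
    {"USUBJID": "D06-608-123", "VISIT": "VISIT 1", "LBTESTCD": "L01118",
     "LBSTRESN": 41, "LBSTRESU": "g/L"},
]
rh_lb = transpose(r_lb, ["USUBJID", "VISIT"], "LBTESTCD", "LBSTRESN")
print(rh_lb)
```

The horizontal row has columns named L01101 and L01118 and no trace of the units, which is why the original R_ dataset has to remain available.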
Example SDTM to RDB map
Lessons learned
Mapping takes a lot of effort!
  Ambiguity in guidance
  Individual opinions and interpretations
Get your conventions right
  Often had to revisit decisions as experience grew
Big differences between CRF and SDTM standards:
  Purpose: data collection vs. data storage
  Coding: codes vs. text (e.g., 1, 2, 3 vs. mild, moderate, severe)
  Structure: horizontal vs. vertical
SDTM IG v3.1.2 a big improvement
  Introduction of Clinical Findings (CF) domain really helped with many difficult mappings
Changes for SDTM IG v3.1.2 CF
[Diagram: SDTM structure. General Observation Classes: Interventions, Events, Findings (including the new CF domain). Special Purpose Datasets: Demographics, Comments. Also shown: Related Records, Supplemental Qualifiers, Trial Design.]
Clinical Findings (CF) Domain
  Findings about Events or Interventions that don't fit in SDTM domain variables for those classes
  CFOBJ (Object of Measurement): Event or Intervention that is the subject of the test evaluation
  CFOBJ is mandatory, but won't necessarily have a parent record in another domain
Changes for SDTM IG v3.1.2 CF
[Annotated CRF: fields labelled MHCAT, MHSTDTC, MHTERM, MHOCCUR, MHPRESP]
Changes for SDTM IG v3.1.2 CF
[Annotated CRF: fields labelled CFCAT and CFOBJ; CFTESTCD = OCCUR; CFORRES = answer provided in checkbox]
Changes for SDTM IG v3.1.2 CF
[Annotated CRF: fields labelled CFCAT, CFOBJ, CFTEST, CFORRES]
Changes for SDTM IG v3.1.2 CF
Example
Row  USUBJID      CFSEQ  CFOBJ                  CFTEST                  CFTESTCD  CFDTC
1    D06-608-123  1      HYPERTENSION           OCCURRENCE              OCCUR     2006-08-28
2    D06-608-123  2      MYOCARDIAL INFARCTION  OCCURRENCE              OCCUR     2006-08-28
3    D06-608-123  3      MYOCARDIAL INFARCTION  DATE OF MOST RECENT MI  MY_LDAT   2006-06-20
4    D06-608-123  4      MYOCARDIAL INFARCTION  NUMBER OF MI            MYNO      2006-08-28
(continued)
Row  USUBJID      VISITNUM  CFORRES     CFSTRESC    CFCAT
1    D06-608-123  1         CURRENT     CURRENT     SPECIFIC CV MEDICAL AND SURGICAL HISTORY
2    D06-608-123  1         PAST        PAST        SPECIFIC CV MEDICAL AND SURGICAL HISTORY
3    D06-608-123  1         2006-06-20  2006-06-20  SPECIFIC CV MEDICAL AND SURGICAL HISTORY
4    D06-608-123  1         2            2           SPECIFIC CV MEDICAL AND SURGICAL HISTORY
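The example shows a horizontal CRF record becoming several vertical CF records. A minimal sketch of that step is given below, assuming a RAW record with one column per question; the RAW variable names (HYPERT, MI, MI_LDAT, MI_NO) and the `to_cf` function are hypothetical, and CFDTC is left out.

```python
CFCAT = "SPECIFIC CV MEDICAL AND SURGICAL HISTORY"

# (hypothetical RAW variable, CFOBJ, CFTEST, CFTESTCD)
CF_RULES = [
    ("HYPERT", "HYPERTENSION", "OCCURRENCE", "OCCUR"),
    ("MI", "MYOCARDIAL INFARCTION", "OCCURRENCE", "OCCUR"),
    ("MI_LDAT", "MYOCARDIAL INFARCTION", "DATE OF MOST RECENT MI", "MY_LDAT"),
    ("MI_NO", "MYOCARDIAL INFARCTION", "NUMBER OF MI", "MYNO"),
]


def to_cf(raw):
    """Turn one horizontal RAW record into vertical CF records."""
    records = []
    for raw_var, cfobj, cftest, cftestcd in CF_RULES:
        value = raw.get(raw_var)
        if value in (None, ""):
            continue  # nothing collected for this question
        records.append({
            "USUBJID": raw["USUBJID"],
            "CFSEQ": len(records) + 1,
            "CFOBJ": cfobj,
            "CFTEST": cftest,
            "CFTESTCD": cftestcd,
            "CFCAT": CFCAT,
            "CFORRES": str(value),
            "CFSTRESC": str(value),
            "VISITNUM": raw["VISITNUM"],
        })
    return records


# RAW record with invented column names, holding the values from the example.
raw = {"USUBJID": "D06-608-123", "VISITNUM": 1, "HYPERT": "CURRENT",
       "MI": "PAST", "MI_LDAT": "2006-06-20", "MI_NO": 2}
cf = to_cf(raw)
for record in cf:
    print(record["CFSEQ"], record["CFOBJ"], record["CFTESTCD"], record["CFORRES"])
```

Each question on the form becomes its own row, with CFOBJ carrying the event the question is about.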
Summary
CDISC Implementation is a huge task
AZ strategy allows for step-wise implementation
  CDASH
  ADaM
Mapping tool really assists process
  Easy inheritance
  Reuse of standards and code
SDTM IG v3.1.2 big improvement
Questions and Answers
Thank You