DROMA_Set is a comprehensive R package for managing and analyzing drug response and omics data across multiple projects. It provides a robust framework for handling complex multi-omics datasets with integrated drug sensitivity information, enabling seamless cross-project comparisons and analyses.
It is part of the DROMA project. Visit the official DROMA website for comprehensive documentation and interactive examples.
- Multi-omics Data Management: Support for various molecular profile types (mRNA, CNV, mutations, methylation, proteomics)
- Drug Response Integration: Comprehensive treatment response data handling and analysis
- Cross-Project Analysis: Advanced tools for comparing and analyzing data across multiple projects
- Sample Overlap Detection: Automatic identification and analysis of overlapping samples between projects
- Database Integration: Robust SQLite database connectivity with efficient data storage and retrieval
- Flexible Data Loading: Smart data loading with filtering by data type, tumor type, and specific features
- Metadata Management: Comprehensive sample and treatment metadata handling with ProjectID tracking
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}
# Install DROMA_Set
devtools::install_github("mugpeng/DROMA_Set")
Required packages:
- DBI (>= 1.1.0)
- RSQLite (>= 2.2.0)
- methods
Suggested packages for enhanced functionality:
- data.table: For efficient large dataset processing
- parallel: For parallel processing of multiple molecular types (Unix/Linux/macOS)
These will be automatically installed when you install DROMA_Set.
library(DROMA.Set)
# Connect to your DROMA database
connectDROMADatabase("path/to/your/droma.sqlite")
# List available projects
projects <- listDROMAProjects()
print(projects)
# Create a single DromaSet for one project
gCSI <- createDromaSetFromDatabase("gCSI", "path/to/droma.sqlite")
# Create a DromaSet with automatic data loading
gCSI <- createDromaSetFromDatabase("gCSI", "path/to/droma.sqlite", auto_load = TRUE)
# Create a MultiDromaSet for multiple projects
multi_set <- createMultiDromaSetFromDatabase(
    project_names = c("gCSI", "CCLE"),
    db_path = "path/to/droma.sqlite"
)
# Create MultiDromaSet with specific dataset types
multi_set <- createMultiDromaSetFromDatabase(
    project_names = c("gCSI", "PDX_data"),
    db_path = "path/to/droma.sqlite",
    dataset_types = c("CellLine", "PDX")
)
# Load molecular profiles
gCSI <- loadMolecularProfiles(gCSI, molecular_type = "mRNA",
    features = c("BRCA1", "BRCA2", "TP53"))
# Load molecular profiles with advanced filtering
gCSI <- loadMolecularProfiles(gCSI, molecular_type = "mRNA",
    data_type = "CellLine",
    tumor_type = "breast cancer",
    chunk_size = 100000,
    validate_features = TRUE)
# Load treatment response data
gCSI <- loadTreatmentResponse(gCSI, drugs = c("Tamoxifen", "Cisplatin"))
# Load treatment response with filtering
gCSI <- loadTreatmentResponse(gCSI, drugs = c("Tamoxifen", "Cisplatin"),
    data_type = "CellLine",
    tumor_type = "breast cancer")
# Cross-project molecular analysis
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
    molecular_type = "mRNA",
    overlap_only = FALSE)
# Cross-project treatment response analysis
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
    drugs = c("Tamoxifen", "Cisplatin"),
    overlap_only = FALSE)
The DromaSet class represents a single project's drug response and omics data:
# Create DromaSet
dataset <- createDromaSetFromDatabase("project_name", "database.sqlite")
# Load all molecular profiles
dataset <- loadMolecularProfiles(dataset, molecular_type = "all")
# Check available data types
availableMolecularProfiles(dataset)
availableTreatmentResponses(dataset)
Key Methods:
- loadMolecularProfiles(): Load omics data (mRNA, CNV, mutations, etc.) with advanced filtering options
- loadTreatmentResponse(): Load drug sensitivity data with filtering by data type and tumor type
- availableMolecularProfiles(): List available molecular data types
- availableTreatmentResponses(): List available treatment response types
The MultiDromaSet class manages multiple projects for cross-project analysis:
# Create MultiDromaSet
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"), "database.sqlite")
# Create from existing DromaSet objects
multi_set <- createMultiDromaSetFromObjects(gCSI, CCLE)
# Add new DromaSet to existing MultiDromaSet
multi_set <- addDromaSetToMulti(multi_set, new_dromaset)
# Remove DromaSet from MultiDromaSet
multi_set <- removeDromaSetFromMulti(multi_set, "CCLE")
# Create subset of MultiDromaSet
subset_multi <- subset(multi_set, projects = c("gCSI"))
# Find overlapping samples
overlap_info <- getOverlappingSamples(multi_set)
# Load molecular data across projects
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
    molecular_type = "mRNA")
# Load treatment response data across projects
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
    drugs = c("Tamoxifen", "Cisplatin"))
Key Methods:
- getOverlappingSamples(): Identify samples present in multiple projects
- loadMultiProjectMolecularProfiles(): Load molecular data across multiple projects with filtering
- loadMultiProjectTreatmentResponse(): Load treatment response data across multiple projects with filtering
- getDromaSet(): Extract individual DromaSet from MultiDromaSet
- availableProjects(): List available projects
- createMultiDromaSetFromObjects(): Create from existing DromaSet objects
- addDromaSetToMulti(): Add new DromaSet to existing MultiDromaSet
- removeDromaSetFromMulti(): Remove DromaSet from MultiDromaSet
- subset(): Create subset with specific projects
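To make the idea behind getOverlappingSamples() concrete, here is a minimal base-R sketch of overlap detection for two projects. It is not the package's implementation: the `project_samples` list, the `find_overlap()` helper, and the sample names are all hypothetical, and the sketch assumes sample names have already been harmonized so that identical strings mean identical samples.

```r
# Hypothetical sample names per project (illustrative values only)
project_samples <- list(
  gCSI = c("MCF7", "A549", "HeLa", "T47D"),
  CCLE = c("A549", "MCF7", "HCT116")
)

# Hypothetical helper: keep the samples that appear in every project
find_overlap <- function(samples_by_project) {
  Reduce(intersect, samples_by_project)
}

overlap <- find_overlap(project_samples)
print(overlap)
```

With more than two projects this helper keeps only samples shared by all of them; a pairwise definition of overlap would need a different reduction.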
# Check and harmonize sample names
sample_mapping <- checkDROMASampleNames(colnames(my_data))
# Update sample annotations with harmonized names
updateDROMAAnnotation("sample", sample_mapping, project_name = "MyProject",
    data_type = "CellLine", tumor_type = "breast cancer")
# Check and harmonize drug names
drug_mapping <- checkDROMADrugNames(rownames(my_drug_data))
# Update drug annotations
updateDROMAAnnotation("drug", drug_mapping, project_name = "MyProject")
# Get feature data with advanced filtering
feature_data <- getFeatureFromDatabase("mRNA", "BRCA1",
    data_sources = c("gCSI", "CCLE"),
    data_type = "CellLine")
# Create MultiDromaSet from all available projects
multi_all <- createMultiDromaSetFromAllProjects("droma.sqlite",
    exclude_projects = "test_data")
Table names should follow {project}_{feature_type} (e.g. experiment1_mRNA) so getFeatureFromDatabase() can discover them.
# Store matrix data in a SQLite file
storeMatricesInDatabase("my_database.sqlite", expression_matrix, "experiment1_mRNA")
# List tables and inferred dimensions
matrix_tables <- listMatrixTables("my_database.sqlite")
# Read back via the same API as the main DROMA database
connectDROMADatabase("my_database.sqlite")
retrieved_list <- getFeatureFromDatabase("mRNA", "all", projects = "experiment1")
subset_list <- getFeatureFromDatabase(
    "mRNA", c("BRCA1", "TP53", "EGFR"), projects = "experiment1"
)
# retrieved_list$experiment1 and subset_list$experiment1 are matrices (subset is row-filtered)
closeDROMADatabase()
# Load all available molecular profile types
all_data <- loadMolecularProfiles(dataset, molecular_type = "all")
# Cross-project loading of all molecular types
all_cross_data <- loadMultiProjectMolecularProfiles(multi_set,
    molecular_type = "all")
# Filter by data type and tumor type
filtered_data <- loadMolecularProfiles(dataset,
    molecular_type = "mRNA",
    data_type = "CellLine",
    tumor_type = "breast cancer")
# Load specific features and samples
specific_data <- loadMolecularProfiles(dataset,
    molecular_type = "mRNA",
    features = c("BRCA1", "TP53"),
    samples = c("sample1", "sample2"))
# Cross-project filtering by data type and tumor type
filtered_cross_data <- loadMultiProjectMolecularProfiles(multi_set,
    molecular_type = "mRNA",
    data_type = "CellLine",
    tumor_type = "breast cancer",
    overlap_only = FALSE)
# Connect to database
connectDROMADatabase("droma.sqlite")
# Add new data to database
updateDROMADatabase(expression_matrix, "new_project_mRNA")
# List all tables with metadata
tables <- listDROMADatabaseTables()
# List available projects
projects <- listDROMAProjects()
# Update project metadata
updateDROMAProjects("gCSI", dataset_type = "CellLine")
# List features for a specific project and data type
features <- listDROMAFeatures("gCSI", "mRNA", limit = 100)
# List samples for a project
samples <- listDROMASamples("gCSI", data_type = "CellLine")
# Get annotation data
sample_anno <- getDROMAAnnotation("sample", project_name = "gCSI")
drug_anno <- getDROMAAnnotation("drug", project_name = "gCSI")
# Close connection
closeDROMADatabase()
# 1. Create MultiDromaSet
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"))
# 2. Find overlapping samples
overlaps <- getOverlappingSamples(multi_set)
cat("Found", overlaps$overlap_count, "overlapping samples")
# 3. Load molecular data for overlapping samples
mRNA_data <- loadMultiProjectMolecularProfiles(multi_set,
    molecular_type = "mRNA",
    features = c("BRCA1", "BRCA2"),
    overlap_only = TRUE,
    data_type = "CellLine")
# 4. Load drug response data for overlapping samples
drug_data <- loadMultiProjectTreatmentResponse(multi_set,
    drugs = c("Tamoxifen", "Cisplatin"),
    overlap_only = TRUE,
    data_type = "CellLine")
# 5. Perform correlation analysis
for (project in names(mRNA_data)) {
    if (project %in% names(drug_data)) {
        # Analyze correlations between gene expression and drug response
        # Your analysis code here
    }
}
- mRNA: Gene expression data
- cnv: Copy number variation data
- mutation_gene: Gene-level mutation data
- mutation_site: Site-specific mutation data
- fusion: Gene fusion data
- meth: DNA methylation data
- proteinrppa: Reverse-phase protein array data
- proteinms: Mass spectrometry proteomics data
- drug: Drug sensitivity/response data
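Once an expression matrix (mRNA) and a drug response matrix (drug) are available for the same project, the correlation step left open in the cross-project workflow could look like the sketch below. It runs on random mock data rather than on package output. It assumes both objects are plain numeric matrices with features in rows and samples in columns; `correlate_features()` is a hypothetical helper, and Spearman correlation is just one reasonable choice.

```r
set.seed(1)
samples <- paste0("sample", 1:8)

# Mock matrices with random values (features in rows, samples in columns)
expr <- matrix(rnorm(16), nrow = 2,
               dimnames = list(c("BRCA1", "BRCA2"), samples))
resp <- matrix(rnorm(16), nrow = 2,
               dimnames = list(c("Tamoxifen", "Cisplatin"), samples))

# Hypothetical helper: Spearman correlation for every gene-drug pair,
# restricted to the samples present in both matrices
correlate_features <- function(expr, resp) {
  shared <- intersect(colnames(expr), colnames(resp))
  cor(t(expr[, shared, drop = FALSE]),
      t(resp[, shared, drop = FALSE]),
      method = "spearman", use = "pairwise.complete.obs")
}

cor_mat <- correlate_features(expr, resp)
print(dim(cor_mat))
```

The result is a genes-by-drugs matrix, so `cor_mat["BRCA1", "Tamoxifen"]` holds the coefficient for that pair. Inside the workflow loop, the same helper would be called once per project.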
The DROMA database uses a standardized table naming convention:
- {project}_{datatype}: Data tables (e.g., gCSI_mRNA, CCLE_drug)
- sample_anno: Sample metadata with ProjectID tracking
- drug_anno: Drug/treatment metadata with ProjectID tracking
- projects: Project summary information
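The naming convention can be illustrated with a short base-R sketch that splits a table name into its project and data type. This is not how the package parses names internally; `parse_table_name()` is a hypothetical helper. It matches against the supported data types listed earlier instead of splitting at an underscore, because both project names (for example PDX_data) and data types (for example mutation_gene) may contain underscores.

```r
# Data type suffixes taken from the supported data types list
datatypes <- c("mRNA", "cnv", "mutation_gene", "mutation_site", "fusion",
               "meth", "proteinrppa", "proteinms", "drug")

# Hypothetical helper: split "{project}_{datatype}" by known suffix
parse_table_name <- function(table_name, datatypes) {
  # Try longer data types first so the most specific suffix wins
  for (dt in datatypes[order(-nchar(datatypes))]) {
    suffix <- paste0("_", dt)
    if (endsWith(table_name, suffix)) {
      project <- substr(table_name, 1, nchar(table_name) - nchar(suffix))
      return(c(project = project, datatype = dt))
    }
  }
  # Annotation and summary tables do not match any data type
  c(project = NA_character_, datatype = NA_character_)
}

tables <- c("gCSI_mRNA", "CCLE_drug", "PDX_data_mutation_gene", "sample_anno")
parsed <- t(sapply(tables, parse_table_name, datatypes = datatypes))
print(parsed)
```

Tables such as sample_anno come back as NA, which makes it easy to separate data tables from annotation tables.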
- connectDROMADatabase(): Connect to DROMA database
- closeDROMADatabase(): Close database connection
- connectCTRDatabase(): Connect to Clinical Trial Response Database
- closeCTRDatabase(): Close CTRDB connection
- updateDROMADatabase(): Add/update data tables
- updateDROMAProjects(): Update project metadata
- updateDROMAAnnotation(): Update sample/drug annotations with harmonized names
- listDROMAProjects(): List available projects
- listDROMADatabaseTables(): List all data tables with metadata
- listDROMAFeatures(): List features for specific project/data type
- listDROMASamples(): List samples with filtering options
- getDROMAAnnotation(): Get annotation data
- getFeatureFromDatabase(): Get feature data with complex filtering
- checkDROMASampleNames(): Check and harmonize sample names
- checkDROMADrugNames(): Check and harmonize drug names
- storeMatricesInDatabase(): Store matrix data in a SQLite file by path
- listMatrixTables(): List matrix tables with metadata
- getFeatureFromDatabase(): Retrieve full tables or multiple feature_id rows (continuous omics)
- getPatientExpressionData(): Retrieve patient expression data from CTRDB
- connectCTRDatabase(): Connect to CTRDB database
- closeCTRDatabase(): Close CTRDB database connection
Comprehensive examples are provided in the examples/ directory:
- examples/produce_dromaset.R: Basic DromaSet usage
- examples/produce_multidromaset.R: MultiDromaSet cross-project analysis
- examples/produce_droma_database.R: Database creation and management
# 1. Connect to database
library(DROMA.Set)
con <- connectDROMADatabase("path/to/droma.sqlite")
# 2. List available projects and data types
projects <- listDROMAProjects()
print(projects)
# 3. Create DromaSet with automatic loading
gCSI <- createDromaSetFromDatabase("gCSI", auto_load = TRUE)
# 4. Load specific molecular profiles with filtering
gCSI <- loadMolecularProfiles(gCSI,
    molecular_type = "mRNA",
    features = c("BRCA1", "BRCA2", "TP53"),
    data_type = "CellLine",
    tumor_type = "breast cancer")
# 5. Create MultiDromaSet for cross-project analysis
multi_set <- createMultiDromaSetFromDatabase(c("gCSI", "CCLE"))
# 6. Find overlapping samples
overlaps <- getOverlappingSamples(multi_set)
print(paste("Found", overlaps$overlap_count, "overlapping samples"))
# 7. Load cross-project data
cross_mRNA <- loadMultiProjectMolecularProfiles(multi_set,
    molecular_type = "mRNA",
    overlap_only = TRUE)
# 8. Clean up
closeDROMADatabase()
- Use overlap_only = TRUE when loading cross-project data to focus on overlapping samples
- Specify the features parameter to load only genes/drugs of interest
- Use return_data = TRUE when you only need the data without updating the object
- Filter by data_type and tumor_type to reduce data loading time and focus on specific sample types
- Load molecular profiles incrementally rather than using molecular_type = "all" for large datasets
- Use the chunk_size parameter for large datasets to optimize memory usage (default: 100,000 rows)
- Set validate_features = FALSE to skip feature validation for faster loading when you're confident features exist
- Use parallel processing: the package automatically uses parallel processing for loading multiple molecular types on Unix-like systems
- Leverage database indexing: the package creates indexes on feature_id columns for faster queries
- Use the limit parameter in list functions to preview data before loading full datasets
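The indexing tip can be tried in isolation with DBI and RSQLite, both listed as required packages. The sketch below is an assumption-laden illustration, not the package's storage code. It assumes a table layout of one feature_id column plus one column per sample, and the table name, index name, and values are made up. Only standard DBI calls are used.

```r
library(DBI)

# In-memory SQLite database so the sketch leaves no files behind
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Mock expression matrix: features in rows, samples in columns
expr <- matrix(c(1.2, 3.4, 5.6, 7.8, 9.0, 2.1), nrow = 3,
               dimnames = list(c("BRCA1", "TP53", "EGFR"),
                               c("sample1", "sample2")))

# Store the matrix with an explicit feature_id column (assumed layout)
tbl <- data.frame(feature_id = rownames(expr), expr, row.names = NULL)
dbWriteTable(con, "experiment1_mRNA", tbl)

# Index the lookup column so feature queries can avoid a full table scan
dbExecute(con, "CREATE INDEX idx_experiment1_mRNA_feature
                ON experiment1_mRNA (feature_id)")

# Fetch only the requested features with a parameterized query
hits <- dbGetQuery(con,
  "SELECT * FROM experiment1_mRNA WHERE feature_id IN (?, ?)",
  params = list("BRCA1", "TP53"))
print(hits)

dbDisconnect(con)
```

On a three-row table the index makes no measurable difference; the benefit appears when a table holds tens of thousands of features and queries request only a handful.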
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
If you use DROMA_Set in your research, please cite:
Li, S., Peng, Y., Chen, M. et al. Facilitating integrative and personalized oncology omics analysis with UCSCXenaShiny. Commun Biol 7, 1200 (2024). https://doi.org/10.1038/s42003-024-06891-2
This project is licensed under the Mozilla Public License 2.0 (MPL-2.0); see the LICENSE file for details.
- Email: [email protected]
- Issues: GitHub Issues
- Documentation: Package Documentation
Current stable release with comprehensive documentation updates and enhanced functionality.
Refactor updateDROMADatabase and updateDROMAProjects functions to improve project tracking and metadata handling; enhance listDROMADatabaseTables to filter out backup tables and include created/updated dates; update documentation for new parameters in updateDROMAAnnotation function to support vector inputs for age, data type, and other attributes.
Enhancements Made:
- Removed projects table auto-updates from updateDROMADatabase
- Added _mutation_raw table exclusion across all relevant functions
- Added dataset_type parameter to updateDROMAProjects
- Enhanced updateDROMAAnnotation with vector support and created_date logic
- Improved parameter validation and documentation
Add updateDROMAProjects function to manage project metadata in DROMA database; enhance listDROMADatabaseTables with feature and sample counts; minor adjustments in example script.
- Initial release
- DromaSet and MultiDromaSet classes
- Database integration and management
- Cross-project analysis capabilities
- Comprehensive molecular profile support
- Sample overlap detection and analysis
- Enhanced metadata management with ProjectID tracking
- Support for loading all molecular profile types with molecular_type = "all"
- Split cross-project data loading into specialized functions:
  - loadMultiProjectMolecularProfiles() for molecular data
  - loadMultiProjectTreatmentResponse() for treatment response data
- Added data_type and tumor_type filtering parameters for enhanced sample selection
DROMA_Set - Empowering multi-project drug response and omics analysis