Skip to content

aLiehrmann/CROCS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CROCS (Changepoints for a Range of ComplexitieS)

CROCS is an extension of the CROPS (Haynes et al. 2017) and sequential search (Hocking et al. 2018) algorithm which allows to compute all optimal changepoint segmentations of data sequences for all penalty values accross a peak range. This package implements the CROCS algorithm as well as segmentation models for peak calling. They have been described in our study Liehrmann et al. 2020 in a genomic context. On top of the gfpop method (Runge et al. 2020), it proposes a framework to design your peak caller. The user can choose among several noise assumptions CROCS::lossFactory, transformations CROCS::transformationFactory, peak shape assumptions CROCS::graphFactory and peak start/end post-processing rules CROCS::postProcessingRuleFactory to build a peak caller CROCS::peakCallerFactory. The CROCS::CROCS output is the first step to compute an objective function that can be optimized in the supervised learning procedure. We give two examples of its use in the package’s Vignette.

Installation

We install the package from Github.

#devtools::install_github("aLiehrmann/CROCS")

Quick Start

library(CROCS)
library(data.table)
library(purrr)
library(ggplot2)

We prepare the data. Here we use a coverage profile from a ChIP-Seq experiment directed to H3K4me3 histone marks.

counts <- as.data.table(CROCS::counts)
counts <- counts[sample.id == "McGill0036" & chunk==3,]
weights <- counts$chromEnd - counts$chromStart
coverage <- counts$coverage

The histone marks of interest appear as regions with high read density referred to as peaks in the coverage profile.

ggplot(data = counts, aes(y=coverage, x=chromStart/10^3))+
  geom_line(color="grey40")+
  theme_bw()+
  xlab('position on chromosome (kb: kilo bases)')+
  ylab("aligned sequence reads")+
  theme(
    strip.background = element_rect(fill="grey95"), 
    text = element_text(size=15),
  )+
  coord_cartesian(xlim=c(40035,40140))

We define the peak caller. Here we take as example the unconstrained segmentation model ("std") with gaussian transformed noise ("mean"), based on anscombe transformation ("anscombe_poisson"), and the largest peak post-processing rule as peak caller ("largest_peak").

std_g <- graphFactory(graph="std")
loss <- lossFactory(type="mean")
tr <- transformationFactory(transformation="anscombe_poisson")
rule <- postProcessingRuleFactory(rule="largest_peak")

my_peak_caller <- peakCallerFactory(
  mygraph=std_g, 
  transformation_f=tr, 
  loss_f=loss,
  postProcessingRule_f=rule
) 

We compute all optimal models between 1 and 3 peaks with the help of CROCS::CROCS. In the case we use the negative binomial noise model, CROCS as a phi parameter which must be previded by the user. The dispersion phi parameter links the variance and the mean of the observations.

fit <- CROCS(
  data = coverage, 
  weights = weights, 
  lower_bound_peak = 1, 
  upper_bound_peak = 3, 
  solver = my_peak_caller
)

We can visualize all output optimal segmentations on the profile.

data <- data.table(x=counts$chromStart/10^3, y=tr(coverage))
plotOptModels(
  CROCS_results_p = fit,
  data_p = data,
  xlab_p = "position on chromosome (kb: kilo bases)",
  ylab_p = "anscombe transformation: sqrt(aligned sequence reads + 3/8)",
  xmin_p = 40035,
  xmax_p = 40140
)
#> Warning in as.data.table.list(x, keep.rownames = keep.rownames, check.names =
#> check.names, : Item 1 has 0 rows but longest item has 1; filled with NA

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages