{scdrake}
is a scalable and reproducible pipeline for secondary
analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and
spot-based spatial transcriptomics data (SRT). {scdrake}
is an R
package built on top of the {drake}
package, a
Make-like pipeline toolkit for R
language.
The main features of the {scdrake}
pipeline are:
- Import of scRNA-seq data: 10x Genomics Cell
Ranger
output, delimited table, or
SingleCellExperiment
object. - Import of SRT data: 10x Genomics Space
Ranger
output, delimited table, or
SingleCellExperiment
object, and tissue positions file as in Space ranger. - Quality control and filtering of cells/spots and genes, removal of empty droplets.
- Higly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction.
- Spatially variable genes detection (for SRT data)
- Cell type annotation using reference sets, cell type annotation using user-provided marker genes.
- Integration of multiple datasets.
- Computation of cluster markers and differentially expressed genes between clusters (denoted as “contrasts”).
- Rich graphical and HTML outputs based on customizable RMarkdown
documents.
- You can find links to example outputs here.
- Thanks to
{drake}
, the pipeline is highly efficient, scalable and reproducible, and also extendable.- Want to change some parameter? No problem! Only parts of the pipeline which changed will rerun, while up-to-date ones will be skipped.
- Want to reuse the intermediate results for your own analyses? No
problem! The pipeline has smartly defined checkpoints which can
be loaded from a
{drake}
cache. - Want to extend the pipeline? No problem! The pipeline definition is just an R object which can be arbitrarily extended.
For whom is {scdrake}
purposed? It is primarily intended for
tech-savvy users (bioinformaticians), who pass on the results (reports,
images) to non-technical persons (biologists). At the same time,
bioinformaticians can quickly react to biologists’ needs by changing the
parameters of the pipeline, which then efficiently skips already
finished parts. This dialogue between the biologist and the
bioinformatician is indispensable during scRNA-seq data analysis.
{scdrake}
ensures that this communication is performed in an effective
and reproducible manner.
The pipeline structure along with
diagrams
and links to outputs is described in vignette("pipeline_overview")
(link).
If you use {scdrake}
in your research, please, consider citing
Kubovciak J, Kolar M, Novotny J (2023). “Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.” Bioinformatics Advances, 3(1). doi:10.1093/bioadv/vbad089.
Huge thanks go to the authors of the Orchestrating Single-Cell Analysis
with Bioconductor book on
whose methods and recommendations is {scdrake}
largely based.
A Docker image based on the official Bioconductor
image (version 3.15) is
available. This is the most handy and reproducible way how to use
{scdrake}
as all the dependencies are already installed and their
versions are fixed. In addition, the parent Bioconductor image comes
bundled with RStudio Server.
The complete guide to the usage of {scdrake}
’s Docker image can be
found in the Docker
vignette.
We strongly recommend to go through even if you are an experienced
Docker user. Below you can find just the basic command to download the
image and to run a detached container with RStudio in Docker or to run
{scdrake}
in Singularity.
You can also run the image in
SingularityCE
(without RStudio) - see the Singularity section in the Docker vignette
above. If the image is already downloaded in the local Docker storage,
you can use singularity pull docker-daemon:<image>
You can pull the Docker image with the latest stable {scdrake}
version
using
docker pull jirinovo/scdrake:1.6.0
singularity pull docker:jirinovo/scdrake:1.6.0
or list available versions in our Docker Hub repository.
For the latest development version use
docker pull jirinovo/scdrake:latest
singularity pull docker:jirinovo/scdrake:latest
Note for Mac users with M1/M2 chipsets: until version 1.5.0
(inclusive), arm64
images are available.
docker pull jirinovo/scdrake:1.5.0-bioc3.15-arm64
For the most common cases of host machines: Linux running Docker Engine, and Windows or MacOS running Docker Desktop.
First make a shared directory that will be mounted to the container:
mkdir ~/scdrake_projects
cd ~/scdrake_projects
And run the image that will expose RStudio Server on port 8787 on your host:
docker run -d \
-v $(pwd):/home/rstudio/scdrake_projects \
-p 8787:8787 \
-e USERID=$(id -u) \
-e GROUPID=$(id -g) \
-e PASSWORD=1234 \
jirinovo/scdrake:1.6.0
For Singularity, also make shared directories and execute the container (“run and forget”):
mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects
singularity exec \
-e \
--no-home \
--bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
--pwd "/home/${USER}/scdrake_projects" \
path/to/scdrake_image.sif \
scdrake <args> <command>
Click for details
- For Linux, follow the commands for your distribution here.
- For MacOS:
$ brew install libxml2 imagemagick@6 harfbuzz fribidi libgit2 geos pandoc
See https://cloud.r-project.org/
From now on, all commands are for R.
{renv}
is an R package for
management of local R libraries. It is intended to be used on a
per-project basis, i.e. each project should use its own library of R
packages.
install.packages("renv")
Switch to directory where you will analyze data and initialize a new
{renv}
library:
renv::consent(TRUE)
renv::init()
Now exit and run again R. You should see a message that renv library has been activated.
renv::install("BiocManager")
BiocManager::install(version = "3.15")
{renv}
also allows to export the current installed versions of R
packages (and other things) into a lockfile. Such lockfile is available
for {scdrake}
and you can use it to install all dependencies by
## -- This is a lockfile for the latest stable version of scdrake.
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/1.6.0/renv.lock")
## -- You can increase the number of CPU cores to speed up the installation.
options(Ncpus = 2)
renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories())
For the lockfile for the latest development version use
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/main/renv.lock")
Now we can finally install the {scdrake}
package, but using a
non-standard approach - without its dependencies (which are already
installed from the lockfile).
remotes::install_github(
"bioinfocz/[email protected]",
dependencies = FALSE, upgrade = FALSE,
keep_source = TRUE, build_vignettes = TRUE,
repos = BiocManager::repositories()
)
For the latest development version use "bioinfocz/scdrake"
.
Optionally, you can install {scdrake}
’s CLI scripts with
scdrake::install_cli()
CLI should be now accessible as a scdrake
command. By default, the CLI
is installed into ~/.local/bin
, which is usually present in the PATH
environment variable. In case it isn’t, just add to your ~/.bashrc
:
export PATH="${HOME}/.local/bin:${PATH}"
Every time you will be using the CLI make sure your current working
directory is inside an {renv}
project. You can read the reasons
below.
Show details
You might notice that a per-project {renv}
library and an installed
CLI are “disconnected” and if you install {scdrake}
and its CLI within
multiple projects ({renv}
libraries), then the CLI scripts in
~/.local/bin
will be overwritten each time. But when you run the
scdrake
command inside an {renv}
project, the renv
directory is
automatically detected and the {renv}
library is activated by
renv::load()
, so the proper, locally installed {scdrake}
package is
then used.
Also, there is a built-in guard: the version of the CLI must match the
version of the bundled CLI scripts inside the installed {scdrake}
package. Anyway, we think changes in the CLI won’t be very frequent, so
this shouldn’t be a problem most of the time.
TIP: To save time and space, you can symlink the
renv/library
directory to multiple{scdrake}
projects.
First run the scdrake
image in Docker or Singularity - see the Docker
vignette
Then you can go through the Get Started vignette
See https://bioinfocz.github.io/scdrake for a documentation website of the latest stable version (1.6.0) where links to vignettes below become real :-)
See https://bioinfocz.github.io/scdrake/dev for a documentation website of the current development version.
- Guides:
- Using the Docker image:
https://bioinfocz.github.io/scdrake/articles/scdrake_docker.html
(or
vignette("scdrake_docker")
) - 01 Quick start (single-sample pipeline):
vignette("scdrake")
- 02 Integration pipeline guide:
vignette("scdrake_integration")
- Advanced topics:
vignette("scdrake_advanced")
- Extending the pipeline:
vignette("scdrake_extend")
{drake}
basics:vignette("drake_basics")
- Or the official
{drake}
book: https://books.ropensci.org/drake/
- Or the official
- Using the Docker image:
https://bioinfocz.github.io/scdrake/articles/scdrake_docker.html
(or
- General information:
- Pipeline overview:
vignette("pipeline_overview")
- FAQ & Howtos:
vignette("scdrake_faq")
- Spatial extension:
vignette("scdrake_spatial")
- Command line interface (CLI):
vignette("scdrake_cli")
- Config files (internals):
vignette("scdrake_config")
- Environment variables:
vignette("scdrake_envvars")
- Pipeline overview:
- General configs:
- Pipeline config ->
vignette("config_pipeline")
- Main config ->
vignette("config_main")
- Pipeline config ->
- Pipelines and stages:
- Single-sample pipeline:
- Stage
01_input_qc
: reading in data, filtering, quality control ->vignette("stage_input_qc")
- Stage
02_norm_clustering
: normalization, HVG selection, SVG selection, dimensionality reduction, clustering, (marker-based) cell type annotation ->vignette("stage_norm_clustering")
- Stage
- Integration pipeline:
- Stage
01_integration
: reading in data and integration ->vignette("stage_integration")
- Stage
02_int_clustering
: post-integration clustering and cell annotation ->vignette("stage_int_clustering")
- Stage
- Common stages:
- Stage
cluster_markers
->vignette("stage_cluster_markers")
- Stage
contrasts
(differential expression) ->vignette("stage_contrasts")
- Stage
- Single-sample pipeline:
We encourage all users to read
basics of the {drake}
package.
While it is not necessary to know all {drake}
internals to
successfully run the {scdrake}
pipeline, its knowledge is a plus. You
can read the minimum basics in vignette("drake_basics")
.
Also, the prior knowledge of Bioconductor and its classes (especially the SingleCellExperiment) is considerable.
Below is the citation output from using citation("scdrake")
in R.
Please run this yourself to check for any updates on how to cite
scdrake.
print(citation("scdrake"), bibtex = TRUE)
To cite package ‘scdrake’ in publications use:
Jiri Novotny and Jan Kubovciak (2021). scdrake: A Pipeline For 10x Chromium Single-Cell RNA-seq Data Analysis.
https://github.com/bioinfocz/scdrake, https://bioinfocz.github.io/scdrake.
A BibTeX entry for LaTeX users is
@Manual{,
title = {scdrake: A Pipeline For 10x Chromium Single-Cell RNA-seq Data Analysis},
author = {Jiri Novotny and Jan Kubovciak},
year = {2021},
note = {https://github.com/bioinfocz/scdrake, https://bioinfocz.github.io/scdrake},
}
Please note that the {scdrake}
was only made possible thanks to many
other R and bioinformatics software authors, which are cited either in
the vignettes and/or the paper(s) describing this package.
In case of any problems or suggestions, please, open a new issue. We will be happy to answer your questions, integrate new ideas, or resolve any problems :blush:
You can also use GitHub Discussions, mainly for topics not related to development (bugs, feature requests etc.), but if you need e.g. a general help.
If you want to contribute to {scdrake}
, read the contribution
guide, please. All pull requests are welcome!
:slightly_smiling_face:
Please note that the {scdrake}
project is released with a Contributor
Code of
Conduct. By
contributing to this project, you agree to abide by its terms.
This work was supported by ELIXIR CZ research infrastructure project (MEYS Grant No: LM2018131 and LM2023055) including access to computing and storage facilities.
Many things are used by {scdrake}
, but these are really worth
mentioning:
- The Bioconductor ecosystem.
- The Orchestrating Single-Cell Analysis with Bioconductor book.
- The scran, scater, and other great packages from Aaron Lun et al.
- The drake package.
- The rmarkdown package, and other ones from the tidyverse ecosystem.
- Continuous code testing is possible thanks to GitHub
Actions through
{usethis}
,{remotes}
, and{rcmdcheck}
. Customized to use Bioconductor’s docker containers. - The documentation website is
generated by
{pkgdown}
. - The code is styled automatically thanks to
{styler}
. - The documentation is formatted thanks to
{devtools}
and{roxygen2}
.
This package was developed using {biocthis}
.