Merge pull request #94 from KevinMenden/development
Bug fix release v1.1.1
KevinMenden authored May 22, 2021
2 parents 3028486 + 0b1993c commit 03bb0a6
Showing 11 changed files with 130 additions and 99 deletions.
14 changes: 10 additions & 4 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
# Scaden Changelog

## Version 1.1.1

* Fixed bugs in scaden model definition [[#88](https://github.com/KevinMenden/scaden/issues/88)]
* removed installation instructions for bioconda as not functional at the moment [[#86](https://github.com/KevinMenden/scaden/issues/86)]
* Fixed bug in `scaden example` [[#85](https://github.com/KevinMenden/scaden/issues/85)]

## Version 1.1.0

* Reduced memory usage of `scaden simulate` significantly by performing simulation for one dataset at a time.
@@ -25,13 +31,13 @@ of simulated datasets
* Rebuild Scaden model and training to use TF2 Keras API instead of the old compatibility functions
* added `scaden example` command which allows to generate example data for test-running scaden and to inpstec the expected file format
* added more tests and checks input reading function in `scaden simulate`
* fixed bug in reading input data
* fixed bug in reading input data

### Version 0.9.6

+ fixed Dockerfile (switched to pip installation)
+ added better error messages to `simulate` command
+ cleaned up dependencies
* fixed Dockerfile (switched to pip installation)
* added better error messages to `simulate` command
* cleaned up dependencies

### v0.9.5

34 changes: 15 additions & 19 deletions README.md
@@ -1,48 +1,40 @@
![Scaden](docs/img/scaden_logo.png)
# Single-cell assisted deconvolutional network

![Scaden](docs/img/scaden_logo.png)

![Scaden version](https://img.shields.io/badge/scaden-v1.1.0-cyan)
![Scaden version](https://img.shields.io/badge/scaden-v1.1.1-cyan)
![MIT](https://img.shields.io/badge/License-MIT-black)
![Install with pip](https://img.shields.io/badge/Install%20with-pip-blue)
[![Downloads](https://pepy.tech/badge/scaden)](https://pepy.tech/project/scaden)
![Docker](https://github.com/kevinmenden/scaden/workflows/Docker/badge.svg)
![Scaden CI](https://github.com/kevinmenden/scaden/workflows/Scaden%20CI/badge.svg)

## Single-cell assisted deconvolutional network

Scaden is a deep-learning based algorithm for cell type deconvolution of bulk RNA-seq samples. It was developed
at the DZNE Tübingen and the ZMNH in Hamburg.
at the DZNE Tübingen and the ZMNH in Hamburg.
The method is published in Science Advances:
[Deep-learning based cell composition analysis from tissue expression profiles](https://advances.sciencemag.org/content/6/30/eaba2619)

A complete documentation is available [here](https://scaden.readthedocs.io)


![Figure1](docs/img/figure1.png)

Scaden overview. a) Generation of artificial bulk samples with known cell type composition from scRNA-seq data. b) Training
of Scaden model ensemble on simulated training data. c) Scaden ensemble architecture. d) A trained Scaden model can be used
to deconvolve complex bulk mixtures.



## Installation guide
Scaden can be easily installed on a Linux system, and should also work on Mac.

Scaden can be easily installed on a Linux system, and should also work on Mac.
There are currently two options for installing Scaden, either using [Bioconda](https://bioconda.github.io/) or via [pip](https://pypi.org/).

### pip

To install Scaden via pip, simply run the following command:

`pip install scaden`


### Bioconda
Bioconda installation is currently not supported for the newest Scaden versions, but this will hopefully change soon.
It is therefore highly recommended to install via pip.

`conda install -c bioconda scaden`

### GPU

If you want to make use of your GPU, you will have to additionally install `tensorflow-gpu`.

For pip:
@@ -54,6 +46,7 @@ For conda:
`conda install tensorflow-gpu`

### Docker

If you don't want to install Scaden at all, but rather use a Docker container, we provide that as well.
For every release, we provide two version - one for CPU and one for GPU usage.
To pull the CPU container, use this command:
@@ -65,16 +58,19 @@ For the GPU container:
`docker pull ghcr.io/kevinmenden/scaden/scaden-gpu`

### Webtool (beta)

Additionally, we now proivde a web tool:

[https://scaden.ims.bio](https://scaden.ims.bio)

It contains pre-generated training datasets for several tissues, and all you need to do is to upload your expression data. Please note that this is still in preview.

## Usage

We provide a detailed instructions for how to use Scaden at our [Documentation page](https://scaden.readthedocs.io/en/latest/usage/)

A deconvolution workflow with Scaden consists of four major steps:

* data simulation
* data processing
* training
@@ -83,10 +79,12 @@ A deconvolution workflow with Scaden consists of four major steps:
If training data is already available, you can start at the data processing step. Otherwise you will first have to process scRNA-seq datasets and perform data simulation to generate a training dataset. As an example workflow, you can use Scaden's function `scaden example` to generate example data and go through the whole pipeline.

First, make an example data directory and generate the example data:

```bash
mkdir example_data
scaden example --out example_data/
```

This generates the files "example_counts.txt", "example_celltypes.txt" and "example_bulk_data.txt" in the "example_data" directory. Next, you can generate training data:

```bash
@@ -113,10 +111,8 @@ scaden predict --model_dir model example_data/example_bulk_data.txt
Now you should have a file called "scaden_predictions.txt" in your working directory, which contains your estimated cell compositions.
### 1. System requirements
Scaden was developed and tested on Linux (Ubuntu 16.04 and 18.04). It was not tested on Windows or Mac, but should
also be usable on these systems when installing with Pip or Bioconda. Scaden does not require any special
hardware (e.g. GPU), however we recommend to have at least 16 GB of memory.
15 changes: 10 additions & 5 deletions docs/changelog.md
@@ -1,5 +1,11 @@
# Scaden Changelog

## Version 1.1.1

* Fixed bugs in scaden model definition [[#88](https://github.com/KevinMenden/scaden/issues/88)]
* removed installation instructions for bioconda as not functional at the moment [[#86](https://github.com/KevinMenden/scaden/issues/86)]
* Fixed bug in `scaden example` [[#85](https://github.com/KevinMenden/scaden/issues/85)]

## Version 1.1.0

* Reduced memory usage of `scaden simulate` significantly by performing simulation for one dataset at a time.
@@ -10,7 +16,6 @@
of simulated datasets
* Added `scaden merge` command which allows merging of previously created datasets


### Version 1.0.2

* General improvement of logging using the 'rich' library for colorized output
@@ -26,13 +31,13 @@ of simulated datasets
* Rebuild Scaden model and training to use TF2 Keras API instead of the old compatibility functions
* added `scaden example` command which allows to generate example data for test-running scaden and to inpstec the expected file format
* added more tests and checks input reading function in `scaden simulate`
* fixed bug in reading input data
* fixed bug in reading input data

### Version 0.9.6

+ fixed Dockerfile (switched to pip installation)
+ added better error messages to `simulate` command
+ cleaned up dependencies
* fixed Dockerfile (switched to pip installation)
* added better error messages to `simulate` command
* cleaned up dependencies

### v0.9.5

13 changes: 4 additions & 9 deletions docs/installation.md
@@ -1,22 +1,16 @@
# Installation

Scaden be easily installed on a Linux system, and should also work on Mac.
There are currently two options for installing Scaden, either using [Bioconda](https://bioconda.github.io/) or via [pip](https://pypi.org/).


## pip

To install Scaden via pip, simply run the following command:

`pip install scaden`


## Bioconda
Bioconda installation is currently not supported for the newest Scaden versions, but this will hopefully change soon.
It is therefore highly recommended to install via pip.

`conda install -c bioconda scaden`


## Docker

If you don't want to install Scaden at all, but rather use a Docker container, we provide that as well.
For every release, we provide two version - one for CPU and one for GPU usage.
To pull the CPU container, use this command:
@@ -28,6 +22,7 @@ For the GPU container:
`docker pull ghcr.io/kevinmenden/scaden/scaden-gpu`

## Webtool (beta)

We now also provide a webtool for you:

[https://scaden.ims.bio](https://scaden.ims.bio)
30 changes: 21 additions & 9 deletions scaden/__main__.py
@@ -5,13 +5,16 @@
import rich.logging
import logging
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import tensorflow as tf
from scaden.train import training
from scaden.predict import prediction
from scaden.process import processing
from scaden.simulate import simulation
from scaden.example import exampleData
from scaden.merge import merge_datasets

"""
author: Kevin Menden
@@ -31,8 +34,6 @@
)
)

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"


def main():
text = """
@@ -147,7 +148,7 @@ def predict(data_path, model_dir, outname, seed):
"--var_cutoff",
default=0.1,
help="Filter out genes with a variance less than the specified cutoff. A low cutoff is recommended,"
"this should only remove genes that are obviously uninformative.",
"this should only remove genes that are obviously uninformative.",
)
def process(data_path, prediction_data, processed_path, var_cutoff):
""" Process a dataset for training """
@@ -187,7 +188,7 @@ def process(data_path, prediction_data, processed_path, var_cutoff):
multiple=True,
default=["unknown"],
help="Specifiy cell types to merge into the unknown category. Specify this flag for every cell type you want to "
"merge in unknown. [default: unknown]",
"merge in unknown. [default: unknown]",
)
@click.option(
"--prefix",
@@ -211,7 +212,7 @@ def simulate(out, data, cells, n_samples, pattern, unknown, prefix, data_format)
pattern=pattern,
unknown_celltypes=unknown,
out_prefix=prefix,
fmt=data_format
fmt=data_format,
)


@@ -221,9 +222,18 @@ def simulate(out, data, cells, n_samples, pattern, unknown, prefix, data_format)


@cli.command()
@click.option("--data", "-d", default=".", help="Directory containing simulated datasets (in .h5ad format)")
@click.option("--prefix", "-p", default="data", help="Prefix of output file [default: data]")
@click.option("--files", "-f", default=None, help="Comma-separated list of filenames to merge")
@click.option(
"--data",
"-d",
default=".",
help="Directory containing simulated datasets (in .h5ad format)",
)
@click.option(
"--prefix", "-p", default="data", help="Prefix of output file [default: data]"
)
@click.option(
"--files", "-f", default=None, help="Comma-separated list of filenames to merge"
)
def merge(data, prefix, files):
""" Merge simulated datasets into on training dataset """
merge_datasets(data_dir=data, prefix=prefix, files=files)
@@ -244,4 +254,6 @@ def merge(data, prefix, files):
)
def example(cells, genes, samples, out, types):
""" Generate an example dataset """
exampleData(n_cells=cells, n_genes=genes, n_samples=samples, out_dir=out, n_types=types)
exampleData(
n_cells=cells, n_genes=genes, n_samples=samples, out_dir=out, n_types=types
)
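
The hunk above moves the `TF_CPP_MIN_LOG_LEVEL` assignment ahead of `import tensorflow`. The usual explanation is that TensorFlow's native logging reads this variable when the library is first loaded, so setting it afterwards may have no effect. A minimal sketch of that ordering, assuming a hypothetical helper `configure_tf_logging` that is not part of Scaden:

```python
import os


def configure_tf_logging(level="2"):
    """Set TensorFlow's native log level; call this before importing tensorflow.

    Hypothetical helper for illustration. Level "2" is commonly described as
    hiding INFO and WARNING messages from the C++ backend.
    """
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


# Configure logging first ...
configure_tf_logging("2")
# ... and only then import TensorFlow (left commented out so the sketch
# runs without TensorFlow installed):
# import tensorflow as tf
```

Keeping the assignment at the top of the entry-point module, as the commit does, means every later import of TensorFlow in the package sees the same setting.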
6 changes: 3 additions & 3 deletions scaden/example.py
@@ -24,17 +24,17 @@ def exampleData(n_cells=10, n_genes=100, n_samples=10, n_types=5, out_dir="./"):
sys.exit(1)

# Generate example scRNA-seq data
counts = np.random.randint(low=0, high=1000, size=(n_cells, n_genes))
counts = np.random.randint(low=1, high=10, size=(n_cells, n_genes))
gene_names = ["gene"] * n_genes
for i in range(len(gene_names)):
gene_names[i] = gene_names[i] + str(i)
df = pd.DataFrame(counts, columns=gene_names)

# Generate example celltype labels
celltypes = ["celltype"] * np.random.randint(n_types)
celltypes = ["celltype"] * n_types
for i in range(len(celltypes)):
celltypes[i] = celltypes[i] + str(i)
celltype_list = random.choices(celltypes, k=n_cells)
celltype_list = np.random.choice(celltypes, size=n_cells)
ct_df = pd.DataFrame(celltype_list, columns=["Celltype"])

# Generate example bulk RNA-seq data
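
The label change in this hunk matters because `np.random.randint(n_types)` draws from `[0, n_types)`, so the old code could create fewer cell types than requested, or none at all. A minimal sketch of the fixed logic follows. It uses the standard-library `random` module instead of NumPy purely to stay dependency-free, and `make_celltype_labels` is a hypothetical helper name, not a Scaden function:

```python
import random


def make_celltype_labels(n_types, n_cells, seed=None):
    """Return (celltypes, per-cell labels), mirroring the fixed example.py logic."""
    rng = random.Random(seed)
    # Always create exactly n_types labels: celltype0 ... celltype{n_types - 1}
    celltypes = [f"celltype{i}" for i in range(n_types)]
    # Assign one label to every cell, sampling with replacement
    celltype_list = rng.choices(celltypes, k=n_cells)
    return celltypes, celltype_list


celltypes, labels = make_celltype_labels(n_types=5, n_cells=10, seed=0)
print(celltypes)
print(len(labels))
```

With a fixed number of labels, the sampling step can never be handed an empty list, which is the failure mode the old random count allowed.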
12 changes: 7 additions & 5 deletions scaden/model/scaden.py
@@ -14,8 +14,7 @@
from rich.progress import Progress, BarColumn

logger = logging.getLogger(__name__)
tf.get_logger().setLevel('ERROR')
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"


class Scaden(object):
"""
@@ -304,7 +303,9 @@ def train(self, input_path, train_datasets):
BarColumn(bar_width=None),
)

training_progress = progress_bar.add_task(self.model_name, total=self.num_steps, step=0, loss=1)
training_progress = progress_bar.add_task(
self.model_name, total=self.num_steps, step=0, loss=1
)
with progress_bar:

for step in range(self.num_steps):
@@ -319,13 +320,14 @@ def train(self, input_path, train_datasets):

optimizer.apply_gradients(zip(grads, self.model.trainable_weights))

progress_bar.update(training_progress, advance=1, step=step, loss=f"{loss:.4f}")
progress_bar.update(
training_progress, advance=1, step=step, loss=f"{loss:.4f}"
)

# Collect garbage after 100 steps - otherwise runs out of memory
if step % 100 == 0:
gc.collect()


# Save the trained model
self.model.save(self.model_dir)
pd.DataFrame(self.labels).to_csv(
14 changes: 4 additions & 10 deletions scaden/predict.py
@@ -51,9 +51,7 @@ def prediction(model_dir, data_path, out_name, seed=0):
do_rates=M256_DO_RATES,
)
# Predict ratios
preds_256 = cdn256.predict(
input_path=data_path
)
preds_256 = cdn256.predict(input_path=data_path)

# Mid model predictions
cdn512 = Scaden(
@@ -64,22 +62,18 @@ def prediction(model_dir, data_path, out_name, seed=0):
do_rates=M512_DO_RATES,
)
# Predict ratios
preds_512 = cdn512.predict(
input_path=data_path
)
preds_512 = cdn512.predict(input_path=data_path)

# Large model predictions
cdn1024 = Scaden(
model_dir=model_dir + "/m1024",
model_name="m1024",
seed=seed,
hidden_units=M1024_HIDDEN_UNITS,
do_rates=M256_DO_RATES,
do_rates=M1024_DO_RATES,
)
# Predict ratios
preds_1024 = cdn1024.predict(
input_path=data_path
)
preds_1024 = cdn1024.predict(input_path=data_path)

# Average predictions
preds = (preds_256 + preds_512 + preds_1024) / 3
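
The final prediction is the element-wise mean of the three model outputs. A minimal sketch of that averaging step, assuming plain nested lists as a stand-in for whatever tabular objects `Scaden.predict` actually returns (`average_predictions` and the numbers below are made up for illustration):

```python
def average_predictions(*predictions):
    """Element-wise mean of equally shaped prediction tables (lists of rows)."""
    n_models = len(predictions)
    return [
        # zip(*rows) pairs up the same cell-type column across all models
        [sum(values) / n_models for values in zip(*rows)]
        # zip(*predictions) pairs up the same sample row across all models
        for rows in zip(*predictions)
    ]


# Two samples (rows) and two cell types (columns) per model; values are invented
preds_256 = [[0.2, 0.8], [0.5, 0.5]]
preds_512 = [[0.4, 0.6], [0.3, 0.7]]
preds_1024 = [[0.3, 0.7], [0.4, 0.6]]

preds = average_predictions(preds_256, preds_512, preds_1024)
print(preds)
```

If each model's row sums to one, the averaged row does as well, so the ensemble output can still be read as cell type fractions.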