How do the various methods of filtering low-expressed genes now implemented affect the results? #59

Open
NathanSkene opened this issue Mar 10, 2022 · 4 comments
Labels: benchmarking (EWCE benchmarking analyses)

Comments

@NathanSkene
Owner

No description provided.

@bschilder
Collaborator

So I don't think EWCE currently filters lowly expressed genes, only genes that are not expressed in any cell-type.

drop_uninformative_genes performs differential gene expression across cell-types to determine which genes don't vary sufficiently across cell-types to be informative (and drops them).

In addition, we could add an argument to drop_uninformative_genes that removes genes that are lowly expressed in all cell-types (not just non-expressed). Though since this is at the aggregate level of cell-types, it might make more sense to implement this within the generate_celltype_data function instead.
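As a rough illustration of what "lowly expressed in all cell-types" could mean at the aggregate level, here is a minimal base R sketch. It does not use any EWCE internals: `drop_low_genes`, the `min_exp` cutoff, and the toy matrix are all hypothetical, and a real implementation would need a principled threshold.

```r
# Toy cell-type-level mean expression matrix (genes x cell-types).
mean_exp <- matrix(
    c(0.0, 0.0, 0.0,   # not expressed anywhere
      0.1, 0.2, 0.1,   # lowly expressed everywhere
      0.1, 5.0, 0.2,   # high in one cell-type
      3.0, 4.0, 2.5),  # high everywhere
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4), c("astro", "micro", "neuron"))
)

# Hypothetical helper (not part of EWCE): keep a gene only if its mean
# expression reaches `min_exp` in at least one cell-type.
drop_low_genes <- function(mean_exp, min_exp = 1) {
    keep <- apply(mean_exp, 1, max) >= min_exp
    mean_exp[keep, , drop = FALSE]
}

filtered <- drop_low_genes(mean_exp, min_exp = 1)
rownames(filtered)
```

With this toy input, gene1 and gene2 are removed, while gene3 survives because it is well expressed in a single cell-type.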

@bschilder
Collaborator

Alternatively, we could just come up with a combined metric for specificity + mean_exp. This way we wouldn't have to drop any genes; dropping genes seems to be the number 1 factor affecting cell-type enrichment results when using different CTDs (at least across species).
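One possible shape for such a combined metric, purely as a sketch: down-weight specificity by a rescaled log of the mean expression. The weighting formula and the names `specificity`, `weight` and `score` are assumptions made for illustration, not something EWCE implements. Specificity is taken here as each gene's proportion of total expression per cell-type.

```r
# Toy cell-type-level mean expression matrix (genes x cell-types).
mean_exp <- matrix(
    c(0.1, 0.0, 0.0,   # perfectly specific to astro, but barely expressed
      0.0, 5.0, 5.0,
      9.0, 0.5, 0.5),  # mostly astro, and strongly expressed
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), c("astro", "micro", "neuron"))
)

# Specificity: proportion of each gene's total expression found in each cell-type.
specificity <- mean_exp / rowSums(mean_exp)
specificity[is.na(specificity)] <- 0  # genes with zero total expression

# Hypothetical weight in [0, 1]: log-scaled expression relative to the matrix maximum.
weight <- log1p(mean_exp) / max(log1p(mean_exp))

# Hypothetical combined score: no gene is dropped, weak genes are just ranked lower.
score <- specificity * weight
round(score, 3)
```

In this toy case geneA has the highest raw specificity for astro, but geneC gets the higher combined score because it is far more strongly expressed. All genes remain in the matrix.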

@bschilder
Collaborator

Alternatively alternatively, when creating the specificity matrix, for each celltype we could identify genes that have low expression and artificially set specificity to 0 in that celltype. Then when we compute specificity quantiles, the gene is retained in the vector but it is not used as a marker for that celltype.
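A minimal sketch of this masking idea in base R. The helpers `mask_low_specificity` and `bin_specificity`, the `min_exp` cutoff, and the binning scheme (zeros go to bin 0, the rest are ranked into `n_bins` bins) are all hypothetical and only meant to show the mechanics, not how EWCE computes its quantiles.

```r
mean_exp <- matrix(
    c(0.2, 0.0,
      2.0, 2.0,
      6.0, 2.0,
      1.0, 9.0),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene", 1:4), c("astro", "micro"))
)
specificity <- mean_exp / rowSums(mean_exp)
specificity[is.na(specificity)] <- 0

# Hypothetical helper: zero out specificity wherever the gene is lowly
# expressed in that cell-type, without removing the gene itself.
mask_low_specificity <- function(mean_exp, specificity, min_exp = 1) {
    masked <- specificity
    masked[mean_exp < min_exp] <- 0
    masked
}

# Hypothetical binning: zero-specificity genes stay in bin 0, the rest are
# ranked into n_bins roughly equal-sized bins.
bin_specificity <- function(x, n_bins = 4) {
    bins <- integer(length(x))
    names(bins) <- names(x)
    pos <- x > 0
    if (any(pos)) {
        bins[pos] <- as.integer(ceiling(n_bins * rank(x[pos]) / sum(pos)))
    }
    bins
}

masked <- mask_low_specificity(mean_exp, specificity, min_exp = 1)
bins <- apply(masked, 2, bin_specificity)
bins
```

Here gene1 is fully specific to astro on paper, but its expression is below the cutoff, so it lands in bin 0 for astro and would not act as a marker there. The matrix keeps all four genes.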

@bschilder
Collaborator

bschilder commented Jun 15, 2022

To recap for @ss8518: @NathanSkene and I just discussed this and decided that, while the strategies I described would be worth exploring in the future, they're not a priority at the moment.

Instead, we'd like you to focus on testing the effect of running drop_uninformative_genes in different conditions:

  • dge_method=NULL: Don't run any differential gene expression.
  • dge_method="limma", adj_pval_thresh=<values>: Run DGE with limma across a variety of adjusted p-value thresholds.
  • dge_method="deseq2": Run DGE with DESeq2.
  • dge_method="MAST": Run DGE with MAST.
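The conditions above could be organised as a named list of argument sets, along these lines. This is only a sketch: `make_conditions` is a hypothetical helper, the threshold values in the usage line are arbitrary placeholders (the thread does not fix any), and the commented-out call assumes `drop_uninformative_genes` takes `exp` and `level2annot` as its data arguments.

```r
# Hypothetical helper: build one argument list per benchmarking condition.
# `thresholds` must be supplied by the caller.
make_conditions <- function(thresholds) {
    limma <- lapply(thresholds, function(p) {
        list(dge_method = "limma", adj_pval_thresh = p)
    })
    names(limma) <- paste0("limma_p", thresholds)
    c(
        list(none = list(dge_method = NULL)),  # no DGE at all
        limma,                                 # one entry per threshold
        list(deseq2 = list(dge_method = "deseq2"),
             mast = list(dge_method = "MAST"))
    )
}

# Arbitrary illustrative thresholds, not recommended values.
conditions <- make_conditions(thresholds = c(0.05, 1e-5))
names(conditions)

# Each entry could then be merged with the data arguments, e.g. (not run;
# `exp` and `annot` stand in for your own expression matrix and annotations):
# res <- lapply(conditions, function(args) {
#     do.call(EWCE::drop_uninformative_genes,
#             c(list(exp = exp, level2annot = annot), args))
# })
```

Keeping the conditions in one named list makes it easy to compare the number of retained genes and the downstream enrichment results per condition.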

I've just exposed the dge_method argument to users in the dev version of EWCE. You can install it with:

remotes::install_github("NathanSkene/EWCE")

@bschilder bschilder added the benchmarking EWCE benchmarking analyses label May 11, 2023

2 participants