Sylph unable to profile low coverage SAGs #40

Open
lingrongjin opened this issue Jan 2, 2025 · 1 comment

@lingrongjin

I'm trying to use sylph to profile single-cell amplified genomes (SAGs); however, many of my SAG sample files do not pass the profiling threshold. I tried profiling against both the pre-built GTDB database and a sylph database built from MAGs and SAGs assembled from the same samples, but only 2700 out of ~17000 SAGs with >10000 clean reads can be profiled by sylph. What could be causing the low classification rate? From my understanding, most SAGs represent single-species bacterial genomes, and with ~10000 reads, sylph should be able to classify them if they are represented in the database.

@bluenote-1577
Owner

@lingrongjin a few things come to mind:

  • I've never profiled single-cell sequencing data, but from my understanding the coverage distribution is very skewed compared to metagenomics. sylph assumes a metagenomics-like read coverage distribution across the genome.
  • Are you sure the SAGs have a species-level representative in GTDB? sylph can only do species-level profiling well, so if your SAG is a new species, sylph won't work.
  • You can try running with -m 85; this checks whether the database contains genomes with > 85% ANI to your SAG (a very approximate/rough check).
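
For running the -m 85 suggestion across many SAG sketches, a minimal Python sketch like the one below could assemble the command line. It assumes the `sylph profile` subcommand takes a database followed by sample sketches; the helper name `build_sylph_profile_cmd` and the file names `gtdb.syldb` and `sag_0001.sylsp` are hypothetical placeholders, not anything from this thread. It only builds the command and does not run sylph.

```python
import shlex


def build_sylph_profile_cmd(database, samples, min_ani=85):
    """Build an argument list for `sylph profile` with a lowered ANI cutoff.

    `database` and `samples` are paths chosen by the caller (hypothetical
    here); `min_ani` is passed through the -m option mentioned above.
    """
    if not 0 < min_ani <= 100:
        raise ValueError("min_ani must be in the range (0, 100]")
    if not samples:
        raise ValueError("at least one sample sketch is required")
    return ["sylph", "profile", database, *samples, "-m", str(min_ani)]


if __name__ == "__main__":
    # Placeholder file names; substitute your own database and SAG sketches.
    cmd = build_sylph_profile_cmd("gtdb.syldb", ["sag_0001.sylsp"])
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` once the paths are real. Because a lower cutoff is only a rough check, any hits it reports are probably best read as a hint about database representation rather than a confident species call.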
