I'm trying to use sylph to profile single-cell amplified genomes (SAGs), but many of my SAG samples do not pass the profiling threshold. I profiled against both the pre-built GTDB database and a custom sylph database built from MAGs and SAGs assembled from the same samples, but only 2700 of ~17000 SAGs with >10000 clean reads could be profiled. What could be causing the low classification rate? From my understanding, most SAGs represent single-species bacterial genomes, and with ~10000 reads sylph should be able to classify them if they are represented in the database.
@lingrongjin A few things come to mind:
I've never profiled single-cell sequencing data, but from my understanding the coverage distribution is very skewed compared to metagenomics. sylph assumes a metagenomics-like read coverage distribution across the genome.
Are you sure the SAGs have a species-level representative in GTDB? sylph can only do species-level profiling well, so if your SAG is a new species, sylph won't work.
You can try `-m 85`; this checks whether the database contains genomes with > 85% ANI to your SAG (very approximate/rough).
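To make the first point about skewed coverage a bit more concrete, here is a rough back-of-the-envelope sketch. It is not sylph's actual statistical model. It only compares how much of a genome gets touched by at least one read when coverage is even (Poisson) versus strongly overdispersed (gamma-Poisson, i.e. negative binomial) at the same mean depth. The read length (150 bp), genome size (3 Mbp), and dispersion `shape` value are assumptions chosen for illustration, not numbers from this thread.

```python
import math


def mean_depth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average per-base depth if reads were spread over the whole genome."""
    return n_reads * read_len / genome_size


def covered_fraction_poisson(depth: float) -> float:
    """Expected fraction of positions with >= 1 read under even (Poisson) coverage."""
    return 1.0 - math.exp(-depth)


def covered_fraction_overdispersed(depth: float, shape: float) -> float:
    """Same quantity under gamma-Poisson (negative binomial) coverage.

    `shape` is the gamma shape parameter: small values mean very uneven
    coverage, large values approach the Poisson case.
    """
    p_zero = (shape / (shape + depth)) ** shape
    return 1.0 - p_zero


if __name__ == "__main__":
    # Assumed values: 10,000 reads of 150 bp on a 3 Mbp genome.
    depth = mean_depth(10_000, 150, 3_000_000)
    print(f"mean depth: {depth:.2f}x")
    print(f"even coverage:   {covered_fraction_poisson(depth):.1%} of genome hit")
    # shape=0.1 is an arbitrary stand-in for amplification bias.
    print(f"skewed coverage: {covered_fraction_overdispersed(depth, 0.1):.1%} of genome hit")
```

Under these assumptions the mean depth is only 0.5x, and the fraction of the genome seen drops from about 39% with even coverage to about 16% with the skewed distribution, so a model that expects metagenomics-like coverage has much less signal to work with than the read count alone suggests.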