Hi! I'm running simulate on 12 datasets and generating 250 pseudobulk samples, and I find it surprisingly slow: it takes on the order of 10 hours (and about 50 GB of memory). Is this to be expected? I was looking into the code and found the responsible section. Could you not just keep a running sum instead of appending, concatenating and summing?

Cheers,
Rasmus
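For reference, the suggested running-sum pattern could look roughly like this. This is a minimal sketch, not the tool's actual code; `simulate_pseudobulk`, `sample_cell`, `n_cells`, and `n_genes` are hypothetical names used for illustration:

```python
import numpy as np

def simulate_pseudobulk(sample_cell, n_cells, n_genes):
    """Build one pseudobulk sample by summing `n_cells` sampled cells.

    `sample_cell()` is assumed to return a 1-D expression vector of
    length `n_genes` for one randomly drawn cell.
    """
    # Accumulate into one preallocated vector instead of collecting
    # every sampled cell in a list and concatenating + summing at the end.
    pseudobulk = np.zeros(n_genes)
    for _ in range(n_cells):
        pseudobulk += sample_cell()  # running sum: O(n_genes) memory
    return pseudobulk
```

This keeps memory flat at one vector per sample rather than holding all sampled cells at once, which may also be where much of the 50 GB goes.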
Yeah, it can be rather slow, although this seems to be an extreme case. I would not expect it to take that long, honestly ... 🤔 Not quite sure what the issue is. Are those datasets large?
Good point, a running sum would work here, too. I'm not so sure whether this will speed up the simulation that much, though ... but maybe a little bit.
The data simulation can be made to use multiple cores relatively easily; that's been in my backlog for some time now, and I wanted to add it for the next release. This should speed up the simulation significantly (rough sketch below). I just haven't found the time to do it yet ... maybe this weekend.
If you want to make a PR implementing the running sum, that would be highly appreciated :)
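For context, parallelizing over pseudobulk samples (the multi-core idea above) could look roughly like this. A minimal sketch, assuming the per-sample simulation can be wrapped in a standalone function; `simulate_one_sample` and its placeholder body are hypothetical, not the tool's actual API:

```python
from multiprocessing import Pool

import numpy as np

def simulate_one_sample(seed):
    """Simulate a single pseudobulk sample (placeholder body).

    Seeding each task keeps the parallel runs reproducible
    and statistically independent.
    """
    rng = np.random.default_rng(seed)
    return rng.random(2000)  # placeholder expression vector

if __name__ == "__main__":
    n_samples = 250
    with Pool() as pool:  # one worker per CPU core by default
        samples = pool.map(simulate_one_sample, range(n_samples))
    pseudobulk_matrix = np.stack(samples)  # shape: (n_samples, n_genes)
```

Since each pseudobulk sample is generated independently, the work is embarrassingly parallel and `Pool.map` over sample indices is the simplest fit.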