Simulate slow #90

Open
khkk378 opened this issue May 20, 2021 · 2 comments

Comments

@khkk378

khkk378 commented May 20, 2021

Hi! I'm running simulate on 12 datasets to generate 250 pseudobulk samples, and I find it surprisingly slow. It takes on the order of 10 hours (50 GB memory). Is this to be expected? I was looking into the code and found this section:

artificial_samples = []
for i in range(no_avail_cts):
    ct = available_celltypes[i]
    cells_sub = x.loc[np.array(y["Celltype"] == ct), :]
    cells_fraction = np.random.randint(0, cells_sub.shape[0], samp_fracs[i])
    cells_sub = cells_sub.iloc[cells_fraction, :]
    artificial_samples.append(cells_sub)

df_samp = pd.concat(artificial_samples, axis=0)
df_samp = df_samp.sum(axis=0)

Could you not just keep a running sum instead of appending, concatenating and summing?
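
Something along these lines is what I have in mind. It is only a rough sketch: the function name `sample_running_sum` and the `rng` argument are made up for illustration, and I'm assuming `x` is a numeric cells-by-genes DataFrame and `y` has a "Celltype" column, as in the snippet above. It draws cells with replacement per cell type, like the original, but adds each draw to one accumulator instead of collecting DataFrames and concatenating them.

```python
import numpy as np
import pandas as pd


def sample_running_sum(x, y, available_celltypes, samp_fracs, rng=None):
    """Sum randomly drawn cells per cell type without concatenating them.

    Hypothetical helper; samp_fracs[i] is the number of cells to draw
    for available_celltypes[i].
    """
    rng = np.random.default_rng() if rng is None else rng
    values = x.to_numpy()                    # cells x genes, converted once
    celltypes = y["Celltype"].to_numpy()
    total = np.zeros(values.shape[1], dtype=np.float64)

    for ct, n_cells in zip(available_celltypes, samp_fracs):
        if n_cells == 0:
            continue                         # nothing to draw for this type
        idx = np.flatnonzero(celltypes == ct)
        # Draw with replacement, as np.random.randint does in the original.
        chosen = idx[rng.integers(0, len(idx), size=n_cells)]
        total += values[chosen].sum(axis=0)  # running sum, no pd.concat

    return pd.Series(total, index=x.columns)
```

If this runs once per simulated sample, the `to_numpy()` conversions would probably be better done once outside that loop and passed in.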

Cheers,
Rasmus

@KevinMenden
Owner

Hi,

yeah it can be rather slow, although this seems to be an extreme case. I would not expect it to take so long, honestly ... 🤔 Not quite sure what the issue is. Are those datasets large?

Good point, a running sum would work here, too. I'm not so sure whether this will speed up the simulation that much though ... but maybe a little bit.

The data simulation can be made to use multiple cores "relatively easily". I have had that in my backlog for some time now and wanted to add this feature for the next release. This should speed up simulation significantly. Just haven't found the time to do it yet ... maybe this weekend.
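
Roughly, the multi-core idea could look like the sketch below. This is not the project's actual code: `simulate_one`, `simulate_many`, and all their parameters are hypothetical stand-ins, and `counts` is assumed to be a plain cells-by-genes NumPy array. Each sample gets its own child seed from `np.random.SeedSequence`, so the result does not depend on which worker handles which sample.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def simulate_one(task):
    """Build one pseudobulk sample (stand-in for the real per-sample routine)."""
    counts, n_cells, seed_seq = task
    rng = np.random.default_rng(seed_seq)
    chosen = rng.integers(0, counts.shape[0], size=n_cells)  # with replacement
    return counts[chosen].sum(axis=0)


def simulate_many(counts, n_samples, n_cells, seed=0, n_workers=1):
    """Simulate n_samples pseudobulk samples, optionally across processes."""
    # One independent child seed per sample.
    seeds = np.random.SeedSequence(seed).spawn(n_samples)
    # Note: counts is pickled once per task here, which is simple but
    # memory-hungry for large matrices.
    tasks = [(counts, n_cells, s) for s in seeds]
    if n_workers == 1:
        results = [simulate_one(t) for t in tasks]
    else:
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            results = list(pool.map(simulate_one, tasks))
    return np.vstack(results)
```

With `n_workers > 1`, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.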

If you want to make a PR to implement the running sum code, that would be highly appreciated :)

@KevinMenden
Owner

#82
