Simulate slow #90

khkk378 · 2021-05-20T22:13:19Z

Hi! I'm running simulate on 12 datasets to generate 250 pseudobulk samples, and I find it surprisingly slow. It takes on the order of 10 hours and uses about 50 GB of memory. Is this to be expected? I was looking into the code and found this section:

    artificial_samples = []
    for i in range(no_avail_cts):
        ct = available_celltypes[i]
        cells_sub = x.loc[np.array(y["Celltype"] == ct), :]
        cells_fraction = np.random.randint(0, cells_sub.shape[0], samp_fracs[i])
        cells_sub = cells_sub.iloc[cells_fraction, :]
        artificial_samples.append(cells_sub)

    df_samp = pd.concat(artificial_samples, axis=0)
    df_samp = df_samp.sum(axis=0)

Could you not just keep a running sum instead of appending, concatenating and summing?
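For illustration, here is a minimal sketch of the running-sum idea. The function name `sample_running_sum` and the `rng` argument are hypothetical and not part of the library. It assumes `x` is a cells-by-genes DataFrame, `y` has a "Celltype" column aligned with the rows of `x`, and `samp_fracs` holds the number of cells to draw per cell type, as in the snippet above.

```python
import numpy as np
import pandas as pd


def sample_running_sum(x, y, available_celltypes, samp_fracs, rng=None):
    """Sum randomly drawn cells per cell type without concatenating DataFrames.

    Hypothetical helper: x is cells-by-genes, y has a "Celltype" column,
    samp_fracs[i] is the number of cells to draw for available_celltypes[i].
    """
    rng = np.random.default_rng() if rng is None else rng
    values = x.to_numpy()                    # plain array, avoids pandas indexing overhead
    celltypes = y["Celltype"].to_numpy()
    total = np.zeros(values.shape[1], dtype=np.float64)

    for ct, n_cells in zip(available_celltypes, samp_fracs):
        idx = np.flatnonzero(celltypes == ct)  # row positions of this cell type
        # Assumption: cell types with no cells or a zero count are skipped.
        if n_cells == 0 or idx.size == 0:
            continue
        # Draw with replacement, matching np.random.randint in the original.
        chosen = rng.choice(idx, size=n_cells, replace=True)
        total += values[chosen].sum(axis=0)    # running sum over genes

    return pd.Series(total, index=x.columns)
```

If many samples are generated in a loop, the `to_numpy()` conversions could be done once outside that loop, since they do not depend on the sample.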

Cheers,
Rasmus

KevinMenden · 2021-05-21T06:18:04Z

Hi,

Yeah, it can be rather slow, although this seems to be an extreme case. I would not expect it to take so long, honestly ... 🤔 Not quite sure what the issue is. Are those datasets large?

Good point, a running sum would work here, too. I'm not so sure whether this will speed up the simulation that much though ... but maybe a little bit.

The data simulation can be made to use multiple cores "relatively easily". I have had that in my backlog for some time now and wanted to add this feature for the next release. This should speed up simulation significantly. Just haven't found the time to do it yet ... maybe this weekend.
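As a rough sketch of how such multi-core simulation might look (not the planned implementation), each pseudobulk sample can be built by an independent worker, since samples do not depend on each other. Here `simulate_one` and `simulate_many` are hypothetical names, and the Poisson counts are a stand-in for drawing real cells.

```python
import numpy as np
from multiprocessing import Pool


def simulate_one(seed, n_cells=100, n_genes=5):
    """Hypothetical stand-in for building one pseudobulk sample."""
    # One generator per sample, so results do not depend on worker scheduling.
    rng = np.random.default_rng(seed)
    counts = rng.poisson(2.0, size=(n_cells, n_genes))  # placeholder for sampled cells
    return counts.sum(axis=0)


def simulate_many(n_samples, n_workers=2, base_seed=0):
    """Build n_samples pseudobulk samples across n_workers processes."""
    seeds = [base_seed + i for i in range(n_samples)]
    with Pool(processes=n_workers) as pool:
        rows = pool.map(simulate_one, seeds)  # one task per sample
    return np.vstack(rows)                    # shape: (n_samples, n_genes)
```

In a script, `simulate_many(8)` would be called under an `if __name__ == "__main__":` guard. With large datasets, how the expression matrix reaches the workers matters, because copying it into each process could multiply memory use.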

If you want to make a PR to implement the running sum code, that would be highly appreciated :)

KevinMenden · 2021-05-21T06:21:39Z

#82
