Sampling GroupedDataFrames (rand) #3437
Comments
We could add it. @nalimilan, what do you think about adding:
?
Duplicate of #2097. It would make sense to define For
Ah - good catch. So - now I responded positively as
And the key word is If we also added I am not sure this is useful, but this could work. This is different from The question is if users would find it intuitive and useful?
Thanks for all the answers! Sorry about the missed duplicate issue.
AFAIK the only interface that As for usefulness, in my case, I was looking to sample groups of data (hence the groupby), and it did feel jarring that I couldn't just sample the
N = 100
tdf = transform(df, [:x1, :x2] => ByRow(string))
keys = unique(tdf[!, :x1_x2_string])
subset(tdf, :x1_x2_string => ByRow(in(rand(keys, N)))) # DataFrame, have to drop :x1_x2_string
VS
N = 100
gdf = groupby(df, [:x1, :x2])
rand(gdf, N) # Array of GroupedDataFrame? GroupedDataFrame?
I don't think it's intuitive for
P.S.: I did not go into the implementation of
This is the same reason why Adding
Hello,
Currently, we cannot sample from a GroupedDataFrame directly.
Stacktrace
One way to circumvent that MethodError is to sample from the idx
Code: #3102
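For readers looking for a stopgap, the index-based workaround mentioned above could look roughly like the sketch below. It is not the code from the original report: the example data, `N`, and the variable names `idx` and `sampled` are made up for illustration. It assumes that a GroupedDataFrame can be indexed with a vector of distinct integer positions, and it samples groups without replacement, since (as far as I know) duplicate positions are rejected when indexing a GroupedDataFrame.

```julia
using DataFrames
using Random

# Hypothetical example data; the column names x1 and x2 follow the thread.
df = DataFrame(x1 = repeat(1:5, inner = 4),
               x2 = repeat(["a", "b"], 10),
               y  = 1:20)

gdf = groupby(df, [:x1, :x2])   # 10 groups of 2 rows each

N = 3
# Draw N distinct group positions (sampling without replacement).
idx = randperm(length(gdf))[1:N]

# Index the GroupedDataFrame by position to keep only the sampled groups.
sampled = gdf[idx]

# Collect the sampled groups back into a single DataFrame if needed.
result = DataFrame(sampled)
```

Compared with the string-key approach, this keeps the grouping columns untouched and needs no helper column, but it does not cover sampling with replacement.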
What would be needed to implement this interface? Or, is it undesirable to do so?
versioninfo and package version
EDIT: reproducible on v1.7.0 (main)