Releases: LudvigOlsen/groupdata2
groupdata2 2.0.5
- Fixes bug in all_groups_identical() when there were different numbers of groups in the two input vectors.
groupdata2 2.0.3
- Fixes some warnings.
- Fixes rounding error issue on PowerPC (#10). Thanks @barracuda156.
groupdata2 2.0.2
- Makes use of suggested packages conditional.
- Makes testing conditional on the availability of xpectr.
- Fixes tidyselect-related warnings.
- Removes hydroGOF from suggested packages.
groupdata2 2.0.1
- Regenerates documentation.
groupdata2 2.0.0
Summary
This version introduces collapse_groups() and friends, as well as summarize_balances() and ranked_balances(). It also improves numerical balancing in fold(), which breaks reproducibility.
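A minimal sketch of how the new functions might fit together. The toy data frame and its column names are hypothetical, and the default output column name .coll_groups is an assumption to verify against the package documentation:

```r
library(groupdata2)

set.seed(1)

# Hypothetical toy data: 12 participants with 3 rows each
df <- data.frame(
  participant = factor(rep(1:12, each = 3)),
  score = round(runif(36, 0, 100))
)

# Collapse the 12 participant groups into 3 new groups,
# balancing group size and the `score` column
df_coll <- collapse_groups(
  data = df,
  n = 3,
  group_cols = "participant",
  num_cols = "score",
  balance_size = TRUE
)

# Summarize the balance of the new groups
# (assumes the new column is named ".coll_groups")
balances <- summarize_balances(
  data = df_coll,
  group_cols = ".coll_groups",
  num_cols = "score"
)

# Extract the across-group standard deviations; lower means more balanced
ranked_balances(balances)
```

Running the same steps with different settings (for example with auto_tune enabled) and comparing the ranked balances is one way to choose between them.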
Changes
- Breaking: The numerical balancing (num_col) in fold() gets multiple improvements. This breaks reproducibility in some contexts.
  - Fixes bug with selection of groups to redistribute when extreme_pairing_levels > 1. The groupings were likely to be fine, but the fix should give better groupings on average.
  - When possible, it redistributes the smallest and/or largest group if they are 1 standard deviation from the second smallest/largest group to avoid imbalances due to very small/large scores.
  - Adds use of extreme triplet grouping when too few grouping columns are created with extreme pairing. This can lead to an increase in the number of created fold columns. In some cases, these groupings may be more balanced than with extreme pairing, but on average extreme pairing leads to more balanced groupings. See rearrr::triplet_extremes() for more on extreme triplet grouping.
  - Adds argument use_of_triplets in fold() to allow using extreme triplet grouping instead of extreme pairing or disabling it completely.
- Adds collapse_groups() for collapsing a set of existing groups into a smaller set of groups. Can balance the new groups by size and by numeric, categorical and ID columns. The more of these you balance at a time, the less balanced each will tend to be. Compare settings by summarizing the balances with summarize_balances() afterwards. For creating the most balanced groups, enable auto_tune.
- Adds collapse_groups_by_size(), collapse_groups_by_numeric(), collapse_groups_by_levels(), and collapse_groups_by_ids(). These are wrappers of collapse_groups() for a simplified interface.
- Adds summarize_balances() for inspecting the balance of numeric, categorical, and ID columns in-and-between groups.
- Adds ranked_balances() for extracting the across-group standard deviations of balances from the output of summarize_balances(). The standard deviations are a measure of how balanced a split is.
- Adds "every" method to grouping functions. Groups every n data points together.
- Prepares package's tests for checkmate 2.1.0.
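A short sketch of the fold() argument and the "every" method listed above. The data frame is hypothetical, and the "never" value for use_of_triplets is assumed from the package documentation rather than stated in these notes:

```r
library(groupdata2)

set.seed(1)

# Hypothetical toy data
df <- data.frame(
  id = 1:30,
  age = round(runif(30, 18, 80))
)

# Balance folds on `age` while disabling extreme triplet grouping
# (assumption: "never" is the value that disables it)
folded <- fold(
  data = df,
  k = 3,
  num_col = "age",
  use_of_triplets = "never"
)

# Group with the "every" method and n = 3
every_grouped <- group(df, n = 3, method = "every")

# Inspect how the rows were assigned
table(every_grouped$.groups)
```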
groupdata2 1.5.0
- Breaking: Rewrites large parts of the numerical balancing engine used in fold() and partition(). This produces different groups in some cases. Outsources extreme pairing to rearrr::pair_extremes(). Now uses hierarchical shuffling (rearrr::shuffle_hierarchy()) in partition() and some cases of fold() (relevant when extreme_pairing_levels > 1).
  If you need reproducibility, the last version prior to this breaking change can be installed with devtools::install_github("ludvigolsen/[email protected]").
- Imports rearrr for use in numerical balancing.
- Minor improvements to vignettes.
groupdata2 1.4.2
- Improves documentation for core grouping functions.
groupdata2 1.4.1
- Adds summarize_group_cols() for finding the number of groups per fold column along with statistics about the number of rows per group.
- Breaking: Fixes internal sorting of fold columns. This sometimes changes the order of fold columns, compared to the previous version.
- Adds tidyr as a required dependency. Previously, it was suggested.
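As an illustration of summarize_group_cols(), a minimal sketch with hypothetical data. The fold column names .folds_1 and .folds_2 are assumed to be the defaults when num_fold_cols = 2:

```r
library(groupdata2)

set.seed(1)

# Hypothetical toy data
df <- data.frame(x = 1:30)

# Create two fold columns
folded <- fold(df, k = 3, num_fold_cols = 2)

# Drop any grouping so the summary covers the whole data frame
folded <- dplyr::ungroup(folded)

# Number of groups per fold column, plus row-count statistics per group
group_summary <- summarize_group_cols(
  data = folded,
  group_cols = c(".folds_1", ".folds_2")
)
group_summary
```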
groupdata2 1.4.0
- Breaking: In fold(), the k argument can now be a multi-element vector with one k (number of folds) per fold column. This functionality required a minor rewrite, which is why you might see interchanged fold column names in comparison to the previous versions.
- Bug fix: In fold() and partition(), specifying multiple cat_col columns and num_col in the same call would fail. This now works.
groupdata2 1.3.0
- Breaking: The following functions now work with grouped data.frames (meaning that they are applied group-wise): fold(), partition(), group(), group_factor(), splt(), balance(), upsample(), downsample(), differs_from_previous(), and find_missing_starts(). A message is generated once per session, when the input is grouped, to help users understand why their code is breaking.
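A small sketch of the group-wise behavior, using a hypothetical data frame grouped with dplyr:

```r
library(groupdata2)

# Hypothetical toy data with two sessions of six rows each
df <- data.frame(
  session = rep(c("s1", "s2"), each = 6),
  value = 1:12
)

# Group the input by session first
df_by_session <- dplyr::group_by(df, session)

# group() is now applied separately within each session,
# so each session gets its own two groups
grouped <- group(df_by_session, n = 2, method = "n_dist")

# Inspect the group assignment per session
table(grouped$session, grouped$.groups)
```

To get the previous behavior, remove the grouping with dplyr::ungroup() before calling the function.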