behavior of on=[] #91

gfudenberg · 2021-08-26T20:49:42Z

How should we infer the space of all possible values of the columns passed to the on=[] argument? For example, this arises when implementing complement(..., on=['strand']), which subtract relies on.

The simplest way to infer all possibilities is to look at the unique values in these columns. This raises two questions:

1. We need to know the space of all possibilities, even for combinations of ['chrom'] + on that are not represented in any interval of the input dataframe. Thus we need a way to specify this space.
2. We need to specify the behavior for pd.NA values in columns passed to on.

Potential solutions:
For (1):

- Require formatting the column as a categorical with all desired possibilities before passing it to bioframe functions (as they call groupby). We could provide a utility function to parse/cast the strand column as a categorical.
- Develop a new input format, e.g. pass a dictionary: on={'strand': ('-', '+', pd.NA)}
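As a rough illustration of the categorical option, the sketch below uses plain pandas only. The helper name `cast_strand` and its default value set are hypothetical and not part of bioframe; the example also casts chrom to a categorical so that groupby reports every chrom/strand combination, including ones with no intervals.

```python
import pandas as pd

df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2"],
    "start": [0, 10, 5],
    "end": [5, 20, 15],
    "strand": ["+", "-", "+"],
})

def cast_strand(frame, values=("+", "-")):
    """Hypothetical utility (not a bioframe function): declare the full
    space of strand values up front by casting to a categorical."""
    out = frame.copy()
    out["strand"] = pd.Categorical(out["strand"], categories=list(values))
    return out

cat_df = cast_strand(df)
# Assumption for this sketch: chrom is also categorical, so the groupby
# result spans the full chrom x strand product.
cat_df["chrom"] = pd.Categorical(cat_df["chrom"])

# observed=False keeps combinations that have no intervals in the input,
# e.g. ("chr2", "-"), which a complement on ['strand'] would need to see.
sizes = cat_df.groupby(["chrom", "strand"], observed=False).size()
print(sizes)
```

Here ("chr2", "-") appears with a count of zero even though no input interval carries that combination, which is the information an unobserved-group-aware operation would need.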

For (2), there are three options for handling missing values in columns passed to on. We could let the user select one of them with a flag:

- Drop any intervals with pd.NA in the on column from the operation.
- Add any intervals with pd.NA to each group.
- Treat pd.NA as a separate category for groupby.
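A minimal sketch of how such a flag could behave, written against plain pandas. The function `group_on` and the policy names "drop", "broadcast", and "separate" are made up for illustration and do not correspond to any existing bioframe argument.

```python
import pandas as pd

df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [0, 10, 30],
    "end": [5, 20, 40],
    "strand": ["+", "-", pd.NA],
})

def group_on(frame, col, na_policy="drop"):
    """Hypothetical helper: split `frame` by `col` under one of three
    policies for rows whose value in `col` is missing."""
    is_na = frame[col].isna()
    na_rows = frame[is_na]
    # Groups formed from rows with a non-missing value in `col`.
    groups = {key: grp for key, grp in frame[~is_na].groupby(col)}

    if na_policy == "drop":
        # Option 1: rows with a missing value take no part in the operation.
        return groups
    if na_policy == "broadcast":
        # Option 2: rows with a missing value are added to every group.
        return {key: pd.concat([grp, na_rows]) for key, grp in groups.items()}
    if na_policy == "separate":
        # Option 3: missing values form their own group. In pandas >= 1.1,
        # frame.groupby(col, dropna=False) is the native way to get this.
        if len(na_rows):
            groups[pd.NA] = na_rows
        return groups
    raise ValueError(f"unknown na_policy: {na_policy!r}")

for policy in ("drop", "broadcast", "separate"):
    sizes = [len(grp) for grp in group_on(df, "strand", policy).values()]
    print(policy, sizes)
```

With the three-row input above, "drop" yields two single-row groups, "broadcast" yields two groups that each also contain the unstranded interval, and "separate" yields three groups.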


agalitsyna · 2022-11-08T03:40:53Z

For the strand column, pd.NA here should actually be '.' according to the bioframe specs: https://bioframe.readthedocs.io/en/latest/guide-specifications.html
This does not change the logic for other, arbitrary columns, though.

nvictus · 2023-04-03T21:46:30Z

For the behavior in (2), I'd try to align as closely as possible with the native behavior of applying df.groupby() to a categorical column where some instances of an allowed categorical value are missing.
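For reference, a small sketch of that native pandas behavior, assuming a strand column cast to a categorical with categories "+" and "-": by default (dropna=True) rows with a missing value are left out of every group, while an allowed category with no rows still shows up when observed=False is passed.

```python
import pandas as pd

df = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [0, 10, 30],
    "end": [5, 20, 40],
    # "-" is an allowed category but never occurs; one value is missing.
    "strand": pd.Categorical(["+", "+", None], categories=["+", "-"]),
})

# Default dropna=True: the row with a missing strand is excluded.
# observed=False: the unused category "-" is still reported, with size 0.
native_sizes = df.groupby("strand", observed=False).size()
print(native_sizes)
```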

gfudenberg added the enhancement label Aug 26, 2021

gfudenberg mentioned this issue Aug 26, 2021

0.5.0 roadmap #92

Closed

27 tasks
