Describe the bug
Possible edge case concerning the `inplace` argument of the `pa.check_types` decorator: `_check_arg` only executes the schema's `validate` method if the argument does not have a `pandera` attribute, if `arg_value.pandera.schema` is `None`, or if it differs from the schema model determined earlier.
This produces an edge case: if the argument already carries the corresponding schema (i.e. it has already been validated against it), `validate` is not executed. Thus, when `pa.check_types(inplace=False)` is used, the input data is not copied as expected, and modifications made inside the decorated function persist in outer scopes (see example below).
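To make the mechanism concrete, here is a stdlib-only toy model of the gating logic described above. It is not pandera code. `ToyFrame`, `toy_validate`, `check_arg` and `my_method` are hypothetical stand-ins: a dict plays the role of the DataFrame, and a `schema` attribute plays the role of `arg_value.pandera.schema`. The toy validate copies its input when `inplace=False`, mirroring the copy the report expects `check_types(inplace=False)` to make.

```python
import copy
from typing import Optional


class ToyFrame(dict):
    """Hypothetical stand-in for a DataFrame carrying a schema tag
    (plays the role of `arg_value.pandera.schema`)."""

    def __init__(self, data: dict, schema: Optional[str] = None) -> None:
        super().__init__(data)
        self.schema = schema


def toy_validate(frame: ToyFrame, schema: str, inplace: bool) -> ToyFrame:
    # Hypothetical validate: copy unless inplace, then tag with the schema.
    out = frame if inplace else copy.deepcopy(frame)
    out.schema = schema
    return out


def check_arg(frame: ToyFrame, schema: str, inplace: bool) -> ToyFrame:
    # Mirrors the condition described in the issue: validation (and hence
    # the copy) only happens when the frame has no schema or a different one.
    if frame.schema is None or frame.schema != schema:
        frame = toy_validate(frame, schema, inplace)
    return frame


def my_method(frame: ToyFrame) -> None:
    # Simulates the decorated method with inplace=False.
    frame = check_arg(frame, "ExampleSchema", inplace=False)
    frame["column2"] = [0.0]


untagged = ToyFrame({"column1": [1]})
my_method(untagged)
print("column2" in untagged)  # False: validate ran and copied

tagged = ToyFrame({"column1": [1]}, schema="ExampleSchema")
my_method(tagged)
print("column2" in tagged)  # True: validate skipped, so no copy
```

In this model, only the argument that already carries the matching schema leaks the mutation. That matches the behaviour reported for `DataFrame[ExampleSchema](...)` in the example below.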
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the main branch of pandera.
Code Sample, a copy-pastable example
```python
import pandera as pa
from pandera.typing import DataFrame


class ExampleSchema(pa.DataFrameModel):
    column1: int


class MyClass:
    @pa.check_types(inplace=False)  # should copy input argument / dataframes
    def my_method(self, input_dataframe: DataFrame[ExampleSchema]) -> None:
        # inplace=False should copy input_dataframe, meaning any modifications
        # here should not persist in the outer scope.
        input_dataframe["column2"] = 0.0
        return


if __name__ == "__main__":
    c = MyClass()
    example_df = DataFrame[ExampleSchema]({"column1": [1]})
    print(example_df.head())  # only column1 exists
    c.my_method(example_df)
    print(example_df.head())  # column2 exists as a side effect of my_method
```
Expected behavior
After calling `my_method` in the example above with `pa.check_types(inplace=False)`, I expected a copy of `input_dataframe` to be made, so that any operations on it would not persist to `example_df` in the `__main__` scope (i.e. `column2` should not be present).
Desktop:
OS: Ubuntu 20.04.6 LTS
Version: 0.20.4
Output
```
   column1
0        1
   column1  column2
0        1      0.0
```
Fix?
For the example above, adding the condition below to `_check_arg` seems to work (though I haven't done any wider checks):
```python
if (
    not hasattr(arg_value, "pandera")
    or arg_value.pandera.schema is None
    # don't re-validate a dataframe that contains the same
    # exact schema
    or arg_value.pandera.schema != schema
    or inplace is False  # This is new
):
    try:
        arg_value = schema.validate(
            arg_value,
            head,
            tail,
            sample,
            random_state,
            lazy,
            inplace,
        )
```
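One way to sanity-check that the proposed condition is narrow is to enumerate the cases where it disagrees with the current one. The sketch below models both conditions as plain boolean functions. `current_gate`, `proposed_gate` and `differing_cases` are hypothetical names, and schema identity is reduced to string comparison. The `hasattr(arg_value, "pandera")` branch is folded into `arg_schema is None`.

```python
from itertools import product
from typing import List, Optional, Tuple


def current_gate(arg_schema: Optional[str], schema: str, inplace: bool) -> bool:
    # Current condition: validate only if there is no schema or a different one.
    # `inplace` is accepted for symmetry but does not affect the decision.
    return arg_schema is None or arg_schema != schema


def proposed_gate(arg_schema: Optional[str], schema: str, inplace: bool) -> bool:
    # Proposed condition: additionally validate whenever inplace is False.
    return arg_schema is None or arg_schema != schema or inplace is False


def differing_cases(schema: str = "ExampleSchema") -> List[Tuple[Optional[str], bool]]:
    """Return the (arg_schema, inplace) combinations where the two gates disagree."""
    cases = []
    for arg_schema, inplace in product([None, "OtherSchema", schema], [True, False]):
        if current_gate(arg_schema, schema, inplace) != proposed_gate(arg_schema, schema, inplace):
            cases.append((arg_schema, inplace))
    return cases


print(differing_cases())  # [('ExampleSchema', False)]
```

Under these assumptions, the change only affects an argument that already carries the matching schema while `inplace=False`. Calls with `inplace=True` would keep skipping re-validation as before. The trade-off is that, with `inplace=False`, such arguments are re-validated rather than merely copied.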