check_types argument inplace ignored #1851

lukepeck · 2024-11-11T17:04:40Z

Describe the bug
The inplace argument of the pa.check_types decorator appears to be ignored in one edge case. _check_arg only calls the schema's validate method if the argument has no 'pandera' attribute, if argument.pandera.schema is None, or if argument.pandera.schema != the schema model determined earlier.

As a result, if the argument has already been validated against the corresponding schema, validate is not executed. With pa.check_types(inplace=False), the expected copy of the input data is therefore never made, so modifications made inside the decorated function persist in outer scopes (see example below).
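The short-circuit described above can be modelled without pandera. The sketch below is only an illustration: FakeSchema, FakeFrame and check_arg are hypothetical stand-ins written for this issue, not pandera's classes or its actual _check_arg implementation, and the guard is simplified to the schema comparison.

```python
class FakeSchema:
    """Hypothetical stand-in for a pandera schema (not the real class)."""

    def validate(self, data, inplace=False):
        # Mimic the expected contract: return a copy unless inplace=True.
        if inplace:
            return data
        return FakeFrame(dict(data.columns), schema=self)


class FakeFrame:
    """Hypothetical stand-in for a dataframe that remembers its schema."""

    def __init__(self, columns, schema=None):
        self.columns = columns
        self.schema = schema


def check_arg(arg_value, schema, inplace=False):
    # Simplified guard: validate (and therefore copy) only when the
    # argument has no schema attached or carries a different one.
    if arg_value.schema is None or arg_value.schema is not schema:
        return schema.validate(arg_value, inplace=inplace)
    # Short-circuit: the very same object is handed to the function,
    # so inplace=False has no effect here.
    return arg_value


schema = FakeSchema()
unvalidated = FakeFrame({"column1": [1]})
already_valid = FakeFrame({"column1": [1]}, schema=schema)

copied = check_arg(unvalidated, schema, inplace=False)
passed_through = check_arg(already_valid, schema, inplace=False)
```

In this model, copied is a new object while passed_through is the original, which mirrors the side effect seen in the reproduction.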

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandera.
(optional) I have confirmed this bug exists on the main branch of pandera.


Code Sample, a copy-pastable example

import pandera as pa
from pandera.typing import DataFrame


class ExampleSchema(pa.DataFrameModel):
    column1: int


class MyClass:

    @pa.check_types(inplace=False)   # should copy input argument/ dataframes
    def my_method(self, input_dataframe: DataFrame[ExampleSchema]) -> None:
        # inplace = False should copy input_dataframe meaning any modifications here
        # should not persist in the outer scope.
        input_dataframe["column2"] = 0.0
        return


if __name__ == "__main__":
    c = MyClass()
    example_df = DataFrame[ExampleSchema]({"column1": [1]})
    print(example_df.head())  # only column1 exists
    c.my_method(example_df)
    print(example_df.head())  # column2 exists as a side effect of my_method

Expected behavior

After calling my_method in the above example with pa.check_types(inplace=False), I expected a copy of input_dataframe to be made, so that operations inside the method would not affect example_df in the __main__ scope (i.e. column2 should not be present).

Desktop (please complete the following information):

OS: Ubuntu 20.04.6 LTS
Version: 0.20.4

Output

   column1
0        1
   column1  column2
0        1      0.0

Fix?

For the above example, adding the condition below to _check_arg seems to work (I have not done any wider checks):

                if (
                    not hasattr(arg_value, "pandera")
                    or arg_value.pandera.schema is None
                    # don't re-validate a dataframe that contains the same
                    # exact schema
                    or arg_value.pandera.schema != schema
                    or inplace is False    # This is new
                ):
                    try:
                        arg_value = schema.validate(
                            arg_value,
                            head,
                            tail,
                            sample,
                            random_state,
                            lazy,
                            inplace,
                        )
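To see what the extra clause changes, the proposed guard can be restated as a standalone predicate. This is a sketch only: needs_validation is a hypothetical helper written for this issue, and attached_schema=None stands for both "no pandera attribute" and "schema is None".

```python
def needs_validation(attached_schema, schema, inplace):
    """Hypothetical restatement of the proposed condition in _check_arg."""
    return (
        attached_schema is None        # nothing attached yet
        or attached_schema != schema   # a different schema is attached
        or inplace is False            # new clause: always validate, so a copy is made
    )


# Same schema already attached: the only case affected by the new clause.
same_schema_inplace = needs_validation("ExampleSchema", "ExampleSchema", inplace=True)
same_schema_copy = needs_validation("ExampleSchema", "ExampleSchema", inplace=False)

# Missing or different schema: validated regardless of inplace.
no_schema = needs_validation(None, "ExampleSchema", inplace=True)
other_schema = needs_validation("OtherSchema", "ExampleSchema", inplace=True)
```

One consequence worth noting: with this condition, the "don't re-validate" shortcut only applies when inplace=True, so every call with inplace=False pays for a full validation.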

lukepeck added the bug label on Nov 11, 2024