Temporarily disable Pandera validation? #1873

Open
christiansegercrantz opened this issue Dec 9, 2024 · 1 comment
Labels
question Further information is requested

Comments

@christiansegercrantz

Is it possible to temporarily disable Pandera's validation, for example inside a test?

Hi there, I've been enjoying using Pandera a lot, and it's great in production and when working with large, complex codebases. However, I'm running into a problem when testing.

My scenario is this: I have a large function that takes in a large validated data frame with roughly 15 columns. That function calls sub-functions that each need only about 1 to 5 of those columns. When I test these sub-functions, I'd rather not have to build the whole validated data frame in every test, so I'd like to be able to turn off validation temporarily at runtime. Is this possible? If not, it would be great if there were some kind of context manager like with pandera.no_validation():

I know about setting PANDERA_VALIDATION_ENABLED=False, but the problem is that it then applies globally to all functions. I'd like to turn validation off temporarily at runtime, not globally at import. Below is an example of the problem:

import pandas as pd
import pandera as pa
from pandera import Column, DataFrameSchema


class ValidatedDf:
    df: pd.DataFrame

    _schema = DataFrameSchema(
        coerce=True,
        columns={
            "col1": Column(pa.String, unique=True),
            "col2": Column(pa.Int, pa.Check.greater_than(0)),
            "col3": Column(pa.Float, nullable=True),
        },
    )

    def __init__(self, df: pd.DataFrame):
        object.__setattr__(self, "df", self.__class__._schema(df, lazy=True))


def sub_function(validated_df: ValidatedDf):
    return validated_df.df["col2"].sum()


def big_function(validated_df: ValidatedDf):
    important_number = sub_function(validated_df)
    return important_number


def test_sub_function():
    df = pd.DataFrame(
        {
            "col1": ["a", "b", "c"],
            "col2": [1, 2, 3],
        }
    )

    # This raises pandera.errors.SchemaErrors, because the required "col3" column is missing
    validated_df = ValidatedDf(df)
    # ...so we never get to check what sub_function actually does
    assert sub_function(validated_df) == 6
    
test_sub_function()

Ideally, the end of the test could look like this, where pa.no_validation() is the proposed API (it does not exist today):

...

def test_sub_function():
    df = pd.DataFrame(
        {
            "col1": ["a", "b", "c"],
            "col2": [1, 2, 3],
        }
    )

    # Proposed API: pa.no_validation() is not part of pandera
    with pa.no_validation():
        validated_df = ValidatedDf(df)
        assert sub_function(validated_df) == 6
    
test_sub_function()

If something like this exists, please point me in the right direction!

@christiansegercrantz christiansegercrantz added the question Further information is requested label Dec 9, 2024
@cosmicBboy
Collaborator

Hi @christiansegercrantz, you can use pandera.config.config_context to temporarily set the config settings within the with block: https://pandera.readthedocs.io/en/latest/reference/generated/pandera.config.config_context.html#pandera-config-config-context. See here for example usage.
