PySpark Dataframe, how to build DataFrameModel for nested objects #1877

Open
Nivyaal-zenity opened this issue Dec 18, 2024 · 0 comments

Location of the documentation

https://pandera.readthedocs.io/en/latest/pyspark_sql.html

Documentation problem

I have a schema with nested objects, and I can't find whether pandera supports it or, if it does, how to implement it.

For example, here are my data and schema:

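    from decimal import Decimal

    from pyspark.sql.types import (
        ArrayType,
        DecimalType,
        IntegerType,
        MapType,
        StringType,
        StructField,
        StructType,
    )

    # Decimals are built from strings so the values are exact
    # (Decimal(44.4) would carry binary floating-point error).
    data = [
        ({"displayName": "John Doe", "id": "buyer_1"}, 5, "Bread", Decimal("44.4"), ["description of product"], {"product_category": "dairy"}),
        ({"displayName": "Jane Smith", "id": "buyer_2"}, 15, "Butter", Decimal("99.0"), ["more details here"], {"product_category": "bakery"}),
    ]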

    spark_schema = StructType(
        [
            StructField("buyer", StructType(
                [
                    StructField("id", StringType(), True),
                    StructField("displayName", StringType(), True),
                ]
            ), False),
            StructField("id", IntegerType(), False),
            StructField("product_name", StringType(), False),
            # Note: DecimalType() defaults to precision 10, scale 0.
            StructField("price", DecimalType(), False),
            StructField("description", ArrayType(StringType(), False), False),
            StructField(
                "meta", MapType(StringType(), StringType(), False), False
            ),
        ],
    )

