PySpark Dataframe, how to build DataFrameModel for nested objects #1877

Nivyaal-zenity · 2024-12-18T13:11:55Z

Location of the documentation

https://pandera.readthedocs.io/en/latest/pyspark_sql.html

Documentation problem

I have a schema with nested objects, and I can't find whether pandera supports it or, if it does, how to implement it.

For example, this is my data and schema:

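    from decimal import Decimal
    from pyspark.sql.types import ArrayType, DecimalType, IntegerType, MapType, StringType, StructField, StructType

    # Each row: buyer (struct), id, product_name, price, description (array), meta (map)
    data = [
        ({"displayName": "John Doe", "id": "buyer_1"}, 5, "Bread", Decimal("44.4"), ["description of product"], {"product_category": "dairy"}),
        ({"displayName": "Jane Smith", "id": "buyer_2"}, 15, "Butter", Decimal("99.0"), ["more details here"], {"product_category": "bakery"}),
    ]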

    spark_schema = StructType(
        [
            StructField("buyer", StructType(
                [
                    StructField("id", StringType(), True),
                    StructField("displayName", StringType(), True),
                ]
            ), False),
            StructField("id", IntegerType(), False),
            StructField("product_name", StringType(), False),
            StructField("price", DecimalType(), False),
            StructField("description", ArrayType(StringType(), False), False),
            StructField(
                "meta", MapType(StringType(), StringType(), False), False
            ),
        ],
    )
