[SPARK-50696][PYTHON] Optimize Py4J call for DDL parse method #49320

zhengruifeng · 2024-12-27T13:35:30Z

What changes were proposed in this pull request?

Optimize the DDL parse method in Python

Why are the changes needed?

To reduce the number of Py4J calls made when parsing a DDL string.
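As a rough illustration of why this helps, here is a minimal sketch in plain Python. It does not use PySpark or Py4J: `FakeJvm`, `parse_schema`, `parse_data_type`, `parse_ddl`, and the toy parsing rules are all hypothetical stand-ins, not the code in this patch. It contrasts a fallback chain driven from Python, where every attempt crosses the Python/JVM boundary, with a single call that runs the same fallbacks on the JVM side.

```python
def _parse_schema(ddl):
    """Toy rule (assumption): a schema looks like 'name type, name type'."""
    fields = []
    for part in ddl.split(","):
        tokens = part.split()
        if len(tokens) != 2 or ":" in part:
            raise ValueError(f"not a schema: {ddl!r}")
        fields.append((tokens[0], tokens[1]))
    return fields


def _parse_data_type(ddl):
    """Toy rule (assumption): a bare type name, or 'struct<name: type, ...>'."""
    ddl = ddl.strip()
    if ddl.startswith("struct<") and ddl.endswith(">"):
        fields = []
        for part in ddl[len("struct<"):-1].split(","):
            name, sep, typ = part.partition(":")
            if not sep or not name.strip() or not typ.strip():
                raise ValueError(f"bad struct field: {part!r}")
            fields.append((name.strip(), typ.strip()))
        return fields
    if ddl.isalpha():
        return ddl
    raise ValueError(f"not a data type: {ddl!r}")


class FakeJvm:
    """Hypothetical stand-in for a JVM behind Py4J; counts boundary crossings."""

    def __init__(self):
        self.round_trips = 0

    def parse_schema(self, ddl):
        self.round_trips += 1
        return _parse_schema(ddl)

    def parse_data_type(self, ddl):
        self.round_trips += 1
        return _parse_data_type(ddl)

    def parse_ddl(self, ddl):
        # One crossing; every fallback runs on the "JVM" side.
        self.round_trips += 1
        for attempt in (_parse_schema, _parse_data_type):
            try:
                return attempt(ddl)
            except ValueError:
                pass
        return _parse_data_type(f"struct<{ddl.strip()}>")


def parse_with_python_fallbacks(jvm, ddl):
    """Fallback chain driven from Python: each attempt is a separate crossing."""
    try:
        return jvm.parse_schema(ddl)
    except ValueError:
        try:
            return jvm.parse_data_type(ddl)
        except ValueError:
            return jvm.parse_data_type(f"struct<{ddl.strip()}>")
```

In this toy model, the backwards-compatible input `"a: int, b: string"` costs three crossings through `parse_with_python_fallbacks` and one through `parse_ddl`, with the same result. The real saving depends on how many Py4J calls each attempt actually makes.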

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

Was this patch authored or co-authored using generative AI tooling?

No

zhengruifeng · 2024-12-30T01:47:04Z

thanks, merged to master

tedyu · 2024-12-30T02:04:35Z

sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala

+        } catch {
+          case _: Throwable =>
+            try {
+              // For backwards compatibility, "fieldname: datatype, fieldname: datatype" case.


Should we check for the "fieldname: datatype" form at the beginning, so that fewer exceptions are thrown?
Throwing and catching an exception is more expensive than a simple check.

val dataType = try {
  if (ddl.indexOf("struct<") >= 0) return parseDataType(ddl)
  if (ddl.indexOf(":") < 0 && ddl.indexOf(' ') < 0) return parseDataType(ddl)
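The suggestion above could be sketched as follows, written in Python for brevity rather than the Scala of the patch. `looks_like_data_type` is a hypothetical helper name, and its body only transcribes the two conditions in the comment; it is not code from the merged change. The idea is to use cheap string checks to guess which parser should be tried first, so that the exception path is taken less often.

```python
def looks_like_data_type(ddl: str) -> bool:
    """Guess whether `ddl` should be parsed as a data type first.

    Hypothetical helper mirroring the reviewer's two checks:
    1. the string mentions "struct<", or
    2. it contains neither a colon nor a space (e.g. a bare type name).
    """
    if "struct<" in ddl:
        return True
    return ":" not in ddl and " " not in ddl


# Usage: print which inputs the heuristic would route to the data-type parser.
for candidate in ["struct<a: int>", "int", "a int", "a: int, b: string"]:
    print(candidate, "->", looks_like_data_type(candidate))
```

A pre-check like this is only a routing hint. An input such as `a struct<b: int>` contains `struct<` yet is a schema rather than a single data type, so a fallback path would presumably still be needed.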

fix

3281cb7

zhengruifeng requested a review from HyukjinKwon December 27, 2024 13:35

github-actions bot added SQL PYTHON labels Dec 27, 2024

HyukjinKwon approved these changes Dec 27, 2024

ueshin approved these changes Dec 28, 2024

zhengruifeng changed the title ~~[WIP][PYTHON] Optimize Py4J call for DDL parse method~~ [SPARK-50696][PYTHON] Optimize Py4J call for DDL parse method Dec 30, 2024

zhengruifeng closed this in 73a2ebb Dec 30, 2024

zhengruifeng deleted the py_opt_ddl branch December 30, 2024 01:47

tedyu reviewed Dec 30, 2024
