feat(starrocks): add partition by range and unique key #4509

pickfire · 2024-12-12T07:06:32Z

No description provided.

georgesittas

Thanks for the PR @pickfire, please share documentation about the added features.

sqlglot/dialects/starrocks.py

georgesittas · 2024-12-16T10:56:37Z

sqlglot/parser.py

-    def _parse_interval(self, match_interval: bool = True) -> t.Optional[exp.Add | exp.Interval]:
+    def _parse_interval(
+        self, match_interval: bool = True, keep_number: bool = False
+    ) -> t.Optional[exp.Add | exp.Interval]:


Let's avoid adding the new keep_number kwarg here. I tested INTERVAL '1' DAY against StarRocks, and even though it accepts it in expressions like DATE_ADD, it doesn't accept it in the DDLs relevant to this PR. However, we can still output a number at generation time, which would be preferable to adding complexity in the parser.

INTERVAL '1' DAY gets an error in StarRocks, which is why I did it this way. So you mean we should turn it into a number at generation time?
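A minimal sketch of the generation-time approach, assuming the parser keeps storing the interval value as a string literal (e.g. '1') and only the DDL output path emits a bare number. `render_ddl_interval` is a hypothetical helper for illustration, not sqlglot API, and the real generator hook is not shown.

```python
def render_ddl_interval(value: str, unit: str) -> str:
    """Render an interval for a StarRocks DDL clause such as EVERY (...).

    Hypothetical helper: StarRocks rejects INTERVAL '1' DAY in these DDLs,
    so a purely numeric value is emitted unquoted (INTERVAL 1 DAY).
    """
    text = value.strip()
    # Strip one pair of surrounding single quotes, if present
    if len(text) >= 2 and text[0] == text[-1] == "'":
        text = text[1:-1].strip()
    if text.isdigit():
        return f"INTERVAL {text} {unit.upper()}"
    # Non-numeric values are kept quoted (assumption: not relevant to these DDLs)
    return f"INTERVAL '{text}' {unit.upper()}"


print(render_ddl_interval("'1'", "day"))  # INTERVAL 1 DAY
```

The parser stays unchanged; only the output is normalized, which is the trade-off suggested in the review.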

sqlglot/parser.py

georgesittas · 2024-12-16T11:11:05Z

sqlglot/parser.py

+    def _parse_unique_property(self) -> exp.UniqueKeyProperty:
+        self._match_text_seq("KEY")
+        expressions = self._parse_wrapped_id_vars()
+        return self.expression(exp.UniqueKeyProperty, expressions=expressions)


It could be worth consolidating this and _parse_duplicate into a single definition that gets a Type[Expression] and instantiates it according to this (common) logic.

I don't understand what you want me to do here.

Something like:

    def _parse_composite_key_property(self, expr_type: t.Type[E]) -> E:
        self._match_text_seq("KEY")
        expressions = self._parse_wrapped_id_vars()
        return self.expression(expr_type, expressions=expressions)

So we can use it for both _parse_unique_property and _parse_duplicate in the base parser.
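To make the suggestion concrete, here is a self-contained toy version of the pattern. `ToyParser`, its token-list input, and the dataclass-based `UniqueKeyProperty` / `DuplicateKeyProperty` are hypothetical stand-ins, not sqlglot's classes. Only the shape of `_parse_composite_key_property` mirrors the suggestion: one generic method parameterized by the expression type, with the two property parsers reduced to one-line wrappers.

```python
from __future__ import annotations

import typing as t
from dataclasses import dataclass, field


@dataclass
class Expression:
    expressions: list[str] = field(default_factory=list)


class UniqueKeyProperty(Expression):
    pass


class DuplicateKeyProperty(Expression):
    pass


E = t.TypeVar("E", bound=Expression)


class ToyParser:
    """Hypothetical stand-in for the sqlglot parser, operating on a token list."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def _match_text_seq(self, text: str) -> bool:
        # Consume the next token if it matches (case-insensitively)
        if self.pos < len(self.tokens) and self.tokens[self.pos].upper() == text:
            self.pos += 1
            return True
        return False

    def _parse_wrapped_id_vars(self) -> list[str]:
        # Expects: ( id [, id ...] )
        if not self._match_text_seq("("):
            raise ValueError("expected '('")
        names: list[str] = []
        while not self._match_text_seq(")"):
            if self.pos >= len(self.tokens):
                raise ValueError("unterminated column list")
            tok = self.tokens[self.pos]
            self.pos += 1
            if tok != ",":
                names.append(tok)
        return names

    def _parse_composite_key_property(self, expr_type: t.Type[E]) -> E:
        # Shared logic: optional KEY keyword, then a parenthesized column list
        self._match_text_seq("KEY")
        return expr_type(expressions=self._parse_wrapped_id_vars())

    def _parse_unique_property(self) -> UniqueKeyProperty:
        return self._parse_composite_key_property(UniqueKeyProperty)

    def _parse_duplicate(self) -> DuplicateKeyProperty:
        return self._parse_composite_key_property(DuplicateKeyProperty)
```

The TypeVar bound lets a type checker infer that `_parse_unique_property` returns a `UniqueKeyProperty` rather than a plain `Expression`.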

georgesittas · 2024-12-16T11:15:02Z

sqlglot/dialects/starrocks.py

+            if self._match_text_seq("RANGE"):
+                partition_expressions = self._parse_wrapped_id_vars()
+                create_expressions = self._parse_wrapped_csv(
+                    self._parse_partitioning_granularity_dynamic
+                )
+                return self.expression(
+                    exp.PartitionByRangeProperty,
+                    partition_expressions=partition_expressions,
+                    create_expressions=create_expressions,
+                )
+            return super()._parse_partitioned_by()
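For reference, a toy model of the output this branch targets: the partition columns (`partition_expressions`) followed by one or more START ... END ... EVERY ... items (`create_expressions`). The `PartitionByRange` and `DynamicRange` dataclasses below are hypothetical and only sketch the SQL shape; they are not the `exp.PartitionByRangeProperty` node or its generator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DynamicRange:
    """One START ... END ... EVERY ... item (hypothetical model)."""

    start: str
    end: str
    every: str  # already-rendered step, e.g. "INTERVAL 1 DAY"


@dataclass
class PartitionByRange:
    """Hypothetical mirror of the partition_expressions / create_expressions split."""

    partition_expressions: list[str]
    create_expressions: list[DynamicRange]

    def sql(self) -> str:
        cols = ", ".join(self.partition_expressions)
        items = ", ".join(
            f'START ("{r.start}") END ("{r.end}") EVERY ({r.every})'
            for r in self.create_expressions
        )
        return f"PARTITION BY RANGE ({cols}) ({items})"


prop = PartitionByRange(["dt"], [DynamicRange("2024-01-01", "2024-02-01", "INTERVAL 1 DAY")])
print(prop.sql())
```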


Just double-checking: there's also this in starrocks' docs:

PARTITION <partition1_name> VALUES LESS THAN ("<upper_bound_for_partitioning_column1>" [ , "<upper_bound_for_partitioning_column2>", ... ] )

From what I understand, your changes only add support for the (START...END...EVERY...) syntax.

1. What happens today if the user tries to parse the alternative syntax?

2. Same question as (1), but given the changes in your PR?

I wanna ensure there are no regressions.

I did not add support for this because I don't need it. Anyone who does can add it later as a separate expression, since the syntax is entirely different.

Yup, it only supports the dynamic partitioning range method (START...END...EVERY...) syntax like you mentioned.

This makes sense, but there's some nuance here. Some unsupported DDLs are still "parsed" today, meaning that we produce a Command node and store the input as raw text in it. The motivation behind this is to avoid crashing in cases where transpilation / parsing is not required for certain statements.

Since you introduced new logic for this property, I wanted to ensure that we don't break these Command statements accidentally by entering this if self._match_text_seq("RANGE") branch.

Basically let's test what happened before vs after. If the VALUES LESS THAN clause was not supported before, it's fine. If it was parsed into a Command, we need to make sure that if the parser fails, we'll fallback to a command as well.

Does this make sense?
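The fallback behavior described above can be illustrated with a toy, regex-based sketch. It is not sqlglot's parser: `parse_partition_clause`, `RangePartition`, and this `Command` dataclass are hypothetical, and tokens and parser state are not modeled. The idea is to try the specialized parse and, if the input does not fit, keep the statement as raw text instead of raising.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Command:
    """Raw-text fallback node (hypothetical analogue of a Command node)."""

    this: str


@dataclass
class RangePartition:
    columns: list[str]
    start: str
    end: str
    every: str


# Only the dynamic START ... END ... EVERY form is recognized (assumption for this sketch)
_DYNAMIC_RANGE = re.compile(
    r"PARTITION\s+BY\s+RANGE\s*\(([^)]*)\)\s*\(\s*"
    r'START\s*\(\s*"([^"]*)"\s*\)\s*END\s*\(\s*"([^"]*)"\s*\)\s*'
    r"EVERY\s*\(\s*(.*?)\s*\)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_partition_clause(sql: str) -> RangePartition | Command:
    """Try the specialized parse; on any mismatch, keep the input as raw text."""
    match = _DYNAMIC_RANGE.match(sql.strip())
    if match is None:
        # e.g. PARTITION p1 VALUES LESS THAN (...) is not handled, so it is preserved verbatim
        return Command(this=sql)
    cols, start, end, every = match.groups()
    return RangePartition([c.strip() for c in cols.split(",")], start, end, every)
```

Comparing the result for a VALUES LESS THAN clause before and after the change is the "before vs after" check being asked for: it should still come back as a raw-text node rather than an error.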

I think I will leave this for now; I don't want to spend too much time on it.

sqlglot/dialects/starrocks.py

georgesittas · 2024-12-18T11:09:07Z

Thanks for the PR.

georgesittas reviewed Dec 12, 2024


sqlglot/dialects/starrocks.py (outdated)

pickfire force-pushed the starrocks-partition-by-range branch from 1f1a0a3 to 7c35b32 on December 12, 2024 09:13

georgesittas reviewed Dec 16, 2024


pickfire added 4 commits December 17, 2024 10:48

feat(starrocks): add partition by range and unique key (6091ae3)
feat(starrocks): add largeint (1075a02)
fix(starrocks): remove PARTITION BY RANGE as token (4496619)
refactor(starrocks): move partitionby to base generator (30621b1)

pickfire force-pushed the starrocks-partition-by-range branch from 6f93937 to 30621b1 on December 17, 2024 02:48

refactor(parser): add composite key property (36431fc)

georgesittas merged commit ee7dc96 into tobymao:main Dec 18, 2024
0 of 7 checks passed

pickfire deleted the starrocks-partition-by-range branch December 19, 2024 02:02
