Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(starrocks): add partition by range and unique key #4509

Merged
merged 5 commits into from
Dec 18, 2024

Conversation

pickfire
Copy link
Contributor

No description provided.

Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @pickfire, please share documentation about the added features.

sqlglot/dialects/starrocks.py Outdated Show resolved Hide resolved
@pickfire pickfire force-pushed the starrocks-partition-by-range branch from 1f1a0a3 to 7c35b32 Compare December 12, 2024 09:13
sqlglot/dialects/starrocks.py Show resolved Hide resolved
Comment on lines -4587 to +4599
def _parse_interval(self, match_interval: bool = True) -> t.Optional[exp.Add | exp.Interval]:
def _parse_interval(
self, match_interval: bool = True, keep_number: bool = False
) -> t.Optional[exp.Add | exp.Interval]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid adding the new keep_number kwarg here. I tested INTERVAL '1' DAY against starrocks and even though it accepts it in expression like DATE_ADD, it doesn't accept it in the DDLs relevant to this PR. However, we can still output a number at generation time, so that would be preferable to adding complexity in the parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INTERVAL '1' DAY gets an error in starrocks, that's why I did it this way, so what you mean is during output we tweak it to number?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

sqlglot/parser.py Outdated Show resolved Hide resolved
Comment on lines 2211 to 2219
def _parse_unique_property(self) -> exp.UniqueKeyProperty:
self._match_text_seq("KEY")
expressions = self._parse_wrapped_id_vars()
return self.expression(exp.UniqueKeyProperty, expressions=expressions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be worth consolidating this and _parse_duplicate into a single definition that gets a Type[Expression] and instantiates it according to this (common) logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what do you want me to do here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

def _parse_composite_key_property(self, expr_type: t.Type[E]) -> E:
    self._match_text_seq("KEY")
    expressions = self._parse_wrapped_id_vars()
    return self.expression(expr_type, expressions=expressions

So we can use it for both _parse_unique_property and _parse_duplicate in the base parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Comment on lines +99 to +109
if self._match_text_seq("RANGE"):
partition_expressions = self._parse_wrapped_id_vars()
create_expressions = self._parse_wrapped_csv(
self._parse_partitioning_granularity_dynamic
)
return self.expression(
exp.PartitionByRangeProperty,
partition_expressions=partition_expressions,
create_expressions=create_expressions,
)
return super()._parse_partitioned_by()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double-checking: there's also this in starrocks' docs:

PARTITION <partition1_name> VALUES LESS THAN ("<upper_bound_for_partitioning_column1>" [ , "<upper_bound_for_partitioning_column2>", ... ] )

From what I understand, your changes only add support for the (START...END...EVERY...) syntax.

  1. What happens today if the user tries to parse the alternative syntax?
  2. Same question as (1), but given the changes in your PR?

I wanna ensure there are no regressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not support this as I don't have the need to use it, other people that need it can add it later, later they can just add another expression for this, since it is totally different.

Yup, it only supports the dynamic partitioning range method (START...END...EVERY...) syntax like you mentioned.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, but there's some nuance here. Some unsupported DDLs are still "parsed" today, meaning that we produce a Command node and store the input as raw text in it. The motivation behind this is to avoid crashing in cases where transpilation / parsing is not required for certain statements.

Since you introduced new logic for this property, I wanted to ensure that we don't break these Command statements accidentally by entering this if self._match_text_seq("RANGE") branch.

Basically let's test what happened before vs after. If the VALUES LESS THAN clause was not supported before, it's fine. If it was parsed into a Command, we need to make sure that if the parser fails, we'll fallback to a command as well.

Does this make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will ignore this for now, don't want to spend too much time on this.

sqlglot/dialects/starrocks.py Outdated Show resolved Hide resolved
@pickfire pickfire force-pushed the starrocks-partition-by-range branch from 6f93937 to 30621b1 Compare December 17, 2024 02:48
@georgesittas georgesittas merged commit ee7dc96 into tobymao:main Dec 18, 2024
0 of 7 checks passed
@georgesittas
Copy link
Collaborator

Thanks for the PR.

@pickfire pickfire deleted the starrocks-partition-by-range branch December 19, 2024 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants