-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(starrocks): add partition by range and unique key #4509
feat(starrocks): add partition by range and unique key #4509
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @pickfire, please share documentation about the added features.
1f1a0a3
to
7c35b32
Compare
def _parse_interval(self, match_interval: bool = True) -> t.Optional[exp.Add | exp.Interval]: | ||
def _parse_interval( | ||
self, match_interval: bool = True, keep_number: bool = False | ||
) -> t.Optional[exp.Add | exp.Interval]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid adding the new keep_number
kwarg here. I tested INTERVAL '1' DAY
against starrocks and even though it accepts it in expression like DATE_ADD
, it doesn't accept it in the DDLs relevant to this PR. However, we can still output a number at generation time, so that would be preferable to adding complexity in the parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INTERVAL '1' DAY
gets an error in starrocks, that's why I did it this way, so what you mean is during output we tweak it to number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
sqlglot/parser.py
Outdated
def _parse_unique_property(self) -> exp.UniqueKeyProperty: | ||
self._match_text_seq("KEY") | ||
expressions = self._parse_wrapped_id_vars() | ||
return self.expression(exp.UniqueKeyProperty, expressions=expressions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be worth consolidating this and _parse_duplicate
into a single definition that gets a Type[Expression]
and instantiates it according to this (common) logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get what do you want me to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
def _parse_composite_key_property(self, expr_type: t.Type[E]) -> E:
self._match_text_seq("KEY")
expressions = self._parse_wrapped_id_vars()
return self.expression(expr_type, expressions=expressions
So we can use it for both _parse_unique_property
and _parse_duplicate
in the base parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
if self._match_text_seq("RANGE"): | ||
partition_expressions = self._parse_wrapped_id_vars() | ||
create_expressions = self._parse_wrapped_csv( | ||
self._parse_partitioning_granularity_dynamic | ||
) | ||
return self.expression( | ||
exp.PartitionByRangeProperty, | ||
partition_expressions=partition_expressions, | ||
create_expressions=create_expressions, | ||
) | ||
return super()._parse_partitioned_by() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double-checking: there's also this in starrocks' docs:
PARTITION <partition1_name> VALUES LESS THAN ("<upper_bound_for_partitioning_column1>" [ , "<upper_bound_for_partitioning_column2>", ... ] )
From what I understand, your changes only add support for the (START...END...EVERY...)
syntax.
- What happens today if the user tries to parse the alternative syntax?
- Same question as (1), but given the changes in your PR?
I wanna ensure there are no regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not support this as I don't have the need to use it, other people that need it can add it later, later they can just add another expression for this, since it is totally different.
Yup, it only supports the dynamic partitioning range method (START...END...EVERY...)
syntax like you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, but there's some nuance here. Some unsupported DDLs are still "parsed" today, meaning that we produce a Command
node and store the input as raw text in it. The motivation behind this is to avoid crashing in cases where transpilation / parsing is not required for certain statements.
Since you introduced new logic for this property, I wanted to ensure that we don't break these Command
statements accidentally by entering this if self._match_text_seq("RANGE")
branch.
Basically let's test what happened before vs after. If the VALUES LESS THAN
clause was not supported before, it's fine. If it was parsed into a Command
, we need to make sure that if the parser fails, we'll fallback to a command as well.
Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will ignore this for now, don't want to spend too much time on this.
6f93937
to
30621b1
Compare
Thanks for the PR. |
No description provided.