The POSIX standard does not appear to allow empty regex... #27

Open
twhitehead opened this issue Oct 1, 2021 · 3 comments
Labels
documentation Improvements or additions to documentation


@twhitehead

Just a quick note that a lot of the other-implementations-are-not-compliant examples appear to be about empty patterns (e.g., issues with the matching of () in (()|.)(b)).

If you read the linked POSIX standard, however, it seems that such empty expressions are not actually valid regexes. For example, the extended regex grammar it defines is

extended_reg_exp   :                      ERE_branch
                   | extended_reg_exp '|' ERE_branch
                   ;
ERE_branch         :            ERE_expression
                   | ERE_branch ERE_expression
                   ;
ERE_expression     : one_char_or_coll_elem_ERE
                   | '^'
                   | '$'
                   | '(' extended_reg_exp ')'
                   | ERE_expression ERE_dupl_symbol
                   ;
one_char_or_coll_elem_ERE  : ORD_CHAR
                   | QUOTED_CHAR
                   | '.'
                   | bracket_expression
                   ;
ERE_dupl_symbol    : '*'
                   | '+'
                   | '?'
                   | '{' DUP_COUNT               '}'
                   | '{' DUP_COUNT ','           '}'
                   | '{' DUP_COUNT ',' DUP_COUNT '}'
                   ;

from which I don't see how you can form (): the parentheses must contain an extended_reg_exp, which has to consist of at least one ERE_branch, which must consist of at least one ERE_expression, which must have at least one character of some sort.
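
To make that derivation argument concrete, here is a minimal sketch of a recognizer that follows the grammar above. It is an illustration only, not code from regex-tdfa or any other library. The function name is_valid_ere is hypothetical, the left-recursive rules are rewritten as loops, and bracket expressions and {m,n} intervals are deliberately left out (the sketch raises NotImplementedError on them), so it covers only a subset of the grammar.

```python
# Characters that are special in an ERE; anything else is an ORD_CHAR.
SPECIAL = set("^.[$()|*+?{\\")


def is_valid_ere(pattern: str) -> bool:
    """Return True if pattern is derivable from the simplified ERE grammar."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def parse_alternation() -> bool:
        # extended_reg_exp : ERE_branch | extended_reg_exp '|' ERE_branch
        nonlocal pos
        if not parse_branch():
            return False
        while peek() == "|":
            pos += 1
            if not parse_branch():
                return False
        return True

    def parse_branch() -> bool:
        # ERE_branch : one or more ERE_expression
        if not parse_expression():
            return False
        while peek() is not None and peek() not in "|)":
            if not parse_expression():
                return False
        return True

    def parse_expression() -> bool:
        # ERE_expression : atom followed by zero or more of '*', '+', '?'
        nonlocal pos
        c = peek()
        if c is None:
            return False
        if c in "[{":
            # Assumption of this sketch: brackets and intervals are not modelled.
            raise NotImplementedError("bracket/interval expressions omitted")
        if c == "(":
            pos += 1
            # The group body must itself be a non-empty extended_reg_exp.
            if not parse_alternation() or peek() != ")":
                return False
            pos += 1
        elif c == "\\":
            # QUOTED_CHAR: backslash followed by a special character.
            if pos + 1 >= len(pattern) or pattern[pos + 1] not in SPECIAL:
                return False
            pos += 2
        elif c in "^$." or c not in SPECIAL:
            pos += 1
        else:
            return False  # '*', '+', '?', '|', ')' cannot start an expression
        while peek() is not None and peek() in "*+?":
            pos += 1
        return True

    return parse_alternation() and pos == len(pattern)


if __name__ == "__main__":
    for p in ["()", "(()|.)(b)", "(a|.)(b)", "()oo"]:
        print(repr(p), is_valid_ere(p))
```

Under these assumptions the sketch rejects (), (()|.)(b) and ()oo, because the group body never yields an ERE_expression, while (a|.)(b) is accepted. That says nothing about what real engines do with such patterns, only about what the grammar as written derives.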

@andreasabel
Member

Thanks for the report, @twhitehead!

I suppose this is an issue with the Wiki rather than with regex-tdfa, but the Wiki has no bug tracker and does not seem to be actively maintained. It could make sense to move the relevant parts of the Wiki into the documentation of one of the regex-* packages, but then it would likely be regex-base.

@andreasabel andreasabel added the documentation Improvements or additions to documentation label Oct 4, 2021
@phadej
Contributor

phadej commented Oct 4, 2021

FWIW, e.g. % git grep -E '()oo' works. I'm not sure which regexp engine git grep (and GNU grep) use, but it works anyhow.

@twhitehead
Author

Feel free to close this if you want. As you said, there is obviously nothing that needs to be done to the code itself. I just happened to notice it and figured I should point it out. I've now added it to the talk page for the wiki entry.
