Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add not_ref() #570

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Add not_ref() #570

wants to merge 1 commit into from

Conversation

tesujimath
Copy link
Contributor

Works just like not() but for BorrowInput rather than ValueInput.

This is just like I did for any_ref, and needed for the same reason, that is, because not can't be used for BorrowInput, only ValueInput

@zesterer
Copy link
Owner

Hmm, I worry that we're going to end up duplicating a lot of combinators to make this work across the crate. I think there probably needs to be a better way of generating errors without relying on being able to pull values out by-value. I need to revisit exactly how error generation works with MaybeRef, but my intuition is that there should be a better way of solving this and relaxing the trait bound.

@tesujimath
Copy link
Contributor Author

Yeah I do agree, relaxing the trait bound from ValueInput to just Input would be better, but I couldn't see how to do that!

@tesujimath
Copy link
Contributor Author

tesujimath commented Dec 8, 2023

@zesterer I'm thinking I'll have to publish my own variant of this crate as an interim measure, so I can publish (an update to) my beancount-parser-lima crate which depends on it.

Best practice in the community seems to be I publish a crate called tesujimath-chumsky with a clear description as to it's temporary nature, so publishing my own crate isn't blocked this.

As soon as there's a published solution to what was blocking me here and in #569, I'll delete my crate and switch back.

Ok?

@zesterer
Copy link
Owner

zesterer commented Dec 8, 2023

Apologies, I've had a really busy few weeks (new job, among other things!). This is on my todo list (along with a series of other things), I'll try to get a decent response up here by the end of the weekend.

@tesujimath
Copy link
Contributor Author

Great, and thank you! Sorry if it seems like I'm hassling you about this, that's not my intention. Really appreciate your work on Chumsky!

@tesujimath
Copy link
Contributor Author

@zesterer I had another look at what would be required to support not() et al for BorrowInput.

Relaxing the trait constraints from BorrowInput and ValueInput to just Input seems to come unstuck with the calls to next_ref() or next(), which seem fundamentally incompatible. I'm not even sure if this will be possible.

For now I have only two ideas:

  1. Implement the whole suite of not_ref() and the others for BorrowInput, as parallels to the existing ValueInput ones (which is what I started to do here).
  2. Give up on using BorrowInput in my application and simply use ValueInput everywhere.

Since my tokens come from Logos and contain &str, using ValueInput does not seem like an enormous loss. Mostly. But I do have a couple of more painful tokens to pass by value. Is this what others are doing? Is BorrowInput able to become a first-class citizen somehow?

@zesterer
Copy link
Owner

zesterer commented Jan 31, 2024

Relaxing the trait constraints ... seem fundamentally incompatible.

One existing issue with not is that the way it reports errors is... strange. If you have something like just('a').and_is(just("avocado").not()).repeated() and run it on an input like aaaaavocado, you'll end up getting an error like

aaaaavocado
    ^
    Unexpected 'a'

This seems... counterintuitive, to say the least given that an a in that location is a legitimate pattern: provided it's not followed by vocado.

The wider issue here is that we don't have a good way to fall back on a non-token pattern. Ideally, the error would be more like Unexpected 'avocado'. This is actually a wider design problem with chumsky (see #538 and related issues) that I want to fix at some point, and it affects other combinators like filter (there's no good way to state, in the positive, what the filter function was expecting).

If this wider problem were solved, then I suspect that the not issue would also be solved: since the reported error could contain some fallback label instead of a long-lived reference to whatever token was found.

It's a tricky one: while I definitely see the value in this combinator, I'm also keenly aware that the reason for its existence is a product of poor design elsewhere in the crate. I think I'd be happy to accept it, but perhaps only as a temporary measure: perhaps it could be put behind the unstable feature?

Is BorrowInput able to become a first-class citizen somehow?

The difficulty here is that the inputs supported by chumsky have mutually-exclusive feature sets.

Input type Token type Supports borrowing tokens? Supports taking tokens by-value?
&'a str char false (str is UTF-8, does not contain chars) true (char is trivially copyable)
&'a [u8] u8 true ([u8] is a slice, contains u8s) true (u8 is trivially copyable)
&'a [T] T true ([T] is a slice, contains Ts) false (T's copyability is unknown)

The only commonality between these inputs is that they must be able to produce a MaybeRef<'a, Self::Token> (i.e: either an owned Self::Token or a borrowed &'a Self::Token), which is the base requirement placed upon the Input trait: making BorrowInput first-class would necessitate making at minimum &'a str an invalid input, and potentially many more (it would be nice, for example, to have non-memory data structures like impl Read fit into this abstraction too, see #590 ).

@tesujimath
Copy link
Contributor Author

I wasn't able to transition to ValueInput for various reasons, so sticking with BorrowInput, which is surely the right thing.

But by restructuring my parser, I was able to avoid needing both not_ref() and one_of_ref(), so neither this PR nor #569 are blocking my own progress.

Notwithstanding the not weirdness, I do think it should be supported for BorrowInput, or removed from ValueInput. But I don't think I can help with anything here, sorry.

Thanks for continuing to make Chumsky more awesome! ❤️

@zesterer
Copy link
Owner

zesterer commented Feb 1, 2024

Good to hear that you've been able to work around it. I'm going to keep this PR open for now, I if only as a reminder to myself to come up with a good solution to this wider problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants