Implement builtin derive `CoercePointee` #18764

Veykril · 2024-12-27T08:22:14Z

It is about to be stabilized rust-lang/rust#133820

ChayimFriedman2 · 2024-12-30T15:15:53Z

Ok, it looks like this macro does some cursed things no other builtin macro does (adding type parameters, manipulating bounds etc.) which are really awkward to express with our current messy builtin derive infrastructure. This motivates me to look again at simplifying the interaction between the parser and the tt (getting rid of syntax_bridge, at least partially) - @Veykril if I remember correctly you have some unfinished work in this area, right?

Veykril · 2024-12-30T15:27:01Z

I started replacing our current TokenTree format with something that more resembles rustc here #17830 some time ago. Mainly because our current TokenTree is just terrible, can represent way more things than it ever can (using chars for punctuation instead of enum for example) and also because its just not nice to work with. It's also a lot nicer for proc-macros, as that is directly what they interact with. The other main motivation was that it would allow us to clean up the mbe crate which is a mess right now.

Syntax bridge we won't be getting rid of from what I can tell, I at least don't see why we would as macros don't work on the AST, they work on the token tree model.

Either way feel free to pick that PR up if you want.

ChayimFriedman2 · 2024-12-30T15:52:37Z

I have a sketch of a plan. That is:

Revive Store token trees in contiguous Vec instead of as a tree #18327, because I believe this will make the rest of the transition smoother, although I'm not sure. But it's also profitable by itself.
The parser doesn't work on syntax nodes or anything similar, it works on a stream of lexed tokens. In much the same way it can work for tts. So I plan to make an abstraction that will create tt ast from tts. The question is what the type of the AST will be. It cannot be something other than the existing AST nodes, because otherwise all of the code in hir-def etc. won't work with it. We can make the AST nodes generic, but this will be a nightmare of making everything generic and will likely regress perf. But I believe I can create a source-agnostic (tt or syntax node) AST, either as an enum or with some clever bit-packing or something if needed. hir layers will work only with this source-agnostic AST, they need nothing more than children of the AST node. IDE layers will probably work on the rowan version to minimize changes, although I'm yet to see if this means I cannot get rid of syntax_bridge entirely. Macros will work only with the tt version, they need AST for fragements (which is why if I understand correctly you wanted your PR to begin with, to save the overhead of the syntax_bridge). Also builtin derives like in this issue.

The questions I have is whether this will work (of course), but I believe it will. More importantly I'm not sure this is worth it, and I don't know if and how much perf improvements this will bring. This will also cleanup the code (the builtin derives are one big mess currently and syntax_bridge is not pretty), but it's still a lot of work. Reviving #18327 seems worthy anyway for the memory savings, but beyond that I don't really know.

On the other hand, I seriously don't know how we can implement CoercePointee with the current mess, and how can we implement perhaps more complex builtin macros that will follow. Seriously, the only ways I see to implement CoercePointee are either to have a mini-parser inside it to parse bounds from tts (horribly awful), or convert to rowan AST, clone_to_update(), do modifications, then convert back to tt, while duplicating a bunch of code with other derives, that I'm not sure how much can be shared. Both look like something I don't want to do to me.

Veykril · 2024-12-31T14:50:36Z

I think it might be a good idea to step back for a second, clarify the current status quo and check what we can improve more structurally

Right now we have the following setup:

parsing normal source files was lex the source string into tokens, then parse that into our rowan AST.
parsing macro expansions, we expand the macro which yields a TokenTree, then that is being parsed into the AST via syntax bridge (which basically attempts to retokenize)
macro expansion, we turn the AST token tree of its input(s) into a TokenTree via syntax bridge
- what we do differs a bit dependong on the kind of macro
  - for MBE macros we then do the matching and transcribing against this TokenTree
  - for proc-macros we serialize this format into json, throw that at the proc-macro server, deserialize it again and transform that to the proc-macro types on demand (as required by the bridge)
  - for builtin macros it depends, most stuff just works on the TokenTree stuff directly as most builtins are simple, except for derives which re-parse the TokenTrees into the AST
    - noticable eager macros parse the TokenTree back fully, then descends the tree to find any new macro call, expands that, transforms that back into the AST and patches that into its AST until the AST is fully expanded turning it back into a TokenTree
- that leaves us with a final TokenTree

That means we currently have two types of syntax trees, our AST (or well, CST) which is what the lowering and the IDE works with and the TokenTree model which is what macro expansion work with.

We should ideally lay out the requirements here to see if we actually need the TokenTree (my gut tells me that I think we do, just that its shape is not ideal). I haven't given this much thought otherwise right now, and likely won't the coming week but to clarify I'd like to have a proper plan laid out with reasoning behind it before we do any major change here then as this is touching on a fundamental system in rust-analyzer. (lets fork that discussion to #18808)

On the other hand, I seriously don't know how we can implement CoercePointee with the current mess, and how can we implement perhaps more complex builtin macros that will follow. Seriously, the only ways I see to implement CoercePointee are either to have a mini-parser inside it to parse bounds from tts (horribly awful), or convert to rowan AST, clone_to_update(), do modifications, then convert back to tt, while duplicating a bunch of code with other derives, that I'm not sure how much can be shared. Both look like something I don't want to do to me.

Honestly speaking, the last part sounds like what we should just short term. That sounds like what we are doing in the other derives already anyways and the thing does not need to be perfect. That derive is a niche one in the first place and as long as we have some support for it it should be good enough. (the rust for linux folks want it, and they do use rust-analyzer)

Veykril added A-macro macro expansion Broken Window Bugs / technical debt to be addressed immediately C-bug Category: bug labels Dec 27, 2024

ChayimFriedman2 self-assigned this Dec 30, 2024

Veykril mentioned this issue Dec 31, 2024

Re-think/design our TokenTree model #18808

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement builtin derive `CoercePointee` #18764

Implement builtin derive `CoercePointee` #18764

Veykril commented Dec 27, 2024 •

edited

Loading

ChayimFriedman2 commented Dec 30, 2024

Veykril commented Dec 30, 2024

ChayimFriedman2 commented Dec 30, 2024

Veykril commented Dec 31, 2024 •

edited

Loading

Implement builtin derive CoercePointee #18764

Implement builtin derive CoercePointee #18764

Comments

Veykril commented Dec 27, 2024 • edited Loading

ChayimFriedman2 commented Dec 30, 2024

Veykril commented Dec 30, 2024

ChayimFriedman2 commented Dec 30, 2024

Veykril commented Dec 31, 2024 • edited Loading

Implement builtin derive `CoercePointee` #18764

Implement builtin derive `CoercePointee` #18764

Veykril commented Dec 27, 2024 •

edited

Loading

Veykril commented Dec 31, 2024 •

edited

Loading