Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usages of getPsi() #2901

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open

Remove usages of getPsi() #2901

wants to merge 27 commits into from

Conversation

mgroth0
Copy link
Contributor

@mgroth0 mgroth0 commented Dec 8, 2024

Description

I observed a performance bottleneck in com.pinterest.ktlint.rule.engine.core.api.isPartOfComment. CPU profiling showed a significant amount of time being wasted doing stuff related to checking for cancellation, like IntelliJ GUI stuff.
Screenshot 2024-12-08 at 11 58 25 AM

Checklist

Before submitting the PR, please check following (checks which are not relevant may be ignored):

  • Commit message are well written. In addition to a short title, the commit message also explain why a change is made.
  • At least one commit message contains a reference Closes #<xxx> or Fixes #<xxx> (replace<xxx> with issue number)
  • Tests are added
  • KtLint format has been applied on source code itself and violations are fixed
  • PR title is short and clear (it is used as description in the release changelog)
  • PR description added (background information)

Documentation is updated. See difference between snapshot and release documentation

  • Snapshot documentation in case documentation is to be released together with a code change
  • Release documentation in case documentation is related to a released version of ktlint and has to be published as soon as the change is merged to master

This PR doesn't include new tests, doesn't reference an issue, and doesn't change documentaiton. It is a performance optimization.

According to IntelliJ's Usages, there are about 65 other usages of ASTNode.getPsi() in the ktlint codebase. I'm not sure if all of them could be removed, but maybe we could try to remove as many of them as possible? If this PR looks good, I suggest we follow up by doing that.

@paul-dingemans
Copy link
Collaborator

Thanks for your contributions. So far, I have just skimmed your changes. I will do a detailed review later. To not block you, I have approved the workflow so that the pipeline will be trigger when you add changes.

I observed a performance bottleneck in com.pinterest.ktlint.rule.engine.core.api.isPartOfComment. CPU profiling showed a significant amount of time being wasted doing stuff related to checking for cancellation, like IntelliJ GUI stuff.

What lead you to analyzing/resolving this problem? Are you resolving problems with Ktlint CLI, ktlint-intellij-plugin, or with any API Consumer? Of course, all will benefit from improved performance. But I am just curious to understand where you are coming from.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

I am running ktlint primarily as an API consumer inside of a gradle plugin. I build a large project with hundreds of modules and execute ktlint concurrently on thousands of kotlin files and I have observed frequently my gradle builds seem to pause for long times (1-30 seconds) on ktlint execution. These pauses seem unnaturally long. I don't see pauses like this during regular kotlin compilation or during detekt. So I thought something different might be happening in ktlint. Upon CPU profiling, the flame graph looked like this:

Screenshot 2024-12-10 at 3 19 52 AM

What this told me is that it wasn't one rule, but rather something dispersed throughout ktlint. Then I took a closer look, and that is when I saw the image I shared at the top. The 3081 ms for ProgressIndicatorProvider.checkCancelled() seems very wrong, doesn't it? My understanding is that checkCancelled shouldn't be called during a gradle operation since gradle isn't a GUI operation and won't be cancelled like that. So then I realized that the ASTNode interface is designed to be more lightweight anyway, so I decided to see if I could migrate ktlint from Psi to ASTNode. getPsi() will always do checkCancelled if the node is a CompositeElement, which many elements are.

getPsi() is causing all of this strange pausing, because its checking for cancellation. I don't know why the checkCancelled operation is pausing so much, because my profiler only goes as deep as Cancellation.currentJob which is a kotlinx.coroutines thing, but for some reason that's the bottleneck. I have to suspect maybe Cancellation.currentJob is somehow getting overwhelmed because I'm using many concurrent threads or something, but I don't know.

This result tells a compelling story:

Screenshot 2024-12-10 at 3 33 36 AM

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

My strategy so far has been:

  1. Search for usages of getPsi() and try to replace it by working with ASTNode
  2. If its in the context of a code edit, ignore it. 99.9% of them time ktlint is just reading code, writes are much more rare for me so its less of a priority.
  3. Ensure tests pass and format as I go

My biggest concern is that even though I ensure tests pass, some of my changes are likely to cause unforseable bugs. I feel bad introducing these bugs, but I feel its neccesary to migrate the code to more lightweight APIs. I hope you agree.

Also, I still have a lot to learn about these APIs and could be making mistakes.

One idea I have is that if this gets merged, the first release after this merge is a beta release allowing a time for people to report bugs, since the tests probably aren't covering every corner case. I don't think these changes can be released as a stable version right away.

I am looking forward to your feedback.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

The other main concern that I have is that the ASTNode API doesn't do as much logic for you, so its harder to maintain and develop. It seems with a bit of dedication we could build an ASTNode-based library of extension functions to mimic a lot of the logic of psi, but I have mixed feelings about that.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

There are 13 more Rule classes that have usages of getPsi (excluding usages for mutating psi). I left what seemed to be the most difficult for last, so these might take a while. Given that I've shared some concerns I'm going to pause for now and give time for feedback and review.

This could technically be merged as is. The remaining work is just to remove more usages of getPsi, but the current state of the branch is good; all tests pass.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

Also I am curious, would ktlint possibly migrate to the analysis API in the future? I don't know much about this, but I wonder if newer APIs like that resolve the problem, like maybe they are both lightweight and also could offer more support with the logic like psi does? Just a thought.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

Looking into it more, apparently the Analysis API is overkill for a formatter and has lots of overhead.

This leads me to think maybe creating a small lightweight ASTNode-based support library, largely just copying a lot of the logic from Psi classes, will be helpful.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 10, 2024

Basically expanding ASTNodeExtension.kt and minimize psi usage

@mgroth0 mgroth0 changed the title implement ASTNode.isPartOfComment without psi Remove usages of getPsi() Dec 10, 2024
@paul-dingemans
Copy link
Collaborator

I am running ktlint primarily as an API consumer inside of a gradle plugin. I build a large project with hundreds of modules and execute ktlint concurrently on thousands of kotlin files and I have observed frequently my gradle builds seem to pause for long times (1-30 seconds) on ktlint execution. These pauses seem unnaturally long.

Interesting to know this. Personally I use ktlint CLI only on small projects, so I have never noticed this.

What this told me is that it wasn't one rule, but rather something dispersed throughout ktlint. Then I took a closer look, and that is when I saw the image I shared at the top. The 3081 ms for ProgressIndicatorProvider.checkCancelled() seems very wrong, doesn't it? My understanding is that checkCancelled shouldn't be called during a gradle operation since gradle isn't a GUI operation and won't be cancelled like that. So then I realized that the ASTNode interface is designed to be more lightweight anyway, so I decided to see if I could migrate ktlint from Psi to ASTNode. getPsi() will always do checkCancelled if the node is a CompositeElement, which many elements are.

Ktlint uses both the ASTNode interface as well as Psi. I expect that it dependended on the knowdledge/experience of rule developers which approach was used. In most cases I prefer the ASTNode, as it is way more easy to comprehend. On the other hand Psi ofter gives access to helper functions, but I find them typically hard to find. But I have never (until this issue) seen a reason to migrate away from PSI fully.

getPsi() is causing all of this strange pausing, because its checking for cancellation. I don't know why the checkCancelled operation is pausing so much, because my profiler only goes as deep as Cancellation.currentJob which is a kotlinx.coroutines thing, but for some reason that's the bottleneck. I have to suspect maybe Cancellation.currentJob is somehow getting overwhelmed because I'm using many concurrent threads or something, but I don't know.

I guess that the cancellation checking especially makes sense when running within the context of IntelliJ IDEA. For ktlint (including the ktlint-intellij-plugin) this does not seem relevant though.

My strategy so far has been:

  1. Search for usages of getPsi() and try to replace it by working with ASTNode
  2. If its in the context of a code edit, ignore it. 99.9% of them time ktlint is just reading code, writes are much more rare for me so its less of a priority.
  3. Ensure tests pass and format as I go

Sounds reasonable.

My biggest concern is that even though I ensure tests pass, some of my changes are likely to cause unforseable bugs. I feel bad introducing these bugs, but I feel its neccesary to migrate the code to more lightweight APIs. I hope you agree.

Also, I still have a lot to learn about these APIs and could be making mistakes.

I have no problem with this. I am quite confident about the test converage of most rules. Those tests cover both linting and formatting. But I hope that I can count on your support in case such errors will occur.

One idea I have is that if this gets merged, the first release after this merge is a beta release allowing a time for people to report bugs, since the tests probably aren't covering every corner case. I don't think these changes can be released as a stable version right away.

After each merge, a SNAPSHOT version of ktlint is released. The ktlint build pipeline does not support BETA builds as far as I know. Also, that would not make a lot of sense to me, as users of ktlint do not seem to be interested/invested in testing new versions before the actual release. That is the main reason that after major/minor release usually one patch version is needed a couple of weeks after the major/minor release.

The other main concern that I have is that the ASTNode API doesn't do as much logic for you, so its harder to maintain and develop. It seems with a bit of dedication we could build an ASTNode-based library of extension functions to mimic a lot of the logic of psi, but I have mixed feelings about that.

This is indeed a concern. But there is no need to be stressed about it too much. We don't need to replace 100% of Psi with AST. First we can pick the low hanging fruit. By doing so, we learn more about what works or doesn't work. Those learnings will help with future decision making.

There are 13 more Rule classes that have usages of getPsi (excluding usages for mutating psi). I left what seemed to be the most difficult for last, so these might take a while. Given that I've shared some concerns I'm going to pause for now and give time for feedback and review.

This could technically be merged as is. The remaining work is just to remove more usages of getPsi, but the current state of the branch is good; all tests pass.

Agree. Let's first work on wrapping/merging this part.

Also I am curious, would ktlint possibly migrate to the analysis API in the future? I don't know much about this, but I wonder if newer APIs like that resolve the problem, like maybe they are both lightweight and also could offer more support with the logic like psi does? Just a thought.

Looking into it more, apparently the Analysis API is overkill for a formatter and has lots of overhead.

This leads me to think maybe creating a small lightweight ASTNode-based support library, largely just copying a lot of the logic from Psi classes, will be helpful.

Basically expanding ASTNodeExtension.kt and minimize psi usage

I see no reason to migrate to the analysis API. Expanding ASTNodeExtension is fine. Let's see how this works out.

Thanks for your efforts sofar. It is nice to have input from a totally different perspective. I will start detailed review.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 15, 2024

But I hope that I can count on your support in case such errors will occur.

I cannot make a strong commitment to make fixes in a timely manner due to my other priorities, but please feel free to flag me on any bug report that involves my changes and I will help when I can.

Copy link
Collaborator

@paul-dingemans paul-dingemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a promising initative!

Please don't be dishartened by the amount of review remarks. Most of them are relatively minor because you are not fully acquainted with the code base.

I do have some concerns about replacing all Psi code with own AstNode although the readability is improved on lots of places. Sofar, I have seen no changes that should not have been ported from Psi.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 20, 2024

This is a tangential question: would ktlint consider using detekt to enforce rules on its own code? I ask because I notice you keep a really tight ship in terms of code style (in ways that are beyond just formating), and detekt may be able to automate and enforce some of that.

Also, I could see detekt and ktlint sharing a common library for some basic ASTNode operations, but that's a separate point.

@paul-dingemans
Copy link
Collaborator

This is a tangential question: would ktlint consider using detekt to enforce rules on its own code? I ask because I notice you keep a really tight ship in terms of code style (in ways that are beyond just formating), and detekt may be able to automate and enforce some of that.

I have considered this at some point, but never followed up upon it. As Detekt also contains Ktlint, I wonder what happens when I make a change in ktlint that conflicts with the older version of Ktlint in Detekt.

Also, I could see detekt and ktlint sharing a common library for some basic ASTNode operations, but that's a separate point.

That could be an option, but it might be hard to align on the governance of that library. Also, it might communicate to the world that it is a complete library for ASTNode operations and as of that get many feature requests not needed by Ktlint / Detekt.

Copy link
Collaborator

@paul-dingemans paul-dingemans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matt,
I did review your changes about a week ago, but did not mention that explicitly yet. Can have a look at the unresolved issues?

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 29, 2024

Hi Paul, I'm going to start working on it now. I just don't know if I'm going to find a great solution for the issue with the tokens.

@mgroth0
Copy link
Contributor Author

mgroth0 commented Dec 29, 2024

Done responding to all the unresolved issues.

@paul-dingemans
Copy link
Collaborator

I am done with the second review and close some conversations. Can you please have a look at the last open conversation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants