Add count of lock conflicts to @time etc. Add @lock_conflicts #52883
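To make the proposal concrete, here is a minimal sketch (not taken from the PR itself; the summary wording and the assumption that the macro is reachable as Base.@lock_conflicts and returns a count are based on the PR title, not its diff) of how a contended ReentrantLock might show up:

using Base.Threads

# Sketch: n tasks contend on one ReentrantLock held by the caller, so each
# task that hits the slow lock path should register one lock conflict.
function contend(n)
    lk = ReentrantLock()
    lock(lk)                      # hold the lock so every task conflicts
    tasks = map(1:n) do _
        Threads.@spawn begin
            lock(lk)              # blocks here -> counted as a conflict
            unlock(lk)
        end
    end
    sleep(0.5)                    # give all tasks time to reach the lock
    unlock(lk)
    foreach(wait, tasks)
end

@time contend(5)                  # the summary would then mention e.g. "5 lock conflicts"
Base.@lock_conflicts contend(5)   # hypothetical usage: returns the conflict count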
Conversation
Maybe it could work like [...]
Maybe this should just be added to [...]
That sounds like a great idea to me!
In [...] @time [...]
Force-pushed from 06b0112 to 6e605d6
Force-pushed from 6e605d6 to f0ec9b9
Force-pushed from f0ec9b9 to 2f60a38
How often does this show up? The [...]
I've not field-tested it much, but [...]
I tried this branch on the LRUCache.jl tests (since they are fast to run and test some concurrency stuff), and got:

(LRUCache) pkg> instantiate
Precompiling project...
1 dependency successfully precompiled in 1 seconds
julia> @time include("test/runtests.jl")
Test Summary: | Pass Total Time
Single-threaded insertion and reinsertion | 44 44 0.6s
Test Summary: | Pass Total Time
Multi-threaded insertion and reinsertion | 2 2 0.3s
Test Summary: | Pass Total Time
Single-threaded getting and setting | 930 930 0.4s
Test Summary: | Pass Total Time
Multi-threaded getting and setting | 7 7 0.2s
Test Summary: |Time
Recursive lock in get(!) | None 0.1s
Test Summary: | Pass Total Time
Eviction callback | 11 11 0.4s
Test Summary: | Pass Total Time
Reverse iterator | 10 10 0.3s
Test Summary: | Pass Total Time
Large entries | 13 13 0.2s
Test Summary: | Pass Total Time
cache_info | 15 15 0.3s
Test Summary: | Pass Total Time
Original tests | 115 115 0.3s
4.307419 seconds (6.51 M allocations: 324.627 MiB, 1.54% gc time, 2 lock conflicts, 108.86% compilation time: 5% of which was recompilation)

Running again in the same session, I got

1.710402 seconds (1.58 M allocations: 77.543 MiB, 0.31% gc time, 1 lock conflict, 157.12% compilation time)

The > 100% compilation time is weird, and I see it on the v1.10.0 release as well. (Or actually, is this a known thing when compilation is happening on multiple threads?) So a couple conflicts there.
How many threads did you run with?
julia> versioninfo()
Julia Version 1.11.0-DEV.1269
Commit 2f60a38ca0 (2024-01-13 15:16 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin23.0.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 4 default, 0 interactive, 2 GC (on 4 virtual cores)
Environment:
JULIA_NUM_THREADS = 4
JULIA_PKG_SERVER_REGISTRY_PREFERENCE = eager
LRUCache uses SpinLock rather than ReentrantLock. This tooling is currently only set up for ReentrantLock. I've been using this tool already to reduce locks here: JuliaImages/ImageFiltering.jl#271

Before: [...]
After: [...]
I tried moving the counter increment to SpinLock like below, but the OP example with that gives counts of 3, 4, 7 conflicts in repeat runs where previously it was always, as expected, 5:

diff --git a/base/lock.jl b/base/lock.jl
index b593ebd10d..e3a8463578 100644
--- a/base/lock.jl
+++ b/base/lock.jl
@@ -145,7 +145,6 @@ Each `lock` must be matched by an [`unlock`](@ref).
"""
@inline function lock(rl::ReentrantLock)
trylock(rl) || (@noinline function slowlock(rl::ReentrantLock)
- Threads.lock_profiling() && Threads.inc_lock_conflict_count()
c = rl.cond_wait
lock(c.lock)
try
diff --git a/base/locks-mt.jl b/base/locks-mt.jl
index 5d355b9ed2..00bbabfed4 100644
--- a/base/locks-mt.jl
+++ b/base/locks-mt.jl
@@ -39,10 +39,15 @@ end
Base.assert_havelock(l::SpinLock) = islocked(l) ? nothing : Base.concurrency_violation()
function lock(l::SpinLock)
+ first = true
while true
if @inline trylock(l)
return
end
+ if first
+ first = false
+ lock_profiling() && inc_lock_conflict_count()
+ end
ccall(:jl_cpu_suspend, Cvoid, ())
# Temporary solution before we have gc transition support in codegen.
ccall(:jl_gc_safepoint, Cvoid, ())

@ericphanson with that LRUCache tests with [...]
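For context on why the SpinLock numbers vary (my reading of the diff above, illustrated with a hypothetical snippet rather than code from this PR): the patch counts one conflict per lock(::SpinLock) call whose first trylock fails, but Base and the runtime also take SpinLocks internally, so incidental contention there is counted as well and differs between runs. A direct contention demo under that patched counting might look like:

using Base.Threads

# Sketch: needs the patched SpinLock above and >= 2 threads to see contention.
# Two workers hammer one SpinLock; every lock() whose first trylock() fails
# would add one conflict, so the total depends on scheduling.
function spin_contend(iters)
    sl = Threads.SpinLock()
    hits = Ref(0)
    t = Threads.@spawn for _ in 1:iters
        lock(sl)
        hits[] += 1
        unlock(sl)
    end
    for _ in 1:iters
        lock(sl)
        hits[] += 1
        unlock(sl)
    end
    wait(t)
    return hits[]
end

Base.@lock_conflicts spin_contend(10_000)  # with the patch, this count varies run to run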
Hm, I don't know if adding this to @time [...]. I would rather do something like Go, which has profiling support for mutexes (https://go.dev/doc/diagnostics). @pchintalapudi started to add something like this for the low-level locks.
Doing that too would be great. I like that [...]
If the concern is that it's too noisy, maybe there's some heuristic, like: if the number of conflicts is more than x% of the total number of [...]
It could count time spent waiting at locks, and show it only if that's greater than a % of total CPU time?
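As a rough sketch of that idea (every name here is hypothetical; nothing like this exists in Base): the reporting code could compare accumulated lock-wait time against total CPU time and only print a lock line when it crosses a threshold.

using Printf

# Hypothetical heuristic: only report lock conflicts in a timing summary when
# the time spent waiting on locks exceeds some fraction of total CPU time.
const LOCK_REPORT_FRACTION = 0.01   # report if more than 1% of CPU time was lock waiting

function maybe_report_locks(io::IO, lock_wait_ns, cpu_time_ns, conflicts)
    lock_wait_ns > LOCK_REPORT_FRACTION * cpu_time_ns || return
    pct = 100 * lock_wait_ns / cpu_time_ns
    @printf(io, "%d lock conflicts, %.1f%% of CPU time spent waiting on locks\n",
            conflicts, pct)
end

# e.g. maybe_report_locks(stdout, 5_000_000, 80_000_000, 7) prints a line,
# while maybe_report_locks(stdout, 100_000, 80_000_000, 1) prints nothing.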
That sounds even better. And perhaps what could be shown is that time or percentage, since it is likely more interpretable than a raw count? (Though the CPU-time vs. wall-time distinction may be hard to communicate briefly.)
I tried drafting some example messages with timing, but I think a count might be sufficient: it points to the mechanics of the code. If you just got a time, it would be hard to relate it to the threading structure. At least the count was helpful in the PR I used it in above, because it could be compared to the number of calls in my code that had deeply nested locks. For more detailed profiling, a count, a time, and more would be great.
Maybe we should do something like [...]
It feels like there is a very large number of timers/counters that could be put into @time [...]
Force-pushed from 2f60a38 to d7ebd53
My impression from using this a bit is that it's not particularly noisy, except where it's helpful to highlight. If the main concern with adding things to @time is [...]. Or if people want me to try more examples first, I'd be happy to try and report.
I think the docs should be clear it only affects ReentrantLock.
I think this is great and I've wanted something like this before; I think it's worth trying out.
I'm marking this for triage to review the proposal in #52883 (comment). Also added the [...]
From a "is this too noisy" perspective, I think this is good to merge. If it turns out to really be too noisy, we can back it out or use a different heuristic.
Force-pushed from d7ebd53 to 9d31f64
We discussed this in the triage call. Whilst there wasn't a quorum to make triage decisions, @LilithHafner and I did agree that the plan above can be tried without triage approval.
fix side effects in allocation tests suggestions
Force-pushed from 66920b9 to 0b2adc6
Updated: