Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missed(?) optimization with a const array of same item #107208

Closed
iximeow opened this issue Jan 22, 2023 · 13 comments · Fixed by #134852
Closed

missed(?) optimization with a const array of same item #107208

iximeow opened this issue Jan 22, 2023 · 13 comments · Fixed by #134852
Labels
A-array Area: `[T; N]` A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@iximeow
Copy link
Contributor

iximeow commented Jan 22, 2023

some example code:

const LUT: [u8; 2] = [
    1,
    1,
];

pub fn decode(i: u8) -> u8 {
    if i < 2 {
        LUT[i as usize]
    } else {
        2
    }
}

compiles to (via 1.66.1, or 1.68.0-nightly from 2023-01-21):

example::decode:
        mov     al, 2
        cmp     dil, 1
        ja      .LBB0_2
        movzx   eax, dil
        lea     rcx, [rip + .L__unnamed_1]
        movzx   eax, byte ptr [rax + rcx]
.LBB0_2:
        ret

.L__unnamed_1:
        .zero   2,1

rustc faithfully produces an array of 0x01, 0x01 in the resulting program and loads from it. it seems to me rustc ought to be able to tell that all entries in the array are identical, use that value, and avoid loading from the array in the first place.

i'm not entirely sure if this should be an issue against rust-lang/rust or bug against LLVM, but i tried looking through I-slow for similar reports. so i think starting here is the right call :D thanks!

edit: it does seem remarkable that the load is eliminated if there is only one item in the array. that looks to be handled somewhere before generating LLVM IR, so i think this is the right place.

i happened to notice this with LUT being an array of function pointers, included for completeness
const LUT: [fn() -> u8; 2] = [
    || { 1 },
    || { 1 },
];

pub fn decode(i: u8) -> u8 {
    if i < 2 {
        LUT[i as usize]()
    } else {
        2
    }
}

compiles to (via 1.66.1, or 1.68.0-nightly from 2023-01-21):

core::ops::function::FnOnce::call_once:
	mov	al, 1
	ret

playground::decode:
	cmp	dil, 1
	ja	.LBB1_1
	movzx	eax, dil
	lea	rcx, [rip + .L__unnamed_1]
	jmp	qword ptr [rcx + 8*rax]

.LBB1_1:
	mov	al, 2
	ret

.L__unnamed_1:
	.quad	core::ops::function::FnOnce::call_once
	.quad	core::ops::function::FnOnce::call_once

similarly, the array is indexed regardless of the fact that the array's items are all identical. this, in turn, is a minimized form of the original code with dozens of entries in the array. in that case, LUT is an associated item on a trait where one implementation happens to result in all function pointers being the same nearly-no-op function.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jan 22, 2023
@alex
Copy link
Member

alex commented Jan 23, 2023

Minimal C reproducer produces identical code: https://godbolt.org/z/E97jeGshr

@erikdesjardins
Copy link
Contributor

erikdesjardins commented Jan 24, 2023

i'm not entirely sure if this should be an issue against rust-lang/rust or bug against LLVM, but i tried looking through I-slow for similar reports. so i think starting here is the right call :D thanks!

Indeed, even if it's an LLVM bug, we often keep issues open in this repo to track the Rust reproducer and make sure the upstream change actually fixes it.

edit: it does seem remarkable that the load is eliminated if there is only one item in the array. that looks to be handled somewhere before generating LLVM IR, so i think this is the right place.

If you mean something like https://godbolt.org/z/hn8WMd7Gx, the optimization is simpler in that case because i < 1 means the load is always from a constant index (LUT[0]).

It's harder when the load is from a variable index. LLVM should already handle the case where all bits are the same (though this wouldn't help your reproducer) in ConstantFoldLoadFromUniformValue. Strangely, this doesn't seem to work with an all-zero array: https://godbolt.org/z/hT8Gv6Gof.

@iximeow
Copy link
Contributor Author

iximeow commented Jan 24, 2023

If you mean something like https://godbolt.org/z/hn8WMd7Gx,

yep, that's the kind of thing i was thinking.

means the load is always from a constant index (LUT[0]).

ah! that makes much more sense; i hadn't thought about the index being a constant in that case.

@iximeow
Copy link
Contributor Author

iximeow commented Jan 28, 2023

i've just realized:

Indeed, even if it's an LLVM bug,

particularly given the C reproducer (thanks alex!), i should go open an issue against llvm/llvm-project about it then, and link it here for tracking purposes yeah?

@workingjubilee
Copy link
Member

Yes please! Someone will likely get around to it Eventually but, y'know,

nikic pushed a commit to llvm/llvm-project that referenced this issue Feb 20, 2023
Fold LoadInst for uniformly initialized constants, even if there
are non-constant GEP indices.

Goal proof: https://alive2.llvm.org/ce/z/oZtVby

Motivated by rust-lang/rust#107208

Differential Revision: https://reviews.llvm.org/D144184
CarlosAlbertoEnciso pushed a commit to SNSystems/llvm-debuginfo-analyzer that referenced this issue Feb 21, 2023
Fold LoadInst for uniformly initialized constants, even if there
are non-constant GEP indices.

Goal proof: https://alive2.llvm.org/ce/z/oZtVby

Motivated by rust-lang/rust#107208

Differential Revision: https://reviews.llvm.org/D144184
@workingjubilee workingjubilee added the A-array Area: `[T; N]` label Mar 7, 2023
@khei4
Copy link
Contributor

khei4 commented Mar 23, 2023

Upstream patch: https://reviews.llvm.org/D144445

@nikic nikic self-assigned this Mar 23, 2023
@nikic
Copy link
Contributor

nikic commented Mar 23, 2023

(Assigning to myself to check back after the LLVM 17 upgrade).

@Noratrieb Noratrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
@nikic
Copy link
Contributor

nikic commented Aug 14, 2023

This works now if LUT is a static: https://godbolt.org/z/Efxcn9K5s

But not if it's a const: https://godbolt.org/z/rf7eKdfGd

This appears to be a regression from @scottmcm's changes to replace memcpy with vector load+store in some cases. As a side effect of that change, const arrays now get materialized as explicit vector stores at every use, instead of referencing a global. That seems potentially pretty bad.

@nikic nikic removed their assignment Aug 14, 2023
@erikdesjardins
Copy link
Contributor

For the case with function pointers:

...when the function pointers are all the same declared function, this was fixed in 1.73: https://godbolt.org/z/WMefYsah4

...when the function pointers are from closures, it still isn't resolved: https://godbolt.org/z/5baEW4ro8 (it seems like this is because the two closure functions get merged too late)

@scottmcm
Copy link
Member

scottmcm commented Aug 2, 2024

This appears to be a regression from @scottmcm's changes to replace memcpy with vector load+store in some cases.

This was completely undone in #123185 for 1.79, so it should no longer be a vector op issue, at least.

veselypeta pushed a commit to veselypeta/cherillvm that referenced this issue Aug 12, 2024
Fold LoadInst for uniformly initialized constants, even if there
are non-constant GEP indices.

Goal proof: https://alive2.llvm.org/ce/z/oZtVby

Motivated by rust-lang/rust#107208

Differential Revision: https://reviews.llvm.org/D144184
@alex
Copy link
Member

alex commented Dec 28, 2024

This looks to be working as expected on release: https://godbolt.org/z/enf7a7Pa3

Can this be closed as is, or should we add a codegen test?

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Dec 28, 2024
@nikic
Copy link
Contributor

nikic commented Dec 28, 2024

We should add a codegen test.

@alex
Copy link
Member

alex commented Dec 28, 2024

Ok, test in #134852

Zalathar added a commit to Zalathar/rust that referenced this issue Dec 28, 2024
Added a codegen test for optimization with const arrays

Closes rust-lang#107208
@bors bors closed this as completed in d6c73eb Dec 29, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Dec 29, 2024
Rollup merge of rust-lang#134852 - alex:patch-1, r=durin42

Added a codegen test for optimization with const arrays

Closes rust-lang#107208
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-array Area: `[T; N]` A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants