SV_CONST() / PL_sv_consts[] API on another planet, immortal HEKs in libperl #22872

bulk88 opened this issue Dec 23, 2024 · 0 comments
bulk88 commented Dec 23, 2024

Description

Out of curiosity, and because I've never heard of or seen anyone do it before, I decided to dump the shared HEK cache of perl -e"print 'hello world'", a.k.a. HV* PL_strtab or my_perl->Istrtab. I wanted to know which strings or HV key names are permanently burned into libperl/perl_construct(), so that no user can prevent their allocation at initialization. The list shocked me. Some things, like UNIVERSAL:: and the core XSUBs, were obvious. All of my %ENV was a surprise. Isn't that supposed to be a getter-magic HV*, not a plain-old-data HV*?

The HEK dump (it's VERY LONG):

(three keys made of non-printable bytes, rendered as replacement characters in this dump)
"
(""
()
(*
(*=
(+
(+=
(-
(-=
(/
(/=
(0+
(<=>
(abs
(bool
(cmp
(nomethod
,
-e
/
0
@
ALLUSERSPROFILE
APPDATA
ARGV
AUTOLOAD
CLEAR
CLONE
COMMANDPROMPTTYPE
COMMONPROGRAMFILES
COMMONPROGRAMFILES(X86)
COMMONPROGRAMW6432
COMPUTERNAME
COMSPEC
CORE
CORE::
CORE::GLOBAL
CopyFile
DB
DB::
DELETE
DESTROY
DEVENVDIR
DOES
DomainName
DynaLoader
DynaLoader::
ENV
EXISTS
EXTENSIONSDKDIR
EXTERNAL_INCLUDE
Exporter
Exporter::
FETCH
FIRSTKEY
FP_NO_HOST_CHECK
FRAMEWORK40VERSION
FRAMEWORKDIR
FRAMEWORKDIR32
FRAMEWORKDIR64
FRAMEWORKVERSION
FRAMEWORKVERSION32
FRAMEWORKVERSION64
File::
FormatMessage
FsType
GLOBAL::
GetCwd
GetFullPathName
GetLastError
GetLongPathName
GetNextAvailDrive
GetOSVersion
GetShortPathName
GetTickCount
HOMEDRIVE
HOMEPATH
Handle::
Hash
Hash::
Hash::Util
INC
INCLUDE
IO
IO::
IO::File
IO::Handle
IO::Seekable
ISA
Internals
Internals::
IsWin95
IsWinNT
LIB
LIBPATH
LOCALAPPDATA
LOGONSERVER
Layer::
LoginName
MPCONFIG_PRODUCTAPPDATAPATH
MPCONFIG_PRODUCTCODENAME
MPCONFIG_PRODUCTPATH
MPCONFIG_PRODUCTUSERAPPDATAPATH
MPCONFIG_REPORTINGGUID
NETFXSDKDIR
NEXTKEY
NODE_SKIP_PLATFORM_CHECK
NUMBER_OF_PROCESSORS
NamedCapture::
NoWarnings
NodeName
OS
PATH
PATHEXT
PLATFORM
PROCESSOR_ARCHITECTURE
PROCESSOR_IDENTIFIER
PROCESSOR_LEVEL
PROCESSOR_REVISION
PROGRAMDATA
PROGRAMFILES
PROGRAMFILES(X86)
PROGRAMW6432
PROMPT
PSMODULEPATH
PUBLIC
PerlIO
PerlIO::
PerlIO::Layer
Regexp
Regexp::
SCALAR
SESSIONNAME
SSLKEYLOGFILE
STDERR
STDIN
STDOUT
STORE
SYSTEMDRIVE
SYSTEMROOT
Seekable::
SetChildShowWindow
SetCwd
SetLastError
Sleep
Spawn
SvREADONLY
SvREFCNT
TEMP
TIEHASH
TMP
Tie
Tie::
Tie::Hash
Tie::Hash::NamedCapture
UCRTVERSION
UNIVERSAL
UNIVERSAL::
UNIVERSALCRTSDKDIR
USERDOMAIN
USERDOMAIN_ROAMINGPROFILE
USERNAME
USERPROFILE
Util::
V
VBOX_MSI_INSTALL_PATH
VCIDEINSTALLDIR
VCINSTALLDIR
VCPKG_ROOT
VCTOOLSINSTALLDIR
VCTOOLSREDISTDIR
VCTOOLSVERSION
VERSION
VISUALSTUDIOVERSION
VS170COMNTOOLS
VS90COMNTOOLS
VSCMD_ARG_APP_PLAT
VSCMD_ARG_HOST_ARCH
VSCMD_ARG_TGT_ARCH
VSCMD_VER
VSINSTALLDIR
WINDIR
WINDOWSLIBPATH
WINDOWSSDKBINPATH
WINDOWSSDKDIR
WINDOWSSDKLIBVERSION
WINDOWSSDKVERBINPATH
WINDOWSSDKVERSION
WINDOWSSDK_EXECUTABLEPATH_X64
WINDOWSSDK_EXECUTABLEPATH_X86
WINDOWS_TRACING_FLAGS
WINDOWS_TRACING_LOGFILE
Win32
Win32::
Win32CORE
Win32CORE::
_
_VERSION
__DOTNET_ADD_32BIT
__DOTNET_ADD_64BIT
__DOTNET_PREFERRED_BITNESS
__VSCMD_PREINIT_PATH
_make_const
_tie_it
blessed
boolean
boot_DynaLoader
builtin
builtin::
can
ceil
constant
constant::
created_as_number
created_as_string
declare
decode
dfs
dl_error
dl_find_symbol
dl_install_xsub
dl_load_file
dl_undef_symbols
dl_unload_file
downgrade
encode
export_lexically
false
find
flags
floor
from_tuple
get_layers
hv_clear_placeholders
import
indexed
inf
is_alpha
is_bool
is_qv
is_regexp
is_tainted
is_utf8
is_weak
isa
load_module
main
main::
method_changed_in
mro
mro::
nan
native_to_unicode
new
noop
normal
numify
parse
qv
re
re::
refaddr
reftype
regexp_pattern
regname
regnames
regnames_count
stack_refcounted
stderr
stdin
stdout
stringify
to_decimal
to_dotted_decimal
trim
true
tuple
unicode_to_native
unimport
unweaken
upgrade
utf8
utf8::
valid
vcmp
version
version::
weaken

Steps to Reproduce

Adjust for your OS before running: XS paths, shell quotes, etc. All .pm files and PP code were removed to make this demo as close as possible to a C breakpoint on a bare interpreter startup.

perl -e"DynaLoader::boot_DynaLoader('DynaLoader');&{DynaLoader::dl_install_xsub('Hash::Util::bootstrap',DynaLoader::dl_find_symbol(DynaLoader::dl_load_file('C:\pb64\lib\auto\Hash\Util\Util.dll',0),'boot_Hash__Util'),'C:\pb64\lib\auto\Hash\Util\Util.dll')}(); my %keykill; foreach(keys %{*Hash::Util::}) {$keykill{$_} = !!1}; my $a = Hash::Util::bucket_array(undef); my @b; foreach(@{$a}){my @tarr; if(ref $_){@tarr = grep{!$keykill{$_}} @{$_}; push @b, @tarr;}} print join(\"\n\",sort @b);" > t.txt

Expected behavior

While the semi-recent SV_CONST() / PL_sv_consts[] API is a GREAT IDEA, the strings/method names/typeglob names that were picked in the past are very narrow-minded: they serve a very tiny fraction of perl users and perl process startups. Those all-uppercase method names are on another planet, since they don't even cover a bare, empty interpreter process that has run no PP code yet. I'll copy-paste the list here for convenience.

SV_CONST_BINMODE 28
SV_CONST_CLEAR 32
SV_CONST_CLOSE 30
SV_CONST_DELETE 31
SV_CONST_DESTROY 34
SV_CONST_EOF 27
SV_CONST_EXISTS 8
SV_CONST_EXTEND 14
SV_CONST_FETCH 4
SV_CONST_FETCHSIZE 5
SV_CONST_FILENO 29
SV_CONST_FIRSTKEY 15
SV_CONST_GETC 24
SV_CONST_NEXTKEY 16
SV_CONST_OPEN 18
SV_CONST_POP 10
SV_CONST_PRINT 20
SV_CONST_PRINTF 21
SV_CONST_PUSH 9
SV_CONST_READ 22
SV_CONST_READLINE 23
SV_CONST_SCALAR 17
SV_CONST_SEEK 25
SV_CONST_SHIFT 11
SV_CONST_SPLICE 13
SV_CONST_STORE 6
SV_CONST_STORESIZE 7
SV_CONST_TELL 26
SV_CONST_TIEARRAY 1
SV_CONST_TIEHANDLE 3
SV_CONST_TIEHASH 2
SV_CONST_TIESCALAR 0
SV_CONST_UNSHIFT 12
SV_CONST_UNTIE 33
SV_CONST_WRITE 19

Expected behavior is that any HV keys unconditionally created inside libperl at process startup must be global, read-write "IMMORTAL" shared HEKs backed by C global storage, which means backed by the libperl.so/libperl.dll/perl.exe/perl.elf binary. They should not be backed by malloc() memory, which purely duplicates data that already exists in libperl.so.
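
A minimal sketch of what "backed by C global storage" could look like. The struct below is a simplified, hypothetical stand-in for the real shared_HE_HEK_PVBUF layout; the names immortal_hek, IMMORTAL_HEK, immortal_heks_init and toy_hash are assumptions made for illustration, not libperl API, and the hash is a toy FNV-1a variant rather than any of perl's real hash functions. The key strings live in the binary's data segment, and only the hash field is written once the seed is known.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a shared HEK. */
struct immortal_hek {
    uint32_t    hash;   /* filled in at startup, once the seed is known */
    uint32_t    len;    /* key length, known at compile time */
    const char *key;    /* points at a literal in the binary, never malloc()ed */
};

#define IMMORTAL_HEK(s) { 0, (uint32_t)(sizeof(s) - 1), (s) }

/* Read-write global table: lives in the data segment, not on the heap. */
static struct immortal_hek immortal_heks[] = {
    IMMORTAL_HEK("UNIVERSAL"),
    IMMORTAL_HEK("DESTROY"),
    IMMORTAL_HEK("AUTOLOAD"),
    IMMORTAL_HEK("ISA"),
};

#define IMMORTAL_HEK_COUNT (sizeof(immortal_heks) / sizeof(immortal_heks[0]))

/* Toy seeded hash (FNV-1a variant); NOT one of perl's hash functions. */
uint32_t toy_hash(uint32_t seed, const char *key, uint32_t len)
{
    uint32_t h = 2166136261u ^ seed;
    for (uint32_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;
    }
    return h;
}

/* One pass at interpreter startup: no allocation, only hash stores. */
void immortal_heks_init(uint32_t seed)
{
    for (size_t i = 0; i < IMMORTAL_HEK_COUNT; i++)
        immortal_heks[i].hash =
            toy_hash(seed, immortal_heks[i].key, immortal_heks[i].len);
}

/* Linear lookup, good enough for a sketch; returns NULL if not immortal. */
const struct immortal_hek *immortal_hek_find(const char *key, uint32_t len)
{
    for (size_t i = 0; i < IMMORTAL_HEK_COUNT; i++)
        if (immortal_heks[i].len == len &&
            memcmp(immortal_heks[i].key, key, len) == 0)
            return &immortal_heks[i];
    return NULL;
}
```

In this sketch the only per-process work is the loop in immortal_heks_init(); the strings themselves are never copied.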

Fix ideas

The SV_CONST() / PL_sv_consts[] API was and is a great idea, but it totally missed the most important key names. It should be refactored and expanded IMO. The current contiguous shared_HE_HEK_PVBUF struct needs some small tweaks to become "IMMORTAL" class data, backed by C global variable storage instead of per-interpreter malloc. Note that because of PERL_HASH_SEED, it is impossible to make a RO/C-static/C-global shared_HE_HEK_PVBUF struct. If someone has ideas, or knows a secret for implementing RO shared_HE_HEK_PVBUF structs, come forward.

One idea I had was to precalculate in miniperl the hash numbers for RO HEKs. They would be constants per interpreter binary, with the on-disk RO hash probably mixed with the timestamp at CC compile time of the interpreter binary. There is low randomness here, but every Linux package manager family would have different U32s in its libperl.so, even if 5.X.Y is the same.
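
As a rough sketch of the precalculation idea, a build-time generator could emit one initializer line per key, with the hash already mixed with a per-build salt. Everything here is assumed for illustration: build_time_hash and emit_ro_hek_entry are hypothetical names, the hash is a toy FNV-1a variant and not perl's, and the emitted line format is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy salted hash (FNV-1a variant); a placeholder, NOT perl's hash function. */
uint32_t build_time_hash(uint32_t build_salt, const char *key)
{
    uint32_t h = 2166136261u ^ build_salt;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Write one initializer line for a read-only HEK table into `out`.
 * Returns what snprintf() returns: the length the full line needs. */
int emit_ro_hek_entry(char *out, size_t outsz, uint32_t build_salt,
                      const char *key)
{
    return snprintf(out, outsz, "    { 0x%08xu, %u, \"%s\" },",
                    (unsigned)build_time_hash(build_salt, key),
                    (unsigned)strlen(key), key);
}
```

A generator like this would run once per interpreter build, so two builds with different salts end up with different constants in the binary for the same key.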

Another idea: one CPU XOR (^) op against the dynamic hash seed at runtime, applied to all shared HEKs alike, both the RO .so-backed ones and the RW malloc-backed ones. HV* PL_strtab is a per-process/per-interpreter/per-my_perl struct anyway.
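
The XOR idea can be sketched in a few lines. The function names effective_hash and bucket_index are hypothetical, and runtime_mask stands for whatever value would be derived from the runtime seed; bucket_index assumes the bucket count is a power of two.

```c
#include <stdint.h>

/* Effective hash = seed-independent stored hash XOR a per-process mask. */
uint32_t effective_hash(uint32_t stored_hash, uint32_t runtime_mask)
{
    return stored_hash ^ runtime_mask;
}

/* Bucket index for a table whose size is a power of two. */
uint32_t bucket_index(uint32_t stored_hash, uint32_t runtime_mask,
                      uint32_t nbuckets)
{
    return effective_hash(stored_hash, runtime_mask) & (nbuckets - 1);
}
```

One property worth keeping in mind with this sketch: XOR with a constant permutes the buckets, but two keys whose stored hashes land in the same bucket still land in the same bucket for every mask.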

The easiest choice is to just have RW immortal global HEKs. If the perl port knows how to do it, those 3.5 KB to 7 KB of strings can be made read-only at the OS/hardware VM level right after PERL_HASH_SEED is read from the shell at perl process startup.

On MSVC 2022 x64, SBOX32 almost completely inlines if MSVC knows the C literal at LTO compile time. Therefore it can also be done partially in miniperl at interpreter CC time.

#define _SBOX32_CASE(len,hash,state,key) \
    /* FALLTHROUGH */ \
    case len: hash ^= state[ 1 + ( 256 * ( len - 1 ) ) + key[ len - 1 ] ];
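
To show why this shape inlines so well, here is a scaled-down sketch of the same fallthrough pattern. It is not the real SBOX32 code: the TOY_* names, the 4-byte length limit, and the xorshift seeding are all assumptions for illustration. Each case XORs in one table word selected by byte position and byte value, then falls through to the next shorter length.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LEN     4
#define TOY_STATE_WORDS (1 + 256 * TOY_MAX_LEN)

#define TOY_SBOX_CASE(len, hash, state, key) \
    /* FALLTHROUGH */ \
    case len: hash ^= state[1 + (256 * (len - 1)) + key[len - 1]];

/* Fill the state from a seed with a simple xorshift generator (toy only). */
void toy_sbox_seed(uint32_t seed, uint32_t *state)
{
    uint32_t x = seed ? seed : 1u;   /* xorshift must not start at 0 */
    for (size_t i = 0; i < TOY_STATE_WORDS; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        state[i] = x;
    }
}

/* Returns 0 and stores the hash in *out, or -1 if the key is too long. */
int toy_sbox_hash(const uint32_t *state, const unsigned char *key,
                  size_t len, uint32_t *out)
{
    uint32_t hash = state[0];
    switch (len) {
    default: return -1;
    TOY_SBOX_CASE(4, hash, state, key)
    TOY_SBOX_CASE(3, hash, state, key)
    TOY_SBOX_CASE(2, hash, state, key)
    TOY_SBOX_CASE(1, hash, state, key)
    case 0: break;
    }
    *out = hash;
    return 0;
}
```

When both the key and its length are compile-time constants, the switch reduces to a fixed chain of table loads and XORs, which is what lets a compiler fold most of it away.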

Note, the interpreter currently FORBIDS ever re-reading $ENV{PERL_HASH_SEED} after perl_construct() or perl_init_sys3() runs. All ithreads and all embedders will use the same seed for the rest of the process lifetime.

Or the alternative to immortal HEKs is MUCH MORE use of sub AUTOLOAD{} from XS, so the HE/HEK/GV_H/GV_B/GP/CV_H/CV_B structs are never allocated until the interpreter runs (yy_lex/op_null/gv_fetchFOOpvn()/etc.) into the first explicit user PP/XS method/sub call to these special identifiers. I'm not very eager about this idea: it's possible, just not my favorite.

There really is no alternative to the SV_CONST() / PL_sv_consts[] API, and discussing "memory bloat" and "memory usage" of immortal HEKs is not applicable in this case, since the memory is currently already "wasted" before the first ASCII char of PP code is ever parsed.

I doubt there would be consensus to remove these 4 packages from core (to a core .pm): version::*, Win32CORE::*, Tie::Hash::NamedCapture::*, and builtin::*. That would leave %ENV as the last HV HEK memory hog.

Other questionable packages.

STATIC void
S_init_predump_symbols(pTHX)
{
.............................
    /* Historically, PVIOs were blessed into IO::Handle, unless
       FileHandle was loaded, in which case they were blessed into
       that. Action at a distance.
       However, if we simply bless into IO::Handle, we break code
       that assumes that PVIOs will have (among others) a seek
       method. IO::File inherits from IO::Handle and IO::Seekable,
       and provides the needed methods. But if we simply bless into
       it, then we break code that assumed that by loading
       IO::Handle, *it* would work.
       So a compromise is to set up the correct @IO::File::ISA,
       so that code that does C<use IO::Handle>; will still work.
    */

    Perl_populate_isa(aTHX_ STR_WITH_LEN("IO::File::ISA"),
                      STR_WITH_LEN("IO::Handle::"),
                      STR_WITH_LEN("IO::Seekable::"),
                      STR_WITH_LEN("Exporter::"),
                      NULL);

SV_CONST() literally has no reason to bother with lazily loading (optionally allocating at runtime) the SV heads for its TIE*() method names. The TIE*() HEKs are already unconditional. The current PL_sv_consts[] SV* array should just be merged into the PL_sv_immortals[] SV head array.

struct xsub_details {
    const char *name;
    XSUBADDR_t xsub;
    const char *proto;
    int ix;
};

static const struct xsub_details these_details[] = {
---------------------------------------------------------
    {"Tie::Hash::NamedCapture::_tie_it", XS_NamedCapture_tie_it, NULL, 0 },
    {"Tie::Hash::NamedCapture::TIEHASH", XS_NamedCapture_TIEHASH, NULL, 0 },
    {"Tie::Hash::NamedCapture::FETCH", XS_NamedCapture_FETCH, NULL, FETCH_ALIAS },
    {"Tie::Hash::NamedCapture::STORE", XS_NamedCapture_FETCH, NULL, STORE_ALIAS },
    {"Tie::Hash::NamedCapture::DELETE", XS_NamedCapture_FETCH, NULL, DELETE_ALIAS },
    {"Tie::Hash::NamedCapture::CLEAR", XS_NamedCapture_FETCH, NULL, CLEAR_ALIAS },
    {"Tie::Hash::NamedCapture::EXISTS", XS_NamedCapture_FETCH, NULL, EXISTS_ALIAS },
    {"Tie::Hash::NamedCapture::SCALAR", XS_NamedCapture_FETCH, NULL, SCALAR_ALIAS },
    {"Tie::Hash::NamedCapture::FIRSTKEY", XS_NamedCapture_FIRSTKEY, NULL, 0 },
    {"Tie::Hash::NamedCapture::NEXTKEY", XS_NamedCapture_FIRSTKEY, NULL, 1 },
    {"Tie::Hash::NamedCapture::flags", XS_NamedCapture_flags, NULL, 0 },
};
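
The difference between lazily allocated heads and heads merged into a static array, as suggested for PL_sv_consts[] and PL_sv_immortals[], can be sketched like this. The toy_sv struct and every toy_* name are hypothetical stand-ins, not perl's SV or its real macros; the enum order follows the first SV_CONST_* indices listed earlier.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an SV head: just a string pointer and a length. */
struct toy_sv {
    const char *pv;
    size_t      len;
};

enum { TOY_TIESCALAR, TOY_TIEARRAY, TOY_TIEHASH, TOY_TIEHANDLE, TOY_FETCH,
       TOY_CONST_COUNT };

static const char *const toy_const_names[TOY_CONST_COUNT] = {
    "TIESCALAR", "TIEARRAY", "TIEHASH", "TIEHANDLE", "FETCH",
};

/* Lazy style: a slot is allocated on first use, so every access
 * pays a NULL check and the first one pays a malloc(). */
static struct toy_sv *toy_lazy_consts[TOY_CONST_COUNT];

struct toy_sv *toy_lazy_const(int idx)
{
    if (toy_lazy_consts[idx] == NULL) {
        struct toy_sv *sv = malloc(sizeof *sv);
        if (sv == NULL)
            return NULL;
        sv->pv  = toy_const_names[idx];
        sv->len = strlen(toy_const_names[idx]);
        toy_lazy_consts[idx] = sv;
    }
    return toy_lazy_consts[idx];
}

/* Eager style: heads live in static storage; access is plain indexing. */
#define TOY_SV_INIT(s) { (s), sizeof(s) - 1 }

static struct toy_sv toy_immortal_consts[TOY_CONST_COUNT] = {
    TOY_SV_INIT("TIESCALAR"), TOY_SV_INIT("TIEARRAY"), TOY_SV_INIT("TIEHASH"),
    TOY_SV_INIT("TIEHANDLE"), TOY_SV_INIT("FETCH"),
};

struct toy_sv *toy_immortal_const(int idx)
{
    return &toy_immortal_consts[idx];
}
```

Both accessors hand back the same information; the eager one simply has no branch and no heap allocation.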

Perl configuration

Win32 Perl 5.41.7. Edit the demo code above and run it on your system. Note that because of Win32CORE::, WinPerl probably has more mandatory HEKs than PosixPerl. The problem remains: the awesome SV_CONST() API idea doesn't implement the fundamentals.
