I'm trying to determine my module's load address at runtime. By 'module's load address', I mean byte[0] of the in-memory image (ie, the first byte of the Elf32_Ehdr). I believe I want information from the struct module in kernel/modules.c. I did find sys_query_module, but it has been deprecated.
Everything I've found on the web is kernel-centric [1,2], and Stevens does not cover it in Advanced Programming in the UNIX Environment. In the Windows world, I would use __ImageBase (fixed up by the link-loader) or GetModuleHandle(...).
Can anyone point me to the proper syscall? (Or to a forum that fields C/C++ and Linux API questions).
Thanks, Jeffrey Walton
[1] LKML: "Richard B. Johnson": Re: determining load address of module
[2] Linux-Kernel Archive: Re: determining load address of module
In article <e5661e7b-0927-4cda-b1e9-652274633...@t18g2000vbj.googlegroups.com>, Jeffrey Walton <noloa...@gmail.com> wrote:
>Hi All,
>I'm trying to determine my module's load address at runtime. By
>'module's load address', I mean byte[0] of the in-memory image (ie,
>the first byte of the Elf32_Ehdr). I believe I want information from
>the struct module in kernel/modules.c. I did find sys_query_module,
>but it has been deprecated.
>Everything I've found on the web is kernel-centric [1,2], and Stevens
That's because you're using the word "module" in a foreign way. We don't use it that way. Here, "module" means kernel module 99.44% of the time.
You can probably get what you want by parsing /proc/self/maps. The lack of a well-known function to do this query should tip you off that it's not considered a normal thing to ask. If you're writing something like a debugger, fine. Otherwise, what's the purpose of finding the in-memory copy of an ELF header? What are you going to do with that information that you can't do without it? The dynamic linker should fix up any pointers you need within your address space. Doing it manually is icky. (Oh, if you're writing a dynamic linker that's fine too. An icky job!)
On Nov 7, 5:44 pm, pac...@kosh.dhis.org (Alan Curry) wrote:
> In article <e5661e7b-0927-4cda-b1e9-652274633...@t18g2000vbj.googlegroups.com>,
> Jeffrey Walton <noloa...@gmail.com> wrote:
> >Hi All,
> >I'm trying to determine my module's load address at runtime. By
> >'module's load address', I mean byte[0] of the in-memory image (ie,
> >the first byte of the Elf32_Ehdr). I believe I want information from
> >the struct module in kernel/modules.c. I did find sys_query_module,
> >but it has been deprecated.
> >Everything I've found on the web is kernel-centric [1,2], and Stevens
> That's because you're using the word "module" in a foreign way. We don't use
> it that way. Here, "module" means kernel module 99.44% of the time.
My bad. Would 'image' be a better term in the Linux world?
> The lack of a well-known function to do this query should tip
> you off that it's not considered a normal thing to ask.
Agreed.
> If you're writing something like a debugger, fine. Otherwise, what's the
> purpose of finding the in-memory copy of an ELF header? What are you
> going to do with that information that you can't do without it?
FIPS integrity checks. Locating a particular section in memory is an early smoke test.
> The dynamic linker should fix up any pointers you need within your
> address space.
> Doing it manually is icky. (Oh, if you're writing a dynamic linker that's
> fine too. An icky job!)
Agreed.
I thought I found the load address in struct r_debug::r_ldbase (from link.h). But when I iterated the array of r_debugs, I found the base address for ld-linux.so.2.
With dl_iterate_phdr(3), the first header passed to my callback relates to my image's load address (the remaining headers appear to be shared objects). Assuming 4KB pages, it can be found in the virtual address of dlpi_phdr:
I believe, with a high degree of certainty, the image is being loaded at 0x8048000:
(gdb) print (char*) 0x8048000
$1 = 0x8048000 "\177ELF\001\001\001"
This begs two questions. First, is 0x8048000 (for x86) always the address base (or an address I can control from the linker)? Second, does dl_iterate_phdr(3) always return the image's base information on the *first* invocation of the callback?
In article <d9c7a43a-46aa-45cd-98f4-059f5bd6c...@m35g2000vbi.googlegroups.com>, Jeffrey Walton <noloa...@gmail.com> wrote:
>Hi Alan,
>On Nov 7, 5:44 pm, pac...@kosh.dhis.org (Alan Curry) wrote:
>> In article <e5661e7b-0927-4cda-b1e9-652274633...@t18g2000vbj.googlegroups.com>,
>> Jeffrey Walton <noloa...@gmail.com> wrote:
>> >Everything I've found on the web is kernel-centric [1,2], and Stevens
>> That's because you're using the word "module" in a foreign way. We don't use
>> it that way. Here, "module" means kernel module 99.44% of the time.
>My bad. Would 'image' be a better term in the Linux world?
I'm not sure what the definition of "module" is where you come from so I can't translate it. It seems to include "main executable" and "shared library" as subcases.
>Using dl_iterate_phdr(3), the first header returned to my callback
Oh you found a nice function to do the query after all.
>I believe, with a high degree of certainty, the image is being loaded
>at 0x8048000:
>(gdb) print (char*) 0x8048000
>$1 = 0x8048000 "\177ELF\001\001\001"
>
>This begs two questions. First, is 0x8048000 (for x86) always the
>address base (or an address I can control from the linker)? Second,
>does dl_iterate_phdr(3) always return the image's base information on
>the *first* invocation of the callback?
0x8048000 has been the default for a while. Other archs do have different defaults. I only vaguely remember the time when it used to be something different on i386 (0x8000000 with libc5? pre-ELF it was either 0 or 0x1000 depending on linker options). You can override it when linking your program, but as long as the program isn't rebuilt, the main executable will load at the same address every time. Shared libraries can move around between invocations (are position-independent), but the main program body won't.
readelf -l or objdump -p can show you where the program's segments will be mapped. The interesting ones are the ones marked LOAD.
As for the behavior of dl_iterate_phdr, I didn't know it existed so I'm not going to guess.
> In article <d9c7a43a-46aa-45cd-98f4-059f5bd6c...@m35g2000vbi.googlegroups.com>,
> Jeffrey Walton <noloa...@gmail.com> wrote:
> [SNIP]
> >This begs two questions. First, is 0x8048000 (for x86) always the
> >address base (or an address I can control from the linker)? Second,
> >does dl_iterate_phdr(3) always return the image's base information on
> >the *first* invocation of the callback?
>
> 0x8048000 has been the default for a while. Other archs do have different
> defaults. I only vaguely remember the time when it used to be something
> different on i386 (0x8000000 with libc5? pre-ELF it was either 0 or 0x1000
> depending on linker options). You can override it when linking your program,
> but as long as the program isn't rebuilt, the main executable will load at
> the same address every time. Shared libraries can move around between
> invocations (are position-independent), but the main program body won't.
>
> readelf -l or objdump -p can show you where the program's segments will be
> mapped. The interesting ones are the ones marked LOAD.
Lots of objdump and readelf seemed to do the trick. You're right about LOAD - and I also needed flags = PF_R|PF_X to separate the code from the data segment.
> As for the behavior of dl_iterate_phdr, I didn't know it existed so I'm not
> going to guess.
With LOAD and PF_R|PF_X, I can find it every time.
Thanks for your help. I know I have a couple more questions for tomorrow :)
Jeffrey Walton wrote:
> On Nov 7, 9:59 pm, pac...@kosh.dhis.org (Alan Curry) wrote:
>> In article <d9c7a43a-46aa-45cd-98f4-059f5bd6c...@m35g2000vbi.googlegroups.com>,
>> Jeffrey Walton <noloa...@gmail.com> wrote:
>> [SNIP]
>>> This begs two questions. First, is 0x8048000 (for x86) always the
>>> address base (or an address I can control from the linker)? Second,
>>> does dl_iterate_phdr(3) always return the image's base information on
>>> the *first* invocation of the callback?
>> 0x8048000 has been the default for a while. Other archs do have different
>> defaults. I only vaguely remember the time when it used to be something
>> different on i386 (0x8000000 with libc5? pre-ELF it was either 0 or 0x1000
>> depending on linker options). You can override it when linking your program,
>> but as long as the program isn't rebuilt, the main executable will load at
>> the same address every time. Shared libraries can move around between
>> invocations (are position-independent), but the main program body won't.
>>
>> readelf -l or objdump -p can show you where the program's segments will be
>> mapped. The interesting ones are the ones marked LOAD.
> Lots of objdump and readelf seemed to do the trick. You're right about
> LOAD - and I also needed flags = PF_R|PF_X to separate the code from
> the data segment.
>
>> As for the behavior of dl_iterate_phdr, I didn't know it existed so I'm not
>> going to guess.
> With LOAD and PF_R|PF_X, I can find it every time.
>
> Thanks for your help. I know I have a couple more questions for
> tomorrow :)
> Jeff
Before you proceed too far, please keep in mind that you're running on a demand-paged virtual-memory system.
You did not say it, but I guess that the relevant processor architecture is Intel 386+.
An executable is loaded at the same virtual address on every run, but the physical addresses are determined dynamically at run time. Different processes running the same executable may share the same physical pages (for read-only sections).
Dynamic libraries are mapped at whatever virtual addresses are suitable at load time. The same library may be located at different virtual addresses in different processes at the same time, which is why dynamic libraries have to be position-independent code.