
Chapter 7: PIC, GOT, PLT and Lazy Binding (a Supplement on Dynamic Linking)

 

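       Even after finishing the "Dynamic Linking" chapter of 《程序员的自我修养》, I was still in a fog. Luckily I found this blog post by an expert overseas. Being a slow learner, I spent a full day reading it closely, made a set of slides to illustrate it, and wrote up its details in question-and-answer form. I now thoroughly understand PIC, the GOT, the PLT and lazy binding, and have taken one more step up the mountain.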

 

I. The original article

 

PLT and GOT - the key to code sharing and dynamic libraries

by Ian Wienand (Tue 10 May 2011)

 

(this post was going to be about something else, but after getting this far, I think it stands on its own as an introduction to dynamic linking)

The shared library is an integral part of a modern system, but often the mechanisms behind the implementation are less well understood. There are, of course, many guides to this sort of thing. Hopefully this adds another perspective that resonates with someone.

Let's start at the beginning: relocations are entries in binaries that are left to be filled in later, at link time by the toolchain linker or at runtime by the dynamic linker. A relocation in a binary is a descriptor which essentially says "determine the value of X, and put that value into the binary at offset Y". Each relocation has a specific type, defined in the ABI documentation, which describes exactly how "determine the value of" is actually determined.

Here's the simplest example:

$ cat a.c
extern int foo;

int function(void) {
    return foo;
}
$ gcc -c a.c
$ readelf --relocs ./a.o

Relocation section '.rel.text' at offset 0x2dc contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000004  00000801 R_386_32          00000000   foo

The value of foo is not known at the time you make a.o, so the compiler leaves behind a relocation (of type R_386_32) which is saying "in the final binary, patch the value at offset 0x4 in this object file with the address of symbol foo". If you take a look at the output, you can see at offset 0x4 there are 4-bytes of zeros just waiting for a real address:

$ objdump --disassemble ./a.o

./a.o:     file format elf32-i386


Disassembly of section .text:

00000000 <function>:
   0:    55                     push   %ebp
   1:    89 e5                  mov    %esp,%ebp
   3:    a1 00 00 00 00         mov    0x0,%eax
   8:    5d                     pop    %ebp
   9:    c3                     ret
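
The "determine the value of X, and put that value at offset Y" description can be sketched in a few lines of C. This is only an illustration, not how any real linker is written: the struct `reloc`, the helper names and the address chosen for foo in the usage note are assumptions made up for the example. The Info decoding follows the ELF32 convention (symbol index in the upper 24 bits, type in the low 8 bits), and R_386_32 is resolved as the symbol value plus the addend already stored at the patch site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an ELF32 REL entry (hypothetical name). */
struct reloc {
    uint32_t offset; /* where to patch, relative to the section start */
    uint32_t info;   /* symbol index and relocation type, packed */
};

enum { RELOC_386_32 = 1 }; /* numeric value of R_386_32 in the i386 ABI */

uint32_t reloc_sym(uint32_t info)  { return info >> 8; }
uint32_t reloc_type(uint32_t info) { return info & 0xff; }

/* Read and write a 32-bit little-endian word inside a byte buffer. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

void store_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Apply an absolute 32-bit relocation: patched word = S + A, where A is
 * the addend already stored at the patch site (zero in a.o above). */
void apply_abs32(uint8_t *text, size_t text_len,
                 const struct reloc *r, uint32_t sym_value) {
    assert(reloc_type(r->info) == RELOC_386_32);
    assert((size_t)r->offset + 4 <= text_len);
    store_le32(text + r->offset, sym_value + load_le32(text + r->offset));
}
```

Given the ten bytes of function from the disassembly and the entry {0x00000004, 0x00000801}, calling apply_abs32 with a made-up address such as 0x0804a010 overwrites the four zero bytes after the a1 opcode with 10 a0 04 08, i.e. the address stored little-endian.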

That's link time; if you build another object file with a value of foo and build that into a final executable, the relocation can go away. But there is a whole bunch of stuff for a fully linked executable or shared library that just can't be resolved until runtime. The major reason, as I shall try to explain, is position-independent code (PIC). When you look at an executable file, you'll notice it has a fixed load address:

$ readelf --headers /bin/ls
[...]
ELF Header:
[...]
  Entry point address:               0x8049bb0

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
[...]
  LOAD           0x000000 0x08048000 0x08048000 0x16f88 0x16f88 R E 0x1000
  LOAD           0x016f88 0x0805ff88 0x0805ff88 0x01543 0x01543 RW  0x1000

This is not position-independent. The code section (with permissions R E; i.e. read and execute) must be loaded at virtual address 0x08048000, and the data section (RW) must be loaded above that at exactly 0x0805ff88.

This is fine for an executable, because each time you start a new process (fork and exec) you have your own fresh address space. Thus it is a considerable time saving to pre-calculate addresses and have them fixed in the final output (you can make position-independent executables, but that's another story).

This is not fine for a shared library (.so). The whole point of a shared library is that applications pick and choose random permutations of libraries to achieve what they want. If your shared library is built to only work when loaded at one particular address, everything may be fine until another library comes along that was also built using that address. The problem is actually somewhat tractable: you can just enumerate every single shared library on the system and assign them all unique address ranges, ensuring that whatever combination of libraries is loaded, they never overlap. This is essentially what prelinking does (although that is a hint, rather than a fixed, required address base). Apart from being a maintenance nightmare, with 32-bit systems you rapidly start to run out of address space if you try to give every possible library a unique location. Thus when you examine a shared library, it does not specify a particular base address to be loaded at:

$ readelf --headers /lib/libc.so.6
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
[...]
  LOAD           0x000000 0x00000000 0x00000000 0x236ac 0x236ac R E 0x1000
  LOAD           0x023edc 0x00024edc 0x00024edc 0x0015c 0x001a4 RW  0x1000

Shared libraries also have a second goal: code sharing. If a hundred processes use a shared library, it makes no sense to have 100 copies of the code in memory taking up space. If the code is completely read-only, and hence never, ever, modified, then every process can share the same code. However, we have the constraint that the shared library must still have a unique data instance in each process. While it would be possible to put the library data anywhere we want at runtime, this would require leaving behind relocations to patch the code and inform it where to actually find the data, destroying the always read-only property of the code and thus sharability. As you can see from the above headers, the solution is that the read-write data section is always put at a known offset from the code section of the library. This way, via the magic of virtual memory, every process sees its own data section but can share the unmodified code. All that is needed to access data is some simple maths: address of thing I want = my current address + known fixed offset.

Well, simple maths is all relative! "My current address" may or may not be easy to find. Consider the following:

$ cat test.c
static int foo = 100;

int function(void) {
    return foo;
}
$ gcc -fPIC -shared -o libtest.so test.c

So foo will be in data, at a fixed offset from the code in function, and all we need to do is find it! On amd64 this is quite easy; check the disassembly:

000000000000056c <function>:
 56c:        55                     push   %rbp
 56d:        48 89 e5               mov    %rsp,%rbp
 570:        8b 05 b2 02 20 00      mov    0x2002b2(%rip),%eax        # 200828 <foo>
 576:        5d                     pop    %rbp
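
The address in the objdump comment (# 200828) can be reproduced by hand. On amd64 a %rip-relative displacement is added to the address of the next instruction, so the sum is the instruction's own address, plus its length, plus the displacement. The helper below is a small sketch of that sum; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand: the displacement is added
 * to the address of the *next* instruction, i.e. the instruction's own
 * address plus its encoded length. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp) {
    assert(insn_len > 0 && insn_len <= 15); /* x86 instructions are at most 15 bytes */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the mov at 0x570 (6 bytes, displacement 0x2002b2), rip_relative_target(0x570, 6, 0x2002b2) gives 0x576 + 0x2002b2 = 0x200828, matching the address objdump prints for foo.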

This says "put the value at offset 0x2002b2 from the current instruction pointer (%rip, which at that moment holds the address of the next instruction, 0x576) into %eax". That's it: we know the data is at that fixed offset, so we're done. i386, on the other hand, doesn't have the ability to offset from the current instruction pointer. Some trickery is required there:

0000040c <function>:
 40c:    55                     push   %ebp
 40d:    89 e5                  mov    %esp,%ebp
 40f:    e8 0e 00 00 00         call   422 <__i686.get_pc_thunk.cx>
 414:    81 c1 5c 11 00 00      add    $0x115c,%ecx
 41a:    8b 81 18 00 00 00      mov    0x18(%ecx),%eax
 420:    5d                     pop    %ebp
 421:    c3                     ret

00000422 <__i686.get_pc_thunk.cx>:
 422:    8b 0c 24               mov    (%esp),%ecx
 425:    c3                     ret

The magic here is __i686.get_pc_thunk.cx. The architecture does not let us read the current instruction address directly, but we can get a known fixed address: the value __i686.get_pc_thunk.cx loads into %ecx is the return address of the call, i.e. in this case 0x414. Then we can do the maths for the add instruction: 0x115c + 0x414 = 0x1570. The final move goes 0x18 bytes past that to 0x1588; checking the disassembly:

00001588 <global>:
    1588:       64 00 00                add    %al,%fs:(%eax)

i.e., the value 100 in decimal, stored in the data section.
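
The address arithmetic of the i386 sequence can be written out as a small helper. This is just the sums from the text turned into C; the function name is invented for the example, and the 5-byte length is that of the e8 call shown in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Address computed by the i386 PIC sequence:
 *   call thunk          -> %ecx = return address (address after the call)
 *   add  $imm, %ecx     -> %ecx = return address + imm
 *   mov  disp(%ecx),... -> load from %ecx + disp                         */
uint32_t pic_data_address(uint32_t call_addr, uint32_t add_imm, uint32_t disp) {
    const uint32_t call_len = 5;   /* e8 opcode + 4-byte relative offset */
    uint32_t ret_addr = call_addr + call_len;
    assert(ret_addr > call_addr);  /* no wrap-around in this sketch */
    return ret_addr + add_imm + disp;
}
```

With the values from the listing, pic_data_address(0x40f, 0x115c, 0x18) is 0x414 + 0x115c + 0x18 = 0x1588, the location where the byte 0x64 (100 decimal) is stored.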

We are getting closer, but there are still some issues to deal with. If a shared library can be loaded at any address, then how does an executable, or other shared library, know how to access data or call functions in it? We could, theoretically, load the library and patch up any data references or calls into that library; however as just described this would destroy code-sharability. As we know, all problems can be solved with a layer of indirection, in this case called global offset table or GOT.

Consider the following library:

$ cat test.c
extern int foo;

int function(void) {
    return foo;
}
$ gcc -shared -fPIC -o libtest.so test.c

Note this looks exactly like before, but in this case foo is extern; presumably provided by some other library. Let's take a closer look at how this works, on amd64:

$ objdump --disassemble libtest.so
[...]
00000000000005ac <function>:
 5ac:        55                     push   %rbp
 5ad:        48 89 e5               mov    %rsp,%rbp
 5b0:        48 8b 05 71 02 20 00   mov    0x200271(%rip),%rax        # 200828 <_DYNAMIC+0x1a0>
 5b7:        8b 00                  mov    (%rax),%eax
 5b9:        5d                     pop    %rbp
 5ba:        c3                     retq

$ readelf --sections libtest.so
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
[...]
  [20] .got              PROGBITS         0000000000200818  00000818
       0000000000000020  0000000000000008  WA       0     0     8

$ readelf --relocs libtest.so
Relocation section '.rela.dyn' at offset 0x418 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
[...]
000000200828  000400000006 R_X86_64_GLOB_DAT 0000000000000000 foo + 0

The disassembly shows that the address of the value to be returned is loaded from an offset of 0x200271 from the current %rip (0x5b7, the address of the next instruction), i.e. from 0x200828, and is then dereferenced. Looking at the section headers, we see that this is part of the .got section. When we examine the relocations, we see a R_X86_64_GLOB_DAT relocation that says "find the value of symbol foo and put it into address 0x200828".
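
The claim that 0x200828 is part of .got can be checked against the readelf output: the section starts at 0x200818, is 0x20 bytes long, and has 8-byte entries. A sketch of that check follows; both helper names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies inside the section [start, start + size). */
bool in_section(uint64_t addr, uint64_t start, uint64_t size) {
    assert(start + size >= start); /* section must not wrap the address space */
    return addr >= start && addr - start < size;
}

/* Index of the entry holding addr, for a table of entsize-byte entries. */
uint64_t entry_index(uint64_t addr, uint64_t start, uint64_t entsize) {
    assert(entsize != 0 && addr >= start);
    return (addr - start) / entsize;
}
```

in_section(0x200828, 0x200818, 0x20) is true, and entry_index(0x200828, 0x200818, 8) is 2, so the relocation targets the third of the four 8-byte slots in this .got.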

So, when this library is loaded, the dynamic loader will examine the relocation, go and find the value of foo and patch the .got entry as required. When it comes time for the code to load that value, the entry will point to the right place and everything just works, without having to modify any code values and thus destroy code sharability.
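
The same idea can be modelled in plain C, with a pointer table playing the role of the GOT. This is a toy model under stated assumptions, not what the toolchain emits: `got`, `GOT_FOO` and `loader_patch_got` are invented names, and the "loader" here is just an ordinary function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library's GOT: one slot per external symbol.
 * In a real library this table lives in the writable data segment. */
int *got[1] = { NULL };
enum { GOT_FOO = 0 };

/* "Library code": never modified after loading. It only knows which GOT
 * slot to read, not where foo actually lives. */
int function(void) {
    assert(got[GOT_FOO] != NULL); /* the loader must have run first */
    return *got[GOT_FOO];
}

/* "Dynamic loader": handles the GLOB_DAT-style relocation by writing the
 * resolved address of the symbol into the slot. */
void loader_patch_got(int slot, int *resolved_addr) {
    got[slot] = resolved_addr;
}
```

Pointing the slot at a different int changes what function returns without touching function itself, which is the property that keeps the code pages shareable between processes.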

This handles data, but what about function calls? The indirection used here is called a procedure linkage table or PLT. Code does not call an external function directly, but only via a PLT stub. Let's examine this:

$ cat test.c
int foo(void);

int function(void) {
    return foo();
}
$ gcc -shared -fPIC -o libtest.so test.c

$ objdump --disassemble libtest.so
[...]
00000000000005bc <function>:
 5bc:        55                     push   %rbp
 5bd:        48 89 e5               mov    %rsp,%rbp
 5c0:        e8 0b ff ff ff         callq  4d0 <foo@plt>
 5c5:        5d                     pop    %rbp

$ objdump --disassemble-all libtest.so
00000000000004d0 <foo@plt>:
 4d0:   ff 25 82 03 20 00       jmpq   *0x200382(%rip)        # 200858 <_GLOBAL_OFFSET_TABLE_+0x18>
 4d6:   68 00 00 00 00          pushq  $0x0
 4db:   e9 e0 ff ff ff          jmpq   4c0 <_init+0x18>

$ readelf --relocs libtest.so
Relocation section '.rela.plt' at offset 0x478 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200858  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0

So, we see that function makes a call to code at 0x4d0. Disassembling this, we see something interesting: we jump to the address stored at 0x200382 past the current %rip (i.e. at 0x200858), for which we can then see the relocation: the symbol foo.
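
The target 0x4d0 is not stored in the call instruction as such; the bytes e8 0b ff ff ff encode a signed 32-bit displacement relative to the next instruction. A sketch of decoding it by hand follows; the function name is made up, and the cast from uint32_t to int32_t relies on the two's-complement behaviour of GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a 5-byte x86 near call or jump (opcode e8 or e9 followed by a
 * little-endian signed 32-bit displacement relative to the next
 * instruction). */
uint64_t rel32_target(uint64_t insn_addr, const uint8_t insn[5]) {
    assert(insn[0] == 0xe8 || insn[0] == 0xe9);
    uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    int32_t disp = (int32_t)raw; /* reinterpret as two's complement */
    return insn_addr + 5 + (uint64_t)(int64_t)disp;
}
```

For the call at 0x5c0 the displacement is 0xffffff0b, i.e. -0xf5, and 0x5c5 - 0xf5 = 0x4d0. The same helper applied to the jmpq at 0x4db (e9 e0 ff ff ff) gives 0x4e0 - 0x20 = 0x4c0.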

It is interesting to keep following this through; let's look at the initial value that is jumped to:

$ objdump --disassemble-all libtest.so

Disassembly of section .got.plt:

0000000000200840 <.got.plt>:
  200840:       98                      cwtl
  200841:       06                      (bad)
  200842:       20 00                   and    %al,(%rax)
        ...
  200858:       d6                      (bad)
  200859:       04 00                   add    $0x0,%al
  20085b:       00 00                   add    %al,(%rax)
  20085d:       00 00                   add    %al,(%rax)
  20085f:       00 e6                   add    %ah,%dh
  200861:       04 00                   add    $0x0,%al
  200863:       00 00                   add    %al,(%rax)
  200865:       00 00                   add    %al,(%rax)
        ...
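
Because --disassemble-all decodes data as if it were instructions, the .got.plt contents above look like nonsense opcodes. Each slot is really an 8-byte little-endian pointer; the bytes at 0x200858 through 0x20085f are d6 04 00 00 00 00 00 00. A small sketch of reading such a slot (the helper name is made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an 8-byte little-endian value, e.g. one .got.plt slot. */
uint64_t read_le64(const uint8_t *bytes, size_t len) {
    assert(len >= 8);
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | bytes[i]; /* most significant byte is stored last */
    return v;
}
```

Applied to the eight bytes listed for 0x200858, read_le64 returns 0x4d6.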

Unscrambling 0x200858 we see its initial value is 0x4d6, i.e. the next instruction! That instruction pushes the value 0 and then jumps to 0x4c0. Looking at that code we can see it pushes a value from the GOT, and then jumps to a second value in the GOT:

00000000000004c0 <foo@plt-0x10>:
 4c0:   ff 35 82 03 20 00       pushq  0x200382(%rip)        # 200848 <_GLOBAL_OFFSET_TABLE_+0x8>
 4c6:   ff 25 84 03 20 00       jmpq   *0x200384(%rip)        # 200850 <_GLOBAL_OFFSET_TABLE_+0x10>
 4cc:   0f 1f 40 00             nopl   0x0(%rax)

What's going on here? What's actually happening is lazy binding — by convention when the dynamic linker loads a library, it will put an identifier and resolution function into known places in the GOT. Therefore, what happens is roughly this: on the first call of a function, it falls through to call the default stub, which loads the identifier and calls into the dynamic linker, which at that point has enough information to figure out "hey, this libtest.so is trying to find the function foo". It will go ahead and find it, and then patch the address into the GOT such that the next time the original PLT entry is called, it will load the actual address of the function, rather than the lookup stub. Ingenious!
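
The control flow of lazy binding can be modelled with an ordinary C function pointer. This is only a sketch of the idea: the real resolver is hand-written assembly in the dynamic linker and works from the identifier and index pushed on the stack, whereas here every name (`got_plt_foo`, `foo_lazy_stub`, `real_foo`, `resolver_calls`) and the return value 42 are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_ptr)(void);

int resolver_calls = 0;            /* counts trips into the "dynamic linker" */

int real_foo(void) { return 42; }  /* the function eventually bound to foo */

int foo_lazy_stub(void);           /* forward declaration of the lookup stub */

/* One .got.plt-style slot. Like the real table, it initially points back
 * at the lookup path rather than at the function itself. */
func_ptr got_plt_foo = foo_lazy_stub;

/* First-call path: "resolve" foo, patch the slot, then finish the call. */
int foo_lazy_stub(void) {
    resolver_calls++;
    got_plt_foo = real_foo;        /* later calls skip the resolver */
    return got_plt_foo();
}

/* The PLT entry: an indirect jump through the slot. */
int foo_via_plt(void) {
    assert(got_plt_foo != NULL);
    return got_plt_foo();
}
```

Calling foo_via_plt twice goes through the stub only once; after the first call the slot holds the address of real_foo, just as the patched GOT entry holds the address of the real function.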

Out of this indirection falls another handy thing — the ability to modify the symbol binding order. LD_PRELOAD, for example, simply tells the dynamic loader it should insert a library as first to be looked-up for symbols; therefore when the above binding happens if the preloaded library declares a foo, it will be chosen over any other one provided.
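
A first-match-wins lookup over an ordered list of libraries is enough to show why preloading works. The sketch below is a deliberately simplified model: real symbol lookup uses hash tables and also considers binding and visibility, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct symbol  { const char *name; int value; };
struct library { const char *soname; const struct symbol *syms; size_t nsyms; };

/* Return the first definition of name, searching libraries in order.
 * Returns NULL if no library defines it. */
const struct symbol *lookup(const struct library *const *order, size_t nlibs,
                            const char *name) {
    assert(order != NULL && name != NULL);
    for (size_t i = 0; i < nlibs; i++)
        for (size_t j = 0; j < order[i]->nsyms; j++)
            if (strcmp(order[i]->syms[j].name, name) == 0)
                return &order[i]->syms[j];
    return NULL;
}
```

Putting a preloaded library at the front of the search order makes its foo shadow the one defined further down the list, while symbols it does not define still resolve to the later libraries.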

In summary — code should be read-only always, and to make it so that you can still access data from other libraries and call external functions these accesses are indirected through a GOT and PLT which live at compile-time known offsets.

In a future post I'll discuss some of the security issues around this implementation, but that post won't make sense unless I can refer back to this one :)

 

II. Q&A

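1. What is the line of reasoning of this article? In other words, how do position-independent code (PIC), the GOT, the PLT and lazy binding relate to one another?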

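    Build test.c into the shared library libtest.so (gcc -fPIC -shared -o libtest.so test.c). A shared library consists of two parts, "code" and "data". At runtime many processes (p1, p2, ...) may use it. Each process pi has its own independent virtual address space; the virtual address at which pi itself is loaded can be fixed, but the virtual address at which the shared library libtest.so is loaded is decided dynamically at runtime and cannot be known at compile time. So that all processes p1, p2, ... can share libtest.so, the "data" has to be separated from the "code".
    (1) After the separation, the "code" part is position-independent code (PIC).
    (2) After the separation, the "data" part falls into two cases: (a) data the library defines itself; (b) data defined in an external module that the library references.
    Data the library defines itself does not need the GOT; references to data defined in an external module do. A GOT entry is a blank waiting to be filled in: the relocation table contains an entry describing each blank in the GOT, and the blank is filled in once the external module has been loaded. As the article explains, calls to external functions go through the PLT in the same indirect way, and lazy binding postpones filling in a function's GOT entry until the function is first called.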


2. In the code below, why does call 422 <__i686.get_pc_thunk.cx> put the address of the next instruction into register ecx?
0000040c <function>:
 40c:    55                     push   %ebp
 40d:    89 e5                  mov    %esp,%ebp
 40f:    e8 0e 00 00 00         call   422 <__i686.get_pc_thunk.cx>
 414:    81 c1 5c 11 00 00      add    $0x115c,%ecx
 41a:    8b 81 18 00 00 00      mov    0x18(%ecx),%eax
 420:    5d                     pop    %ebp
 421:    c3                     ret

00000422 <__i686.get_pc_thunk.cx>:
 422:    8b 0c 24               mov    (%esp),%ecx
 425:    c3                     ret

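Answer: the code simply calls a function (__i686.get_pc_thunk.cx). The call instruction pushes the return address, i.e. the address of the next instruction, onto the stack. The function copies that return address from the top of the stack (pointed to by esp) into a register (here ecx) and returns, and the original code then uses the value in that register.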
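
The same trick can be imitated from C with a compiler extension. The sketch below relies on `__builtin_return_address` and `__attribute__((noinline))`, both of which are GCC/Clang extensions rather than standard C, and the function name `get_pc` is made up; it illustrates the idea and is not the thunk the compiler actually generates.

```c
#include <assert.h>
#include <stddef.h>

/* C-level analogue of __i686.get_pc_thunk.cx: the call instruction pushes
 * the address of the instruction after it, and the callee simply reads
 * that value back. noinline keeps a real call in place. */
__attribute__((noinline)) void *get_pc(void) {
    void *ret = __builtin_return_address(0);
    assert(ret != NULL);
    return ret;
}
```

A caller that writes void *pc = get_pc(); ends up with the address of the instruction that follows its own call, which is exactly the value the i386 code above needs in ecx.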

 

III. The essence of dynamic linking, illustrated: PIC, the GOT, the PLT and lazy binding



 

 

 

 

 

 

 

[Figures: six slides illustrating PIC, the GOT, the PLT and lazy binding (images not included).]