堆利用

1.1 glibc 堆概述

1.1.1 内存管理与堆

内存管理是对计算机的内存资源进行管理，这要求在程序请求时能够动态分配内存的一部分，并在程序不需要时释放分配的内存。CTF中常见的ptmalloc2就是glibc实现内存管理机制，它继承自dlmalloc，并提供了对多线程的支持。

其他常见的堆管理机制还有 dlmalloc， tcmalloc ，jemalloc等，一般这些机制由用户显示的调用malloc())函数申请内存，调用free()函数释放内存。除此之外，还有由编程语言实现的自动内存管理机制，也就是垃圾回收。

堆是程序虚拟内存中由低地址向高地址增长的线性区域。一般只有当用户向操作系统申请内存时，这片区域才会被内核分配出来，并且由于页对齐和效率的问题，一般分配的空间很大。堆的位置一般在BSS段的高地址处。

brk()和sbrk()

堆的属性是可读可写的，大小通过brk()和sbrk()函数进行控制。如下图，在堆未初始化时，program_break 指向BSS段的末尾，通过调用brk()和sbrk()函数来移动progam_break使得堆增长。在堆初始化时，如果开启了ASLR,则堆的起始地址start_brk会在BSS段之后的随机位移处，没有开启就会紧接着BSS段。

brk()函数的参数是一个指针，用于设置program_break指向的位置，sbrk()函数的参数increment（可以是负值）用于与program_break相加来调整program_break的值。

mmap()和unmmap()

当用户申请的内存过大时，ptmalloc2会选择通过mmap（）函数创建匿名映射段供用户使用，并通过unmmap（）函数回收。匿名映射的目的主要是可以申请以 0 填充的内存，并且这块内存仅被调用进程所使用。

glibc中的堆

通常来说，系统中的堆是指主线程中的main_arena所管理的区域，但glibc会同时维持多个区域来供多线程使用，每个线程都有属于自己的内存（称为arena），这些连续的内存也可以称为堆。

glibc的想法是：当用户申请堆块时，从堆中按顺序分配堆块交给用户，用户保存指向这些堆块的指针；当用户释放堆块时，glibc会将释放的堆块组织成链表；当两块相邻堆块都为释放状态时将之合并为一个新的堆块；由此解决内存碎片的问题。

用户正在使用中的堆块叫做allocated chunk，被释放的堆块叫做free chunk ，由free chunk 组成的链表叫做bin。为了方便管理，将大小范围不同的chunk组织成不同的bin。如fast bin ，small bin ，large bin等。

1.1.2 重要概念和结构体

堆的操作就这么复杂，那么在 glibc 内部必然也有精心设计的数据结构来管理它。与堆相应的数据结构主要分为

宏观结构，包含堆的宏观信息，可以通过这些数据结构索引堆的基本信息。
微观结构，用于具体处理堆的分配与回收中的内存块。

1.1.3宏观结构

arena

arena包含一片或数片连续的内存，堆块将会从这片区域划分给用户。主线程的arena被称为main_arena ，它包含start_brk和brk之间这片连续内存。

主线程的arena只有堆，子线程的arena可以有数片连续内存。如果主线程的堆大小不够分的话可以用brk()调用来扩展，但是子线程分配的映射段大小是固定的，不可以扩展，所以子线程分配出来的一段映射段不够用的话就需要再次用mmap()来分配新的内存。

heap_info

子线程的arena可以有多片连续内存，这些内存被称为heap。每一个heap都有自己的heap header，heap header 是通过链表相连接的，并且heap header 里面保存了指向器所属的arena的指针。

typedef struct _heap_info
{
  mstate ar_ptr;            /* Arena for this heap. */
  struct _heap_info *prev;  /* Previous heap. */
  size_t size;              /* Current size in bytes. */
  size_t mprotect_size;     /* Size in bytes that has been mprotected
                             PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

该结构主要是描述堆的基本信息，包括

堆对应的arena的地址
由于一个线程申请一个堆之后，可能会使用完，之后就必须得再次申请，因此，一个线程肯=可能会有多个堆。prev即记录了上一个heap_info 的地址。这里可以看到每个堆的heap_info 是通过单项链表进行链接的。
size表示当前堆的大小。
最后一部分确保对齐

malloc_state

该结构用于管理堆，记录每个 arena 当前申请的内存的具体状态，比如说是否有空闲 chunk，有什么大小的空闲 chunk 等等。无论是 thread arena 还是 main arena，它们都只有一个 malloc state 结构。由于 thread 的 arena 可能有多个，malloc state 结构会在最新申请的 arena 中。

每个线程只有一个arena header ，里面保存了bins，top chunk 等信息。主线程main_arena保存在libc.so的数据段里面，其他线程的arena则保存在给该arena分配的heap里面。malloc_state定义如下：

struct malloc_state
{
  /* Serialize access.  */
  __libc_lock_define (, mutex);
 
  /* Flags (formerly in max_fast).  */
  int flags;
 
  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;
 
  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];
 
  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;
 
  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;
 
  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];
 
  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];
 
  /* Linked list */
  struct malloc_state *next;
 
  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;
 
  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;
 
  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

__libc_lock_define (, mutex);
该变量用于控制程序串行访问同一个分配区，当一个线程获取了分配区之后，其他线程要想访问该分配区，就必须等待该线程分配完成后才能使用。
flags
- flags记录了分配区的一些标志，比如bit0记录了分配区是否有fast bin chunk ，bit1标识分配区是否能返回连续的虚拟地址空间。
fastbinsY[NFASTBINS]
- 存放每个fast chunk链表头部的指针
top
- 指向分配区的top chunk
last_remainder
- 最新的chunk分割后剩下的那部分
bins
- 用于存储unsorted bin，small bin 和 large bin的chunk 链表。
binmap
- ptmalloc 用一个bit 来标识某一个bin中是否包含空闲的chunk。

mlloc_chunk

chunk是glibc管理内存的基本单位，整个堆在初始化后会被当成一个free chunk ，被称为top chunk，每次用户请求内存时，如果bins中没有合适的chunk ，malloc就会从top chunk中进行划分，如果top chunk 也不够，那就用brk函数来扩展堆的大小。

用户释放内存时，glibc会先根据情况将释放的chunk与其他free chunk合并，然后加入合适的bin中。

接下来我们来看glibc如何记录每个chunk的大小，状态和指针等信息，下面是malloc_chunk结构体的定义:

/*
  This struct declaration is misleading (but accurate and necessary).
  It declares a "view" into memory allowing access to necessary
  fields at known offsets from a given base. See explanation below.
*/
struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

默认情况下，INTERNAL_SIZE_T的大小在64位系统下是八位字节，32位系统下是四字节。

prev_size：如果上一个chunk处于释放状态，用于表示其大小；否则作为上一个chunk的一部分，用于保存上一个chunk的数据。
size ：表示当前chunk的大小，根据规定必须是2*SIZE_SZ的整数倍。默认情况下，SIZE_SZ在64位系统下是八字节，32位系统下是四字节。受到页对齐的影响，最后3个比特位用作状态标识，其中最低的两个比特位，从高到低分别代表:
- IS_MAPPED :用于标识一个chunk是否是从mmap()函数中获得的。如果用户申请一个相当大的内存，malloc会通过mmap（）函数分配一个映射段。
- PREV_INUSE ：当它为1时，表示上一个chunk处于使用状态，否则表示上一个chunk处于释放状态。
fd和bk：仅在当前chunk处于释放状态有效。chunk被释放后会被加入相应的bin链表中，此时fd和bk指向该chunk在链表中的上一个和下一个free chunk （不一定是物理相邻的)。如果当前chunk处于使用状态，那么这两个字段是无效的，都是用户使用的空间。
fd_nextsize和bk_nextsize：与fd和bk相似，仅在处于释放状态时有效，否则就是用户使用空间。不同的是它们仅用于large bin ，分别指向前后第一个和当前chunk大小不同的mem指针。
每次 malloc 申请得到的内存指针，其实指向 user data 的起始处。
当一个 chunk 处于使用状态时，它的下一个 chunk 的 prev_size 域无效，所以下一个 chunk 的该部分也可以被当前 chunk 使用。这就是 chunk 中的空间复用。

1.1.4微观结构

chunk被释放时，glibc会将它们重新组织起来，构成不同的bin表，当用户再次申请时，就从中选择合适的chunk返回用户。不同大小的chunk被划分到不同的bin中，再加上一种特殊的bin，一共四种，Fast bin,Small bin,Large bin,Unsorted bin。这些bin记录在malloc_state结构中。

fastbinsY：这是一个数组，里面有NFASTBINS个fast bin。
bins ：也是一个数组，一个126个bin，按顺序分别是：
- bin1 为unsorted bin
- bin 2 到bin 63 为small bin
- bin 64 到bin126 为large bin

fast bin

在实践中，程序申请和释放的堆块都比较小，所以glibc对这类bin采用单链表结构，并采用先进后出的分配策略。为了加快速度，fast bin不会对chunk进行合并操作，所以下一个chunk的PRV_INUSE始终标记为1，使其处于使用状态。同一个fast bin 里的chunk 大小相同，并在fastbinsY数组里按照从小到大的顺序排序，序号为0的fast bin中容纳的chunk大小为4*SIZE_SZ字节，随着序号的增加，所容纳的chunk递增2*SIZE_SZ字节。

unsorted bin

一定大小的chunk被释放时，在进入small bin和large bin 之前，会先加入unsorted bin 。在实践中，一个被释放的chunk常常很快就会被重新使用，所以先将其加入unsorted bin ，可以加快分配的速度。unsorted bin使用双链表结构，并采用先进后出的分配策略。与fastbinsY不同，unsorted bin 中的chunk大小可能是不同的，并且由于是双链表结构，一个bin会占用bins的两个元素。

small bin

同一个small bin里，chunk 的大小相同，采用双链表结构，使用频率介于fast bin 和large bin 之间。small bin 在bins里面居第2到第63位，共62个。根据排序，每个small bin的大小为2*SIZE_SZ*idx(idx表示bins数组的下标)。在64位系统下，最小的small bin为32字节，最大的为1008字节。由于small bin和fast bin有重合的部分，所以这些chunk在某些情况下会被加入到small bin中。

large bin

large bin 在bins里面居64位到126位，共63个，被分成了6组，每组所能容纳的chunk 按顺序排列成等差数列。

组	数量	公差
1	32	64B
2	16	512B
3	8	4096B
4	4	32768B
5	2	262144B
6	1	不限制

large bin 也是采用的双链表结构，里面的chunk从头结点的fd指针开始，按大小顺序进行排序。为了加快检索速度，fd_nextsize 和bk_nextsize指针用于指向第一个与自己大小不同的chunk，所以也只有在加入了大小不同的chunk时，这两个指针才会被修改。

1.1.5 chunk 相关宏源码

对于用户来说，只需要确保malloc()函数返回的内存不会发生溢出，并且在不用的时候使用free()函数将其释放，以后也不再做任何操作。

而对于glibc来说，它要在用户第一次调用malloc函数之前对堆进行初始化；并且在用户频繁神申请和释放时维护堆的结构，保证时间和空间上的效率；同时检测过程中可能产生的错误，并及时终止程序。

request2size()

#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)
#define MALLOC_ALIGN_MASK  ((2*SIZE_SZ) - 1)

这个宏将请求的req转换成包含chunk头部（presize和size）的chunk大小，实例如下：（MINSIZE默认为0x20）

req是请求的空间

当req属于[0,9)时，返回0x20
当req为0x9时，返回0x20
当req为0x18时，返回0x20

可能有的人会疑惑，0x18的user data 加上头部0x10就已经是0x28了，为什么返回的chunk却是0x20。这是因为如果当前chunk在使用中，下一个chunk的pre_inuse 成员就会属于当前chunk，所以就多出了0x8的使用空间。考虑到这一点，当req在0x9到0x18之间时，对应的chunk大小为0x10，当req在0x19到0x28之间时，对应的chunk大小为0x20。

chunk2mem()和mem2chunk()

1 2	#define chunk2mem (p) ((void)((char)(p) + 2* SIZE_SZ )) #define mem2chunk (mem) ((mchunkptr)((char)(mem) - 2 SIZE_SZ ))

chunk2mem()把指向chunk的指针转换成指向user data的指针，常出现在malloc()函数返回时，mem2chunk()把指向user data的指针转化为指向chunk的指针，常出现在free()函数开始时。

chunk状态相关

定义PREV_INUSE,IS_MMAPPED,NON_MAIN_ARENA（chunk是否属于非主线程）以及对三者进行或运算（用于掩码）

/* 当前一个相邻块正在使用时，大小字段与 PREV_INUSE 进行或运算 */
#define PREV_INUSE 0x1
/* 如果块是使用 mmap() 获得的，则 size 字段与 IS_MMAPPED 进行或运算 */
#define IS_MMAPPED 0x2
/* 如果获取了块，则 size 字段与 NON_MAIN_ARENA 进行或运算
   来自非主战场。这仅在即将交付之前设置
   如有必要，将块发送给用户。*/
#define NON_MAIN_ARENA 0x4
/*
   提取大小时要屏蔽的位
   注意：IS_MMAPPED 故意没有从 size 字段中屏蔽掉
   永远不应该看到映射块的宏。这应该
   如果意外尝试，会导致有用的核心转储发生
   人们扩展或适应这个malloc。
*/
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

通过chunk指针p，对某标志位进行提取，检查，置位，清除操作

/* 提取前一个块的 inuse 位 */
#define prev_inuse (p) ((p)->mchunk_size & PREV_INUSE)
/* 检查 mmap() 的块 */
#define chunk_is_mmapped (p) ((p)->mchunk_size & IS_MMAPPED)
/* 检查来自主竞技场的块。*/
#define chunk_main_arena (p) (((p)->mchunk_size & NON_MAIN_ARENA) == 0)
/* extract p's inuse bit */
#define inuse(p)                                                 \
  ((((mchunkptr) (((char *) (p)) + chunksize (p)))->mchunk_size) & PREV_INUSE)
/* set/clear chunk as being inuse without otherwise disturbing */
#define set_inuse(p)                                              \
  ((mchunkptr) (((char *) (p)) + chunksize (p)))->mchunk_size |= PREV_INUSE
#define clear_inuse(p)                                             \
  ((mchunkptr) (((char *) (p)) + chunksize (p)))->mchunk_size &= ~(PREV_INUSE)

由于当前chunk的使用状态是由下一个chunk的size成员PREV_INUSE比特位决定的，所以可以通过下面的宏获得或者修改当前chunk的inuse状态。

/* check/set/clear inuse bits in known places */
#define inuse_bit_at_offset(p, s)                                    \
  (((mchunkptr) (((char *) (p)) + (s)))->mchunk_size & PREV_INUSE)

#define set_inuse_bit_at_offset(p, s)                                \
  (((mchunkptr) (((char *) (p)) + (s)))->mchunk_size |= PREV_INUSE)

#define clear_inuse_bit_at_offset(p, s)                              \
  (((mchunkptr) (((char *) (p)) + (s)))->mchunk_size &= ~(PREV_INUSE))

set_head_size()修改size时不会修改当前chunk的标志位，而set_head()会。set_foot()修改下一个chunk的prev_size时，当前chunk一定要处于释放状态，不然下一个chunk的pre_size是没有意义的。

/* Set size at head, without disturbing its use bit */
#define set_head_size(p, s)  ((p)->mchunk_size = (((p)->mchunk_size & SIZE_BITS) | (s)))

/* Set size/use field */
#define set_head(p, s)       ((p)->mchunk_size = (s))

/* Set size at footer (only when chunk is not in use) */
#define set_foot(p, s)       (((mchunkptr) ((char *) (p) + (s)))->mchunk_prev_size = (s))
/* Get size, ignoring use bits */
#define chunksize(p) (chunksize_nomask (p) & ~(SIZE_BITS))

next_chunk()将当前chunk地址加上当前chunk大小获得下一个chunk的指针。pre_chunk()将当前chunk地址减去prev_size值获得上一个chunk的指针，前提是上一个chunk处于释放状态。chunk_at_offset()将当前地址加上s偏移处的位置视为一个chunk。

/* Ptr to next physical malloc_chunk. */
#define next_chunk(p) ((mchunkptr) (((char *) (p)) + chunksize (p)))

/* Size of the chunk below P.  Only valid if !prev_inuse (P).  */
#define prev_size(p) ((p)->mchunk_prev_size)

/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr) (((char *) (p)) + (s)))

chunk合并过程

当一个非fast bin的chunk被释放时，会与相邻的chunk进行合并，顺序通常是先向后（上）合并（低地址）再向前（下）合并（高地址）。如果合并的chunk是top chunk ，则合并之后会形成新的top chunk ；如果不是的话，则合并之后会被加入到unsorted bin中。

chunk拆分过程

当用户申请的chunk较小时，回先将一个大的chunk进行拆分，合适的部分返回给用户，剩下的部分（称为remainder）则加入unsorted bin中。同时malloc_state 中last_remainder 会记录最近拆分出来的remainder。当然，这个remainder的大小至少要为MINSIZE（默认为0x20），否则不能拆分。

拆分chunk的一种情况是：fast bin和small bin 中都没有合适的chunk，同时unsorted bin中有且仅有一个可拆分的chunk，并且该chunk是last remainder。

common macro

这里介绍一些通用的宏

根据chunk的大小统一的获得chunk所在的索引

1 2	#define bin_index(sz) \ ((in_smallbin_range(sz)) ? smallbin_index(sz) :largebin_index(sz))

1.1.5 bin相关源码

fastbin相关

/*
   Fastbins
    An array of lists holding recently freed small chunks.  Fastbins
    are not doubly linked.  It is faster to single-link them, and
    since chunks are never removed from the middles of these lists,
    double linking is not necessary. Also, unlike regular bins, they
    are not even processed in FIFO order (they use faster LIFO) since
    ordering doesn't much matter in the transient contexts in which
    fastbins are normally used.
    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
 */
typedef struct malloc_chunk *mfastbinptr;
#define fastbin(ar_ptr, idx) ((ar_ptr)->fastbinsY[idx])
/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)
/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)
#define NFASTBINS  (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)

fastbinY数组其实没有保存头结点，而是只保存了malloc_chunk的fd成员，因为其他成员对于单链表头结点并没有用，所以就省略了。这些fd指针的初始值为NULL，表示对应的bin为空，直到有chunk加进来时，fd才保存chunk的地址。

fastbin中的chunk一遍不会与其他chunk合并。但如果合并之后的chunk大于FASTBIN_CONSOLIDATION_THRESHOLD，就会触发malloc_consolidate()函数，将fastbin中的chunk与其他free chunk 合并，然后移动到unsorted bin 中。

1	#define FASTBIN_CONSOLIDATION_THRESHOLD (65536UL)

fast bin 中最大的chunk是由global_max_fast 决定的，这个值一般在堆初始化的时候设置，当然在运行时也可以设置的。

#define set_max_fast(s) \
 global_max_fast = (((s) == 0)             \
 ? SMALLBIN_WIDTH : ((s + SIZE_SZ) & ~MALLOC_ALIGN_MASK))
#define get_max_fast() global_max_fast

bins相关

/* addressing -- note that bin_at(0) does not exist */
#define bin_at(m, i) \
  (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))                              \
             - offsetof (struct malloc_chunk, fd))
/* analog of ++bin */
#define next_bin(b)  ((mbinptr) ((char *) (b) + (sizeof (mchunkptr) << 1)))
/* Reminders about list directionality within bins */
#define first(b)     ((b)->fd)
#define last(b)      ((b)->bk)

可以看到binat(m,i)宏定义中减去了offsetof (struct malloc_chunk, fd)),也就是prev_size和size成员的大小。这是因为bins数组实际上只保存了双链表的头结点的fd和bk。如下图所示：

bins[0]和bins[1]是unsorted bin 的fd和bk指针，binat(1)返回的应该是unsorted bin 的头指针，但实际上其指向的是bins[0]地址减去offsetof (struct malloc_chunk, fd))的位置，这样使用头结点指证b时，b->fd或者b->bk能够正确访问，同时prev_size和size对于头结点没有意义，所以就被省略了。对于binat(64)及其之后的large bin来说，因为头结点的size成员没有意义，所以其fd_nextsize和bk_nextsize也是没有意义的，也可以省略。

small bin 和large bin 索引如下。

#define smallbin_index(sz) \
  ((SMALLBIN_WIDTH == 16 ? (((unsigned) (sz)) >> 4) : (((unsigned) (sz)) >> 3))\
   + SMALLBIN_CORRECTION)

#define largebin_index(sz) \
  (SIZE_SZ == 8 ? largebin_index_64 (sz)                                     \
   : MALLOC_ALIGNMENT == 16 ? largebin_index_32_big (sz)                     \
   : largebin_index_32 (sz))

#define bin_index(sz) \
  ((in_smallbin_range (sz)) ? smallbin_index (sz) : largebin_index (sz))

获得unsorted bin。

1	#define unsorted_chunks(M) (bin_at (M, 1))

binmap结构在索引bin的时候使用，binmap为malloc_state的成员，其中一个比特位表示bins中相应的bin状态，1表示bin不为空，0表示为空，这样能加快搜索速度。

1.1.6 深入理解堆的实现

仔细想一下，任何堆的实现都需要从以下两个角度考虑相应的问题

宏观角度
- 创建堆
- 堆的初始化
- 删除堆
微观角度
- 申请内存块
- 释放内存块

基础操作

`unlink`

unlink用来将一个双向链表（只存储空闲的chunk）中的一个元素取出来，可能在以下地方使用：

malloc
- 从恰好大小合适的large bin 中获取chunk。
  - 这里需要注意的是fastbin 与small bin就没有使用unlink，这就是为什么漏洞会经常出现在它们这里的原因。
  - 依次遍历处理unsorted bin时也没有使用unlink。
- 从比请求的chunk所在的bin大的bin中取chunk。
free
- 后向合并，合并物理相邻低地址空闲chunk。
- 前向合并，合并物理相邻高地址空闲chunk（除了top chunk）。
malloc_consolidate
- 后向合并，合并物理相邻低地址空闲chunk。
- 前向合并，合并物理相邻高地址空闲chunk（除了top chunk）。
realloc
- 前向扩展，合并物理相邻高地址空闲chunk（除了top chunk）。

由于unlink使用非常频繁，所以unlink被实现为了一个宏，如下

/* Take a chunk off a bin list */
// unlink p
#define unlink(AV, P, BK, FD) {                                            \
    // 由于 P 已经在双向链表中，所以有两个地方记录其大小，所以检查一下其大小是否一致。
    if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0))      \
      malloc_printerr ("corrupted size vs. prev_size");               \
    FD = P->fd;                                                                      \
    BK = P->bk;                                                                      \
    // 防止攻击者简单篡改空闲的 chunk 的 fd 与 bk 来实现任意写的效果。
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                      \
      malloc_printerr (check_action, "corrupted double-linked list", P, AV);  \
    else {                                                                      \
        FD->bk = BK;                                                              \
        BK->fd = FD;                                                              \
        // 下面主要考虑 P 对应的 nextsize 双向链表的修改
        if (!in_smallbin_range (chunksize_nomask (P))                              \
            // 如果P->fd_nextsize为 NULL，表明 P 未插入到 nextsize 链表中。
            // 那么其实也就没有必要对 nextsize 字段进行修改了。
            // 这里没有去判断 bk_nextsize 字段，可能会出问题。
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {                      \
            // 类似于小的 chunk 的检查思路
            if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)              \
                || __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))    \
              malloc_printerr (check_action,                                      \
                               "corrupted double-linked list (not small)",    \
                               P, AV);                                              \
            // 这里说明 P 已经在 nextsize 链表中了。
            // 如果 FD 没有在 nextsize 链表中
            if (FD->fd_nextsize == NULL) {                                      \
                // 如果 nextsize 串起来的双链表只有 P 本身，那就直接拿走 P
                // 令 FD 为 nextsize 串起来的
                if (P->fd_nextsize == P)                                      \
                  FD->fd_nextsize = FD->bk_nextsize = FD;                      \
                else {                                                              \
                // 否则我们需要将 FD 插入到 nextsize 形成的双链表中
                    FD->fd_nextsize = P->fd_nextsize;                              \
                    FD->bk_nextsize = P->bk_nextsize;                              \
                    P->fd_nextsize->bk_nextsize = FD;                              \
                    P->bk_nextsize->fd_nextsize = FD;                              \
                  }                                                              \
              } else {                                                              \
                // 如果在的话，直接拿走即可
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;                      \
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;                      \
              }                                                                      \
          }                                                                      \
      }                                                                              \
}

这里我们以small bin 的unlink为例子介绍一下。对于large bin 的unlink，与其类似，只是多了个nextsize的处理。

可以看出，P最后的fd和bk指针并没有发生变化，但是当我们去遍历整个双向链表时，已经遍历不到对应的链表了。这一点没有变化还是很有用处的，因为我们有时候可以使用这个办法来泄露地址

libc地址
- P位于双向链表头部，bk泄露
- P位于双向链表尾部，fd泄露
- 双向链表只包含一个空闲chunk时，P位于双向链表中，fd和bk均可以泄露
泄露堆地址，双向链表包含多个空闲chunk
- P位于双向链表头部，fd泄露
- P位于双向链表中，fd和bk均可以泄露
- P位于双向链表尾部，bk泄露。

注意

这里的头部指的是bin的fd指向的chunk，即双向链表最新加入的chunk。
这里的尾部指的是bin的bk指向的chunk，即双向链表最先加入的chunk。

同时，无论是对于fd，bk还是fd_nextsize ,bk_nextszie ,程序都会检测fd和bk是否满足对应的要求。

// fd bk
if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                      \
  malloc_printerr (check_action, "corrupted double-linked list", P, AV);  \

  // next_size related
              if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)              \
                || __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))    \
              malloc_printerr (check_action,                                      \
                               "corrupted double-linked list (not small)",    \
                               P, AV);

看起来似乎很正常。我们以fd和bk为例，P的forrward chunk 的bk很自然是P，同样P的backward chunk 的fd也很自然是P。如果没有做相应的检查的话，我们可以修改p的fd与bk，从而很容易地达到任意地址写的效果。

注意：堆的第一个chunk所记录的prev_inuse位默认为1。

`malloc_printerr`

在glibc malloc时检测到错误的时候，会调用malloc_printerr函数。

static void malloc_printerr(const char *str) {
  __libc_message(do_abort, "%s\n", str);
  __builtin_unreachable();
}

主要会调用__libc_message来执行abort函数，如下

if ((action & do_abort)) {
  if ((action & do_backtrace))
    BEFORE_ABORT(do_abort, written, fd);

  /* Kill the application.  */
  abort();
}

在 abort函数里，在glibc还是2.23版本时，会fflush stream。

堆的初始化

堆的初始化是在用户第一次申请内存时执行malloc_consolidate 再执行malloc_init_state实现的。

`malloc_consolidate`

由于fast bin中的chunk永远不会释放，导致相邻的free chunk 无法与之合并，从而造成大量的内存碎片，malloc_consolidate()函数最主要的功能就是来解决这个问题。在达到某些条件时，glibc就会调用该函数将fast bin中的chunk取出来，与相邻的free chunk合并后放入unsorted bin，或者与top chunk合并后形成新的top chunk。

/*
  ------------------------- malloc_consolidate -------------------------
  malloc_consolidate is a specialized version of free() that tears
  down chunks held in fastbins.  Free itself cannot be used for this
  purpose since, among other things, it might place chunks back onto
  fastbins.  So, instead, we need to use a minor variant of the same
  code.
*/
static void malloc_consolidate(mstate av)
{
  mfastbinptr*    fb;                 /* current fastbin being consolidated */
  mfastbinptr*    maxfb;              /* last fastbin (for loop control) */
  mchunkptr       p;                  /* current chunk being consolidated */
  mchunkptr       nextp;              /* next chunk to consolidate */
  mchunkptr       unsorted_bin;       /* bin header */
  mchunkptr       first_unsorted;     /* chunk to link to */
  /* These have same use as in free() */
  mchunkptr       nextchunk;
  INTERNAL_SIZE_T size;
  INTERNAL_SIZE_T nextsize;
  INTERNAL_SIZE_T prevsize;
  int             nextinuse;
  atomic_store_relaxed (&av->have_fastchunks, false);
  unsorted_bin = unsorted_chunks(av);
  /*
    Remove each chunk from fast bin and consolidate it, placing it
    then in unsorted bin. Among other reasons for doing this,
    placing in unsorted bin avoids needing to calculate actual bins
    until malloc is sure that chunks aren't immediately going to be
    reused anyway.
  */
  maxfb = &fastbin (av, NFASTBINS - 1);
  fb = &fastbin (av, 0);
  do {
    p = atomic_exchange_acq (fb, NULL);
    if (p != 0) {
      do {
        {
          unsigned int idx = fastbin_index (chunksize (p));
          if ((&fastbin (av, idx)) != fb)
            malloc_printerr ("malloc_consolidate(): invalid chunk size");
        }
        check_inuse_chunk(av, p);
        nextp = p->fd;
        /* Slightly streamlined version of consolidation code in free() */
        size = chunksize (p);
        nextchunk = chunk_at_offset(p, size);
        nextsize = chunksize(nextchunk);
        if (!prev_inuse(p)) {
          prevsize = prev_size (p);
          size += prevsize;
          p = chunk_at_offset(p, -((long) prevsize));
          if (__glibc_unlikely (chunksize(p) != prevsize))
            malloc_printerr ("corrupted size vs. prev_size in fastbins");
          unlink_chunk (av, p);
        }
        if (nextchunk != av->top) {
          nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
          if (!nextinuse) {
            size += nextsize;
            unlink_chunk (av, nextchunk);
          } else
            clear_inuse_bit_at_offset(nextchunk, 0);
          first_unsorted = unsorted_bin->fd;
          unsorted_bin->fd = p;
          first_unsorted->bk = p;
          if (!in_smallbin_range (size)) {
            p->fd_nextsize = NULL;
            p->bk_nextsize = NULL;
          }
          set_head(p, size | PREV_INUSE);
          p->bk = unsorted_bin;
          p->fd = first_unsorted;
          set_foot(p, size);
        }
        else {
          size += nextsize;
          set_head(p, size | PREV_INUSE);
          av->top = p;
        }
      } while ( (p = nextp) != 0);
    }
  } while (fb++ != maxfb);
}

`malloc_init_state`

/*
   Initialize a malloc_state struct.

   This is called from ptmalloc_init () or from _int_new_arena ()
   when creating a new arena.
 */

static void
malloc_init_state (mstate av)
{
  int i;
  mbinptr bin;

  /* Establish circular links for normal bins */
  for (i = 1; i < NBINS; ++i)
    {
      bin = bin_at (av, i);
      bin->fd = bin->bk = bin;
    }

#if MORECORE_CONTIGUOUS
  if (av != &main_arena)
#endif
  set_noncontiguous (av);
  if (av == &main_arena)
    set_max_fast (DEFAULT_MXFAST);
  atomic_store_relaxed (&av->have_fastchunks, false);

  av->top = initial_top (av);
}

申请堆块

`__libc_malloc`

一般我们会使用malloc函数来申请内存块，可是当仔细看glibc的源码实现时，其实并没有malloc函数。因为使用了宏strong_alias (__libc_malloc , malloc)，在glibc源码中malloc（）函数实际上是__libc_malloc(),此外，__libc_malloc函数只是用来简单封装_int_malloc函数。_int_malloc才是申请内内存块的核心。接下来仔细分析一下具体的实现。

该函数会首先检查是否有内存分配函数的钩子函数（__malloc_hook)，这个主要是用于用户自定义的堆分配函数，方便用户快速修改堆分配函数并进行测试。这里需要注意的是，用户申请的字节一旦进入申请内存函数中就变成了无符号整数。

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;
  _Static_assert (PTRDIFF_MAX <= SIZE_MAX / 2,
                  "PTRDIFF_MAX is not more than half of SIZE_MAX");
//检查是否内存分配钩子，如果有，调用钩子并返回.
  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

接着会寻找一个arena来试图分配内存。

1	arena_get(ar_ptr, bytes);

然后调用_int_malloc函数去申请对应的内存。

1	victim = _int_malloc (ar_ptr, bytes);

如果分配失败的话，ptmalloc会尝试再去寻找一个可用的arena，并分配内存。

/* Retry with another arena only if we were able to find a usable arena
     before.  */
if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }

如果申请到了arena，那么在退出之前还得解锁。

1 2	if (ar_ptr != NULL) __libc_lock_unlock (ar_ptr->mutex);

判断目前的状态是否满足一下条件

要么没有申请到内存
要么是mmap的内存
要么申请到内存必须在其所分配的arena中

1 2	assert (!victim \|\| chunk_is_mmapped (mem2chunk (victim)) \|\| ar_ptr == arena_for_chunk (mem2chunk (victim)));

最后返回内存。

1	return victim;

`_int_malloc`

_int_malloc是内存分配的核心函数，为了以最快的速度找到最合适的堆块，glibc根据申请堆块的大小，各个bin的状态仔细地安排了分配顺序和内存整理的时机。其大概搜索顺序是：

1）fast bin（寻找大小完全一样的）；

2）small bin（寻找大小完全一样的）；

3）unsorted bin（寻找大小完全一样的）；

4）large bin（如果申请较大的是chunk，寻找最小能满足的）；

5）bins（寻找最小能满足的）；

6）top chunk（切分出合适的chunk）；

7）系统函数分配。

其核心思路有如下

它根据用户申请的 内存块大小以及 相应大小chunk通常使用的频度（fastbin chunk，small chunk，large chunk），依次实现了不同的分配方法。
它由小到大依次检查不同的bin中是否有相应的空闲块可以满足用户请求的内存。
当所有的空闲chunk都无法满足时，它会考虑top chunk。
当top chunk 也无法满足时，堆分配器才会进行内存块申请。

_ini_malloc()函数具体过程如下。

在进入该函数后，函数立马定义了一系列自己需要的变量，并将用户申请的内存大小转换为内部的chunk大小。

static void *_int_malloc(mstate av, size_t bytes) {
    INTERNAL_SIZE_T nb;  /* normalized request size */
    unsigned int    idx; /* associated bin index */
    mbinptr         bin; /* associated bin */

    mchunkptr       victim;       /* inspected/selected chunk */
    INTERNAL_SIZE_T size;         /* its size */
    int             victim_index; /* its bin index */

    mchunkptr     remainder;      /* remainder from a split */
    unsigned long remainder_size; /* its size */

    unsigned int block; /* bit map traverser */
    unsigned int bit;   /* bit map traverser */
    unsigned int map;   /* current word of binmap */

    mchunkptr fwd; /* misc temp for linking */
    mchunkptr bck; /* misc temp for linking */

    const char *errstr = NULL;

    /*
       Convert request size to internal form by adding SIZE_SZ bytes
       overhead plus possibly more to obtain necessary alignment and/or
       to obtain a size of at least MINSIZE, the smallest allocatable
       size. Also, checked_request2size traps (returning 0) request sizes
       that are so large that they wrap around zero when padded and
       aligned.
     */

    checked_request2size(bytes, nb);

如果没有合适的arena，就调用sysmalloc（），用mmap（）分配chunk并返回。

/* There are no usable arenas.  Fall back to sysmalloc to get a chunk from
   mmap.  */
if (__glibc_unlikely(av == NULL)) {
    void *p = sysmalloc(nb, av);
    if (p != NULL) alloc_perturb(p, bytes);
    return p;
}

其次，检查fast bin中是否有合适的chunk。如果申请的chunk的大小位于fastbin范围内，需要注意的是这里比较的是无符号整数。此外，是从fastbin的头结点开始取chunk。

/*
   If the size qualifies as a fastbin, first check corresponding bin.
   This code is safe to execute even if av is not yet initialized, so we
   can try it without checking, which saves some time on this fast path.
 */

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast())) {
    // 得到对应的fastbin的下标
    idx             = fastbin_index(nb);
    // 得到对应的fastbin的头指针
    mfastbinptr *fb = &fastbin(av, idx);
    mchunkptr    pp = *fb;
    // 利用fd遍历对应的bin内是否有空闲的chunk块，
    do {
        victim = pp;
        if (victim == NULL) break;
    } while ((pp = catomic_compare_and_exchange_val_acq(fb, victim->fd,
                                                        victim)) != victim);
    // 存在可以利用的chunk
    if (victim != 0) {
        // 检查取到的 chunk 大小是否与相应的 fastbin 索引一致。
        // 根据取得的 victim ，利用 chunksize 计算其大小。
        // 利用fastbin_index 计算 chunk 的索引。
        if (__builtin_expect(fastbin_index(chunksize(victim)) != idx, 0)) {
            errstr = "malloc(): memory corruption (fast)";
        errout:
            malloc_printerr(check_action, errstr, chunk2mem(victim), av);
            return NULL;
        }
        // 细致的检查。。只有在 DEBUG 的时候有用
        check_remalloced_chunk(av, victim, nb);
        // 将获取的到chunk转换为mem模式
        void *p = chunk2mem(victim);
        // 如果设置了perturb_type, 则将获取到的chunk初始化为 perturb_type ^ 0xff
        alloc_perturb(p, bytes);
        return p;
    }
}

然后检查small bin中是否有合适的chunk。

/*
   If a small request, check regular bin.  Since these "smallbins"
   hold one size each, no searching within bins is necessary.
   (For a large request, we need to wait until unsorted chunks are
   processed to find best fit. But for small ones, fits are exact
   anyway, so we can check now, which is faster.)
 */

if (in_smallbin_range(nb)) {
    // 获取 small bin 的索引
    idx = smallbin_index(nb);
    // 获取对应 small bin 中的 chunk 指针
    bin = bin_at(av, idx);
    // 先执行 victim = last(bin)，获取 small bin 的最后一个 chunk
    // 如果 victim = bin ，那说明该 bin 为空。
    // 如果不相等，那么会有两种情况
    if ((victim = last(bin)) != bin) {
        // 第一种情况，small bin 还没有初始化。
        if (victim == 0) /* initialization check */
            // 执行初始化，将 fast bins 中的 chunk 进行合并
            malloc_consolidate(av);
        // 第二种情况，small bin 中存在空闲的 chunk
        else {
            // 获取 small bin 中倒数第二个 chunk 。
            bck = victim->bk;
            // 检查 bck->fd 是不是 victim，防止伪造
            if (__glibc_unlikely(bck->fd != victim)) {
                errstr = "malloc(): smallbin double linked list corrupted";
                goto errout;
            }
            // 设置 victim 对应的 inuse 位
            set_inuse_bit_at_offset(victim, nb);
            // 修改 small bin 链表，将 small bin 的最后一个 chunk 取出来
            bin->bk = bck;
            bck->fd = bin;
            // 如果不是 main_arena，设置对应的标志
            if (av != &main_arena) set_non_main_arena(victim);
            // 细致的检查，非调试状态没有作用
            check_malloced_chunk(av, victim, nb);
            // 将申请到的 chunk 转化为对应的 mem 状态
            void *p = chunk2mem(victim);
            // 如果设置了 perturb_type , 则将获取到的chunk初始化为 perturb_type ^ 0xff
            alloc_perturb(p, bytes);
            return p;
        }
    }
}

当fast bin,small bin中的chunk都不能满足用户请求chunk大小时，就会考虑是不是large bin。但是，其实在large bin中并没有直接去扫描对应bin中的chunk，而是先利用malloc_consollidate函数处理fast bin中chunk，将有可能能够合并的chunk先进行合并后放到unsorted bin中，不能够合并的就直接放到unsorted bin中，然后再在下面的大循环中进行相应的处理。 为什么不直接从相应的bin中取出large chunk呢？这个是ptmalloc 的机制，它会在分配large chunk之前对堆中碎片chunk进行合并，以减少堆中的碎片。

    /*
       If this is a large request, consolidate fastbins before continuing.
       While it might look excessive to kill all fastbins before
       even seeing if there is space available, this avoids
       fragmentation problems normally associated with fastbins.
       Also, in practice, programs tend to have runs of either small or
       large requests, but less often mixtures, so consolidation is not
       invoked all that often in most programs. And the programs that
       it is called frequently in otherwise tend to fragment.
     */
/*到这里还不能满足，就调用malloc_consolidate()整理fastbins，或许会有合适的chunk*/
    else {
        // 获取large bin的下标。
        idx = largebin_index(nb);
        // 如果存在fastbin的话，会处理 fastbin
        if (have_fastchunks(av)) malloc_consolidate(av);
    }

大循环-遍历unsorted bin
接下来，函数进入一个大的外层for循环，包含了_int_malloc（）函数之后的所有过程。紧接着是内层第一个while循环，它会遍历unsorted bin中的每一个chunk，如果大小正好合适，就将其取出，否则就将其放入small bin或者large bin 。这是唯一的将chunk放进small bin或者large bin的过程。

如果程序执行到了这里，那么说明与chunk大小正好一致的bin（fast bin ，small bin）中没有chunk可以直接满足需求，但是large chunk则是在这个大循环中处理。

在接下来的这个循环中，主要做了以下的操作

按照FIFO的方式逐个将unsorted bin中的chunk取出来
- 如果是small request ，则考虑是不是恰好满足，是的话，直接返回。
- 如果不是得话，放到对应的bin中。
尝试从large bin中分配用户所需的内存

该部分是个大循环，这是为了尝试重新分配small bin chunk ，这是因为我们虽然会首先使用large bin，top chunk来尝试满足用户的请求，但是如果没有满足的话，由于我们在上面没有分配成功small bin，我们并没有对fast bin中的chunk进行合并，所以这里会进行fast bin chunk的合并，进而使用一个大循环来尝试再次分配small bin chunk。

/*
   Process recently freed or remaindered chunks, taking one only if
   it is exact fit, or, if this a small request, the chunk is remainder from
   the most recent non-exact fit.  Place other traversed chunks in
   bins.  Note that this step is the only place in any routine where
   chunks are placed in bins.

   The outer loop here is needed because we might not realize until
   near the end of malloc that we should have consolidated, so must
   do so and retry. This happens at most once, and only when we would
   otherwise need to expand memory to service a "small" request.
 */

for (;;) {
    int iters = 0;

unsorted bin遍历

先考虑unsorted bin ，再考虑last remainder ，但是对于small bin chunk的请求会有所例外。

注意unsorted bin的遍历顺序为bk。

// 如果 unsorted bin 不为空
// First In First Out
while ((victim = unsorted_chunks(av)->bk) != unsorted_chunks(av)) {
    // victim 为 unsorted bin 的最后一个 chunk
    // bck 为 unsorted bin 的倒数第二个 chunk
    bck = victim->bk;
    // 判断得到的 chunk 是否满足要求，不能过小，也不能过大
    // 一般 system_mem 的大小为132K
    if (__builtin_expect(chunksize_nomask(victim) <= 2 * SIZE_SZ, 0) ||
        __builtin_expect(chunksize_nomask(victim) > av->system_mem, 0))
        malloc_printerr(check_action, "malloc(): memory corruption",
                        chunk2mem(victim), av);
    // 得到victim对应的chunk大小。
    size = chunksize(victim);

small request

如果用户的请求为small bin chunk ，那么我们首先考虑last remainder ，如果last remainder 是unsorted bin中的唯一一块的话，并且满足拆分条件时，就将其拆分。

/*
   If a small request, try to use last remainder if it is the
   only chunk in unsorted bin.  This helps promote locality for
   runs of consecutive small requests. This is the only
   exception to best-fit, and applies only when there is
   no exact fit for a small chunk.
 */

if (in_smallbin_range(nb) && bck == unsorted_chunks(av) &&
    victim == av->last_remainder &&
    (unsigned long) (size) > (unsigned long) (nb + MINSIZE)) {
    /* split and reattach remainder */
    // 获取新的 remainder 的大小
    remainder_size          = size - nb;
    // 获取新的 remainder 的位置
    remainder               = chunk_at_offset(victim, nb);
    // 更新 unsorted bin 的情况
    unsorted_chunks(av)->bk = unsorted_chunks(av)->fd = remainder;
    // 更新 av 中记录的 last_remainder
    av->last_remainder                                = remainder;
    // 更新last remainder的指针
    remainder->bk = remainder->fd = unsorted_chunks(av);
    if (!in_smallbin_range(remainder_size)) {
        remainder->fd_nextsize = NULL;
        remainder->bk_nextsize = NULL;
    }
    // 设置victim的头部，
    set_head(victim, nb | PREV_INUSE |
                         (av != &main_arena ? NON_MAIN_ARENA : 0));
    // 设置 remainder 的头部
    set_head(remainder, remainder_size | PREV_INUSE);
    // 设置记录 remainder 大小的 prev_size 字段，因为此时 remainder 处于空闲状态。
    set_foot(remainder, remainder_size);
    // 细致的检查，非调试状态下没有作用
    check_malloced_chunk(av, victim, nb);
    // 将 victim 从 chunk 模式转化为mem模式
    void *p = chunk2mem(victim);
    // 如果设置了perturb_type, 则将获取到的chunk初始化为 perturb_type ^ 0xff
    alloc_perturb(p, bytes);
    return p;
}

初始取出

/* remove from unsorted list */
        if (__glibc_unlikely (bck->fd != victim))
          malloc_printerr ("malloc(): corrupted unsorted chunks 3");
        kunsorted_chunks (av)->bk = bck;
        bck->fd = unsorted_chunks (av);

EXACT FIT

如果从unsorted bin中取出来的chunk大小正好合适，就直接使用。这里应该已经把合并后恰好合适的chunk给分配出去了。

/* Take now instead of binning if exact fit */
if (size == nb) {
    set_inuse_bit_at_offset(victim, size);
    if (av != &main_arena) set_non_main_arena(victim);
    check_malloced_chunk(av, victim, nb);
    void *p = chunk2mem(victim);
    alloc_perturb(p, bytes);
    return p;
}

PLACE CHUNK IN SMALL BIN

如果chunk大小不合适，就将其插入对应的bin中。插入的过程就是双链表插入结点打的过程。如果取出来的chunk放到对应的small bin中。

} else {
    // large bin 范围
    victim_index = largebin_index(size);
    bck          = bin_at(av, victim_index); // 当前 large bin 的头部
    fwd          = bck->fd;

    /* maintain large bins in sorted order */
    /* 从这里我们可以总结出，largebin 以 fd_nextsize 递减排序。
       同样大小的 chunk，后来的只会插入到之前同样大小的 chunk 后，
       而不会修改之前相同大小的fd/bk_nextsize，这也很容易理解，
       可以减低开销。此外，bin 头不参与 nextsize 链接。*/
    // 如果 large bin 链表不空
    if (fwd != bck) {
        /* Or with inuse bit to speed comparisons */
        // 加速比较，应该不仅仅有这个考虑，因为链表里的 chunk 都会设置该位。
        size |= PREV_INUSE;
        /* if smaller than smallest, bypass loop below */
        // bck->bk 存储着相应 large bin 中最小的chunk。
        // 如果遍历的 chunk 比当前最小的还要小，那就只需要插入到链表尾部。
        // 判断 bck->bk 是不是在 main arena。
        assert(chunk_main_arena(bck->bk));
        if ((unsigned long) (size) <
            (unsigned long) chunksize_nomask(bck->bk)) {
            // 令 fwd 指向 large bin 头
            fwd = bck;
            // 令 bck 指向 largin bin 尾部 chunk
            bck = bck->bk;
            // victim 的 fd_nextsize 指向 largin bin 的第一个 chunk
            victim->fd_nextsize = fwd->fd;
            // victim 的 bk_nextsize 指向原来链表的第一个 chunk 指向的 bk_nextsize
            victim->bk_nextsize = fwd->fd->bk_nextsize;
            // 原来链表的第一个 chunk 的 bk_nextsize 指向 victim
            // 原来指向链表第一个 chunk 的 fd_nextsize 指向 victim
            fwd->fd->bk_nextsize =
                victim->bk_nextsize->fd_nextsize = victim;
        } else {
            // 当前要插入的 victim 的大小大于最小的 chunk
            // 判断 fwd 是否在 main arena
            assert(chunk_main_arena(fwd));
            // 从链表头部开始找到不比 victim 大的 chunk
            while ((unsigned long) size < chunksize_nomask(fwd)) {
                fwd = fwd->fd_nextsize;
                assert(chunk_main_arena(fwd));
            }
            // 如果找到了一个和 victim 一样大的 chunk，
            // 那就直接将 chunk 插入到该chunk的后面，并不修改 nextsize 指针。
            if ((unsigned long) size ==
                (unsigned long) chunksize_nomask(fwd))
                /* Always insert in the second position.  */
                fwd = fwd->fd;
            else {
                // 如果找到的chunk和当前victim大小不一样
                // 那么就需要构造 nextsize 双向链表了
                victim->fd_nextsize              = fwd;
                victim->bk_nextsize              = fwd->bk_nextsize;
                fwd->bk_nextsize                 = victim;
                victim->bk_nextsize->fd_nextsize = victim;
            }
            bck = fwd->bk;
        }
    } else
        // 如果空的话，直接简单使得 fd_nextsize 与 bk_nextsize 构成一个双向链表即可。
        victim->fd_nextsize = victim->bk_nextsize = victim;
}

最终取出

// 放到对应的 bin 中，构成 bck<-->victim<-->fwd。
mark_bin(av, victim_index);
victim->bk = bck;
victim->fd = fwd;
fwd->bk    = victim;
bck->fd    = victim;

WHILE迭代次数

1
2
3

#define MAX_ITERS 10000
            if (++iters >= MAX_ITERS) break;
        }

while 最多迭代10000次后退出。

large chunk

注：或许会很奇怪，为什么这里没有先去看small chunk是否满足新需求了呢？这是因为small bin在循环之前以及判断过了，这里如果有的话，就是合并后的才出现chunk。但是在大循环外，large chunk只是单纯地找到其索引，所以觉得在这里直接判断是合理的，而且也为了下面可以再去找较大的chunk

如果请求的chunk在large chunk 范围内，就在对应的bin中从小到大进行扫描，找到第一个合适的。

/*
   If a large request, scan through the chunks of current bin in
   sorted order to find smallest that fits.  Use the skip list for this.
 */
if (!in_smallbin_range(nb)) {
    bin = bin_at(av, idx);
    /* skip scan if empty or largest chunk is too small */
    // 如果对应的 bin 为空或者其中的chunk最大的也很小，那就跳过
    // first(bin)=bin->fd 表示当前链表中最大的chunk
    if ((victim = first(bin)) != bin &&
        (unsigned long) chunksize_nomask(victim) >=
            (unsigned long) (nb)) {
        // 反向遍历链表，直到找到第一个不小于所需chunk大小的chunk
        victim = victim->bk_nextsize;
        while (((unsigned long) (size = chunksize(victim)) <
                (unsigned long) (nb)))
            victim = victim->bk_nextsize;

        /* Avoid removing the first entry for a size so that the skip
           list does not have to be rerouted.  */
        // 如果最终取到的chunk不是该bin中的最后一个chunk，并且该chunk与其前面的chunk
        // 的大小相同，那么我们就取其前面的chunk，这样可以避免调整bk_nextsize,fd_nextsize
        //  链表。因为大小相同的chunk只有一个会被串在nextsize链上。
        if (victim != last(bin) &&
            chunksize_nomask(victim) == chunksize_nomask(victim->fd))
            victim = victim->fd;
        // 计算分配后剩余的大小
        remainder_size = size - nb;
        // 进行unlink
        unlink(av, victim, bck, fwd);

        /* Exhaust */
        // 剩下的大小不足以当做一个块
        // 很好奇接下来会怎么办？
        if (remainder_size < MINSIZE) {
            set_inuse_bit_at_offset(victim, size);
            if (av != &main_arena) set_non_main_arena(victim);
        }
        /* Split */
        //  剩下的大小还可以作为一个chunk，进行分割。
        else {
            // 获取剩下那部分chunk的指针，称为remainder
            remainder = chunk_at_offset(victim, nb);
            /* We cannot assume the unsorted list is empty and therefore
               have to perform a complete insert here.  */
            // 插入unsorted bin中
            bck = unsorted_chunks(av);
            fwd = bck->fd;
            // 判断 unsorted bin 是否被破坏。
            if (__glibc_unlikely(fwd->bk != bck)) {
                errstr = "malloc(): corrupted unsorted chunks";
                goto errout;
            }
            remainder->bk = bck;
            remainder->fd = fwd;
            bck->fd       = remainder;
            fwd->bk       = remainder;
            // 如果不处于small bin范围内，就设置对应的字段
            if (!in_smallbin_range(remainder_size)) {
                remainder->fd_nextsize = NULL;
                remainder->bk_nextsize = NULL;
            }
            // 设置分配的chunk的标记
            set_head(victim,
                     nb | PREV_INUSE |
                         (av != &main_arena ? NON_MAIN_ARENA : 0));

            // 设置remainder的上一个chunk，即分配出去的chunk的使用状态
            // 其余的不用管，直接从上面继承下来了
            set_head(remainder, remainder_size | PREV_INUSE);
            // 设置remainder的大小
            set_foot(remainder, remainder_size);
        }
        // 检查
        check_malloced_chunk(av, victim, nb);
        // 转换为mem状态
        void *p = chunk2mem(victim);
        // 如果设置了perturb_type, 则将获取到的chunk初始化为 perturb_type ^ 0xff
        alloc_perturb(p, bytes);
        return p;
    }
}

寻找较大chunk

接下来，进入内层第二个for循环。根据binmap来搜索bin，因为申请的 chunk大小所对应bin没有找到合适的chunk，所以就从下一个bin中搜索。如果走到了这里，那就说明对于用户所需的chunk，不能直接从其对应的合适的bin中获取chunk，所以我们需要来查找比当前bin更大的fast bin ，small bin或者large bin。

/*
   Search for a chunk by scanning bins, starting with next largest
   bin. This search is strictly by best-fit; i.e., the smallest
   (with ties going to approximately the least recently used) chunk
   that fits is selected.

   The bitmap avoids needing to check that most blocks are nonempty.
   The particular case of skipping all bins during warm-up phases
   when no chunks have been returned yet is faster than it might look.
 */

++idx;
// 获取对应的bin
bin   = bin_at(av, idx);
// 获取当前索引在binmap中的block索引
// #define idx2block(i) ((i) >> BINMAPSHIFT)  ,BINMAPSHIFT=5
// Binmap按block管理，每个block为一个int，共32个bit，可以表示32个bin中是否有空闲chunk存在
// 所以这里是右移5
block = idx2block(idx);
// 获取当前块大小对应的映射，这里可以得知相应的bin中是否有空闲块
map   = av->binmap[ block ];
// #define idx2bit(i) ((1U << ((i) & ((1U << BINMAPSHIFT) - 1))))
// 将idx对应的比特位设置为1，其它位为0
bit   = idx2bit(idx);
for (;;) {

在内层第二个循环内部，寻找第一个不为空的block ，再根据比特位找到合适的bin。找到一个合适的MAP

/* Skip rest of block if there are no more set bits in this block.
 */
// 如果bit>map，则表示该 map 中没有比当前所需要chunk大的空闲块
// 如果bit为0，那么说明，上面idx2bit带入的参数为0。
if (bit > map || bit == 0) {
    do {
        // 寻找下一个block，直到其对应的map不为0。
        // 如果已经不存在的话，那就只能使用top chunk了
        if (++block >= BINMAPSIZE) /* out of bins */
            goto use_top;
    } while ((map = av->binmap[ block ]) == 0);
    // 获取其对应的bin，因为该map中的chunk大小都比所需的chunk大，而且
    // map本身不为0，所以必然存在满足需求的chunk。
    bin = bin_at(av, (block << BINMAPSHIFT));
    bit = 1;
}
/* Advance to bin with set bit. There must be one. */
// 从当前map的最小的bin一直找，直到找到合适的bin。
// 这里是一定存在的
while ((bit & map) == 0) {
    bin = next_bin(bin);
    bit <<= 1;
    assert(bit != 0);
}

然后检查bit对应的bin是否为空，如果是，就清空对应的比特位，从下一个bin开始再次循环，否则将victim中取出来。

       /* Inspect the bin. It is likely to be non-empty */
       // 获取对应的bin
       victim = last(bin);

       /*  If a false alarm (empty bin), clear the bit. */
       // 如果victim=bin，那么我们就将map对应的位清0，然后获取下一个bin
       // 这种情况发生的概率应该很小。
       if (victim == bin) {
           av->binmap[ block ] = map &= ~bit; /* Write through */
           bin                 = next_bin(bin);
           bit <<= 1;
       }
else {
           // 获取对应victim的大小
           size = chunksize(victim);

           /*  We know the first chunk in this bin is big enough to use. */
           assert((unsigned long) (size) >= (unsigned long) (nb));
           // 计算分割后剩余的大小
           remainder_size = size - nb;

           /* unlink */
           unlink(av, victim, bck, fwd);

将取出来的victim进行切分并把remainder 加入unsorted bin ，如果victim

不够分，那就直接返回给用户。内层第二个循环到此结束。

    /* Exhaust */
    // 如果分割后不够一个chunk怎么办？
    if (remainder_size < MINSIZE) {
        set_inuse_bit_at_offset(victim, size);
        if (av != &main_arena) set_non_main_arena(victim);
    }

    /* Split */
    // 如果够，尽管分割
    else {
        // 计算剩余的chunk的偏移
        remainder = chunk_at_offset(victim, nb);

        /* We cannot assume the unsorted list is empty and therefore
           have to perform a complete insert here.  */
        // 将剩余的chunk插入到unsorted bin中
        bck = unsorted_chunks(av);
        fwd = bck->fd;
        if (__glibc_unlikely(fwd->bk != bck)) {
            errstr = "malloc(): corrupted unsorted chunks 2";
            goto errout;
        }
        remainder->bk = bck;
        remainder->fd = fwd;
        bck->fd       = remainder;
        fwd->bk       = remainder;

        /* advertise as last remainder */
        // 如果在small bin范围内，就将其标记为remainder
        if (in_smallbin_range(nb)) av->last_remainder = remainder;
        if (!in_smallbin_range(remainder_size)) {
            remainder->fd_nextsize = NULL;
            remainder->bk_nextsize = NULL;
        }
        // 设置victim的使用状态
        set_head(victim,
                 nb | PREV_INUSE |
                     (av != &main_arena ? NON_MAIN_ARENA : 0));
        // 设置remainder的使用状态，这里是为什么呢？
        set_head(remainder, remainder_size | PREV_INUSE);
        // 设置remainder的大小
        set_foot(remainder, remainder_size);
    }
    // 检查
    check_malloced_chunk(av, victim, nb);
    // chunk状态转换到mem状态
    void *p = chunk2mem(victim);
    // 如果设置了perturb_type, 则将获取到的chunk初始化为 perturb_type ^ 0xff
    alloc_perturb(p, bytes);
    return p;
}

如果上面的操作还不能满足要求，就只能从top chunk 上进行切分。

use_top:
    /*
       If large enough, split off the chunk bordering the end of memory
       (held in av->top). Note that this is in accord with the best-fit
       search rule.  In effect, av->top is treated as larger (and thus
       less well fitting) than any other available chunk since it can
       be extended to be as large as necessary (up to system
       limitations).

       We require that av->top always exists (i.e., has size >=
       MINSIZE) after initialization, so if it would otherwise be
       exhausted by current request, it is replenished. (The main
       reason for ensuring it exists is that we may need MINSIZE space
       to put in fenceposts in sysmalloc.)
     */
    // 获取当前的top chunk，并计算其对应的大小
    victim = av->top;
    size   = chunksize(victim);
    // 如果分割之后，top chunk 大小仍然满足 chunk 的最小大小，那么就可以直接进行分割。
    if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE)) {
        remainder_size = size - nb;
        remainder      = chunk_at_offset(victim, nb);
        av->top        = remainder;
        // 这里设置 PREV_INUSE 是因为 top chunk 前面的 chunk 如果不是 fastbin，就必然会和
        // top chunk 合并，所以这里设置了 PREV_INUSE。
        set_head(victim, nb | PREV_INUSE |
                             (av != &main_arena ? NON_MAIN_ARENA : 0));
        set_head(remainder, remainder_size | PREV_INUSE);

        check_malloced_chunk(av, victim, nb);
        void *p = chunk2mem(victim);
        alloc_perturb(p, bytes);
        return p;
    }
    // 否则，判断是否有 fast chunk
    /* When we are using atomic ops to free fast chunks we can get
       here for all block sizes.  */
    else if (have_fastchunks(av)) {
        // 先执行一次fast bin的合并
        malloc_consolidate(av);
        /* restore original bin index */
        // 判断需要的chunk是在small bin范围内还是large bin范围内
        // 并计算对应的索引
        // 等待下次再看看是否可以
        if (in_smallbin_range(nb))
            idx = smallbin_index(nb);
        else
            idx = largebin_index(nb);
    }

如果堆内存不够，我们就需要使用sysmalloc来申请内存了。

/*
   Otherwise, relay to handle system-dependent cases
 */
// 否则的话，我们就只能从系统中再次申请一点内存了。
else {
    void *p = sysmalloc(nb, av);
    if (p != NULL) alloc_perturb(p, bytes);
    return p;
}

在主线程中，sysmalloc （）函数的大概流程如下。

（1）当申请的大小nb大于mp_.mmap_threshold时，通过mmap （）函数进行分配。其中mp._mmap_threshold的默认大小为128x1024字节

（2）尝试用brk（）扩展堆内存，形成新的top chunk ，而旧的top chunk 会被释放。然后从新的top chunk中切分出nb大小的chunk，返回给用户。

`_libc_calloc`

calloc也是libc中的一种申请内存块的函数。在libc中的封装为_libc_calloc，具体介绍如下

/*
  calloc(size_t n_elements, size_t element_size);
  Returns a pointer to n_elements * element_size bytes, with all locations
  set to zero.
*/
void*  __libc_calloc(size_t, size_t);

`sysmalloc`

正如该函数头的注释所言，该函数用于当前堆内存不足时，需要向系统申请更多的内存。

/*
   sysmalloc handles malloc cases requiring more memory from the system.
   On entry, it is assumed that av->top does not have enough
   space to service request for nb bytes, thus requiring that av->top
   be extended or replaced.
 */

基本定义

static void *sysmalloc(INTERNAL_SIZE_T nb, mstate av) {
  mchunkptr old_top;        /* incoming value of av->top */
  INTERNAL_SIZE_T old_size; /* its size */
  char *old_end;            /* its end address */

  long size; /* arg to first MORECORE or mmap call */
  char *brk; /* return value from MORECORE */

  long correction; /* arg to 2nd MORECORE call */
  char *snd_brk;   /* 2nd return val */

  INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
  INTERNAL_SIZE_T end_misalign;   /* partial page left at end of new space */
  char *aligned_brk;              /* aligned offset into brk */

  mchunkptr p;                  /* the allocated/returned chunk */
  mchunkptr remainder;          /* remainder frOm allocation */
  unsigned long remainder_size; /* its size */

  size_t pagesize = GLRO(dl_pagesize);
  bool tried_mmap = false;

我们可以主要关注一下 pagesize,其

#ifndef EXEC_PAGESIZE
#define EXEC_PAGESIZE   4096
#endif
# define GLRO(name) _##name
size_t _dl_pagesize = EXEC_PAGESIZE;

所以，pagesize = 4096 = 0x1000。

释放内存块

`__libc_free`

类似于malloc，free函数也有一层封装，命名格式与malloc基本类似。代码如下

void __libc_free(void *mem) {
    mstate    ar_ptr;
    mchunkptr p; /* chunk corresponding to mem */
    // 判断是否有钩子函数 __free_hook
    void (*hook)(void *, const void *) = atomic_forced_read(__free_hook);
    if (__builtin_expect(hook != NULL, 0)) {
        (*hook)(mem, RETURN_ADDRESS(0));
        return;
    }
    // free NULL没有作用
    if (mem == 0) /* free(0) has no effect */
        return;
    // 将mem转换为chunk状态
    p = mem2chunk(mem);
    // 如果该块内存是mmap得到的
    if (chunk_is_mmapped(p)) /* release mmapped memory. */
    {
        /* See if the dynamic brk/mmap threshold needs adjusting.
       Dumped fake mmapped chunks do not affect the threshold.  */
        if (!mp_.no_dyn_threshold && chunksize_nomask(p) > mp_.mmap_threshold &&
            chunksize_nomask(p) <= DEFAULT_MMAP_THRESHOLD_MAX &&
            !DUMPED_MAIN_ARENA_CHUNK(p)) {
            mp_.mmap_threshold = chunksize(p);
            mp_.trim_threshold = 2 * mp_.mmap_threshold;
            LIBC_PROBE(memory_mallopt_free_dyn_thresholds, 2,
                       mp_.mmap_threshold, mp_.trim_threshold);
        }
        munmap_chunk(p);
        return;
    }
    // 根据chunk获得分配区的指针
    ar_ptr = arena_for_chunk(p);
    // 执行释放
    _int_free(ar_ptr, p, 0);
}

`_int_free`

函数初始时刻定义了一系列的变量，并且得到了用户想要释放的chunk的大小

static void _int_free(mstate av, mchunkptr p, int have_lock) {
    INTERNAL_SIZE_T size;      /* its size */
    mfastbinptr *   fb;        /* associated fastbin */
    mchunkptr       nextchunk; /* next contiguous chunk */
    INTERNAL_SIZE_T nextsize;  /* its size */
    int             nextinuse; /* true if nextchunk is used */
    INTERNAL_SIZE_T prevsize;  /* size of previous contiguous chunk */
    mchunkptr       bck;       /* misc temp for linking */
    mchunkptr       fwd;       /* misc temp for linking */

    const char *errstr = NULL;
    int         locked = 0;

    size = chunksize(p);

简单的检查

/* Little security check which won't hurt performance: the
   allocator never wrapps around at the end of the address space.
   Therefore we can exclude some size values which might appear
   here by accident or by "design" from some intruder.  */
// 指针不能指向非法的地址, 必须小于等于-size，为什么？？？
// 指针必须得对齐，2*SIZE_SZ 这个对齐得仔细想想
if (__builtin_expect((uintptr_t) p > (uintptr_t) -size, 0) ||
    __builtin_expect(misaligned_chunk(p), 0)) {
    errstr = "free(): invalid pointer";
errout:
    if (!have_lock && locked) __libc_lock_unlock(av->mutex);
    malloc_printerr(check_action, errstr, chunk2mem(p), av);
    return;
}
/* We know that each chunk is at least MINSIZE bytes in size or a
   multiple of MALLOC_ALIGNMENT.  */
// 大小没有最小的chunk大，或者说，大小不是MALLOC_ALIGNMENT的整数倍
if (__glibc_unlikely(size < MINSIZE || !aligned_OK(size))) {
    errstr = "free(): invalid size";
    goto errout;
}
// 检查该chunk是否处于使用状态，非调试状态下没有作用
check_inuse_chunk(av, p);

其中

/* Check if m has acceptable alignment */

#define aligned_OK(m) (((unsigned long) (m) &MALLOC_ALIGN_MASK) == 0)

#define misaligned_chunk(p)                                                    \
    ((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem(p)) &       \
     MALLOC_ALIGN_MASK)

fast bin

如果上述检查都合格的话，判断当前的斌是不是在fast bin范围内，在的话就插入到 fastbin头部，即成为对应fastbin链表的 第一个free chunk。

    /*
      If eligible, place chunk on a fastbin so it can be found
      and used quickly in malloc.
    */

    if ((unsigned long) (size) <= (unsigned long) (get_max_fast())

#if TRIM_FASTBINS
        /*
      If TRIM_FASTBINS set, don't place chunks
      bordering top into fastbins
        */
       //默认 #define TRIM_FASTBINS 0，因此默认情况下下面的语句不会执行
       // 如果当前chunk是fast chunk，并且下一个chunk是top chunk，则不能插入
        && (chunk_at_offset(p, size) != av->top)
#endif
            ) {
        // 下一个chunk的大小不能小于两倍的SIZE_SZ,并且
        // 下一个chunk的大小不能大于system_mem， 一般为132k
        // 如果出现这样的情况，就报错。
        if (__builtin_expect(
                chunksize_nomask(chunk_at_offset(p, size)) <= 2 * SIZE_SZ, 0) ||
            __builtin_expect(
                chunksize(chunk_at_offset(p, size)) >= av->system_mem, 0)) {
            /* We might not have a lock at this point and concurrent
               modifications
               of system_mem might have let to a false positive.  Redo the test
               after getting the lock.  */
            if (have_lock || ({
                    assert(locked == 0);
                    __libc_lock_lock(av->mutex);
                    locked = 1;
                    chunksize_nomask(chunk_at_offset(p, size)) <= 2 * SIZE_SZ ||
                        chunksize(chunk_at_offset(p, size)) >= av->system_mem;
                })) {
                errstr = "free(): invalid next size (fast)";
                goto errout;
            }
            if (!have_lock) {
                __libc_lock_unlock(av->mutex);
                locked = 0;
            }
        }
        // 将chunk的mem部分全部设置为perturb_byte
        free_perturb(chunk2mem(p), size - 2 * SIZE_SZ);
        // 设置fast chunk的标记位
        set_fastchunks(av);
        // 根据大小获取fast bin的索引
        unsigned int idx = fastbin_index(size);
        // 获取对应fastbin的头指针，被初始化后为NULL。
        fb               = &fastbin(av, idx);

        /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
        // 使用原子操作将P插入到链表中
        mchunkptr    old     = *fb, old2;
        unsigned int old_idx = ~0u;
        do {
            /* Check that the top of the bin is not the record we are going to
               add
               (i.e., double free).  */
            // so we can not double free one fastbin chunk
            // 防止对 fast bin double free
            if (__builtin_expect(old == p, 0)) {
                errstr = "double free or corruption (fasttop)";
                goto errout;
            }
            /* Check that size of fastbin chunk at the top is the same as
               size of the chunk that we are adding.  We can dereference OLD
               only if we have the lock, otherwise it might have already been
               deallocated.  See use of OLD_IDX below for the actual check.  */
            if (have_lock && old != NULL)
                old_idx = fastbin_index(chunksize(old));
            p->fd = old2 = old;
        } while ((old = catomic_compare_and_exchange_val_rel(fb, p, old2)) !=
                 old2);
        // 确保fast bin的加入前与加入后相同
        if (have_lock && old != NULL && __builtin_expect(old_idx != idx, 0)) {
            errstr = "invalid fastbin entry (free)";
            goto errout;
        }
    }

`合并非mmap的空闲chunk`

只有不是fast bin的情况下才会触发unlink

首先我们说一下为什么会合并chunk，这是为了避免heap中有太多零零碎碎的内存块，合并之后可以用来应对更大的内存块请求。合并的主要顺序为

先考虑物理低地址空闲块
后考虑物理高地址空闲块

合并后的chunk指向合并的chunk的低地址。

在没有锁的情况下，先获得锁。

/*
  Consolidate other non-mmapped chunks as they arrive.
*/

else if (!chunk_is_mmapped(p)) {
    if (!have_lock) {
        __libc_lock_lock(av->mutex);
        locked = 1;
    }
    nextchunk = chunk_at_offset(p, size);

轻量级的检测

/* Lightweight tests: check whether the block is already the
   top block.  */
// 当前free的chunk不能是top chunk
if (__glibc_unlikely(p == av->top)) {
    errstr = "double free or corruption (top)";
    goto errout;
}
// 当前free的chunk的下一个chunk不能超过arena的边界
/* Or whether the next chunk is beyond the boundaries of the arena.  */
if (__builtin_expect(contiguous(av) &&
                         (char *) nextchunk >=
                             ((char *) av->top + chunksize(av->top)),
                     0)) {
    errstr = "double free or corruption (out)";
    goto errout;
}
// 当前要free的chunk的使用标记没有被标记，double free
/* Or whether the block is actually not marked used.  */
if (__glibc_unlikely(!prev_inuse(nextchunk))) {
    errstr = "double free or corruption (!prev)";
    goto errout;
}
// 下一个chunk的大小
nextsize = chunksize(nextchunk);
// next chunk size valid check
// 判断下一个chunk的大小是否不大于2*SIZE_SZ，或者
// nextsize是否大于系统可提供的内存
if (__builtin_expect(chunksize_nomask(nextchunk) <= 2 * SIZE_SZ, 0) ||
    __builtin_expect(nextsize >= av->system_mem, 0)) {
    errstr = "free(): invalid next size (normal)";
    goto errout;
}

释放填充

1 2	//将指针的mem部分全部设置为perturb_byte free_perturb(chunk2mem(p), size - 2 * SIZE_SZ);

后向合并-合并低地址（chunk）

/* consolidate backward */
if (!prev_inuse(p)) {
    prevsize = prev_size(p);
    size += prevsize;
    p = chunk_at_offset(p, -((long) prevsize));
    unlink(av, p, bck, fwd);
}

下一块不是top chunk-前向合并-合并高地址chunk

需要注意的是，如果下一块不是top chunk，则合并高地址的chunk,并将合并后的chunk放到unsorted bin中。

// 如果下一个chunk不是top chunk
if (nextchunk != av->top) {
    /* get and clear inuse bit */
    // 获取下一个 chunk 的使用状态
    nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
    // 如果不在使用，合并，否则清空当前chunk的使用状态。
    /* consolidate forward */
    if (!nextinuse) {
        unlink(av, nextchunk, bck, fwd);
        size += nextsize;
    } else
        clear_inuse_bit_at_offset(nextchunk, 0);

    /*
  Place the chunk in unsorted chunk list. Chunks are
  not placed into regular bins until after they have
  been given one chance to be used in malloc.
    */
    // 把 chunk 放在 unsorted chunk 链表的头部
    bck = unsorted_chunks(av);
    fwd = bck->fd;
    // 简单的检查
    if (__glibc_unlikely(fwd->bk != bck)) {
        errstr = "free(): corrupted unsorted chunks";
        goto errout;
    }
    p->fd = fwd;
    p->bk = bck;
    // 如果是 large chunk，那就设置nextsize指针字段为NULL。
    if (!in_smallbin_range(size)) {
        p->fd_nextsize = NULL;
        p->bk_nextsize = NULL;
    }
    bck->fd = p;
    fwd->bk = p;

    set_head(p, size | PREV_INUSE);
    set_foot(p, size);

    check_free_chunk(av, p);
}

下一块是top chunk-合并到top chunk

/*
  If the chunk borders the current high end of memory,
  consolidate into top
*/
// 如果要释放的chunk的下一个chunk是top chunk，那就合并到 top chunk
else {
    size += nextsize;
    set_head(p, size | PREV_INUSE);
    av->top = p;
    check_chunk(av, p);
}

向系统返还内存

        /*
          If freeing a large space, consolidate possibly-surrounding
          chunks. Then, if the total unused topmost memory exceeds trim
          threshold, ask malloc_trim to reduce top.

          Unless max_fast is 0, we don't know if there are fastbins
          bordering top, so we cannot tell for sure whether threshold
          has been reached unless fastbins are consolidated.  But we
          don't want to consolidate on each free.  As a compromise,
          consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
          is reached.
        */
         // 如果合并后的 chunk 的大小大于FASTBIN_CONSOLIDATION_THRESHOLD
         // 一般合并到 top chunk 都会执行这部分代码。
         // 那就向系统返还内存
        if ((unsigned long) (size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
            // 如果有 fast chunk 就进行合并
            if (have_fastchunks(av)) malloc_consolidate(av);
            // 主分配区
            if (av == &main_arena) {
#ifndef MORECORE_CANNOT_TRIM
                // top chunk 大于当前的收缩阙值
                if ((unsigned long) (chunksize(av->top)) >=
                    (unsigned long) (mp_.trim_threshold))
                    systrim(mp_.top_pad, av);
#endif      // 非主分配区，则直接收缩heap
            } else {
                /* Always try heap_trim(), even if the top chunk is not
                   large, because the corresponding heap might go away.  */
                heap_info *heap = heap_for_ptr(top(av));

                assert(heap->ar_ptr == av);
                heap_trim(heap, mp_.top_pad);
            }
        }

        if (!have_lock) {
            assert(locked);
            __libc_lock_unlock(av->mutex);
        }

`释放mmap的chunk`

} else {
    //  If the chunk was allocated via mmap, release via munmap().
    munmap_chunk(p);
}