go map源码探索（查找、插入、删除、扩容、遍历）

2023-10-06 01:00:50

文章目录

- 概要
- 一、Go map结构
- 二、Go map初始化
- - - 2.1、不带容量初始化
    - 2.2、带容量初始化
- 三、Go map查找
- 四、Go map插入
- - - 4.1、插入源码分析
    - 4.2、溢出桶申请策略
- 五、删除源码分析
- 六、扩容与迁移源码分析
- - - 6.1、扩容条件
    - - 6.1.1、当前负载因子大于6.5
        6.1.2、有过多的溢出桶
    - 6.2、扩容
    - 6.3、迁移
- 七、遍历源码分析
- - - 7.1、mapiterinit函数
    - 7.2、mapiternext函数
- 八、总结

概要

本人在看《Go语言底层原理剖析》（郑建勋/著）这本书的【哈希表与Go语言实现机制】章节时，map的源码讲得不是很透彻（可能受限于书本章节大小吧），这里针对go map的初始化、查找、遍历、插入、删除、扩容等机制进行源码分析。

PS: go V1.19.2

一、Go map结构

// Maximum number of key/elem pairs a bucket can hold.
bucketCntBits = 3
bucketCnt     = 1 << bucketCntBits//每个桶中元素个数为8// Represent as loadFactorNum/loadFactorDen, to allow integer math.
loadFactorNum = 13
loadFactorDen = 2 //13除以2得到负载因子 6.5
// flags
iterator     = 1 // there may be an iterator using buckets
oldIterator  = 2 // there may be an iterator using oldbuckets
hashWriting  = 4 // a goroutine is writing to the map
sameSizeGrow = 8 // the current map growth is to a new map of the same size
// A header for a Go map.
type hmap struct {count     int //map中键值对个数 ，len(map)就是取的该值flags     uint8 //标志位，用于表示当前map正在写等状态B         uint8  // 2^B表示map中桶的个数noverflow uint16 // 溢出桶的个数,其只表示实时申请的溢出桶个数，不包含预申请的溢出桶。approximate number of overflow buckets; see incrnoverflow for detailshash0     uint32 // hash seedbuckets    unsafe.Pointer // 桶数组。array of 2^B Buckets. may be nil if count==0.oldbuckets unsafe.Pointer // 扩容时用，暂存旧的桶。previous bucket array of half the size, non-nil only when growingnevacuate  uintptr        // 表示扩容时迁移进度，小于该值的表示已经被迁移了。progress counter for evacuation (buckets less than this have been evacuated)extra *mapextra // optional fields
}
// mapextra holds fields that are not present on all maps.
//overflow与nextOverflow没有任何从属关系
type mapextra struct {overflow    *[]*bmap //等nextOverflow没有时，插入时又需要溢出桶，此时申请的溢出桶就放在这里，可称之为实时申请的溢出桶oldoverflow *[]*bmap //扩容时旧的溢出桶// nextOverflow holds a pointer to a free overflow bucket.nextOverflow *bmap //预申请的溢出桶，在map初始化的时候确定，与h.buckets属于同一块连续内存
}
// 桶内部结构
type bmap struct {tophash [bucketCnt]uint8//tophash的元素是取key的hash值的高8位，用于快速定位key的位置key[bucketCnt]unsafe.Pointer//连续的key,便于内存对齐以节省空间value[bucketCnt]unsafe.Pointer//连续的value,便于内存对齐以节省空间overflow    *bmap //溢出桶指针
}

对于bmap结构体中的key,value,overflow并未显式定义的，而是直接通过指针运算进行访问的
另外很多map操作函数都以maptype结构体作为第一个参数，可以看到里面包含了hash函数，key，value大小（便于通过指针运算访问）等内容

type maptype struct {typ    _type//map类型key    *_type //key类型elem   *_type //value类型bucket *_type // internal type representing a hash bucket// function for hashing keys (ptr to key, seed) -> hashhasher     func(unsafe.Pointer, uintptr) uintptr //hsah函数keysize    uint8  // size of key slotelemsize   uint8  // size of elem slotbucketsize uint16 // size of bucket，即bmap大小flags      uint32
}

Go map结构图如下：
go map宏观结构图
从上面hmap和bmap结构体来看，对于hash冲突，Go map通过拉链法（溢出桶的存在）和探测法（每个bucket包含8个键值对）来解决的。

二、Go map初始化

我们在使用make函数初始化map的时候可以分为不带容量初始化和带容量初始化。

2.1、不带容量初始化

这种情况较为简单，其调用makemap_small，如下：

func makemap_small() *hmap {h := new(hmap)h.hash0 = fastrand()//生成hash种子return h
}

可以看到只是初始化了一个hmap类型的指针，并没有申请其它内存。

2.2、带容量初始化

带容量初始化的时候如果容量类型是int64,会调用makemap64函数，如下：

func makemap64(t *maptype, hint int64, h *hmap) *hmap {if int64(int(hint)) != hint {hint = 0}return makemap(t, int(hint), h)
}

否则直接调用makemap函数，其逻辑如下：

//hint 是make是定义的map的键值对个数，比如make(map[int]bool,111),那么hint就是111
func makemap(t *maptype, hint int, h *hmap) *hmap {mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)if overflow || mem > maxAlloc {hint = 0}// initialize Hmapif h == nil {h = new(hmap)}h.hash0 = fastrand()//生成hash种子// Find the size parameter B which will hold the requested # of elements.// For hint < 0 overLoadFactor returns false since hint < bucketCnt.B := uint8(0)for overLoadFactor(hint, B) {//根据hint值计算B的大小，即桶个数B++ //如果此时超过负载因子，B就加1}h.B = B// if B == 0, the buckets field is allocated lazily later (in mapassign)// If hint is large zeroing this memory could take a while.if h.B != 0 {var nextOverflow *bmaph.buckets, nextOverflow = makeBucketArray(t, h.B, nil)//申请桶和溢出桶内存if nextOverflow != nil {h.extra = new(mapextra)h.extra.nextOverflow = nextOverflow}}return h
}

func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {base := bucketShift(b)//计算桶个数，2^Bnbuckets := base//对于map键值对个数较少时并不会申请溢出桶，避免计算开销if b >= 4 {//只有b>=4时才会按照2^(b-4来申请溢出桶个数nbuckets += bucketShift(b - 4)sz := t.bucket.size * nbucketsup := roundupsize(sz)//内存对齐if up != sz {nbuckets = up / t.bucket.size//得出内存对齐后的所有桶个数}}//申请一段连续内存（包括正常桶和溢出桶）if dirtyalloc == nil {buckets = newarray(t.bucket, int(nbuckets))} else {buckets = dirtyallocsize := t.bucket.size * nbucketsif t.bucket.ptrdata != 0 {memclrHasPointers(buckets, size)} else {memclrNoHeapPointers(buckets, size)}}//根据指针运算定位溢出桶位置，可以看到并未将预申请的溢出桶计入noverflowif base != nbuckets {nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))//找到属于溢出桶的那部分内存last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))//找到最后一个溢出桶last.setoverflow(t, (*bmap)(buckets))//设置最后一个溢出桶的溢出指针（即bmap的overflow字段）为第一个桶，这时只是为了占个位}return buckets, nextOverflow
}

这里为啥要给最后一个溢出桶bmap的字段overflow设置一个值，而其他溢出桶该值依旧是nil，没做任何设置呢？？请看后续map 扩容章节分解。

func (b *bmap) setoverflow(t *maptype, ovf *bmap) {*(**bmap)(add(unsafe.Pointer(b), uintptr(t.bucketsize)-goarch.PtrSize)) = ovf
}
func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {return unsafe.Pointer(uintptr(p) + x)
}

goarch.PtrSize就是指针大小，32位操作系统下其值就是4,64位操作系统下其值就是8

根据setoverflow和add函数可以得到：unsafe.Pointer(b)+uintptr(t.bucketsize)-goarch.PtrSize的指针运算结果正好是bmap的overflow字段的地址，所以setoverflow函数的作用是设置bmap的overflow字段的值。

ps:对于字面量初始化（即通过var a = map[int]bool={1:true}模式来初始化map）的这里不做探索。

三、Go map查找

针对var a = map[int]bool={1:true}分析，我们进行map查找有两种：

b:=a[1]。这种情况下key 1 存在就返回相应的value,不存在就返回value类型的零值，即false，对应源码函数为mapaccess1;
b,ok:=a[1]。这种情况key 1 存在b就是key对应的value，ok为true,否则b就是value类型的零值，ok为false，对应源码函数为mapaccess2;

两种种查找过模式的核心逻辑一样：

计算key的hash值；
通过B的大小和hash值来确定key在哪个桶(假设此时B=2，则桶个数b=4,所以取hash值的后4位，如果后四位是0101，0101用十进制表示为5，所以在5号桶）；
如果处于扩容后的迁移期间，定位key位于的旧桶，若旧桶没被迁移，则到该旧桶查找value;
通过2、3步定位key所在桶后，计算tophash值，即hash值的高八位；
遍历桶中元素，根据tophash值和key来找value；
如果第五步没找到，则切换到溢出桶，如此重复5,6步，找到就结束遍历，否则一直循环，直至遍历完所有溢出桶；

代码如下（以mapaccess1为例）：

func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {//省略代码...if h == nil || h.count == 0 {//这里可以看出查找未初始化的map也不会报错，直接返回value类型零值if t.hashMightPanic() {t.hasher(key, 0) }return unsafe.Pointer(&zeroVal[0])}if h.flags&hashWriting != 0 {//如果map正在写，不允许读fatal("concurrent map read and map write")}hash := t.hasher(key, uintptr(h.hash0))// 计算key的hash值m := bucketMask(h.B)//计算掩码，即2^B - 1//hash&m等价于取模运算（公式：X % 2^ n = X & (2^ n – 1)，只适用于2的次方），用于获取桶的偏移量b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))//通过if c := h.oldbuckets; c != nil {if !h.sameSizeGrow() {// There used to be half as many buckets; mask down one more power of two.m >>= 1}oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))//定位旧桶if !evacuated(oldb) {//旧桶没迁移就到该旧桶查找valueb = oldb}}top := tophash(hash)//计算tophash，取hash高八位
bucketloop:for ; b != nil; b = b.overflow(t) {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {//通过tophash加速定位key的位置if b.tophash[i] == emptyRest {//如果遇到emptyRest标志位，表示不管是正常桶，还是溢出桶，后面都没有元素了，直接跳出循环即可break bucketloop}continue}k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))//tophash相等，就通过指针运算取真实的key值if t.indirectkey() {k = *((*unsafe.Pointer)(k))}if t.key.equal(key, k) {//tophash相等只能说明key hash值的高八位一样，还需要对比此key值与要找的key是否相等e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))if t.indirectelem() {e = *((*unsafe.Pointer)(e))}return e}}}return unsafe.Pointer(&zeroVal[0])
}

取hash的掩码

func bucketShift(b uint8) uintptr {//计算当前桶个数，可以看出桶个数（不包括溢出桶）在64为系统下最多是2^63个，32位系统下最多是2^31个// Masking the shift amount allows overflow checks to be elided.return uintptr(1) << (b & (goarch.PtrSize*8 - 1))
}
// bucketMask returns 1<
func bucketMask(b uint8) uintptr {return bucketShift(b) - 1
}

四、Go map插入

插入逻辑步骤如下：

计算key的hash值；
通过B的大小和hash值来确定key在哪个桶(假设此时B=2，则桶个数b=4,所以取hash值的后4位，如果后四位是0101，0101用十进制表示为5，所以在5号桶）；
如果处于扩容后的迁移期间，就先把key所映射的旧桶及其溢出桶的内容迁移到新的桶中;
遍历新桶，根据tophash值和key来定位key是否存在；
存在就直接结束，并返回value的地址，用于更新value；
不存在就找一个空位，用于插入key和value；
再判定是否需要扩容，需要的话就先扩容，之后跳到第2步；
不需要扩容就设置key及其tophash,并返回value地址，用户设置value的值。

4.1、插入源码分析

向map插入键值对核心逻辑就是找key是否存在，存在就更新value值，不存在就插入该键值对，下面一起来看看源码吧。

看代码的时候大家不妨带着这样的疑问：每个桶大小是8，负载因子是6.5，那为什么还需要判定溢出桶过多呢？按理说每个桶至多达到7个就会扩容了呀，哪里会导致溢出桶过多呢？

先说下答案，都是因为扩容后的迁移采用的是渐进式策略,那么在没将旧桶迁移完的时候就不能进行再次扩容，这就导致会出现一个桶必须要有溢出桶兜底，举个极端的例子，扩容后，插入的成百上千key都被hash到某桶A,那A桶后面必然出现很多溢出桶，这种情况就需要判定溢出桶过多。正所谓鱼与熊掌不可兼得也。

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {if h == nil {//向未初始化的map中插入数据会panicpanic(plainError("assignment to entry in nil map"))}//省略代码...if h.flags&hashWriting != 0 {//如果map正在写，不允许插入，否则报错fatal("concurrent map writes")}hash := t.hasher(key, uintptr(h.hash0))// 计算key的hash值h.flags ^= hashWriting//设置map状态标志位为正在写if h.buckets == nil {//如果当前map的桶未初始化，就申请内存，针对在初始还map时未声明容量大小或容量大小为0的情况h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)} again:bucket := hash & bucketMask(h.B)//没啥好说的，在查找章节说过了，计算key所在桶的内存偏移量if h.growing() {//如果map正在扩容growWork(t, h, bucket)//先把key所映射的旧桶内容迁移到新的桶中，详情待后续map 扩容章节讲解}b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))//根据偏移量得到桶内容top := tophash(hash)//计算tophashvar inserti *uint8 var insertk unsafe.Pointervar elem unsafe.Pointer//key对应value的指针，会赋给该变量 bucketloop:for {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {//tophash值不等，说明key不存在if isEmpty(b.tophash[i]) && inserti == nil {//如果当前tophash位置是空的inserti = &b.tophash[i] //取当前tophash地址赋给inserti变量insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))//通过指针运算取对应key地址赋给insertK变量elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))//取对应value地址赋给elem变量}if b.tophash[i] == emptyRest {//如果遇到emptyRest标志位，表示不管是正常桶，还是溢出桶，后面都没有元素了，直接跳出循环即可break bucketloop}continue}//tophash值相等，说明key可能存在k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))if t.indirectkey() {k = *((*unsafe.Pointer)(k))}if !t.key.equal(key, k) {//对比此key值与要插入的key是否相等，不相等继续循环continue}// already have a mapping for key. Update it.if t.needkeyupdate() {typedmemmove(t.key, k, key)}//key相等的话就将value的地址赋给elem并go to don以退出函数elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))goto done}//没找到的话就到溢出桶里找ovf := b.overflow(t)if ovf == nil {//溢出桶存在就找break}b = ovf}//如果当前map不在进行扩容迁移，判断当前负载是否大于负载因子6.5，或者溢出桶数量是否过多//任何一个满足就进行扩容，注意，此时只是扩容，并不迁移，Go map为了避免一次迁移所有数据太耗资源，采用渐进式迁移策略。if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {hashGrow(t, h)goto again //重新跑一遍again 域内的逻辑}//已有的桶都满了，这个新的key没位置了，就申请一个溢出桶if inserti == nil {// The current bucket and all the overflow buckets connected to it are full, allocate a new one.newb := h.newoverflow(t, b)inserti = &newb.tophash[0]insertk = add(unsafe.Pointer(newb), dataOffset)elem = add(insertk, bucketCnt*uintptr(t.keysize))}//把新的key,value存储到正当的位置if t.indirectkey() {//申请key的内存kmem := newobject(t.key)*(*unsafe.Pointer)(insertk) = kmeminsertk = kmem}if t.indirectelem() {申请value的内存vmem := newobject(t.elem)*(*unsafe.Pointer)(elem) = vmem}typedmemmove(t.key, insertk, key)//存储key*inserti = top//存储tophashh.count++ //不存在的key就计数+1 //看到这里的同学可能会有疑问，value啥时候存储呢，可以看到最终把value地址elem返回 //在调用完本函数后会将真实的value赋给elem,本函数并不会设置value的值 done:if h.flags&hashWriting == 0 {fatal("concurrent map writes")}h.flags &^= hashWritingif t.indirectelem() {elem = *((*unsafe.Pointer)(elem))}return elem }

4.2、溢出桶申请策略

通过阅读mapassign函数源码，我们知道h.newoverflow函数会做溢出桶申请的工作。

为什么ovf.overflow(t) == nil 的判定为false就知道该桶是预申请的溢出桶的最后一个呢？
细心的同学应该还记得2.2小节的makeBucketArray函数，关键代码如下：

if base != nbuckets {nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))//找到属于溢出桶的那部分内存last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))//找到最后一个溢出桶last.setoverflow(t, (*bmap)(buckets))//设置最后一个溢出桶的溢出指针（即bmap的overflow字段）为第一个桶，这时只是为了占个位}

可以看到当初始化map的时候，如果需要预申请溢出桶，就会特地找到最后一个溢出桶，并将其溢出桶指针设置为buckets，即非空。

func (h *hmap) newoverflow(t *maptype, b *bmap) *bmap {var ovf *bmapif h.extra != nil && h.extra.nextOverflow != nil {//使用预申请的溢出桶，详见2.2节makeBucketArray函数// We have preallocated overflow buckets available，See makeBucketArray for more details.ovf = h.extra.nextOverflow//预申请的溢出桶if ovf.overflow(t) == nil {//判断ovf的溢出桶指针是否为nil，不为nil的话表示则是nextOverflow最后一个桶了// We're not at the end of the preallocated overflow buckets. Bump the pointer.h.extra.nextOverflow = (*bmap)(add(unsafe.Pointer(ovf), uintptr(t.bucketsize)))//ovf不是最后一个桶的话就将h.extra.nextOverflow后移} else {// This is the last preallocated overflow bucket， Reset the overflow pointer on this bucket,//which was set to a non-nil sentinel value.ovf.setoverflow(t, nil)//最后一个桶就把其溢出桶指针重置为nilh.extra.nextOverflow = nil//设为nil，表示预申请的溢出桶耗尽，后续就要申请内存了}} else {ovf = (*bmap)(newobject(t.bucket))//申请一个溢出桶}h.incrnoverflow()//溢出桶计数if t.bucket.ptrdata == 0 {h.createOverflow()//初始化溢出桶管理*h.extra.overflow = append(*h.extra.overflow, ovf)//放到溢出桶数组中}b.setoverflow(t, ovf)return ovf }// To keep hmap small, noverflow is a uint16. // When there are few buckets, noverflow is an exact count. // When there are many buckets, noverflow is an approximate count. //溢出桶计数，可以看到当溢出桶个数较小的时候其值是准确的，过大就只是一个近似值 func (h *hmap) incrnoverflow() {if h.B < 16 {h.noverflow++return}mask := uint32(1)<<(h.B-15) - 1// Example: if h.B == 18, then mask == 7, and fastrand & 7 == 0 with probability 1/8.if fastrand()&mask == 0 {h.noverflow++} }

所以可以看出，对于溢出桶的管理分两部分：

h.extra.nextOverflow是预申请的溢出桶，其与正常桶是在同一个数组。
h.extra.overflow是当预申请的溢出桶没有的时候才会被初始化，是独立申请的。

五、删除源码分析

删除与插入差不多，只不过找到后要对相应空，找不到就直接返回，这里简单说下。

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {//省略代码...if h == nil || h.count == 0 {//向未初始化的map中插入数据会panicif t.hashMightPanic() {t.hasher(key, 0) // see issue 23734}return}if h.flags&hashWriting != 0 {//如果map正在写，不允许删除，否则panicfatal("concurrent map writes")}hash := t.hasher(key, uintptr(h.hash0))// 计算key的hash值h.flags ^= hashWriting//设置map状态标志位为正在写bucket := hash & bucketMask(h.B)// 计算key所在桶的内存偏移量if h.growing() {//如果map正在扩容growWork(t, h, bucket)//先把key所映射的旧桶内容迁移到新的桶中，详情待后续map 扩容章节讲解}b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))bOrig := btop := tophash(hash)//计算tophash值，取hash高8位 search:for ; b != nil; b = b.overflow(t) {//循环遍历正常桶和溢出桶for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {if b.tophash[i] == emptyRest {//如果遇到emptyRest标志位还没找到，直接跳出循环即可break search}continue //tophash值不相等就继续循环}k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))k2 := kif t.indirectkey() {k2 = *((*unsafe.Pointer)(k2))}if !t.key.equal(key, k2) {//此key不等于要删除的key，就继续找continue}//找到了，清除keyif t.indirectkey() {*(*unsafe.Pointer)(k) = nil} else if t.key.ptrdata != 0 {memclrHasPointers(k, t.key.size)}//清除valuee := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))if t.indirectelem() {*(*unsafe.Pointer)(e) = nil} else if t.elem.ptrdata != 0 {memclrHasPointers(e, t.elem.size)} else {memclrNoHeapPointers(e, t.elem.size)}b.tophash[i] = emptyOne//对应tophash的值设置空标志位// If the bucket now ends in a bunch of emptyOne states,// change those to emptyRest states.// It would be nice to make this a separate function, but// for loops are not currently inlineable.if i == bucketCnt-1 {if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {//如果当前桶的溢出桶非空桶，表示后面还有键值对goto notLast}} else {if b.tophash[i+1] != emptyRest {goto notLast}}for {//设置tophash标志位为emptyRest,表示此index之后(包括溢出桶)不存在键值对了，减少查找时的无效遍历b.tophash[i] = emptyRest//省略代码...}notLast:h.count--// Reset the hash seed to make it more difficult for attackers to// repeatedly trigger hash collisions. See issue 25237.if h.count == 0 {h.hash0 = fastrand()//重置hash种子}break search}}if h.flags&hashWriting == 0 {fatal("concurrent map writes")}h.flags &^= hashWriting }

六、扩容与迁移源码分析

6.1、扩容条件

我们在4.1节看到扩容有两个条件：
1）当前负载因子大于6.5；
2）有过多的溢出桶

6.1.1、当前负载因子大于6.5

func overLoadFactor(count int, B uint8) bool {return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen) }

逻辑简单，就是当前map键值对个数count除以桶个数2^B > 6.5
当前负载过小说明空间利用率低，负载过大说明冲突严重，存取效率低。

6.1.2、有过多的溢出桶

func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {if B > 15 {B = 15}// The compiler doesn't see here that B < 16; mask B to generate shorter shift code.return noverflow >= uint16(1)<<(B&15) }

当B<16时，溢出桶个数大于正常桶个数就属于有过多溢出桶；
当B>16时，溢出桶个数大于2^15就属于有过多溢出桶；

6.2、扩容

扩容发生在插入时，如果负载因子大于6.5则进行增量扩容，否则是等量扩容，因为等量扩容时桶个数不变，所以key在新桶的索引也不变
等量扩容也叫伪缩容，它只是把减少溢出桶的数量，并不会减少正常桶数量
扩容逻辑简单，重新申请一块内存，旧的桶暂时放到h.oldbuckets,h.extra.oldoverflow字段上。

func hashGrow(t *maptype, h *hmap) {// If we've hit the load factor, get bigger.// Otherwise, there are too many overflow buckets,// so keep the same number of buckets and "grow" laterally.bigger := uint8(1)if !overLoadFactor(h.count+1, h.B) {//如果不满足负载因子大于6.5，但满足有过多溢出桶，说明键值对存储不够紧凑，浪费了大量空间bigger = 0h.flags |= sameSizeGrow//等量扩容}oldbuckets := h.bucketsnewbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)//重新申请内存，不管是增量扩容还是等量扩容flags := h.flags &^ (iterator | oldIterator)if h.flags&iterator != 0 {flags |= oldIterator}// commit the grow (atomic wrt gc)h.B += biggerh.flags = flagsh.oldbuckets = oldbuckets//将旧的桶转移h.buckets = newbuckets//设置新的桶h.nevacuate = 0h.noverflow = 0//if h.extra != nil && h.extra.overflow != nil {// Promote current overflow buckets to the old generation.if h.extra.oldoverflow != nil {throw("oldoverflow is not nil")}h.extra.oldoverflow = h.extra.overflow //将旧的溢出桶转移h.extra.overflow = nil}if nextOverflow != nil {if h.extra == nil {h.extra = new(mapextra)}h.extra.nextOverflow = nextOverflow//设置新的预申请的溢出桶} }

6.3、迁移

迁移发生在插入和删除时，并且每次最多迁移2个bucket,一个是插入删除key所映射的旧桶，另一个是按照迁移进度排到的旧桶，这样就能保证一定能迁移完，如果只迁移插入删除key所映射的旧桶，虽说严格遵循了copy on write,但可能会导致很久很久也迁不完。
迁移插入删除key所映射的旧桶，不表示旧桶中一定包含该key

func growWork(t *maptype, h *hmap, bucket uintptr) {// make sure we evacuate the oldbucket corresponding to the bucket we're about to useevacuate(t, h, bucket&h.oldbucketmask())//优先迁移插入删除key所映射的旧桶// evacuate one more oldbucket to make progress on growingif h.growing() {evacuate(t, h, h.nevacuate)//按正常的迁移进度迁} } func evacuate(t *maptype, h *hmap, oldbucket uintptr) {b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))newbit := h.noldbuckets()//旧桶的个数if !evacuated(b) {//判定下当前桶是否迁移过了，已经迁移的不操作了，否则进行迁移操作//迁移旧桶及其溢出桶代码 ...}if oldbucket == h.nevacuate {advanceEvacuationMark(h, t, newbit)//更新迁移进度} } func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {h.nevacuate++stop := h.nevacuate + 1024if stop > newbit {stop = newbit}for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {//如果下一个旧桶也迁移完了，就令h.nevacuate++h.nevacuate++}if h.nevacuate == newbit { // 表示迁移完了，将h.oldbuckets与h.extra.oldoverflow置空h.oldbuckets = nilif h.extra != nil {h.extra.oldoverflow = nil}h.flags &^= sameSizeGrow} }

七、遍历源码分析

map的迭代是通过hiter结构以及两个辅助函数（mapiterinit和mapiternext）实现的，hiter结构由编译器在调用辅助函数之前创建并传入，每次迭代结果也由hiter结构传回。
hiter结构结构如下：

type hiter struct {key unsafe.Pointer // Must be in first position. Write nil to indicate iteration end (see cmd/compile/internal/walk/range.go).elem unsafe.Pointer // Must be in second position (see cmd/compile/internal/walk/range.go).t *maptypeh *hmapbuckets unsafe.Pointer // bucket ptr at hash_iter initialization timebptr *bmap // current bucket，正在遍历的桶overflow *[]*bmap // keeps overflow buckets of hmap.buckets aliveoldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alivestartBucket uintptr // bucket iteration started atoffset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)wrapped bool // already wrapped around from end of bucket array to beginningB uint8 //遍历初始化时桶个数i uint8 //已经遍历的键值对数量，i初始为0，当i=8时表示这个桶遍历完了，将it.bptr移向下一个桶bucket uintptrcheckBucket uintptr }

Go map的这种遍历机制会出现遍历中途插入的key可能遍历得到也可能遍历不到

7.1、mapiterinit函数

其主要工作是保存当前map快照，并决定从哪个桶开始遍历，所以每次遍历的顺序是不一样的。

func mapiterinit(t *maptype, h *hmap, it *hiter) {//省略代码...it.t = tif h == nil || h.count == 0 {return}if unsafe.Sizeof(hiter{})/goarch.PtrSize != 12 {throw("hash_iter size incorrect") // see cmd/compile/internal/reflectdata/reflect.go}it.h = h//保存一下map快照it.B = h.Bit.buckets = h.bucketsif t.bucket.ptrdata == 0 {h.createOverflow()it.overflow = h.extra.overflowit.oldoverflow = h.extra.oldoverflow}// 生成一个随机种子，决定从哪个桶开始迭代，这也是为什么每次map遍历都不一样的原因var r uintptrif h.B > 31-bucketCntBits {r = uintptr(fastrand64())} else {r = uintptr(fastrand())}it.startBucket = r & bucketMask(h.B)//这个是桶数组的偏移量，表示遍历从哪个桶开始it.offset = uint8(r >> h.B & (bucketCnt - 1))//桶内的偏移量，表示每个桶的遍历都从这个偏移量开始// iterator stateit.bucket = it.startBucket//省略代码...mapiternext(it) }

7.2、mapiternext函数

func mapiternext(it *hiter) {h := it.hif h.flags&hashWriting != 0 {fatal("concurrent map iteration and map write")}t := it.tbucket := it.bucketb := it.bptri := it.icheckBucket := it.checkBucket next:if b == nil {if bucket == it.startBucket && it.wrapped {// end of iterationit.key = nilit.elem = nilreturn}if h.growing() && it.B == h.B {//如果正在扩容迁移，要么遍历开始于扩容后，要么是等量扩容，才符合要求oldbucket := bucket & it.h.oldbucketmask()b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.bucketsize)))if !evacuated(b) {checkBucket = bucket} else {//旧桶已经迁移的话就到新桶里面去b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))checkBucket = noCheck}} else {b = (*bmap)(add(it.buckets, bucket*uintptr(t.bucketsize)))checkBucket = noCheck}bucket++if bucket == bucketShift(it.B) {bucket = 0it.wrapped = true}i = 0}//代码省略...}b = b.overflow(t)i = 0goto next }

八、总结

本文针对map的初始化，查找，插入，删除，扩容迁移，遍历等操作做了源码探索，总结以下几个特征：

hash冲突采用拉链法与线性探测法；
遍历是无序的，每次遍历顺序都不同；
扩容时渐进式的，迁移发生在插入和删除的时候，每次至多迁移2个桶，首先迁移插入删除key所映射的旧桶，再按迁移进度迁移另一个桶；
map的key是无法取地址的，因为扩容会导致key地址发生变更；
map不支持并发读-写，写-写。

时间复杂度分析：

正常情况，且不考虑扩容状态，复杂度O(1)：通过hash值定位桶是O(1)，一个桶最多8个元素，合理的hash算法应该能把元素相对均匀散列，所以溢出链表（如果有）也不会太长，所以虽然在桶和溢出链表上定位key是遍历，考虑到数量小也可以认为是O(1)
正常情况，处于扩容状态时，复杂度也是O(1)：相比于上一种状态，扩容会增加搬迁最多2个桶和溢出链表的时间消耗，当溢出链表不太长时，复杂度也可以认为是O(1)
极端情况，散列极不均匀，大部分数据被集中在一条散列链表上，复杂度退化为O(n)。
go采用的hash算法应是很成熟的算法，极端情况暂不考虑。所以综合情况下go map的时间复杂度应为O(1)

空间复杂度分析：

首先我们不考虑因删除大量元素导致的空间浪费情况（这种情况现在go是留给程序员自己解决），只考虑一个持续增长状态的map的一个空间使用率：
由于溢出桶数量超过hash桶数量时会触发缩容，所以最坏的情况是数据被集中在一条链上，hash表基本是空的，这时空间浪费O(n)。
最好的情况下，数据均匀散列在hash表上，没有元素溢出，这时最好的空间复杂度就是扩散因子决定了，当前go的扩散因子由全局变量决定，即loadFactorNum/loadFactorDen = 6.5。即平均每个hash桶被分配到6.5个元素以上时，开始扩容。所以最小的空间浪费是(8-6.5)/8 = 0.1875，即O(0.1875n)

本文来自互联网用户投稿，文章观点仅代表作者本人，不代表本站立场，不承担相关法律责任。如若转载，请注明出处。如若内容造成侵权/违法违规/事实不符，请点击【内容举报】进行投诉反馈！

收藏

标签：技术

上一篇 > VSCode 完全卸载

下一篇 > Golang程序结构概述

相关文章

Duilib中list控件支持ctrl和shif多行选中的实现

[ICML2015]Batch Normalization:Accelerating Deep Network Training by Reducing Internal Covariate Shif

win10系统微软输入法于eclipse ctrl+shif+f冲突间接处理办法

Codeforces Round #259 (Div. 2) B. Little Pony and Sort by Shif

读LDD3，内存映射与DMA--PAGE_SHIF…

VMware虚拟机安装XP【要先分区，再设置BOOT 启动CD，shif+上移】

更换iBus五笔的左与右Shif

sublime ctrl+shif+f 没用解决办法

idea 对 ctrl + z 的撤销是 ctrl + shif + z

计算机最早的设计师应用于,计算机应用基础选择题doc.doc

win10自带截图神器：Win+Shift+S

Python基础之文件目录操作

python简述目录_Python基础之文件目录操作(示例代码)

tp5 如何做数据采集

任务2-7(服务器字体+阿里巴巴矢量库)

html标签（1)：h1~h6,p,br,pre,hr

TI 电量计介绍与芯片选型指南

几款TI电源芯片简介

TI DSP芯片C2000系列读取FLASH数据

德州仪器(Ti)平台嵌入式开发基础

TI三相电机智能栅极驱动芯片特点分类

省选模拟（12.08） T3 圈圈圈圈圈圈圈圈

Hadoop生态圈技术栈（上）

大数据开发基础入门与项目实战（三）Hadoop核心及生态圈技术栈之6.Impala交互式查询

小猿圈之Linux下Mysql 操作命令

大数据Hadoop生态圈常用面试题

大数据开发基础入门与项目实战（三）Hadoop核心及生态圈技术栈之4.Hive DDL、DQL和数据操作

备战Noip2018模拟赛11（B组）T3 Monogatari 物语

【智能优化算法-圆圈搜索算法】基于圆圈搜索算法Circle Search Algorithm求解单目标优化问题附matlab代码

NYOJ 78 圈水池

递归问题跑道汽车绕圈问题 Python实现

Hadoop生态圈（三）：MapReduce