HashMap源码解析

2023-02-22 85

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

全局流量管理 GTM，标准版 1个月

公共DNS（含HTTPDNS解析），每月1000万次HTTP解析

云解析 DNS，旗舰版 1个月

简介： 一直没有看HashMap源码上的Javadoc，写的是非常详细，首先翻译一波，第一部分是大概的描述，内容如下：HashMap是基于哈希表的Map接口实现，改实现提供所有可选的map操作，并且允许key为空以及value为空。HashMap和HashTable大致相等，只是HashMap是线程不安全的，而Hashtable是线程安全的，且Hashtable不允许空key和空value。、

一、简介

HashMap是基于哈希表的Map接口实现，改实现提供所有可选的map操作，并且允许key为空以及value为空。HashMap和HashTable大致相等，只是HashMap是线程不安全的，而Hashtable是线程安全的，且Hashtable不允许空key和空value。

影响HashMap性能的两个参数：初始容量(initial capacity)以及负载因子(load factor)，容量我们都知道，就是哈希表中的桶数量，初始容量就是哈希表创建时的初始桶数。负载因子就是描述哈希表自动扩容前的饱和程度，当哈希表中的桶数大于当前容量和负载因子的乘积，hash表会进行rehash(也就是内部数据结构会重建)，hash表的容量会扩容为大概原先的两倍。

通常，默认负载因子(0.75)在时间和空间消耗上刚好折中，更大的值虽然可以减少空间开辟但也会增加查询消耗，如果HashMap中要存储很多元素，创建时指定足够大的初始容量会比自动hash扩容更高效。(注意：hashCode值相同的key会降低hash表的性能)

HashMap是线程不安全的，如果多个线程并发访问，且至少一个线程修改了map的结构，那么必须外部保证线程安全(也就是得我们手动控制)，这里还提出了一个结构化修改，结构化修改就是新增和删除操作，而不是改变key的值。

第二部分是别人写实现时做的一个笔记，内容如下：

由于HashMap是由桶组成的哈希表，当桶中越来越大时，桶中的元素会转换成TreeNodes(红黑树)，和TreeMap相似。桶中的树形节点会进行旋转，这样当节点个数过多时会加快元素检索。

树形结构的桶(元素都是树形节点)主要通过hashCode值排序，但如果hashCode值也相同，那么如果两个元素实现了Comparable接口，此时会根据compareTo方法进行排序。虽然树形结构的桶增加了空间复杂度，但在时间复杂度上有了提升，最差为O(log n)。

当桶中元素的数量达到TREEIFY_THRESHOLD时(默认值为8)，链表会转换为红黑树，而当桶中的元素小于或等于6时，红黑树又会转换为链表。

二、源码解读

1、HashMap中桶(bucket)的数据结构

HashMap中桶的数据结构结构主要有两种，一种是链表，如下：

static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }
        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }
        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }
        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

还有几个常量，我们看一下：

// 默认初始化容量为16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// 最大容量为2的30次方
static final int MAXIMUM_CAPACITY = 1 << 30;
// 默认负载因子为0.75
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// 转化为红黑树的阀值，当链表长度为8时会转换为红黑树
static final int TREEIFY_THRESHOLD = 8;

2、调用put方法的操作流程

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
}

static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

看图说话，假如HashMap的容量设为16，如果高16位不进行右移，只要两个key的hash值后4位相同，那么 (当前容量-1) & hash运算后得到的下标都是一样的，这样就会发生hash冲突。而高16位右移后与低16位进行异或运算后(扰动)能使后4位的值发生变化，从而减少hash冲突。下面我们具体看看putVal()方法：

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // 表为空则进行初始化，默认容量为16
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // (n-1) & hash其实就是对长度进行取模
        if ((p = tab[i = (n - 1) & hash]) == null)
            // 当前位置没有值则创建一个新节点
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            // 如果当前下标有元素且hash值和key相同，则是同一个元素
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // 如果当前下标元素为红黑树节点
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // 否则当前下标元素为链表的头结点
            else {
                // 对链表进行遍历
                for (int binCount = 0; ; ++binCount) {
                    // 遍历到链表尾部
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        // 链表长度为8转换为红黑树
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            // 根据hash值进行排序
                            treeifyBin(tab, hash);
                        break;
                    }
                    // 如果链表上的某个元素hash值与预放置元素的hash值和key相同，则为同一个元素
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // 大于阀值(当前容量*负载因子)则进行扩容
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
  }

扩容会hash表中的元素会重新进行哈希，下面我们看看resize()方法：

  /**
   * 如果hash表为空，则分配初始容量，否则扩容为原来的两倍，桶中的元素要么还是在原来的位置，
   * 要么移动到(原来位置+原hash表容量)的位置    
   */
  final Node<K, V>[] resize() {
    Node<K, V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
      if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
      } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // double
                                                                                                                        // threshold
    } else if (oldThr > 0) // initial capacity was placed in threshold
      newCap = oldThr;
    else { // zero initial threshold signifies using defaults
      newCap = DEFAULT_INITIAL_CAPACITY;
      newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
      float ft = (float) newCap * loadFactor;
      newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ? (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({ "rawtypes", "unchecked" })
    Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
    table = newTab;
    if (oldTab != null) {
      for (int j = 0; j < oldCap; ++j) {
        Node<K, V> e;
        if ((e = oldTab[j]) != null) {
          // 原hash表当前位置元素赋空值，使被引用的对象失去引用，等待垃圾回收
          oldTab[j] = null;
          // 如果改位置只有一个元素
          if (e.next == null)
            //重新hash
            newTab[e.hash & (newCap - 1)] = e;
          else if (e instanceof TreeNode)
            ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
          // 如果该位置存储的是链表
          else { // preserve order
            Node<K, V> loHead = null, loTail = null;
            Node<K, V> hiHead = null, hiTail = null;
            Node<K, V> next;
            // 遍历链表
            do {
              next = e.next;
              // 同一个链表中哈希值与原容量与运算等于0的放原位置，否则放(当前下标+oldCap)位置 
              if ((e.hash & oldCap) == 0) {
                if (loTail == null)
                  loHead = e;
                else loTail.next = e;
                loTail = e;
              } else {
                if (hiTail == null)
                  hiHead = e;
                else hiTail.next = e;
                hiTail = e;
              }
            } while ((e = next) != null);
            // (hash & oldCap)等于0的元素组成的链表选第一个元素为头节点
            if (loTail != null) {
              loTail.next = null;
              newTab[j] = loHead;
            }
            // (hash & oldCap)不等于0的元素组成的链表选第一个元素为头节点
            if (hiTail != null) {
              hiTail.next = null;
              newTab[j + oldCap] = hiHead;
            }
          }
        }
      }
    }
    return newTab;
  }

3、调用get(Object key)方法的操作流程

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
}

// 根据hash值和key获取节点，hash值就是我们前面看到的(hash>>>16) ^ hash
final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // 判断是红黑树还是链表
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // 遍历链表，如果hash值和key值都相同则找到了节点
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
}

4、调用remove(Object key)方法的操作流程

public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
}

final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            // 似曾相识的代码，找到待移除的key
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                // 找到红黑树中待移除的节点
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                // 遍历链表
                else {
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                // 如果该位置是普通元素，赋空值,node.next == null
                else if (node == p)
                    tab[index] = node.next;
                // 如果是链表则将当前下标元素的后继指针指向待删除元素的后继指针
                else
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }

HashMap源码解析

一、简介

二、源码解读

1、HashMap中桶(bucket)的数据结构

2、调用put方法的操作流程

3、调用get(Object key)方法的操作流程

4、调用remove(Object key)方法的操作流程

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

HashMap源码解析

一、简介

二、源码解读

1、HashMap中桶(bucket)的数据结构

2、调用put方法的操作流程

3、调用get(Object key)方法的操作流程

4、调用remove(Object key)方法的操作流程

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像