An In-Depth Look at Java HashMap: Source Code Analysis, Thread Safety, and Best Practices - Alibaba Cloud Developer Community


2024-08-16



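HashMap is one of the most commonly used data structures in Java and plays a central role in everyday development. This article examines how HashMap works, walks through its source code, and discusses its thread-safety problems and its resizing mechanism.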

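I. Basic Concepts of HashMap

HashMap is a class in the Java Collections Framework that implements the Map interface on top of a hash table. It stores key-value pairs and retrieves a value quickly by its key. HashMap permits one null key and any number of null values, and it makes no guarantee about iteration order.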

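II. How HashMap Works

HashMap uses a hash function to map each key to a slot in an array of buckets, which makes lookups fast. Basic operations such as put and get run in O(1) time on average, assuming the keys' hash codes are well distributed.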

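1. Hash Function

HashMap calls the key's hashCode() method, XORs the upper 16 bits of the result into the lower 16 bits, and then maps the hash to an array index with a bit mask. Because the array length is always a power of two, (array.length - 1) & hash yields the same index as a non-negative modulo (hash mod array.length) but is cheaper to compute. A null key always hashes to 0. For example:

int h = key.hashCode();
int hash = h ^ (h >>> 16);              // spread the high bits into the low bits
int index = (array.length - 1) & hash;  // equivalent to a modulo for power-of-two lengths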
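
To make the index calculation concrete, here is a minimal sketch that reproduces the two steps outside of HashMap. The class HashIndexDemo and its methods spread and indexFor are illustrative names, not part of the JDK; the sketch assumes a table length that is a power of two.

```java
public class HashIndexDemo {
    // Illustrative helper: XOR the high 16 bits of hashCode() into the low 16 bits.
    // A null key hashes to 0.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Illustrative helper: bucket index for a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int tableLength = 16;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            int hash = spread(key);
            int index = indexFor(hash, tableLength);
            // For power-of-two lengths the mask agrees with a floor modulo,
            // even when the hash is negative.
            System.out.println(key + " -> hash=" + hash + ", index=" + index
                    + ", floorMod=" + Math.floorMod(hash, tableLength));
        }
    }
}
```

The spreading step matters for small tables: the Integer key 65536 has a hashCode whose low four bits are all zero, yet after spreading it lands in bucket 1 of a 16-slot table instead of bucket 0.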

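2. Handling Hash Collisions

A hash collision occurs when two different keys map to the same bucket, either because their hash values are equal or because the hashes differ only in bits that the index mask discards. HashMap resolves collisions with separate chaining: each bucket holds a linked list or, since Java 8, a red-black tree. When a bucket's list grows beyond 8 nodes (TREEIFY_THRESHOLD) and the table has at least 64 buckets (MIN_TREEIFY_CAPACITY), the list is converted to a red-black tree to speed up lookups; if the table is smaller than that, HashMap resizes instead.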
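
A collision is easy to provoke with two short strings. The sketch below relies on the fact that "Aa" and "BB" have the same String.hashCode(); the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Builds a map whose two keys share one hash code and therefore one bucket
    static Map<String, Integer> buildCollidingMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hashCode
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = buildCollidingMap();
        // equals() distinguishes the keys inside the shared bucket
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
        System.out.println(map.size());    // 2
    }
}
```

Both entries survive because the bucket chains them together and the lookup compares keys with equals() after matching the hash.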

III. Source Code Analysis

The following excerpts show the core of HashMap as implemented in JDK 8: the put and get methods.

1. The put Method

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Allocate the table lazily on the first insertion
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Empty bucket: place the new node directly
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        // The first node in the bucket already holds this key
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The bucket is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // The bucket is a linked list: walk it to the end
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    // Key not found: append a new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // Treeify (or resize a small table) once the list exceeds TREEIFY_THRESHOLD
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // Existing mapping for the key: replace the value and return the old one
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Grow the table once the size exceeds capacity * load factor
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

2. The get Method

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Locate the bucket; an empty table or an empty bucket means the key is absent
    if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
        // Always check the first node
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            // Tree bucket: delegate to the red-black tree lookup
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // List bucket: scan the remaining nodes
            do {
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
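
The JDK code is dense, so a stripped-down version may help when reading it. The following is a minimal sketch of a chained hash table with the same put and get flow, assuming a fixed power-of-two capacity and omitting resizing, treeification, and removal. SimpleChainedMap is a hypothetical teaching class, not the JDK implementation; unlike HashMap, it links new nodes at the head of a bucket rather than the tail.

```java
import java.util.Objects;

public class SimpleChainedMap<K, V> {
    // One entry in a bucket's singly linked list
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] table;
    private int size;

    @SuppressWarnings("unchecked")
    public SimpleChainedMap(int capacity) {
        // The index mask below only works for power-of-two lengths
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        table = (Node<K, V>[]) new Node[capacity];
    }

    // Same spreading step as HashMap.hash()
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Returns the previous value for the key, or null if there was none
    public V put(K key, V value) {
        int hash = hash(key);
        int i = (table.length - 1) & hash;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value; // existing mapping: overwrite
                return old;
            }
        }
        // No match in the bucket: link a new node at the head
        table[i] = new Node<>(hash, key, value, table[i]);
        size++;
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & hash]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleChainedMap<String, Integer> map = new SimpleChainedMap<>(4);
        map.put("Aa", 1);
        map.put("BB", 2);                    // same hash code as "Aa", same bucket
        Integer previous = map.put("Aa", 3); // overwrites and returns 1
        System.out.println(previous + " " + map.get("Aa") + " " + map.get("BB") + " " + map.size());
        // prints: 1 3 2 2
    }
}
```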

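3. Why HashMap Is Not Thread-Safe

The put and get methods above use no synchronization, so they are unsafe when several threads access the same map and at least one of them modifies it. The specific problems are as follows.

Thread-Safety Problems in put

Resizing (resize): when the map needs to grow, several threads may run resize() at the same time, each building and publishing its own new table. Entries written by one thread can be lost, and the map can be left in an inconsistent state. In JDK 7 and earlier, concurrent resizing could even link a bucket's list into a cycle and make later lookups loop forever. The ++size update is also not atomic, so the recorded size can drift from the real number of entries.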

if (++size > threshold)
    resize();

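Inserting a node (newNode): two threads can both observe the same empty bucket and then each store its own node there, so the later write silently overwrites the earlier one and an entry is lost.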

if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);

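List operations: when a collision occurs, inserting into the linked list or red-black tree is not atomic. Two threads can append to the same tail node and drop one of the entries, or leave the list or tree structure corrupted.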

for (int binCount = 0; ; ++binCount) {
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        if (binCount >= TREEIFY_THRESHOLD - 1)
            treeifyBin(tab, hash);
        break;
    }
    if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

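Thread-Safety Problems in get

Inconsistent reads: if another thread is inserting, removing, or resizing while get runs, the reader can observe a stale or partially updated table. It may miss an entry that is present or return an outdated value.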

if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) {
    if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
        return first;
    if ((e = first.next) != null) {
        if (first instanceof TreeNode)
            return ((TreeNode<K,V>)first).getTreeNode(hash, key);
        do {
            if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        } while ((e = e.next) != null);
    }
}

For these reasons, a HashMap shared between threads without external synchronization can behave unpredictably. In multithreaded code, use ConcurrentHashMap instead of HashMap.

IV. Thread-Safe Solutions

1. Using `ConcurrentHashMap`

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentHashMapExample {
    private static final ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(10);

        // Write to the ConcurrentHashMap from multiple threads concurrently
        for (int i = 0; i < 100; i++) {
            final int key = i;
            executorService.execute(() -> map.put(key, "Value" + key));
        }

        // Stop accepting new tasks and wait for the writers to finish,
        // so that the reads below see every entry
        executorService.shutdown();
        if (!executorService.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Writers did not finish in time");
        }

        // Read the data back from the ConcurrentHashMap
        for (int i = 0; i < 100; i++) {
            System.out.println("Key: " + i + ", Value: " + map.get(i));
        }
    }
}
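
Individual put and get calls on a ConcurrentHashMap are thread-safe, but a sequence such as get followed by put is still a race. The sketch below shows one way to keep a read-modify-write atomic, using the map's merge method to maintain a shared counter. The class name, the method countConcurrently, and the key "hits" are illustrative assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AtomicCounterDemo {
    // Increments one shared counter from several threads and returns the map
    static ConcurrentHashMap<String, Integer> countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // merge performs the read-modify-write atomically for this key
                    counts.merge("hits", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(8, 10_000)); // {hits=80000}
    }
}
```

Writing counts.put("hits", counts.get("hits") + 1) instead would lose updates, because another thread can change the value between the get and the put.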

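V. Choosing HashMap's Initial Settings

In practice, choosing a sensible initial capacity and load factor can improve performance by reducing the number of resizes.

1. Initial Capacity

The initial capacity is the size of the bucket array; the default is 16. HashMap allocates the array lazily on the first insertion and rounds any requested capacity up to the next power of two. Derive the initial capacity from the expected number of entries and the load factor:

int initialCapacity = (int) (expectedSize / loadFactor) + 1;

For example, with 100 expected entries and a load factor of 0.75:

int initialCapacity = (int) (100 / 0.75) + 1; // 134, which HashMap rounds up to 256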
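
The arithmetic can be checked with a small sketch. The helper names initialCapacityFor and roundUpToPowerOfTwo are hypothetical, and the rounding loop is a simplified stand-in for what HashMap does internally, not the JDK's own code.

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    // Capacity to request so that expectedSize entries fit without a resize
    static int initialCapacityFor(int expectedSize, float loadFactor) {
        return (int) (expectedSize / loadFactor) + 1;
    }

    // Smallest power of two >= cap, capped at 2^30 (HashMap's maximum capacity).
    // Illustrative loop only; the JDK uses a bit-manipulation routine.
    static int roundUpToPowerOfTwo(int cap) {
        int n = 1;
        while (n < cap && n < (1 << 30)) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int requested = initialCapacityFor(100, loadFactor);  // 134
        int actual = roundUpToPowerOfTwo(requested);          // 256
        int threshold = (int) (actual * loadFactor);          // 192
        System.out.println(requested + " " + actual + " " + threshold); // 134 256 192

        // 100 entries stay below the threshold of 192, so no resize occurs
        Map<Integer, String> map = new HashMap<>(requested, loadFactor);
        for (int i = 0; i < 100; i++) {
            map.put(i, "Value" + i);
        }
        System.out.println(map.size()); // 100
    }
}
```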

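2. Load Factor

The load factor is the maximum fill ratio the table may reach before HashMap resizes; the default is 0.75. A resize is triggered when the number of entries exceeds capacity * load factor. A smaller load factor uses space less efficiently but reduces collisions and so speeds up lookups; a larger one does the opposite. The default is appropriate in most cases.

Map<Integer, String> map = new HashMap<>(initialCapacity, 0.75f);

Choosing the initial capacity and load factor sensibly avoids frequent resizing and improves performance. When no reliable size estimate is available, the defaults are usually a good choice.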

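VI. Other Implementations in the HashMap Family

Besides HashMap, Java provides several other hash-table-based Map implementations, each with its own characteristics and uses.

1. LinkedHashMap

LinkedHashMap extends HashMap and preserves insertion order by threading a doubly linked list through its entries. Use it when iteration order must be predictable.

Map<Integer, String> linkedHashMap = new LinkedHashMap<>();
linkedHashMap.put(1, "one");
linkedHashMap.put(2, "two");
linkedHashMap.put(3, "three");
System.out.println(linkedHashMap); // prints {1=one, 2=two, 3=three}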
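
The same linked list can also track access order instead of insertion order, which is one common way to build a small LRU cache. The following is a sketch under that assumption: it uses LinkedHashMap's three-argument constructor with accessOrder set to true and overrides removeEldestEntry. The class LruCacheDemo and the factory method lruCache are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo {
    // Hypothetical helper: a bounded cache that evicts the least recently used entry
    static <K, V> Map<K, V> lruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; returning true drops the eldest entry
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = lruCache(2);
        cache.put(1, "one");
        cache.put(2, "two");
        cache.get(1);          // touch key 1, so key 2 becomes the eldest
        cache.put(3, "three"); // exceeds the limit and evicts key 2
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```

Like HashMap, this cache is not thread-safe; it would need external synchronization if shared between threads.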

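2. ConcurrentHashMap

ConcurrentHashMap is a thread-safe hash map designed for highly concurrent use. In JDK 7 and earlier it relied on segment locking; since JDK 8 it uses CAS operations together with synchronized blocks on individual buckets, which allows finer-grained concurrency. Unlike HashMap, it permits neither null keys nor null values.

Map<Integer, String> concurrentHashMap = new ConcurrentHashMap<>();
concurrentHashMap.put(1, "one");
concurrentHashMap.put(2, "two");
concurrentHashMap.put(3, "three");
System.out.println(concurrentHashMap);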

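3. WeakHashMap

WeakHashMap is a hash table whose keys are held through weak references. Once a key has no remaining strong references, the garbage collector may reclaim it, and the entry is then removed from the map. It suits caches and other memory-sensitive scenarios.

Map<Object, String> weakHashMap = new WeakHashMap<>();
Object key = new Object();  // the only strong reference to this key
weakHashMap.put(key, "one");
key = null;                 // drop the strong reference
System.gc();                // only a hint; collection is not guaranteed
System.out.println(weakHashMap); // may be empty, because the key may have been collected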

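4. IdentityHashMap

IdentityHashMap compares keys by reference equality (==) instead of calling equals(). Use it when the map must distinguish object instances rather than object contents.

Map<String, String> identityHashMap = new IdentityHashMap<>();
identityHashMap.put(new String("key"), "one");        // two distinct objects
identityHashMap.put(new String("key"), "one again");  // that are equal according to equals()
System.out.println(identityHashMap.size()); // prints 2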
