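One day, a conversation with friends somehow drifted to how Object::hashCode is implemented, and this article is the result.
What is there to discuss? Just return the object's base memory address: convenient and efficient. Or is it? Read on.
When GC happens...
The JavaDoc lists three constraints on Object::hashCode. One of them requires that the hash code stay the same as long as the object does not change. Object itself has no mutable state, so its hash code should never change. Java, as everyone knows, comes with a garbage collector, and some GC algorithms, such as Copy and Mark-Compact, move objects around. When an object moves, its base address changes, so a hashCode based on the memory address may return a different value after a GC.
Suppose we use the object's base address as the return value of hashCode anyway. Most of the time nothing goes wrong, since code that calls hashCode directly is rare. That holds until we hit a scenario like this: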
Object obj = new Object(); // allocated at 0x02
Map<Object, String> map = new HashMap<>(); // 16 slots
map.put(obj, "a1"); // assume hashed in slot[0x02]
// after GC, obj moved (0x02 -> 0x20)
String value = map.get(obj); // assume hashed in slot[0x00]
System.out.println("true or false? : " + (value == null)); // ???
We are unlikely to use a plain Object instance as a map key, but an address-based hashCode is still a scary idea: data we just put into the map can no longer be found!
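The scenario can be reproduced without a real moving GC. The sketch below simulates it: `MovableKey` and its `address` field are hypothetical stand-ins for an object whose hashCode is its current address, and `moveTo` plays the role of the collector relocating it.

```java
import java.util.HashMap;
import java.util.Map;

final class LostEntryDemo {
    // Hypothetical stand-in for an object whose hashCode is its current memory address.
    static final class MovableKey {
        private int address; // simulated base address, not a real pointer

        MovableKey(int address) {
            this.address = address;
        }

        // Simulates a moving GC relocating the object.
        void moveTo(int newAddress) {
            this.address = newAddress;
        }

        @Override
        public int hashCode() {
            return address; // the naive "address as hash code" scheme
        }
        // equals is inherited from Object, so identity comparison still applies.
    }

    // Returns true when the entry can no longer be found after the simulated move.
    static boolean entryLostAfterMove() {
        MovableKey key = new MovableKey(0x02);
        Map<MovableKey, String> map = new HashMap<>(); // 16 buckets by default
        map.put(key, "a1");          // stored in bucket 0x02 & 15 = 2
        key.moveTo(0x20);            // the "GC" moves the object
        return map.get(key) == null; // lookup probes bucket 0x20 & 15 = 0
    }

    public static void main(String[] args) {
        System.out.println("entry lost after move? " + entryLostAfterMove());
    }
}
```

The entry is still in the map (its size stays 1), but the lookup goes to a different bucket and never reaches it.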
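Handling object movement
Since objects can move around, reading the memory address on every call does not work. Yet the value must not change once generated, so we need a field in which to store the Object's hash code. Something like this: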
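// Pseudocode: _toAddress is a hypothetical intrinsic, not a real JDK method.
class Object {
    // Captured once at construction, so later moves do not affect it.
    private final int _hashCode = _toAddress(this);

    public int hashCode() {
        return _hashCode;
    }
}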
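Everything works: no matter how many times the object is moved, my map keeps behaving correctly. The drawback is just as obvious, though: it wastes memory. Every class in Java is a subclass of Object, so every object takes up roughly one extra word of memory, and in the vast majority of cases this field is never used.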
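As a rough illustration of this fix, the sketch below caches the hash in a field at construction time and then simulates a move. `CachedHashKey`, `address` and `moveTo` are hypothetical names used only for this simulation; a real JVM would do this inside the runtime, not in Java code.

```java
import java.util.HashMap;
import java.util.Map;

final class CachedHashDemo {
    // Hypothetical object model: the hash is captured once from the initial address.
    static final class CachedHashKey {
        private int address;          // simulated address, changes when the "GC" moves the object
        private final int cachedHash; // the extra per-object field that costs memory

        CachedHashKey(int address) {
            this.address = address;
            this.cachedHash = address;
        }

        void moveTo(int newAddress) {
            this.address = newAddress;
        }

        int address() {
            return address;
        }

        @Override
        public int hashCode() {
            return cachedHash; // stable, no matter where the object lives now
        }
    }

    // Returns true when the entry is still found after the simulated move.
    static boolean entrySurvivesMove() {
        CachedHashKey key = new CachedHashKey(0x02);
        Map<CachedHashKey, String> map = new HashMap<>();
        map.put(key, "a1");
        key.moveTo(0x20); // the address changes, the cached hash does not
        return "a1".equals(map.get(key)) && key.address() == 0x20;
    }

    public static void main(String[] args) {
        System.out.println("entry still found after move? " + entrySurvivesMove());
    }
}
```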
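Saving more space
From the discussion above, this word cannot be avoided if the hashCode constraints are to hold, so it had better carry more information, for example by living in the Java object header. Let's first dig some information out of openjdk (jdk-9+181) to see how a single word is put to full use: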
// hotspot/src/share/vm/oops/markOop.hpp
// 64 bits:
// --------
// unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
// PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
// size:64 ----------------------------------------------------->| (CMS free block)
//
// unused:25 hash:31 -->| cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && normal object)
// JavaThread*:54 epoch:2 cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && biased object)
// narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
// unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)
// - the two lock bits are used to describe three states: locked/unlocked and monitor.
//
// [ptr | 00] locked ptr points to real header on stack
// [header | 0 | 01] unlocked regular object header
// [ptr | 10] monitor inflated lock (header is wapped out)
// [ptr | 11] marked used by markSweep to mark an object
// not valid at any other time
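As you can see, a single word stores several pieces of information: the hash code, lock-optimization flags and GC flags. Its meaning is mostly determined by the lowest two bits, and when the object is locked the word even gets copied back and forth. We only care about the hash code here, so let's use the hsdb tool to browse the JVM's memory.
First, write a small demo: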
package test; // matches the test.Hash used in the OQL query and shown by hsdb

public class Hash {
    int verbose; // marker value that is easy to spot in a memory dump

    public Hash(int verbose) { this.verbose = verbose; }

    public static void main(String[] args) throws Exception {
        Hash h1 = new Hash(0x1234);
        Hash h2 = new Hash(0x5678);
        System.out.println("breakpoint 1");
        System.out.println("before gc, h1.hashCode=" + Integer.toHexString(h1.hashCode()) +
                ", h2.hashCode=" + Integer.toHexString(h2.hashCode()));
        System.out.println("breakpoint 2");
        h1 = null; // make h1 unreachable so that only h2 survives the GC
        System.gc();
        System.out.println("after gc, h2.hashCode=" + Integer.toHexString(h2.hashCode()));
        System.out.println("breakpoint 3");
    }
}
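The goal of the code is to use HotSpot's System.gc to trigger a full GC so that h2 gets copied to the old gen. Next, run the code under a debugger (Eclipse, IDEA, anything will do) and set breakpoints at the marked lines. To make the program behave as expected and the memory easy to read, set these JVM options: -XX:+UseSerialGC -Xmx10m -XX:-UseCompressedOops
Assuming the program is now stopped at System.out.println("breakpoint 1"), we can start hsdb and attach it to the target process: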
# JDK 8
java -cp .:$JAVA_HOME/lib/sa-jdi.jar sun.jvm.hotspot.HSDB
# JDK 9
jhsdb hsdb
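Once in hsdb, use Tools - Find Object by Query to list all instances with OQL: select x from test.Hash x. Then use the various inspectors to look at the memory. After some clicking around, it looks roughly like this: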
# Hash h1
hsdb> inspect 0x000000010b33d690
instance of Oop for test/Hash @ 0x000000010b33d690 @ 0x000000010b33d690 (size = 24)
_mark: 1
_metadata._klass: InstanceKlass for test/Hash
verbose: 4660
hsdb> mem 0x000000010b33d690 3
0x000000010b33d690: 0x0000000000000001
0x000000010b33d698: 0x000000010c000578
0x000000010b33d6a0: 0x0000000000001234
# Hash h2
hsdb> inspect 0x000000010b33d6a8
instance of Oop for test/Hash @ 0x000000010b33d6a8 @ 0x000000010b33d6a8 (size = 24)
_mark: 1
_metadata._klass: InstanceKlass for test/Hash
verbose: 22136
hsdb> mem 0x000000010b33d6a8 3
0x000000010b33d6a8: 0x0000000000000001
0x000000010b33d6b0: 0x000000010c000578
0x000000010b33d6b8: 0x0000000000005678
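Both objects have a mark word of 0x0000000000000001: unlocked, not biased, age 0, and no hash code assigned yet. The class pointer, the instance field and the padding that follow are not discussed here.
The next step is to let the program run to the second breakpoint, System.out.println("breakpoint 2"). Note that hsdb must be detached first, otherwise the debugger cannot continue. The program console now shows: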
breakpoint 1
before gc, h1.hashCode=6f2b958e, h2.hashCode=1eb44e46
Attach hsdb again and inspect the data. As expected, the hash codes have been written into the corresponding bits: 0x000000 6f2b958e 01 and 0x000000 1eb44e46 01
# Hash h1
hsdb> mem 0x000000010b33d690 3
0x000000010b33d690: 0x0000006f2b958e01
0x000000010b33d698: 0x000000010c000578
0x000000010b33d6a0: 0x0000000000001234
# Hash h2
hsdb> mem 0x000000010b33d6a8 3
0x000000010b33d6a8: 0x0000001eb44e4601
0x000000010b33d6b0: 0x000000010c000578
0x000000010b33d6b8: 0x0000000000005678
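To make the bit positions concrete, here is a minimal sketch that decodes the mark words shown above, using the field widths from the "normal object" row of the 64-bit layout. The class and method names are made up for this example; they are not HotSpot APIs.

```java
final class MarkWordDecoder {
    // Field widths from the 64-bit "normal object" layout:
    // unused:25 hash:31 unused:1 age:4 biased_lock:1 lock:2
    static final int LOCK_BITS = 2;
    static final int BIASED_LOCK_BITS = 1;
    static final int AGE_BITS = 4;
    static final int UNUSED_GAP_BITS = 1;
    static final int HASH_BITS = 31;
    // The hash starts right above the low 8 bits.
    static final int HASH_SHIFT = LOCK_BITS + BIASED_LOCK_BITS + AGE_BITS + UNUSED_GAP_BITS;

    static int lock(long mark) {
        return (int) (mark & 0b11);
    }

    static int biasedLock(long mark) {
        return (int) ((mark >>> LOCK_BITS) & 0b1);
    }

    static int age(long mark) {
        return (int) ((mark >>> (LOCK_BITS + BIASED_LOCK_BITS)) & 0xF);
    }

    static int hash(long mark) {
        return (int) ((mark >>> HASH_SHIFT) & ((1L << HASH_BITS) - 1));
    }

    public static void main(String[] args) {
        long h1Mark = 0x0000006f2b958e01L; // h1 after hashCode() was called
        long h2Mark = 0x0000001eb44e4601L; // h2 after hashCode() was called
        System.out.println("h1 hash=" + Integer.toHexString(hash(h1Mark))
                + " age=" + age(h1Mark) + " biased=" + biasedLock(h1Mark) + " lock=" + lock(h1Mark));
        System.out.println("h2 hash=" + Integer.toHexString(hash(h2Mark))
                + " age=" + age(h2Mark) + " biased=" + biasedLock(h2Mark) + " lock=" + lock(h2Mark));
    }
}
```

The decoded hash values match what the program printed (6f2b958e and 1eb44e46), and the low byte 01 decodes to age 0, not biased, lock bits 01 (unlocked).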
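Now let the program run to the third breakpoint. It prints after gc, h2.hashCode=1eb44e46, so the hash code has not changed. In theory h1 has been collected by now and h2 has been copied to the old gen, so its address has changed. Querying with OQL again gives h2's new address, 0x000000010b5ea220, and its memory looks like this: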
# Hash h2
hsdb> mem 0x000000010b5ea220 3
0x000000010b5ea220: 0x0000001eb44e4601
0x000000010b5ea228: 0x000000010c000578
0x000000010b5ea230: 0x0000000000005678
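The object's data is unchanged, so the previously generated hash code can still be read from the mark word 0x000000 1eb44e46 01. So where exactly was h2 copied to? Run the universe command to get an overview of the heap: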
hsdb> universe
Heap Parameters:
Gen 0: eden [0x000000010b200000,0x000000010b20dc68,0x000000010b4b0000) space capacity = 2818048, 2.0022370094476742 used
from [0x000000010b4b0000,0x000000010b4b0000,0x000000010b500000) space capacity = 327680, 0.0 used
to [0x000000010b500000,0x000000010b500000,0x000000010b550000) space capacity = 327680, 0.0 usedInvocations: 0
Gen 1: old [0x000000010b550000,0x000000010b5eabd0,0x000000010bc00000) space capacity = 7012352, 9.038451007593459 usedInvocations: 1
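How to read the output: [0x000000010b200000,0x000000010b20dc68,0x000000010b4b0000) describes the address range of one space in the generational heap (eden, survivor, old gen and so on). The three addresses are the start of the range, the current allocation pointer and the end of the range. h2's address before the GC (0x000000010b33d6a8) lies in eden, while its address after the GC (0x000000010b5ea220) falls in the old gen.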
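The range check can also be done mechanically. The sketch below copies the eden and old gen bounds from the universe output above and classifies the two h2 addresses; `HeapSpaceCheck` and its methods are hypothetical helpers written for this article, not part of hsdb.

```java
final class HeapSpaceCheck {
    // [start, end) bounds copied from the universe output above.
    static final long EDEN_START = 0x000000010b200000L;
    static final long EDEN_END = 0x000000010b4b0000L;
    static final long OLD_START = 0x000000010b550000L;
    static final long OLD_END = 0x000000010bc00000L;

    // Half-open interval test, matching the "[start, top, end)" notation.
    static boolean contains(long start, long end, long address) {
        return address >= start && address < end;
    }

    static String spaceOf(long address) {
        if (contains(EDEN_START, EDEN_END, address)) {
            return "eden";
        }
        if (contains(OLD_START, OLD_END, address)) {
            return "old";
        }
        return "other";
    }

    public static void main(String[] args) {
        long h2BeforeGc = 0x000000010b33d6a8L;
        long h2AfterGc = 0x000000010b5ea220L;
        System.out.println("h2 before GC: " + spaceOf(h2BeforeGc));
        System.out.println("h2 after GC:  " + spaceOf(h2AfterGc));
    }
}
```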
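Summary
Back to the title: the value returned by hashCode is clearly not just the object's address. The implementation can be found in the openjdk source, and the current default is the hashCode=5 variant (the final else branch below). If you are curious, add -XX:+UnlockExperimentalVMOptions -XX:hashCode=2 and print the objects' hash codes again; going by the code below, every object should then report 1.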
// hotspot/src/share/vm/runtime/synchronizer.cpp
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses an unguarded global Park-Miller RNG,
    // so it's possible for two threads to race and generate the same RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations. This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom;
  } else if (hashCode == 2) {
    value = 1; // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hcSequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = Self->_hashStateX;
    t ^= (t << 11);
    Self->_hashStateX = Self->_hashStateY;
    Self->_hashStateY = Self->_hashStateZ;
    Self->_hashStateZ = Self->_hashStateW;
    unsigned v = Self->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    Self->_hashStateW = v;
    value = v;
  }
  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markOopDesc::no_hash, "invariant");
  TEVENT(hashCode: GENERATE);
  return value;
}
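For readers more at home in Java, the default branch (Marsaglia's xor-shift) can be ported roughly as follows. This is only a sketch: the four state variables live in an ordinary object instead of a HotSpot thread, the seed values are illustrative assumptions, and the 31-bit mask is taken from the 64-bit mark word layout quoted earlier. C's unsigned right shift becomes `>>>` on Java's int.

```java
final class XorShiftHash {
    // 31 hash bits, as in the 64-bit mark word layout (assumed value of hash_mask).
    static final int HASH_MASK = 0x7FFFFFFF;

    // Counterparts of _hashStateX/Y/Z/W, which are per-thread fields in HotSpot.
    private int x;
    private int y;
    private int z;
    private int w;

    // The seed constants below are illustrative assumptions, not taken from the article.
    XorShiftHash(int seed) {
        this.x = seed;
        this.y = 842502087;
        this.z = 0x8767;
        this.w = 273326509;
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8)); // >>> mirrors the unsigned shift in C
        w = v;
        int value = v & HASH_MASK;         // keep only the bits that fit in the mark word
        return value == 0 ? 0xBAD : value; // 0 is reserved for "no hash yet"
    }

    public static void main(String[] args) {
        XorShiftHash generator = new XorShiftHash(12345);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(generator.nextHash()));
        }
    }
}
```

Nothing in this scheme depends on the object's address, which is why the resulting hash code has to be stored in the mark word once it has been generated.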
References
- Exploring HotSpot VM runtime data with HSDB http://rednaxelafx.iteye.com/blog/1847971
- Java object structure https://www.jianshu.com/p/ec28e3a59e80