【密码学】一文读懂MurMurHash3
本文应该是MurMurHash算法介绍的最后一篇,来一起看一下最新的MurMurHash算法的具体过程,对于最新的算法来说,整个流程和之前的其实也比较相似,这里从维基百科当中找到了伪代码,也就不贴出来Google官方给出的推荐代码了,先来看一下维基百科给出的伪代码,这里只有32位的伪代码。
algorithm Murmur3_32 is // Note: In this version, all arithmetic is performed with unsigned 32-bit integers. // In the case of overflow, the result is reduced modulo 232. input: key, len, seed c1 ← 0xcc9e2d51 c2 ← 0x1b873593 r1 ← 15 r2 ← 13 m ← 5 n ← 0xe6546b64 hash ← seed for each fourByteChunk of key do k ← fourByteChunk k ← k × c1 k ← k ROL r1 k ← k × c2 hash ← hash XOR k hash ← hash ROL r2 hash ← (hash × m) + n with any remainingBytesInKey do remainingBytes ← SwapToLittleEndian(remainingBytesInKey) // Note: Endian swapping is only necessary on big-endian machines. // The purpose is to place the meaningful digits towards the low end of the value, // so that these digits have the greatest potential to affect the low range digits // in the subsequent multiplication. Consider that locating the meaningful digits // in the high range would produce a greater effect upon the high digits of the // multiplication, and notably, that such high digits are likely to be discarded // by the modulo arithmetic under overflow. We don't want that. remainingBytes ← remainingBytes × c1 remainingBytes ← remainingBytes ROL r1 remainingBytes ← remainingBytes × c2 hash ← hash XOR remainingBytes hash ← hash XOR len hash ← hash XOR (hash >> 16) hash ← hash × 0x85ebca6b hash ← hash XOR (hash >> 13) hash ← hash × 0xc2b2ae35 hash ← hash XOR (hash >> 16)
这个入参和之前的两个算法都是一样的,依然可以看做是两个,第一个是需要哈希的值,第二个是seed。对于这个seed的取值,看了一下一些语言的实现,有些是0x0,有些是其他的值,目前也没发现推荐的值,如果有知道这个seed有什么取的方案的话,也欢迎和我交流。
算法过程
MurMurHash3算法结构
这里依然稍微改了一下之前用到的图,也是一个出现了三次的老演员了,其实除了整个对消息的哈希处理过程当中,对于具体的哈希函数有一些差别,整个算法结构依然一样。好了到这里,有关MurMurHash算法的相关知识就水完了,恭喜我,又填完了一个坑。
代码实现
这里只给出32位的版本了,128位的就不写了,偷一个懒。
Rust
在查找相关代码的时候,这里学了rust的一个新的函数,这些代码主要是借鉴了参考资料里面的代码,他这个seed固定了另外一个值,原因未知,这里我改了一下,seed还是当参数吧。
use std::convert::TryInto; use std::u32; const C1: u32 = 0xcc9e_2d51; const C2: u32 = 0x1b87_3593; const D: u32 = 0xe654_6b64; const FMIX1: u32 = 0x85eb_ca6b; const FMIX2: u32 = 0xc2b2_ae35; fn fmix32(mut h: u32) -> u32 { h ^= h >> 16; h = h.wrapping_mul(FMIX1); h ^= h >> 13; h = h.wrapping_mul(FMIX2); h ^= h >> 16; h } pub fn murmurhash3(key: &[u8], seed: u32) -> u32 { let mut h = seed; let mut chunks = key.chunks_exact(4); while let Some(chunk) = chunks.next() { let mut k: u32 = u32::from_le_bytes(chunk.try_into().unwrap()); k = k.wrapping_mul(C1); k = k.rotate_left(15); k = k.wrapping_mul(C2); h ^= k; h = h.rotate_left(13); h = (h.wrapping_mul(5)).wrapping_add(D); } let remainder = chunks.remainder(); match remainder.len() { 3 => { let mut k = u32::from(remainder[2]) << 16; k ^= u32::from(remainder[1]) << 8; k ^= u32::from(remainder[0]); k = k.wrapping_mul(C1); k = k.rotate_left(15); k = k.wrapping_mul(C2); h ^= k; } 2 => { let mut k = u32::from(remainder[1]) << 8; k ^= u32::from(remainder[0]); k = k.wrapping_mul(C1); k = k.rotate_left(15); k = k.wrapping_mul(C2); h ^= k; } 1 => { let mut k = u32::from(remainder[0]); k = k.wrapping_mul(C1); k = k.rotate_left(15); k = k.wrapping_mul(C2); h ^= k; } _ => {} } fmix32(h ^ key.len() as u32) }
Go
这里,用Go来在实现一下,就当学习了。
const C1 = 0xcc9e_2d51 const C2 = 0x1b87_3593 const D = 0xe654_6b64 const FMIX1 = 0x85eb_ca6b const FMIX2 = 0xc2b2_ae35 func fmix32(h uint32) uint32 { h ^= h >> 16 h *= FMIX1 h ^= h >> 13 h *= FMIX2 h ^= h >> 16 return h } func murmurhash3(key []byte, seed uint32) uint32 { h := seed keyLen := uint32(len(key)) offset := 0 e := binary.LittleEndian for keyLen >= 4 { k := e.Uint32(key[offset : offset+4]) fmt.Println(k) k *= C1 k = bits.RotateLeft32(k, 15) k *= C2 h ^= k h = bits.RotateLeft32(h, 13) h *= 5 h += D offset += 4 keyLen -= 4 } k1 := uint32(0) switch keyLen { case 3: k1 ^= uint32(key[offset+2]) << 16 fallthrough case 2: k1 ^= uint32(key[offset+1]) << 8 fallthrough case 1: k1 ^= uint32(key[offset]) k1 *= C1 k1 = bits.RotateLeft32(k1, 15) k1 *= C2 h ^= k1 } return fmix32(h ^ uint32(len(key))) }