SpringBoot 中使用布隆过滤器 Guava、Redission实现 1:https://developer.aliyun.com/article/1394545
四、小结
我实际测试的时候,Guava 的效果应该是最好的,Redission 虽然是直接集成了Redis,但实际效果比起Guava较差一些,我这里没有贴上时间,Redission所创建出来的布隆过滤器,速度较慢。
当然我的测试范围是有限的,并且只是循环测试,另外服务器也并非在本地,这都有影响。
但是仅目前看来是这样的。
还有就是将 Guava 结合 Redis 一起使用。
五、Guava 布隆过滤器结合 Redis 使用
仅限于测试,一切效果还是需看实测。
我是以 Guava 中创建 布隆过滤器为基础,利用它构造的方法,来进行修改,功能相比于 guava 还是少了很多的。
package com.nzc.boom; import com.google.common.annotations.VisibleForTesting; import com.google.common.base.Preconditions; import com.google.common.hash.Funnel; import com.google.common.hash.Hashing; import com.google.common.primitives.Longs; public class BloomFilterHelper<T> { private int numHashFunctions; private int bitSize; private Funnel<T> funnel; public BloomFilterHelper(Funnel<T> funnel, int expectedInsertions, double fpp) { Preconditions.checkArgument(funnel != null, "funnel不能为空"); this.funnel = funnel; // 计算bit数组长度 bitSize = optimalNumOfBits(expectedInsertions, fpp); // 计算hash方法执行次数 numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, bitSize); } /** 源码 *public <T> boolean mightContain( * T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits) { * long bitSize = bits.bitSize(); * byte[] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal(); * long hash1 = lowerEight(bytes); * long hash2 = upperEight(bytes); * * long combinedHash = hash1; * for (int i = 0; i < numHashFunctions; i++) { * // Make the combined hash positive and indexable * if (!bits.get((combinedHash & Long.MAX_VALUE) % bitSize)) { * return false; * } * combinedHash += hash2; * } * return true; * } * @param value * @return */ public long[] murmurHashOffset(T value) { long[] offset = new long[numHashFunctions]; byte[] bytes = Hashing.murmur3_128().hashObject(value, funnel).asBytes(); long hash1 = lowerEight(bytes); long hash2 = upperEight(bytes); long combinedHash = hash1; for (int i = 1; i <= numHashFunctions; i++) { long nextHash = hash1 + i * hash2; if (nextHash < 0) { nextHash = ~nextHash; } offset[i - 1] = nextHash % bitSize; } return offset; } private /* static */ long lowerEight(byte[] bytes) { return Longs.fromBytes( bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]); } private /* static */ long upperEight(byte[] bytes) { return Longs.fromBytes( bytes[15], bytes[14], bytes[13], bytes[12], bytes[11], bytes[10], bytes[9], bytes[8]); } /** * 计算bit数组长度 * 同样是guava创建布隆过滤器中的计算bit数组长度方法 */ private int optimalNumOfBits(long n, double p) { if (p == 0) { // 设定最小期望长度 p = Double.MIN_VALUE; } return (int) (-n * Math.log(p) / (Math.log(2) * Math.log(2))); } /** * 这里是从guava 中 copy 出来的 * 就是guava 创建一个 布隆过滤器时, * 计算hash方法执行次数的方法 */ private int optimalNumOfHashFunctions(long n, long m) { int countOfHash = Math.max(1, (int) Math.round((double) m / n * Math.log(2))); return countOfHash; } }
以上的这些代码,在guava包都可以找到的。
在redisConfig中注入布隆过滤器
/** * @description: * @author: Yihui Wang * @date: 2022年08月26日 22:06 */ @Configuration @EnableCaching public class RedisConfig { @Bean public CacheManager cacheManager(RedisConnectionFactory connectionFactory) { RedisCacheManager rcm=RedisCacheManager.create(connectionFactory); return rcm; } @Bean public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) { RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>(); redisTemplate.setConnectionFactory(factory); Jackson2JsonRedisSerializer jackson2JsonRedisSerializer = new Jackson2JsonRedisSerializer(Object.class); ObjectMapper om = new ObjectMapper(); om.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY); om.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL); jackson2JsonRedisSerializer.setObjectMapper(om); //序列化设置 ,这样计算是正常显示的数据,也能正常存储和获取 redisTemplate.setKeySerializer(jackson2JsonRedisSerializer); redisTemplate.setValueSerializer(jackson2JsonRedisSerializer); redisTemplate.setHashKeySerializer(jackson2JsonRedisSerializer); redisTemplate.setHashValueSerializer(jackson2JsonRedisSerializer); return redisTemplate; } @Bean public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) { StringRedisTemplate stringRedisTemplate = new StringRedisTemplate(); stringRedisTemplate.setConnectionFactory(factory); return stringRedisTemplate; } //初始化布隆过滤器,放入到spring容器里面 @Bean public BloomFilterHelper<String> initBloomFilterHelper() { return new BloomFilterHelper<String>((Funnel<String>) (from, into) -> into.putString(from, Charsets.UTF_8).putString(from, Charsets.UTF_8), 1000, 0.01); } @Bean public BloomFilterHelper<Long> initLongBloomFilterHelper() { return new BloomFilterHelper<Long>((Funnel<Long>) (from, into) -> into.putLong(from),1000, 0.01); } }
也就是注入我们刚刚编写的那个布隆过滤器。
然后再编写一个Service 层
/** * @description: * @author: Yihui Wang */ @Slf4j @Service public class RedisBloomFilter { @Autowired private RedisTemplate redisTemplate; /** * 根据给定的布隆过滤器添加值 */ public <T> void addByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) { Preconditions.checkArgument(bloomFilterHelper != null, "bloomFilterHelper不能为空"); long[] offset = bloomFilterHelper.murmurHashOffset(value); for (long i : offset) { log.info("key :{} ,value : {}", key, i); redisTemplate.opsForValue().setBit(key, i, true); } } /** * 根据给定的布隆过滤器判断值是否存在 */ public <T> boolean includeByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) { Preconditions.checkArgument(bloomFilterHelper != null, "bloomFilterHelper不能为空"); long[] offset = bloomFilterHelper.murmurHashOffset(value); for (long i : offset) { log.info("key :{} ,value : {}", key, i); if (!redisTemplate.opsForValue().getBit(key, i)) { return false; } } return true; } }
测试:
@Test public void test1() { // 预期插入数量 long capacity = 1000L; // 错误比率 double errorRate = 0.01; for (long i = 0; i < capacity; i++) { redisBloomFilter.addByBloomFilter(bloomFilterHelper, "nzc:bloomFilter1", i); } log.info("存入元素为=={}", capacity); // 统计误判次数 int count = 0; // 我在数据范围之外的数据,测试相同量的数据,判断错误率是不是符合我们当时设定的错误率 for (long i = capacity; i < capacity * 2; i++) { if (redisBloomFilter.includeByBloomFilter(bloomFilterHelper, "nzc:bloomFilter1", i)) { count++; } } System.out.println("误判个数为==>" + count); }
输出:
存入元素为==1000 误判个数为==>12
后记
终于又到周六周日啦,这次一定要好好整点东西出来,完成自己的flag~