带你读《图解算法小抄》十一、布隆过滤器（2）-阿里云开发者社区

带你读《图解算法小抄》十一、布隆过滤器（2）

2023-10-12 81

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： 带你读《图解算法小抄》十一、布隆过滤器（2）

带你读《图解算法小抄》十一、布隆过滤器（1）https://developer.aliyun.com/article/1348194?groupCode=tech_library

4. 误报

误报的概率由三个因素决定：布隆过滤器的大小，我们使用的哈希函数数量，以及已经插入到过滤器中的项目数量。

计算误报概率的公式是：

( 1 - e -kn/m ) k

k = 哈希函数的数量

m = 过滤器大小

n = 插入的项目数量

这些变量，k，m，和n，应该根据误报的接受程度进行选择。如果选择的值导致的概率过高，那么应该调整这些值并重新计算概率。

5. 应用

布隆过滤器可以用于博客网站。如果目标是只向读者展示他们从未见过的文章，布隆过滤器就很完美。它可以存储基于文章的哈希值。用户阅读了几篇文章后，它们可以被插入到过滤器中。下次用户访问网站时，这些文章可以被从结果中过滤掉。

一些文章不可避免地会被误过滤掉，但这种成本是可以接受的。只要他们每次访问网站时都能看到新的文章，用户就不会介意看不到几篇文章。

6. 完整代码

export default class BloomFilter {
  /**
   * @param {number} size - 存储区的大小。
   */
  constructor(size = 100) {
    // 布隆过滤器的大小直接影响误报的可能性。
    // 大小越大，误报的可能性越低。
    this.size = size;
    this.storage = this.createStore(size);
  }
  /**
   * @param {string} item
   */
  insert(item) {
    const hashValues = this.getHashValues(item);
    // 将每个hash值索引设置为true。
    hashValues.forEach((val) => this.storage.setValue(val));
  }
  /**
   * @param {string} item
   * @return {boolean}
   */
  mayContain(item) {
    const hashValues = this.getHashValues(item);
    for (let hashIndex = 0; hashIndex < hashValues.length; hashIndex += 1) {
      if (!this.storage.getValue(hashValues[hashIndex])) {
        // 我们知道该项目肯定未被插入。
        return false;
      }
    }
    // 该项可能插入或可能未插入。
    return true;
  }
  /**
   * 为过滤器创建数据存储。
   * 我们使用此方法生成存储以封装数据本身并只提供访问
   * 必要方法的接口。
   *
   * @param {number} size
   * @return {Object}
   */
  createStore(size) {
    const storage = [];
    // 将所有索引初始化为false
    for (let storageCellIndex = 0; storageCellIndex < size; storageCellIndex += 1) {
      storage.push(false);
    }
    const storageInterface = {
      getValue(index) {
        return storage[index];
      },
      setValue(index) {
        storage[index] = true;
      },
    };
    return storageInterface;
  }
  /**
   * @param {string} item
   * @return {number}
   */
  hash1(item) {
    let hash = 0;
    for (let charIndex = 0; charIndex < item.length; charIndex += 1) {
      const char = item.charCodeAt(charIndex);
      hash = (hash << 5) + hash + char;
      hash &= hash; // 转换为32位整数
      hash = Math.abs(hash);
    }
    return hash % this.size;
  }
  /**
   * @param {string} item
   * @return {number}
   */
  hash2(item) {
    let hash = 5381;
    for (let charIndex = 0; charIndex < item.length; charIndex += 1) {
      const char = item.charCodeAt(charIndex);
      hash = (hash << 5) + hash + char; /* hash * 33 + c */
    }
    return Math.abs(hash % this.size);
  }
  /**
   * @param {string} item
   * @return {number}
   */
  hash3(item) {
    let hash = 0;
    for (let charIndex = 0; charIndex < item.length; charIndex += 1) {
      const char = item.charCodeAt(charIndex);
      hash = (hash << 5) - hash;
      hash += char;
      hash &= hash; // 转换为32位整数
    }
  return Math.abs(hash % this.size);
  }
  /**
   * 在输入上运行所有3个哈希函数并返回结果数组。
   *
   * @param {string} item
   * @return {number[]}
   */
  getHashValues(item) {
    return [
      this.hash1(item),
      this.hash2(item),
      this.hash3(item),
    ];
  }}