SnowFlake 雪花算法和原理（分布式 id 生成算法）

2023-08-20 137

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： SnowFlake 雪花算法和原理（分布式 id 生成算法）

一、概述

SnowFlake 算法：是 Twitter 开源的分布式 id 生成算法。

核心思想：使用一个 64 bit 的 long 型的数字作为全局唯一 id。

编辑

算法原理

最高位是符号位，始终为0，不可用。

41位的时间序列，精确到毫秒级，41位的长度可以使用69年。时间位还有一个很重要的作用是可以根据时间进行排序。

10位的机器标识，10位的长度最多支持部署1024个节点

12位的计数序列号，序列号即一系列的自增id，可以支持同一节点同一毫秒生成多个ID序号，12位的计数序列号支持每个节点每毫秒产生4096个ID序号

算法优缺点

优点

高并发分布式环境下生成不重复 id，每秒可生成百万个不重复 id。

基于时间戳，以及同一时间戳下序列号自增，基本保证 id 有序递增。

不依赖第三方库或者中间件。

算法简单，在内存中进行，效率高。

缺点

依赖服务器时间，服务器时钟回拨时可能会生成重复 id。算法中可通过记录最后一个生成 id 时的时间戳来解决，每次生成 id 之前比较当前服务器时钟是否被回拨，避免生成重复 id。

二、算法实现

<dependency>
<groupId>cn.hutool</groupId>
<artifactId>hutool-all</artifactId>
<version>5.8.11</version>
</dependency>

public class IdWorker {
    //下面两个每个5位，加起来就是10位的工作机器id
    private long workerId;    //工作id
    private long datacenterId;   //数据id
    //12位的序列号
    private long sequence;
    public IdWorker(long workerId, long datacenterId, long sequence) {
        // sanity check for workerId
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        System.out.printf("worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);
        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }
    //初始时间戳
    private long twepoch = 1288834974657L;
    //长度为5位
    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;
    //最大值
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    //序列号id长度
    private long sequenceBits = 12L;
    //序列号最大值
    private long sequenceMask = -1L ^ (-1L << sequenceBits);
    //工作id需要左移的位数，12位
    private long workerIdShift = sequenceBits;
    //数据id需要左移位数 12+5=17位
    private long datacenterIdShift = sequenceBits + workerIdBits;
    //时间戳需要左移位数 12+5+5=22位
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    //上次时间戳，初始值为负数
    private long lastTimestamp = -1L;
    public long getWorkerId() {
        return workerId;
    }
    public long getDatacenterId() {
        return datacenterId;
    }
    public long getTimestamp() {
        return System.currentTimeMillis();
    }
    //下一个ID生成算法
    public synchronized long nextId() {
        long timestamp = timeGen();
        //获取当前时间戳如果小于上次时间戳，则表示时间戳获取出现异常
        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.", lastTimestamp);
            throw new RuntimeException(String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds",
                    lastTimestamp - timestamp));
        }
        //获取当前时间戳如果等于上次时间戳（同一毫秒内），则在序列号加一；否则序列号赋值为0，从0开始。
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        //将上次时间戳值刷新
        lastTimestamp = timestamp;
        /**
         * 返回结果：
         * (timestamp - twepoch) << timestampLeftShift) 表示将时间戳减去初始时间戳，再左移相应位数
         * (datacenterId << datacenterIdShift) 表示将数据id左移相应位数
         * (workerId << workerIdShift) 表示将工作id左移相应位数
         * | 是按位或运算符，例如：x | y，只有当x，y都为0的时候结果才为0，其它情况结果都为1。
         * 因为个部分只有相应位上的值有意义，其它位上都是0，所以将各部分的值进行 | 运算就能得到最终拼接好的id
         */
        return ((timestamp - twepoch) << timestampLeftShift) |
                (datacenterId << datacenterIdShift) |
                (workerId << workerIdShift) |
                sequence;
    }
    //获取时间戳，并与上次时间戳比较
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }
    //获取系统时间戳
    private long timeGen() {
        return System.currentTimeMillis();
    }
    //---------------测试---------------
    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            System.out.println(worker.nextId());
        }
    }
}

解决时间回拨问题

原生的 Snowflake 算法是完全依赖于时间的，如果有时钟回拨的情况发生，会生成重复的 ID，市场上的解决方案也是不少。简单粗暴的办法有：

最简单的方案，就是关闭生成唯一 ID 机器的时间同步。

使用阿里云的的时间服务器进行同步，2017 年 1 月 1 日的闰秒调整，阿里云服务器 NTP 系统 24 小时“消化”闰秒，完美解决了问题。

如果发现有时钟回拨，时间很短比如 5 毫秒，就等待，然后再生成。或者就直接报错，交给业务层去处理。也可以采用 SonyFlake 的方案，精确到 10 毫秒，以 10 毫秒为分配单元。

twitter的雪花算法：GitHub - twitter-archive/snowflake: Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.

其它全局唯一的分布式ID的方式：如百度的uid-generator、美团的Leaf、滴滴的TinyId等

文章下方有交流学习区！一起学习进步！也可以前往官网，加入官方微信交流群你的支持和鼓励是我创作的动力❗❗❗

官网：Doker 多克; 官方旗舰店：首页-Doker 多克多克创新科技企业店-淘宝网全品优惠

SnowFlake 雪花算法和原理（分布式 id 生成算法）

一、概述

算法原理

算法优缺点

二、算法实现

解决时间回拨问题

热门文章

最新文章

相关课程

相关电子书

相关实验场景