The retry resiliency pattern in Go - Alibaba Cloud Developer Community


2024-07-01


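Summary: The retry mechanism described here is built around a struct named `Retrier`, which implements the retriable resiliency pattern in Go. A `Retrier` is created from a backoff schedule (for example constant or exponentially growing intervals) and an error classifier. The classifier decides which errors are retried. If no classifier is supplied, the default one is used, which marks every non-nil error as retriable. Three classifiers are provided: default, whitelist, and blacklist. `Run` and `RunCtx` execute the work with retries; the latter accepts a context so that timeouts can be handled. `calcSleep` computes a sleep time with random jitter, which makes retries less predictable and reduces collisions between concurrent callers. Retrying stops once the maximum number of retries is reached or the context times out.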

A retriable resiliency pattern for Go.

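Creating a retrier takes two parameters:

The backoff intervals between retries (their count implies the number of retries)
A classifier that decides which errors are retried

The example from the repository: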

r := retrier.New(retrier.ConstantBackoff(3, 100*time.Millisecond), nil)

err := r.Run(func() error {
	// do some work
	return nil
})

if err != nil {
	// handle the case where the work failed three times
}

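Two arguments are passed when the retrier is created. The first is the backoff schedule, a slice of time.Duration whose length is the implied number of retries. The second is the classifier, which decides which errors are retried and which are not.

The Retrier struct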

// Retrier implements the "retriable" resiliency pattern, abstracting out the process of retrying a failed action
// a certain number of times with an optional back-off between each retry.
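type Retrier struct {
	// backoff intervals; the length is the implied number of retries
	backoff []time.Duration
	// classifier that decides which errors are retried
	class   Classifier
	// jitter factor in [0, 1] applied to each backoff interval
	jitter  float64
	// random number generator used to compute the jitter
	rand    *rand.Rand
	// mutex guarding rand, which is not safe for concurrent use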
	randMu  sync.Mutex
}

The constructor

// New constructs a Retrier with the given backoff pattern and classifier. The length of the backoff pattern
// indicates how many times an action will be retried, and the value at each index indicates the amount of time
// waited before each subsequent retry. The classifier is used to determine which errors should be retried and
// which should cause the retrier to fail fast. The DefaultClassifier is used if nil is passed.
func New(backoff []time.Duration, class Classifier) *Retrier {
	// fall back to the default classifier when nil is passed
	if class == nil {
		class = DefaultClassifier{}
	}

	return &Retrier{
		backoff: backoff,
		class:   class,
		rand:    rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

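If the classifier passed in is nil, the default classifier is used.

There are three different classifiers:

Default classifier
Whitelist classifier
Blacklist classifier

DefaultClassifier classifies errors in the simplest way. If the error is nil it returns Succeed, otherwise it returns Retry.

WhitelistClassifier classifies errors against a whitelist. If the error is nil it returns Succeed; if the error is in the whitelist it returns Retry; otherwise it returns Fail.

BlacklistClassifier classifies errors against a blacklist. If the error is nil it returns Succeed; if the error is in the blacklist it returns Fail; otherwise it returns Retry.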
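The three rules above can be sketched in a few lines. This is a self-contained illustration, not the library's source: the `Action` type name, the lowercase classifier types, the use of `errors.Is` for matching, and the two sentinel errors are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is the decision a classifier makes about an error (name assumed).
type Action int

const (
	Succeed Action = iota // the work succeeded, stop
	Fail                  // the error is not retriable, stop
	Retry                 // the error is retriable, try again
)

// defaultClassifier retries every non-nil error.
type defaultClassifier struct{}

func (defaultClassifier) Classify(err error) Action {
	if err == nil {
		return Succeed
	}
	return Retry
}

// whitelistClassifier retries only the listed errors.
type whitelistClassifier []error

func (list whitelistClassifier) Classify(err error) Action {
	if err == nil {
		return Succeed
	}
	for _, pass := range list {
		if errors.Is(err, pass) {
			return Retry
		}
	}
	return Fail
}

// blacklistClassifier retries everything except the listed errors.
type blacklistClassifier []error

func (list blacklistClassifier) Classify(err error) Action {
	if err == nil {
		return Succeed
	}
	for _, block := range list {
		if errors.Is(err, block) {
			return Fail
		}
	}
	return Retry
}

// Hypothetical sentinel errors used only for this illustration.
var (
	errTimeout  = errors.New("timeout")
	errNotFound = errors.New("not found")
)

func main() {
	white := whitelistClassifier{errTimeout}
	black := blacklistClassifier{errNotFound}

	fmt.Println(defaultClassifier{}.Classify(nil) == Succeed)
	fmt.Println(white.Classify(errTimeout) == Retry)
	fmt.Println(white.Classify(errNotFound) == Fail)
	fmt.Println(black.Classify(errNotFound) == Fail)
	fmt.Println(black.Classify(errTimeout) == Retry)
}
```

A whitelist suits cases where only a few known transient errors are worth retrying; a blacklist suits cases where most errors are transient and only a few are known to be permanent.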

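There are two functions for running a retrier.

The first does not take a context. Internally it still calls RunCtx, which does take one, and simply passes context.Background(), a non-nil empty Context.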

// Run executes the given work function by executing RunCtx without context.Context.
func (r *Retrier) Run(work func() error) error {
	return r.RunCtx(context.Background(), func(ctx context.Context) error {
		// never use ctx
		return work()
	})
}

The second takes a context:

// RunCtx executes the given work function, then classifies its return value based on the classifier used
// to construct the Retrier. If the result is Succeed or Fail, the return value of the work function is
// returned to the caller. If the result is Retry, then Run sleeps according to the its backoff policy
// before retrying. If the total number of retries is exceeded then the return value of the work function
// is returned to the caller regardless.
func (r *Retrier) RunCtx(ctx context.Context, work func(ctx context.Context) error) error {
	// no retries have happened yet
	retries := 0
	for {
		// run the work function (the logic we actually want to execute)
		ret := work(ctx)
		// let the classifier decide, from the return value, whether to retry
		switch r.class.Classify(ret) {
		case Succeed, Fail:
			return ret
		case Retry:
			// retry budget exhausted: return the work function's last result
			if retries >= len(r.backoff) {
				return ret
			}
			// otherwise compute the sleep time for the current retry count
			timeout := time.After(r.calcSleep(retries))
			// sleep, returning early if the context is done
			if err := r.sleep(ctx, timeout); err != nil {
				return err
			}

			retries++
		}
	}
}

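The function that computes the sleep time

One thing I did not understand at first is why a lock is needed here. The test cases show that Run may be called concurrently on the same Retrier, but does that come up in practice? It can: a single Retrier shared by several goroutines would call calcSleep concurrently, and a *rand.Rand created with rand.New is not safe for concurrent use, so access to it has to be serialized.

Jitter adds randomness to the sleep time. It is set with SetJitter and must lie in [0, 1]; values outside that range are ignored. With jitter set, the backoff falls within a band around the base interval. For example, with a jitter of 0.25 and backoff[i] equal to 10 * time.Millisecond, the actual sleep is in the range [7500 * time.Microsecond, 12500 * time.Microsecond).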

func (r *Retrier) calcSleep(i int) time.Duration {
	// lock unsafe rand prng
	r.randMu.Lock()
	defer r.randMu.Unlock()
	// take a random float in the range (-r.jitter, +r.jitter) and multiply it by the base amount
	return r.backoff[i] + time.Duration(((r.rand.Float64()*2)-1)*r.jitter*float64(r.backoff[i]))
}

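The sleep function

The code shows that when the timer fires, sleep returns nil, and RunCtx then increments the retry count and tries again. If the context passed in carries a timeout and that timeout expires first (or the context is cancelled), sleep returns the context's error and RunCtx exits immediately. This is the main difference between using Run and RunCtx.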

func (r *Retrier) sleep(ctx context.Context, t <-chan time.Time) error {
	select {
	case <-t:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

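Three helper functions build the backoff slice that is passed when creating a retrier.

ConstantBackoff

ConstantBackoff generates a simple backoff strategy that retries n times and waits the same amount of time after each attempt.

ExponentialBackoff

ExponentialBackoff generates a simple backoff strategy that retries n times and doubles the wait after each attempt.

LimitedExponentialBackoff

LimitedExponentialBackoff generates a simple backoff strategy that retries n times and doubles the wait after each attempt. Once the backoff reaches limitAmount, every later interval stays at limitAmount.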

Reposted from: https://juejin.cn/post/7369884289712275507
