Goroutine Leaks: Harm, Causes, Detection, and Prevention - Alibaba Cloud Developer Community


The harm of goroutine leaks

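A large share of Go memory leaks are in fact goroutine leaks. Each goroutine occupies only a small amount of (stack) memory, but when many goroutines are created and never released (that is, when goroutines leak), together they still consume a lot of memory and the program leaks memory.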

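In addition, any heap memory that a leaked goroutine allocated and still references cannot be reclaimed by the garbage collector either, because the goroutine keeps it reachable.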
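To see this effect directly, here is a minimal sketch (the helper names `leakHoldingHeap` and `liveHeapAfterGC` are hypothetical, not from the original article). A goroutine blocked on a channel that is never closed keeps a 64 MiB buffer reachable, so even a forced `runtime.GC()` cannot free it. The exact figure printed depends on the Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
)

// leakHoldingHeap allocates size bytes and hands them to a goroutine that
// blocks on block. Until block is closed, the goroutine keeps buf reachable.
func leakHoldingHeap(block <-chan struct{}, size int) {
	buf := make([]byte, size)
	go func() {
		<-block                // blocks forever if nobody closes block
		runtime.KeepAlive(buf) // buf is used after the receive, so it stays live while blocked
	}()
}

// liveHeapAfterGC forces a collection and returns the bytes still allocated on the heap.
func liveHeapAfterGC() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	block := make(chan struct{}) // never closed: simulates the leak
	before := liveHeapAfterGC()
	leakHoldingHeap(block, 64<<20)
	after := liveHeapAfterGC()
	fmt.Printf("heap retained by the blocked goroutine: ~%d MiB\n", (after-before)>>20)
}
```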

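A popular saying goes that of every 10 Go memory leaks, 8 are goroutine leaks, 1 is a genuine memory leak, and 1 is caused by cgo (an echo of the Chinese idiom 才高八斗, in which one person holds eight of the world's ten measures of talent).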

For the memory taken by a single goroutine, see "Calculating the Memory Used by a Single Goroutine in Golang". Without stack growth, a goroutine in recent Go versions uses roughly 2.6 KB.
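A rough way to check the per-goroutine figure yourself is to start many blocked goroutines and compare `runtime.MemStats` before and after. This is a minimal sketch under simple assumptions: the function name `perGoroutineBytes` is hypothetical, and only stack and heap growth are counted. The result varies with Go version, platform, and GC timing, so treat it as an estimate rather than an exact number.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// perGoroutineBytes starts n goroutines that block on a channel, measures how
// much stack and heap memory grew, then releases them. It returns the average
// growth per goroutine in bytes.
func perGoroutineBytes(n int) float64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-stop // park here so the goroutine (and its stack) stays alive
		}()
	}
	started.Wait() // every goroutine now exists and owns a stack

	runtime.ReadMemStats(&after)
	close(stop) // let them all exit so the measurement itself does not leak

	// Signed arithmetic, in case a GC cycle shrank the heap in between.
	grown := int64(after.StackInuse) - int64(before.StackInuse) +
		int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return float64(grown) / float64(n)
}

func main() {
	fmt.Printf("~%.0f bytes per blocked goroutine\n", perGoroutineBytes(10_000))
}
```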

massiveGoroutine.go:

package main
import (
  "net/http"
  "runtime/pprof"
)
var quit chan struct{} = make(chan struct{})
func f() {
  // Receiving from an unbuffered channel blocks forever if nothing is ever sent on it
  <-quit
}
func getGoroutineNum(w http.ResponseWriter, r *http.Request) {
  w.Header().Set("Content-Type", "text/plain")
  p := pprof.Lookup("goroutine")
  p.WriteTo(w, 1)
}
func deal0() {
  // Spawn 1,000,000 goroutines. Each one receives from an unbuffered channel that is
  // never written to, so it blocks forever and is never released.
  for i := 0; i < 1_000_000; i++ {
    go f()
  }
  http.HandleFunc("/", getGoroutineNum)
  http.ListenAndServe(":11181", nil)
}
func main() {
  deal0()
}


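Running the program and opening http://localhost:11181/ shows the goroutine profile, in which the roughly one million goroutines blocked in f dominate. For more, see "Using pprof to Check for Goroutine Leaks in Golang".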

Causes of goroutine leaks and tools for detecting them

Causes:

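The article "Goroutine Leaks: Principles, Scenarios, Detection, and Prevention" gives a fairly complete summary of what causes goroutine leaks:

1. Reading from a channel that nothing writes to
2. Writing to an unbuffered channel that nothing reads from
3. Writing to a full buffered channel that nothing reads from
4. A select statement whose cases all block
5. A goroutine stuck in an infinite loop that never finishes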

As the list shows, most of these come from misusing channels: a channel operation blocks, so the goroutine blocks with it and never exits.

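Detection:

pprof can be used for analysis, but mostly after the fact; it cannot expose the problem early, during development (i.e., "shift-left testing").

Uber's goleak, by contrast, can be integrated into unit tests and detects goroutine leaks quickly, which helps both to prevent leaks and to track them down.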

Leaks caused by misusing channels:

Take the following code, an example of case 2 (writing to an unbuffered channel that nothing reads from):

// Writes but never reads
package main
import (
  "fmt"
  "log"
  "net/http"
  "runtime"
  "strconv"
  "time"
)
// sumInt adds up the numbers in s and sends the sum on c
func sumInt(s []int, c chan int) {
  sum := 0
  for _, v := range s {
    sum += v
  }
  c <- sum
}
// HTTP handler for /sum
func sumConcurrent(w http.ResponseWriter, r *http.Request) {
  x := deal()
  // write the response.
  fmt.Fprint(w, strconv.Itoa(x)) // Fprint: x is data, not a format string
}
func deal() int {
  s := []int{7, 2, 8, -9, 4, 0}
  c1 := make(chan int)
  c2 := make(chan int)
  go sumInt(s[:len(s)/2], c1) // s[0:3], i.e. 7, 2, 8 (s[a:b] is half-open: includes a, excludes b)
  go sumInt(s[len(s)/2:], c2) // s[3:6], i.e. -9, 4, 0
  // c2 is deliberately never read, so the goroutine sending on c2 blocks forever.
  x := <-c1
  fmt.Println("x is:", x)
  return x
}
func main() {
  // Print the current number of goroutines once per second
  statGoroutines := func() {
    for {
      time.Sleep(time.Second)
      total := runtime.NumGoroutine()
      fmt.Println("current goroutine count:", total)
    }
  }
  go statGoroutines()
  http.HandleFunc("/sum", sumConcurrent)
  err := http.ListenAndServe(":8001", nil)
  if err != nil {
    log.Fatal("ListenAndServe: ", err)
  }
}


Detecting the leak with goleak:

leak_test.go:

package main
import (
  "go.uber.org/goleak"
  "testing"
)
func TestLeak(t *testing.T) {
  defer goleak.VerifyNone(t)
  deal()
}


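Each call to deal starts two goroutines, but the unbuffered channel c2 is written to and never read, so the send blocks there. goleak's error message says so:

Goroutine 21 in state chan send: this goroutine is stuck sending on the channel (nothing reads from it, so it blocks forever).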

A slightly more complex example:

// Again, writing without reading causes blocking
package main
import (
  "fmt"
  "math/rand"
  "os"
  "runtime"
  "time"
)
func main() {
  deal2()
}
func deal2() {
  fmt.Fprintf(os.Stderr, "initial goroutine count: %d\n", runtime.NumGoroutine())
  // Producer
  newRandStream := func() <-chan int {
    randStream := make(chan int)
    go func() {
      defer fmt.Println("newRandStream closure exited.")
      defer close(randStream)
      // Infinite loop: keep sending values on the channel until the send blocks
      for {
        randStream <- rand.Int()
      }
    }()
    return randStream
  }
  randStream := newRandStream()
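  // Consumer
  // Consume only 3 values and then move on to other work. The producer is now blocked,
  // and unless the main goroutine does something about it, the producer goroutine leaks.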
  fmt.Println("3 random ints:")
  for i := 1; i <= 3; i++ {
    fmt.Printf("%d: %d\n", i, <-randStream)
  }
  fmt.Fprintf(os.Stderr, "current goroutine count: %d\n", runtime.NumGoroutine())
  time.Sleep(10 * time.Second)
  fmt.Fprintf(os.Stderr, "goroutine count after 10s: %d\n", runtime.NumGoroutine())
}

Output:

initial goroutine count: 1
3 random ints:
1: 5577006791947779410
2: 8674665223082153551
3: 6129484611666145821
current goroutine count: 2
goroutine count after 10s: 2

leak_test.go:

func TestLeak2(t *testing.T) {
  defer goleak.VerifyNone(t)
  deal2()
}


Solution:

package main
import (
  "fmt"
  "math/rand"
  "os"
  "runtime"
  "time"
)
func main() {
  fmt.Fprintf(os.Stderr, "initial goroutine count: %d\n", runtime.NumGoroutine())
  newRandStream := func(done <-chan interface{}) <-chan int {
    randStream := make(chan int)
    go func() {
      defer fmt.Println("newRandStream closure exited.")
      defer close(randStream)
      for {
        select {
        case randStream <- rand.Int():
        case <-done: // notified: stop this goroutine
          return
        }
      }
    }()
    return randStream
  }
  done := make(chan interface{})
  randStream := newRandStream(done)
  fmt.Println("3 random ints:")
  for i := 1; i <= 3; i++ {
    fmt.Printf("%d: %d\n", i, <-randStream)
  }
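  fmt.Fprintf(os.Stderr, "current goroutine count: %d\n", runtime.NumGoroutine())
  // Tell the child goroutine to stop. Sending one value (done <- struct{}{}) would also
  // work for a single receiver, but close(done) notifies every receiver at once.
  close(done)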
  // Simulate the program continuing to run
  time.Sleep(1 * time.Second)
  fmt.Fprintf(os.Stderr, "final goroutine count: %d\n", runtime.NumGoroutine())
}

Output:

initial goroutine count: 1
3 random ints:
1: 5577006791947779410
2: 8674665223082153551
3: 6129484611666145821
current goroutine count: 2
newRandStream closure exited.
final goroutine count: 1

For the full code and solutions, see "Go Concurrent Programming: How Goroutine Leaks Arise and How to Fix Them".

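Goroutine leaks usually come from goroutines blocking on a channel or spinning in an infinite loop. When working with channels and goroutines, keep in mind:

When you create a goroutine, decide up front how it will end.

When you use a channel, consider what the goroutine will do if the channel operation blocks.

Watch out for common goroutine-leak scenarios, including the master-worker and producer-consumer patterns.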

The other channel-misuse cases (1. reading from a channel that nothing writes to; 3. writing to a full buffered channel that nothing reads from) leak in the same way.
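For cases 1 and 3 the remedy mirrors the done-channel solution above: never let a receive or a send wait unconditionally, but put it in a `select` alongside a cancellation channel. A minimal sketch with hypothetical helper names `sendOrQuit` and `recvOrQuit`:

```go
package main

import "fmt"

// sendOrQuit tries to send v on ch but gives up once quit is closed.
// It reports whether the value was sent.
func sendOrQuit(ch chan<- int, v int, quit <-chan struct{}) bool {
	select {
	case ch <- v:
		return true
	case <-quit:
		return false
	}
}

// recvOrQuit waits for a value on ch but gives up once quit is closed.
func recvOrQuit(ch <-chan int, quit <-chan struct{}) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	case <-quit:
		return 0, false
	}
}

func main() {
	quit := make(chan struct{})
	full := make(chan int, 1)
	full <- 1               // the buffer is now full (case 3)
	empty := make(chan int) // nobody ever sends on this (case 1)

	close(quit) // in real code, closed when the work is abandoned
	fmt.Println(sendOrQuit(full, 2, quit)) // false: buffer full, quit closed
	_, ok := recvOrQuit(empty, quit)
	fmt.Println(ok) // false: no sender, quit closed
}
```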

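Leaks caused by a select whose cases all block

At bottom this is still a channel problem: select...case only works with channels, and every case must be a communication operation, either a send or a receive.

select runs one ready case, chosen at random. If no case is ready, it blocks until one is (unless the select has a default clause).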
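Both properties are easy to observe in a small sketch (the function names are illustrative assumptions). In `countChoices`, two closed channels are always ready to receive, so `select` keeps picking between them pseudo-randomly and both counters end up non-zero. `trySend` shows the default clause: it makes the `select` non-blocking, so a send that nobody will receive returns false instead of parking the goroutine forever.

```go
package main

import "fmt"

// countChoices runs a select n times with both cases always ready and
// returns how often each case was chosen.
func countChoices(n int) (a, b int) {
	ca := make(chan struct{})
	cb := make(chan struct{})
	close(ca) // a closed channel is always ready to receive from
	close(cb)
	for i := 0; i < n; i++ {
		select {
		case <-ca:
			a++
		case <-cb:
			b++
		}
	}
	return a, b
}

// trySend sends v on ch only if that can happen immediately.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default: // no receiver and no buffer space: give up instead of blocking
		return false
	}
}

func main() {
	a, b := countChoices(1000)
	fmt.Println("first case:", a, "second case:", b)

	ch := make(chan int) // unbuffered, with no receiver
	fmt.Println("sent:", trySend(ch, 1))
}
```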

See "Four Major Uses of select in Golang".

Case 4, a select whose cases all block:

package main
import (
  "fmt"
  "runtime"
  "time"
)
func fibonacci(c chan int) {
  fmt.Println("entered goroutine, starting computation")
  x, y := 0, 1
  for {
    select {
    case c <- x:
      x, y = y, x+y
    }
  }
}
func deal4() {
  c := make(chan int)
  go fibonacci(c)
  for i := 0; i < 10; i++ {
    fmt.Println(<-c)
  }
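  // After 10 iterations nothing reads from c any more, so the only case in fibonacci's
  // select can never proceed. The select blocks forever, and the goroutine is not
  // released even after deal4 returns.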
}
func main() {
  fmt.Println("goroutine count at start:", runtime.NumGoroutine())
  deal4()
  time.Sleep(3 * time.Second)
  fmt.Println("goroutine count at end:", runtime.NumGoroutine())
}


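Solution:

When a separate goroutine performs some work and you need to be able to stop it from outside, there are two common approaches:

a. Also pass in a quit channel that controls the goroutine's exit and include it in the select. When the goroutine should exit, close the quit channel and the goroutine returns.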

package main
import (
  "fmt"
  "runtime"
  "time"
)
func fibonacci(c, quit chan int) {
  fmt.Println("entered goroutine, starting computation")
  x, y := 0, 1
  for {
    select {
    case c <- x:
      x, y = y, x+y
    case v, ok := <-quit:
      fmt.Printf("received quit signal (value %d, ok=%v)\n", v, ok)
      return
    }
  }
}
func deal4() {
  c := make(chan int)
  quit := make(chan int)
  go fibonacci(c, quit)
  for i := 0; i < 10; i++ {
    fmt.Println(<-c)
  }
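  // After 10 iterations nothing reads from c any more, so without quit the only case in
  // fibonacci's select could never proceed and the goroutine would outlive deal4.
  // Receiving from a closed channel never blocks: it immediately yields the element type's
  // zero value (with ok == false), so the quit case fires and fibonacci returns.
  fmt.Println("goroutine count before close:", runtime.NumGoroutine())
  close(quit)
  time.Sleep(time.Second)
  fmt.Println("goroutine count after close:", runtime.NumGoroutine())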
}
func main() {
  fmt.Println("goroutine count at start:", runtime.NumGoroutine())
  deal4()
  time.Sleep(3 * time.Second)
  fmt.Println("goroutine count at end:", runtime.NumGoroutine())
}


b. Use WithCancel from the context package; see "Using context.WithCancel()".

Also worth reading: "Pitfalls of Combining time.After with select".

package main
import (
  "context"
  "fmt"
  "runtime"
  "time"
)
func fibonacci(ctx context.Context, c chan int) {
  fmt.Println("entered goroutine, starting computation")
  x, y := 0, 1
  for {
    select {
    case c <- x:
      x, y = y, x+y
    case <-ctx.Done():
      fmt.Printf("received cancellation signal: %v\n", ctx.Err())
      return
    }
  }
}
func deal4() {
  ctx, cancel := context.WithCancel(context.Background())
  defer cancel() // good practice; calling cancel more than once is harmless
  c := make(chan int)
  go fibonacci(ctx, c)
  for i := 0; i < 10; i++ {
    fmt.Println(<-c)
  }
  // After 10 iterations nothing reads from c any more; without cancellation, fibonacci's
  // select would wait on the send forever and the goroutine would outlive deal4.
  // Calling cancel() closes ctx.Done(), so the second case fires and fibonacci returns.
  fmt.Println("goroutine count before cancel:", runtime.NumGoroutine())
  cancel()
  time.Sleep(time.Second)
  fmt.Println("goroutine count after cancel:", runtime.NumGoroutine())
}
func main() {
  fmt.Println("goroutine count at start:", runtime.NumGoroutine())
  deal4()
  time.Sleep(3 * time.Second)
  fmt.Println("goroutine count at end:", runtime.NumGoroutine())
}


For more on using goleak and a brief look at its source code, see "Stay Away from P0 Production Incidents: A Tool That Detects Go Leaks in Advance".
