Kubernetes version: 1.13.2
Background
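A Redis cluster created by an operator was unexpectedly deleted after the Kubernetes apiserver restarted (both the redis exporter StatefulSet and the redis StatefulSet). After the deletion the operator recreated the StatefulSets and tried to re-form the cluster, but the instance IPs had changed (for containerized middleware we developed a fixed-IP feature, and when a StatefulSet is deleted its IPs are reclaimed). Cluster creation therefore failed and the cluster ended up unavailable.
We reproduced the problem several times. After an apiserver restart, the redis operator logs showed no sign that the operator itself had deleted the Redis cluster (redis StatefulSet) or the monitoring instance (redis exporter). We then set the kube-controller-manager log level to --v=5, reproduced the problem again, and finally found entries in the kube-controller-manager log showing that the garbage collector had triggered the deletion. The problem does not occur while the apiserver is healthy, so to get to the bottom of it we have to look at the logic of the garbage collector, a controller built into kube-controller-manager.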
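Because the material is long, it is split into several parts:
①. monitors act as producers and put changed resources onto the graphChanges queue; meanwhile restMapper periodically checks the resource types in the cluster and refreshes the monitors.
②. runProcessGraphChanges takes changed items off the graphChanges queue and, depending on the case, puts them onto the attemptToDelete queue; runAttemptToDeleteWorker takes them off and handles the garbage resources.
③. runProcessGraphChanges takes changed items off the graphChanges queue and, depending on the case, puts them onto the attemptToOrphan queue; runAttemptToOrphanWorker takes them off and handles the orphaned resources.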
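The producer/consumer flow outlined above can be sketched with plain Go channels. This is only a minimal illustration under simplifying assumptions: the event type, the eventKind values, and runPipeline are hypothetical names invented here, and the real garbage collector uses rate-limited work queues and a dependency graph rather than bare channels.

```go
package main

import "fmt"

// eventKind is a hypothetical stand-in for what a monitor reports about an object.
type eventKind int

const (
	ownerGone       eventKind = iota // the owner no longer exists: garbage candidate
	orphanRequested                  // dependents should be orphaned, not deleted
	unchanged                        // nothing for the GC to do
)

// event is a simplified item on the graphChanges queue.
type event struct {
	name string
	kind eventKind
}

// runPipeline pushes events through the three stages and reports which names
// reached the delete worker and which reached the orphan worker.
func runPipeline(events []event) (deleted, orphaned []string) {
	graphChanges := make(chan event)
	// Buffered so the router never blocks before the workers start.
	attemptToDelete := make(chan string, len(events))
	attemptToOrphan := make(chan string, len(events))

	// Producer: plays the role of the monitors.
	go func() {
		for _, e := range events {
			graphChanges <- e
		}
		close(graphChanges)
	}()

	// Router: plays the role of runProcessGraphChanges.
	for e := range graphChanges {
		switch e.kind {
		case ownerGone:
			attemptToDelete <- e.name
		case orphanRequested:
			attemptToOrphan <- e.name
		}
	}
	close(attemptToDelete)
	close(attemptToOrphan)

	// Workers: play the roles of runAttemptToDeleteWorker and runAttemptToOrphanWorker.
	for name := range attemptToDelete {
		deleted = append(deleted, name)
	}
	for name := range attemptToOrphan {
		orphaned = append(orphaned, name)
	}
	return deleted, orphaned
}

func main() {
	deleted, orphaned := runPipeline([]event{
		{"redis-statefulset", ownerGone},
		{"redis-exporter", orphanRequested},
		{"some-configmap", unchanged},
	})
	fmt.Println("delete queue:", deleted)
	fmt.Println("orphan queue:", orphaned)
}
```

The object names in main are made up for the example; the point is only that one router feeds two independent worker queues.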
Main text
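To enable GC, set --enable-garbage-collector to true in the startup flags of both kube-apiserver and kube-controller-manager. In version 1.13.2, GC is enabled by default.
Note: the flag must be kept in sync between the two components.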
The kube-controller-manager entry point: app.NewControllerManagerCommand() loads the controller manager's default startup parameters and creates a *cobra.Command object:
func main() {
	rand.Seed(time.Now().UnixNano())

	// Load the controller manager's default startup parameters and create a *cobra.Command object.
	command := app.NewControllerManagerCommand()

	// ...... omitted ......

	// Execute the cobra.Command and start controller-manager.
	if err := command.Execute(); err != nil {
		fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
This is where kube-controller-manager is started:
NewDefaultComponentConfig(ports.InsecureKubeControllerManagerPort) loads the configuration of each controller:
// NewKubeControllerManagerOptions creates a new KubeControllerManagerOptions with the default configuration.
func NewKubeControllerManagerOptions() (*KubeControllerManagerOptions, error) {
	// Load the default configuration of each controller.
	componentConfig, err := NewDefaultComponentConfig(ports.InsecureKubeControllerManagerPort)
	if err != nil {
		return nil, err
	}

	s := KubeControllerManagerOptions{
		Generic: cmoptions.NewGenericControllerManagerConfigurationOptions(componentConfig.Generic),
		// ..... omitted
		GarbageCollectorController: &GarbageCollectorControllerOptions{
			ConcurrentGCSyncs:      componentConfig.GarbageCollectorController.ConcurrentGCSyncs,
			EnableGarbageCollector: componentConfig.GarbageCollectorController.EnableGarbageCollector,
		},
		// ..... omitted
	}

	// List of resources that the GC ignores.
	gcIgnoredResources := make([]kubectrlmgrconfig.GroupResource, 0, len(garbagecollector.DefaultIgnoredResources()))
	for r := range garbagecollector.DefaultIgnoredResources() {
		gcIgnoredResources = append(gcIgnoredResources, kubectrlmgrconfig.GroupResource{Group: r.Group, Resource: r.Resource})
	}
	s.GarbageCollectorController.GCIgnoredResources = gcIgnoredResources

	return &s, nil
}
// NewDefaultComponentConfig returns the kube-controller-manager configuration object.
func NewDefaultComponentConfig(insecurePort int32) (kubectrlmgrconfig.KubeControllerManagerConfiguration, error) {
	scheme := runtime.NewScheme()
	if err := kubectrlmgrschemev1alpha1.AddToScheme(scheme); err != nil {
		return kubectrlmgrconfig.KubeControllerManagerConfiguration{}, err
	}
	if err := kubectrlmgrconfig.AddToScheme(scheme); err != nil {
		return kubectrlmgrconfig.KubeControllerManagerConfiguration{}, err
	}

	versioned := kubectrlmgrconfigv1alpha1.KubeControllerManagerConfiguration{}
	// Load the default parameters.
	scheme.Default(&versioned)

	internal := kubectrlmgrconfig.KubeControllerManagerConfiguration{}
	if err := scheme.Convert(&versioned, &internal, nil); err != nil {
		return internal, err
	}
	internal.Generic.Port = insecurePort
	return internal, nil
}
// Default applies the registered default values for the given Object.
func (s *Scheme) Default(src Object) {
	if fn, ok := s.defaulterFuncs[reflect.TypeOf(src)]; ok {
		fn(src)
	}
}
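s.defaulterFuncs has type map[reflect.Type]func(interface{}) and is used to look up the defaulting function for a given pointer type. Where does the data in this map come from?
The code is in src\k8s.io\kubernetes\pkg\controller\apis\config\v1alpha1\zz_generated.defaults.go
The defaults show that the garbage collector is enabled by default (EnableGarbageCollector) and that its concurrency is 20 (ConcurrentGCSyncs):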
func SetDefaults_GarbageCollectorControllerConfiguration(obj *kubectrlmgrconfigv1alpha1.GarbageCollectorControllerConfiguration) {
	if obj.EnableGarbageCollector == nil {
		obj.EnableGarbageCollector = utilpointer.BoolPtr(true)
	}
	if obj.ConcurrentGCSyncs == 0 {
		obj.ConcurrentGCSyncs = 20
	}
}
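The lookup-by-type idea behind s.defaulterFuncs can be tried out with the standard library alone. The sketch below is an illustration, not the apimachinery implementation: registry, addTypeDefaultingFunc, applyDefaults, and gcConfig are hypothetical names, and gcConfig is a trimmed-down stand-in for the real configuration struct.

```go
package main

import (
	"fmt"
	"reflect"
)

// gcConfig is a hypothetical, trimmed-down stand-in for
// GarbageCollectorControllerConfiguration.
type gcConfig struct {
	EnableGarbageCollector *bool
	ConcurrentGCSyncs      int32
}

// registry maps a pointer type to the function that fills in its defaults,
// in the same spirit as Scheme.defaulterFuncs.
type registry struct {
	defaulterFuncs map[reflect.Type]func(interface{})
}

func newRegistry() *registry {
	return &registry{defaulterFuncs: map[reflect.Type]func(interface{}){}}
}

// addTypeDefaultingFunc registers fn under the dynamic type of srcType.
func (r *registry) addTypeDefaultingFunc(srcType interface{}, fn func(interface{})) {
	r.defaulterFuncs[reflect.TypeOf(srcType)] = fn
}

// applyDefaults runs the defaulter registered for src's type, if there is one.
func (r *registry) applyDefaults(src interface{}) {
	if fn, ok := r.defaulterFuncs[reflect.TypeOf(src)]; ok {
		fn(src)
	}
}

// setGCDefaults mirrors the defaulting rules quoted above: only unset fields
// are filled in.
func setGCDefaults(obj *gcConfig) {
	if obj.EnableGarbageCollector == nil {
		enabled := true
		obj.EnableGarbageCollector = &enabled
	}
	if obj.ConcurrentGCSyncs == 0 {
		obj.ConcurrentGCSyncs = 20
	}
}

func main() {
	r := newRegistry()
	r.addTypeDefaultingFunc(&gcConfig{}, func(obj interface{}) {
		setGCDefaults(obj.(*gcConfig))
	})

	cfg := &gcConfig{}
	r.applyDefaults(cfg)
	fmt.Println(*cfg.EnableGarbageCollector, cfg.ConcurrentGCSyncs) // prints "true 20"
}
```

Because the map key is the pointer type (*gcConfig), passing a value instead of a pointer would find no defaulter and leave the object untouched.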
Back in the Run function, NewControllerInitializers is called to start all controllers:
The focus now is startGarbageCollectorController, the function that starts the garbage collector:
func startGarbageCollectorController(ctx ControllerContext) (http.Handler, bool, error) {
	// Defaults to true in k8s 1.13.2. It can be changed by adding
	// --enable-garbage-collector=false to the startup flags of kube-apiserver
	// and kube-controller-manager; the value must be the same in both components.
	if !ctx.ComponentConfig.GarbageCollectorController.EnableGarbageCollector {
		return nil, false, nil
	}

	// Client set for the native k8s resource objects (built with
	// SimpleControllerClientBuilder under the default startup parameters).
	gcClientset := ctx.ClientBuilder.ClientOrDie("generic-garbage-collector")
	discoveryClient := cacheddiscovery.NewMemCacheClient(gcClientset.Discovery())

	// Generate the rest config.
	config := ctx.ClientBuilder.ConfigOrDie("generic-garbage-collector")
	dynamicClient, err := dynamic.NewForConfig(config)
	if err != nil {
		return nil, true, err
	}

	// Get an initial set of deletable resources to prime the garbage collector.
	deletableResources := garbagecollector.GetDeletableResources(discoveryClient)
	ignoredResources := make(map[schema.GroupResource]struct{})

	// Resource types that the GC ignores.
	for _, r := range ctx.ComponentConfig.GarbageCollectorController.GCIgnoredResources {
		ignoredResources[schema.GroupResource{Group: r.Group, Resource: r.Resource}] = struct{}{}
	}
	garbageCollector, err := garbagecollector.NewGarbageCollector(
		dynamicClient,
		ctx.RESTMapper,
		deletableResources,
		ignoredResources,
		ctx.InformerFactory,
		ctx.InformersStarted,
	)
	if err != nil {
		return nil, true, fmt.Errorf("Failed to start the generic garbage collector: %v", err)
	}

	// Start the garbage collector.
	// The default startup parameters give 20 goroutines.
	workers := int(ctx.ComponentConfig.GarbageCollectorController.ConcurrentGCSyncs)
	// Start the monitors, the deleteWorkers, and the orphanWorkers.
	go garbageCollector.Run(workers, ctx.Stop)

	// Periodically refresh the RESTMapper with new discovery information and sync
	// the garbage collector.
	go garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop)

	// The GC exposes a debug endpoint that serves the dependency graph in DOT format.
	return garbagecollector.NewDebugHandler(garbageCollector), true, nil
}
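This function mainly does the following:
1. deletableResources := garbagecollector.GetDeletableResources(discoveryClient) gets all deletable resource types in the cluster; the ignored resource types are then excluded.
2. It builds the garbageCollector struct.
3. garbageCollector.Run(workers, ctx.Stop) starts the monitors that watch for changes to resource objects (the changes are handled by the endless loop in runProcessGraphChanges), plus, by default, 20 deleteWorkers goroutines that process deletable objects and 20 orphanWorkers goroutines that process orphaned objects.
4. garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop) periodically checks whether new resource types have been added to the cluster and refreshes the monitors so that they also watch the new types.
5. garbagecollector.NewDebugHandler(garbageCollector) registers a debug endpoint that serves the dependency graph in DOT format: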
curl http://127.0.0.1:10252/debug/controllers/garbagecollector/graph?uid=11211212edsaddkqedmk12
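Steps 1 and 3 of the list above boil down to a set lookup and a fixed-size pool of goroutines. The sketch below shows both under simplifying assumptions: groupResource stands in for schema.GroupResource, filterIgnored and runWorkers are hypothetical helpers, and the resource names and queue contents are invented for the example. The real collector drains rate-limited work queues until its stop channel closes, which is reduced here to draining a closed channel.

```go
package main

import (
	"fmt"
	"sync"
)

// groupResource is a simplified stand-in for schema.GroupResource.
type groupResource struct {
	Group    string
	Resource string
}

// filterIgnored returns the deletable resources that are not in the ignored set.
func filterIgnored(deletable []groupResource, ignored map[groupResource]struct{}) []groupResource {
	var monitored []groupResource
	for _, gr := range deletable {
		if _, skip := ignored[gr]; skip {
			continue
		}
		monitored = append(monitored, gr)
	}
	return monitored
}

// runWorkers starts n goroutines that drain queue until it is closed and
// returns the total number of items they processed.
func runWorkers(n int, queue <-chan string) int {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		processed int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return processed
}

func main() {
	// Example data only; not the actual default ignore list.
	ignored := map[groupResource]struct{}{
		{Group: "", Resource: "events"}: {},
	}
	deletable := []groupResource{
		{Group: "apps", Resource: "statefulsets"},
		{Group: "", Resource: "events"},
		{Group: "", Resource: "pods"},
	}
	fmt.Println(filterIgnored(deletable, ignored))

	queue := make(chan string, 3)
	for _, item := range []string{"pod-a", "pod-b", "pod-c"} {
		queue <- item
	}
	close(queue)
	fmt.Println("processed:", runWorkers(20, queue))
}
```

Using map[groupResource]struct{} as a set mirrors the ignoredResources map in startGarbageCollectorController: the empty struct value carries no data, so only key membership matters.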
The DOT data returned by this endpoint can be converted to an SVG image with dot.exe from graphviz and viewed in Google Chrome.
// curl http://127.0.0.1:10252/debug/controllers/garbagecollector/graph?uid=11211212edsaddkqedmk12
func (h *debugHTTPHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	if req.URL.Path != "/graph" {
		http.Error(w, "", http.StatusNotFound)
		return
	}

	var graph graph.Directed
	if uidStrings := req.URL.Query()["uid"]; len(uidStrings) > 0 {
		uids := []types.UID{}
		for _, uidString := range uidStrings {
			uids = append(uids, types.UID(uidString))
		}
		graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraphForObj(uids...)
	} else {
		graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraph()
	}

	// Generate the DOT graph data. dot.exe from graphviz converts it to an SVG
	// image (open it in Google Chrome) or a PNG image.
	// API reference: https://godoc.org/gonum.org/v1/gonum/graph
	// graphviz download: https://graphviz.gitlab.io/_pages/Download/Download_windows.html
	// dot.exe test.dot -T svg -o test.svg
	data, err := dot.Marshal(graph, "full", "", " ", false)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Write(data)
	w.WriteHeader(http.StatusOK)
}
Kubernetes GarbageCollector source code analysis (Part 1) (2)