How Workloads Are Associated with Pods

In Kubernetes, the workloads discussed here are three resources: Deployment, StatefulSet, and DaemonSet:

  • A Deployment manages stateless Pods, indirectly through ReplicaSets;
  • A StatefulSet manages stateful Pods;
  • A DaemonSet runs its Pods on every node in the cluster.

How do we determine which Pods are managed by which workload? How are these Pods associated with their workload?

Deployment and Pods

Anyone somewhat familiar with Kubernetes knows that a Deployment manages Pods through a ReplicaSet. The management and scheduling flow of a Pod is shown in the figure below:

(Figure: Deployment processing flow)

Below is a Deployment from a Helm installation of Chaos Mesh:

Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chaos-dashboard
  namespace: chaos-testing
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: chaos-dashboard
      app.kubernetes.io/instance: chaos-mesh
      app.kubernetes.io/name: chaos-mesh
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: chaos-dashboard
        app.kubernetes.io/instance: chaos-mesh
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: chaos-mesh
        app.kubernetes.io/part-of: chaos-mesh
        app.kubernetes.io/version: 2.2.0
        helm.sh/chart: chaos-mesh-2.2.0
    spec:
      containers:
        - command:
            - /usr/local/bin/chaos-dashboard
          env:
            - name: CLEAN_SYNC_PERIOD
              value: 12h
          image: 'chaos-mesh/chaos-dashboard:v2.2.0'
          imagePullPolicy: IfNotPresent
          name: chaos-dashboard
          ports:
            - containerPort: 2333
              name: http
              protocol: TCP
            - containerPort: 2334
              name: metric
              protocol: TCP
          resources:
            requests:
              cpu: 25m
              memory: 256Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /data
              name: storage-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: chaos-dashboard
      serviceAccountName: chaos-dashboard
      terminationGracePeriodSeconds: 30
      volumes:
        - emptyDir: {}
          name: storage-volume
```

In the DeploymentController code, you can see that ReplicaSets are associated via the Deployment's .spec.selector:

pkg/controller/deployment/deployment_controller.go:516

```go
func (dc *DeploymentController) getReplicaSetsForDeployment(ctx context.Context, d *apps.Deployment) ([]*apps.ReplicaSet, error) {
	// List all ReplicaSets to find those we own but that no longer match our
	// selector. They will be orphaned by ClaimReplicaSets().
	rsList, err := dc.rsLister.ReplicaSets(d.Namespace).List(labels.Everything())
	if err != nil {
		return nil, err
	}
	deploymentSelector, err := metav1.LabelSelectorAsSelector(d.Spec.Selector)
	if err != nil {
		return nil, fmt.Errorf("deployment %s/%s has invalid label selector: %v", d.Namespace, d.Name, err)
	}
	// If any adoptions are attempted, we should first recheck for deletion with
	// an uncached quorum read sometime after listing ReplicaSets (see #42639).
	canAdoptFunc := controller.RecheckDeletionTimestamp(func(ctx context.Context) (metav1.Object, error) {
		fresh, err := dc.client.AppsV1().Deployments(d.Namespace).Get(ctx, d.Name, metav1.GetOptions{})
		if err != nil {
			return nil, err
		}
		if fresh.UID != d.UID {
			return nil, fmt.Errorf("original Deployment %v/%v is gone: got uid %v, wanted %v", d.Namespace, d.Name, fresh.UID, d.UID)
		}
		return fresh, nil
	})
	cm := controller.NewReplicaSetControllerRefManager(dc.rsControl, d, deploymentSelector, controllerKind, canAdoptFunc)
	return cm.ClaimReplicaSets(ctx, rsList)
}
```

In the ReplicaSetController, Pods are associated via the ReplicaSet's .spec.selector:

pkg/controller/replicaset/replica_set.go:679

```go
func (rsc *ReplicaSetController) syncReplicaSet(ctx context.Context, key string) error {
	// ... omitted
	selector, err := metav1.LabelSelectorAsSelector(rs.Spec.Selector)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("error converting pod selector to selector for rs %v/%v: %v", namespace, name, err))
		return nil
	}

	// list all pods to include the pods that don't match the rs`s selector
	// anymore but has the stale controller ref.
	// TODO: Do the List and Filter in a single pass, or use an index.
	allPods, err := rsc.podLister.Pods(rs.Namespace).List(labels.Everything())
	if err != nil {
		return err
	}
	// Ignore inactive pods.
	filteredPods := controller.FilterActivePods(allPods)

	// NOTE: filteredPods are pointing to objects from cache - if you need to
	// modify them, you need to copy it first.
	filteredPods, err = rsc.claimPods(ctx, rs, selector, filteredPods)
	// ... omitted
}
```

The ReplicaSet object looks like this:

ReplicaSet

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  annotations:
    # ... omitted
  creationTimestamp: "2022-09-15T02:55:01Z"
  generation: 3
  labels:
    app.kubernetes.io/component: chaos-dashboard
    app.kubernetes.io/instance: chaos-mesh
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: chaos-mesh
    app.kubernetes.io/part-of: chaos-mesh
    app.kubernetes.io/version: 2.2.0
    helm.sh/chart: chaos-mesh-2.2.0
    pod-template-hash: 5f97f6658f
  name: chaos-dashboard-5f97f6658f
  namespace: chaos-testing
  ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: Deployment
      name: chaos-dashboard
      uid: 5d58936a-7623-4129-9309-a49764400901
  resourceVersion: "1294668979"
  uid: 2578a179-3262-48bf-8f73-c9b7efa3f70e
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: chaos-dashboard
      app.kubernetes.io/instance: chaos-mesh
      app.kubernetes.io/name: chaos-mesh
      pod-template-hash: 5f97f6658f
  template:
    # ... omitted
```

Compared with the Deployment's selector, the ReplicaSet's .spec.selector carries one extra label, pod-template-hash. This label distinguishes the Pods managed by different ReplicaSets.

ReplicaSet versions are distinguished by spec.template: the content of .spec.template is hashed, and the hash value tells the versions apart. When a Deployment modifies its own .spec.template, a new ReplicaSet is created from the new content and a rolling update is performed.

The corresponding Pod looks like this:

Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-05-01T17:09:03Z"
  generateName: chaos-dashboard-5f97f6658f-
  labels:
    app.kubernetes.io/component: chaos-dashboard
    app.kubernetes.io/instance: chaos-mesh
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: chaos-mesh
    app.kubernetes.io/part-of: chaos-mesh
    app.kubernetes.io/version: 2.2.0
    helm.sh/chart: chaos-mesh-2.2.0
    pod-template-hash: 5f97f6658f
  name: chaos-dashboard-5f97f6658f-fjqvp
  namespace: chaos-testing
  ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: chaos-dashboard-5f97f6658f
      uid: 2578a179-3262-48bf-8f73-c9b7efa3f70e
  resourceVersion: "1294668970"
  uid: db800641-28d9-4f15-ac0a-41ce7fec419c
spec:
  containers:
    # ... omitted
```

Note the ownerReferences field: it points to the resource that manages the current object, which is very helpful for looking up the chain from a Pod to its workload.

StatefulSet and Pods

Unlike a Deployment, a StatefulSet has no intermediate layer like ReplicaSet; it is associated with its Pods directly.

A StatefulSet uses its .spec.selector to associate with Pods directly, as the code shows:

pkg/controller/statefulset/stateful_set.go:445

```go
func (ssc *StatefulSetController) sync(ctx context.Context, key string) error {
	// ... omitted

	selector, err := metav1.LabelSelectorAsSelector(set.Spec.Selector)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("error converting StatefulSet %v selector: %v", key, err))
		// This is a non-transient error, so don't retry.
		return nil
	}

	if err := ssc.adoptOrphanRevisions(ctx, set); err != nil {
		return err
	}

	pods, err := ssc.getPodsForStatefulSet(ctx, set, selector)
	if err != nil {
		return err
	}

	return ssc.syncStatefulSet(ctx, set, pods)
}
```

DaemonSet and Pods

A DaemonSet works the same way as a StatefulSet, using its .spec.selector field to associate with Pods directly:

pkg/controller/daemon/daemon_controller.go:719

```go
func (dsc *DaemonSetsController) getDaemonPods(ctx context.Context, ds *apps.DaemonSet) ([]*v1.Pod, error) {
	selector, err := metav1.LabelSelectorAsSelector(ds.Spec.Selector)
	if err != nil {
		return nil, err
	}

	// List all pods to include those that don't match the selector anymore but
	// have a ControllerRef pointing to this controller.
	pods, err := dsc.podLister.Pods(ds.Namespace).List(labels.Everything())
	if err != nil {
		return nil, err
	}
	// If any adoptions are attempted, we should first recheck for deletion with
	// an uncached quorum read sometime after listing Pods (see #42639).
	dsNotDeleted := controller.RecheckDeletionTimestamp(func(ctx context.Context) (metav1.Object, error) {
		fresh, err := dsc.kubeClient.AppsV1().DaemonSets(ds.Namespace).Get(ctx, ds.Name, metav1.GetOptions{})
		if err != nil {
			return nil, err
		}
		if fresh.UID != ds.UID {
			return nil, fmt.Errorf("original DaemonSet %v/%v is gone: got uid %v, wanted %v", ds.Namespace, ds.Name, fresh.UID, ds.UID)
		}
		return fresh, nil
	})

	// Use ControllerRefManager to adopt/orphan as needed.
	cm := controller.NewPodControllerRefManager(dsc.podControl, ds, selector, controllerKind, dsNotDeleted)
	return cm.ClaimPods(ctx, pods)
}
```

Summary

In Kubernetes, there is no way to query the managed Pods directly by a Deployment, StatefulSet, or DaemonSet name. You must first fetch the workload object, then query Pods using the .spec.selector found in that object.

Because a Deployment has ReplicaSet as an intermediate layer for managing rollout versions, querying with the Deployment's .spec.selector returns the Pods of all of its ReplicaSet versions; to narrow down to the Pods of a single version, you need to query with that ReplicaSet's .spec.selector.

Author

Jakes Lee

Published on 2023-05-04, updated on 2023-05-04