<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>iMoe Tech</title>
  
  <subtitle>Standing in the sunlight, enjoying my fleeting youth~</subtitle>
  <link href="https://blog.imoe.tech/atom.xml" rel="self"/>
  
  <link href="https://blog.imoe.tech/"/>
  <updated>2025-07-15T14:15:13.083Z</updated>
  <id>https://blog.imoe.tech/</id>
  
  <author>
    <name>Jakes Lee</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>Kubernetes Operator Versioning</title>
    <link href="https://blog.imoe.tech/2025/06/23/operator-versioning/"/>
    <id>https://blog.imoe.tech/2025/06/23/operator-versioning/</id>
    <published>2025-06-23T13:39:33.000Z</published>
    <updated>2025-07-15T14:15:13.083Z</updated>
    
    <content type="html"><![CDATA[<p>Upgrading an Operator is mostly about upgrading the CRD objects it defines and the data already stored for them.</p><p>A new Operator version may need to change the CRD schema, and as the Operator matures its CRD is usually promoted to a more stable version number. Data written before the upgrade then has to stay compatible, and if the CRD schema changed, that data also has to be migrated.</p><p>Before getting into that, let's look at how Kubernetes defines CRD versions.</p><span id="more"></span><h2 id="Kubernetes-中-CRD-的版本化"><a href="#Kubernetes-中-CRD-的版本化" class="headerlink" title="CRD Versioning in Kubernetes"></a>CRD Versioning in Kubernetes</h2><p>In Kubernetes, the versions of a <code>CustomResourceDefinition</code> are declared in <code>spec.versions</code>, which can list several versions at once. The fields look like this:</p><figure class="highlight yaml"><figcaption><span>Versions declared in a CRD</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apiextensions.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">CustomResourceDefinition</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">crontabs.example.com</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">group:</span> <span class="string">example.com</span></span><br><span class="line">  <span class="attr">names:</span></span><br><span class="line">    <span class="attr">plural:</span> <span class="string">crontabs</span></span><br><span class="line">    <span class="attr">singular:</span> <span 
class="string">crontab</span></span><br><span class="line">    <span class="attr">kind:</span> <span class="string">CronTab</span></span><br><span class="line">  <span class="attr">scope:</span> <span class="string">Namespaced</span></span><br><span class="line">  <span class="attr">versions:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">v1alpha1</span></span><br><span class="line">    <span class="attr">served:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">storage:</span> <span class="literal">false</span></span><br><span class="line">    <span class="attr">deprecated:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">deprecationWarning:</span> <span class="string">&quot;example.com/v1alpha1 CronTab is deprecated; see http://example.com/v1alpha1-v1 for instructions to migrate to example.com/v1 CronTab&quot;</span></span><br><span class="line">    </span><br><span class="line">    <span class="attr">schema:</span> <span class="string">...</span></span><br></pre></td></tr></table></figure><p>Field reference:</p><ul><li><code>served</code>: whether the version is served; if <code>false</code>, requests to that version return 404;</li><li><code>storage</code>: whether this is the storage version; a CRD has exactly one storage version, and objects in other versions are converted to it;</li><li><code>deprecated</code>: whether the version is deprecated;</li><li><code>deprecationWarning</code>: the deprecation warning; the API server returns it whenever the deprecated version is used.</li></ul><p>When several versions exist, exactly one can be the storage version and all others must set <code>storage</code> to <code>false</code>. A conversion endpoint also has to be configured so that the API server knows how to convert between versions:</p><figure class="highlight yaml"><figcaption><span>CRD version conversion configuration</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apiextensions.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">CustomResourceDefinition</span></span><br><span class="line"><span class="string">...</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="string">...</span></span><br><span class="line">  <span class="attr">conversion:</span></span><br><span class="line">    <span class="attr">strategy:</span> <span class="string">Webhook</span></span><br><span class="line">    <span class="attr">webhook:</span></span><br><span class="line">      <span class="attr">conversionReviewVersions:</span> [<span class="string">&quot;v1&quot;</span>, <span class="string">&quot;v1beta1&quot;</span>]</span><br><span class="line">      <span class="attr">clientConfig:</span></span><br><span class="line">        <span class="attr">service:</span></span><br><span class="line">          <span class="attr">namespace:</span> <span class="string">my-service-namespace</span></span><br><span class="line">          <span class="attr">name:</span> <span class="string">my-service-name</span></span><br><span class="line">          <span class="attr">path:</span> <span class="string">/my-path</span></span><br><span class="line">          <span class="attr">port:</span> <span class="number">1234</span></span><br><span class="line">        <span class="attr">caBundle:</span> <span class="string">&quot;Ci0tLS0tQk...&lt;base64-encoded PEM bundle&gt;...tLS0K&quot;</span></span><br></pre></td></tr></table></figure><h3 id="理解存储版本（Stored-Version）"><a href="#理解存储版本（Stored-Version）" class="headerlink" title="Understanding the Storage Version"></a>Understanding the Storage Version</h3><p>The storage version is the version that is persisted in etcd. For any given CRD, only one version is written to etcd.</p><p>The Kubernetes API server can nevertheless serve several versions at once, which relies on the conversion endpoint. When a client works with a non-storage version, the API server calls the conversion endpoint to convert the object before returning or storing it.</p><p>Clients such as kubectl use the discovery API to fetch a resource's version list and decide which version to request. For CRDs, kubectl <a href="https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#version-priority">picks the newest</a> stable version.</p><p>To request a non-default version, name it explicitly:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">kubectl get resource.version.group</span><br><span class="line"><span class="comment"># for example</span></span><br><span class="line">kubectl get cronjobs.v1.batch.tutorial.kubebuilder.io -o yaml</span><br></pre></td></tr></table></figure><h3 id="转换-API-的请求响应结构"><a href="#转换-API-的请求响应结构" class="headerlink" title="Request and Response Format of the Conversion API"></a>Request and Response Format of the Conversion API</h3><p>The <code>conversionReviewVersions</code> list declares which <code>ConversionReview</code> versions the webhook understands. When a conversion is needed, the API server sends the webhook a <code>ConversionReview</code> object, using the first version in that list that it supports:</p><figure class="highlight json"><figcaption><span>Conversion request body</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;apiVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;apiextensions.k8s.io/v1&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ConversionReview&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;request&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    # Random uid uniquely identifying this conversion call</span><br><span class="line">    <span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;705ab4f5-6393-11e8-b7cc-42010a800002&quot;</span><span class="punctuation">,</span></span><br><span class="line">    </span><br><span class="line">    # The API group and version the objects should be converted to</span><br><span class="line">    <span class="attr">&quot;desiredAPIVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;example.com/v1&quot;</span><span class="punctuation">,</span></span><br><span class="line">    </span><br><span class="line">    # The list of objects to convert.</span><br><span class="line">    # May contain one or more objects<span class="punctuation">,</span> in one or more versions.</span><br><span class="line">    <span class="attr">&quot;objects&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">      <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;CronTab&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;apiVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;example.com/v1beta1&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;metadata&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">          <span class="attr">&quot;creationTimestamp&quot;</span><span class="punctuation">:</span> <span class="string">&quot;2019-09-04T14:03:02Z&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;name&quot;</span><span class="punctuation">:</span> <span class="string">&quot;local-crontab&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;namespace&quot;</span><span class="punctuation">:</span> <span class="string">&quot;default&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;resourceVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;143&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;3415a7fc-162b-4300-b5da-fd6083580d66&quot;</span></span><br><span class="line">        <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;hostPort&quot;</span><span class="punctuation">:</span> <span class="string">&quot;localhost:1234&quot;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">]</span></span><br><span class="line">  <span class="punctuation">&#125;</span></span><br><span 
class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>On success, the webhook should respond with something like the following:</p><figure class="highlight json"><figcaption><span>Conversion response</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;apiVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;apiextensions.k8s.io/v1&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ConversionReview&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;response&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    # must match &lt;request.uid&gt;</span><br><span class="line">    <span class="attr">&quot;uid&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;705ab4f5-6393-11e8-b7cc-42010a800002&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;result&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">      <span class="attr">&quot;status&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Success&quot;</span></span><br><span class="line">    <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">    # Objects must match the order of request.objects<span class="punctuation">,</span> and have apiVersion set to &lt;request.desiredAPIVersion&gt;.</span><br><span class="line">    # kind<span class="punctuation">,</span> metadata.uid<span class="punctuation">,</span> metadata.name<span class="punctuation">,</span> and metadata.namespace fields must not be changed by the webhook.</span><br><span class="line">    # metadata.labels and metadata.annotations fields may be changed by the webhook.</span><br><span class="line">    # All other changes to metadata fields by the webhook are ignored.</span><br><span class="line">    <span class="attr">&quot;convertedObjects&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">      <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;CronTab&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;apiVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;example.com/v1&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;metadata&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span 
class="line">          <span class="attr">&quot;creationTimestamp&quot;</span><span class="punctuation">:</span> <span class="string">&quot;2019-09-04T14:03:02Z&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;name&quot;</span><span class="punctuation">:</span> <span class="string">&quot;local-crontab&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;namespace&quot;</span><span class="punctuation">:</span> <span class="string">&quot;default&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;resourceVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;143&quot;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;3415a7fc-162b-4300-b5da-fd6083580d66&quot;</span></span><br><span class="line">        <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;host&quot;</span><span class="punctuation">:</span> <span class="string">&quot;localhost&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;port&quot;</span><span class="punctuation">:</span> <span class="string">&quot;1234&quot;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">]</span></span><br><span class="line">  <span class="punctuation">&#125;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>Points to note about the response:</p><ol><li><code>uid</code> must match the one in the request;</li><li><code>kind</code>, <code>metadata.uid</code>, <code>metadata.name</code> and <code>metadata.namespace</code> must be the same as in the request;</li><li><code>metadata.labels</code> and <code>metadata.annotations</code> may be modified;</li><li>changes to any other <code>metadata</code> field are ignored;</li><li>the objects in <code>convertedObjects</code> must be in the same order as in the request, with <code>apiVersion</code> set to the requested <code>desiredAPIVersion</code>.</li></ol><h3 id="简化转换方法数量"><a href="#简化转换方法数量" class="headerlink" title="Reducing the Number of Conversion Functions"></a>Reducing the Number of Conversion Functions</h3><p>With only v1 and v2, two conversion functions (one per direction) are enough. With 4 or even 8 versions, direct pairwise conversion would need 12 or 56 functions, which quickly becomes very hard to maintain.</p><p>controller-runtime currently uses the Hub and Spoke model for conversion, which keeps the maintenance cost of version conversion down.</p><p>Hub and Spoke turns the mesh of conversions into a star:</p><p><img src="//images.imoe.tech/blog/sDadpr.png" alt="Turning a mesh of conversions into a star"></p><p>One version is designated as the Hub. To convert between two non-Hub versions, the object is first converted to the Hub version and then to the target version.</p><p><img src="//images.imoe.tech/blog/dhgx9G.png" alt="Hub and Spoke"></p><p>This reduces the number of conversion functions we have to write, and Kubernetes itself is implemented the same way.</p><h2 id="Operator-版本迭代"><a href="#Operator-版本迭代" class="headerlink" title="Iterating on Operator Versions"></a>Iterating on Operator Versions</h2><p>Operators are built on controller-runtime, and the kubebuilder tooling can quickly scaffold the Hub and Spoke methods described above.</p><p>First, create the new API version in the earlier demo project:</p><figure class="highlight bash"><figcaption><span>Create the new API version</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">operator-sdk create api --version v1beta2 --kind DemoApplication</span><br></pre></td></tr></table></figure><p>We plan to use v1beta2 as the Hub version, so add this marker to the generated types:</p><figure class="highlight go"><figcaption><span>Marking the storage version</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// +kubebuilder:storageversion</span></span><br></pre></td></tr></table></figure><p>It marks the struct as the storage version:</p><figure class="highlight go"><figcaption><span>Example of the new version's settings</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// +kubebuilder:object:root=true  </span></span><br><span class="line"><span class="comment">// +kubebuilder:subresource:status  </span></span><br><span class="line"><span class="comment">// +kubebuilder:storageversion  </span></span><br><span class="line">  </span><br><span class="line"><span class="comment">// DemoApplication is the Schema for the demoapplications API</span></span><br><span class="line"><span class="keyword">type</span> DemoApplication <span class="keyword">struct</span> &#123;  </span><br><span class="line">    metav1.TypeMeta   <span class="string">`json:&quot;,inline&quot;`</span>  </span><br><span class="line">    metav1.ObjectMeta <span class="string">`json:&quot;metadata,omitempty&quot;`</span>  </span><br><span class="line">  </span><br><span class="line">    Spec   DemoApplicationSpec   <span class="string">`json:&quot;spec,omitempty&quot;`</span>  </span><br><span class="line">    Status DemoApplicationStatus <span class="string">`json:&quot;status,omitempty&quot;`</span>  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Run <code>make</code> to generate the code and manifests.</p><p>Then run the following command to scaffold the webhook code:</p><figure class="highlight shell"><figcaption><span>Generate the webhook</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">operator-sdk create webhook --version v1beta2 --kind DemoApplication --conversion</span><br></pre></td></tr></table></figure><ul><li><code>--version</code>: the version to scaffold the webhook for</li><li><code>--conversion</code>: scaffold the conversion code</li></ul><p>The result looks like this:</p><p><img src="//images.imoe.tech/blog/K6LLhK.png" alt="Command output"></p><p>Note the following hint:</p><figure 
class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">You need to implement the conversion.Hub and conversion.Convertible interfaces for your CRD types. </span><br></pre></td></tr></table></figure><p>Edit <code>api/v1beta2/demoapplication_types.go</code> and add one line:</p><figure class="highlight go"><figcaption><span>Implement the conversion.Hub interface on the new storage version</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *DemoApplication)</span></span> Hub() &#123;&#125;</span><br></pre></td></tr></table></figure><p>Edit <code>api/v1beta1/demoapplication_types.go</code> and add the Spoke implementation:</p><figure class="highlight go"><figcaption><span>Implement conversion to and from the Hub on the old version</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(a *DemoApplication)</span></span> ConvertTo(dst conversion.Hub) <span class="type">error</span> &#123;</span><br><span class="line">v1beta2App := dst.(*v1beta2.DemoApplication)</span><br><span class="line">v1beta2App.ObjectMeta = a.ObjectMeta</span><br><span class="line">v1beta2App.Spec.Image = a.Spec.Image</span><br><span 
class="line">v1beta2App.Spec.Size = a.Spec.Replica</span><br><span class="line">v1beta2App.Spec.Ports = a.Spec.Ports</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(a *DemoApplication)</span></span> ConvertFrom(src conversion.Hub) <span class="type">error</span> &#123;</span><br><span class="line">v1beta2App := src.(*v1beta2.DemoApplication)</span><br><span class="line">a.ObjectMeta = v1beta2App.ObjectMeta</span><br><span class="line">a.Spec.Image = v1beta2App.Spec.Image</span><br><span class="line">a.Spec.Replica = v1beta2App.Spec.Size</span><br><span class="line">a.Spec.Ports = v1beta2App.Spec.Ports</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Use the following command to generate the deployment YAML for debugging and to see which settings need adjusting:</p><figure class="highlight shell"><figcaption><span>Run the build</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">make build-installer</span><br></pre></td></tr></table></figure><h3 id="修改-manifest-配置"><a href="#修改-manifest-配置" class="headerlink" title="Adjusting the Manifests"></a>Adjusting the Manifests</h3><h4 id="启用-webhook"><a href="#启用-webhook" class="headerlink" title="Enabling the Webhook"></a>Enabling the Webhook</h4><p>Enabling the webhook requires the following changes:</p><ul><li>Enable <code>patches/webhook_in_&lt;kind&gt;.yaml</code> in <code>config/crd/kustomization.yaml</code>;<blockquote><p>This injects the webhook configuration into the CRD. By default it is added and enabled automatically when <code>operator-sdk create webhook</code> is run.</p></blockquote></li><li>Enable <code>../webhook</code> and <code>manager_webhook_patch.yaml</code> in <code>config/default/kustomization.yaml</code>;<blockquote><p>This injects the certificate into the controller. Without it the controller fails to start with <code>serving-certs/tls.crt: no such file</code>.</p></blockquote></li><li>Comment out the <code>- manifests.yaml</code> entry in <code>config/webhook/kustomization.yaml</code>.<blockquote><p><code>manifests.yaml</code> is only generated when admission webhooks are enabled; if the entry is left in, generating <code>installer.yaml</code> fails.</p></blockquote></li></ul><p>Enabling the webhook alone is not enough. The self-signed certificate used by the Operator is not trusted by the API server, so we need a component called cert-manager to issue a certificate for our application.</p><p>cert-manager injects the CA into the CRD's conversion webhook configuration, so the API server no longer reports certificate errors when it calls the conversion webhook.</p><h4 id="启用-cert-manager"><a href="#启用-cert-manager" class="headerlink" title="Enabling cert-manager"></a>Enabling cert-manager</h4><p>Most clusters already have this component installed. If yours does not, install it with the following command:</p><figure class="highlight shell"><figcaption><span>Install cert-manager</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.0/cert-manager.yaml</span><br></pre></td></tr></table></figure><p>The cert-manager settings are disabled by default, so they have to be turned on by hand in the manifests:</p><ul><li>Enable <code>patches/cainjection_in_&lt;kind&gt;.yaml</code> in <code>config/crd/kustomization.yaml</code>;</li><li>Enable the <code>./certmanager</code> directory in <code>config/default/kustomization.yaml</code> (creates the certificate and CA);</li><li>Enable all variables under the <code>CERTMANAGER</code> block in <code>config/default/kustomization.yaml</code> (used for injection into the CRD).</li></ul><p>Note: the following error means that CA injection for admission webhooks was enabled by mistake:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Error: no resource matches strategic merge patch &quot;MutatingWebhookConfiguration.v1.admissionregistration.k8s.io/mutating-webhook-configuration.[noNs]&quot;: no matches for Id MutatingWebhookConfiguration.v1.admissionregistration.k8s.io/mutating-webhook-configuration.[noNs]; failed to find unique target for patch 
MutatingWebhookConfiguration.v1.admissionregistration.k8s.io/mutating-webhook-configuration.[noNs]</span><br><span class="line">make: *** [build-installer] Error 1</span><br></pre></td></tr></table></figure><p>Admission webhook scaffolding was not enabled when the webhook was generated, so there is no webhook configuration for the patch to apply to, and kustomize reports that the target cannot be found.</p><p>Although no admission webhook configuration was generated, the injection switch and template were scaffolded in advance. Commenting out the following entry is enough:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="bullet">-</span> <span class="attr">path:</span> <span class="string">webhookcainjection_patch.yaml</span></span><br></pre></td></tr></table></figure><h3 id="部署测试"><a href="#部署测试" class="headerlink" title="Deploying and Testing"></a>Deploying and Testing</h3><figure class="highlight shell"><figcaption><span>Deploy and test</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export IMG=registry-c.cmft.com/cmhk-grd-paas-portal/demo-app:5</span><br><span class="line">make docker-build deploy</span><br></pre></td></tr></table></figure><h2 id="旧数据的迁移"><a href="#旧数据的迁移" class="headerlink" title="Migrating Existing Data"></a>Migrating Existing Data</h2><p>After a new CRD version is deployed, the cluster usually has several versions side by side, but only one of them can be the storage version.</p><p>In the example above there are two versions, <code>v1beta1</code> and <code>v1beta2</code>, and <code>v1beta2</code> is the storage version:</p><figure class="highlight yaml"><figcaption><span>Current CRD configuration</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="attr">apiVersion:</span> <span class="string">apiextensions.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">CustomResourceDefinition</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">demoapplications.paas.cmft.com</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">versions:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">v1beta1</span></span><br><span class="line">    <span class="attr">served:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">storage:</span> <span class="literal">false</span></span><br><span class="line">    <span class="attr">subresources:</span></span><br><span class="line">      <span class="attr">status:</span> &#123;&#125;</span><br><span class="line">  <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">v1beta2</span></span><br><span class="line">    <span class="attr">served:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">storage:</span> <span class="literal">true</span></span><br></pre></td></tr></table></figure><p>So what happens to the existing data once the new version is deployed?</p><article class="message is-info">        <div class="message-header"><p><i class="far fa-edit mr-2"></i>Tips</p></div>        <div class="message-body">            <p>After the new version is deployed, existing data stays stored as it was. An object is re-saved in the new storage version only when it is written again.</p>        </div>    </article><p>That is why the following command shows two stored versions in use for the CRD at the same time:</p><figure class="highlight shell"><figcaption><span>Check which stored versions are in use</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="meta prompt_">$ </span><span class="language-bash">kubectl get crd demoapplications.paas.cmft.com -ojson | jq .status.storedVersions</span></span><br><span class="line">[</span><br><span class="line">  &quot;v1beta1&quot;,</span><br><span class="line">  &quot;v1beta2&quot;</span><br><span class="line">]</span><br></pre></td></tr></table></figure><h3 id="为什么要迁移"><a href="#为什么要迁移" class="headerlink" title="为什么要迁移"></a>Why migrate</h3><p>As mentioned earlier, controller-runtime uses the Hub and Spoke model to manage conversion between CRD versions.</p><p>As the CRD schema changes from version to version, the conversion functions become more and more expensive to maintain. Old versions should be deprecated gradually and then removed. So, as Kubernetes itself does, old data should be migrated to the new storage version automatically whenever an API is deprecated or removed.</p><h3 id="迁移方法"><a href="#迁移方法" class="headerlink" title="迁移方法"></a>Migration methods</h3><p>Kubernetes currently recommends two migration approaches.</p><h4 id="使用-Storage-Version-Migrator-工具"><a href="#使用-Storage-Version-Migrator-工具" class="headerlink" title="使用 Storage Version Migrator 工具"></a>Using the Storage Version Migrator tool</h4><p>Storage Version Migrator consists of two controllers:</p><ol><li><code>trigger controller</code>: polls the discovery API every 10 minutes to detect whether the default storage version has changed, and creates a StorageVersionMigration for the corresponding Kind if it has;</li><li><code>migration controller</code>: processes StorageVersionMigration objects. When a new kind needs migrating, the <code>migration controller</code> reads every object and writes it back to the API server unchanged, which makes the API server save it in the latest storage version.</li></ol><p>Storage Version Migrator is in alpha in Kubernetes 1.30; on earlier versions it has to be installed manually.</p><p>Build it locally:</p><figure class="highlight shell"><figcaption><span>Build the SVM images</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">make all-images</span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">pushing is optional if you are not deploying to a remote cluster</span></span><br><span class="line">make push-all</span><br></pre></td></tr></table></figure><p>Run the following commands to deploy it to the cluster:</p><figure class="highlight shell"><figcaption><span>Deploy SVM to the cluster</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
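class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">export REGISTRY=registry-c.cmft.com/cmhk-grd-paas-portal</span><br><span class="line">make local-manifests</span><br><span class="line">pushd manifests.local &amp;&amp; kubectl apply -k ./ &amp;&amp; popd</span><br></pre></td></tr></table></figure><h4 id="手动迁移"><a href="#手动迁移" class="headerlink" title="手动迁移"></a>Manual migration</h4><p>Manual migration follows much the same steps as Storage Version Migrator. Assume we are upgrading from <code>v1beta1</code> to <code>v1</code>:</p><ol><li>Set the new version as the storage version in the CRD. At this point <code>status.storedVersions</code> contains both <code>v1beta1</code> and <code>v1</code>;</li><li>Write a script that reads all existing objects and writes them straight back to the API server, forcing them to be re-saved in the new storage version;</li><li>Once the migration is complete, manually remove <code>v1beta1</code> from <code>status.storedVersions</code>.</li></ol><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>When developing a new version, if <strong>fields change</strong>, you need to implement a conversion webhook that the API server calls to convert between versions. If no fields change, the API server can convert automatically without a conversion webhook.</p><p>controller-runtime implements conversion with the Hub and Spoke model. You only need to implement the conversion methods between each Spoke and the Hub; conversion between two Spokes goes through the Hub version.</p><p>Enabling the conversion webhook also requires enabling the cert-manager component, which provides the webhook's serving certificate in the scaffolded setup. Without it, the API server reports a certificate error when it calls the webhook.</p><p>After the storage version is changed, old data is not migrated automatically. It is re-saved in the new storage version only when it is updated.</p><p>To migrate all old data in one pass, use the Storage Version Migrator tool or write a script. Writing the data back to the API server is all it takes to complete the migration.</p><h2 id="引用"><a href="#引用" class="headerlink" title="引用"></a>References</h2><ol><li><a href="https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/">Versions in CustomResourceDefinitions</a></li><li><a href="https://book.kubebuilder.io/multiversion-tutorial/tutorial">Tutorial: Multi-Version API</a></li></ol>]]></content>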
    
    
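    <summary type="html">&lt;p&gt;Upgrading an Operator is mostly a matter of upgrading the CRD objects it defines and the data stored for them.&lt;/p&gt;
&lt;p&gt;A new Operator version may require changes to the CRD schema, and as the Operator matures the CRD version is also promoted to a more formal one. At that point, data created before the upgrade has to stay compatible, and if the CRD schema has changed, the data has to be migrated as well.&lt;/p&gt;
&lt;p&gt;Before getting to that, let's look at how Kubernetes defines CRD versions.&lt;/p&gt;</summary>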
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="Operator" scheme="https://blog.imoe.tech/tags/Operator/"/>
    
  </entry>
  
  <entry>
    <title>Getting Started with Kubernetes Operators</title>
    <link href="https://blog.imoe.tech/2025/03/20/start-to-operator/"/>
    <id>https://blog.imoe.tech/2025/03/20/start-to-operator/</id>
    <published>2025-03-20T12:42:48.000Z</published>
    <updated>2025-07-15T14:15:13.087Z</updated>
    
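    <content type="html"><![CDATA[<p><img src="//images.imoe.tech/blog/s3uRem.png" title="Operator SDK Workflow"></p><p>Kubernetes has supported <a href="https://kubernetes.io/docs/concepts/api-extension/custom-resources/">custom controllers</a> since version 1.7. They let developers extend the platform with new features, update existing ones, and automate management tasks.</p><p>Operator is a controller framework developed by CoreOS for extending the Kubernetes API. It is used to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems.</p><p>An Operator builds on the Kubernetes concepts of resources and controllers, and also encodes application-specific domain knowledge.</p><p>These custom controllers behave <strong>just like native Kubernetes components</strong>. An Operator is developed directly against the Kubernetes API, so it can monitor the cluster, modify Pods and Services, and scale running applications according to the custom rules written into its controllers.</p><p>The key to building an Operator is the design of its CRD (custom resource definition). This post starts from a made-up requirement, designs a CRD, and implements its control logic to walk through the Operator development process.</p><span id="more"></span><h2 id="Operator-Framework"><a href="#Operator-Framework" class="headerlink" title="Operator Framework"></a>Operator Framework</h2><p>Operator Framework is an open-source toolkit from CoreOS for building Operators quickly. It has two main parts:</p><ul><li>Operator SDK: lets you build an Operator from your own domain expertise without having to learn the intricacies of the Kubernetes API.</li><li>Operator Lifecycle Manager (OLM): helps you install, update, and manage all the Operators (and their associated services) running across your clusters.</li></ul><h3 id="工作流程"><a href="#工作流程" class="headerlink" title="工作流程"></a>Workflow</h3><p>Operator SDK provides the following workflow for developing a new Operator:</p><ol><li>Create a new Operator project with the SDK</li><li>Define new resource APIs by adding custom resource definitions (CRDs)</li><li>Specify the resources to watch using the SDK API</li><li>Define the Operator's reconcile logic</li><li>Build the Operator with the Operator SDK and generate its deployment manifests</li></ol><h2 id="开发一个程序"><a href="#开发一个程序" class="headerlink" title="开发一个程序"></a>Developing an Operator</h2><p>Deploying a simple web service to a Kubernetes cluster always involves:</p><ul><li>writing a Deployment manifest;</li><li>creating a Service object that selects the Pods by label;</li><li>exposing the service through an Ingress or a Service of <code>type=NodePort</code>.</li></ul><p>Repeating these steps every time is tedious.</p><p>Instead, we can define a custom resource whose CRD describes the application to deploy: its image, service ports, environment variables, and so on.</p><p>When an object of that custom type is created, a controller creates the matching Deployment and Service. That is far more convenient: a single manifest describes what previously took both a Deployment and a Service.</p><figure class="highlight yaml"><figcaption><span>CRD design</span></figcaption><table><tr><td class="gutter"><pre><span 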
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">paas.cmft.com/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">DemoApplication</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">nginx-demo</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">replica:</span> <span class="number">2</span></span><br><span class="line">  <span class="attr">image:</span> <span class="string">nginx:latest</span></span><br><span class="line">  <span class="attr">ports:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">port:</span> <span class="number">80</span></span><br><span class="line">      <span class="attr">targetPort:</span> <span class="number">80</span></span><br><span class="line">      <span class="attr">nodePort:</span> <span class="number">30002</span></span><br></pre></td></tr></table></figure><p>This custom <code>DemoApplication</code> object creates Pods with 2 replicas and exposes the service on nodePort 30002.</p><h3 id="环境准备"><a href="#环境准备" class="headerlink" title="环境准备"></a>Prerequisites</h3><p>You will need:</p><ul><li>a Go development environment;</li><li>the operator-sdk CLI;</li><li>a Kubernetes cluster.</li></ul><p>Install operator-sdk:</p><figure class="highlight shell"><figcaption><span>Install command</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">brew install operator-sdk</span><br></pre></td></tr></table></figure><h3 
id="创建项目"><a href="#创建项目" class="headerlink" title="创建项目"></a>Create the project</h3><p>Run the following command to scaffold the project:</p><figure class="highlight shell"><figcaption><span>Generate the project scaffold</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">operator-sdk init --plugins=go/v4 --domain=paas.cmft.com --repo code-inc.cmft.com/CMHK/GRD-PAAS-PORTAL/demo-operator.git</span><br></pre></td></tr></table></figure><ul><li><code>--domain</code>: sets the domain used for the API group</li><li><code>--repo</code>: sets the Go module name</li></ul><h3 id="创建-API"><a href="#创建-API" class="headerlink" title="创建 API"></a>Create the API</h3><p>A freshly created project contains only scaffold files, so a new API has to be added for the custom resource:</p><figure class="highlight shell"><figcaption><span>Generate the API</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">operator-sdk create api --version v1beta1 --kind DemoApplication</span><br></pre></td></tr></table></figure><p>The output looks like this:</p><p><img src="//images.imoe.tech/blog/9ikra8.png" title="创建 API 输出"></p><h3 id="调整-API"><a href="#调整-API" class="headerlink" title="调整 API"></a>Adjust the API</h3><p>The generated API file is <code>api/v1beta1/demoapplication_types.go</code>. Customize the <code>DemoApplicationSpec</code> struct to fit the requirement, for example to match the demo manifest above:</p><figure class="highlight go"><figcaption><span>Implement the CRD fields</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> DemoApplicationSpec <span class="keyword">struct</span> &#123;</span><br><span class="line"><span class="comment">// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster</span></span><br><span class="line"><span class="comment">// Important: Run &quot;make&quot; to regenerate 
code after modifying this file</span></span><br><span class="line"></span><br><span class="line">Image   <span class="type">string</span>               <span class="string">`json:&quot;image,omitempty&quot;`</span></span><br><span class="line">Replica *<span class="type">int32</span>               <span class="string">`json:&quot;replica,omitempty&quot;`</span></span><br><span class="line">Ports   []corev1.ServicePort <span class="string">`json:&quot;ports,omitempty&quot;`</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>After editing the file, run <code>make</code> to regenerate the derived code.</p><h3 id="增加业务逻辑"><a href="#增加业务逻辑" class="headerlink" title="增加业务逻辑"></a>Add the business logic</h3><p>The business logic goes into the <code>Reconcile</code> method in <code>internal/controller/demoapplication_controller.go</code>:</p><figure class="highlight go"><figcaption><span>Default content of Reconcile</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *DemoApplicationReconciler)</span></span> Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, <span class="type">error</span>) &#123;</span><br><span class="line">_ = log.FromContext(ctx)</span><br><span class="line"></span><br><span class="line"><span class="comment">// TODO(user): your logic here</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="Reconcile-方法"><a href="#Reconcile-方法" class="headerlink" title="Reconcile 方法"></a>The Reconcile method</h4><h5 id="Reconcile-方法是什么？"><a href="#Reconcile-方法是什么？" class="headerlink" 
title="Reconcile 方法是什么？"></a>What is the <code>Reconcile</code> method?</h5><p>The <code>Reconcile</code> method drives the actual state of a CR (Custom Resource) toward its desired state.</p><p>For example, when a Deployment's replicas value changes from 1 to 2, the Reconcile method of the DeploymentController creates a new Pod to satisfy the declared replica count.</p><h5 id="Reconcile-方法什么时候被调用？"><a href="#Reconcile-方法什么时候被调用？" class="headerlink" title="Reconcile 方法什么时候被调用？"></a>When is the <code>Reconcile</code> method called?</h5><p>Reconcile is triggered every time a <code>watch</code>ed CR or resource changes.</p><p>Each call receives a Request value made up of <code>Namespace/Name</code>, which is used to look up the object in the cache.</p><p>Depending on the outcome, Reconcile can return different results:</p><figure class="highlight go"><figcaption><span>Meaning of Reconcile return values</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// An error occurred; the request is retried</span></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line"></span><br><span class="line"><span class="comment">// No error, but requeue the request</span></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;Requeue: <span class="literal">true</span>&#125;, <span class="literal">nil</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Processing finished</span></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, <span class="literal">nil</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Run again after the given duration</span></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;RequeueAfter: nextRun.Sub(r.Now())&#125;, <span 
class="literal">nil</span></span><br></pre></td></tr></table></figure><h4 id="业务逻辑"><a href="#业务逻辑" class="headerlink" title="业务逻辑"></a>Business logic</h4><p>The business logic is essentially:</p><ol><li><code>Watch</code> our custom resource <code>DemoApplication</code>;</li><li>return immediately if the <code>DemoApplication</code> does not exist;</li><li>if the <code>DemoApplication</code> exists, check whether the <code>Deployment</code> exists and create it if it does not;</li><li>if it already exists, update the <code>Deployment</code> and <code>Service</code> when the <code>spec</code> has changed.</li></ol><h4 id="获取自定义资源"><a href="#获取自定义资源" class="headerlink" title="获取自定义资源"></a>Fetch the custom resource</h4><figure class="highlight go"><figcaption><span>Fetch the custom resource</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">application := &amp;paascmftcomv1beta1.DemoApplication&#123;&#125;</span><br><span class="line"><span class="keyword">if</span> err := r.Get(ctx, req.NamespacedName, application); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> apierrors.IsNotFound(err) &#123;</span><br><span class="line">log.Info(<span class="string">&quot;DemoApplication resource not found. 
Ignoring since it must be deleted&quot;</span>)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to get DemoApplication&quot;</span>)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="维护-Deployment-和-Service"><a href="#维护-Deployment-和-Service" class="headerlink" title="维护 Deployment 和 Service"></a>Maintain the Deployment and Service</h4><p>The code skeleton looks like this:</p><figure class="highlight go"><figcaption><span>Fetch the managed Deployment</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">found := &amp;appsv1.Deployment&#123;&#125;</span><br><span class="line">err := r.Get(ctx, req.NamespacedName, found)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &amp;&amp; apierrors.IsNotFound(err) &#123;</span><br><span class="line"><span class="comment">// TODO: not found, create the Deployment and Service</span></span><br><span class="line">&#125; <span class="keyword">else</span> <span class="keyword">if</span> err != <span class="literal">nil</span> 
&#123;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to get Deployment&quot;</span>)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">oldSpec := paascmftcomv1beta1.DemoApplicationSpec&#123;&#125;</span><br><span class="line"><span class="keyword">if</span> application.Annotations[AnnotationLastAppliedConfig] != <span class="string">&quot;&quot;</span> &#123;</span><br><span class="line"><span class="keyword">if</span> err = json.Unmarshal([]<span class="type">byte</span>(application.Annotations[AnnotationLastAppliedConfig]), &amp;oldSpec); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> !reflect.DeepEqual(application.Spec, oldSpec) &#123;</span><br><span class="line"><span class="comment">// TODO: exists and spec changed, update the Deployment and Service</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, <span class="literal">nil</span></span><br></pre></td></tr></table></figure><p>An Operator usually manages deployments of one specific workload, so the Deployment can be built from a default template for that workload, and <code>DemoApplication</code> only needs to expose a few configurable parameters.</p><p>Following the example, the method that builds the <code>Deployment</code> looks like this:</p><figure class="highlight go"><figcaption><span>Build the Deployment object</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *DemoApplicationReconciler)</span></span> deploymentForDemoApplication(application *paascmftcomv1beta1.DemoApplication) *appsv1.Deployment &#123;</span><br><span class="line">labels := <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>&#123;</span><br><span class="line"><span class="string">&quot;app&quot;</span>: application.Name,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> containerPorts []corev1.ContainerPort</span><br><span class="line"><span class="keyword">for</span> _, port := <span class="keyword">range</span> application.Spec.Ports &#123;</span><br><span class="line">containerPorts = <span class="built_in">append</span>(containerPorts, corev1.ContainerPort&#123;</span><br><span class="line">ContainerPort: port.TargetPort.IntVal,</span><br><span class="line">&#125;)</span><br><span class="line">&#125;</span><br><span 
class="line"></span><br><span class="line"><span class="keyword">return</span> &amp;appsv1.Deployment&#123;</span><br><span class="line">ObjectMeta: metav1.ObjectMeta&#123;</span><br><span class="line">Name:      application.Name,</span><br><span class="line">Namespace: application.Namespace,</span><br><span class="line">&#125;,</span><br><span class="line">Spec: appsv1.DeploymentSpec&#123;</span><br><span class="line">Replicas: application.Spec.Replica,</span><br><span class="line">Selector: &amp;metav1.LabelSelector&#123;</span><br><span class="line">MatchLabels: labels,</span><br><span class="line">&#125;,</span><br><span class="line">Template: corev1.PodTemplateSpec&#123;</span><br><span class="line">ObjectMeta: metav1.ObjectMeta&#123;</span><br><span class="line">Labels: labels,</span><br><span class="line">&#125;,</span><br><span class="line">Spec: corev1.PodSpec&#123;</span><br><span class="line">Containers: []corev1.Container&#123;</span><br><span class="line">&#123;</span><br><span class="line">Name:  application.Name,</span><br><span class="line">Image: application.Spec.Image,</span><br><span class="line">Ports: containerPorts,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Similarly, the <code>Service</code> is built as follows:</p><figure class="highlight go"><figcaption><span>Build the Service object</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *DemoApplicationReconciler)</span></span> serviceForDemoApplication(application *paascmftcomv1beta1.DemoApplication) *corev1.Service &#123;</span><br><span class="line"><span class="keyword">return</span> &amp;corev1.Service&#123;</span><br><span class="line">ObjectMeta: metav1.ObjectMeta&#123;</span><br><span class="line">Name:      application.Name,</span><br><span class="line">Namespace: application.Namespace,</span><br><span class="line">&#125;,</span><br><span class="line">Spec: corev1.ServiceSpec&#123;</span><br><span class="line">Ports: application.Spec.Ports,</span><br><span class="line">Selector: <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>&#123;</span><br><span class="line"><span class="string">&quot;app&quot;</span>: application.Name,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Putting these together, the code that creates the <code>Deployment</code> and <code>Service</code> is:</p><figure class="highlight go"><figcaption><span>Create the Deployment and Service</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">dep := r.deploymentForDemoApplication(application)</span><br><span class="line"></span><br><span class="line">log.Info(<span class="string">&quot;Creating a new Deployment&quot;</span>, <span class="string">&quot;Deployment.Namespace&quot;</span>, dep.Namespace, <span class="string">&quot;Deployment.Name&quot;</span>, dep.Name)</span><br><span class="line"><span class="keyword">if</span> err := r.Create(ctx, dep); err != <span class="literal">nil</span> &#123;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to create new Deployment&quot;</span>, <span class="string">&quot;Deployment.Namespace&quot;</span>, dep.Namespace, <span class="string">&quot;Deployment.Name&quot;</span>, dep.Name)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">svc := r.serviceForDemoApplication(application)</span><br><span class="line"></span><br><span class="line">log.Info(<span class="string">&quot;Creating a new Service&quot;</span>, <span class="string">&quot;Service.Namespace&quot;</span>, svc.Namespace, <span class="string">&quot;Service.Name&quot;</span>, svc.Name)</span><br><span class="line"><span class="keyword">if</span> err := r.Create(ctx, svc); err != <span class="literal">nil</span> &#123;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to create new Service&quot;</span>, <span 
class="string">&quot;Service.Namespace&quot;</span>, svc.Namespace, <span class="string">&quot;Service.Name&quot;</span>, svc.Name)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">cfg, _ := json.Marshal(application.Spec)</span><br><span class="line"><span class="keyword">if</span> application.Annotations != <span class="literal">nil</span> &#123;</span><br><span class="line">application.Annotations[AnnotationLastAppliedConfig] = <span class="type">string</span>(cfg)</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">application.Annotations = <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>&#123;</span><br><span class="line">AnnotationLastAppliedConfig: <span class="type">string</span>(cfg),</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> err := r.Update(ctx, application); err != <span class="literal">nil</span> &#123;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to update DemoApplication&quot;</span>, <span class="string">&quot;DemoApplication.Namespace&quot;</span>, application.Namespace, <span class="string">&quot;DemoApplication.Name&quot;</span>, application.Name)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Requeue the request to ensure the Deployment is created</span></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;RequeueAfter: time.Minute&#125;, <span 
class="literal">nil</span></span><br></pre></td></tr></table></figure><p>Updating differs slightly from creating:</p><figure class="highlight go"><figcaption><span>Update the Deployment and Service</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line">newDep := r.deploymentForDemoApplication(application)</span><br><span class="line">oldDep := &amp;appsv1.Deployment&#123;&#125;</span><br><span class="line"><span class="keyword">if</span> err := r.Get(ctx, req.NamespacedName, oldDep); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">oldDep.Spec = newDep.Spec</span><br><span class="line"><span class="keyword">if</span> err := r.Update(ctx, oldDep); err != <span 
class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">newSvc := r.serviceForDemoApplication(application)</span><br><span class="line">oldSvc := &amp;corev1.Service&#123;&#125;</span><br><span class="line"><span class="keyword">if</span> err := r.Get(ctx, req.NamespacedName, oldSvc); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">oldSvc.Spec = newSvc.Spec</span><br><span class="line"><span class="keyword">if</span> err := r.Update(ctx, oldSvc); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">cfg, _ := json.Marshal(application.Spec)</span><br><span class="line"><span class="keyword">if</span> application.Annotations != <span class="literal">nil</span> &#123;</span><br><span class="line">application.Annotations[AnnotationLastAppliedConfig] = <span class="type">string</span>(cfg)</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">application.Annotations = <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>&#123;</span><br><span class="line">AnnotationLastAppliedConfig: <span class="type">string</span>(cfg),</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> err := r.Update(ctx, application); err != <span class="literal">nil</span> &#123;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to update 
DemoApplication&quot;</span>, <span class="string">&quot;DemoApplication.Namespace&quot;</span>, application.Namespace, <span class="string">&quot;DemoApplication.Name&quot;</span>, application.Name)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="运行"><a href="#运行" class="headerlink" title="Running"></a>Running</h3><figure class="highlight shell"><figcaption><span>Run locally</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">make install run</span><br></pre></td></tr></table></figure><p>Apply the sample configuration:</p><figure class="highlight shell"><figcaption><span>Add sample data</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kubectl apply -f config/samples/v1beta1_demoapplication.yaml</span><br></pre></td></tr></table></figure><h2 id="优化"><a href="#优化" class="headerlink" title="Improvements"></a>Improvements</h2><p>Two problems remain:</p><ol><li>When a <code>DemoApplication</code> is deleted, its <code>Deployment</code> and <code>Service</code> are not deleted with it;</li><li>When the <code>Deployment</code> or <code>Service</code> is modified, it is not restored automatically.</li></ol><h3 id="问题一"><a href="#问题一" class="headerlink" title="Problem 1"></a>Problem 1</h3><p>If the <code>Deployment</code> and <code>Service</code> are not deleted together with their <code>DemoApplication</code>, they are left behind as orphaned resources that nothing manages.</p><p>The relationship is the same as between a Deployment and its Pods: if the Deployment were deleted but its Pods were not, they would be very hard to manage. Kubernetes solves this with owner references: a child object that lists its parent in <code>metadata.ownerReferences</code> is garbage-collected when the parent is deleted.</p><figure class="highlight go"><figcaption><span>Setting the OwnerReference manually</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">ObjectMeta: metav1.ObjectMeta&#123;</span><br><span class="line">Name:      application.Name,</span><br><span class="line">Namespace: application.Namespace,</span><br><span class="line">OwnerReferences: []metav1.OwnerReference&#123;</span><br><span class="line">*metav1.NewControllerRef(application, schema.GroupVersionKind&#123;</span><br><span class="line">Group:   paascmftcomv1beta1.GroupVersion.Group,</span><br><span class="line">Version: paascmftcomv1beta1.GroupVersion.Version,</span><br><span class="line">Kind:    <span class="string">&quot;DemoApplication&quot;</span>,</span><br><span class="line">&#125;),</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br></pre></td></tr></table></figure><h4 id="更好的方法"><a href="#更好的方法" class="headerlink" title="A better way"></a>A better way</h4><p>The <code>controllerutil</code> package of <code>controller-runtime</code> provides a helper for setting the owner reference, which is also exported through the <code>ctrl</code> alias. It returns an error, which real code should check.</p><figure class="highlight go"><figcaption><span>Using the helper</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ctrl.SetControllerReference(application, dep, r.Scheme)</span><br></pre></td></tr></table></figure><h3 id="问题二"><a href="#问题二" class="headerlink" title="Problem 2"></a>Problem 2</h3><p>Inside the <code>Reconcile</code> method, we can compare the <code>Deployment</code> that was fetched against the desired state and correct it:</p><figure class="highlight go"><figcaption><span>Fixing a Replicas mismatch</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">replicas := application.Spec.Replica</span><br><span 
class="line"><span class="keyword">if</span> *replicas != *found.Spec.Replicas &#123;</span><br><span class="line">found.Spec.Replicas = replicas</span><br><span class="line"><span class="keyword">if</span> err := r.Update(ctx, found); err != <span class="literal">nil</span> &#123;</span><br><span class="line">log.Error(err, <span class="string">&quot;Failed to update Deployment&quot;</span>, <span class="string">&quot;Deployment.Namespace&quot;</span>, found.Namespace, <span class="string">&quot;Deployment.Name&quot;</span>, found.Name)</span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;&#125;, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> ctrl.Result&#123;Requeue: <span class="literal">true</span>&#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Written like this alone, however, the check has no effect when someone edits the <code>Deployment</code> directly. Why?</p><h4 id="Operator-里的资源"><a href="#Operator-里的资源" class="headerlink" title="Resources in an Operator"></a>Resources in an Operator</h4><p>controller-runtime distinguishes two kinds of resources: Primary Resources and Secondary Resources.</p><ul><li>Primary Resources: the resources the controller is responsible for managing, here <code>DemoApplication</code>;</li><li>Secondary Resources: resources the controller also manages, but mainly in order to implement <code>DemoApplication</code>, here the <code>Deployment</code> and <code>Service</code>.</li></ul><p>A change to a secondary resource directly affects the primary resource, so the controller must watch the secondary resources and keep them consistent with the primary one.</p><p>What is missing here is the watch on the secondary resource: without it, a change to the <code>Deployment</code> never triggers <code>Reconcile</code>. <code>Owns</code> adds that watch:</p><figure class="highlight go"><figcaption><span>Registering the Reconciler</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *DemoApplicationReconciler)</span></span> SetupWithManager(mgr ctrl.Manager) <span 
class="type">error</span> &#123;</span><br><span class="line"><span class="keyword">return</span> ctrl.NewControllerManagedBy(mgr).</span><br><span class="line">For(&amp;paascmftcomv1beta1.DemoApplication&#123;&#125;).</span><br><span class="line">Owns(&amp;appsv1.Deployment&#123;&#125;).</span><br><span class="line">Complete(r)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This registers a watch for the <code>Deployment</code> only; restoring a modified <code>Service</code> needs its own <code>Owns</code> call for the Service type in the same way. Also add the following marker comments above the <code>Reconcile</code> method:</p><figure class="highlight go"><figcaption><span>RBAC marker comments</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete</span></span><br><span class="line"><span class="comment">// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete</span></span><br></pre></td></tr></table></figure><p>Run <code>make manifests</code> to generate the RBAC manifests.</p><article class="message is-warning">        <div class="message-header"><p>Note</p></div>        <div class="message-body">            <p>Without these markers, the Operator reports permission errors once it is deployed to the cluster. A local run reads ~/.kube/config, which usually carries cluster-admin credentials, so the problem may go unnoticed there.</p>        </div>    </article><h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p>Developing a new Operator with the Operator SDK can follow these steps:</p><ol><li>Create a new Operator project with the SDK</li><li>Define a new resource API by adding a custom resource definition (CRD)</li><li>Specify the resources to watch using the SDK API</li><li>Define the Operator's reconcile logic</li><li>Build the Operator and generate its deployment manifests with the Operator SDK</li></ol><p>When the controller has to create other resources to implement a custom resource, set <code>OwnerReferences</code> on them, for example with <code>ctrl.SetControllerReference</code>.</p><p>To watch secondary resources, declare them with <code>Owns</code> in <code>SetupWithManager</code>.</p><h2 id="引用"><a href="#引用" class="headerlink" title="References"></a>References</h2><ol><li><a href="https://www.qikqiak.com/post/k8s-operator-101/">Kubernetes Operator 快速入门教程</a></li><li><a 
href="https://juejin.cn/s/k8s%20operator%20controller%E5%8C%BA%E5%88%AB">k8s operator controller 区别</a></li><li><a href="https://book.kubebuilder.io/reference/watching-resources/secondary-owned-resources#watching-secondary-resources-owned-by-the-controller">Watching Secondary Resources <code>Owned</code> by the Controller</a></li><li><a href="https://kubebyexample.com/learning-paths/operator-framework/operator-sdk-go/watching-resources">Watching resources</a></li></ol>]]></content>
    
    
    <summary type="html">&lt;p&gt;&lt;img src=&quot;//images.imoe.tech/blog/s3uRem.png&quot; title=&quot;Operator SDK Workflow&quot;&gt;&lt;/p&gt;
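&lt;p&gt;Since version 1.7, Kubernetes has supported &lt;a href=&quot;https://kubernetes.io/docs/concepts/api-extension/custom-resources/&quot;&gt;custom controllers&lt;/a&gt;, which let developers extend the platform with new features, update existing ones, and automate some administrative tasks.&lt;/p&gt;
&lt;p&gt;Operator was developed by CoreOS as a controller framework that extends the Kubernetes API. It is used to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems.&lt;/p&gt;
&lt;p&gt;An Operator builds on the Kubernetes concepts of resources and controllers, but also encodes application-specific domain knowledge.&lt;/p&gt;
&lt;p&gt;These custom controllers behave &lt;strong&gt;like native Kubernetes components&lt;/strong&gt;: an Operator is developed directly against the Kubernetes API, so it can monitor the cluster, change Pods/Services, and scale running applications according to the custom rules written inside the controller.&lt;/p&gt;
&lt;p&gt;The key to building an Operator is the design of its CRD (custom resource definition). This article starts from a hypothetical requirement, designs a CRD for it, and implements the control logic for that CRD, as a walk through the Operator development process.&lt;/p&gt;</summary>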
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="Operator" scheme="https://blog.imoe.tech/tags/Operator/"/>
    
  </entry>
  
  <entry>
    <title>A Deep Dive into the Deployment Implementation</title>
    <link href="https://blog.imoe.tech/2024/08/20/dive-into-deployment/"/>
    <id>https://blog.imoe.tech/2024/08/20/dive-into-deployment/</id>
    <published>2024-08-20T15:19:38.000Z</published>
    <updated>2024-08-20T15:19:57.001Z</updated>
    
    <content type="html"><![CDATA[<img alt="" a="<" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p><code>Deployment</code> is the most commonly used of the three common Kubernetes workload types. A <code>Deployment</code> manages how an application is deployed: it supports rolling upgrades and rollbacks, as well as scaling the application up and down.</p><p>A <code>Deployment</code> manages its <code>Pod</code>s through <code>ReplicaSet</code>s. The complete flow, from creating a <code>Deployment</code> to its <code>Pod</code>s being started, is carried out by several controllers working together:</p><p><img src="//images.imoe.tech/blog/bTM7PH.jpg" alt="Deployment processing flow"></p><span id="more"></span><p>When a user creates a <code>Deployment</code>, a client such as kubectl calls the API Server:</p><ul><li>The API Server authenticates the request and eventually creates a <code>Deployment</code> object, which produces a <code>Deployment</code> <strong>creation event</strong>;</li><li>The <code>DeploymentController</code> observes the event and creates a <code>ReplicaSet</code> object (implemented by the <code>dc.syncDeployment</code> method), which produces a <code>ReplicaSet</code> creation event;</li><li>The <code>ReplicaSetController</code> observes the <code>ReplicaSet</code> creation event and creates <code>Pod</code> objects (implemented by the <code>syncReplicaSet</code> method), which produces <code>Pod</code> creation events;<ul><li>At this point the <code>Pod</code>'s <code>Spec.nodeName</code> is empty;</li></ul></li><li>The <code>scheduler</code> observes the <code>Pod</code> creation event and runs its scheduling logic for every <code>Pod</code> whose <code>Spec.nodeName</code> is empty; once it has chosen a node, it updates the <code>Pod</code>'s <code>Spec.nodeName</code>, producing a Pod update event (implemented by the <code>scheduler</code>'s <code>sched.scheduleOne</code> method);</li><li>The <code>kubelet</code> observes the <code>Pod</code> update event and checks whether the <code>Pod</code>'s <code>Spec.nodeName</code> is its own node; if it matches, it starts the containers as defined by the <code>Pod</code> and updates the <code>Pod</code>'s <code>Status</code> (implemented by the <code>Kubelet</code>'s <code>syncPod</code>).</li></ul><h2 id="Controller-实例的构造"><a href="#Controller-实例的构造" class="headerlink" title="Constructing the Controller instance"></a>Constructing the Controller instance</h2><p>The earlier post <a href="https://blog.imoe.tech/2023/10/11/kube-controller-manager-the-brain-of-cluster/">Kubernetes 集群的大脑 Controller Manager</a> described how kube-controller-manager starts the built-in controllers. The <code>NewControllerInitializers</code> function shows how the subject of this post, the <code>DeploymentController</code>, is started:</p><figure class="highlight 
go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:445</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewControllerInitializers</span><span class="params">(loopMode ControllerLoopMode)</span></span> <span class="keyword">map</span>[<span class="type">string</span>]InitFunc &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">register(<span class="string">&quot;deployment&quot;</span>, startDeploymentController)</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Let's look inside <code>startDeploymentController</code>:</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/apps.go:72</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">startDeploymentController</span><span class="params">(ctx context.Context, controllerContext ControllerContext)</span></span> (controller.Interface, <span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">dc, err := deployment.NewDeploymentController(</span><br><span 
class="line">controllerContext.InformerFactory.Apps().V1().Deployments(),</span><br><span class="line">controllerContext.InformerFactory.Apps().V1().ReplicaSets(),</span><br><span class="line">controllerContext.InformerFactory.Core().V1().Pods(),</span><br><span class="line">controllerContext.ClientBuilder.ClientOrDie(<span class="string">&quot;deployment-controller&quot;</span>),</span><br><span class="line">)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">true</span>, fmt.Errorf(<span class="string">&quot;error creating Deployment controller: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">go</span> dc.Run(ctx, <span class="type">int</span>(controllerContext.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs))</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">true</span>, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>A <code>Deployment</code> manages <code>Pod</code>s through <code>ReplicaSet</code>s, so the controller is given the Informers for all three resource types.</p><p>The function then starts <code>DeploymentController.Run</code> in a goroutine, which begins the control loop that manages <code>Deployment</code> resources. Because the start function launches the goroutine itself, <code>StartControllers</code> does not need to spawn one when it iterates over the initializers.</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/deployment_controller.go:144</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> Run(ctx context.Context, workers <span class="type">int</span>) &#123;</span><br><span class="line"><span class="keyword">defer</span> utilruntime.HandleCrash()</span><br><span class="line"></span><br><span class="line"><span class="comment">// Start events processing pipeline.</span></span><br><span class="line">dc.eventBroadcaster.StartStructuredLogging(<span class="number">0</span>)</span><br><span class="line">dc.eventBroadcaster.StartRecordingToSink(&amp;v1core.EventSinkImpl&#123;Interface: dc.client.CoreV1().Events(<span class="string">&quot;&quot;</span>)&#125;)</span><br><span class="line"><span class="keyword">defer</span> dc.eventBroadcaster.Shutdown()</span><br><span class="line"></span><br><span class="line"><span class="keyword">defer</span> dc.queue.ShutDown()</span><br><span class="line"></span><br><span class="line">klog.InfoS(<span class="string">&quot;Starting controller&quot;</span>, <span class="string">&quot;controller&quot;</span>, <span class="string">&quot;deployment&quot;</span>)</span><br><span class="line"><span class="keyword">defer</span> klog.InfoS(<span class="string">&quot;Shutting down controller&quot;</span>, <span class="string">&quot;controller&quot;</span>, <span class="string">&quot;deployment&quot;</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Wait for the local deployment, rs, and pod caches to sync with the server</span></span><br><span class="line"><span class="keyword">if</span> !cache.WaitForNamedCacheSync(<span class="string">&quot;deployment&quot;</span>, ctx.Done(), dc.dListerSynced, dc.rsListerSynced, dc.podListerSynced) &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Start several workers to process the queue</span></span><br><span class="line"><span class="keyword">for</span> i := <span class="number">0</span>; i &lt; workers; i++ &#123;</span><br><span class="line"><span class="keyword">go</span> wait.UntilWithContext(ctx, dc.worker, time.Second)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">&lt;-ctx.Done()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>workers</code> defaults to 5, so 5 goroutines are started to run <code>worker</code>.</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/deployment_controller.go:461</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> worker(ctx context.Context) &#123;</span><br><span class="line"><span class="keyword">for</span> dc.processNextWorkItem(ctx) &#123;</span><br><span class="line">&#125;</span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> processNextWorkItem(ctx context.Context) <span class="type">bool</span> &#123;</span><br><span class="line">key, quit := dc.queue.Get()</span><br><span class="line"><span class="keyword">if</span> quit &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">defer</span> dc.queue.Done(key)</span><br><span class="line"></span><br><span class="line">err := dc.syncHandler(ctx, key.(<span class="type">string</span>))</span><br><span class="line">dc.handleErr(err, key)</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>worker</code> function simply calls <code>processNextWorkItem</code> in a loop until it returns <code>false</code>, which happens once the queue has been shut down. Many Kubernetes controllers declare their item-processing method with this same signature:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *XXController)</span></span> processNextWorkItem(ctx context.Context) <span class="type">bool</span> &#123;</span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>processNextWorkItem</code> has the following properties:</p><ul><li>It takes only one <code>key</code> from <code>queue</code> at a time;</li><li><code>syncHandler</code> is never called concurrently for the same <code>key</code>;</li><li>When processing finishes, the <code>key</code> is marked as done.</li></ul><h3 id="queue-组件"><a href="#queue-组件" class="headerlink" title="The queue component"></a>The queue component</h3><p>Items in <code>queue</code> are produced by the event handler callbacks that the Informer invokes with the events delivered by its <code>Reflector</code>. <code>NewDeploymentController</code> registers these event handlers when it creates the controller:</p><figure class="highlight go"><figcaption><span>Registering event handlers</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewDeploymentController</span><span class="params">(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface)</span></span> (*DeploymentController, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line"></span><br><span class="line">dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs&#123;</span><br><span class="line">AddFunc:    dc.addDeployment,</span><br><span class="line">UpdateFunc: dc.updateDeployment,</span><br><span class="line"><span class="comment">// This will enter the sync loop and no-op, because the deployment has been deleted from the store.</span></span><br><span class="line">DeleteFunc: dc.deleteDeployment,</span><br><span class="line">&#125;)</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When a <code>Deployment</code> creation event arrives, the <code>enqueue</code> method adds the <code>Deployment</code>'s key to <code>queue</code>:</p><figure class="highlight go"><figcaption><span>Enqueue logic</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> addDeployment(obj <span class="keyword">interface</span>&#123;&#125;) &#123;</span><br><span class="line">d := obj.(*apps.Deployment)</span><br><span class="line">klog.V(<span class="number">4</span>).InfoS(<span class="string">&quot;Adding deployment&quot;</span>, <span class="string">&quot;deployment&quot;</span>, klog.KObj(d))</span><br><span class="line">dc.enqueueDeployment(d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">dc.enqueueDeployment = dc.enqueue</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> enqueue(deployment *apps.Deployment) &#123;</span><br><span class="line">   key, err := controller.KeyFunc(deployment)</span><br><span class="line">   <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">      utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;couldn&#x27;t get key for object %#v: %v&quot;</span>, deployment, err))</span><br><span class="line">      <span class="keyword">return</span></span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span class="line">   dc.queue.Add(key)</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p>Whether the event is a <strong>create, update, or delete</strong>, it ends up in <code>enqueue</code>, which puts the object's key on the queue; the main flow then takes the key off the queue and processes it.</p><h2 id="主流程"><a href="#主流程" class="headerlink" title="Main flow"></a>Main flow</h2><p>Every key stored in <code>queue</code> is processed by calling <code>syncHandler</code>. <code>syncHandler</code> is a function-valued field that is set to <code>dc.syncDeployment</code>:</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/deployment_controller.go:569</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span 
class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> syncDeployment(ctx context.Context, key <span class="type">string</span>) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Get the Deployment the key refers to</span></span><br><span class="line">deployment, err := dc.dLister.Deployments(namespace).Get(name)</span><br><span class="line"><span class="keyword">if</span> errors.IsNotFound(err) &#123;</span><br><span class="line">klog.V(<span class="number">2</span>).InfoS(<span class="string">&quot;Deployment has been deleted&quot;</span>, <span class="string">&quot;deployment&quot;</span>, klog.KRef(namespace, name))</span><br><span 
class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Deep-copy otherwise we are mutating our cache.</span></span><br><span class="line"><span class="comment">// <span class="doctag">TODO:</span> Deep-copy only when needed.</span></span><br><span class="line">d := deployment.DeepCopy()</span><br><span class="line"></span><br><span class="line"><span class="comment">// If the deployment's selector is empty, record a warning event and return</span></span><br><span class="line">everything := metav1.LabelSelector&#123;&#125;</span><br><span class="line"><span class="keyword">if</span> reflect.DeepEqual(d.Spec.Selector, &amp;everything) &#123;</span><br><span class="line">dc.eventRecorder.Eventf(d, v1.EventTypeWarning, <span class="string">&quot;SelectingAll&quot;</span>, <span class="string">&quot;This deployment is selecting all pods. 
A non-empty selector is required.&quot;</span>)</span><br><span class="line"><span class="keyword">if</span> d.Status.ObservedGeneration &lt; d.Generation &#123;</span><br><span class="line">d.Status.ObservedGeneration = d.Generation</span><br><span class="line">dc.client.AppsV1().Deployments(d.Namespace).UpdateStatus(ctx, d, metav1.UpdateOptions&#123;&#125;)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 获取 Deployment 对应的 ReplicaSet</span></span><br><span class="line">rsList, err := dc.getReplicaSetsForDeployment(ctx, d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 获取 Deployment 的 Pod，使用 ReplicaSet 作为 Key 分组</span></span><br><span class="line"><span class="comment">// 返回值 podMap 的用途：</span></span><br><span class="line"><span class="comment">// 1. 检查 Pod 是否被 pod-template-hash 正确标识</span></span><br><span class="line"><span class="comment">// 2. 
Check whether any old Pods are still running during a Recreate Deployment</span></span><br><span class="line">podMap, err := dc.getPodMapForDeployment(d, rsList)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> d.DeletionTimestamp != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> dc.syncStatusOnly(ctx, d, rsList)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// When a Deployment is paused or resumed, update its status with an Unknown condition.</span></span><br><span class="line"><span class="comment">// This ensures we do not time out when a user resumes a Deployment that has progressDeadlineSeconds set.</span></span><br><span class="line"><span class="keyword">if</span> err = dc.checkPausedConditions(ctx, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> d.Spec.Paused &#123;</span><br><span class="line"><span class="keyword">return</span> dc.sync(ctx, d, rsList)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// rollback is not re-entrant in case the underlying replica sets are updated with a new</span></span><br><span class="line"><span class="comment">// revision so we should ensure that we won&#x27;t proceed to update replica sets until we</span></span><br><span class="line"><span class="comment">// make sure that the deployment has cleaned up its rollback spec in subsequent enqueues.</span></span><br><span class="line"><span class="keyword">if</span> getRollbackTo(d) != <span class="literal">nil</span> &#123;</span><br><span 
class="line"><span class="keyword">return</span> dc.rollback(ctx, d, rsList)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Check whether this is a scaling event</span></span><br><span class="line">scalingEvent, err := dc.isScalingEvent(ctx, d, rsList)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scalingEvent &#123;</span><br><span class="line"><span class="keyword">return</span> dc.sync(ctx, d, rsList)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">switch</span> d.Spec.Strategy.Type &#123;</span><br><span class="line"><span class="keyword">case</span> apps.RecreateDeploymentStrategyType:</span><br><span class="line"><span class="keyword">return</span> dc.rolloutRecreate(ctx, d, rsList, podMap)</span><br><span class="line"><span class="keyword">case</span> apps.RollingUpdateDeploymentStrategyType:</span><br><span class="line"><span class="keyword">return</span> dc.rolloutRolling(ctx, d, rsList)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> fmt.Errorf(<span class="string">&quot;unexpected deployment strategy type: %s&quot;</span>, d.Spec.Strategy.Type)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>syncDeployment</code> performs the following steps:</p><ul><li>Fetch the <code>Deployment</code> that the event refers to;</li><li>If the <code>Deployment</code>'s <code>selector</code> is empty, emit a warning event and return;</li><li>List the <code>ReplicaSet</code>s that belong to the <code>Deployment</code>;</li><li>List the <code>Deployment</code>'s <code>Pod</code>s, grouped with the <code>ReplicaSet</code> as the <code>Key</code>;</li><li>If the <code>Deployment</code> is being paused or resumed, use an <code>Unknown</code> condition to update the <code>Deployment</code> 
status;</li><li>If a rollback is pending, perform the rollback;</li><li>If this is a <a href="#scaling-%E7%8A%B6%E6%80%81"><code>scaling</code> event</a>, perform the scaling;</li><li>Dispatch on the deployment strategy<ul><li>Rolling update (<code>RollingUpdateDeploymentStrategyType</code>, the default): run <code>rolloutRolling</code></li><li>Recreate (<code>RecreateDeploymentStrategyType</code>): run <code>rolloutRecreate</code></li></ul></li></ul><p>The main flow does four things:</p><ol><li>Scaling: <code>isScalingEvent()</code> iterates over the active <code>ReplicaSet</code>s and checks whether the <code>desired-replicas</code> annotation differs from <code>d.Spec.Replicas</code>; a difference means the <code>Deployment</code>'s desired replica count has changed;</li><li>Pause handling;</li><li>Rollback;</li><li>Update: a change to <code>d.Spec.Template</code> updates the <code>ReplicaSet</code>s, while a change to any other <code>d.Spec</code> field only updates status.</li></ol><p>When a <code>Deployment</code> is first created, its new <code>ReplicaSet</code> is actually created inside the update logic, in <code>rolloutRolling</code> or <code>rolloutRecreate</code>.</p><h2 id="扩缩容和暂停"><a href="#扩缩容和暂停" class="headerlink" title="扩缩容和暂停"></a>Scaling and pausing</h2><p>If the <code>Deployment</code> is paused (<code>.Spec.Paused</code>) or in the <code>scaling</code> state (as determined by <code>dc.isScalingEvent</code>), <code>dc.sync</code> is called to synchronize the <code>Deployment</code>'s state.</p><p><code>dc.sync</code> reconciles a <code>Deployment</code>'s <code>scaling</code> events and <code>paused</code> state:</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/sync.go:49</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> sync(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// Get the new ReplicaSet and all old ReplicaSets</span></span><br><span class="line">newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">false</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Try to scale</span></span><br><span class="line"><span class="keyword">if</span> err := dc.scale(ctx, d, newRS, oldRSs); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// If we get an error while trying to scale, the deployment will be requeued</span></span><br><span class="line"><span class="comment">// so we can abort this resync</span></span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Clean up the deployment when it&#x27;s paused and no rollback is in flight.</span></span><br><span class="line"><span class="comment">// The cleanup removes old revisions beyond the configured history limit.</span></span><br><span class="line"><span class="keyword">if</span> d.Spec.Paused &amp;&amp; getRollbackTo(d) == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> err := dc.cleanupDeployment(ctx, oldRSs, d); err != <span class="literal">nil</span> 
&#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">allRSs := <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line"><span class="comment">// Sync the Deployment status</span></span><br><span class="line"><span class="keyword">return</span> dc.syncDeploymentStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>cleanupDeployment</code> removes old revisions that exceed the configured revision history limit.</p><h3 id="scaling-状态"><a href="#scaling-状态" class="headerlink" title="scaling 状态"></a><code>scaling</code> state</h3><p>Whether a <code>Deployment</code> is in the <code>scaling</code> state is determined from the <code>deployment.kubernetes.io/desired-replicas</code> annotation on its <code>ReplicaSet</code>s.</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/sync.go:532</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> isScalingEvent(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) (<span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">newRS, oldRSs, err := 
dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">false</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span>, err</span><br><span class="line">&#125;</span><br><span class="line">allRSs := <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line">logger := klog.FromContext(ctx)</span><br><span class="line"><span class="keyword">for</span> _, rs := <span class="keyword">range</span> controller.FilterActiveReplicaSets(allRSs) &#123;</span><br><span class="line">desired, ok := deploymentutil.GetDesiredReplicasAnnotation(logger, rs)</span><br><span class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line"><span class="keyword">continue</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// The annotation value differs from the Deployment&#x27;s desired replica count</span></span><br><span class="line"><span class="keyword">if</span> desired != *(d.Spec.Replicas) &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span>, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span>, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>Deployment</code> is in the <code>scaling</code> state when the following two values differ:</p><ol><li>The <code>Deployment</code>'s desired replica count, <code>.Spec.Replicas</code>;</li><li>The <code>deployment.kubernetes.io/desired-replicas</code> annotation of any active <code>ReplicaSet</code>.</li></ol><h3 id="获取所有-ReplicaSet"><a href="#获取所有-ReplicaSet" class="headerlink" title="获取所有 ReplicaSet"></a>Getting all <code>ReplicaSet</code>s</h3><p>The code above calls <code>dc.getAllReplicaSetsAndSyncRevision</code> to obtain all of the Deployment's old <code>ReplicaSet</code>s together with its current new 
<code>ReplicaSet</code>.</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/sync.go:116</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> getAllReplicaSetsAndSyncRevision(</span><br><span class="line">ctx context.Context, </span><br><span class="line">d *apps.Deployment, </span><br><span class="line">rsList []*apps.ReplicaSet, </span><br><span class="line">createIfNotExisted <span class="type">bool</span>) (*apps.ReplicaSet, []*apps.ReplicaSet, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">//...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>A <code>ReplicaSet</code> is treated as the new <code>ReplicaSet</code> when its <code>.Spec.Template</code> matches the <code>Deployment</code>'s <code>.Spec.Template</code> (the two templates are compared with the <code>pod-template-hash</code> label ignored).</p><p>When the fourth argument (<code>createIfNotExisted</code>) is <code>true</code> and no matching new ReplicaSet is found, a new ReplicaSet is created.</p><p>The new <code>ReplicaSet</code>'s <code>.Spec.Replicas</code> and <code>DesiredReplicasAnnotation</code> annotation are derived from the <code>Deployment</code> object:</p><ul><li><code>.Spec.Replicas</code>: <code>Deployment.Spec.Replicas + MaxSurge - current old replica count</code>, which usually equals <code>MaxSurge</code>;<ul><li>During a rolling update the total Pod count temporarily exceeds the desired count and returns to the desired value once the update completes;</li><li>With the Recreate strategy it is set directly to the <code>Deployment</code>'s desired value;</li><li>It never exceeds the <code>Deployment</code>'s desired value;</li></ul></li><li><code>DesiredReplicasAnnotation</code>: <code>Deployment.Spec.Replicas</code>.</li></ul><p>So the <code>newRS</code> returned here represents the final state that the <code>Deployment</code> is converging to.</p><h3 id="活跃-ReplicaSet"><a href="#活跃-ReplicaSet" class="headerlink" title="活跃 ReplicaSet"></a>Active <code>ReplicaSet</code></h3><p>An active ReplicaSet is one whose 
<code>.Spec.Replicas</code> is greater than 0. There are three cases:</p><ul><li>Only the <code>Deployment</code>'s <code>Replicas</code> is being changed: there is exactly <strong>one active</strong> <code>ReplicaSet</code>;</li><li>Scaling up from 0 replicas: none of the old <code>ReplicaSet</code>s is active, so the new <code>ReplicaSet</code> passed in is used;</li><li>Rolling update: several <code>ReplicaSet</code>s are active at once, so the condition is not met and the next branch runs<ul><li>Old <code>ReplicaSet</code>s: being scaled down</li><li>New <code>ReplicaSet</code>: created with an initial <code>.Spec.Replicas</code>, so it counts as an active <code>ReplicaSet</code></li></ul></li></ul><p>Most of the time a <code>Deployment</code> has only one active <code>ReplicaSet</code>. Two <code>ReplicaSet</code>s are active only during an update, and more than two only in rare cases such as several updates issued in quick succession.</p><h3 id="scale-方法"><a href="#scale-方法" class="headerlink" title="scale 方法"></a>The scale method</h3><p>Next, let's look closely at the <code>scale</code> method:</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/sync.go:298</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span 
class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc 
*DeploymentController)</span></span> scale(ctx context.Context, deployment *apps.Deployment, newRS *apps.ReplicaSet, oldRSs []*apps.ReplicaSet) <span class="type">error</span> &#123;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Find the active or latest ReplicaSet; nil is returned when several RSs are active.</span></span><br><span class="line"><span class="comment">// With exactly one active ReplicaSet, scale it to the Deployment&#x27;s replica count;</span></span><br><span class="line"><span class="comment">// with no active RS, scale the latest RS instead.</span></span><br><span class="line"><span class="keyword">if</span> activeOrLatest := deploymentutil.FindActiveOrLatest(newRS, oldRSs); activeOrLatest != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// Nothing to do if the RS already matches the Deployment&#x27;s replica count</span></span><br><span class="line"><span class="keyword">if</span> *(activeOrLatest.Spec.Replicas) == *(deployment.Spec.Replicas) &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Perform the scale operation</span></span><br><span class="line">_, _, err := dc.scaleReplicaSetAndRecordEvent(ctx, activeOrLatest, *(deployment.Spec.Replicas), deployment)</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// If the new RS is saturated, all old RSs should be scaled down to 0</span></span><br><span class="line"><span class="keyword">if</span> deploymentutil.IsSaturated(deployment, newRS) &#123;</span><br><span class="line"><span class="keyword">for</span> _, old := <span class="keyword">range</span> controller.FilterActiveReplicaSets(oldRSs) &#123;</span><br><span class="line"><span class="keyword">if</span> _, _, err := dc.scaleReplicaSetAndRecordEvent(ctx, old, <span class="number">0</span>, deployment); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span 
class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// During a rolling update old and new RSs coexist; scale them carefully so that MaxSurge is not exceeded</span></span><br><span class="line"><span class="keyword">if</span> deploymentutil.IsRollingUpdate(deployment) &#123;</span><br><span class="line">allRSs := controller.FilterActiveReplicaSets(<span class="built_in">append</span>(oldRSs, newRS))</span><br><span class="line">allRSsReplicas := deploymentutil.GetReplicaCountForReplicaSets(allRSs)</span><br><span class="line"></span><br><span class="line">allowedSize := <span class="type">int32</span>(<span class="number">0</span>)</span><br><span class="line"><span class="keyword">if</span> *(deployment.Spec.Replicas) &gt; <span class="number">0</span> &#123;</span><br><span class="line">allowedSize = *(deployment.Spec.Replicas) + deploymentutil.MaxSurge(*deployment)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Number of Pods to add (negative means remove); it must be distributed across the active RSs</span></span><br><span class="line">deploymentReplicasToAdd := allowedSize - allRSsReplicas</span><br><span class="line"></span><br><span class="line"><span class="comment">// The scaling direction decides what happens when RSs of the same size are scaled:</span></span><br><span class="line"><span class="comment">// when scaling up, newer RSs go first; when scaling down, older RSs go first</span></span><br><span class="line"><span class="keyword">var</span> scalingOperation <span class="type">string</span></span><br><span class="line"><span class="keyword">switch</span> &#123;</span><br><span class="line"><span class="keyword">case</span> deploymentReplicasToAdd &gt; <span class="number">0</span>:</span><br><span class="line">sort.Sort(controller.ReplicaSetsBySizeNewer(allRSs))</span><br><span 
class="line">scalingOperation = <span class="string">&quot;up&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">case</span> deploymentReplicasToAdd &lt; <span class="number">0</span>:</span><br><span class="line">sort.Sort(controller.ReplicaSetsBySizeOlder(allRSs))</span><br><span class="line">scalingOperation = <span class="string">&quot;down&quot;</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Iterate over all active RSs and compute the replica count of each</span></span><br><span class="line">deploymentReplicasAdded := <span class="type">int32</span>(<span class="number">0</span>)</span><br><span class="line">nameToSize := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">string</span>]<span class="type">int32</span>)</span><br><span class="line"><span class="keyword">for</span> i := <span class="keyword">range</span> allRSs &#123;</span><br><span class="line">rs := allRSs[i]</span><br><span class="line"></span><br><span class="line"><span class="comment">// Adjust if there are Pods to add or remove, otherwise keep the current size</span></span><br><span class="line"><span class="keyword">if</span> deploymentReplicasToAdd != <span class="number">0</span> &#123;</span><br><span class="line">proportion := deploymentutil.GetProportion(rs, *deployment, deploymentReplicasToAdd, deploymentReplicasAdded)</span><br><span class="line"></span><br><span class="line">nameToSize[rs.Name] = *(rs.Spec.Replicas) + proportion</span><br><span class="line">deploymentReplicasAdded += proportion</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">nameToSize[rs.Name] = *(rs.Spec.Replicas)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Update the RSs</span></span><br><span class="line"><span class="keyword">for</span> i := <span class="keyword">range</span> allRSs 
&#123;</span><br><span class="line">rs := allRSs[i]</span><br><span class="line"></span><br><span class="line"><span class="comment">// Add/remove any leftovers to the largest replica set.</span></span><br><span class="line"><span class="comment">// That is the first RS after sorting</span></span><br><span class="line"><span class="keyword">if</span> i == <span class="number">0</span> &amp;&amp; deploymentReplicasToAdd != <span class="number">0</span> &#123;</span><br><span class="line"><span class="comment">// Apply the leftover change to the first RS</span></span><br><span class="line">leftover := deploymentReplicasToAdd - deploymentReplicasAdded</span><br><span class="line">nameToSize[rs.Name] = nameToSize[rs.Name] + leftover</span><br><span class="line"><span class="keyword">if</span> nameToSize[rs.Name] &lt; <span class="number">0</span> &#123;</span><br><span class="line">nameToSize[rs.Name] = <span class="number">0</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Update the RS</span></span><br><span class="line"><span class="keyword">if</span> _, _, err := dc.scaleReplicaSet(ctx, rs, nameToSize[rs.Name], deployment, scalingOperation); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// Return as soon as we fail, the deployment is requeued</span></span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This method scales <code>ReplicaSet</code>s. It is only used for <code>scaling</code> events and <code>paused</code> <code>Deployment</code>s; <strong>a normal rolling update does not go through this path</strong>:</p><ul><li>First, find the active <code>ReplicaSet</code>:<ul><li>If there is exactly one active <code>ReplicaSet</code> (or, when none is active, the latest 
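<code>ReplicaSet</code>), scale that <code>ReplicaSet</code> directly (<strong>Recreate updates</strong> and ordinary scaling are handled here);</li><li>With several active <code>ReplicaSet</code>s, continue to the code below;</li></ul></li><li>Next, check whether the new <code>ReplicaSet</code> has finished adjusting:<ul><li>If the new <code>ReplicaSet</code> is saturated, scale the old RSs down to 0;</li></ul></li><li>If it has not finished, old and new <code>ReplicaSet</code>s coexist, and the RSs must be scaled carefully so that <code>MaxSurge</code> is not exceeded.<ul><li>Only the rolling update strategy can reach this point; a Recreate update normally finishes in the first step;</li><li>At this point <strong>a rolling update is in progress and a scaling operation arrives on top of it</strong>.</li></ul></li></ul><h3 id="新旧-ReplicaSet-扩缩容"><a href="#新旧-ReplicaSet-扩缩容" class="headerlink" title="新旧 ReplicaSet 扩缩容"></a>Scaling old and new <code>ReplicaSet</code>s</h3><p>When old and new <code>ReplicaSet</code>s coexist, the last part of the <code>scale</code> method adjusts the <code>ReplicaSet</code>s that take part in the rolling update.</p><p>This scaling of old and new <code>ReplicaSet</code>s only happens when <strong>a scaling operation is issued while a rolling update is in progress</strong>. The added or removed replicas are <strong>first spread proportionally</strong> (via <code>GetProportion</code>) across all active <code>ReplicaSet</code>s, and <strong>whatever is left over</strong> is applied to the <code>ReplicaSet</code> that had the most replicas (ties are broken by creation time: the newer one when scaling up, the older one when scaling down).</p><p>In detail:</p><ul><li>First, from the allowed size (the desired <code>Replicas</code> plus <code>MaxSurge</code>) and the total replica count of the active <code>ReplicaSet</code>s, compute the number of <code>Pod</code>s to change, <code>deploymentReplicasToAdd</code>:<ul><li>Negative: scale down; the active <code>ReplicaSet</code> list is sorted with larger, then older, ones first;</li><li>Positive: scale up; the active <code>ReplicaSet</code> list is sorted with larger, then newer, ones first.</li></ul></li><li>Iterate over all active <code>ReplicaSet</code>s and compute each one's replica count, spreading <code>deploymentReplicasToAdd</code> across them;</li><li>Iterate over all active <code>ReplicaSet</code>s again, update each one's replica count, and apply the leftover part of <code>deploymentReplicasToAdd</code> to the first <code>ReplicaSet</code>:<ul><li>When scaling up, that is the <strong>largest, or newest on a tie</strong>;</li><li>When scaling down, that is the <strong>largest, or oldest on a tie</strong>.</li></ul></li></ul><article class="message is-success">        <div class="message-header"><p><i class="far fa-edit mr-2"></i>Note</p></div>        <div class="message-body">            <p>Once this step completes, the <code>scaling</code> event is over (the <code>deployment.kubernetes.io/desired-replicas</code> annotation has been updated to the Deployment's <code>.Spec.Replicas</code> 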
value).</p><p>The rolling update logic (<code>rolloutRolling</code>) then continues and finishes rolling between the old and new <code>ReplicaSet</code>s.</p>        </div>    </article><h2 id="回滚"><a href="#回滚" class="headerlink" title="回滚"></a>Rollback</h2><p>A rollback works as follows:</p><ul><li>Copy the <code>podTemplate.Spec</code> from the <code>ReplicaSet</code> of a historical revision;</li><li>Replace the current <code>Deployment</code>'s <code>.Spec.Template</code> with it and remove the <code>rollback</code> annotation;</li><li>The next sync of the <code>Deployment</code> then carries out the change as a regular update.</li></ul><h2 id="更新"><a href="#更新" class="headerlink" title="更新"></a>Update</h2><p>The update logic here only manipulates the <code>ReplicaSet</code> annotations and the <code>.Spec.Replicas</code> field; the actual scaling of Pods is carried out by the <code>ReplicaSetController</code>.</p><p>Also, a new <code>ReplicaSet</code> is created and rolled out only when <code>Deployment.Spec.Template</code> changes; a change to any other <code>Deployment.Spec</code> field only updates the state of the <code>Deployment</code> and its existing <code>ReplicaSet</code>s.</p><h3 id="滚动更新"><a href="#滚动更新" class="headerlink" title="滚动更新"></a>Rolling update</h3><p><code>RollingUpdate</code> is the default strategy. Any change to the <code>Deployment</code>'s <code>Spec.Template</code> produces a new <code>ReplicaSet</code>; the rollout scales the new <code>ReplicaSet</code> up while the existing <code>ReplicaSet</code>s are gradually scaled down.</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/rolling.go:32</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> rolloutRolling(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// Get the RSs, creating the new RS if it does not exist</span></span><br><span class="line">newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">true</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">allRSs := <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Try to scale up the new RS</span></span><br><span class="line">scaledUp, err := dc.reconcileNewReplicaSet(ctx, allRSs, newRS, d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scaledUp &#123;</span><br><span class="line"><span class="comment">// Update DeploymentStatus</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span 
class="line"></span><br><span class="line"><span class="comment">// Try to scale down the old RSs</span></span><br><span class="line">scaledDown, err := dc.reconcileOldReplicaSets(ctx, allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scaledDown &#123;</span><br><span class="line"><span class="comment">// Update DeploymentStatus</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> deploymentutil.DeploymentComplete(d, &amp;d.Status) &#123;</span><br><span class="line"><span class="keyword">if</span> err := dc.cleanupDeployment(ctx, oldRSs, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Sync deployment status</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The rolling update proceeds as follows:</p><ul><li><code>getAllReplicaSetsAndSyncRevision</code>: gets all <code>ReplicaSet</code>s, creating the new <code>ReplicaSet</code> if it does not exist (see <a href="#%E8%8E%B7%E5%8F%96%E6%89%80%E6%9C%89-ReplicaSet">the section on getting all ReplicaSets</a> above);</li><li><code>reconcileNewReplicaSet</code> scales up the new <code>ReplicaSet</code>;</li><li><code>reconcileOldReplicaSets</code> scales down the old <code>ReplicaSet</code>s;<ul><li>Once the new <code>ReplicaSet</code> has been added, the total replica count exceeds the desired count, so the old <code>ReplicaSet</code>s are scaled down here</li></ul></li><li>Update the 
<code>Deployment</code> 状态</li></ul><p>这里的扩缩容操作实际上是通过修改 <code>.Spec.Replicas</code> 和使用 <code>deploymentutil.SetReplicasAnnotations</code> 函数操作 <code>ReplicaSet</code> 注解 <code>deployment.kubernetes.io/desired-replicas</code> 实现的。</p><p><code>deployment.kubernetes.io/desired-replicas</code> 注解会设置为 <code>Deployment</code> 的 <code>.Spec.Replicas</code> 的值。滚动的的过程是修改 <code>ReplicaSet</code> 的 <code>.Spec.Replicas</code>。</p><p>整个滚动更新过程就是不断地扩容新 <code>ReplicaSet</code>、缩容旧 <code>ReplicaSet</code> 再更新 <code>Deployment</code> 过程：</p><ol><li>扩容新 <code>ReplicaSet</code> 的 <code>.Spec.Replicas</code>：原值 + <code>maxSurge</code>，直到等于 <code>Deployment.Spec.Replicas</code>；</li><li>缩容所有旧 <code>ReplicaSet</code> 的 <code>.Spec.Replicas</code>：总数缩减最多 <code>maxUnavalible</code>，直到等于 0；</li><li>更新 <code>Deployment</code> 的 <code>Status</code>，触发下一轮 reconcile。</li></ol><p>多轮滚动后，新 <code>ReplicaSet</code> 副本数达到预期值，旧 <code>ReplicaSet</code> 副本数也缩减到 0，滚动更新结束。</p><h3 id="替换更新"><a href="#替换更新" class="headerlink" title="替换更新"></a>替换更新</h3><p>替换更新相比之下更为简单，和滚动更新不同的是替换更新先对原有的 <code>ReplicaSet</code> 进行缩容操作，直到所有的 <code>Pod</code> 都退出后再创建新的 <code>ReplicaSet</code> 并进行扩容。</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/recreate.go:29</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> rolloutRecreate(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet, podMap <span class="keyword">map</span>[types.UID][]*v1.Pod) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// 如果不存在新 RS，在缩容前不创建</span></span><br><span class="line">newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">false</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">allRSs := <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line">activeOldRSs := controller.FilterActiveReplicaSets(oldRSs)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 对 RS 进行缩容</span></span><br><span class="line">scaledDown, err := dc.scaleDownOldReplicaSetsForRecreate(ctx, activeOldRSs, 
d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scaledDown &#123;</span><br><span class="line"><span class="comment">// Update DeploymentStatus.</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 缩容没完成，还有 Pod 在运行，先等着（先跳过）</span></span><br><span class="line"><span class="keyword">if</span> oldPodsRunning(newRS, oldRSs, podMap) &#123;</span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 如果不存在新 RS，就创建</span></span><br><span class="line"><span class="keyword">if</span> newRS == <span class="literal">nil</span> &#123;</span><br><span class="line">newRS, oldRSs, err = dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">true</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">allRSs = <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 扩容新 RS</span></span><br><span class="line"><span class="keyword">if</span> _, err := dc.scaleUpNewReplicaSetForRecreate(ctx, newRS, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span 
class="line"></span><br><span class="line"><span class="keyword">if</span> util.DeploymentComplete(d, &amp;d.Status) &#123;</span><br><span class="line"><span class="keyword">if</span> err := dc.cleanupDeployment(ctx, oldRSs, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Sync deployment status.</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="不更新情况"><a href="#不更新情况" class="headerlink" title="不更新情况"></a>不更新情况</h2><p>前面说过，<code>Deployment.Spec</code> 的其它字段变化只会更新 <code>Deployment</code> 和原有 <code>ReplicaSet</code> 的状态。这里回顾一下 <code>rolloutRolling</code> 和 <code>rolloutRecreate</code> 的代码，看看是怎么实现的。</p><p><strong>不更新情况</strong>：指  <code>Deployment.Spec.Template</code> 没有发生变化，<code>Deployment.Spec</code> 有变化。</p><h3 id="滚动更新-1"><a href="#滚动更新-1" class="headerlink" title="滚动更新"></a>滚动更新</h3><figure class="highlight go"><figcaption><span>pkg/controller/deployment/rolling.go:32</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> rolloutRolling(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// 获取 RS，当不更新时，newRS 取原先的 RS</span></span><br><span class="line">newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">true</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">allRSs := <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 如果容量不变，不扩容</span></span><br><span class="line">scaledUp, err := dc.reconcileNewReplicaSet(ctx, allRSs, newRS, d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scaledUp &#123;</span><br><span class="line"><span class="comment">// Update DeploymentStatus</span></span><br><span class="line"><span 
class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 缩容操作。如果活跃的旧 RS 为空，不缩容</span></span><br><span class="line">scaledDown, err := dc.reconcileOldReplicaSets(ctx, allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scaledDown &#123;</span><br><span class="line"><span class="comment">// Update DeploymentStatus</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 更新状态</span></span><br><span class="line"><span class="keyword">if</span> deploymentutil.DeploymentComplete(d, &amp;d.Status) &#123;</span><br><span class="line"><span class="keyword">if</span> err := dc.cleanupDeployment(ctx, oldRSs, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Sync deployment status</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><h3 id="替换更新-1"><a href="#替换更新-1" class="headerlink" title="替换更新"></a>替换更新</h3><figure class="highlight go"><figcaption><span>pkg/controller/deployment/recreate.go:29</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> rolloutRecreate(ctx context.Context, d *apps.Deployment, rsList []*apps.ReplicaSet, podMap <span class="keyword">map</span>[types.UID][]*v1.Pod) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// 当不更新时，newRS 取原先的 RS</span></span><br><span class="line">newRS, 
oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">false</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">allRSs := <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line"><span class="comment">// oldRSs 中没有活跃 RS，这里为空</span></span><br><span class="line">activeOldRSs := controller.FilterActiveReplicaSets(oldRSs)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 对 RS 进行缩容，activeOldRSs 为空，实际不缩容</span></span><br><span class="line">scaledDown, err := dc.scaleDownOldReplicaSetsForRecreate(ctx, activeOldRSs, d)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> scaledDown &#123;</span><br><span class="line"><span class="comment">// Update DeploymentStatus.</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 在进行的 Pod 属于当前 newRS 的，跳过</span></span><br><span class="line"><span class="keyword">if</span> oldPodsRunning(newRS, oldRSs, podMap) &#123;</span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// newRS 不会为 nil</span></span><br><span class="line"><span class="keyword">if</span> newRS == <span class="literal">nil</span> &#123;</span><br><span class="line">newRS, oldRSs, err = 
dc.getAllReplicaSetsAndSyncRevision(ctx, d, rsList, <span class="literal">true</span>)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">allRSs = <span class="built_in">append</span>(oldRSs, newRS)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// The replica count is unchanged, so nothing is actually scaled up</span></span><br><span class="line"><span class="keyword">if</span> _, err := dc.scaleUpNewReplicaSetForRecreate(ctx, newRS, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Update the status</span></span><br><span class="line"><span class="keyword">if</span> util.DeploymentComplete(d, &amp;d.Status) &#123;</span><br><span class="line"><span class="keyword">if</span> err := dc.cleanupDeployment(ctx, oldRSs, d); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Sync deployment status.</span></span><br><span class="line"><span class="keyword">return</span> dc.syncRolloutStatus(ctx, allRSs, newRS, d)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="引申"><a href="#引申" class="headerlink" title="引申"></a>Going Further</h2><h3 id="实现-Deployment-重启"><a href="#实现-Deployment-重启" class="headerlink" title="实现 Deployment 重启"></a>Restarting a Deployment</h3><p>When a development environment is deployed with a <code>Deployment</code>, restarting the application often means simply deleting its <code>Pod</code>s. Is there a more graceful way?</p><p>From the way <code>DeploymentController</code> implements updates, as described above, we know that the update flow is triggered whenever a <code>Deployment</code>'s 
<code>.Spec.Template</code> changes.</p><p>We can add an annotation (<code>Annotations</code>) to <code>.Spec.Template</code> and set its value to the current time. This triggers the update flow and thereby restarts the application.</p><p>This is in fact how the <code>kubectl rollout restart</code> command works. The Pod spec itself is left unchanged: the command only adds or updates the <code>kubectl.kubernetes.io/restartedAt</code> timestamp under <code>.spec.template.metadata.annotations</code>, and it does not modify the replica count.</p><figure class="highlight go"><figcaption><span>vendor/k8s.io/kubectl/pkg/polymorphichelpers/objectrestarter.go:32</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">defaultObjectRestarter</span><span class="params">(obj runtime.Object)</span></span> ([]<span class="type">byte</span>, <span class="type">error</span>) &#123;  </span><br><span class="line">    <span class="keyword">switch</span> obj := obj.(<span class="keyword">type</span>) &#123;  </span><br><span class="line">    <span class="keyword">case</span> *extensionsv1beta1.Deployment:  </span><br><span class="line">       <span class="keyword">if</span> obj.Spec.Paused &#123;  </span><br><span class="line">          <span class="keyword">return</span> <span class="literal">nil</span>, errors.New(<span class="string">&quot;can&#x27;t restart paused deployment (run rollout resume first)&quot;</span>)  </span><br><span class="line">       &#125;  </span><br><span class="line">       <span class="keyword">if</span> obj.Spec.Template.ObjectMeta.Annotations == <span class="literal">nil</span> &#123;  </span><br><span 
class="line">          obj.Spec.Template.ObjectMeta.Annotations = <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>)  </span><br><span class="line">       &#125;  </span><br><span class="line">       obj.Spec.Template.ObjectMeta.Annotations[<span class="string">&quot;kubectl.kubernetes.io/restartedAt&quot;</span>] = time.Now().Format(time.RFC3339)  </span><br><span class="line">       <span class="keyword">return</span> runtime.Encode(scheme.Codecs.LegacyCodec(extensionsv1beta1.SchemeGroupVersion), obj)</span><br><span class="line"><span class="comment">// ...略</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><p><code>Deployment</code> 的 <code>Pod</code> 管理是通过 <code>ReplicaSet</code> 来进行的，<code>DeploymentController</code> 的代码也不涉及 <code>Pod</code> 的直接操作。</p><p><code>DeploymentController</code> 对 <code>Deployment</code> 和 <code>ReplicaSet</code> 的操作并不是立即完成的，而是在控制循环中反复执行、收敛、修正，最终达到期望状态，完成更新。</p><h2 id="引用"><a href="#引用" class="headerlink" title="引用"></a>引用</h2><ol><li><a href="https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/controller/Kubernetes%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0-Controller-P3-Controller%E5%88%86%E7%B1%BB%E4%B8%8EDeployment%20Controller.md">P3-Controller 分类与 Deployment Controller</a></li><li><a href="https://juejin.cn/post/7012446367528255519">当你创建了一个 Deployment 时，Kubernetes 内部发生了什么？</a></li></ol>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

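&lt;p&gt;&lt;code&gt;Deployment&lt;/code&gt; is the most commonly used of the three common Kubernetes workload types. A &lt;code&gt;Deployment&lt;/code&gt; manages how an application is deployed: it supports rolling upgrades and rollbacks, and it can scale the application up and down.&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;Deployment&lt;/code&gt; manages &lt;code&gt;Pod&lt;/code&gt;s through &lt;code&gt;ReplicaSet&lt;/code&gt;s. The full flow, from creating a &lt;code&gt;Deployment&lt;/code&gt; to its &lt;code&gt;Pod&lt;/code&gt;s being started, is carried out by several controllers working together:&lt;/p&gt;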
&lt;p&gt;&lt;img src=&quot;//images.imoe.tech/blog/bTM7PH.jpg&quot; alt=&quot;Deployment 处理流程&quot;&gt;&lt;/p&gt;</summary>
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>UpgradeAwareHandler in the Kubernetes Codebase</title>
    <link href="https://blog.imoe.tech/2023/12/13/UpgradeAwareHandler/"/>
    <id>https://blog.imoe.tech/2023/12/13/UpgradeAwareHandler/</id>
    <published>2023-12-13T06:32:48.000Z</published>
    <updated>2024-01-23T04:30:07.250Z</updated>
    
    <content type="html"><![CDATA[<img alt="" a="<" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p><code>UpgradeAwareHandler</code> 是 Kubernetes 里很重要的一个代码组件，在 Kubernetes 中用于代理和转发请求。</p><p>只要是有转发请求的地方都可以见到他的身影：</p><ol><li>kubectl 的命令 exec/attach/log/port-forward 等需要连接到容器的长连接；</li><li>APIServer Aggregation 功能，需要将请求转发到外部 APIServer。</li></ol><p>第三方的集群网关组件也会利用这个组件来实现转发代理，如：Karmada、KubeVela Cluster Gateway 等。</p><p>为什么都使用这个组件来转发请求？本文通过阅读源码，深入研究这个组件的实现原理以及使用方式。</p><span id="more"></span><h2 id="源码"><a href="#源码" class="headerlink" title="源码"></a>源码</h2><p><code>UpgradeAwareHandler</code> 是一个代理转发组件，从名字就可以知道这个组件不只是转发 HTTP 请求，同时对于需要执行 Upgrade 操作的请求如 WebSocket 或 SPDY 的协议也可以很好地支持。</p><p><code>UpgradeAwareHandler</code> 最简单的使用方式是通过 <code>NewUpgradeAwareHandler()</code> 方法构造，下面是 APIServer 里实现请求转发时调用的方法：</p><figure class="highlight go"><figcaption><span>pkg/registry/core/pod/rest/subresources.go:216</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">newThrottledUpgradeAwareProxyHandler</span><span class="params">(location *url.URL, transport http.RoundTripper, wrapTransport, upgradeRequired <span class="type">bool</span>, responder rest.Responder)</span></span> *proxy.UpgradeAwareHandler &#123;</span><br><span class="line">handler := proxy.NewUpgradeAwareHandler(location, transport, wrapTransport, upgradeRequired, proxy.NewErrorResponder(responder))</span><br><span class="line">handler.MaxBytesPerSec = capabilities.Get().PerConnectionBandwidthLimitBytesPerSec</span><br><span class="line"><span class="keyword">return</span> handler</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>NewUpgradeAwareHandler()</code> 
takes the following parameters:</p><ul><li><code>location</code>: the upstream address. For an upgrade request, this is the address that gets dialed;</li><li><code>transport</code>: supplies a custom <code>RoundTripper</code>. User authentication and similar settings are usually configured through this <code>RoundTripper</code>;</li><li><code>wrapTransport</code>: whether to wrap <code>transport</code> with the default <code>RoundTripper</code>;</li><li><code>upgradeRequired</code>: whether the request must be upgraded. If it is <code>true</code> and the request is not upgraded, an error is returned;</li><li><code>responder</code>: writes the response when an error occurs.</li></ul><p>Once the handler is constructed, calling <code>handler.ServeHTTP(w, req)</code> hands the request to <code>UpgradeAwareHandler</code>, which proxies and forwards it.</p><h3 id="请求处理"><a href="#请求处理" class="headerlink" title="请求处理"></a>Request Handling</h3><p><code>ServeHTTP</code> handles two kinds of requests: those that need an upgrade, and all others. The code is as follows:</p><figure class="highlight go"><figcaption><span>vendor/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:213</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span 
class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(h *UpgradeAwareHandler)</span></span> ServeHTTP(w http.ResponseWriter, req *http.Request) &#123;</span><br><span class="line"><span class="keyword">if</span> h.tryUpgrade(w, req) &#123;</span><br><span 
class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// 强制升级但升级失败返回错误响应</span></span><br><span class="line"><span class="keyword">if</span> h.UpgradeRequired &#123;</span><br><span class="line">h.Responder.Error(w, req, errors.NewBadRequest(<span class="string">&quot;Upgrade request required&quot;</span>))</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">loc := *h.Location</span><br><span class="line">loc.RawQuery = req.URL.RawQuery</span><br><span class="line"></span><br><span class="line"><span class="comment">// If original request URL ended in &#x27;/&#x27;, append a &#x27;/&#x27; at the end of the</span></span><br><span class="line"><span class="comment">// of the proxy URL</span></span><br><span class="line"><span class="keyword">if</span> !strings.HasSuffix(loc.Path, <span class="string">&quot;/&quot;</span>) &amp;&amp; strings.HasSuffix(req.URL.Path, <span class="string">&quot;/&quot;</span>) &#123;</span><br><span class="line">loc.Path += <span class="string">&quot;/&quot;</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 处理重定向</span></span><br><span class="line">proxyRedirect := proxyRedirectsforRootPath(loc.Path, w, req)</span><br><span class="line"><span class="keyword">if</span> proxyRedirect &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 如果没配置自定义 Transport 或开启了 WrapTransport 就使用默认 Transport</span></span><br><span class="line"><span class="keyword">if</span> h.Transport == <span class="literal">nil</span> || h.WrapTransport &#123;</span><br><span class="line">h.Transport = h.defaultProxyTransport(req.URL, h.Transport)</span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// WithContext creates a shallow clone of the request with the same context.</span></span><br><span class="line">newReq := req.WithContext(req.Context())</span><br><span class="line">newReq.Header = utilnet.CloneHeader(req.Header)</span><br><span class="line"><span class="keyword">if</span> !h.UseRequestLocation &#123;</span><br><span class="line">newReq.URL = &amp;loc</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> h.UseLocationHost &#123;</span><br><span class="line"><span class="comment">// exchanging req.Host with the backend location is necessary for backends that act on the HTTP host header (e.g. API gateways),</span></span><br><span class="line"><span class="comment">// because req.Host has preference over req.URL.Host in filling this header field</span></span><br><span class="line">newReq.Host = h.Location.Host</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// create the target location to use for the reverse proxy</span></span><br><span class="line">reverseProxyLocation := &amp;url.URL&#123;Scheme: h.Location.Scheme, Host: h.Location.Host&#125;</span><br><span class="line"><span class="keyword">if</span> h.AppendLocationPath &#123;</span><br><span class="line">reverseProxyLocation.Path = h.Location.Path</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 构造代理</span></span><br><span class="line">proxy := httputil.NewSingleHostReverseProxy(reverseProxyLocation)</span><br><span class="line">proxy.Transport = h.Transport</span><br><span class="line">proxy.FlushInterval = h.FlushInterval</span><br><span class="line">proxy.ErrorLog = log.New(noSuppressPanicError&#123;&#125;, <span class="string">&quot;&quot;</span>, log.LstdFlags)</span><br><span class="line"><span 
class="keyword">if</span> h.RejectForwardingRedirects &#123;</span><br><span class="line">oldModifyResponse := proxy.ModifyResponse</span><br><span class="line">proxy.ModifyResponse = <span class="function"><span class="keyword">func</span><span class="params">(response *http.Response)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">code := response.StatusCode</span><br><span class="line"><span class="keyword">if</span> code &gt;= <span class="number">300</span> &amp;&amp; code &lt;= <span class="number">399</span> &amp;&amp; <span class="built_in">len</span>(response.Header.Get(<span class="string">&quot;Location&quot;</span>)) &gt; <span class="number">0</span> &#123;</span><br><span class="line"><span class="comment">// close the original response</span></span><br><span class="line">response.Body.Close()</span><br><span class="line">msg := <span class="string">&quot;the backend attempted to redirect this request, which is not permitted&quot;</span></span><br><span class="line"><span class="comment">// replace the response</span></span><br><span class="line">*response = http.Response&#123;</span><br><span class="line">StatusCode:    http.StatusBadGateway,</span><br><span class="line">Status:        fmt.Sprintf(<span class="string">&quot;%d %s&quot;</span>, response.StatusCode, http.StatusText(response.StatusCode)),</span><br><span class="line">Body:          io.NopCloser(strings.NewReader(msg)),</span><br><span class="line">ContentLength: <span class="type">int64</span>(<span class="built_in">len</span>(msg)),</span><br><span class="line">&#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line"><span class="keyword">if</span> oldModifyResponse != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> err := oldModifyResponse(response); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span 
class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> h.Responder != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// if an optional error interceptor/responder was provided wire it</span></span><br><span class="line"><span class="comment">// the custom responder might be used for providing a unified error reporting</span></span><br><span class="line"><span class="comment">// or supporting retry mechanisms by not sending non-fatal errors to the clients</span></span><br><span class="line">proxy.ErrorHandler = h.Responder.Error</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Forward the proxied request</span></span><br><span class="line">proxy.ServeHTTP(w, newReq)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The code first calls <code>tryUpgrade()</code> to attempt an upgrade. For requests that are not upgraded, it makes a shallow copy of the Request object and proxies it with <code>httputil.NewSingleHostReverseProxy</code>, the reverse proxy from Go's standard library.</p><h3 id="请求升级"><a href="#请求升级" class="headerlink" title="Request Upgrade"></a>Request Upgrade</h3><p>A request upgrade refers to the protocol upgrade mechanism, a special mechanism provided by HTTP/1.1 that allows an already established connection to be switched to a different, incompatible protocol.</p><article class="message is-warning">        <div class="message-header"><p>Note</p></div>        <div class="message-body">            <p>HTTP/2 explicitly forbids this mechanism; it is specific to HTTP/1.1.</p>        </div>    </article><p>In practice, the mechanism is used mainly to bootstrap WebSocket connections; Kubernetes also uses it to bootstrap the SPDY/3.1 protocol.</p><p>A typical request carrying Upgrade looks like this:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">GET 
/index.html HTTP/1.1</span><br><span class="line">Host: www.example.com</span><br><span class="line">Connection: upgrade</span><br><span class="line">Upgrade: example/1, foo/2</span><br></pre></td></tr></table></figure><p>Kubernetes uses the following function to decide whether a request is an Upgrade request:</p><figure class="highlight go"><figcaption><span>vendor/k8s.io/apimachinery/pkg/util/httpstream/httpstream.go:99</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">IsUpgradeRequest</span><span class="params">(req *http.Request)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line"><span class="keyword">for</span> _, h := <span class="keyword">range</span> req.Header[http.CanonicalHeaderKey(HeaderConnection)] &#123;</span><br><span class="line"><span class="keyword">if</span> strings.Contains(strings.ToLower(h), strings.ToLower(HeaderUpgrade)) &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The function only checks whether any <code>Connection</code> header value contains <code>upgrade</code>, compared case-insensitively; it does not inspect the <code>Upgrade</code> header itself. <code>UpgradeAwareHandler</code> runs the upgrade logic only when this function returns <code>true</code>:</p><figure class="highlight go"><figcaption><span>vendor/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:309</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(h *UpgradeAwareHandler)</span></span> tryUpgrade(w http.ResponseWriter, req *http.Request) <span class="type">bool</span> &#123;</span><br><span class="line"><span class="keyword">if</span> !httpstream.IsUpgradeRequest(req) &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Request was not an upgrade&quot;</span>)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// ...略</span></span><br><span class="line"></span><br><span class="line">clone := utilnet.CloneRequest(req)</span><br><span class="line"><span class="comment">// ...略</span></span><br><span class="line">backendConn, err = h.DialForUpgrade(clone)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Proxy connection error: %v&quot;</span>, err)</span><br><span class="line">h.Responder.Error(w, req, err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">defer</span> backendConn.Close()</span><br><span class="line"><span class="comment">// 
...omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If it is an upgrade request, <code>h.DialForUpgrade(clone)</code> is called to open the connection. Inside <code>DialForUpgrade</code>, whether <code>UpgradeTransport</code> is configured determines whether it wraps the request and is used for dialing in place of <code>Transport</code>:</p><figure class="highlight go"><figcaption><span>vendor/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:470</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(h *UpgradeAwareHandler)</span></span> DialForUpgrade(req *http.Request) (net.Conn, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">if</span> h.UpgradeTransport == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> dial(req, h.Transport)</span><br><span class="line">&#125;</span><br><span class="line">updatedReq, err := h.UpgradeTransport.WrapRequest(req)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> dial(updatedReq, h.UpgradeTransport)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>UpgradeTransport</code> is an instance of the <code>UpgradeRequestRoundTripper</code> interface; such instances wrap the Request before the upgrade. It plays a role similar to the <code>Transport</code> mentioned earlier with the constructor:</p><ul><li><code>h.UpgradeTransport</code>: used only for Upgrade requests; if it is <code>nil</code>, 
<code>h.Transport</code> is used instead;</li><li><code>h.Transport</code>: normally passed to <code>httputil.NewSingleHostReverseProxy</code> for non-Upgrade requests, and also used for Upgrade requests when <code>h.UpgradeTransport</code> is not set.</li></ul><p>Once <code>dialURL</code> returns a connection, <code>Write()</code> is called to write the current request to it:</p><figure class="highlight go"><figcaption><span>vendor/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:495</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">dial</span><span class="params">(req *http.Request, transport http.RoundTripper)</span></span> (net.Conn, <span class="type">error</span>) &#123;</span><br><span class="line">conn, err := dialURL(req.Context(), req.URL, transport)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;error dialing backend: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> err = req.Write(conn); err != <span class="literal">nil</span> &#123;</span><br><span class="line">conn.Close()</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;error sending request: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span 
class="line"><span class="keyword">return</span> conn, err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Back in <code>tryUpgrade()</code>: after the connection is established, part of the Response is read first to check whether the status code is the expected one:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(h *UpgradeAwareHandler)</span></span> tryUpgrade(w http.ResponseWriter, req *http.Request) <span class="type">bool</span> &#123;</span><br><span class="line"><span class="comment">// ...omitted</span></span><br><span class="line"></span><br><span class="line">backendConn, err = h.DialForUpgrade(clone)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Proxy connection error: %v&quot;</span>, err)</span><br><span class="line">h.Responder.Error(w, req, err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span 
class="keyword">defer</span> backendConn.Close()</span><br><span class="line"></span><br><span class="line"><span class="comment">// determine the http response code from the backend by reading from rawResponse+backendConn</span></span><br><span class="line">backendHTTPResponse, headerBytes, err := getResponse(io.MultiReader(bytes.NewReader(rawResponse), backendConn))</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Proxy connection error: %v&quot;</span>, err)</span><br><span class="line">h.Responder.Error(w, req, err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(headerBytes) &gt; <span class="built_in">len</span>(rawResponse) &#123;</span><br><span class="line"><span class="comment">// we read beyond the bytes stored in rawResponse, update rawResponse to the full set of bytes read from the backend</span></span><br><span class="line">rawResponse = headerBytes</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// ...omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>getResponse</code> reads the response as HTTP and returns a Response object, whose status code then determines whether the upgrade happened. <code>getResponse</code> also returns the raw bytes it read, which are later written back to the client to tell it to perform the Upgrade and switch protocols.</p><p>For an upgrade request, the first response normally consists of response headers only: it tells the client the outcome of the Upgrade and whether to switch protocols, and everything after it is carried over the new protocol.</p><p>If the upgrade fails, the Response must be written back to the client. So once the Response is obtained, the result is checked first: if the status code is neither 101 Switching Protocols nor an error code (400 or above), an error is returned immediately.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:362</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> backendHTTPResponse.StatusCode != http.StatusSwitchingProtocols &amp;&amp; backendHTTPResponse.StatusCode &lt; <span class="number">400</span> &#123;</span><br><span class="line">err := fmt.Errorf(<span class="string">&quot;invalid upgrade response: status code %d&quot;</span>, backendHTTPResponse.StatusCode)</span><br><span class="line">klog.Errorf(<span class="string">&quot;Proxy upgrade error: %v&quot;</span>, err)</span><br><span class="line">h.Responder.Error(w, req, err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If the status code is an error, the backend's error response must be written back to the client:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:385</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> backendHTTPResponse.StatusCode != http.StatusSwitchingProtocols &#123;</span><br><span class="line"><span class="comment">// If the backend did not upgrade the request, echo the response from the backend to the client and return, closing the connection.</span></span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Proxy upgrade error, 
status code %d&quot;</span>, backendHTTPResponse.StatusCode)</span><br><span class="line"><span class="comment">// set read/write deadlines</span></span><br><span class="line">deadline := time.Now().Add(<span class="number">10</span> * time.Second)</span><br><span class="line">backendConn.SetReadDeadline(deadline)</span><br><span class="line">requestHijackedConn.SetWriteDeadline(deadline)</span><br><span class="line"><span class="comment">// write the response to the client</span></span><br><span class="line">err := backendHTTPResponse.Write(requestHijackedConn)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &amp;&amp; !strings.Contains(err.Error(), <span class="string">&quot;use of closed network connection&quot;</span>) &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Error proxying data from backend to client: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Indicate we handled the request</span></span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Writing back to the client works by hijacking the original connection: hijacking it through <code>http.Hijacker</code> yields the underlying <code>net.Conn</code>.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:371</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">requestHijacker, ok := 
w.(http.Hijacker)</span><br><span class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Unable to hijack response writer: %T&quot;</span>, w)</span><br><span class="line">h.Responder.Error(w, req, fmt.Errorf(<span class="string">&quot;request connection cannot be hijacked: %T&quot;</span>, w))</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line">requestHijackedConn, _, err := requestHijacker.Hijack()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Unable to hijack response: %v&quot;</span>, err)</span><br><span class="line">h.Responder.Error(w, req, fmt.Errorf(<span class="string">&quot;error hijacking connection: %v&quot;</span>, err))</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">defer</span> requestHijackedConn.Close()</span><br></pre></td></tr></table></figure><article class="message is-warning">        <div class="message-header"><p>Note</p></div>        <div class="message-body">            <p>Once the connection has been hijacked, the original <code>response</code> object (the <code>http.ResponseWriter</code>) can no longer be used.</p>        </div>    </article><p>For a request that upgrades normally, once the connection is hijacked, the response read earlier is written back to the client to tell it to switch protocols:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go:402</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> <span 
class="built_in">len</span>(rawResponse) &gt; <span class="number">0</span> &#123;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Writing %d bytes to hijacked connection&quot;</span>, <span class="built_in">len</span>(rawResponse))</span><br><span class="line"><span class="keyword">if</span> _, err = requestHijackedConn.Write(rawResponse); err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;Error proxying response from backend to client: %v&quot;</span>, err))</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>After a successful upgrade the two sides no longer speak HTTP and the proxy no longer needs to interpret the content, so the remaining work is simply copying data in both directions:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span 
class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line">writerComplete := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line">readerComplete := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line"></span><br><span class="line"><span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="keyword">var</span> writer io.WriteCloser</span><br><span class="line"><span class="keyword">if</span> h.MaxBytesPerSec &gt; <span class="number">0</span> &#123;</span><br><span class="line">writer = flowrate.NewWriter(backendConn, h.MaxBytesPerSec)</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">writer = backendConn</span><br><span class="line">&#125;</span><br><span class="line">_, err := io.Copy(writer, requestHijackedConn)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &amp;&amp; !strings.Contains(err.Error(), <span class="string">&quot;use of closed network connection&quot;</span>) &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Error proxying data from client to backend: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="built_in">close</span>(writerComplete)</span><br><span class="line">&#125;()</span><br><span class="line"></span><br><span class="line"><span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="keyword">var</span> reader io.ReadCloser</span><br><span class="line"><span class="keyword">if</span> h.MaxBytesPerSec 
&gt; <span class="number">0</span> &#123;</span><br><span class="line">reader = flowrate.NewReader(backendConn, h.MaxBytesPerSec)</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">reader = backendConn</span><br><span class="line">&#125;</span><br><span class="line">_, err := io.Copy(requestHijackedConn, reader)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &amp;&amp; !strings.Contains(err.Error(), <span class="string">&quot;use of closed network connection&quot;</span>) &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Error proxying data from backend to client: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="built_in">close</span>(readerComplete)</span><br><span class="line">&#125;()</span><br><span class="line"></span><br><span class="line"><span class="comment">// Wait for one half the connection to exit. 
Once it does the defer will</span></span><br><span class="line"><span class="comment">// clean up the other half of the connection.</span></span><br><span class="line"><span class="keyword">select</span> &#123;</span><br><span class="line"><span class="keyword">case</span> &lt;-writerComplete:</span><br><span class="line"><span class="keyword">case</span> &lt;-readerComplete:</span><br><span class="line">&#125;</span><br><span class="line">klog.V(<span class="number">6</span>).Infof(<span class="string">&quot;Disconnecting from backend proxy %s\n  Headers: %v&quot;</span>, &amp;location, clone.Header)</span><br></pre></td></tr></table></figure><p>The two-way copy uses <code>io.Copy</code>, which blocks until its source reaches EOF or an error occurs, so each direction runs in its own goroutine.</p><p>When the stream in either direction ends, <code>tryUpgrade()</code> returns; the deferred <code>requestHijackedConn.Close()</code> and <code>backendConn.Close()</code> then close both connections, and any I/O still pending on them terminates.</p><h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p><code>UpgradeAwareHandler</code> is a proxy-forwarding component that builds on <code>httputil.NewSingleHostReverseProxy</code> and adds forwarding support for upgradable requests.</p><p>Accessing the API Server requires credentials, yet nothing in the proxy-forwarding path handles authentication or authorization, and requests still arrive correctly. This is thanks to Go's RoundTripper design pattern, which lets custom logic be inserted into the request-handling chain and keeps the modules decoupled.</p><h2 id="引用"><a href="#引用" class="headerlink" title="References"></a>References</h2><ol><li><a href="https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Protocol_upgrade_mechanism">Protocol upgrade mechanism</a></li></ol>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

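&lt;p&gt;&lt;code&gt;UpgradeAwareHandler&lt;/code&gt; is an important component in the Kubernetes codebase, used to proxy and forward requests.&lt;/p&gt;
&lt;p&gt;It appears wherever requests are forwarded:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;kubectl commands such as exec/attach/log/port-forward, which need long-lived connections to containers;&lt;/li&gt;
&lt;li&gt;APIServer Aggregation, which forwards requests to an external APIServer.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Third-party cluster gateway components, such as Karmada and KubeVela Cluster Gateway, also use it to implement forwarding proxies.&lt;/p&gt;
&lt;p&gt;Why does everything use this component to forward requests? This article reads through the source code to examine how it works and how to use it.&lt;/p&gt;</summary>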
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>Controller Manager: The Brain of the Kubernetes Cluster</title>
    <link href="https://blog.imoe.tech/2023/10/11/kube-controller-manager-the-brain-of-cluster/"/>
    <id>https://blog.imoe.tech/2023/10/11/kube-controller-manager-the-brain-of-cluster/</id>
    <published>2023-10-11T15:25:44.000Z</published>
    <updated>2023-10-18T16:09:41.738Z</updated>
    
    <content type="html"><![CDATA[<img alt="" a="<" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p>Kubernetes is a declarative system. When we use Kubernetes to manage applications and deploy services, we usually write a YAML file that describes the final state we expect the application to reach once deployed.</p><p>After the file is submitted to Kubernetes, Kubernetes keeps creating resources until the cluster reaches the state we described. The component behind this is the subject of this article: kube-controller-manager, the brain of the Kubernetes cluster.</p><p>The resources we commonly see in a Kubernetes cluster, such as nodes (<code>Node</code>), <code>Pod</code>, services (<code>Service</code>), endpoints (<code>Endpoint</code>), namespaces (<code>Namespace</code>), service accounts (<code>ServiceAccount</code>), and resource quotas (<code>ResourceQuota</code>), are all managed by kube-controller-manager.</p><span id="more"></span><p>To manage these resources, kube-controller-manager ships with a set of built-in controllers, such as the <code>Deployment</code> controller, the <code>StatefulSet</code> controller, the <code>Namespace</code> controller, and the <code>PersistentVolume</code> controller, each responsible for its own resource: <code>Deployment</code>, <code>StatefulSet</code>, <code>Namespace</code>, and <code>PersistentVolume</code>.</p><p>These controllers use the <a href="/2023/02/15/kubernetes-informer-mechanism/">Informer mechanism</a> to watch specific resource objects through kube-apiserver in real time, obtain their current state, and compare, correct, and converge them, moving each object steadily toward, and finally into, the desired state declared for it.</p><h2 id="高可用"><a href="#高可用" class="headerlink" title="High Availability"></a>High Availability</h2><p>kube-controller-manager runs as multiple instances in a Kubernetes cluster for high availability, but only the instance acting as Leader executes controller logic. The non-Leader instances are hot standbys: only when the Leader fails for some reason does one of them win an election, become the new Leader, and take over the controller logic.</p><p>kube-controller-manager elects its Leader with the mechanism described in "<a href="/2023/01/18/principle-of-kubernetes-leaderelection/">The Leader Election Mechanism of Kubernetes Core Components</a>".</p><h2 id="启动"><a href="#启动" class="headerlink" title="Startup"></a>Startup</h2><p>The entry point of kube-controller-manager is in <code>cmd/kube-controller-manager/app/controllermanager.go</code> and is a <code>cobra.Command</code>.</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:104</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewControllerManagerCommand</span><span class="params">()</span></span> *cobra.Command &#123;</span><br><span class="line">s, err := options.NewKubeControllerManagerOptions()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Fatalf(<span class="string">&quot;unable to initialize command options: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">cmd := &amp;cobra.Command&#123;</span><br><span class="line">Use: <span class="string">&quot;kube-controller-manager&quot;</span>,</span><br><span class="line"><span class="comment">// ...略过</span></span><br><span class="line">RunE: <span class="function"><span class="keyword">func</span><span class="params">(cmd *cobra.Command, args []<span class="type">string</span>)</span></span> <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// 略过</span></span><br><span class="line"><span class="keyword">return</span> Run(c.Complete(), wait.NeverStop)</span><br><span class="line">&#125;,</span><br><span class="line"><span class="comment">// ...略过</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">fs := cmd.Flags()</span><br><span 
class="line"><span class="comment">// command-line flag registration, omitted</span></span><br><span class="line"><span class="keyword">return</span> cmd</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This code goes on into the <code>Run</code> function, which carries out the real kube-controller-manager logic. Inside <code>Run</code>, the code that actually does the work is the anonymous function assigned to <code>run</code>.</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:226</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">run := <span class="function"><span class="keyword">func</span><span class="params">(ctx context.Context, startSATokenController InitFunc, initializersFunc ControllerInitializersFunc)</span></span> &#123;</span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The code that follows <code>run</code> handles leader election. If leader election is not enabled, <code>run</code> is called directly.</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:244</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// No leader election, run directly</span></span><br><span class="line"><span class="keyword">if</span> !c.ComponentConfig.Generic.LeaderElection.LeaderElect &#123;</span><br><span class="line">ctx, _ := wait.ContextForChannel(stopCh)</span><br><span class="line">run(ctx, saTokenControllerInitFunc, NewControllerInitializers)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If leader election is enabled, a goroutine is started that runs the election with
client-go's <a href="/2023/01/18/principle-of-kubernetes-leaderelection/">leader election component</a> and calls the same <code>run</code> once this instance becomes the Leader, while the main flow blocks on <code>&lt;-stopCh</code>:</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:280</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Start the main lock</span></span><br><span class="line"><span class="keyword">go</span> leaderElectAndRun(c, id, electionChecker,</span><br><span class="line">c.ComponentConfig.Generic.LeaderElection.ResourceLock,</span><br><span class="line">c.ComponentConfig.Generic.LeaderElection.ResourceName,</span><br><span class="line">leaderelection.LeaderCallbacks&#123;</span><br><span class="line">OnStartedLeading: <span class="function"><span class="keyword">func</span><span class="params">(ctx context.Context)</span></span> &#123;</span><br><span class="line">initializersFunc := NewControllerInitializers</span><br><span class="line"><span class="keyword">if</span> leaderMigrator != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// If leader migration is enabled, we should start only non-migrated controllers</span></span><br><span 
class="line"><span class="comment">//  for the main lock.</span></span><br><span class="line">initializersFunc = createInitializersFunc(leaderMigrator.FilterFunc, leadermigration.ControllerNonMigrated)</span><br><span class="line">klog.Info(<span class="string">&quot;leader migration: starting main controllers.&quot;</span>)</span><br><span class="line">&#125;</span><br><span class="line">run(ctx, startSATokenController, initializersFunc)</span><br><span class="line">&#125;,</span><br><span class="line">OnStoppedLeading: <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">klog.ErrorS(<span class="literal">nil</span>, <span class="string">&quot;leaderelection lost&quot;</span>)</span><br><span class="line">klog.FlushAndExit(klog.ExitFlushTimeout, <span class="number">1</span>)</span><br><span class="line">&#125;,</span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&lt;-stopCh</span><br></pre></td></tr></table></figure><p>The <code>run</code> function is implemented as follows:</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:226</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">run := <span class="function"><span 
class="keyword">func</span><span class="params">(ctx context.Context, startSATokenController InitFunc, initializersFunc ControllerInitializersFunc)</span></span> &#123;</span><br><span class="line">controllerContext, err := CreateControllerContext(c, rootClientBuilder, clientBuilder, ctx.Done())</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Fatalf(<span class="string">&quot;error building controller context: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// build the initializers for the controllers to run</span></span><br><span class="line">controllerInitializers := initializersFunc(controllerContext.LoopMode)</span><br><span class="line"><span class="comment">// start all controllers</span></span><br><span class="line"><span class="keyword">if</span> err := StartControllers(ctx, controllerContext, startSATokenController, controllerInitializers, unsecuredMux, healthzHandler); err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Fatalf(<span class="string">&quot;error starting controllers: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// start the informers</span></span><br><span class="line">controllerContext.InformerFactory.Start(stopCh)</span><br><span class="line">controllerContext.ObjectOrMetadataInformerFactory.Start(stopCh)</span><br><span class="line"><span class="built_in">close</span>(controllerContext.InformersStarted)</span><br><span class="line"></span><br><span class="line">&lt;-ctx.Done()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>run</code> calls the <code>initializersFunc</code> it was given to obtain the controllers to start. The default <code>initializersFunc</code> is <code>NewControllerInitializers</code>, which builds a <code>map[string]InitFunc</code>: each key is a controller name and each value is that controller's
start function.</p><p><code>StartControllers</code> iterates over that <code>map[string]InitFunc</code> and starts every controller.</p><article class="message is-info">        <div class="message-header"><p>A note on Informers</p></div>        <div class="message-body">            <p>The informers are started only after the controllers have been started, and a controller does not really begin working until its informers have synced. For example, when <code>DeploymentController</code> starts, it waits for the informer caches to sync before running its worker logic:</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/deployment_controller.go:157</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> !cache.WaitForNamedCacheSync(<span class="string">&quot;deployment&quot;</span>, ctx.Done(), dc.dListerSynced, dc.rsListerSynced, dc.podListerSynced) &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>        </div>    </article><h2 id="启动-Controller"><a href="#启动-Controller" class="headerlink" title="启动 Controller"></a>Starting the controllers</h2><p>The startup code is simple:</p><figure class="highlight go"><figcaption><span>cmd/kube-controller-manager/app/controllermanager.go:567</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">StartControllers</span><span class="params">(ctx context.Context, controllerCtx ControllerContext, startSATokenController InitFunc, controllers <span class="keyword">map</span>[<span class="type">string</span>]InitFunc,</span></span></span><br><span class="line"><span class="params"><span class="function">unsecuredMux *mux.PathRecorderMux, healthzHandler *controllerhealthz.MutableHealthzHandler)</span></span> <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// 先启动 SATokenController，因为这个 Controller 会为其它 Controller 构造 Token</span></span><br><span class="line"><span class="comment">// 如果启动失败则中止启动流程</span></span><br><span class="line"><span class="keyword">if</span> startSATokenController != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> _, _, err := startSATokenController(ctx, controllerCtx); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span 
class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Initialize the cloud provider with a reference to the clientBuilder only after token controller</span></span><br><span class="line"><span class="comment">// has started in case the cloud provider uses the client builder.</span></span><br><span class="line"><span class="keyword">if</span> controllerCtx.Cloud != <span class="literal">nil</span> &#123;</span><br><span class="line">controllerCtx.Cloud.Initialize(controllerCtx.ClientBuilder, ctx.Done())</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> controllerChecks []healthz.HealthChecker</span><br><span class="line"></span><br><span class="line"><span class="comment">// 遍历 Controller 列表</span></span><br><span class="line"><span class="keyword">for</span> controllerName, initFn := <span class="keyword">range</span> controllers &#123;</span><br><span class="line"><span class="keyword">if</span> !controllerCtx.IsControllerEnabled(controllerName) &#123;</span><br><span class="line">klog.Warningf(<span class="string">&quot;%q is disabled&quot;</span>, controllerName)</span><br><span class="line"><span class="keyword">continue</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// 加点时间间隔</span></span><br><span class="line">time.Sleep(wait.Jitter(controllerCtx.ComponentConfig.Generic.ControllerStartInterval.Duration, ControllerStartJitter))</span><br><span class="line"></span><br><span class="line"><span class="comment">// 启动 Controller </span></span><br><span class="line">klog.V(<span class="number">1</span>).Infof(<span class="string">&quot;Starting %q&quot;</span>, controllerName)</span><br><span class="line">ctrl, started, err := initFn(ctx, controllerCtx)</span><br><span class="line"><span 
class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Error starting %q&quot;</span>, controllerName)</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> !started &#123;</span><br><span class="line">klog.Warningf(<span class="string">&quot;Skipping %q&quot;</span>, controllerName)</span><br><span class="line"><span class="keyword">continue</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// ... health check setup omitted</span></span><br><span class="line">controllerChecks = <span class="built_in">append</span>(controllerChecks, check)</span><br><span class="line"></span><br><span class="line">klog.Infof(<span class="string">&quot;Started %q&quot;</span>, controllerName)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">healthzHandler.AddHealthChecker(controllerChecks...)</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Before the other controllers are started, the SATokenController is started first (it is actually the <code>TokensController</code>, which manages the ServiceAccountToken of each ServiceAccount), because it issues the tokens the other controllers use. If it fails to start, the whole startup is aborted: without tokens there is no point in starting the others.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>The core job of kube-controller-manager is still to run the controllers for the many Kubernetes resources; as an introduction, this article only covers startup. If you are interested, see <code>NewControllerInitializers()</code> at <code>cmd/kube-controller-manager/app/controllermanager.go:428</code> for the list of controllers that kube-controller-manager manages.</p><p>Later articles will walk through the source of the classic <code>DeploymentController</code> to see how Deployment is implemented.</p>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

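&lt;p&gt;Kubernetes is a declarative system. When we use Kubernetes to manage applications and deploy services, we usually write a YAML file that describes the final state we expect once the application is deployed.&lt;/p&gt;
&lt;p&gt;Once that file is submitted, Kubernetes keeps creating resources, almost like magic, until the state we described is reached. The component behind this is today's topic: kube-controller-manager, the brain of a Kubernetes cluster.&lt;/p&gt;
&lt;p&gt;The resources we see every day in a Kubernetes cluster, such as nodes (&lt;code&gt;Node&lt;/code&gt;), &lt;code&gt;Pod&lt;/code&gt;, services (&lt;code&gt;Service&lt;/code&gt;), endpoints (&lt;code&gt;Endpoint&lt;/code&gt;), namespaces (&lt;code&gt;Namespace&lt;/code&gt;), service accounts (&lt;code&gt;ServiceAccount&lt;/code&gt;), and resource quotas (&lt;code&gt;ResourceQuota&lt;/code&gt;), are all managed by kube-controller-manager.&lt;/p&gt;</summary>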
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>Removing a Stale Storage in PVE</title>
    <link href="https://blog.imoe.tech/2023/09/25/delete-storage-removed-in-pve/"/>
    <id>https://blog.imoe.tech/2023/09/25/delete-storage-removed-in-pve/</id>
    <published>2023-09-25T07:14:09.000Z</published>
    <updated>2023-10-12T07:09:07.535Z</updated>
    
    <content type="html"><![CDATA[<p>If you physically pull a disk out of a PVE host without first removing it in the PVE panel, the storage panel shows an error icon, and trying to delete the storage manually fails outright with <code>not a valid block device</code>.</p><p><img width="600" alt="Error when deleting the storage" src="https://images.imoe.tech/blog/Np3PoE.jpg" /></p><p>Many people find PVE complicated mainly because so many operations have to be done on the command line. Deleting a stale storage is a good example: a simple prompt offering a forced delete would solve the problem, yet PVE produces a baffling error message instead.</p><p>Completing the deletion actually takes only a few commands, but the error message gives no clue where to start; I ended up finding the solution on the official forum.</p><span id="more"></span><h2 id="PVE-Mount"><a href="#PVE-Mount" class="headerlink" title="PVE Mount"></a>PVE Mount</h2><p>PVE's disk management offers four storage types: LVM, LVM Thin, Directory, and ZFS. The one I use most is Directory. File-based storage costs some performance, but it is really simple and practical. The Directory type is a jack of all trades: anything can be created in a Directory.</p><p>A Directory storage is a mount point managed by systemd, and its unit files are stored in <code>/etc/systemd/system</code>. Let's open one and take a look:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">root@svc:/etc/systemd/system<span class="comment"># cat mnt-pve-storage.mount </span></span><br><span class="line">[Install]</span><br><span class="line">WantedBy=multi-user.target</span><br><span class="line"></span><br><span class="line">[Mount]</span><br><span class="line">Options=defaults</span><br><span class="line">Type=xfs</span><br><span class="line">What=/dev/disk/by-uuid/0c389bee-ddd9-4010-800f-99dfdb6104d7</span><br><span class="line">Where=/mnt/pve/storage</span><br><span class="line"></span><br><span class="line">[Unit]</span><br><span class="line">Description=Mount storage <span class="string">&#x27;storage&#x27;</span> under /mnt/pve</span><br></pre></td></tr></table></figure><p>It really is simple. The next time you need a mount, you can follow the same approach instead of dealing with <code>fstab</code>.</p><h2 id="删除存储"><a href="#删除存储" class="headerlink" title="删除存储"></a>Removing the storage</h2><p>Since we know the mount is managed by systemd, can we operate on it with <code>systemctl</code>? Yes, exactly.</p><p>We use <code>systemctl</code> to stop the mount and disable it from starting automatically, after which the unit file can be deleted:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">systemctl stop mnt-pve-&lt;name&gt;.mount</span><br><span class="line">systemctl <span class="built_in">disable</span> mnt-pve-&lt;name&gt;.mount</span><br><span class="line"><span class="built_in">rm</span> /etc/systemd/system/mnt-pve-&lt;name&gt;.mount</span><br></pre></td></tr></table></figure><p>After running these commands, the directory no longer appears in the Directory list.</p><p><img width="600" alt="Directory list after running the commands" src="https://images.imoe.tech/blog/24rdHE.jpg" /></p><p>If the storage still shows up in the storage list on the left, just go to the Storage page and remove the corresponding entry.</p><h2 id="引用"><a href="#引用" class="headerlink" title="引用"></a>References</h2><ol><li><a href="https://forum.proxmox.com/threads/how-i-can-remove-directory-entry-from-gui.50006/page-2">How i can remove directory-entry from gui?</a></li><li><a href="https://www.jinbuguo.com/systemd/systemd.mount.html">systemd.mount manual (Chinese translation)</a></li></ol>]]></content>
    
    
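    <summary type="html">&lt;p&gt;If you physically pull a disk out of a PVE host without first removing it in the PVE panel, the storage panel shows an error icon, and trying to delete the storage manually fails outright with &lt;code&gt;not a valid block device&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;
&lt;img width=&quot;600&quot; alt=&quot;Error when deleting the storage&quot; src=&quot;https://images.imoe.tech/blog/Np3PoE.jpg&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;Many people find PVE complicated mainly because so many operations have to be done on the command line. Deleting a stale storage is a good example: a simple prompt offering a forced delete would solve the problem, yet PVE produces a baffling error message instead.&lt;/p&gt;
&lt;p&gt;Completing the deletion actually takes only a few commands, but the error message gives no clue where to start; I ended up finding the solution on the official forum.&lt;/p&gt;</summary>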
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="Linux" scheme="https://blog.imoe.tech/categories/Tech/Linux/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Linux" scheme="https://blog.imoe.tech/tags/Linux/"/>
    
    <category term="PVE" scheme="https://blog.imoe.tech/tags/PVE/"/>
    
  </entry>
  
  <entry>
    <title>Chinese SSDs with the Maxio MAP1602 Controller Not Being Detected in PVE</title>
    <link href="https://blog.imoe.tech/2023/09/04/build-pve-kernel-to-fix-nvme-duplicated-ids-and-device-not-ready/"/>
    <id>https://blog.imoe.tech/2023/09/04/build-pve-kernel-to-fix-nvme-duplicated-ids-and-device-not-ready/</id>
    <published>2023-09-04T04:02:20.000Z</published>
    <updated>2023-09-07T07:59:26.277Z</updated>
    
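    <content type="html"><![CDATA[<p>The workhorse of my HomeLab is a self-built AMD machine (the PRD host from here on). It runs PVE as the virtualization platform, hosting an unofficial Synology DSM VM and a PCDN VM, with services running in PVE's LXC containers.</p><p>As the main machine, it got a ZhiTai TiPlus 7100 as the system disk and a TiPlus 5000 as the data disk for VM systems.</p><p>During the recent 618 sales, NVMe SSDs built on domestic YMTC NAND were remarkably cheap, and every one of them was a full-speed PCIe 4.0 drive. I had never seen prices this attractive since I first learned about SSDs, so I bought a few during the promotion: a 2TB Aigo P7000z, a 2TB Fanxiang S790, and a 4TB HP FX900 Plus.</p><p>When they arrived I checked each of them on Windows with CDI (CrystalDiskInfo); all were brand new with no problems. Then I plugged them into the PRD host, booted, and something very strange happened: the P7000z was not detected in PVE.</p><span id="more"></span><h2 id="诡异的硬盘冲突"><a href="#诡异的硬盘冲突" class="headerlink" title="诡异的硬盘冲突"></a>A strange disk conflict</h2><p>When the P7000z failed to show up, my first suspicion was that the drive had died. After all, I had watched a video of 小飞机 repairing a dead P7000z just the day before, and Aigo and Fanxiang are both fairly off-brand; I just didn't expect it to fail this quickly.</p><p>I pulled the P7000z and plugged it back into the Windows machine, where it turned out to be perfectly fine. So I began to suspect that the newly bought UPQI PCIe x16 to 4-slot NVMe adapter card was not compatible with the motherboard's PCIe bifurcation.</p><p>The PRD host uses an ASUS TUF B550m Plus motherboard, which supports bifurcating the first PCIe slot, although the only bifurcation options in the BIOS are 8x8 and PCIe RAID Mode.</p><p><img src="https://images.imoe.tech/blog/hL5Swq.png" alt="PCIe bifurcation notes for the B550m Plus motherboard"></p><p>But I found a post on NGA saying that PCIe RAID Mode is in fact 4x4x4x4 bifurcation. So in theory it should work. Maybe the adapter card was at fault? Or a poor contact in the slot?</p><p>With these questions in mind I kept reseating the NVMe drives and swapping their order, and found that whenever the Fanxiang S790 and the P7000z were both in the PRD host, the P7000z was not detected. It was the first time I had ever seen two disks conflict with each other.</p><h3 id="冲突的原因"><a href="#冲突的原因" class="headerlink" title="冲突的原因"></a>Cause of the conflict</h3><p>After a round of searching, I found the <a href="https://www.chiphell.com/thread-2524660-1-1.html">cause</a> on the chiphell forum.</p><p>In short, these drives all use the same Maxio MAP1602 controller with the same VID/DID and, judging by the kernel check that the patches in this article work around, report the same namespace IDs, so the Linux kernel only brings up one of them. ZhiTai's TiPlus 7100 uses the same controller but has no such problem, which suggests these off-brand drives ship the reference firmware as-is, without even changing the IDs.</p><p>Windows also reports a duplicate disk ID error, but it does not block the disk from loading. So the fix is to ignore the duplicate IDs, and the Linux kernel already has a mechanism for that. The CHH thread provides a kernel patch that solves the problem:</p><figure class="highlight diff"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 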
class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c</span></span><br><span class="line"><span class="comment">--- a/drivers/nvme/host/pci.c</span></span><br><span class="line"><span class="comment">+++ b/drivers/nvme/host/pci.c</span></span><br><span class="line"><span class="meta">@@ -3424,6 +3424,8 @@</span> static const struct pci_device_id nvme_id_table[] = &#123;</span><br><span class="line">                 .driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span><br><span class="line">         &#123; PCI_DEVICE(0x1e4B, 0x1202),   /* MAXIO MAP1202 */</span><br><span class="line">                 .driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span><br><span class="line"><span class="addition">+        &#123; PCI_DEVICE(0x1e4B, 0x1602),   /* MAXIO MAP1602 */</span></span><br><span class="line"><span class="addition">+                .driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span></span><br><span class="line">         &#123; PCI_DEVICE(0x1cc1, 0x5350),   /* ADATA XPG GAMMIX S50 */</span><br><span class="line">                 .driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span><br><span class="line">         &#123; PCI_DEVICE(0x1dbe, 0x5236),   /* ADATA XPG GAMMIX S70 */</span><br></pre></td></tr></table></figure><p>This <a href="https://www.spinics.net/lists/kernel/msg4860395.html">Patch</a> has been submitted to the Linux kernel and will be included in kernel 6.4.</p><p>That said, PVE 8 kernels from version 6.2.16-5 onward already carry a patch that resolves the conflict, so upgrading to the latest PVE fixes the issue directly. If you are on an older version and don't want to upgrade, you can download the prebuilt packages from the original poster on CHH.</p><p>If you want to build the kernel yourself, see the later sections of this article, which give detailed build steps.</p><h3 id="PVE-8-的-Patch"><a href="#PVE-8-的-Patch" class="headerlink" title="PVE 8 的 Patch"></a>The PVE 8 patch</h3><p>As mentioned above, the new PVE 8 kernel fixes the problem of disks with duplicate IDs not being detected. Let's look at how the official kernel handles it:</p><figure class="highlight diff"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line">Fixes: 2079f41ec6ff (&quot;nvme: check that EUI/GUID/UUID are globally unique&quot;)</span><br><span class="line"><span class="comment">---</span></span><br><span class="line"> drivers/nvme/host/core.c | 36 +++++++++++++++++++++++++++++++++---</span><br><span class="line"> 1 file changed, 33 insertions(+), 3 deletions(-)</span><br><span class="line"></span><br><span class="line"><span class="comment">diff --git 
a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="comment">index d567762545b0..f350df252d27 100644</span></span><br><span class="line"><span class="comment">--- a/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="comment">+++ b/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="meta">@@ -4162,10 +4162,40 @@</span> static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)</span><br><span class="line"> </span><br><span class="line"> ret = nvme_global_check_duplicate_ids(ctrl-&gt;subsys, &amp;info-&gt;ids);</span><br><span class="line"> if (ret) &#123;</span><br><span class="line"><span class="deletion">-dev_err(ctrl-&gt;device,</span></span><br><span class="line"><span class="deletion">-&quot;globally duplicate IDs for nsid %d\n&quot;, info-&gt;nsid);</span></span><br><span class="line"><span class="addition">+/*</span></span><br><span class="line"><span class="addition">+ * We&#x27;ve found two different namespaces on two different</span></span><br><span class="line"><span class="addition">+ * subsystems that report the same ID.  This is pretty nasty</span></span><br><span class="line"><span class="addition">+ * for anything that actually requires unique device</span></span><br><span class="line"><span class="addition">+ * identification.  In the kernel we need this for multipathing,</span></span><br><span class="line"><span class="addition">+ * and in user space the /dev/disk/by-id/ links rely on it.</span></span><br><span class="line"><span class="addition">+ *</span></span><br><span class="line"><span class="addition">+ * If the device also claims to be multi-path capable back off</span></span><br><span class="line"><span class="addition">+ * here now and refuse the probe the second device as this is a</span></span><br><span class="line"><span class="addition">+ * recipe for data corruption.  
If not this is probably a</span></span><br><span class="line"><span class="addition">+ * cheap consumer device if on the PCIe bus, so let the user</span></span><br><span class="line"><span class="addition">+ * proceed and use the shiny toy, but warn that with changing</span></span><br><span class="line"><span class="addition">+ * probing order (which due to our async probing could just be</span></span><br><span class="line"><span class="addition">+ * device taking longer to startup) the other device could show</span></span><br><span class="line"><span class="addition">+ * up at any time.</span></span><br><span class="line"><span class="addition">+ */</span></span><br><span class="line"> nvme_print_device_info(ctrl);</span><br><span class="line"><span class="deletion">-return ret;</span></span><br><span class="line"><span class="addition">+if ((ns-&gt;ctrl-&gt;ops-&gt;flags &amp; NVME_F_FABRICS) || /* !PCIe */</span></span><br><span class="line"><span class="addition">+    ((ns-&gt;ctrl-&gt;subsys-&gt;cmic &amp; NVME_CTRL_CMIC_MULTI_CTRL) &amp;&amp;</span></span><br><span class="line"><span class="addition">+     info-&gt;is_shared)) &#123;</span></span><br><span class="line"><span class="addition">+dev_err(ctrl-&gt;device,</span></span><br><span class="line"><span class="addition">+&quot;ignoring nsid %d because of duplicate IDs\n&quot;,</span></span><br><span class="line"><span class="addition">+info-&gt;nsid);</span></span><br><span class="line"><span class="addition">+return ret;</span></span><br><span class="line"><span class="addition">+&#125;</span></span><br><span class="line"><span class="addition">+</span></span><br><span class="line"><span class="addition">+dev_err(ctrl-&gt;device,</span></span><br><span class="line"><span class="addition">+&quot;clearing duplicate IDs for nsid %d\n&quot;, info-&gt;nsid);</span></span><br><span class="line"><span class="addition">+dev_err(ctrl-&gt;device,</span></span><br><span class="line"><span 
class="addition">+&quot;use of /dev/disk/by-id/ may cause data corruption\n&quot;);</span></span><br><span class="line"><span class="addition">+memset(&amp;info-&gt;ids.nguid, 0, sizeof(info-&gt;ids.nguid));</span></span><br><span class="line"><span class="addition">+memset(&amp;info-&gt;ids.uuid, 0, sizeof(info-&gt;ids.uuid));</span></span><br><span class="line"><span class="addition">+memset(&amp;info-&gt;ids.eui64, 0, sizeof(info-&gt;ids.eui64));</span></span><br><span class="line"><span class="addition">+ctrl-&gt;quirks |= NVME_QUIRK_BOGUS_NID;</span></span><br><span class="line"> &#125;</span><br><span class="line"> </span><br><span class="line"> mutex_lock(&amp;ctrl-&gt;subsys-&gt;lock);</span><br></pre></td></tr></table></figure><p>The PVE 8 patch handles duplicate IDs generically: unless the device is a fabrics device or a shared multi-controller namespace, it clears the duplicate IDs and sets <code>NVME_QUIRK_BOGUS_NID</code> instead of rejecting the namespace.</p><h2 id="硬盘再次消失"><a href="#硬盘再次消失" class="headerlink" title="硬盘再次消失"></a>The disk disappears again</h2><p>After upgrading PVE, the disk conflict was gone. The long-awaited HP FX900 Plus 4TB finally arrived too. Following my usual routine I hooked it up, checked it, and benchmarked it, then installed it in the PRD host and booted: no problems, it was detected normally.</p><p>Then, after one reboot, the disk disappeared, and several further reboots did not bring it back. The logs showed this error:</p><figure class="highlight bash"><figcaption><span>Detection error for the HP FX900 Plus 4TB</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nvme nvme2: Device not ready; aborting initialisation, CSTS=0x0</span><br></pre></td></tr></table></figure><p>At the PCIe level, <code>lspci</code> still shows the device:</p><figure class="highlight bash"><figcaption><span>Check whether the disk is visible on PCIe</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">$ lspci -vv | grep MAP1602</span><br><span class="line">pcilib: sysfs_read_vpd: <span class="built_in">read</span> failed: No such device</span><br><span class="line">08:00.0 Non-Volatile memory controller: MAXIO Technology 
(Hangzhou) Ltd. NVMe SSD Controller MAP1602 (rev 01) (prog-if 02 [NVM Express])</span><br><span class="line">        Subsystem: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602</span><br><span class="line">09:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 (rev 01) (prog-if 02 [NVM Express])</span><br><span class="line">        Subsystem: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602</span><br><span class="line">0a:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 (rev 01) (prog-if 02 [NVM Express])</span><br><span class="line">        Subsystem: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602</span><br></pre></td></tr></table></figure><p>After some searching, I found <a href="https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_suspend_support">documentation</a> suggesting the kernel boot parameter <code>iommu=soft</code>. I tried it and it had no effect.</p><p>In the end I went back to the CHH thread and found that quite a few people had hit the same problem, and only with the 4TB model. The firmware probably has a bug: it reports an initialization timeout of 0, so the kernel waits 0 seconds, times out, and aborts initialization immediately.</p><p>For this case, the following patch also solves the problem by changing the wait from <code>(timeout + 1) / 2</code> to <code>(timeout + 1) * 2</code>:</p><figure class="highlight diff"><figcaption><span>Increase the NVMe timeout</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="comment">diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="comment">index d567762545b0..1ca3d78da5b9 100644</span></span><br><span class="line"><span class="comment">--- a/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="comment">+++ b/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="meta">@@ -2407,6 +2407,7 @@</span> int nvme_enable_ctrl(struct nvme_ctrl *ctrl)</span><br><span class="line"> &#125; else &#123;</span><br><span class="line"> timeout = NVME_CAP_TIMEOUT(ctrl-&gt;cap);</span><br><span class="line"> &#125;</span><br><span class="line"><span class="addition">+dev_info(ctrl-&gt;device, &quot;[PATCH] nvme core got timeout %u\n&quot;,timeout);</span></span><br><span class="line"></span><br><span class="line"> ctrl-&gt;ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) &lt;&lt; NVME_CC_MPS_SHIFT;</span><br><span class="line"> ctrl-&gt;ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;</span><br><span class="line"><span class="meta">@@ -2424,8 +2425,9 @@</span> int nvme_enable_ctrl(struct nvme_ctrl *ctrl)</span><br><span class="line"> ret = ctrl-&gt;ops-&gt;reg_write32(ctrl, NVME_REG_CC, ctrl-&gt;ctrl_config);</span><br><span class="line"> if (ret)</span><br><span class="line"> return ret;</span><br><span class="line"><span class="addition">+dev_info(ctrl-&gt;device, &quot;[PATCH] nvme_wait_ready now wait for %u, previously %u\n&quot;,(timeout + 1) * 2, (timeout + 1)/2);</span></span><br><span class="line"> return nvme_wait_ready(ctrl, NVME_CSTS_RDY, NVME_CSTS_RDY,</span><br><span class="line"><span class="deletion">-       (timeout + 1) / 2, &quot;initialisation&quot;);</span></span><br><span class="line"><span class="addition">+       (timeout + 1) * 2, &quot;initialisation&quot;);</span></span><br><span class="line"> &#125;</span><br><span class="line"> EXPORT_SYMBOL_GPL(nvme_enable_ctrl);</span><br><span class="line"> 
</span><br></pre></td></tr></table></figure><p>With this patch applied, the drive is now detected normally. The log output is:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">[    1.303728] nvme nvme2: [PATCH] nvme core got <span class="built_in">timeout</span> 0</span><br><span class="line">[    1.303732] nvme nvme2: [PATCH] nvme_wait_ready now <span class="built_in">wait</span> <span class="keyword">for</span> 2, previously 0</span><br></pre></td></tr></table></figure><p>The original poster on CHH also submitted a patch to Linux for this problem. It does not increase the timeout; instead it adds <code>NVME_QUIRK_DELAY_BEFORE_CHK_RDY</code> to the device's <code>.driver_data</code> in <code>pci.c</code>.</p><figure class="highlight diff"><figcaption><span>The CHH fix</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">Problem and fix are verified with my MAP1602 controller SSD device.</span><br><span class="line"></span><br><span class="line">Signed-off-by: Han Gao &lt;highenthalpyh@xxxxxxxxx&gt;</span><br><span class="line">Signed-off-by: David Xu &lt;xuwd1@xxxxxxxxxxx&gt;</span><br><span class="line"><span class="comment">---</span></span><br><span class="line">  drivers/nvme/host/pci.c | 3 ++-</span><br><span 
class="line">  1 file changed, 2 insertions(+), 1 deletion(-)</span><br><span class="line"></span><br><span class="line"><span class="comment">diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c</span></span><br><span class="line"><span class="comment">index 492f319ebdf3..f75c27730bde 100644</span></span><br><span class="line"><span class="comment">--- a/drivers/nvme/host/pci.c</span></span><br><span class="line"><span class="comment">+++ b/drivers/nvme/host/pci.c</span></span><br><span class="line"><span class="meta">@@ -3425,7 +3425,8 @@</span> static const struct pci_device_id nvme_id_table[] = &#123;</span><br><span class="line">  &#123; PCI_DEVICE(0x1e4B, 0x1202),   /* MAXIO MAP1202 */</span><br><span class="line">  .driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span><br><span class="line">  &#123; PCI_DEVICE(0x1e4B, 0x1602),   /* MAXIO MAP1602 */</span><br><span class="line"><span class="deletion">-.driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span></span><br><span class="line"><span class="addition">+.driver_data = NVME_QUIRK_BOGUS_NID |</span></span><br><span class="line"><span class="addition">+NVME_QUIRK_DELAY_BEFORE_CHK_RDY, &#125;,</span></span><br><span class="line">  &#123; PCI_DEVICE(0x1cc1, 0x5350),   /* ADATA XPG GAMMIX S50 */</span><br><span class="line">  .driver_data = NVME_QUIRK_BOGUS_NID, &#125;,</span><br><span class="line">  &#123; PCI_DEVICE(0x1dbe, 0x5236),   /* ADATA XPG GAMMIX S70 */</span><br></pre></td></tr></table></figure><p>However, in the discussion on the kernel <a href="https://www.spinics.net/lists/kernel/msg4903065.html">mailing list</a>, someone argued that this approach does not actually fix the problem, because the code affected by <code>NVME_QUIRK_DELAY_BEFORE_CHK_RDY</code> is not the code where the timeout occurs.</p><h2 id="编译内核"><a href="#编译内核" class="headerlink" title="编译内核"></a>Building the Kernel</h2><p>The patch is still under discussion, and even after it is merged into the kernel it will take some time before PVE picks it up. The PVE kernel built by the CHH poster (6.2.16-4) is older than the one I had after upgrading, and downgrading the kernel did not feel right, so I had to apply the patch and build the kernel myself.</p><h3 id="环境准备"><a href="#环境准备" class="headerlink" title="环境准备"></a>Preparing the Environment</h3><p>To avoid affecting the PRD host, I created an 
LXC container from the Ubuntu 23.04 template and built the kernel inside it.</p><p>First, install the build dependencies:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">wget -qO - http://download.proxmox.com/debian/proxmox-ve-release-6.x.gpg | apt-key add</span><br><span class="line"><span class="built_in">echo</span> <span class="string">&quot;deb http://mirrors.ustc.edu.cn/proxmox/debian buster pve-no-subscription &quot;</span> | <span class="built_in">tee</span> /etc/apt/sources.list.d/buster-pvetest.list</span><br><span class="line">apt update &amp;&amp; apt install libpve-common-perl</span><br><span class="line">apt install git make dpkg-dev dh-python dh-make python3-sphinx lintian asciidoc-base bison dwarves flex libdw-dev libelf-dev libiberty-dev libnuma-dev libslang2-dev libssl-dev lz4 xmlto zlib1g-dev bc zstd </span><br></pre></td></tr></table></figure><p>Note that <code>libpve-common-perl</code> is only available from the PVE repository, so the PVE repository has to be added first; here I use the USTC mirror. Once the dependencies are installed, you can pull the code and start building.</p><p>To build the PVE kernel you only need to clone pve-kernel. pve-kernel depends on the Ubuntu kernel through a git submodule, which is fetched automatically during the build.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> git://git.proxmox.com/git/pve-kernel.git</span><br></pre></td></tr></table></figure><article class="message is-info">        <div class="message-header"><p>Note</p></div>        <div class="message-body">            <p>The Ubuntu kernel is over 1 GB, so it is worth routing <code>git.proxmox.com</code> through a proxy to speed things up; otherwise a connection that drops halfway through means starting the download over.</p>        </div>    </article><h3 id="增加-Patch"><a href="#增加-Patch" class="headerlink" title="增加 Patch"></a>Adding the Patch</h3><p>In the <code>patches/kernel</code> subdirectory of the pve-kernel directory, add a file named <code>0013-nvme-multiple-timeout-nvme_wait_ready.patch</code> with the following content:</p><figure class="highlight 
diff"><figcaption><span>patches/kernel/0013-nvme-multiple-timeout-nvme_wait_ready.patch</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="comment">index d567762545b0..1ca3d78da5b9 100644</span></span><br><span class="line"><span class="comment">--- a/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="comment">+++ b/drivers/nvme/host/core.c</span></span><br><span class="line"><span class="meta">@@ -2407,6 +2407,7 @@</span> int nvme_enable_ctrl(struct nvme_ctrl *ctrl)</span><br><span class="line"> &#125; else &#123;</span><br><span class="line"> timeout = NVME_CAP_TIMEOUT(ctrl-&gt;cap);</span><br><span class="line"> &#125;</span><br><span class="line"><span class="addition">+dev_info(ctrl-&gt;device, &quot;[PATCH] nvme core got timeout %u\n&quot;,timeout);</span></span><br><span class="line"></span><br><span class="line"> ctrl-&gt;ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) &lt;&lt; NVME_CC_MPS_SHIFT;</span><br><span class="line"> ctrl-&gt;ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;</span><br><span class="line"><span class="meta">@@ 
-2424,8 +2425,9 @@</span> int nvme_enable_ctrl(struct nvme_ctrl *ctrl)</span><br><span class="line"> ret = ctrl-&gt;ops-&gt;reg_write32(ctrl, NVME_REG_CC, ctrl-&gt;ctrl_config);</span><br><span class="line"> if (ret)</span><br><span class="line"> return ret;</span><br><span class="line"><span class="addition">+dev_info(ctrl-&gt;device, &quot;[PATCH] nvme_wait_ready now wait for %u, previously %u\n&quot;,(timeout + 1) * 2, (timeout + 1)/2);</span></span><br><span class="line"> return nvme_wait_ready(ctrl, NVME_CSTS_RDY, NVME_CSTS_RDY,</span><br><span class="line"><span class="deletion">-       (timeout + 1) / 2, &quot;initialisation&quot;);</span></span><br><span class="line"><span class="addition">+       (timeout + 1) * 2, &quot;initialisation&quot;);</span></span><br><span class="line"> &#125;</span><br><span class="line"> EXPORT_SYMBOL_GPL(nvme_enable_ctrl);</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Only the timeout patch needs to be added here; the duplicate-ID fix is already included in the official <code>0010-nvme-don-t-reject-probe-due-to-duplicate-IDs-for-sin.patch</code>.</p><h3 id="开始构建"><a href="#开始构建" class="headerlink" title="开始构建"></a>Starting the Build</h3><p>After adding the patch, switch to the pve-kernel root directory and run the build:</p><figure class="highlight bash"><figcaption><span>Start the build</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ make</span><br></pre></td></tr></table></figure><p>After roughly half an hour, the build finishes and produces several deb files in the pve-kernel directory:</p><figure class="highlight bash"><figcaption><span>Inspect the build output</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">root@builder:~/pve/pve-kernel<span class="comment"># ls *.deb</span></span><br><span class="line">linux-tools-6.2_6.2.16-11_amd64.deb                proxmox-headers-6.2_6.2.16-11_all.deb             proxmox-kernel-6.2_6.2.16-11_all.deb</span><br><span 
class="line">proxmox-headers-6.2.16-11-pve_6.2.16-11_amd64.deb  proxmox-kernel-6.2.16-11-pve_6.2.16-11_amd64.deb  proxmox-kernel-libc-dev_6.2.16-11_amd64.deb</span><br></pre></td></tr></table></figure><p>Copy the files to the PVE host and install the new kernel with the following command:</p><figure class="highlight shell"><figcaption><span>Install the kernel</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">dpkg -i *.deb</span><br></pre></td></tr></table></figure><p>The new kernel takes effect after a reboot.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Cheap comes at a price. Before buying, I did not expect to run into two Linux kernel issues because of SSDs. These are not really kernel bugs either: the off-brand vendors have no firmware development capability and ship the reference firmware as-is. They do not even change the drive IDs, which is, as the comment in the PVE patch puts it, nasty.</p><p>ZhiTai (致态) uses the same controller and has none of these odd problems. In the end, my advice is to just buy ZhiTai; cheap comes at a price.</p><h2 id="下载"><a href="#下载" class="headerlink" title="下载"></a>Downloads</h2><p>If you would rather not build the kernel yourself, I have collected prebuilt packages here that can be installed directly. You can check your current kernel version with <code>uname -srm</code>.</p><article class="message is-info">        <div class="message-header"><p>Files</p></div>        <div class="message-body">            <table><thead><tr><th>PVE version</th><th>Kernel version</th><th>Link</th><th>Access code</th><th>Source</th></tr></thead><tbody><tr><td>PVE 8.0</td><td>6.2.16-11</td><td><a href="https://www.123pan.com/s/9hzA-45krA.html">Download</a>  <font size="1">[sponsored by Wonchong]</font> <br><a href="https://cloud.imoe.tech:4/d/s/v7KQowF6dpypHlC0hBNGAi3jmaRNH73F/ma3oh0M0hvP7_TA7ZB1hDZtiAlYt2eyh-QbcATE4Cuwo">Download</a></td><td><code>iHaa</code></td><td>Built by this site</td></tr><tr><td>PVE 8.0 beta</td><td>6.2.16-1</td><td><a href="https://pan.baidu.com/s/1qdF8AVyjUOX_gL8Fxe9g6A?pwd=CHH1">Download</a></td><td><code>CHH1</code></td><td>CHH</td></tr><tr><td>PVE 8.0</td><td>6.2.16-3</td><td><a href="https://pan.baidu.com/s/1juvlOR6uA7G53eiCJTAzGw?pwd=CHH2">Download</a></td><td><code>CHH2</code></td><td>CHH</td></tr><tr><td>PVE 8.0</td><td>6.2.16-4</td><td><a href="https://pan.baidu.com/s/1xHShA5LBaVl2uWH4cIUHGg?pwd=CHH3">Download</a></td><td><code>CHH3</code></td><td>CHH</td></tr></tbody></table>        </div>    </article><h2 id="引用"><a 
href="#引用" class="headerlink" title="引用"></a>References</h2><ol><li><a href="https://www.chiphell.com/thread-2524660-1-1.html">联芸MAP1602主控的可以入了，掉坑里刚爬出来，P7000Z晚班车拿了四块，附内核</a></li><li><a href="https://www.spinics.net/lists/kernel/msg4860395.html">nvme-pci: add NVME_QUIRK_DELAY_BEFORE_CHK_RDY for MAXIO MAP1602</a></li><li><a href="https://www.spinics.net/lists/kernel/msg4903065.html">Re: nvme-pci: add NVME_QUIRK_DELAY_BEFORE_CHK_RDY for MAXIO MAP1602</a></li><li><a href="https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Controller_failure_due_to_broken_suspend_support">Solid state drive/NVMe: Controller failure due to broken suspend support</a></li><li><a href="https://dlcdnets.asus.com.cn/pub/ASUS/mb/SocketAM4/TUF_GAMING_B550M-PLUS/C16519_TUF_GAMING_B550M-PLUS_UM_12P_WEB.pdf">TUF B550m PLUS Manual</a></li></ol>]]></content>
    
    
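    <summary type="html">&lt;p&gt;The workhorse of my HomeLab is a self-built AMD machine (the PRD host below). It runs PVE as the virtualization platform, hosts VMs for a DIY Synology DSM install and for PCDN, and runs services in PVE's LXC containers.&lt;/p&gt;
&lt;p&gt;As the main machine, it has a ZhiTai (致态) TiPlus 7100 as the system disk and a TiPlus 5000 as the data disk for VM systems.&lt;/p&gt;
&lt;p&gt;During the recent 618 sale, NVMe SSDs with domestic YMTC flash were remarkably cheap, and every one of them was a full-speed PCIe 4.0 drive. I had not seen prices this attractive since I first learned about SSDs, so I bought a few during the sale: a 2TB Aigo P7000z, a 2TB Fanxiang S790, and a 4TB HP FX900 Plus.&lt;/p&gt;
&lt;p&gt;After they arrived I checked them all with CDI on Windows; all were brand new with no issues. Then I plugged the drives into the PRD host, booted, and something very strange happened: the P7000z was not recognized in PVE.&lt;/p&gt;</summary>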
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="Linux" scheme="https://blog.imoe.tech/categories/Tech/Linux/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Linux" scheme="https://blog.imoe.tech/tags/Linux/"/>
    
    <category term="PVE" scheme="https://blog.imoe.tech/tags/PVE/"/>
    
  </entry>
  
  <entry>
    <title>Cannot Connect to the Docker Network Inside an LXC Container</title>
    <link href="https://blog.imoe.tech/2023/08/15/how-to-route-to-lxc-docker-network/"/>
    <id>https://blog.imoe.tech/2023/08/15/how-to-route-to-lxc-docker-network/</id>
    <published>2023-08-15T15:07:09.000Z</published>
    <updated>2023-09-03T04:00:35.914Z</updated>
    
    <content type="html"><![CDATA[<p>Since my HomeLab PRD node moved entirely to running applications in LXC containers last year, I have recently started tinkering with the host network again, purely out of interest. The main topics are VLANs, SR-IOV, and 10GbE; I will publish those notes once they are tidied up. This post covers one problem I hit along the way.</p><p>To make containers easier to manage, I create a Network for the containers inside each lxc (Docker in LXC) and give it its own subnet. A route on the main router sends requests for that subnet to the lxc, so containers can be reached directly by their subnet IPs.</p><p>After setting this up, I found that <strong>the containers could not be pinged at all from outside the lxc</strong>; requests never reached them.</p><span id="more"></span><h2 id="现象"><a href="#现象" class="headerlink" title="现象"></a>Symptoms</h2><p>The same setup works as expected with Docker on Synology, and the containers can be pinged normally from the lxc node itself.</p><p>I captured ICMP traffic with <code>tcpdump</code> to see what was happening. Below, <code>10.0.3.11</code> is the container IP, <code>10.0.3.1</code> is the lxc's IP (the gateway IP of the Docker Network), and <code>10.0.1.26</code> is the IP of a computer outside the node:</p><figure class="highlight bash"><figcaption><span>tcpdump capture</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">$ tcpdump -i eth0 host 10.0.3.11</span><br><span class="line">tcpdump: verbose output suppressed, use -v[v]... 
<span class="keyword">for</span> full protocol decode</span><br><span class="line">listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes</span><br><span class="line">23:25:25.432364 IP 10.0.1.26 &gt; 10.0.3.11: ICMP <span class="built_in">echo</span> request, <span class="built_in">id</span> 3197, <span class="built_in">seq</span> 38, length 64</span><br><span class="line">23:25:26.434915 IP 10.0.1.26 &gt; 10.0.3.11: ICMP <span class="built_in">echo</span> request, <span class="built_in">id</span> 3197, <span class="built_in">seq</span> 39, length 64</span><br><span class="line">23:25:27.440254 IP 10.0.1.26 &gt; 10.0.3.11: ICMP <span class="built_in">echo</span> request, <span class="built_in">id</span> 3197, <span class="built_in">seq</span> 40, length 64</span><br><span class="line">23:25:28.441107 IP 10.0.1.26 &gt; 10.0.3.11: ICMP <span class="built_in">echo</span> request, <span class="built_in">id</span> 3197, <span class="built_in">seq</span> 41, length 64</span><br><span class="line"></span><br><span class="line">$ tcpdump -i vethef07fa1 host 10.0.3.11</span><br><span class="line">tcpdump: verbose output suppressed, use -v[v]... 
<span class="keyword">for</span> full protocol decode</span><br><span class="line">listening on vethef07fa1, link-type EN10MB (Ethernet), snapshot length 262144 bytes</span><br><span class="line">23:40:26.510201 IP 10.0.3.1 &gt; 10.0.3.11: ICMP <span class="built_in">echo</span> request, <span class="built_in">id</span> 2245, <span class="built_in">seq</span> 58, length 64</span><br><span class="line">23:40:26.510216 IP 10.0.3.11 &gt; 10.0.3.1: ICMP <span class="built_in">echo</span> reply, <span class="built_in">id</span> 2245, <span class="built_in">seq</span> 58, length 64</span><br><span class="line">23:40:27.534200 IP 10.0.3.1 &gt; 10.0.3.11: ICMP <span class="built_in">echo</span> request, <span class="built_in">id</span> 2245, <span class="built_in">seq</span> 59, length 64</span><br><span class="line">23:40:27.534211 IP 10.0.3.11 &gt; 10.0.3.1: ICMP <span class="built_in">echo</span> reply, <span class="built_in">id</span> 2245, <span class="built_in">seq</span> 59, length 64</span><br></pre></td></tr></table></figure><p>From this we can see:</p><ul><li>the lxc's interface receives the ICMP request packets, but no reply is sent back;</li><li>no packets from outside reach the network interface of the Docker container inside the lxc;</li><li>only access from outside the lxc fails; from the lxc itself the Docker container is reachable.</li></ul><h2 id="原因"><a href="#原因" class="headerlink" title="原因"></a>Cause</h2><p>The tests above make it fairly clear that the problem lies in how the lxc forwards packets. My lxc was created from the Ubuntu 23.04 template, so I suspected that IPv4 forwarding might be disabled by default:</p><figure class="highlight bash"><figcaption><span>IPv4 forwarding</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ sysctl net.ipv4.ip_forward</span><br><span class="line">1</span><br><span class="line">$ <span class="built_in">cat</span> /proc/sys/net/ipv4/ip_forward</span><br><span class="line">1</span><br></pre></td></tr></table></figure><p>Both values show that forwarding is enabled. After comparing the routing and firewall rules of the Synology and the lxc in detail, I found that the problem was in the firewall rules.</p><figure class="highlight bash"><figcaption><span>lxc iptables 
rules</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">$ iptables --list-rules</span><br><span class="line">-P INPUT ACCEPT</span><br><span class="line">-P FORWARD DROP</span><br><span class="line">-P OUTPUT ACCEPT</span><br><span class="line">-N DOCKER</span><br><span class="line"><span class="comment"># ...</span></span><br></pre></td></tr></table></figure><p>There is a default policy of <code>-P FORWARD DROP</code> here, which drops all forwarded traffic that no other rule accepts.</p><h2 id="解决"><a href="#解决" class="headerlink" title="解决"></a>Solution</h2><p>With the root cause found, the fix is simple: use the <code>iptables</code> command to set the policy to accept.</p><figure class="highlight bash"><figcaption><span>Change the policy</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ /sbin/iptables -P FORWARD ACCEPT</span><br></pre></td></tr></table></figure><p>This change is temporary. To make it persistent, you can use <code>iptables-persistent</code> or write your own startup script.</p><article class="message is-warning">        <div class="message-header"><p>Note</p></div>        <div class="message-body">            <p><code>iptables-persistent</code> starts earlier than <code>docker.service</code>, so its <code>iptables</code> rules get overwritten by Docker's defaults; the ordering has to be adjusted manually.</p>        </div>    </article><p>I went with the startup-script approach. First create a shell script (remember to make it executable with <code>chmod</code> after editing):</p><figure class="highlight bash"><figcaption><span>/etc/iptables/accept-forward-iptables.sh</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#!/bin/sh</span></span><br><span class="line">/sbin/iptables -P FORWARD ACCEPT</span><br></pre></td></tr></table></figure><blockquote><p>The <code>/etc/iptables/</code> directory may not exist; create it first if you get an error.</p></blockquote><p>Then create a 
<code>service</code> unit file:</p><figure class="highlight bash"><figcaption><span>/etc/systemd/system/iptables-persistent.service</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">[Unit]</span><br><span class="line">Description=runs iptables restore on boot</span><br><span class="line">ConditionFileIsExecutable=/etc/iptables/accept-forward-iptables.sh</span><br><span class="line">After=network.target docker.service</span><br><span class="line">[Service]</span><br><span class="line">Type=forking</span><br><span class="line">ExecStart=/etc/iptables/accept-forward-iptables.sh</span><br><span class="line">TimeoutSec=0</span><br><span class="line">RemainAfterExit=<span class="built_in">yes</span></span><br><span class="line">GuessMainPID=no</span><br><span class="line">[Install]</span><br><span class="line">WantedBy=multi-user.target</span><br></pre></td></tr></table></figure><p>Note that <code>After</code> must include <code>docker.service</code>, otherwise the rule will be overwritten. My guess is that the default <code>DROP</code> policy is set by Docker, but I did not dig into it.</p><p>Once that is done, run the following commands to enable the service at boot and apply it right away:</p><figure class="highlight bash"><figcaption><span>Enable at boot and start</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">systemctl daemon-reload</span><br><span class="line">systemctl <span class="built_in">enable</span> iptables-persistent.service</span><br><span class="line">systemctl start iptables-persistent.service</span><br></pre></td></tr></table></figure><p>After this, other machines on the LAN should be able to ping 
<code>10.0.3.11</code> successfully.</p>]]></content>
    
    
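    <summary type="html">&lt;p&gt;Since my HomeLab PRD node moved entirely to running applications in LXC containers last year, I have recently started tinkering with the host network again, purely out of interest. The main topics are VLANs, SR-IOV, and 10GbE; I will publish those notes once they are tidied up. This post covers one problem I hit along the way.&lt;/p&gt;
&lt;p&gt;To make containers easier to manage, I create a Network for the containers inside each lxc (Docker in LXC) and give it its own subnet. A route on the main router sends requests for that subnet to the lxc, so containers can be reached directly by their subnet IPs.&lt;/p&gt;
&lt;p&gt;After setting this up, I found that &lt;strong&gt;the containers could not be pinged at all from outside the lxc&lt;/strong&gt;; requests never reached them.&lt;/p&gt;</summary>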
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Docker" scheme="https://blog.imoe.tech/tags/Docker/"/>
    
    <category term="HomeLab" scheme="https://blog.imoe.tech/tags/HomeLab/"/>
    
    <category term="PVE" scheme="https://blog.imoe.tech/tags/PVE/"/>
    
  </entry>
  
  <entry>
    <title>A Brief Review of Using Locks in Java</title>
    <link href="https://blog.imoe.tech/2023/08/06/simple-preview-locks-in-java/"/>
    <id>https://blog.imoe.tech/2023/08/06/simple-preview-locks-in-java/</id>
    <published>2023-08-06T03:51:09.000Z</published>
    <updated>2023-08-21T07:19:14.086Z</updated>
    
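    <content type="html"><![CDATA[<p>When a piece of data is accessed from multiple threads, it has to be protected so that reads and writes do not fail in unpredictable ways because of what other threads are doing.</p><p>Below is a brief summary of thread-safe classes and lock usage examples for several different scenarios in Java multithreaded development.</p><span id="more"></span><h2 id="JUC-工具包"><a href="#JUC-工具包" class="headerlink" title="JUC 工具包"></a>The JUC Toolkit</h2><p>For access to critical resources, the first thing to consider is the utility classes in the java.util.concurrent package, such as CountDownLatch, ConcurrentMap, and BlockingQueue, along with the atomic classes AtomicInteger, AtomicBoolean, AtomicLong, and so on.</p><p>These classes are implemented to be thread-safe, so they can be used with confidence in a multithreaded environment.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">private</span> <span class="keyword">final</span> ConcurrentMap&lt;String,CachedResource&gt; resourceCache = <span class="keyword">new</span> <span class="title class_">ConcurrentHashMap</span>&lt;&gt;();</span><br></pre></td></tr></table></figure><p>Note: these classes are thread-safe in the sense that operations on the classes themselves are thread-safe; that does not make the data retrieved from them thread-safe.</p><p>Once an object has been retrieved, operating on it has nothing to do with ConcurrentMap any more, so while one thread is modifying it another thread may be modifying the very same object, and the two can conflict.</p><p>That is where locks come in.</p><h2 id="Java-中的锁"><a href="#Java-中的锁" class="headerlink" title="Java 中的锁"></a>Locks in Java</h2><p>Locks in Java usually fall into two categories: synchronized locks and the locks in the JUC package. The two are implemented differently; the details are out of scope here, and interested readers can dig deeper on their own. In general, synchronized is usually considered heavier than JUC locks, while JUC locks are more flexible; choose according to your needs.</p><h3 id="synchronized-锁"><a href="#synchronized-锁" class="headerlink" title="synchronized 锁"></a>synchronized Locks</h3><p>When using synchronized, avoid putting the keyword on the method itself: the lock then covers the whole method, which can easily keep other threads waiting too long.</p><p>The following usage is wrong:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">synchronized</span> ClusterClient <span class="title function_">getClient</span><span class="params">()</span> &#123;</span><br><span class="line">    <span class="comment">// ...</span></span><br><span 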
class="line">&#125;</span><br></pre></td></tr></table></figure><p>The correct form keeps <code>synchronized</code> off the method signature and locks only the critical section:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> ClusterClient <span class="title function_">getClient</span><span class="params">()</span> &#123;</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">synchronized</span> (ClusterClientBuilder.class) &#123;</span><br><span class="line">        <span class="comment">// ...</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="JUC-工具包中的锁"><a href="#JUC-工具包中的锁" class="headerlink" title="JUC 工具包中的锁"></a>Locks in the JUC Toolkit</h3><p>The <code>java.util.concurrent.locks</code> package provides some very commonly used tools, such as <code>ReentrantLock</code> and <code>ReentrantReadWriteLock</code>.</p><p>With locks such as ReentrantLock, pay special attention to when the lock is released. It is best to put the unlock call in a finally block, so that an exception in the critical section cannot skip the unlock and leave other threads deadlocked.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">reentrantLock.lock();</span><br><span class="line"><span class="keyword">try</span> &#123;</span><br><span class="line">    <span class="comment">// do sth</span></span><br><span class="line">&#125; <span class="keyword">finally</span> &#123;</span><br><span class="line">    reentrantLock.unlock();</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="分布式锁"><a href="#分布式锁" class="headerlink" title="分布式锁"></a>Distributed Locks</h2><p>When the data involved is not in memory but lives in a shared service or a database, a plain in-memory lock is no longer enough.</p><p>The common kinds of distributed lock are database locks, Redis locks, and ZooKeeper locks (or etcd locks).</p><p>These differ in reliability, performance, and implementation difficulty. Which one to choose should depend on the consistency requirements the business places on the data.</p><ol><li>If the consistency requirements are very high, prefer a lock implemented on zk/etcd;</li><li>For ordinary consistency requirements, a Redis lock is a reasonable choice;</li><li>Database locks are generally not recommended.</li></ol><p>When you need a distributed lock, it is better to use a mature open-source solution than to reinvent the wheel from the underlying principles. A home-grown lock has to support reentrancy, exclusivity, waiting, timeouts, and so on while staying robust and reliable, which is not easy and is full of pitfalls.</p><h3 id="ZK"><a href="#ZK" class="headerlink" title="ZK"></a>ZK</h3><p>ZK locks are built on ZK's <strong>ephemeral sequential nodes</strong>. Here is a quick look at how to use such a lock through the Curator framework.</p><p>First create a Curator client to talk to ZK:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">RetryPolicy</span> <span class="variable">policy</span> <span class="operator">=</span> <span class="keyword">new</span> <span class="title class_">ExponentialBackoffRetry</span>(<span class="number">3000</span>, <span class="number">3</span>);</span><br><span class="line"></span><br><span class="line"><span class="type">CuratorFramework</span> <span class="variable">client</span> <span class="operator">=</span> CuratorFrameworkFactory.builder()</span><br><span class="line">                    .connectString(connectString)</span><br><span class="line">                    .connectionTimeoutMs(connectionTimeout)</span><br><span class="line">                    .sessionTimeoutMs(sessionTimeout)</span><br><span class="line">                    .retryPolicy(policy).build();</span><br><span class="line">client.start();</span><br></pre></td></tr></table></figure><p>To use the distributed lock, just create an <code>InterProcessMutex</code> instance; calling 
<code>acquire()</code> acquires the lock and <code>release()</code> releases it:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">InterProcessLock</span> <span class="variable">lock</span> <span class="operator">=</span> <span class="keyword">new</span> <span class="title class_">InterProcessMutex</span>(getCuratorFramework(), rootNode);</span><br><span class="line"></span><br><span class="line">lock.acquire();</span><br><span class="line">lock.release();</span><br></pre></td></tr></table></figure><h3 id="Redis"><a href="#Redis" class="headerlink" title="Redis"></a>Redis</h3><p>The component commonly used for Redis locks is Redisson, which wraps many interfaces and is convenient to use. Below is an example from the official documentation:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">RLock</span> <span class="variable">lock</span> <span class="operator">=</span> redisson.getLock(<span class="string">&quot;myLock&quot;</span>);</span><br><span class="line"></span><br><span class="line"><span class="comment">// traditional lock method</span></span><br><span class="line">lock.lock();</span><br><span class="line"></span><br><span class="line"><span class="comment">// acquire the lock and automatically unlock it after 10 seconds</span></span><br><span class="line">lock.lock(<span class="number">10</span>, 
TimeUnit.SECONDS);</span><br><span class="line"></span><br><span class="line"><span class="comment">// Wait up to 100s for the lock; auto-unlock 10s after it is acquired</span></span><br><span class="line"><span class="type">boolean</span> <span class="variable">res</span> <span class="operator">=</span> lock.tryLock(<span class="number">100</span>, <span class="number">10</span>, TimeUnit.SECONDS);</span><br><span class="line"><span class="keyword">if</span> (res) &#123;</span><br><span class="line">   <span class="keyword">try</span> &#123;</span><br><span class="line">     ...</span><br><span class="line">   &#125; <span class="keyword">finally</span> &#123;</span><br><span class="line">       lock.unlock();</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="数据库锁"><a href="#数据库锁" class="headerlink" title="数据库锁"></a>Database locks</h3><p>For database locks there is essentially no mature open-source component to use. Most are tailored to the business scenario, but they still fall broadly into two types:</p><ul><li>Optimistic locks</li><li>Pessimistic locks</li></ul><h4 id="数据库乐观锁"><a href="#数据库乐观锁" class="headerlink" title="数据库乐观锁"></a>Optimistic database locks</h4><p>An optimistic lock assumes that no one else is operating on the data at the same time and submits the operation directly to the database; if the operation hits a conflict, there was contention.</p><p>An optimistic lock can also be implemented by specifying the data's version number when changing it, for example:</p><figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">update</span> data <span class="keyword">set</span> field<span class="operator">=</span>value1, version<span class="operator">=</span>version<span class="operator">+</span><span class="number">1</span> <span class="keyword">where</span> id<span class="operator">=</span><span class="number">111</span> <span class="keyword">and</span> version<span class="operator">=</span><span class="number">1</span></span><br></pre></td></tr></table></figure><p>This is a CAS-style update (here <code>version</code> is an assumed version column and <code>1</code> is the value read beforehand). If the update affects no rows, someone else has updated the data in the meantime, and the caller must re-read the row and retry.</p><h4 id="数据库悲观锁"><a href="#数据库悲观锁" class="headerlink" title="数据库悲观锁"></a>Pessimistic database locks</h4><p>A pessimistic lock assumes that someone is competing for the lock on every operation, so it takes the lock before operating on the data.</p><p>For example, locking with an <code>insert</code> statement can be implemented like this:</p><figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">insert</span> <span class="keyword">into</span> 
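methodLock(method_name, <span class="keyword">desc</span>) <span class="keyword">values</span>(<span class="string">&#x27;method_name&#x27;</span>, <span class="string">&#x27;desc&#x27;</span>)</span><br></pre></td></tr></table></figure><p>Because the <code>method_name</code> column has a unique constraint, when several requests are submitted to the database at the same time, the database guarantees that only one of them succeeds. To unlock, simply delete the record.</p><p>With this kind of lock nothing blocks at the database level; only the application retries and waits when locking fails (that is, when creating the record fails). It carries a deadlock risk: if the holder crashes before deleting the record, the lock is never released, so extra logic is needed to guarantee release.</p><p>Alternatively, append <code>for update</code> to a query to lock the rows it selects.</p><p>Note that <code>for update</code> locks at the database level: any other attempt to lock the same rows blocks until the current transaction ends.</p><p>This approach is fairly common and easy to implement. Some batch-processing systems simply use <code>for update</code> to lock the records that match a condition, then run the business logic and update the status.</p><figure class="highlight sql"><table><tr><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> <span class="keyword">order</span> <span class="keyword">where</span> status<span class="operator">=</span><span class="string">&#x27;02&#x27;</span> <span class="keyword">for</span> <span class="keyword">update</span></span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>This article briefly reviewed the thread-safe classes and locking approaches commonly used in multithreaded code, and introduced three common kinds of distributed locks.</p>]]></content>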
    
    
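    <summary type="html">&lt;p&gt;When multiple threads access the same data, that data must be protected so that reads and updates do not fail unpredictably because of what other threads are doing.&lt;/p&gt;
&lt;p&gt;This article briefly summarizes thread-safe classes and lock usage examples for several common scenarios in Java multithreaded development.&lt;/p&gt;</summary>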
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="工程技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%B7%A5%E7%A8%8B%E6%8A%80%E6%9C%AF/"/>
    
    <category term="编程语言" scheme="https://blog.imoe.tech/categories/Tech/%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80/"/>
    
    
    <category term="并发" scheme="https://blog.imoe.tech/tags/%E5%B9%B6%E5%8F%91/"/>
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Java" scheme="https://blog.imoe.tech/tags/Java/"/>
    
    <category term="知识回顾" scheme="https://blog.imoe.tech/tags/%E7%9F%A5%E8%AF%86%E5%9B%9E%E9%A1%BE/"/>
    
  </entry>
  
  <entry>
    <title>User Impersonation in KubeVela</title>
    <link href="https://blog.imoe.tech/2023/07/18/kubevela-user-impersonate/"/>
    <id>https://blog.imoe.tech/2023/07/18/kubevela-user-impersonate/</id>
    <published>2023-07-18T04:19:01.000Z</published>
    <updated>2023-08-16T04:33:41.795Z</updated>
    
    <content type="html"><![CDATA[<p>Two modules in KubeVela use user impersonation: the KubeVela Controller and the KubeVela API Server.</p><ul><li>KubeVela Controller: implements KubeVela's main logic;</li><li>KubeVela API Server: provides the API for VelaUX.</li></ul><p>The KubeVela core components have two features related to user impersonation: application authentication and <code>ServiceAccount</code> impersonation. VelaUX ships its own user and permission system, so when user impersonation is enabled it injects the logged-in user's information as the impersonated user.</p><span id="more"></span><h2 id="KubeVela-Controller"><a href="#KubeVela-Controller" class="headerlink" title="KubeVela Controller"></a>KubeVela Controller</h2><h3 id="应用认证"><a href="#应用认证" class="headerlink" title="应用认证"></a>Application authentication</h3><p>KubeVela provides a feature gate (<code>featuregate</code>) called application authentication (<code>AuthenticateApplication</code>). When it is enabled, the KubeVela Controller creates resources using the identity of the user who last created or modified the application. This prevents users from escalating privileges by creating an application that contains resources beyond their own permissions.</p><p><img src="https://images.imoe.tech/blog/kXjDiK.jpg" alt="Application authentication"></p><p>Application authentication has two phases: binding an identity to the application, and impersonating that identity.</p><h4 id="应用绑定身份"><a href="#应用绑定身份" class="headerlink" title="应用绑定身份"></a>Binding an identity to the application</h4><p>When a user submits an application request (for example, creating a new application or modifying an existing one), the request is first handled by the Application <code>MutatingAdmissionWebhook</code> in KubeVela. The webhook extracts the user information from the request and records it in the application's annotations.</p><p>The identity-binding logic is handled by the <code>application.MutatingHandler.handleIdentity()</code> method:</p><figure class="highlight go"><figcaption><span>pkg/webhook/core.oam.dev/v1alpha2/application/mutating_handler.go:52</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> 
<span class="params">(h *MutatingHandler)</span></span> handleIdentity(ctx context.Context, req admission.Request, _ *v1beta1.Application, app *v1beta1.Application) (<span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Check whether application authentication is enabled</span></span><br><span class="line">    <span class="keyword">if</span> !utilfeature.DefaultMutableFeatureGate.Enabled(features.AuthenticateApplication) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">false</span>, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> slices.Contains(h.skipUsers, req.UserInfo.Username) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">false</span>, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> metav1.HasAnnotation(app.ObjectMeta, oam.AnnotationApplicationServiceAccountName) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">false</span>, errors.New(<span class="string">&quot;service-account annotation is not permitted when authentication enabled&quot;</span>)</span><br><span class="line">    &#125;</span><br><span class="line">    klog.Infof(<span class="string">&quot;[ApplicationMutatingHandler] Setting UserInfo into Application, UserInfo: %v, Application: %s/%s&quot;</span>, req.UserInfo, app.GetNamespace(), app.GetName())</span><br><span class="line">    <span class="comment">// Save the user info into the application's annotations</span></span><br><span class="line">    auth.SetUserInfoInAnnotation(&amp;app.ObjectMeta, req.UserInfo)</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">true</span>, <span class="literal">nil</span></span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p>If application authentication is enabled, you cannot use ServiceAccount impersonation to specify the SA that the application runs as; the two features are mutually exclusive.</p><p>The application authentication webhook simply saves the user information of the current APIServer request into the following Application annotations:</p><ul><li><code>app.oam.dev/username</code></li><li><code>app.oam.dev/group</code></li></ul><h4 id="伪装身份"><a href="#伪装身份" class="headerlink" title="伪装身份"></a>Impersonating the identity</h4><p>The impersonation annotations injected above are extracted in the <code>GetUserInfoInAnnotation()</code> function:</p><figure class="highlight go"><figcaption><span>pkg/auth/userinfo.go:83</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">GetUserInfoInAnnotation</span><span class="params">(obj *metav1.ObjectMeta)</span></span> user.Info &#123;</span><br><span class="line">    annotations := obj.GetAnnotations()</span><br><span class="line">    <span class="keyword">if</span> annotations == <span class="literal">nil</span> &#123;</span><br><span class="line">        annotations = <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>&#123;&#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span 
class="line">    name := annotations[oam.AnnotationApplicationUsername]</span><br><span class="line">    <span class="keyword">if</span> serviceAccountName := annotations[oam.AnnotationApplicationServiceAccountName]; serviceAccountName != <span class="string">&quot;&quot;</span> &amp;&amp; name == <span class="string">&quot;&quot;</span> &#123;</span><br><span class="line">        name = fmt.Sprintf(<span class="string">&quot;system:serviceaccount:%s:%s&quot;</span>, obj.GetNamespace(), serviceAccountName)</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> name == <span class="string">&quot;&quot;</span> &amp;&amp; utilfeature.DefaultMutableFeatureGate.Enabled(features.AuthenticateApplication) &#123;</span><br><span class="line">        name = AuthenticationDefaultUser</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> &amp;user.DefaultInfo&#123;</span><br><span class="line">        Name: name,</span><br><span class="line">        Groups: slices.Filter(</span><br><span class="line">            []<span class="type">string</span>&#123;&#125;,</span><br><span class="line">            strings.Split(annotations[oam.AnnotationApplicationGroup], groupSeparator),</span><br><span class="line">            <span class="function"><span class="keyword">func</span><span class="params">(s <span class="type">string</span>)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">                <span class="keyword">return</span> <span class="built_in">len</span>(strings.TrimSpace(s)) &gt; <span class="number">0</span></span><br><span class="line">            &#125;),</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>As the code shows, extraction handles the annotations injected by both application authentication and ServiceAccount impersonation. The extracted user info is then used by the two outer functions below, which write it into the supplied Context:</p><figure class="highlight go"><figcaption><span>pkg/auth/userinfo.go:45</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="comment">// ContextWithUserInfo inject username &amp; group from app annotations into context</span></span><br><span class="line"><span class="comment">// If serviceAccount is set and username is empty, identity will use the serviceAccount</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ContextWithUserInfo</span><span class="params">(ctx context.Context, app *v1beta1.Application)</span></span> context.Context &#123;</span><br><span class="line">    <span class="keyword">if</span> app == <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> ctx</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> request.WithUser(ctx, GetUserInfoInAnnotation(&amp;app.ObjectMeta))</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// MonitorContextWithUserInfo inject username &amp; group from app annotations into monitor context</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">MonitorContextWithUserInfo</span><span class="params">(ctx monitorContext.Context, app *v1beta1.Application)</span></span> monitorContext.Context &#123;</span><br><span class="line">    _ctx := ctx.GetContext()</span><br><span class="line">    authCtx := ContextWithUserInfo(_ctx, 
app)</span><br><span class="line">    ctx.SetContext(authCtx)</span><br><span class="line">    <span class="keyword">return</span> ctx</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The KubeVela Controller's <code>Reconcile()</code> method handles events for Application resources. Before it calls Workflow to execute the application's workflow, it injects the user info into the Context.</p><figure class="highlight go"><figcaption><span>pkg/controller/core.oam.dev/v1alpha2/application/application_controller.go:110</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Reconciler)</span></span> Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, <span class="type">error</span>) &#123;</span><br><span class="line">    ctx, cancel := ctrlrec.NewReconcileContext(ctx)</span><br><span class="line">    <span class="keyword">defer</span> cancel()</span><br><span class="line">    logCtx := monitorContext.NewTraceContext(ctx, <span class="string">&quot;&quot;</span>).AddTag(<span class="string">&quot;application&quot;</span>, req.String(), <span class="string">&quot;controller&quot;</span>, <span class="string">&quot;application&quot;</span>)</span><br><span class="line">    logCtx.Info(<span class="string">&quot;Start reconcile application&quot;</span>)</span><br><span class="line">    <span class="keyword">defer</span> logCtx.Commit(<span class="string">&quot;End 
reconcile application&quot;</span>)</span><br><span class="line">    app := <span class="built_in">new</span>(v1beta1.Application)</span><br><span class="line"></span><br><span class="line">    <span class="comment">// ... omitted</span></span><br><span class="line"></span><br><span class="line">    executor := executor.New(workflowInstance, r.Client, <span class="literal">nil</span>)</span><br><span class="line">    authCtx := logCtx.Fork(<span class="string">&quot;execute application workflow&quot;</span>)</span><br><span class="line">    <span class="keyword">defer</span> authCtx.Commit(<span class="string">&quot;finish execute application workflow&quot;</span>)</span><br><span class="line">    authCtx = auth.MonitorContextWithUserInfo(authCtx, app)</span><br><span class="line">    workflowState, err := executor.ExecuteRunners(authCtx, runners)</span><br><span class="line"></span><br><span class="line">    <span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>In the <code>run()</code> function that starts the KubeVela Controller, requests are wrapped with <code>auth.NewImpersonatingRoundTripper</code>:</p><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">run</span><span class="params">(ctx context.Context, s *options.CoreOptions)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// ... 
omitted</span></span><br><span class="line"></span><br><span class="line">    restConfig := ctrl.GetConfigOrDie()</span><br><span class="line">    restConfig.UserAgent = types.KubeVelaName + <span class="string">&quot;/&quot;</span> + version.GitRevision</span><br><span class="line">    restConfig.QPS = <span class="type">float32</span>(s.QPS)</span><br><span class="line">    restConfig.Burst = s.Burst</span><br><span class="line">    restConfig.Wrap(auth.NewImpersonatingRoundTripper)</span><br><span class="line">    <span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>auth.NewImpersonatingRoundTripper</code> wraps the underlying transport in an <code>impersonatingRoundTripper</code>, which implements the <code>RoundTripper</code> interface. When the <code>Client</code> sends a request, its <code>RoundTrip()</code> method is called to process the request; this is where KubeVela sets the impersonation headers on requests sent to the Cluster Gateway.</p><figure class="highlight go"><figcaption><span>pkg/auth/round_trippers.go:50</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rt *impersonatingRoundTripper)</span></span> RoundTrip(req *http.Request) (*http.Response, <span class="type">error</span>) &#123;</span><br><span class="line">    ctx := req.Context()</span><br><span class="line">    req = req.Clone(ctx)</span><br><span class="line">    userInfo, exists := request.UserFrom(ctx)</span><br><span 
class="line">    klog.V(<span class="number">7</span>).Infof(<span class="string">&quot;impersonation request log. path: %s method: %s user info: %+v&quot;</span>, req.URL.String(), req.Method, userInfo)</span><br><span class="line">    <span class="keyword">if</span> exists &amp;&amp; userInfo != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> name := userInfo.GetName(); name != <span class="string">&quot;&quot;</span> &#123;</span><br><span class="line">            req.Header.Set(transport.ImpersonateUserHeader, name)</span><br><span class="line">            <span class="keyword">for</span> _, group := <span class="keyword">range</span> userInfo.GetGroups() &#123;</span><br><span class="line">                req.Header.Add(transport.ImpersonateGroupHeader, group)</span><br><span class="line">            &#125;</span><br><span class="line">            q := req.URL.Query()</span><br><span class="line">            q.Add(impersonateKey, <span class="string">&quot;true&quot;</span>)</span><br><span class="line">            req.URL.RawQuery = q.Encode()</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> rt.rt.RoundTrip(req)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>What the code does:</p><ul><li>Reads the user info from the Context and stores it in the request headers;</li><li>Adds an <code>impersonate=true</code> parameter to the URL query (for the Cluster Gateway to check).</li></ul><p>The request is then handed to the next <code>RoundTripper</code> and eventually forwarded to the Cluster Gateway. For the next steps, see the <strong>APIServer impersonation handling</strong> and <strong>Cluster Gateway impersonation handling</strong> sections of the earlier article <a href="https://blog.imoe.tech/2023/06/16/user-impersonation-in-kubernetes/">User Impersonation in Kubernetes</a>.</p><h4 id="交互流程"><a href="#交互流程" class="headerlink" title="交互流程"></a>Interaction flow</h4><p><img src="https://images.imoe.tech/blog/SItcXf.jpg" alt="Interaction flow"></p><h3 id="ServiceAccount-伪装"><a href="#ServiceAccount-伪装" class="headerlink" title="ServiceAccount 伪装"></a>ServiceAccount impersonation</h3><p>When the application authentication feature is disabled, you can use ServiceAccount impersonation to run an application as a specified ServiceAccount; just set the
<code>app.oam.dev/service-account-name</code> annotation.</p><p>A ServiceAccount specified this way must be created in the managed cluster beforehand; otherwise an error is reported.</p><p>For example, the ServiceAccount named deployer below can only be used after it has been bound to a role:</p><figure class="highlight yaml"><figcaption><span>deployer authorization</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ServiceAccount</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">deployer</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">demo-service</span></span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">rbac.authorization.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span 
class="string">Role</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">deployments:admin</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">demo-service-prod</span></span><br><span class="line"><span class="attr">rules:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;apps&quot;</span>]</span><br><span class="line">    <span class="attr">resources:</span> [<span class="string">&quot;deployments&quot;</span>]</span><br><span class="line">    <span class="attr">verbs:</span> [<span class="string">&quot;*&quot;</span>]</span><br><span class="line"><span class="meta">---</span></span><br><span class="line"><span class="attr">apiVersion:</span> <span class="string">rbac.authorization.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">RoleBinding</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">deployments:admin</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">demo-service-prod</span></span><br><span class="line"><span class="attr">roleRef:</span></span><br><span class="line">  <span class="attr">apiGroup:</span> <span class="string">rbac.authorization.k8s.io</span></span><br><span class="line">  <span class="attr">kind:</span> <span class="string">Role</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">deployments:admin</span></span><br><span class="line"><span class="attr">subjects:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">kind:</span> <span class="string">ServiceAccount</span></span><br><span class="line">    <span class="attr">name:</span> <span 
class="string">deployer</span></span><br><span class="line">    <span class="attr">namespace:</span> <span class="string">demo-service</span></span><br></pre></td></tr></table></figure><p>Then use it in the Application:</p><figure class="highlight yaml"><table><tr><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">core.oam.dev/v1beta1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Application</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">multi-env-demo-with-service-account</span></span><br><span class="line">  <span class="attr">namespace:</span> <span 
class="string">demo-service</span></span><br><span class="line">  <span class="attr">annotations:</span></span><br><span class="line">    <span class="attr">app.oam.dev/service-account-name:</span> <span class="string">deployer</span> <span class="comment"># the name of the ServiceAccount we created</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">components:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">nginx-server</span></span><br><span class="line">      <span class="attr">type:</span> <span class="string">webservice</span></span><br><span class="line">      <span class="attr">properties:</span></span><br><span class="line">        <span class="attr">image:</span> <span class="string">nginx:1.21</span></span><br><span class="line">        <span class="attr">port:</span> <span class="number">80</span></span><br><span class="line">  <span class="attr">policies:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">env</span></span><br><span class="line">      <span class="attr">type:</span> <span class="string">env-binding</span></span><br><span class="line">      <span class="attr">properties:</span></span><br><span class="line">        <span class="attr">created:</span> <span class="literal">false</span></span><br><span class="line">        <span class="attr">envs:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">prod</span></span><br><span class="line">            <span class="attr">patch:</span></span><br><span class="line">              <span class="attr">components:</span></span><br><span class="line">                <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">nginx-server</span></span><br><span class="line">                  
<span class="attr">type:</span> <span class="string">webservice</span></span><br><span class="line">                  <span class="attr">properties:</span></span><br><span class="line">                    <span class="attr">image:</span> <span class="string">nginx:1.20</span></span><br><span class="line">                    <span class="attr">port:</span> <span class="number">80</span></span><br><span class="line">            <span class="attr">placement:</span></span><br><span class="line">              <span class="attr">namespaceSelector:</span></span><br><span class="line">                <span class="attr">name:</span> <span class="string">demo-service-prod</span></span><br><span class="line">  <span class="attr">workflow:</span></span><br><span class="line">    <span class="attr">steps:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">deploy-prod-server</span></span><br><span class="line">        <span class="attr">type:</span> <span class="string">deploy2env</span></span><br><span class="line">        <span class="attr">properties:</span></span><br><span class="line">          <span class="attr">policy:</span> <span class="string">env</span></span><br><span class="line">          <span class="attr">env:</span> <span class="string">prod</span></span><br></pre></td></tr></table></figure><p>The <code>app.oam.dev/username</code> and <code>app.oam.dev/group</code> annotations that application authentication injects can also be set by hand in the Application's annotations, which achieves the same effect as above.</p><h2 id="KubeVela-API-Server"><a href="#KubeVela-API-Server" class="headerlink" title="KubeVela API Server"></a>KubeVela API Server</h2><p>When the KubeVela API Server starts, it reaches the <code>setKubeConfig()</code> function through the following call chain:</p><p><code>server.Run() -&gt; run() -&gt; restServer.Run() -&gt; s.buildIoCContainer() -&gt; clients.SetKubeConfig() -&gt; setKubeConfig()</code></p><p>This function wraps the <code>RoundTripper</code> with <code>auth.NewImpersonatingRoundTripper</code> to handle user impersonation. That struct was covered earlier in <a 
href="#%E4%BC%AA%E8%A3%85%E8%BA%AB%E4%BB%BD">伪装身份 (Impersonating an Identity)</a>, so it is not repeated here.</p><p>In addition, <code>s.buildIoCContainer()</code> uses <code>NewAuthClient</code> to create a wrapping <code>Client</code>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *restServer)</span></span> buildIoCContainer() <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// infrastructure</span></span><br><span class="line"></span><br><span class="line">err := clients.SetKubeConfig(s.cfg)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">kubeConfig, err := clients.GetKubeConfig()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">kubeClient, err := clients.GetKubeClient()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span 
class="line">&#125;</span><br><span class="line">authClient := utils.NewAuthClient(kubeClient)</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>authClient</code> automatically extracts the user currently logged in to VelaUX and stores it in the Context in a form that impersonation can consume:</p><ul><li>the native <code>request.WithUser()</code> method wraps the ctx;</li><li><code>auth.impersonatingRoundTripper&#123;&#125;</code> reads the user from the ctx and writes it into the request headers.</li></ul><p><code>authClient</code> is implemented with the proxy pattern. For example, the <code>Get()</code> method of <code>Client</code> calls <code>ContextWithUserInfo()</code> to process the user information before delegating to the wrapped client:</p><figure class="highlight go"><figcaption><span>pkg/apiserver/utils/auth.go:90</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *authClient)</span></span> Get(ctx context.Context, key client.ObjectKey, obj client.Object) <span class="type">error</span> &#123;</span><br><span class="line">ctx = ContextWithUserInfo(ctx)</span><br><span class="line"><span class="keyword">return</span> c.Client.Get(ctx, key, obj)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>ContextWithUserInfo()</code> first checks whether the <code>EnableImpersonation</code> feature is on; only then does it extract the user information and convert it into a Kubernetes user object. <code>EnableImpersonation</code> is off by default.</p><figure class="highlight go"><figcaption><span>pkg/apiserver/utils/auth.go:48</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ContextWithUserInfo</span><span class="params">(ctx context.Context)</span></span> context.Context &#123;</span><br><span class="line"><span class="comment">// Check whether user impersonation is enabled for the API Server</span></span><br><span class="line"><span class="keyword">if</span> !features.APIServerFeatureGate.Enabled(features.APIServerEnableImpersonation) &#123;</span><br><span class="line"><span class="keyword">return</span> ctx</span><br><span class="line">&#125;</span><br><span class="line">userInfo := &amp;user.DefaultInfo&#123;Name: user.Anonymous&#125;</span><br><span class="line"><span class="keyword">if</span> username, ok := UsernameFrom(ctx); ok &#123;</span><br><span class="line">userInfo.Name = username</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> project, ok := ProjectFrom(ctx); ok &amp;&amp; project != <span class="string">&quot;&quot;</span> &#123;</span><br><span class="line">userInfo.Groups = []<span class="type">string</span>&#123;KubeVelaProjectGroupPrefix + project, auth.KubeVelaClientGroup&#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">userInfo.Groups = []<span class="type">string</span>&#123;UXDefaultGroup&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> userInfo.Name == model.DefaultAdminUserName &amp;&amp; features.APIServerFeatureGate.Enabled(features.APIServerEnableAdminImpersonation) &#123;</span><br><span class="line">userInfo.Groups = []<span 
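class="type">string</span>&#123;UXDefaultGroup&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> request.WithUser(ctx, userInfo)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>KubeVela is an application delivery platform, so keeping an application's runtime permissions consistent with those of the user who created it, and preventing the application from exceeding them, is a top priority. Together with Cluster Gateway, KubeVela makes good use of Kubernetes' native user impersonation to meet this requirement. Blocking privilege escalation at a lower layer is a sound approach to access control.</p><p>Following the same idea, I also adopted user impersonation in our company's multi-cluster management system, which fixed the problem of access control that was only skin-deep.</p>]]></content>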
    
    
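    <summary type="html">&lt;p&gt;Two KubeVela modules use user impersonation: the KubeVela Controller and the KubeVela API Server.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;KubeVela Controller: implements KubeVela's core logic;&lt;/li&gt;
&lt;li&gt;KubeVela API Server: provides the API that VelaUX calls.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The KubeVela core components have two impersonation-related features: application authentication and &lt;code&gt;ServiceAccount&lt;/code&gt; impersonation. VelaUX ships its own user and permission system, so once impersonation is enabled it injects the logged-in user as the impersonated user.&lt;/p&gt;</summary>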
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
    <category term="KubeVela" scheme="https://blog.imoe.tech/tags/KubeVela/"/>
    
  </entry>
  
  <entry>
    <title>User Impersonation in Kubernetes</title>
    <link href="https://blog.imoe.tech/2023/06/16/user-impersonation-in-kubernetes/"/>
    <id>https://blog.imoe.tech/2023/06/16/user-impersonation-in-kubernetes/</id>
    <published>2023-06-16T07:54:46.000Z</published>
    <updated>2023-07-05T04:15:19.975Z</updated>
    
    <content type="html"><![CDATA[<img alt="" a="<" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p><a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#user-impersonation">User impersonation</a> is a feature built into Kubernetes, and it is very useful when managing clusters.</p><p>A management system typically operates on a cluster as a highly privileged user such as cluster-admin. When a user acts on the cluster through that system, the identity actually performing the operation does not match the user's own cluster permissions, which easily leads to security problems.</p><p>For example, a user may only be allowed to operate within a Namespace. If that user deploys a Helm chart through the management system, which runs as cluster-admin, and the chart creates several Namespaces or even ServiceAccounts, the user has exceeded their permissions.</p><p>The usual remedy is to enforce permissions inside the management system, but that amounts to maintaining two permission systems, and it is hard to cover every case. Kubernetes user impersonation solves this problem cleanly.</p><span id="more"></span><h2 id="原生用户伪装功能"><a href="#原生用户伪装功能" class="headerlink" title="原生用户伪装功能"></a>Native user impersonation</h2><p>Kubernetes natively supports the following request headers for user impersonation:</p><ul><li><code>Impersonate-User</code>: the username to act as;</li><li><code>Impersonate-Group</code>: a group to act as; the header can be sent several times to set several groups; requires <code>Impersonate-User</code>;</li><li><code>Impersonate-Extra-( extra name )</code>: a dynamic header that sets the user's extra fields; optional. The header name must satisfy HTTP header encoding rules, so any illegal characters must be converted to UTF-8 and percent-encoded;</li><li><code>Impersonate-Uid</code>: the UID of the impersonated user; optional, but requires <code>Impersonate-User</code>. No particular format is required. Available only in <code>1.22.0</code> and later.</li></ul><p>Example:</p><figure class="highlight txt"><figcaption><span>Impersonation header example</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">Impersonate-User: jane.doe@example.com</span><br><span class="line">Impersonate-Group: developers</span><br><span class="line">Impersonate-Group: admins</span><br><span class="line">Impersonate-Extra-dn: cn=jane,ou=engineers,dc=example,dc=com</span><br><span class="line">Impersonate-Extra-acme.com%2Fproject: some-project</span><br><span class="line">Impersonate-Extra-scopes: view</span><br><span 
class="line">Impersonate-Extra-scopes: development</span><br><span class="line">Impersonate-Uid: 06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b</span><br></pre></td></tr></table></figure><p>With <code>kubectl</code>, the <code>--as</code> and <code>--as-group</code> flags set the <code>Impersonate-User</code> and <code>Impersonate-Group</code> headers:</p><figure class="highlight shell"><figcaption><span>User impersonation with kubectl</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kubectl drain mynode --as=superman --as-group=system:masters</span><br></pre></td></tr></table></figure><h3 id="伪装权限授权"><a href="#伪装权限授权" class="headerlink" title="伪装权限授权"></a>Authorizing impersonation</h3><p>To impersonate, the caller needs the <code>impersonate</code> verb on the resource being impersonated, such as <code>users</code>, <code>groups</code>, or <code>uids</code>:</p><figure class="highlight yaml"><figcaption><span>Impersonating users and groups</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">rbac.authorization.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ClusterRole</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">impersonator</span></span><br><span class="line"><span class="attr">rules:</span></span><br><span class="line"><span class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;&quot;</span>]</span><br><span class="line">  <span class="attr">resources:</span> [<span class="string">&quot;users&quot;</span>, <span class="string">&quot;groups&quot;</span>, <span 
class="string">&quot;serviceaccounts&quot;</span>]</span><br><span class="line">  <span class="attr">verbs:</span> [<span class="string">&quot;impersonate&quot;</span>]</span><br></pre></td></tr></table></figure><p>Both the <code>extra</code> fields and <code>uid</code> live in the <code>authentication.k8s.io</code> API group, and each <code>extra</code> field is a subresource of the <code>userextras</code> resource.</p><p>For example, the role below allows impersonating a user's <code>scopes</code> extra field and <code>Uid</code>:</p><figure class="highlight yaml"><figcaption><span>Impersonating extra fields and uid</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">rbac.authorization.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ClusterRole</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">scopes-and-uid-impersonator</span></span><br><span class="line"><span class="attr">rules:</span></span><br><span class="line"><span class="comment"># Can set &quot;Impersonate-Extra-scopes&quot; header and the &quot;Impersonate-Uid&quot; header.</span></span><br><span class="line"><span class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;authentication.k8s.io&quot;</span>]</span><br><span class="line">  <span class="attr">resources:</span> [<span class="string">&quot;userextras/scopes&quot;</span>, <span class="string">&quot;uids&quot;</span>]</span><br><span class="line">  <span class="attr">verbs:</span> [<span 
class="string">&quot;impersonate&quot;</span>]</span><br></pre></td></tr></table></figure><p>Besides restricting which kinds of impersonation are allowed, RBAC can also restrict what a role may impersonate, for example allowing it to impersonate only one specific user:</p><figure class="highlight yaml"><figcaption><span>Restricting what can be impersonated</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">rbac.authorization.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ClusterRole</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">limited-impersonator</span></span><br><span class="line"><span class="attr">rules:</span></span><br><span class="line"><span class="comment"># Can impersonate the user &quot;jane.doe@example.com&quot;</span></span><br><span class="line"><span class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;&quot;</span>]</span><br><span class="line">  <span class="attr">resources:</span> [<span 
class="string">&quot;users&quot;</span>]</span><br><span class="line">  <span class="attr">verbs:</span> [<span class="string">&quot;impersonate&quot;</span>]</span><br><span class="line">  <span class="attr">resourceNames:</span> [<span class="string">&quot;jane.doe@example.com&quot;</span>]</span><br><span class="line"></span><br><span class="line"><span class="comment"># Can impersonate the groups &quot;developers&quot; and &quot;admins&quot;</span></span><br><span class="line"><span class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;&quot;</span>]</span><br><span class="line">  <span class="attr">resources:</span> [<span class="string">&quot;groups&quot;</span>]</span><br><span class="line">  <span class="attr">verbs:</span> [<span class="string">&quot;impersonate&quot;</span>]</span><br><span class="line">  <span class="attr">resourceNames:</span> [<span class="string">&quot;developers&quot;</span>,<span class="string">&quot;admins&quot;</span>]</span><br><span class="line"></span><br><span class="line"><span class="comment"># Can impersonate the extras field &quot;scopes&quot; with the values &quot;view&quot; and &quot;development&quot;</span></span><br><span class="line"><span class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;authentication.k8s.io&quot;</span>]</span><br><span class="line">  <span class="attr">resources:</span> [<span class="string">&quot;userextras/scopes&quot;</span>]</span><br><span class="line">  <span class="attr">verbs:</span> [<span class="string">&quot;impersonate&quot;</span>]</span><br><span class="line">  <span class="attr">resourceNames:</span> [<span class="string">&quot;view&quot;</span>, <span class="string">&quot;development&quot;</span>]</span><br><span class="line"></span><br><span class="line"><span class="comment"># Can impersonate the uid &quot;06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b&quot;</span></span><br><span class="line"><span 
class="bullet">-</span> <span class="attr">apiGroups:</span> [<span class="string">&quot;authentication.k8s.io&quot;</span>]</span><br><span class="line">  <span class="attr">resources:</span> [<span class="string">&quot;uids&quot;</span>]</span><br><span class="line">  <span class="attr">verbs:</span> [<span class="string">&quot;impersonate&quot;</span>]</span><br><span class="line">  <span class="attr">resourceNames:</span> [<span class="string">&quot;06f6ce97-e2c5-4ab8-7ba5-7654dd08d52b&quot;</span>]</span><br></pre></td></tr></table></figure><h3 id="APIServer-伪装处理"><a href="#APIServer-伪装处理" class="headerlink" title="APIServer 伪装处理"></a>How the APIServer handles impersonation</h3><p>The example here is Cluster Gateway, the <strong>impersonate</strong> component that KubeVela uses. Cluster Gateway is built on the APIServer Builder.</p><p>The <code>builder.APIServer.*.Build()</code> method calls <code>NewCommandStartWardleServer</code> to create a <code>cobra.Command</code> instance; when run, that command calls <code>RunWardleServer</code> to start the APIServer.</p><figure class="highlight go"><figcaption><span>Starting the APIServer</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(o WardleServerOptions)</span></span> RunWardleServer(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) <span 
class="type">error</span> &#123;</span><br><span class="line">config, err := o.Config()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">server, err := config.Complete().New()</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">server.GenericAPIServer.AddPostStartHookOrDie(<span class="string">&quot;start-sample-server-informers&quot;</span>, <span class="function"><span class="keyword">func</span><span class="params">(context genericapiserver.PostStartHookContext)</span></span> <span class="type">error</span> &#123;</span><br><span class="line"><span class="keyword">if</span> config.GenericConfig.SharedInformerFactory != <span class="literal">nil</span> &#123;</span><br><span class="line">config.GenericConfig.SharedInformerFactory.Start(context.StopCh)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> server.GenericAPIServer.PrepareRun().Run(stopCh)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>o.Config()</code> call on the first line sets up the HTTP server's default <code>Filter</code> and <code>Handler</code> configuration. The impersonation handling we care about is created there, through the following call chain:</p><p><code>o.Config() -&gt; genericapiserver.NewRecommendedConfig() -&gt; NewConfig() -&gt; DefaultBuildHandlerChain</code></p><p><code>DefaultBuildHandlerChain</code> is then used to initialize the <code>GenericAPIServer</code>:</p><figure class="highlight 
go"><figcaption><span>k8s.io/apiserver@v0.25.3/pkg/server/config.go:598</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c completedConfig)</span></span> New(name <span class="type">string</span>, delegationTarget DelegationTarget) (*GenericAPIServer, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line"></span><br><span class="line">handlerChainBuilder := <span class="function"><span class="keyword">func</span><span class="params">(handler http.Handler)</span></span> http.Handler &#123;</span><br><span class="line"><span class="keyword">return</span> c.BuildHandlerChainFunc(handler, c.Config)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">apiServerHandler := NewAPIServerHandler(name, c.Serializer, handlerChainBuilder, delegationTarget.UnprotectedHandler())</span><br><span class="line"></span><br><span class="line">s := &amp;GenericAPIServer&#123;</span><br><span class="line">Handler:                    apiServerHandler,</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Inside <code>DefaultBuildHandlerChain</code>, <code>WithImpersonation</code> adds the 
<code>impersonation</code> filter to the APIServer's request handling chain. The chain is built from the inside out, so filters added later in the function run earlier, and authentication therefore runs before impersonation at request time:</p><figure class="highlight go"><figcaption><span>k8s.io/apiserver@v0.25.3/pkg/server/config.go:808</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">DefaultBuildHandlerChain</span><span class="params">(apiHandler http.Handler, c *Config)</span></span> http.Handler &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">handler = genericapifilters.WithImpersonation(handler, c.Authorization.Authorizer, c.Serializer)</span><br><span class="line">handler = filterlatency.TrackStarted(handler, <span class="string">&quot;impersonation&quot;</span>)</span><br><span class="line"></span><br><span class="line">handler = filterlatency.TrackCompleted(handler)</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">handler = genericapifilters.WithAuthentication(handler, c.Authentication.Authenticator, failedHandler, c.Authentication.APIAudiences, c.Authentication.RequestHeaderConfig)  </span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>WithImpersonation</code> uses the <code>Authorizer</code> to check whether the current user is allowed to impersonate:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:41</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span 
class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">WithImpersonation</span><span class="params">(handler http.Handler, a authorizer.Authorizer, s runtime.NegotiatedSerializer)</span></span> http.Handler &#123;</span><br><span class="line"><span class="keyword">return</span> http.HandlerFunc(<span class="function"><span class="keyword">func</span><span class="params">(w http.ResponseWriter, req *http.Request)</span></span> &#123;</span><br><span class="line"><span class="comment">// Read the impersonation request from the request headers</span></span><br><span class="line">impersonationRequests, err := buildImpersonationRequests(req.Header)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.V(<span class="number">4</span>).Infof(<span class="string">&quot;%v&quot;</span>, err)</span><br><span class="line">responsewriters.InternalError(w, req, err)</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// No impersonation requested: pass the request through unchanged</span></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(impersonationRequests) == <span class="number">0</span> &#123;</span><br><span class="line">handler.ServeHTTP(w, req)</span><br><span 
class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Look up the user making the request</span></span><br><span class="line">ctx := req.Context()</span><br><span class="line">requestor, exists := request.UserFrom(ctx)</span><br><span class="line"><span class="keyword">if</span> !exists &#123;</span><br><span class="line">responsewriters.InternalError(w, req, errors.New(<span class="string">&quot;no user found for request&quot;</span>))</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// if groups are not specified, then we need to look them up differently depending on the type of user</span></span><br><span class="line"><span class="comment">// if they are specified, then they are the authority (including the inclusion of system:authenticated/system:unauthenticated groups)</span></span><br><span class="line">groupsSpecified := <span class="built_in">len</span>(req.Header[authenticationv1.ImpersonateGroupHeader]) &gt; <span class="number">0</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// make sure we&#x27;re allowed to impersonate each thing we&#x27;re requesting.  
While we&#x27;re iterating through, start building username</span></span><br><span class="line"><span class="comment">// and group information</span></span><br><span class="line">username := <span class="string">&quot;&quot;</span></span><br><span class="line">groups := []<span class="type">string</span>&#123;&#125;</span><br><span class="line">userExtra := <span class="keyword">map</span>[<span class="type">string</span>][]<span class="type">string</span>&#123;&#125;</span><br><span class="line">uid := <span class="string">&quot;&quot;</span></span><br><span class="line"><span class="keyword">for</span> _, impersonationRequest := <span class="keyword">range</span> impersonationRequests &#123;</span><br><span class="line">gvk := impersonationRequest.GetObjectKind().GroupVersionKind()</span><br><span class="line">actingAsAttributes := &amp;authorizer.AttributesRecord&#123;</span><br><span class="line">User:            requestor,</span><br><span class="line">Verb:            <span class="string">&quot;impersonate&quot;</span>,</span><br><span class="line">APIGroup:        gvk.Group,</span><br><span class="line">APIVersion:      gvk.Version,</span><br><span class="line">Namespace:       impersonationRequest.Namespace,</span><br><span class="line">Name:            impersonationRequest.Name,</span><br><span class="line">ResourceRequest: <span class="literal">true</span>,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// ... 
omitted for brevity; the authorization check request is assembled here</span></span><br><span class="line"><span class="comment">// Perform the authorization check</span></span><br><span class="line">decision, reason, err := a.Authorize(ctx, actingAsAttributes)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> || decision != authorizer.DecisionAllow &#123;</span><br><span class="line">klog.V(<span class="number">4</span>).InfoS(<span class="string">&quot;Forbidden&quot;</span>, <span class="string">&quot;URI&quot;</span>, req.RequestURI, <span class="string">&quot;Reason&quot;</span>, reason, <span class="string">&quot;Error&quot;</span>, err)</span><br><span class="line">responsewriters.Forbidden(ctx, actingAsAttributes, w, req, reason, s)</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// ... some logic omitted</span></span><br><span class="line"><span class="comment">// Replace the user info in the request ctx with the impersonated user</span></span><br><span class="line">newUser := &amp;user.DefaultInfo&#123;</span><br><span class="line">Name:   username,</span><br><span class="line">Groups: groups,</span><br><span class="line">Extra:  userExtra,</span><br><span class="line">UID:    uid,</span><br><span class="line">&#125;</span><br><span class="line">req = req.WithContext(request.WithUser(ctx, newUser))</span><br><span class="line"></span><br><span class="line">oldUser, _ := request.UserFrom(ctx)</span><br><span class="line">httplog.LogOf(req, w).Addf(<span class="string">&quot;%v is acting as %v&quot;</span>, oldUser, newUser)</span><br><span class="line"></span><br><span class="line">ae := audit.AuditEventFrom(ctx)</span><br><span class="line">audit.LogImpersonatedUser(ae, newUser)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Remove the impersonation headers</span></span><br><span class="line"><span class="comment">// clear all the impersonation headers from the 
request</span></span><br><span class="line">req.Header.Del(authenticationv1.ImpersonateUserHeader)</span><br><span class="line">req.Header.Del(authenticationv1.ImpersonateGroupHeader)</span><br><span class="line">req.Header.Del(authenticationv1.ImpersonateUIDHeader)</span><br><span class="line"><span class="keyword">for</span> headerName := <span class="keyword">range</span> req.Header &#123;</span><br><span class="line"><span class="keyword">if</span> strings.HasPrefix(headerName, authenticationv1.ImpersonateUserExtraHeaderPrefix) &#123;</span><br><span class="line">req.Header.Del(headerName)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">handler.ServeHTTP(w, req)</span><br><span class="line">&#125;)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The code shows how the APIServer implements user impersonation:</p><ul><li>First, read the impersonation info from the request headers; if there is none, pass the request straight through;</li><li>Read the current user info and check that the current user holds the <code>impersonate</code> permission; if not, the request is rejected as Forbidden;</li><li>Replace the user info in the request ctx with the impersonated user;</li><li>Record the audit info and delete the impersonation request headers.</li></ul><p>By the time the request reaches the resource-handling code, the user returned by <code>request.UserFrom(ctx)</code> is the impersonated user. If no impersonation info was provided, it is the original user.</p><p>After the <code>WithImpersonation</code> handler has run, the <code>WithAuthorization</code> handler further down the chain checks the permissions of the resulting user.</p><h2 id="利用用户伪装功能"><a href="#利用用户伪装功能" class="headerlink" title="利用用户伪装功能"></a>Making use of user impersonation</h2><p>Beyond the APIServer's own implementation, some open-source components build extensions on top of user impersonation, for example KubeVela's Cluster Gateway proxy.</p><p>Cluster Gateway is built on the APIServer framework and uses the web components inside the Kubernetes APIServer; it relies on the APIServer aggregation feature to proxy API gateway requests.</p><p>Before impersonating a proxied request, Cluster Gateway checks two conditions and impersonates only if at least one of them holds:</p><ul><li>whether the URL query provides the <code>impersonate=true</code> parameter;</li><li>whether the <code>ClientIdentityPenetration</code> feature was enabled at startup.</li></ul><figure class="highlight go"><figcaption><span>pkg/apis/cluster/v1alpha1/clustergateway_proxy.go:200</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *proxyHandler)</span></span> ServeHTTP(writer http.ResponseWriter, request *http.Request) &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line"></span><br><span class="line">cfg, err := NewConfigFromCluster(request.Context(), cluster)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">responsewriters.InternalError(writer, request, errors.Wrapf(err, <span class="string">&quot;failed creating cluster proxy client config %s&quot;</span>, cluster.Name))</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Decide whether to impersonate</span></span><br><span class="line"><span class="keyword">if</span> p.impersonate || 
utilfeature.DefaultFeatureGate.Enabled(featuregates.ClientIdentityPenetration) &#123;</span><br><span class="line">cfg.Impersonate = p.getImpersonationConfig(request)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Build the RoundTripper from the config</span></span><br><span class="line">rt, err := restclient.TransportFor(cfg)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">responsewriters.InternalError(writer, request, errors.Wrapf(err, <span class="string">&quot;failed creating cluster proxy client %s&quot;</span>, cluster.Name))</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Create the proxy</span></span><br><span class="line">proxy := apiproxy.NewUpgradeAwareHandler(</span><br><span class="line">&amp;url.URL&#123;</span><br><span class="line">Scheme:   urlAddr.Scheme,</span><br><span class="line">Path:     newReq.URL.Path,</span><br><span class="line">Host:     urlAddr.Host,</span><br><span class="line">RawQuery: request.URL.RawQuery,</span><br><span class="line">&#125;,</span><br><span class="line">rt,</span><br><span class="line"><span class="literal">false</span>,</span><br><span class="line"><span class="literal">false</span>,</span><br><span class="line"><span class="literal">nil</span>)</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>getImpersonationConfig</code> method implements Cluster Gateway's identity exchange logic, which can swap the impersonated user for a different one based on configured rules.</p><p><code>restclient.TransportFor()</code> builds a <code>RoundTripper</code> from the given config and ultimately calls <code>HTTPWrappersForConfig</code>:</p><figure class="highlight go"><figcaption><span>k8s.io/client-go@v0.25.3/transport/round_trippers.go:39</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">HTTPWrappersForConfig</span><span class="params">(config *Config, rt http.RoundTripper)</span></span> (http.RoundTripper, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(config.Impersonate.UserName) &gt; <span class="number">0</span> ||</span><br><span class="line"><span class="built_in">len</span>(config.Impersonate.UID) &gt; <span class="number">0</span> ||</span><br><span class="line"><span class="built_in">len</span>(config.Impersonate.Groups) &gt; <span class="number">0</span> ||</span><br><span class="line"><span class="built_in">len</span>(config.Impersonate.Extra) &gt; <span class="number">0</span> &#123;</span><br><span class="line">rt = NewImpersonatingRoundTripper(config.Impersonate, rt)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> rt, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>HTTPWrappersForConfig()</code> checks whether the config carries <code>Impersonate</code> info and, if so, wraps the transport with <code>NewImpersonatingRoundTripper</code>. The wrapper's request handling looks like this:</p><figure class="highlight go"><figcaption><span>k8s.io/client-go@v0.25.3/transport/round_trippers.go:241</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rt *impersonatingRoundTripper)</span></span> RoundTrip(req *http.Request) (*http.Response, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// use the user header as marker for the rest.</span></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(req.Header.Get(ImpersonateUserHeader)) != <span class="number">0</span> &#123;</span><br><span class="line"><span class="keyword">return</span> rt.delegate.RoundTrip(req)</span><br><span class="line">&#125;</span><br><span class="line">req = utilnet.CloneRequest(req)</span><br><span class="line">req.Header.Set(ImpersonateUserHeader, rt.impersonate.UserName)</span><br><span class="line"><span class="keyword">if</span> rt.impersonate.UID != <span class="string">&quot;&quot;</span> &#123;</span><br><span class="line">req.Header.Set(ImpersonateUIDHeader, rt.impersonate.UID)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">for</span> _, group := <span class="keyword">range</span> rt.impersonate.Groups &#123;</span><br><span class="line">req.Header.Add(ImpersonateGroupHeader, group)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">for</span> k, vv := <span class="keyword">range</span> rt.impersonate.Extra &#123;</span><br><span class="line"><span class="keyword">for</span> _, v := 
<span class="keyword">range</span> vv &#123;</span><br><span class="line">req.Header.Add(ImpersonateUserExtraHeaderPrefix+headerKeyEscape(k), v)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> rt.delegate.RoundTrip(req)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>impersonatingRoundTripper</code> works by writing the given <code>impersonate</code> config into the request headers; the request is then sent to the managed cluster's APIServer.</p><p>The <code>impersonatingRoundTripper</code> used by Cluster Gateway is not the same as the one used in KubeVela. KubeVela's <code>impersonatingRoundTripper</code> modifies the URL query and does not support impersonating the Extra field, whereas the one here only sets request headers and is the implementation that ships with client-go.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>This post briefly introduced user impersonation in Kubernetes, how the APIServer implements it, and the extension KubeVela's Cluster Gateway builds on top of it.</p><p>User impersonation is useful for more than restricting permissions: extra user information can be carried in the <code>extra</code> field and, combined with audit logging, records who performed each operation, which enables end-to-end operation auditing.</p><p>The next post looks at how KubeVela makes use of user impersonation.</p>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

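&lt;p&gt;User impersonation is a feature Kubernetes provides natively (&lt;a href=&quot;https://kubernetes.io/docs/reference/access-authn-authz/authentication/#user-impersonation&quot;&gt;User impersonation&lt;/a&gt;), and it is very useful when managing clusters.&lt;/p&gt;
&lt;p&gt;A management system usually operates clusters as a highly privileged user such as the cluster administrator (cluster-admin). When a user operates a cluster through that system, the identity actually used does not match the user's own cluster permissions, which easily leads to security problems.&lt;/p&gt;
&lt;p&gt;For example, a user may only be allowed to operate within a Namespace, but when they deploy a Helm chart through the management system, which acts as the cluster administrator, a Chart that creates several Namespaces or even ServiceAccounts lets them exceed their permissions.&lt;/p&gt;
&lt;p&gt;The usual answer is to enforce permissions inside the management system, but that amounts to maintaining two permission systems, and it is hard to cover every case. With Kubernetes user impersonation, the APIServer enforces the user's own permissions instead, which avoids the problem.&lt;/p&gt;</summary>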
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
    <category term="KubeVela" scheme="https://blog.imoe.tech/tags/KubeVela/"/>
    
  </entry>
  
  <entry>
    <title>How KubeVela manages clusters</title>
    <link href="https://blog.imoe.tech/2023/05/31/kubevela-cluster-management/"/>
    <id>https://blog.imoe.tech/2023/05/31/kubevela-cluster-management/</id>
    <published>2023-05-31T01:32:55.000Z</published>
    <updated>2025-06-24T14:56:43.232Z</updated>
    
    <content type="html"><![CDATA[<p>KubeVela is a multi-cluster application management component, so before using it you need to join clusters to KubeVela so that it can discover and maintain their information. When an application is delivered to a given cluster, KubeVela then knows how to connect to the target cluster and operate on it.</p><p>KubeVela stores cluster information in <code>Secret</code> objects and shares the same set of <code>Secret</code>s with Cluster Gateway for cluster management. When a cluster is joined, KubeVela creates a <code>Secret</code> with the same name as the cluster to hold its connection information.</p><p>When a request is forwarded from the APIServer to Cluster Gateway, the cluster name in the path is used to look up the Secret and obtain the managed cluster's connection information.</p><p>Cluster Gateway handles requests as follows:</p><p><img src="https://images.imoe.tech/blog/Pasted%20image%2020230131170622.png" alt="Cluster Gateway 处理流程图"></p><span id="more"></span><h2 id="集群信息-Secret"><a href="#集群信息-Secret" class="headerlink" title="集群信息 Secret"></a>The cluster information Secret</h2><p>The Secret data looks like this:</p><figure class="highlight yaml"><figcaption><span>Cluster Secret</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Secret</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">managed1</span></span><br><span class="line">  <span class="attr">labels:</span></span><br><span class="line">    <span class="attr">cluster.core.oam.dev/cluster-credential-type:</span> <span class="string">ServiceAccountToken</span></span><br><span class="line"><span class="attr">type:</span> <span class="string">Opaque</span> <span class="comment"># &lt;--- Has to be opaque</span></span><br><span class="line"><span class="attr">data:</span></span><br><span class="line">  <span 
class="attr">endpoint:</span> <span class="string">&quot;...&quot;</span> <span class="comment"># Should NOT be 127.0.0.1</span></span><br><span class="line">  <span class="attr">ca.crt:</span> <span class="string">&quot;...&quot;</span> <span class="comment"># ca cert for cluster &quot;managed1&quot;</span></span><br><span class="line">  <span class="attr">token:</span> <span class="string">&quot;...&quot;</span> <span class="comment"># working jwt token</span></span><br></pre></td></tr></table></figure><h2 id="纳管集群实现"><a href="#纳管集群实现" class="headerlink" title="纳管集群实现"></a>How joining a cluster is implemented</h2><p>KubeVela has two very similar code paths for joining a cluster, VelaUX and vela-cli:</p><ul><li>VelaUX enters through <code>clusterServiceImpl.createKubeCluster</code>; after some checks it registers the cluster via <code>joinClusterByKubeConfigString()</code>, which ultimately calls <code>multicluster.JoinClusterByKubeConfig()</code> to join the cluster;</li><li>vela-cli enters through <code>NewClusterJoinCommand</code> and calls <code>multicluster.JoinClusterByKubeConfig()</code> directly.</li></ul><p><code>JoinClusterByKubeConfig()</code> manages clusters through <code>Secret</code>s, and VelaUX goes one step further on top of that.</p><p>VelaUX's <code>createKubeCluster()</code> method also creates a <code>model.Cluster&#123;&#125;</code> record in the Store to save the cluster information. If the Store is <code>kubeapi</code>, the record is kept in a <code>ConfigMap</code>; otherwise it is kept in MongoDB.</p><h3 id="JoinClusterByKubeConfig"><a href="#JoinClusterByKubeConfig" class="headerlink" title="JoinClusterByKubeConfig"></a>JoinClusterByKubeConfig</h3><p><code>multicluster.JoinClusterByKubeConfig()</code> creates the <code>Secret</code> mentioned above to manage the cluster, but it runs some checks after creating it and deletes the created data if they fail.</p><figure class="highlight go"><figcaption><span>pkg/multicluster/cluster_management.go:389</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// JoinClusterByKubeConfig add child cluster by kubeconfig path, return cluster info and error</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">JoinClusterByKubeConfig</span><span class="params">(ctx context.Context, cli client.Client, kubeconfigPath <span class="type">string</span>, clusterName <span class="type">string</span>, options ...JoinClusterOption)</span></span> (*KubeClusterConfig, <span class="type">error</span>) &#123;</span><br><span class="line">args := newJoinClusterArgs(options...)</span><br><span class="line"><span class="comment">// Load the kubeconfig file of the cluster being joined</span></span><br><span class="line">clusterConfig, err := LoadKubeClusterConfigFromFile(kubeconfigPath)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> err := clusterConfig.SetClusterName(clusterName).SetCreateNamespace(args.createNamespace).Validate(); err 
!= <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Join the cluster</span></span><br><span class="line"><span class="keyword">switch</span> args.engine &#123;</span><br><span class="line"><span class="keyword">case</span> ClusterGateWayEngine:</span><br><span class="line"><span class="keyword">if</span> err = clusterConfig.RegisterByVelaSecret(ctx, cli); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> OCMEngine:</span><br><span class="line"><span class="keyword">if</span> args.inClusterBootstrap == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, errors.Wrapf(err, <span class="string">&quot;failed to determine the registration endpoint for the hub cluster &quot;</span>+</span><br><span class="line"><span class="string">&quot;when parsing --in-cluster-bootstrap flag&quot;</span>)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> err = clusterConfig.RegisterClusterManagedByOCM(ctx, cli, args); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> clusterConfig, err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> cfg, ok := ctx.Value(KubeConfigContext).(*rest.Config); ok &#123;</span><br><span class="line"><span class="keyword">if</span> err = SetClusterVersionInfo(ctx, cfg, clusterConfig.ClusterName); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span 
class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> clusterConfig, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>JoinClusterByKubeConfig()</code> reads the <code>kubeconfig</code> from a local file, so VelaUX's <code>joinClusterByKubeConfigString()</code> first writes a temporary file and then calls <code>JoinClusterByKubeConfig()</code>.</p><p>The code above has two branches. Here we focus on clusters managed directly through Cluster Gateway and skip OCM clusters for now. Cluster Gateway joins a cluster via the <code>clusterConfig.RegisterByVelaSecret()</code> method:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(clusterConfig *KubeClusterConfig)</span></span> RegisterByVelaSecret(ctx context.Context, cli client.Client) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// Check whether the cluster already exists</span></span><br><span class="line"><span class="keyword">if</span> err := ensureClusterNotExists(ctx, cli, clusterConfig.ClusterName); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> errors.Wrapf(err, <span class="string">&quot;cannot use cluster name %s&quot;</span>, clusterConfig.ClusterName)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// It does not exist: create the cluster Secret</span></span><br><span class="line"><span class="keyword">if</span> err := 
clusterConfig.createClusterSecret(ctx, cli, <span class="literal">true</span>); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> errors.Wrapf(err, <span class="string">&quot;failed to add cluster to kubernetes&quot;</span>)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Post-registration check</span></span><br><span class="line"><span class="keyword">return</span> clusterConfig.PostRegistration(ctx, cli)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>An important part of the join logic is the post-registration check, which ensures that the configured <code>clusterConfig.CreateNamespace</code> namespace exists in the managed cluster. At this point a request is sent through Cluster Gateway to the managed cluster.</p><article class="message is-warning">        <div class="message-header"><p><i class="fa fa-exclamation-circle mr-2"></i>Note</p></div>        <div class="message-body">            <p>The request here is sent directly from the local machine to the APIServer, and the request URL has the following format: <code>/apis/cluster.core.oam.dev/v1alpha1/clustergateways/{clusterName}/proxy/{api}</code></p>        </div>    </article><p>If the Namespace operation fails, the join is rolled back: the <code>Secret</code> is deleted and a join-failed error is returned.</p><h3 id="Cluster-Gateway-获取集群信息"><a href="#Cluster-Gateway-获取集群信息" class="headerlink" title="Cluster Gateway 获取集群信息"></a>How Cluster Gateway obtains cluster information</h3><p>When a proxied request passes through Cluster Gateway, it needs to look up the cluster's information.</p><p>In the Cluster Gateway code, the <code>Connect()</code> method that handles creating a proxy connection fetches the cluster information. It does so through <code>parentStorage.Get()</code>, which calls the <code>Get()</code> method of the parent resource <code>ClusterGateway</code>:</p><figure class="highlight go"><figcaption><span>pkg/apis/cluster/v1alpha1/clustergateway_types_secret.go:46</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(in *ClusterGateway)</span></span> Get(ctx context.Context, name <span class="type">string</span>, _ *metav1.GetOptions) (runtime.Object, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">if</span> singleton.GetSecretControl() == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;loopback secret client are not inited&quot;</span>)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">clusterSecret, err := singleton.GetSecretControl().Get(ctx, name)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Warningf(<span class="string">&quot;Failed getting secret %q/%q: %v&quot;</span>, config.SecretNamespace, name, err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> options.OCMIntegration &#123;</span><br><span class="line"><span class="keyword">if</span> singleton.GetClusterControl() == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;loopback 
cluster client are not inited&quot;</span>)</span><br><span class="line">&#125;</span><br><span class="line">managedCluster, err := singleton.GetClusterControl().Get(ctx, name)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> convertFromSecret(clusterSecret)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> convertFromManagedClusterAndSecret(managedCluster, clusterSecret)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> convertFromSecret(clusterSecret)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Internally, this method uses <code>SecretControl</code> to fetch the <code>Secret</code> named after the cluster and converts it into a <code>ClusterGateway</code> object, which is finally passed to <code>proxyHandler</code>. For details, see the earlier post "<a href="https://blog.imoe.tech/2023/04/26/kubevela-cluster-gateway-implementation/">KubeVela 代理网关 Cluster Gateway 实现</a>".</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Managing clusters through <code>Secret</code>s in this way has its advantages:</p><ol><li>Access to a <code>Secret</code> can be restricted through permission management, which protects the cluster credentials;</li><li>Cluster metadata can be set through <code>Label</code>s, so applications can be distributed by <code>Label</code> at scheduling time.</li></ol><p>As the code shows, when vela-cli is used to join a cluster, the request is sent from the current host to the API Server, and in many environments the Kubernetes API Server is not exposed to outside access.</p><p>So in production it is more common to run vela-cli in a pod.</p>]]></content>
    
    
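    <summary type="html">&lt;p&gt;KubeVela is a multi-cluster application management component, so before using it you need to join clusters to KubeVela so that it can discover and maintain their information. When an application is delivered to a given cluster, KubeVela then knows how to connect to the target cluster and operate on it.&lt;/p&gt;
&lt;p&gt;KubeVela stores cluster information in &lt;code&gt;Secret&lt;/code&gt; objects and shares the same set of &lt;code&gt;Secret&lt;/code&gt;s with Cluster Gateway for cluster management. When a cluster is joined, KubeVela creates a &lt;code&gt;Secret&lt;/code&gt; with the same name as the cluster to hold its connection information.&lt;/p&gt;
&lt;p&gt;When a request is forwarded from the APIServer to Cluster Gateway, the cluster name in the path is used to look up the Secret and obtain the managed cluster's connection information.&lt;/p&gt;
&lt;p&gt;Cluster Gateway handles requests as follows:&lt;/p&gt;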
&lt;p&gt;&lt;img src=&quot;https://images.imoe.tech/blog/Pasted%20image%2020230131170622.png&quot; alt=&quot;Cluster Gateway 处理流程图&quot;&gt;&lt;/p&gt;</summary>
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
    <category term="KubeVela" scheme="https://blog.imoe.tech/tags/KubeVela/"/>
    
  </entry>
  
  <entry>
    <title>How Workloads Are Associated with Pods</title>
    <link href="https://blog.imoe.tech/2023/05/04/relation-of-workload-and-pod/"/>
    <id>https://blog.imoe.tech/2023/05/04/relation-of-workload-and-pod/</id>
    <published>2023-05-04T04:09:59.000Z</published>
    <updated>2023-05-04T07:47:11.718Z</updated>
    
    <content type="html"><![CDATA[<img alt="based on kubernetes v1.26" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p>In this post, "workload" refers to three Kubernetes resources: Deployment, StatefulSet and DaemonSet:</p><ul><li>A Deployment manages stateless Pods, and does so through ReplicaSets;</li><li>A StatefulSet manages stateful Pods;</li><li>A DaemonSet runs its Pods on every eligible node in the cluster.</li></ul><p>How do we determine which Pods are managed by which workload, and how are those Pods associated with it?</p><span id="more"></span><h2 id="Deployment-与-Pod"><a href="#Deployment-与-Pod" class="headerlink" title="Deployment and Pod"></a>Deployment and Pod</h2><p>Anyone with some Kubernetes experience knows that a Deployment manages Pods through ReplicaSets. The figure below shows how a Pod is managed and scheduled:</p><p><img src="https://images.imoe.tech/blog/Pasted%20image%2020220428221105.png" alt="Deployment processing flow"></p><p>Below is an example Deployment from a Chaos Mesh installation deployed with Helm:</p><figure class="highlight yaml"><figcaption><span>Deployment</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span 
class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">chaos-testing</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">progressDeadlineSeconds:</span> <span class="number">600</span></span><br><span class="line">  <span class="attr">replicas:</span> <span class="number">1</span></span><br><span class="line">  <span class="attr">revisionHistoryLimit:</span> <span class="number">10</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">matchLabels:</span></span><br><span 
class="line">      <span class="attr">app.kubernetes.io/component:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">      <span class="attr">app.kubernetes.io/instance:</span> <span class="string">chaos-mesh</span></span><br><span class="line">      <span class="attr">app.kubernetes.io/name:</span> <span class="string">chaos-mesh</span></span><br><span class="line">  <span class="attr">strategy:</span></span><br><span class="line">    <span class="attr">rollingUpdate:</span></span><br><span class="line">      <span class="attr">maxSurge:</span> <span class="number">25</span><span class="string">%</span></span><br><span class="line">      <span class="attr">maxUnavailable:</span> <span class="number">25</span><span class="string">%</span></span><br><span class="line">    <span class="attr">type:</span> <span class="string">RollingUpdate</span></span><br><span class="line">  <span class="attr">template:</span></span><br><span class="line">    <span class="attr">metadata:</span></span><br><span class="line">      <span class="attr">labels:</span></span><br><span class="line">        <span class="attr">app.kubernetes.io/component:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">        <span class="attr">app.kubernetes.io/instance:</span> <span class="string">chaos-mesh</span></span><br><span class="line">        <span class="attr">app.kubernetes.io/managed-by:</span> <span class="string">Helm</span></span><br><span class="line">        <span class="attr">app.kubernetes.io/name:</span> <span class="string">chaos-mesh</span></span><br><span class="line">        <span class="attr">app.kubernetes.io/part-of:</span> <span class="string">chaos-mesh</span></span><br><span class="line">        <span class="attr">app.kubernetes.io/version:</span> <span class="number">2.2</span><span class="number">.0</span></span><br><span class="line">        <span class="attr">helm.sh/chart:</span> <span 
class="string">chaos-mesh-2.2.0</span></span><br><span class="line">    <span class="attr">spec:</span></span><br><span class="line">      <span class="attr">containers:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">command:</span></span><br><span class="line">            <span class="bullet">-</span> <span class="string">/usr/local/bin/chaos-dashboard</span></span><br><span class="line">          <span class="attr">env:</span></span><br><span class="line">            <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">CLEAN_SYNC_PERIOD</span></span><br><span class="line">              <span class="attr">value:</span> <span class="string">12h</span></span><br><span class="line">          <span class="attr">image:</span> <span class="string">&#x27;chaos-mesh/chaos-dashboard:v2.2.0&#x27;</span></span><br><span class="line">          <span class="attr">imagePullPolicy:</span> <span class="string">IfNotPresent</span></span><br><span class="line">          <span class="attr">name:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">          <span class="attr">ports:</span></span><br><span class="line">            <span class="bullet">-</span> <span class="attr">containerPort:</span> <span class="number">2333</span></span><br><span class="line">              <span class="attr">name:</span> <span class="string">http</span></span><br><span class="line">              <span class="attr">protocol:</span> <span class="string">TCP</span></span><br><span class="line">            <span class="bullet">-</span> <span class="attr">containerPort:</span> <span class="number">2334</span></span><br><span class="line">              <span class="attr">name:</span> <span class="string">metric</span></span><br><span class="line">              <span class="attr">protocol:</span> <span class="string">TCP</span></span><br><span class="line">          <span 
class="attr">resources:</span></span><br><span class="line">            <span class="attr">requests:</span></span><br><span class="line">              <span class="attr">cpu:</span> <span class="string">25m</span></span><br><span class="line">              <span class="attr">memory:</span> <span class="string">256Mi</span></span><br><span class="line">          <span class="attr">terminationMessagePath:</span> <span class="string">/dev/termination-log</span></span><br><span class="line">          <span class="attr">terminationMessagePolicy:</span> <span class="string">File</span></span><br><span class="line">          <span class="attr">volumeMounts:</span></span><br><span class="line">            <span class="bullet">-</span> <span class="attr">mountPath:</span> <span class="string">/data</span></span><br><span class="line">              <span class="attr">name:</span> <span class="string">storage-volume</span></span><br><span class="line">      <span class="attr">dnsPolicy:</span> <span class="string">ClusterFirst</span></span><br><span class="line">      <span class="attr">restartPolicy:</span> <span class="string">Always</span></span><br><span class="line">      <span class="attr">schedulerName:</span> <span class="string">default-scheduler</span></span><br><span class="line">      <span class="attr">securityContext:</span> &#123;&#125;</span><br><span class="line">      <span class="attr">serviceAccount:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">      <span class="attr">serviceAccountName:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">      <span class="attr">terminationGracePeriodSeconds:</span> <span class="number">30</span></span><br><span class="line">      <span class="attr">volumes:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">emptyDir:</span> &#123;&#125;</span><br><span class="line">          <span class="attr">name:</span> <span 
class="string">storage-volume</span></span><br></pre></td></tr></table></figure><p>The <code>DeploymentController</code> code shows that ReplicaSets are associated with a Deployment through the Deployment's <code>.spec.selector</code>:</p><figure class="highlight go"><figcaption><span>pkg/controller/deployment/deployment_controller.go:516</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dc *DeploymentController)</span></span> getReplicaSetsForDeployment(ctx context.Context, d *apps.Deployment) ([]*apps.ReplicaSet, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// List all ReplicaSets to find those we own but that no longer match our</span></span><br><span class="line"><span class="comment">// selector. 
They will be orphaned by ClaimReplicaSets().</span></span><br><span class="line">rsList, err := dc.rsLister.ReplicaSets(d.Namespace).List(labels.Everything())</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line">deploymentSelector, err := metav1.LabelSelectorAsSelector(d.Spec.Selector)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;deployment %s/%s has invalid label selector: %v&quot;</span>, d.Namespace, d.Name, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// If any adoptions are attempted, we should first recheck for deletion with</span></span><br><span class="line"><span class="comment">// an uncached quorum read sometime after listing ReplicaSets (see #42639).</span></span><br><span class="line">canAdoptFunc := controller.RecheckDeletionTimestamp(<span class="function"><span class="keyword">func</span><span class="params">(ctx context.Context)</span></span> (metav1.Object, <span class="type">error</span>) &#123;</span><br><span class="line">fresh, err := dc.client.AppsV1().Deployments(d.Namespace).Get(ctx, d.Name, metav1.GetOptions&#123;&#125;)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> fresh.UID != d.UID &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span 
class="string">&quot;original Deployment %v/%v is gone: got uid %v, wanted %v&quot;</span>, d.Namespace, d.Name, fresh.UID, d.UID)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> fresh, <span class="literal">nil</span></span><br><span class="line">&#125;)</span><br><span class="line">cm := controller.NewReplicaSetControllerRefManager(dc.rsControl, d, deploymentSelector, controllerKind, canAdoptFunc)</span><br><span class="line"><span class="keyword">return</span> cm.ClaimReplicaSets(ctx, rsList)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>In <code>ReplicaSetController</code>, Pods are associated with a ReplicaSet through the ReplicaSet's <code>.spec.selector</code>, as the code below shows:</p><figure class="highlight go"><figcaption><span>pkg/controller/replicaset/replica_set.go:679</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rsc *ReplicaSetController)</span></span> syncReplicaSet(ctx context.Context, key <span class="type">string</span>) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// ...omitted</span></span><br><span class="line">selector, err 
:= metav1.LabelSelectorAsSelector(rs.Spec.Selector)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;error converting pod selector to selector for rs %v/%v: %v&quot;</span>, namespace, name, err))</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// list all pods to include the pods that don&#x27;t match the rs`s selector</span></span><br><span class="line"><span class="comment">// anymore but has the stale controller ref.</span></span><br><span class="line"><span class="comment">// <span class="doctag">TODO:</span> Do the List and Filter in a single pass, or use an index.</span></span><br><span class="line">allPods, err := rsc.podLister.Pods(rs.Namespace).List(labels.Everything())</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Ignore inactive pods.</span></span><br><span class="line">filteredPods := controller.FilterActivePods(allPods)</span><br><span class="line"></span><br><span class="line"><span class="comment">// <span class="doctag">NOTE:</span> filteredPods are pointing to objects from cache - if you need to</span></span><br><span class="line"><span class="comment">// modify them, you need to copy it first.</span></span><br><span class="line">filteredPods, err = rsc.claimPods(ctx, rs, selector, filteredPods)</span><br><span class="line"><span class="comment">// ...omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The ReplicaSet object looks like this:</p><figure class="highlight 
yaml"><figcaption><span>ReplicaSet</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ReplicaSet</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">annotations:</span></span><br><span class="line"><span class="comment"># ...omitted</span></span><br><span class="line">  <span class="attr">creationTimestamp:</span> <span class="string">&quot;2022-09-15T02:55:01Z&quot;</span></span><br><span class="line">  <span class="attr">generation:</span> <span class="number">3</span></span><br><span class="line">  <span 
class="attr">labels:</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/component:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/instance:</span> <span class="string">chaos-mesh</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/managed-by:</span> <span class="string">Helm</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/name:</span> <span class="string">chaos-mesh</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/part-of:</span> <span class="string">chaos-mesh</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/version:</span> <span class="number">2.2</span><span class="number">.0</span></span><br><span class="line">    <span class="attr">helm.sh/chart:</span> <span class="string">chaos-mesh-2.2.0</span></span><br><span class="line">    <span class="attr">pod-template-hash:</span> <span class="string">5f97f6658f</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">chaos-dashboard-5f97f6658f</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">chaos-testing</span></span><br><span class="line">  <span class="attr">ownerReferences:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line">    <span class="attr">blockOwnerDeletion:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">controller:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line">    <span class="attr">name:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">    <span class="attr">uid:</span> <span 
class="string">5d58936a-7623-4129-9309-a49764400901</span></span><br><span class="line">  <span class="attr">resourceVersion:</span> <span class="string">&quot;1294668979&quot;</span></span><br><span class="line">  <span class="attr">uid:</span> <span class="string">2578a179-3262-48bf-8f73-c9b7efa3f70e</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">replicas:</span> <span class="number">1</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">matchLabels:</span></span><br><span class="line">      <span class="attr">app.kubernetes.io/component:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">      <span class="attr">app.kubernetes.io/instance:</span> <span class="string">chaos-mesh</span></span><br><span class="line">      <span class="attr">app.kubernetes.io/name:</span> <span class="string">chaos-mesh</span></span><br><span class="line">      <span class="attr">pod-template-hash:</span> <span class="string">5f97f6658f</span></span><br><span class="line">  <span class="attr">template:</span></span><br><span class="line">  <span class="comment"># ...omitted</span></span><br></pre></td></tr></table></figure><p>Compared with the Deployment, the ReplicaSet's <code>.spec.selector</code> carries one extra label, <code>pod-template-hash</code>, which distinguishes the Pods managed by different ReplicaSets.</p><p>The ReplicaSets that belong to different revisions of a Deployment are told apart by <code>.spec.template</code>: the Deployment controller hashes the content of the pod template and uses the hash value both as the ReplicaSet name suffix and as the <code>pod-template-hash</code> label. When a Deployment's <code>.spec.template</code> is modified, a new ReplicaSet is created from the new content and a rolling update is performed.</p><p>The corresponding Pod looks like this:</p><figure class="highlight yaml"><figcaption><span>Pod</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Pod</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">creationTimestamp:</span> <span class="string">&quot;2023-05-01T17:09:03Z&quot;</span></span><br><span class="line">  <span class="attr">generateName:</span> <span class="string">chaos-dashboard-5f97f6658f-</span></span><br><span class="line">  <span class="attr">labels:</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/component:</span> <span class="string">chaos-dashboard</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/instance:</span> <span class="string">chaos-mesh</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/managed-by:</span> <span class="string">Helm</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/name:</span> <span class="string">chaos-mesh</span></span><br><span class="line">    <span class="attr">app.kubernetes.io/part-of:</span> <span class="string">chaos-mesh</span></span><br><span class="line">    <span 
class="attr">app.kubernetes.io/version:</span> <span class="number">2.2</span><span class="number">.0</span></span><br><span class="line">    <span class="attr">helm.sh/chart:</span> <span class="string">chaos-mesh-2.2.0</span></span><br><span class="line">    <span class="attr">pod-template-hash:</span> <span class="string">5f97f6658f</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">chaos-dashboard-5f97f6658f-fjqvp</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">chaos-testing</span></span><br><span class="line">  <span class="attr">ownerReferences:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line">    <span class="attr">blockOwnerDeletion:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">controller:</span> <span class="literal">true</span></span><br><span class="line">    <span class="attr">kind:</span> <span class="string">ReplicaSet</span></span><br><span class="line">    <span class="attr">name:</span> <span class="string">chaos-dashboard-5f97f6658f</span></span><br><span class="line">    <span class="attr">uid:</span> <span class="string">2578a179-3262-48bf-8f73-c9b7efa3f70e</span></span><br><span class="line">  <span class="attr">resourceVersion:</span> <span class="string">&quot;1294668970&quot;</span></span><br><span class="line">  <span class="attr">uid:</span> <span class="string">db800641-28d9-4f15-ac0a-41ce7fec419c</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">containers:</span></span><br><span class="line">  <span class="comment"># ...omitted</span></span><br></pre></td></tr></table></figure><p>Note the <code>ownerReferences</code> field: it points to the resource that manages the current object, which is very helpful when walking upwards from a Pod to the workload that owns it.</p><h2 id="StatefulSet-与-Pod"><a href="#StatefulSet-与-Pod" 
class="headerlink" title="StatefulSet and Pod"></a>StatefulSet and Pod</h2><p>Unlike a Deployment, a StatefulSet has no intermediate layer such as ReplicaSet; it is associated with its Pods directly.</p><p>A StatefulSet uses <code>.spec.selector</code> to associate with Pods directly, as the code below shows:</p><figure class="highlight go"><figcaption><span>pkg/controller/statefulset/stateful_set.go:445</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ssc *StatefulSetController)</span></span> sync(ctx context.Context, key <span class="type">string</span>) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// ...omitted</span></span><br><span class="line"></span><br><span class="line">selector, err := metav1.LabelSelectorAsSelector(set.Spec.Selector)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;error converting StatefulSet %v selector: %v&quot;</span>, key, err))</span><br><span class="line"><span class="comment">// This is a non-transient error, so don&#x27;t retry.</span></span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> err := ssc.adoptOrphanRevisions(ctx, set); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">pods, err := ssc.getPodsForStatefulSet(ctx, set, selector)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> ssc.syncStatefulSet(ctx, set, pods)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="DaemonSet-与-Pod"><a href="#DaemonSet-与-Pod" class="headerlink" title="DaemonSet and Pod"></a>DaemonSet and Pod</h2><p>Like a StatefulSet, a DaemonSet uses the <code>.spec.selector</code> field to associate with Pods directly:</p><figure class="highlight go"><figcaption><span>pkg/controller/daemon/daemon_controller.go:719</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dsc *DaemonSetsController)</span></span> getDaemonPods(ctx context.Context, ds *apps.DaemonSet) ([]*v1.Pod, <span class="type">error</span>) &#123;</span><br><span class="line">selector, err := metav1.LabelSelectorAsSelector(ds.Spec.Selector)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// List all pods to include those that don&#x27;t match the selector anymore but</span></span><br><span class="line"><span class="comment">// have a ControllerRef pointing to this controller.</span></span><br><span class="line">pods, err := dsc.podLister.Pods(ds.Namespace).List(labels.Everything())</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// If any adoptions are attempted, we should first recheck for deletion with</span></span><br><span class="line"><span class="comment">// an uncached quorum read sometime after listing Pods (see #42639).</span></span><br><span class="line">dsNotDeleted := controller.RecheckDeletionTimestamp(<span class="function"><span class="keyword">func</span><span class="params">(ctx context.Context)</span></span> (metav1.Object, <span class="type">error</span>) &#123;</span><br><span class="line">fresh, err := dsc.kubeClient.AppsV1().DaemonSets(ds.Namespace).Get(ctx, ds.Name, 
metav1.GetOptions&#123;&#125;)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> fresh.UID != ds.UID &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;original DaemonSet %v/%v is gone: got uid %v, wanted %v&quot;</span>, ds.Namespace, ds.Name, fresh.UID, ds.UID)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> fresh, <span class="literal">nil</span></span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Use ControllerRefManager to adopt/orphan as needed.</span></span><br><span class="line">cm := controller.NewPodControllerRefManager(dsc.podControl, ds, selector, controllerKind, dsNotDeleted)</span><br><span class="line"><span class="keyword">return</span> cm.ClaimPods(ctx, pods)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p>In Kubernetes, you cannot query the Pods managed by a Deployment, StatefulSet, or DaemonSet directly by the workload's <code>name</code>. You first have to fetch the workload object and then use the <code>.spec.selector</code> stored in it to query the Pods.</p><p>Because a Deployment has ReplicaSet as an intermediate layer that manages rollout revisions, the Deployment's <code>.spec.selector</code> matches the Pods of all of its ReplicaSet revisions. To narrow the result down to the Pods of a single revision, query with that ReplicaSet's <code>.spec.selector</code> instead.</p>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

&lt;p&gt;In Kubernetes, the workloads discussed here are three resources: Deployment, StatefulSet, and DaemonSet:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deployment manages stateless Pods, and does so through ReplicaSets;&lt;/li&gt;
&lt;li&gt;StatefulSet manages stateful Pods;&lt;/li&gt;
&lt;li&gt;DaemonSet runs the Pods it manages on all nodes in the cluster.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;How can you tell which Pods are managed by which workload, and how are those Pods associated with it?&lt;/p&gt;</summary>
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
  </entry>
  
  <entry>
    <title>Implementation of Cluster Gateway, the KubeVela Proxy Gateway</title>
    <link href="https://blog.imoe.tech/2023/04/26/kubevela-cluster-gateway-implementation/"/>
    <id>https://blog.imoe.tech/2023/04/26/kubevela-cluster-gateway-implementation/</id>
    <published>2023-04-26T04:00:09.000Z</published>
    <updated>2023-05-04T01:11:17.819Z</updated>
    
    <content type="html"><![CDATA[<p>KubeVela's multi-cluster management depends on the Cluster Gateway component, which is installed automatically by the KubeVela Helm Chart. KubeVela does not connect to managed clusters directly; it must go through Cluster Gateway to connect to and manage them.</p><p>Features including cluster management all rely on Cluster Gateway, so it is an indispensable component of KubeVela's multi-cluster management.</p><p><img src="https://images.imoe.tech/blog/kubevela-cluster-gateway.png?200x200" alt="KubeVela accesses clusters through Cluster Gateway"></p><span id="more"></span><p>Cluster Gateway implements its proxy with <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/">apiserver-aggregation</a>, the API server's native extension mechanism. By registering an external API with the API Server, the control-plane API Server becomes the entry point for accessing managed clusters, which also avoids exposing the internal clusters.</p><p>Cluster Gateway registers the following endpoint with the API Server:</p><figure class="highlight txt"><figcaption><span>Cluster Gateway proxy path</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/apis/cluster.core.oam.dev/v1alpha1/clustergateways/&#123;clusterName&#125;/proxy/&#123;api&#125;</span><br></pre></td></tr></table></figure><p>The path parameters are:</p><ul><li><code>clusterName</code>: the name of the target cluster;</li><li><code>api</code>: the request path forwarded to the target cluster's API Server.</li></ul><h2 id="入口和授权"><a href="#入口和授权" class="headerlink" title="Entry Point and Authorization"></a>Entry Point and Authorization</h2><p>Looking at the Kubernetes resources an application registers gives a rough picture of its permission requirements, startup method, and dependencies, which helps a lot in understanding how it works and in locating the program's entry point.</p><h3 id="注册-API-Server-Aggregation"><a href="#注册-API-Server-Aggregation" class="headerlink" title="Registering API Server Aggregation"></a>Registering API Server Aggregation</h3><p>Registering an API Server extension is simple: creating an APIService object is all it takes:</p><figure class="highlight yaml"><figcaption><span>Registering the AA</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apiregistration.k8s.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">APIService</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">v1alpha1.cluster.core.oam.dev</span></span><br><span class="line">  <span class="attr">labels:</span></span><br><span class="line">    <span class="attr">api:</span> <span class="string">cluster-extension-apiserver</span></span><br><span class="line">    <span class="attr">apiserver:</span> <span class="string">&quot;true&quot;</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">version:</span> <span class="string">v1alpha1</span></span><br><span class="line">  <span class="attr">group:</span> <span class="string">cluster.core.oam.dev</span></span><br><span class="line">  <span class="attr">groupPriorityMinimum:</span> <span class="number">2000</span></span><br><span class="line">  <span class="attr">service:</span></span><br><span class="line">    <span class="attr">name:</span> <span class="string">gateway-service</span></span><br><span class="line">    <span class="attr">namespace:</span>  &#123;&#123; <span class="string">.Release.Namespace</span> &#125;&#125;</span><br><span class="line">    <span class="attr">port:</span> <span class="number">9443</span></span><br><span class="line">  <span class="attr">versionPriority:</span> <span class="number">10</span></span><br><span class="line">  <span class="attr">insecureSkipTLSVerify:</span> <span 
class="literal">true</span></span><br></pre></td></tr></table></figure><p>This manifest registers an <code>API Group</code> named <code>cluster.core.oam.dev</code> with version <code>v1alpha1</code>. Requests to the API Server's <code>/apis/cluster.core.oam.dev/v1alpha1/</code> endpoint are forwarded to the <code>Service</code> declared in the <code>APIService</code> (the <code>gateway-service</code> service in this example).</p><p>In the code, the <code>ClusterGateway</code> resource is registered on the builder that constructs the API Server:</p><figure class="highlight go"><figcaption><span>Constructing the API Server</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"></span><br><span class="line"><span class="comment">// registering metrics</span></span><br><span class="line">metrics.Register()</span><br><span class="line"></span><br><span class="line">cmd, err := builder.APIServer.</span><br><span class="line"><span class="comment">// +kubebuilder:scaffold:resource-register</span></span><br><span class="line">WithResource(&amp;clusterv1alpha1.ClusterGateway&#123;&#125;).</span><br><span class="line"><span class="comment">//...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>ClusterGateway</code> registers the resource name <code>clustergateways</code>, and the logic that handles proxy requests lives in <code>ClusterGatewayProxy</code>:</p><figure class="highlight go"><figcaption><span>pkg/apis/cluster/v1alpha1/clustergateway_types.go:146</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(in *ClusterGateway)</span></span> GetGroupVersionResource() schema.GroupVersionResource &#123;</span><br><span class="line"><span class="keyword">return</span> schema.GroupVersionResource&#123;</span><br><span class="line">Group:    config.MetaApiGroupName,</span><br><span class="line">Version:  config.MetaApiVersionName,</span><br><span class="line">Resource: config.MetaApiResourceName,</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">var</span> MetaApiResourceName = <span class="string">&quot;clustergateways&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(in *ClusterGateway)</span></span> GetArbitrarySubResources() []resource.ArbitrarySubResource &#123;  </span><br><span class="line">   <span class="keyword">return</span> []resource.ArbitrarySubResource&#123;  </span><br><span class="line">      &amp;ClusterGatewayProxy&#123;&#125;,  </span><br><span class="line">      &amp;ClusterGatewayHealth&#123;&#125;,  </span><br><span class="line">   &#125;  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="授权"><a href="#授权" class="headerlink" title="Authorization"></a>Authorization</h3><p>Cluster Gateway also creates the following authorization resources:</p><ul><li><code>ClusterRole</code>: <code>cluster-gateway:proxy</code>, the role that allows access to the proxy endpoint;</li><li><code>ClusterRoleBinding</code>: binds the role above to the <code>kubevela:client</code> group and the 
ServiceAccount.</li></ul><p>This configuration lets KubeVela and CLI users access clusters through this resource endpoint.</p><h2 id="Cluster-Gateway-代码实现"><a href="#Cluster-Gateway-代码实现" class="headerlink" title="Cluster Gateway Code Implementation"></a>Cluster Gateway Code Implementation</h2><p>As the <code>APIService</code> registration section above shows, the <code>ClusterGatewayProxy</code> struct is where the proxy logic is handled; it is effectively the entry point of Cluster Gateway's business logic.</p><p><code>ClusterGatewayProxy</code> implements the <code>Connecter</code> interface. When a connection comes in, its <code>Connect()</code> method is called; <code>Connect()</code> looks up the <code>ClusterGateway</code> from the request information and constructs a <code>proxyHandler</code> object:</p><figure class="highlight go"><figcaption><span>Connect</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span 
class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *ClusterGatewayProxy)</span></span> Connect(ctx context.Context, id <span class="type">string</span>, options runtime.Object, r registryrest.Responder) (http.Handler, <span class="type">error</span>) &#123;</span><br><span class="line">proxyOpts, ok := options.(*ClusterGatewayProxyOptions)</span><br><span class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;invalid options object: %#v&quot;</span>, options)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Fetch the cluster information from Secrets; id is the cluster name</span></span><br><span class="line">parentStorage, ok := contextutil.GetParentStorageGetter(ctx)</span><br><span 
class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;no parent storage found&quot;</span>)</span><br><span class="line">&#125;</span><br><span class="line">parentObj, err := parentStorage.Get(ctx, id, &amp;metav1.GetOptions&#123;&#125;)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;no such cluster %v&quot;</span>, id)</span><br><span class="line">&#125;</span><br><span class="line">clusterGateway := parentObj.(*ClusterGateway)</span><br><span class="line"></span><br><span class="line">reqInfo, _ := request.RequestInfoFrom(ctx)</span><br><span class="line">factory := request.RequestInfoFactory&#123;</span><br><span class="line">APIPrefixes:          sets.NewString(<span class="string">&quot;api&quot;</span>, <span class="string">&quot;apis&quot;</span>),</span><br><span class="line">GrouplessAPIPrefixes: sets.NewString(<span class="string">&quot;api&quot;</span>),</span><br><span class="line">&#125;</span><br><span class="line">proxyReqInfo, _ := factory.NewRequestInfo(&amp;http.Request&#123;</span><br><span class="line">URL: &amp;url.URL&#123;</span><br><span class="line">Path: proxyOpts.Path,</span><br><span class="line">&#125;,</span><br><span class="line">Method: strings.ToUpper(reqInfo.Verb),</span><br><span class="line">&#125;)</span><br><span class="line">proxyReqInfo.Verb = reqInfo.Verb</span><br><span class="line"></span><br><span class="line"><span class="comment">// Check authorization</span></span><br><span class="line"><span class="keyword">if</span> config.AuthorizateProxySubpath &#123;</span><br><span class="line">user, _ := request.UserFrom(ctx)</span><br><span class="line"><span class="keyword">var</span> attr 
authorizer.Attributes</span><br><span class="line"><span class="keyword">if</span> proxyReqInfo.IsResourceRequest &#123;</span><br><span class="line">attr = authorizer.AttributesRecord&#123;</span><br><span class="line">User:        user,</span><br><span class="line">APIGroup:    proxyReqInfo.APIGroup,</span><br><span class="line">APIVersion:  proxyReqInfo.APIVersion,</span><br><span class="line">Resource:    proxyReqInfo.Resource,</span><br><span class="line">Subresource: proxyReqInfo.Subresource,</span><br><span class="line">Namespace:   proxyReqInfo.Namespace,</span><br><span class="line">Name:        proxyReqInfo.Name,</span><br><span class="line">Verb:        proxyReqInfo.Verb,</span><br><span class="line">&#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">path, _ := url.ParseRequestURI(proxyReqInfo.Path)</span><br><span class="line">attr = authorizer.AttributesRecord&#123;</span><br><span class="line">User: user,</span><br><span class="line">Path: path.Path,</span><br><span class="line">Verb: proxyReqInfo.Verb,</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">decision, reason, err := loopback.GetAuthorizer().Authorize(ctx, attr)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, errors.Wrapf(err, <span class="string">&quot;authorization failed due to %s&quot;</span>, reason)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> decision != authorizer.DecisionAllow &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;proxying by user %v is forbidden authorization failed&quot;</span>, user.GetName())</span><br><span 
class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Return the proxy handler instance</span></span><br><span class="line"><span class="keyword">return</span> &amp;proxyHandler&#123;</span><br><span class="line">parentName:     id,</span><br><span class="line">path:           proxyOpts.Path,</span><br><span class="line">impersonate:    proxyOpts.Impersonate,</span><br><span class="line">clusterGateway: clusterGateway,</span><br><span class="line">responder:      r,</span><br><span class="line">finishFunc: <span class="function"><span class="keyword">func</span><span class="params">(code <span class="type">int</span>)</span></span> &#123;</span><br><span class="line">metrics.RecordProxiedRequestsByResource(proxyReqInfo.Resource, proxyReqInfo.Verb, code)</span><br><span class="line">metrics.RecordProxiedRequestsByCluster(id, code)</span><br><span class="line">&#125;,</span><br><span class="line">&#125;, <span class="literal">nil</span></span><br></pre></td></tr></table></figure><p><code>proxyHandler</code> is a struct that implements the <code>http.Handler</code> interface; its <code>ServeHTTP()</code> method is then called to handle the request.</p><p><code>ServeHTTP()</code> mainly does the following:</p><ul><li>Copies the <code>request</code> to use as the forwarded request, adjusting the URL along the way;</li><li>Builds a <code>RoundTripper</code> from the cluster configuration; this <code>RoundTripper</code> handles authorization and user impersonation (Cluster Gateway supports user impersonation, which is a separate topic and is not covered here);</li><li>Finally, uses the <code>UpgradeAwareHandler</code> provided by Kubernetes to do the proxying.</li></ul><h3 id="UpgradeAwareHandler-代理"><a href="#UpgradeAwareHandler-代理" class="headerlink" title="UpgradeAwareHandler Proxy"></a>UpgradeAwareHandler Proxy</h3><p><code>UpgradeAwareHandler</code> is built on <code>httputil.ReverseProxy</code>:</p><figure class="highlight go"><figcaption><span>UpgradeAwareHandler proxy implementation</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span 
class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(h *UpgradeAwareHandler)</span></span> ServeHTTP(w http.ResponseWriter, req *http.Request) &#123;</span><br><span class="line"><span class="comment">// WebSocket handling: if the connection can be upgraded, handle it here and return</span></span><br><span class="line"><span class="keyword">if</span> h.tryUpgrade(w, req) &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> h.UpgradeRequired &#123;</span><br><span class="line">h.Responder.Error(w, req, errors.NewBadRequest(<span class="string">&quot;Upgrade request required&quot;</span>))</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">loc := *h.Location</span><br><span class="line">loc.RawQuery = req.URL.RawQuery</span><br><span class="line"></span><br><span class="line"><span class="comment">// If original request URL ended in &#x27;/&#x27;, append a &#x27;/&#x27; at the end of the</span></span><br><span class="line"><span class="comment">// of the proxy URL</span></span><br><span class="line"><span class="keyword">if</span> !strings.HasSuffix(loc.Path, <span class="string">&quot;/&quot;</span>) &amp;&amp; strings.HasSuffix(req.URL.Path, <span class="string">&quot;/&quot;</span>) &#123;</span><br><span class="line">loc.Path += <span class="string">&quot;/&quot;</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">proxyRedirect := 
proxyRedirectsforRootPath(loc.Path, w, req)</span><br><span class="line"><span class="keyword">if</span> proxyRedirect &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> h.Transport == <span class="literal">nil</span> || h.WrapTransport &#123;</span><br><span class="line">h.Transport = h.defaultProxyTransport(req.URL, h.Transport)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// WithContext creates a shallow clone of the request with the same context.</span></span><br><span class="line">newReq := req.WithContext(req.Context())</span><br><span class="line">newReq.Header = utilnet.CloneHeader(req.Header)</span><br><span class="line"><span class="keyword">if</span> !h.UseRequestLocation &#123;</span><br><span class="line">newReq.URL = &amp;loc</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> h.UseLocationHost &#123;</span><br><span class="line"><span class="comment">// exchanging req.Host with the backend location is necessary for backends that act on the HTTP host header (e.g. 
API gateways),</span></span><br><span class="line"><span class="comment">// because req.Host has preference over req.URL.Host in filling this header field</span></span><br><span class="line">newReq.Host = h.Location.Host</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// create the target location to use for the reverse proxy</span></span><br><span class="line">reverseProxyLocation := &amp;url.URL&#123;Scheme: h.Location.Scheme, Host: h.Location.Host&#125;</span><br><span class="line"><span class="keyword">if</span> h.AppendLocationPath &#123;</span><br><span class="line">reverseProxyLocation.Path = h.Location.Path</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">proxy := httputil.NewSingleHostReverseProxy(reverseProxyLocation)</span><br><span class="line">proxy.Transport = h.Transport</span><br><span class="line">proxy.FlushInterval = h.FlushInterval</span><br><span class="line">proxy.ErrorLog = log.New(noSuppressPanicError&#123;&#125;, <span class="string">&quot;&quot;</span>, log.LstdFlags)</span><br><span class="line"><span class="keyword">if</span> h.RejectForwardingRedirects &#123;</span><br><span class="line">oldModifyResponse := proxy.ModifyResponse</span><br><span class="line">proxy.ModifyResponse = <span class="function"><span class="keyword">func</span><span class="params">(response *http.Response)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">code := response.StatusCode</span><br><span class="line"><span class="keyword">if</span> code &gt;= <span class="number">300</span> &amp;&amp; code &lt;= <span class="number">399</span> &amp;&amp; <span class="built_in">len</span>(response.Header.Get(<span class="string">&quot;Location&quot;</span>)) &gt; <span class="number">0</span> &#123;</span><br><span class="line"><span class="comment">// close the original response</span></span><br><span 
class="line">response.Body.Close()</span><br><span class="line">msg := <span class="string">&quot;the backend attempted to redirect this request, which is not permitted&quot;</span></span><br><span class="line"><span class="comment">// replace the response</span></span><br><span class="line">*response = http.Response&#123;</span><br><span class="line">StatusCode:    http.StatusBadGateway,</span><br><span class="line">Status:        fmt.Sprintf(<span class="string">&quot;%d %s&quot;</span>, response.StatusCode, http.StatusText(response.StatusCode)),</span><br><span class="line">Body:          io.NopCloser(strings.NewReader(msg)),</span><br><span class="line">ContentLength: <span class="type">int64</span>(<span class="built_in">len</span>(msg)),</span><br><span class="line">&#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line"><span class="keyword">if</span> oldModifyResponse != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> err := oldModifyResponse(response); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// ... 
omitted</span></span><br><span class="line">proxy.ServeHTTP(w, newReq)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="KubeVela-请求-Cluster-Gateway"><a href="#KubeVela-请求-Cluster-Gateway" class="headerlink" title="How KubeVela Calls Cluster Gateway"></a>How KubeVela Calls Cluster Gateway</h2><p>In KubeVela there are roughly three scenarios that need to operate on clusters:</p><ul><li><code>Controller</code>: the core KubeVela Controller, which has to connect to clusters when it manages applications and dispatches resources;</li><li><code>KubeVela API Server</code>: installing the VelaUX addon also installs an API service that wraps KubeVela's functionality and exposes it to the VelaUX frontend; it connects to Cluster Gateway as well when managing clusters;</li><li><code>CLI</code>: when a user manages clusters with the CLI (for example, when joining a cluster), the CLI reaches Cluster Gateway through the Kubernetes APIServer.</li></ul><p>The sections below look at how each of these three scenarios calls Cluster Gateway.</p><h3 id="KubeVela-Controller"><a href="#KubeVela-Controller" class="headerlink" title="KubeVela Controller"></a>KubeVela Controller</h3><p>In <code>run()</code>, the startup entry point of the KubeVela Controller, <code>ctrl.NewManager()</code> is used to create a <code>Manager</code>. The <code>Manager</code> holds shared objects such as <code>Caches</code> and <code>Clients</code>, and creating a Controller normally requires one.</p><figure class="highlight go"><figcaption><span>cmd/core/app/server.go:140</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">mgr, err := ctrl.NewManager(restConfig, ctrl.Options&#123;</span><br><span class="line"><span class="comment">// ... 
omitted</span></span><br><span class="line">NewClient: velaclient.DefaultNewControllerClient,</span><br><span class="line">NewCache:  sharding.BuildCache(scheme, &amp;v1beta1.Application&#123;&#125;, &amp;v1beta1.ApplicationRevision&#123;&#125;, &amp;v1beta1.ResourceTracker&#123;&#125;),</span><br><span class="line">&#125;)</span><br></pre></td></tr></table></figure><p>The call to <code>ctrl.NewManager()</code> sets the <code>Client</code> factory to <code>velaclient.DefaultNewControllerClient</code>, so KubeVela builds its own multi-cluster client and adds some extra processing when the Kubernetes client is created.</p><p>The call chain is <code>DefaultNewControllerClient() -&gt; NewDefaultClient() -&gt; pkgmulticluster.NewClient()</code>. The final <code>NewClient()</code> obtains a <code>multicluster.Transport</code> through <code>NewTransportWrapper()</code> and wraps it into the <code>Client</code> configuration:</p><figure class="highlight go"><figcaption><span>github.com/kubevela/pkg@v0.0.0-20230118103503-4a6096e79c1c/multicluster/client.go:135</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewClient</span><span class="params">(config *rest.Config, options ClientOptions)</span></span> (client.Client, <span class="type">error</span>) &#123;</span><br><span class="line">wrapped := rest.CopyConfig(config)</span><br><span class="line">wrapped.Wrap(NewTransportWrapper())</span><br><span class="line"><span class="comment">// ...omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>multicluster.Transport</code> looks up the target cluster of each request and, if it is not the local cluster, rewrites the request path:</p><figure class="highlight go"><figcaption><span>github.com/kubevela/pkg@v0.0.0-20230118103503-4a6096e79c1c/multicluster/transport.go:118</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span></span> RoundTrip(req *http.Request) (*http.Response, <span class="type">error</span>) &#123;</span><br><span class="line">cluster := t.getClusterFor(req)</span><br><span class="line"><span class="keyword">if</span> !IsLocal(cluster) &#123;</span><br><span class="line">req = req.Clone(req.Context())</span><br><span class="line">req.URL.Path = formatProxyURL(cluster, req.URL.Path)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> t.delegate.RoundTrip(req)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>formatProxyURL()</code> prepends the cluster proxy path to the original URL path:</p><figure class="highlight go"><figcaption><span>github.com/kubevela/pkg@v0.0.0-20230118103503-4a6096e79c1c/multicluster/transport.go:93</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">formatProxyURL</span><span class="params">(cluster, originalPath <span class="type">string</span>)</span></span> <span class="type">string</span> &#123;</span><br><span class="line">originalPath = strings.TrimPrefix(originalPath, <span class="string">&quot;/&quot;</span>)</span><br><span 
class="line"><span class="keyword">return</span> path.Clean(strings.Join([]<span class="type">string</span>&#123;</span><br><span class="line"><span class="string">&quot;/apis&quot;</span>,</span><br><span class="line">clustergatewayconfig.MetaApiGroupName,</span><br><span class="line">clustergatewayconfig.MetaApiVersionName,</span><br><span class="line">clustergatewayconfig.MetaApiResourceName,</span><br><span class="line">cluster,</span><br><span class="line"><span class="string">&quot;proxy&quot;</span>,</span><br><span class="line">originalPath,</span><br><span class="line">&#125;, <span class="string">&quot;/&quot;</span>))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>For a cluster named <code>cluster1</code>, the prepended path is:</p><figure class="highlight txt"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/apis/cluster.core.oam.dev/v1alpha1/clustergateways/cluster1/proxy/</span><br></pre></td></tr></table></figure><p>The request is therefore received by the control plane's APIServer, which forwards it to Cluster Gateway according to its APIServer Aggregation configuration.</p><h3 id="KubeVela-API-Server"><a href="#KubeVela-API-Server" class="headerlink" title="KubeVela API Server"></a>KubeVela API Server</h3><p>The KubeVela API Server is not a kube-apiserver style service. It is simply KubeVela's own API service, typically consumed by the VelaUX frontend.</p><p>When the KubeVela API Server starts, the following call chain leads to <code>NewClient()</code>:</p><p><code>server.Run() -&gt; run() -&gt; restServer.Run() -&gt; s.buildIoCContainer() -&gt; clients.GetKubeClient() -&gt; pkgmulticluster.NewClient()</code></p><p><code>pkgmulticluster.NewClient</code> is the same function the KubeVela Controller uses to create its <code>Client</code>. It wraps a <code>multicluster.Transport</code> that rewrites the request path so that requests are forwarded through the APIServer to Cluster Gateway, so it is not repeated here.</p><h3 id="KubeVela-CLI"><a href="#KubeVela-CLI" class="headerlink" title="KubeVela CLI"></a>KubeVela CLI</h3><p>vela-cli is implemented the same way: it ultimately creates its <code>Client</code> with <code>pkgmulticluster.NewClient()</code>, which forwards requests through the APIServer to 
Cluster Gateway.</p><article class="message is-info">        <div class="message-header"><p>A caveat for the CLI</p></div>        <div class="message-body">            <p>Every request the CLI sends to the APIServer, including requests to managed clusters, originates from the local machine and targets the host of the cluster configured in the current <code>~/.kube/config</code>. If that cluster Host includes a context path, the path is dropped. For example, if the cluster Host in <code>~/.kube/config</code> is:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">https://imoe.tech/apiserver</span><br></pre></td></tr></table></figure><p>the rewritten request goes to:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">https://imoe.tech/apis/cluster.core.oam.dev/v1alpha1/clustergateways/cluster1/proxy/</span><br></pre></td></tr></table></figure><p>If your proxy routes to different APIServers by context path, pay special attention here: the request will most likely fail.</p>        </div>    </article><p>The <code>Client</code> is created in the <code>RunE()</code> function of each command's <code>Command</code>. Taking the cluster join command <code>vela cluster join</code> as an example, the call chain is:</p><p><code>NewClusterJoinCommand() -&gt; RunE() -&gt; c.GetClient() -&gt; pkgmulticluster.NewClient()</code></p><p>The logic is the same as described above, so it is not discussed in detail here.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Cluster Gateway combines the apiserver's apiserver-aggregation extension mechanism with the UpgradeAwareHandler proxy helper so that managed clusters can be reached simply by calling the APIServer.</p><p>All three KubeVela components use a custom <code>Transport</code> to rewrite the request path of the Kubernetes Client, so that requests are routed to the Cluster Gateway extension endpoint on the APIServer and from there to the managed cluster.</p><h2 id="引用"><a href="#引用" class="headerlink" title="引用"></a>References</h2><ol><li><a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/">apiserver-aggregation</a></li></ol>]]></content>
    
    
    <summary type="html">&lt;p&gt;KubeVela's multi-cluster management depends on the Cluster Gateway component, which the KubeVela Helm Chart installs automatically. KubeVela never connects to a cluster directly; it always goes through Cluster Gateway to manage it.&lt;/p&gt;
&lt;p&gt;Every feature that touches a cluster, cluster management included, is built on Cluster Gateway, which makes it an indispensable component of KubeVela's multi-cluster management.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://images.imoe.tech/blog/kubevela-cluster-gateway.png?200x200&quot; alt=&quot;KubeVela 通过 Cluster Gateway 访问集群&quot;&gt;&lt;/p&gt;</summary>
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
    <category term="KubeVela" scheme="https://blog.imoe.tech/tags/KubeVela/"/>
    
  </entry>
  
  <entry>
    <title>Choosing a Multi-Cluster Application Management Solution</title>
    <link href="https://blog.imoe.tech/2023/03/20/app-management-in-multiple-clusters/"/>
    <id>https://blog.imoe.tech/2023/03/20/app-management-in-multiple-clusters/</id>
    <published>2023-03-20T15:36:33.000Z</published>
    <updated>2025-06-24T14:42:47.154Z</updated>
    
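    <content type="html"><![CDATA[<p>Kubernetes is a container orchestration engine that automates the deployment, scaling, and management of containers. It behaves more like a system for managing and allocating resources: its job is to schedule workloads sensibly and make the most of the cluster's resources.</p><p>In practice, organizations usually run many Kubernetes clusters. Different business lines, departments, or subsidiaries each use their own cluster, and it is common for the same service to be deployed to clusters in several regions so that users are served from a nearby location. Kubernetes alone cannot deploy a service to multiple clusters at once. Strictly speaking, Kubernetes has no notion of an application in the sense used here.</p><p>Over its lifecycle an application can live in multiple clusters and environments. During development it may be deployed to a dev/test cluster, and after release it may run in both the production and disaster-recovery clusters. This is clearly a concept that goes beyond a single Kubernetes cluster.</p><p>Because Kubernetes cannot manage applications by itself, the community has produced several solutions for application management. The common ones are OCM, Karmada, and KubeVela. This article gives an overview of each and then picks one based on our requirements.</p><span id="more"></span><h2 id="Open-Cluster-Management（OCM）"><a href="#Open-Cluster-Management（OCM）" class="headerlink" title="Open Cluster Management（OCM）"></a>Open Cluster Management (OCM)</h2><p>Open Cluster Management (OCM) is a powerful, modular, and extensible platform for Kubernetes multi-cluster orchestration.</p><p>In OCM, the multi-cluster control plane is modeled as the Hub, and each cluster managed by the Hub is modeled as a Klusterlet:</p><ul><li><strong>Hub Cluster</strong>: the cluster that runs the OCM multi-cluster control plane. The hub cluster is normally a lightweight Kubernetes cluster that hosts only a few basic controllers and services.</li><li><strong>Klusterlet</strong>: a cluster managed by the hub cluster, also called a <code>managed cluster</code> or <code>spoke cluster</code>. The klusterlet actively <strong>pulls</strong> the latest prescriptions from the hub cluster and continuously reconciles the Kubernetes cluster to the desired state.</li></ul><p>This design is inspired by kubelet and the apiserver; the name klusterlet comes from kubelet.</p><p>Under this architecture the hub never calls the real clusters directly. It maintains a declarative prescription for each cluster, and the klusterlet pulls those prescriptions from the hub and carries them out.</p><p><img src="https://images.imoe.tech/blog/ocm-20230130171615.png" alt="OCM 架构示意图"></p><p>Components in the diagram above:</p><ul><li>registration handles cluster registration, cluster lifecycle management, and the registration and lifecycle management of addons;</li><li>work handles resource distribution;</li><li>placement handles scheduling of workloads across clusters.</li></ul><p>OCM provides rich multi-cluster workload scheduling policies and a reliable resource distribution engine. Managed clusters connect to the control plane themselves, so they can be managed even when the control plane cannot reach them directly. This makes OCM particularly well suited to managed clusters and control planes that live in different VPCs.</p><blockquote><p>This is a benefit of OCM's hub-agent architecture.</p></blockquote><p>OCM can also be combined with tools such as KubeVela to provide cluster management.</p><h3 id="使用-OCM-部署应用"><a href="#使用-OCM-部署应用" class="headerlink" title="使用 OCM 部署应用"></a>Deploying an Application with OCM</h3><p>OCM uses the <code>ManifestWork</code> CRD to deploy resources. For example, to deploy a <code>Deployment</code>:</p><figure class="highlight yaml"><figcaption><span>Deploying a 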
Deployment</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">work.open-cluster-management.io/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">ManifestWork</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">namespace:</span> <span class="string">&lt;target</span> <span class="string">managed</span> <span class="string">cluster&gt;</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">hello-work-demo</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">workload:</span></span><br><span class="line">    <span class="attr">manifests:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line">        <span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span 
class="line">        <span class="attr">metadata:</span></span><br><span class="line">          <span class="attr">name:</span> <span class="string">hello</span></span><br><span class="line">          <span class="attr">namespace:</span> <span class="string">default</span></span><br><span class="line">        <span class="attr">spec:</span></span><br><span class="line">          <span class="attr">selector:</span></span><br><span class="line">            <span class="attr">matchLabels:</span></span><br><span class="line">              <span class="attr">app:</span> <span class="string">hello</span></span><br><span class="line">          <span class="attr">template:</span></span><br><span class="line">            <span class="attr">metadata:</span></span><br><span class="line">              <span class="attr">labels:</span></span><br><span class="line">                <span class="attr">app:</span> <span class="string">hello</span></span><br><span class="line">            <span class="attr">spec:</span></span><br><span class="line">              <span class="attr">containers:</span></span><br><span class="line">                <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">hello</span></span><br><span class="line">                  <span class="attr">image:</span> <span class="string">quay.io/asmacdo/busybox</span></span><br><span class="line">                  <span class="attr">command:</span></span><br><span class="line">                    [ <span class="string">&quot;sh&quot;</span>, <span class="string">&quot;-c&quot;</span>, <span class="string">&#x27;echo &quot;Hello, Kubernetes!&quot; &amp;&amp; sleep 3600&#x27;</span> ]</span><br></pre></td></tr></table></figure><h3 id="OCM-访问纳管集群"><a href="#OCM-访问纳管集群" class="headerlink" title="OCM 访问纳管集群"></a>How OCM Accesses Managed Clusters</h3><p>OCM manages clusters in pull mode, but there are times when the Hub cluster needs to reach a managed cluster, and that requires an access path. OCM provides one through Cluster Proxy (Apiserver-Network-Proxy) and Cluster Gateway.</p><p>For managed clusters in a different VPC, the Hub 
may be unable to reach the managed cluster over the network even though the managed cluster can reach the Hub. Installing Cluster Proxy in OCM opens up the network path from the Hub to the managed cluster, as shown below:</p><p><img src="https://images.imoe.tech/blog/ocm-cluster-proxy.png" alt="Cluster Proxy"></p><p>When the Hub cluster needs to initiate access to a managed cluster, it does so through the Cluster Gateway component. If the Hub can reach the managed cluster, Cluster Gateway proxies over a direct connection; otherwise it goes through the tunnel, as shown below:</p><p><img src="https://images.imoe.tech/blog/ocm-cluster-gateway.png" alt="Cluster Gateway"></p><p>This is the same Cluster Gateway component that KubeVela uses; how it works is covered in the KubeVela section.</p><h2 id="Karmada"><a href="#Karmada" class="headerlink" title="Karmada"></a>Karmada</h2><p>Karmada (Kubernetes Armada) is a Kubernetes management system developed in continuation of Kubernetes Federation v1 and v2. It runs inside a Kubernetes cluster as a multi-cluster management add-on and lets you run cloud-native applications across multiple Kubernetes clusters and clouds without changing the applications.</p><p><img src="https://images.imoe.tech/blog/karmada-arch.png" alt="Karmada 架构图"></p><p>Karmada exposes a standalone API server (Karmada API Server) whose REST interface the other components use to communicate. It serves both the native Kubernetes API and Karmada's extension API. Managed clusters can be attached in Push or Pull mode.</p><p>The Karmada scheduler schedules applications across clusters.</p><p>The Karmada controller manager runs a set of controllers that watch Karmada objects and talk to the API servers of the underlying clusters to manage the full lifecycle of Kubernetes resources.</p><ul><li><strong>Cluster Controller</strong>: cluster management. It attaches Kubernetes clusters to Karmada and manages each cluster's lifecycle through a Cluster object;</li><li><strong>Policy Controller</strong>: manages the lifecycle of PropagationPolicy objects. It matches Kubernetes resources against the resourceSelector in a PropagationPolicy and creates a ResourceBinding for each match so that the application can be scheduled across clusters;</li><li><strong>Binding Controller</strong>: manages the lifecycle of ResourceBinding objects. Based on the scheduler's result, it creates a Work object for each resource scheduled to a target cluster;</li><li><strong>Execution Controller</strong>: keeps Work objects in sync with the actual resources in the member clusters.</li></ul><p>Karmada processing flow:</p><p><img src="https://images.imoe.tech/blog/karmada-workflow.png" alt="Karmada 处理流程"></p><h3 id="使用-Karmada-部署应用"><a href="#使用-Karmada-部署应用" class="headerlink" title="使用 Karmada 部署应用"></a>Deploying an Application with Karmada</h3><p>Karmada lets you deploy resources the native Kubernetes way. For example, an nginx Deployment:</p><figure class="highlight yaml"><figcaption><span>https://github.com/karmada-io/karmada/blob/master/samples/nginx/deployment.yaml</span></figcaption><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">apps/v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Deployment</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">labels:</span></span><br><span class="line">    <span class="attr">app:</span> <span class="string">nginx</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">replicas:</span> <span class="number">2</span></span><br><span class="line">  <span class="attr">selector:</span></span><br><span class="line">    <span class="attr">matchLabels:</span></span><br><span class="line">      <span class="attr">app:</span> <span class="string">nginx</span></span><br><span class="line">  <span class="attr">template:</span></span><br><span class="line">    <span class="attr">metadata:</span></span><br><span class="line">      <span class="attr">labels:</span></span><br><span class="line">        <span class="attr">app:</span> <span class="string">nginx</span></span><br><span class="line">    <span class="attr">spec:</span></span><br><span class="line">      <span 
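class="attr">containers:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">image:</span> <span class="string">nginx</span></span><br><span class="line">          <span class="attr">name:</span> <span class="string">nginx</span></span><br></pre></td></tr></table></figure><p>Note that the resources must be applied against the Karmada apiserver, and that controlling scheduling requires a separate <code>PropagationPolicy</code>.</p><h3 id="Karmada-访问纳管集群"><a href="#Karmada-访问纳管集群" class="headerlink" title="Karmada 访问纳管集群"></a>How Karmada Accesses Managed Clusters</h3><p>When Karmada manages clusters in PULL mode it can also proxy through ANP (Apiserver-Network-Proxy). Like Cluster Gateway, it additionally offers an apiserver proxy built on AA (apiserver-aggregation) that forwards API requests to the managed cluster behind it.</p><p>The path format is:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/apis/cluster.karmada.io/v1alpha1/clusters/&#123;clusterName&#125;/proxy/api/v1/pods</span><br></pre></td></tr></table></figure><p>The path has three parts:</p><ul><li>The fixed Karmada prefix: <code>/apis/cluster.karmada.io/v1alpha1/clusters/</code></li><li>The target cluster: replace <code>&#123;clusterName&#125;</code> with the name of the cluster you want</li><li>The actual Kubernetes API request: everything after <code>/proxy</code>, which is <code>/api/v1/pods</code> in the example above</li></ul><h2 id="KubeVela"><a href="#KubeVela" class="headerlink" title="KubeVela"></a>KubeVela</h2><p>KubeVela is an application delivery and management control plane for hybrid cloud, built on Kubernetes. It runs as a CRD controller, so it installs into an existing enterprise PaaS with little overhead, and it brings the standardized OAM model along with a rich set of community addons built on that model's extensibility.</p><p>At its core, KubeVela describes all the components and operational actions an application deployment needs as a single, infrastructure-agnostic "deployment plan".</p><p><img src="https://images.imoe.tech/blog/kubevela-dag.png" alt="DAG"></p><p>Every application deployment plan has four parts: components, traits, policies, and a workflow:</p><ul><li><strong>Component</strong>: a component defines an artifact to be delivered (binary, Docker image, Helm Chart, ...) or a cloud service that the application contains. One application deployment plan deploys one microservice unit;</li><li><strong>Trait</strong>: a trait is a modular, pluggable operational capability that can be attached to a component at any time, such as replica scaling (manual or automatic), data persistence, gateway policies, or automatic DNS setup;</li><li><strong>Policy</strong>: a policy defines rules applied while the application is delivered, such as per-cluster configuration overrides for multi-cluster deployment, resource placement, security group policies, firewall rules, or SLO targets;</li><li><strong>Workflow Step</strong>: 
a workflow consists of multiple steps and lets users customize how an application is delivered to a given environment. Typical workflow steps include manual approval, data passing, multi-cluster rollout, and notifications.</li></ul><p>The diagram below shows the overall structure of a multi-cluster application. All configuration (the application, its policies, and its workflow) lives in the control-plane cluster. <strong>Only resources (such as a deployment or a service) are dispatched to the managed clusters</strong>.</p><p><img src="https://images.imoe.tech/blog/kubevela-app-structure.png" alt="应用结构图"></p><h3 id="架构"><a href="#架构" class="headerlink" title="架构"></a>Architecture</h3><p>Architecturally, KubeVela has <strong>only one controller</strong> and runs on top of Kubernetes as an add-on. Delivered applications are deployed to the managed clusters, which gives Kubernetes an application-level abstraction.</p><p><img src="https://images.imoe.tech/blog/kubevela-arch.png" alt="部署架构"></p><p>When an application is delivered with KubeVela, its definition ends up stored in the Kubernetes cluster as custom resources. Inside the control loop for those resources, the Controller runs the application's Workflow through an embedded workflow engine to carry out the delivery.</p><p>KubeVela operates on managed clusters through the Cluster Gateway component and never accesses a target cluster directly. Cluster information is saved to a Secret when the cluster is registered, and Cluster Gateway uses that information when it proxies requests.</p><h3 id="使用-KubeVela-部署应用"><a href="#使用-KubeVela-部署应用" class="headerlink" title="使用 KubeVela 部署应用"></a>Deploying an Application with KubeVela</h3><p>Resources are deployed with a configuration file that follows the OAM application definition.</p><p>The example below contains one stateless service component with a trait, three policies, and three workflow steps. It means:</p><ul><li>Deploy one service to two target namespaces;</li><li>After the first target is deployed, wait for manual approval;</li><li>After approval, deploy to the second target, with 2 replicas there.</li></ul><figure class="highlight yaml"><figcaption><span>KubeVela Quick Start</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 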
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">core.oam.dev/v1beta1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Application</span></span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">first-vela-app</span></span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line">  <span class="attr">components:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">express-server</span></span><br><span class="line">      <span class="attr">type:</span> <span class="string">webservice</span></span><br><span class="line">      <span class="attr">properties:</span></span><br><span class="line">        <span class="attr">image:</span> <span class="string">oamdev/hello-world</span></span><br><span class="line">        <span class="attr">ports:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">port:</span> <span 
class="number">8000</span></span><br><span class="line">            <span class="attr">expose:</span> <span class="literal">true</span></span><br><span class="line">      <span class="attr">traits:</span></span><br><span class="line">        <span class="bullet">-</span> <span class="attr">type:</span> <span class="string">scaler</span></span><br><span class="line">          <span class="attr">properties:</span></span><br><span class="line">            <span class="attr">replicas:</span> <span class="number">1</span></span><br><span class="line">  <span class="attr">policies:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">target-default</span></span><br><span class="line">      <span class="attr">type:</span> <span class="string">topology</span></span><br><span class="line">      <span class="attr">properties:</span></span><br><span class="line">        <span class="comment"># the local cluster is the cluster KubeVela itself runs in</span></span><br><span class="line">        <span class="attr">clusters:</span> [ <span class="string">&quot;local&quot;</span> ]</span><br><span class="line">        <span class="attr">namespace:</span> <span class="string">&quot;default&quot;</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">target-prod</span></span><br><span class="line">      <span class="attr">type:</span> <span class="string">topology</span></span><br><span class="line">      <span class="attr">properties:</span></span><br><span class="line">        <span class="attr">clusters:</span> [ <span class="string">&quot;local&quot;</span> ]</span><br><span class="line">        <span class="comment"># this namespace must be created before the application is deployed</span></span><br><span class="line">        <span class="attr">namespace:</span> <span class="string">&quot;prod&quot;</span></span><br><span class="line">    <span class="bullet">-</span> <span class="attr">name:</span> <span 
class="string">deploy-ha</span></span><br><span class="line">      <span class="attr">type:</span> <span class="string">override</span></span><br><span class="line">      <span class="attr">properties:</span></span><br><span class="line">        <span class="attr">components:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="attr">type:</span> <span class="string">webservice</span></span><br><span class="line">            <span class="attr">traits:</span></span><br><span class="line">              <span class="bullet">-</span> <span class="attr">type:</span> <span class="string">scaler</span></span><br><span class="line">                <span class="attr">properties:</span></span><br><span class="line">                  <span class="attr">replicas:</span> <span class="number">2</span></span><br><span class="line">  <span class="attr">workflow:</span></span><br><span class="line">    <span class="attr">steps:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">deploy2default</span></span><br><span class="line">        <span class="attr">type:</span> <span class="string">deploy</span></span><br><span class="line">        <span class="attr">properties:</span></span><br><span class="line">          <span class="attr">policies:</span> [ <span class="string">&quot;target-default&quot;</span> ]</span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">manual-approval</span></span><br><span class="line">        <span class="attr">type:</span> <span class="string">suspend</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">name:</span> <span class="string">deploy2prod</span></span><br><span class="line">        <span class="attr">type:</span> <span class="string">deploy</span></span><br><span class="line">        <span class="attr">properties:</span></span><br><span 
class="line">          <span class="attr">policies:</span> [ <span class="string">&quot;target-prod&quot;</span>, <span class="string">&quot;deploy-ha&quot;</span> ]</span><br></pre></td></tr></table></figure><h3 id="KubeVela-访问纳管集群"><a href="#KubeVela-访问纳管集群" class="headerlink" title="KubeVela 访问纳管集群"></a>How KubeVela Accesses Managed Clusters</h3><p>KubeVela's multi-cluster support depends on the Cluster-Gateway component, which is installed together with the KubeVela Helm Chart.</p><p><img src="https://images.imoe.tech/blog/kubevela-cluster-gateway.png" alt="Cluster Gateway"></p><p>When KubeVela joins a cluster it stores the <code>kubeconfig</code> information in a Secret. It can manage native clusters directly, or integrate with OCM to manage certain special clusters in pull (PULL) mode.</p><p>Cluster-Gateway implements its proxy on top of <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/">apiserver-aggregation</a>, the apiserver's native extension mechanism.</p><p><img src="https://images.imoe.tech/blog/kubevela-cluster-gateway-workflow.png" alt="Cluster Gateway 处理流程图"></p><p>The proxy path format is:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/apis/cluster.core.oam.dev/v1alpha1/clustergateways/&#123;clusterName&#125;/proxy/&#123;api&#125;</span><br></pre></td></tr></table></figure><p>Before a request is forwarded to the managed cluster's apiserver, its credentials are replaced with those stored in the Secret that corresponds to the <strong>cluster name</strong>. A sample Secret:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> <span class="string">v1</span></span><br><span class="line"><span class="attr">kind:</span> <span class="string">Secret</span></span><br><span class="line"><span 
class="attr">metadata:</span></span><br><span class="line">  <span class="attr">name:</span> <span class="string">managed1</span></span><br><span class="line">  <span class="attr">labels:</span></span><br><span class="line">    <span class="attr">cluster.core.oam.dev/cluster-credential-type:</span> <span class="string">ServiceAccountToken</span></span><br><span class="line"><span class="attr">type:</span> <span class="string">Opaque</span> <span class="comment"># &lt;--- Has to be opaque</span></span><br><span class="line"><span class="attr">data:</span></span><br><span class="line">  <span class="attr">endpoint:</span> <span class="string">&quot;...&quot;</span> <span class="comment"># Should NOT be 127.0.0.1</span></span><br><span class="line">  <span class="attr">ca.crt:</span> <span class="string">&quot;...&quot;</span> <span class="comment"># ca cert for cluster &quot;managed1&quot;</span></span><br><span class="line">  <span class="attr">token:</span> <span class="string">&quot;...&quot;</span> <span class="comment"># working jwt token</span></span><br></pre></td></tr></table></figure><h2 id="选型思考"><a href="#选型思考" class="headerlink" title="选型思考"></a>Choosing a Solution</h2><p>All three solutions meet the functional requirement of managing applications across clusters, so the choice comes down to how applications are deployed, the deployment architecture, the cluster proxy approach, maintenance difficulty, and extensibility. The comparison:</p><table><thead><tr><th></th><th>KubeVela</th><th>Karmada</th><th>OCM</th></tr></thead><tbody><tr><td>Application deployment</td><td>Via CRD</td><td>Native</td><td>Via CRD</td></tr><tr><td>Deployment architecture</td><td>Single Controller</td><td>Custom apiserver, Scheduler, Controller, etc.</td><td>Multiple Controllers + an Agent in each managed cluster</td></tr><tr><td>Cluster proxy</td><td>Cluster Gateway</td><td>karmada-aggregated-apiserver</td><td>Cluster Gateway+Cluster Proxy</td></tr><tr><td>Maintenance difficulty</td><td>Easy</td><td>Hard</td><td>Medium</td></tr><tr><td>Extensibility</td><td>Flexible</td><td>Flexible</td><td>Flexible</td></tr><tr><td>Community activity</td><td>Active</td><td>Active</td><td>Less active</td></tr></tbody></table><p>karmada does not require a custom CRD to deploy applications, which lowers migration cost. It does, however, require karmada's customized apiserver and scheduler, which makes the architecture harder to maintain, and it is hard to keep compatible across upgrades for clusters such as OCP (OpenShift Container Platform).</p><p>OCM 
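has a much simpler deployment architecture than Karmada. It works in PULL mode: the Agent must connect to the Hub to pull deployment configuration. Some companies' network topologies do not allow this, my own company's included. Its community is also not very active at the moment, and documentation is sparse.</p><p>KubeVela goes a step further than OCM in simplifying the deployment architecture. A single Controller handles policy scheduling and delivery for applications, and managed clusters are accessed only through Cluster Gateway, so a small amount of adaptation work is enough to manage the applications in those clusters.</p><p>All things considered, KubeVela is the best fit for our current needs, and its flexible extensibility should also cover future requirements.</p><h2 id="引用"><a href="#引用" class="headerlink" title="引用"></a>References</h2><ol><li><a href="https://kubevela.io/zh/">kubevela</a></li><li><a href="https://karmada.io/zh/">karmada</a></li><li><a href="https://open-cluster-management.io/">OCM</a></li></ol>]]></content>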
    
    
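    <summary type="html">&lt;p&gt;Kubernetes is a container orchestration engine that automates the deployment, scaling, and management of containers. It is better understood as a system for managing and allocating resources: its job is to schedule workloads sensibly and make the most of cluster resources.&lt;/p&gt;
&lt;p&gt;In practice, organizations usually run many Kubernetes clusters. Different business lines, departments, or subsidiaries each use their own cluster, and it is common for one service to be deployed to clusters in several regions so that users are served from nearby. Kubernetes alone cannot deploy a service to multiple clusters at once. Strictly speaking, Kubernetes has no notion of an "application" in the sense used here.&lt;/p&gt;
&lt;p&gt;Over its whole lifecycle, an application can live in multiple clusters and environments. During development it may be deployed to a dev/test cluster; after release it may run in both production and disaster-recovery clusters. This is clearly a concept that extends beyond a single Kubernetes cluster.&lt;/p&gt;
&lt;p&gt;Because Kubernetes itself cannot manage applications, the industry has produced several solutions to fill this need. The common ones are OCM, Karmada, and KubeVela. This article gives an overview of each and then selects one based on our requirements.&lt;/p&gt;</summary>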
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="KubeVela" scheme="https://blog.imoe.tech/tags/KubeVela/"/>
    
  </entry>
  
  <entry>
    <title>The Kubernetes Informer Mechanism</title>
    <link href="https://blog.imoe.tech/2023/02/15/kubernetes-informer-mechanism/"/>
    <id>https://blog.imoe.tech/2023/02/15/kubernetes-informer-mechanism/</id>
    <published>2023-02-15T14:44:32.000Z</published>
    <updated>2023-05-16T07:18:12.970Z</updated>
    
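    <content type="html"><![CDATA[<img alt="" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p>In Kubernetes, kube-apiserver is the brain and heart of the cluster and the entry point for controlling it. Every component operates on the cluster through the HTTP REST API it exposes.</p><p>Because it is the hub for data exchange and communication among all components, having many components query the apiserver directly over HTTP puts it under heavy load. Once the apiserver misbehaves, the whole cluster is affected and may even go down.</p><p>Reducing the load on the apiserver as much as possible is therefore essential, and the Informer mechanism is Kubernetes' answer to this problem. An Informer is essentially a local caching mechanism provided by <code>client-go</code>:</p><ul><li>It keeps a near-real-time copy of Kubernetes resource data in a local cache, so applications query the local copy directly;</li><li>When a resource changes, the change is pushed over a long-lived connection to the local Informer, which updates the local cache;</li><li>After the cache is updated, local handler functions are triggered to run the relevant business logic.</li></ul><p>The Informer mechanism greatly reduces the communication load between Kubernetes components and the API Server, and it relieves query pressure on etcd as well.</p><span id="more"></span><h2 id="Informer-机制架构设计"><a href="#Informer-机制架构设计" class="headerlink" title="Informer 机制架构设计"></a>Informer Architecture</h2><p>The diagram below comes from the official document "<a href="https://github.com/kubernetes/sample-controller/blob/master/docs/controller-client-go.md">client-go under the hood</a>". It shows how the client-go components work and how they interact with a custom Controller.</p><p><img src="https://images.imoe.tech/blog/informer-workflow.png"></p><p>The components in the diagram fall into two halves: client-go components on top and custom Controller components below. The Informer mechanism refers to this entire interaction flow with the Controller.</p><h3 id="client-go-组件"><a href="#client-go-组件" class="headerlink" title="client-go 组件"></a>client-go Components</h3><ul><li><strong>Reflector</strong>: the <a href="https://github.com/kubernetes/client-go/blob/master/tools/cache/reflector.go">Reflector</a> type defined in the <code>cache</code> package. It watches Kubernetes resources for changes, and its work is done by the <code>ListAndWatch</code> function. When the Reflector receives a resource change event, it obtains the changed object and puts it into the <code>Delta FIFO</code> queue inside <code>watchHandler</code>.</li><li><strong>Delta FIFO</strong>: a FIFO queue that buffers the change events and resource objects fetched by the <code>Reflector</code>;</li><li><strong>Informer</strong>: the most important node in the flow and the bridge that ties it together. The <code>Informer</code> is also <a href="https://github.com/kubernetes/client-go/blob/master/tools/cache/controller.go">defined</a> in the <code>cache</code> package, and its work is done in the <code>processLoop</code> function. It is responsible for:<ul><li>popping objects off the Delta FIFO and updating them in the Indexer's cache;</li><li>invoking the custom 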
Controller and passing it the object.</li></ul></li><li><strong>Indexer</strong>: the <a href="https://github.com/kubernetes/client-go/blob/master/tools/cache/index.go">Indexer</a> type defined in the <code>cache</code> package. It provides indexing and local caching over resource objects. A classic use case is building an index on object Labels. The Indexer maintains its indexes through index functions and stores resource objects and their Keys in a thread-safe Data Store. By default, the <a href="https://github.com/kubernetes/client-go/blob/master/tools/cache/store.go">MetaNamespaceKeyFunc</a> function in the <code>cache</code> package generates an object's Key, in the form <code>&lt;namespace&gt;/&lt;name&gt;</code>.</li></ul><h3 id="自定义组件"><a href="#自定义组件" class="headerlink" title="自定义组件"></a>Custom Components</h3><p>In the diagram above, <code>Informer reference</code> and <code>Indexer reference</code> are the Informer and Indexer instances that a custom Controller must create itself in order to take part in the flow. Create the instances that match the resources you need. client-go provides the <code>NewIndexerInformer</code> <a href="https://github.com/kubernetes/client-go/blob/master/examples/workqueue/main.go#L174">function</a> to create Informer and Indexer instances, and you can also use the <a href="https://github.com/kubernetes/sample-controller/blob/master/main.go#L61">factory methods</a> of <code>SharedInformerFactory</code>.</p><p>Each resource maps to one Informer, and each Informer opens one long-lived connection through Watch. Creating several Informers for the same resource would clearly be wasteful, so Informers are usually created through <code>SharedInformerFactory</code>, which lets each resource type reuse a single Informer and lowers the overhead.</p><h2 id="Reflector"><a href="#Reflector" class="headerlink" title="Reflector"></a>Reflector</h2><p>The Reflector watches Kubernetes resources for changes and, when a resource changes, puts the resource object into the local Delta FIFO. Its main work is done by <code>ListAndWatch</code>, which is covered in detail below. First, let's look at how a Reflector is created.</p><h3 id="Reflector-创建"><a href="#Reflector-创建" class="headerlink" title="Reflector 创建"></a>Creating a Reflector</h3><p>A Reflector can be instantiated directly with <code>NewReflector</code> or <code>NewReflectorWithOptions</code>, but it is more commonly created through <code>SharedInformerFactory</code>. After an Informer is created through the <code>SharedInformerFactory</code> factory methods, the Reflector is created automatically <strong>when the Informer starts</strong>.</p><p><code>SharedInformerFactory</code> starts its Informers through the <code>Start</code> method, which runs the <code>Run</code> method of every Informer:</p><figure class="highlight 
go"><figcaption><span>staging/src/k8s.io/client-go/informers/factory.go:133</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *sharedInformerFactory)</span></span> Start(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) &#123;</span><br><span class="line">f.lock.Lock()</span><br><span class="line"><span class="keyword">defer</span> f.lock.Unlock()</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> f.shuttingDown &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> informerType, informer := <span class="keyword">range</span> f.informers &#123;</span><br><span class="line"><span class="keyword">if</span> !f.startedInformers[informerType] &#123;</span><br><span class="line">f.wg.Add(<span class="number">1</span>)</span><br><span class="line">informer := informer</span><br><span class="line"><span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span 
class="keyword">defer</span> f.wg.Done()</span><br><span class="line"><span class="comment">// start the informer</span></span><br><span class="line">informer.Run(stopCh)</span><br><span class="line">&#125;()</span><br><span class="line">f.startedInformers[informerType] = <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>Run</code> method of <code>sharedIndexInformer</code> (the <code>informer</code> above) constructs a Controller object, and that object's <code>Run</code> function builds a Reflector with <code>NewReflectorWithOptions</code>.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/controller.go:129</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span 
class="params">(c *controller)</span></span> Run(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) &#123;</span><br><span class="line">   <span class="keyword">defer</span> utilruntime.HandleCrash()</span><br><span class="line">   <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">      &lt;-stopCh</span><br><span class="line">      c.config.Queue.Close()</span><br><span class="line">   &#125;()</span><br><span class="line">   r := NewReflectorWithOptions(</span><br><span class="line">      c.config.ListerWatcher,</span><br><span class="line">      c.config.ObjectType,</span><br><span class="line">      c.config.Queue,</span><br><span class="line">      ReflectorOptions&#123;</span><br><span class="line">         ResyncPeriod:    c.config.FullResyncPeriod,</span><br><span class="line">         TypeDescription: c.config.ObjectDescription,</span><br><span class="line">      &#125;,</span><br><span class="line">   )</span><br><span class="line">   r.ShouldResync = c.config.ShouldResync</span><br><span class="line">   r.WatchListPageSize = c.config.WatchListPageSize</span><br><span class="line">   r.clock = c.clock</span><br><span class="line">   <span class="keyword">if</span> c.config.WatchErrorHandler != <span class="literal">nil</span> &#123;</span><br><span class="line">      r.watchErrorHandler = c.config.WatchErrorHandler</span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span class="line">   c.reflectorMutex.Lock()</span><br><span class="line">   c.reflector = r</span><br><span class="line">   c.reflectorMutex.Unlock()</span><br><span class="line"></span><br><span class="line">   <span class="keyword">var</span> wg wait.Group</span><br><span class="line"></span><br><span class="line">   <span class="comment">// start the Reflector</span></span><br><span class="line">   
wg.StartWithChannel(stopCh, r.Run)</span><br><span class="line"></span><br><span class="line">   wait.Until(c.processLoop, time.Second, stopCh)</span><br><span class="line">   wg.Wait()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>After creating the Reflector, the code above also calls its <code>Run</code> method to start the Reflector's processing flow.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/reflector.go:272</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Run repeatedly uses the reflector&#x27;s ListAndWatch to fetch all the objects and subsequent deltas.</span></span><br><span class="line"><span class="comment">// Run will exit when stopCh is closed.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Reflector)</span></span> Run(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) &#123;</span><br><span class="line">   klog.V(<span class="number">3</span>).Infof(<span class="string">&quot;Starting reflector %s (%s) from %s&quot;</span>, r.typeDescription, r.resyncPeriod, r.name)</span><br><span class="line">   wait.BackoffUntil(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">      <span class="keyword">if</span> err := r.ListAndWatch(stopCh); err != <span class="literal">nil</span> &#123;</span><br><span class="line">         r.watchErrorHandler(r, err)</span><br><span class="line">      &#125;</span><br><span class="line">   
&#125;, r.backoffManager, <span class="literal">true</span>, stopCh)</span><br><span class="line">   klog.V(<span class="number">3</span>).Infof(<span class="string">&quot;Stopping reflector %s (%s) from %s&quot;</span>, r.typeDescription, r.resyncPeriod, r.name)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The core logic lives in <code>ListAndWatch</code>, which does three things:</p><ul><li>pulls the full data set and initializes the Delta FIFO;</li><li>starts the <a href="#Resync-%E6%9C%BA%E5%88%B6">Resync mechanism</a>;</li><li>watches for resource changes.</li></ul><p>The Resync mechanism is covered later in the Delta FIFO section. Here we look at the other two parts.</p><h3 id="拉取数据"><a href="#拉取数据" class="headerlink" title="拉取数据"></a>Pulling Data</h3><p>On startup, the first pull fetches the full data set. The complete implementation is in the function invoked by <code>r.list(stopCh)</code>, whose main steps are:</p><ul><li><code>r.listerWatcher.List(opts)</code>: pulls the full data set</li><li><code>meta.ExtractList(list)</code>: converts the <code>runtime.Object</code> into a <code>[]runtime.Object</code> list</li><li><code>r.syncWith(items, resourceVersion)</code>: syncs the data into the Delta FIFO queue</li><li><code>r.setLastSyncResourceVersion(resourceVersion)</code>: refreshes the resource version</li></ul><p>To avoid putting too much pressure on the server, the full pull first tries to fetch the data in pages (chunks):</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/reflector.go:420</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="keyword">defer</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="keyword">if</span> r := <span class="built_in">recover</span>(); r != <span class="literal">nil</span> &#123;</span><br><span class="line">panicCh &lt;- r</span><br><span class="line">&#125;</span><br><span class="line">&#125;()</span><br><span class="line"><span class="comment">// Attempt to gather list in chunks, if supported by listerWatcher, if not, the first</span></span><br><span class="line"><span class="comment">// list request will return the full response.</span></span><br><span class="line">pager := pager.New(pager.SimplePageFunc(<span class="function"><span class="keyword">func</span><span class="params">(opts metav1.ListOptions)</span></span> (runtime.Object, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">return</span> r.listerWatcher.List(opts)</span><br><span class="line">&#125;))</span><br><span class="line"><span class="keyword">switch</span> &#123;</span><br><span class="line"><span class="keyword">case</span> r.WatchListPageSize != <span class="number">0</span>:</span><br><span class="line">pager.PageSize = r.WatchListPageSize</span><br><span class="line"><span class="keyword">case</span> r.paginatedResult:</span><br><span class="line"><span class="keyword">case</span> options.ResourceVersion != <span class="string">&quot;&quot;</span> &amp;&amp; options.ResourceVersion != <span class="string">&quot;0&quot;</span>:</span><br><span class="line">pager.PageSize = <span class="number">0</span></span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line">list, paginatedResult, err = pager.List(context.Background(), options)</span><br><span class="line"><span class="keyword">if</span> isExpiredError(err) || isTooLargeResourceVersionError(err) &#123;</span><br><span class="line">r.setIsLastSyncResourceVersionUnavailable(<span class="literal">true</span>)</span><br><span class="line">list, paginatedResult, err = pager.List(context.Background(), metav1.ListOptions&#123;ResourceVersion: r.relistResourceVersion()&#125;)</span><br><span class="line">&#125;</span><br><span class="line"><span class="built_in">close</span>(listCh)</span><br><span class="line">&#125;()</span><br></pre></td></tr></table></figure><p>Ultimately the data is pulled by <code>r.listerWatcher.List(opts)</code>, which fetches all objects of the given resource based on <code>ResourceVersion</code>. For <code>Pod</code>, for example, the call ends up in the Pod Informer's <code>ListFunc</code>, which sends an API request to Kubernetes through client-go to get the resource data.</p><h3 id="监控资源变更"><a href="#监控资源变更" class="headerlink" title="监控资源变更"></a>Watching for Resource Changes</h3><p>The last part of <code>ListAndWatch</code> watches for resource changes. It works by holding a long-lived HTTP connection to the API Server, built on HTTP chunked transfer encoding:</p><ul><li>When client-go sends a request to the API Server with the watch parameter, the API Server sets <code>Transfer-Encoding: chunked</code> in the response headers to indicate chunked transfer encoding (<a href="https://github.com/kubernetes/kubernetes/blob/c9ed04762f94a319d7b1fb718dc345491a32bea6/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/watch.go#L164">reference</a>: <code>staging/src/k8s.io/apiserver/pkg/endpoints/handlers/watch.go:164</code>);</li><li>client-go creates a <code>StreamWatcher</code>, which sets up a channel that listens for new data and passes it back (<a href="https://github.com/kubernetes/kubernetes/blob/c9ed04762f94a319d7b1fb718dc345491a32bea6/staging/src/k8s.io/client-go/rest/request.go#L765">reference</a>: <code>staging/src/k8s.io/client-go/rest/request.go:765</code>).</li></ul><p>In the Reflector, the watch channel is created by calling <code>r.listerWatcher.Watch(options)</code>, which is ultimately implemented by the Informer's <code>WatchFunc</code>. The Pod Informer, for example, implements it like this:</p><figure class="highlight 
go"><figcaption><span>staging/src/k8s.io/client-go/informers/core/v1/pod.go</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewFilteredPodInformer</span><span class="params">(...)</span></span> cache.SharedIndexInformer &#123;</span><br><span class="line"><span class="keyword">return</span> cache.NewSharedIndexInformer(</span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line">WatchFunc: <span class="function"><span class="keyword">func</span><span class="params">(options metav1.ListOptions)</span></span> (watch.Interface, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">if</span> tweakListOptions != <span class="literal">nil</span> &#123;</span><br><span class="line">tweakListOptions(&amp;options)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> client.CoreV1().Pods(namespace).Watch(context.TODO(), options)</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line">)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>As you can see, the actual work is still done by methods that client-go wraps. As mentioned above, client-go creates and returns a <code>StreamWatcher</code> object, and data can then be read from the <code>w.ResultChan()</code> channel.</p><p>Once the <code>StreamWatcher</code> is obtained, <code>watchHandler</code> is called to enter the watch handling logic. When data arrives through the <code>w.ResultChan()</code> channel, each event type is dispatched to the matching method of the Delta 
FIFO, which updates the cache.</p><p>New data carries a new <code>resourceVersion</code>. After the event for that data has been handled, <code>setLastSyncResourceVersion(resourceVersion)</code> updates the data version of the current <code>Watch</code>. If the long-lived <code>watch</code> connection is broken, for example by a network problem, the <code>watch</code> is re-established from the locally recorded <code>resourceVersion</code>.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/reflector.go:405</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Reflector)</span></span> watch(w watch.Interface, stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;, resyncerrc <span class="keyword">chan</span> <span class="type">error</span>) <span class="type">error</span> &#123;</span><br><span class="line"><span class="keyword">var</span> err <span class="type">error</span></span><br><span class="line">retry := NewRetryWithDeadline(r.MaxInternalErrorRetryDuration, 
time.Minute, apierrors.IsInternalError, r.clock)</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> &#123;</span><br><span class="line"><span class="comment">// give the stopCh a chance to stop the loop, even in case of continue statements further down on errors</span></span><br><span class="line"><span class="keyword">select</span> &#123;</span><br><span class="line"><span class="keyword">case</span> &lt;-stopCh:</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line"><span class="keyword">default</span>:</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// start the clock before sending the request, since some proxies won&#x27;t flush headers until after the first watch event is sent</span></span><br><span class="line">start := r.clock.Now()</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> w == <span class="literal">nil</span> &#123;</span><br><span class="line">timeoutSeconds := <span class="type">int64</span>(minWatchTimeout.Seconds() * (rand.Float64() + <span class="number">1.0</span>))</span><br><span class="line">options := metav1.ListOptions&#123;</span><br><span class="line">ResourceVersion: r.LastSyncResourceVersion(),</span><br><span class="line"><span class="comment">// We want to avoid situations of hanging watchers. 
Stop any watchers that do not</span></span><br><span class="line"><span class="comment">// receive any events within the timeout window.</span></span><br><span class="line">TimeoutSeconds: &amp;timeoutSeconds,</span><br><span class="line"><span class="comment">// To reduce load on kube-apiserver on watch restarts, you may enable watch bookmarks.</span></span><br><span class="line"><span class="comment">// Reflector doesn&#x27;t assume bookmarks are returned at all (if the server do not support</span></span><br><span class="line"><span class="comment">// watch bookmarks, it will ignore this field).</span></span><br><span class="line">AllowWatchBookmarks: <span class="literal">true</span>,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">w, err = r.listerWatcher.Watch(options)</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>In <code>watchHandler()</code>, the snippet that updates the synced <code>resourceVersion</code> looks like this:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/reflector.go:741</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">resourceVersion := meta.GetResourceVersion()</span><br><span class="line"><span class="keyword">switch</span> event.Type &#123;</span><br><span class="line"><span class="keyword">case</span> watch.Added:</span><br><span class="line">err := store.Add(event.Object)</span><br><span class="line"><span 
class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;%s: unable to add watch event object (%#v) to store: %v&quot;</span>, name, event.Object, err))</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// ... omitted</span></span><br><span class="line">&#125;</span><br><span class="line">setLastSyncResourceVersion(resourceVersion)</span><br><span class="line"><span class="keyword">if</span> rvu, ok := store.(ResourceVersionUpdater); ok &#123;</span><br><span class="line">rvu.UpdateResourceVersion(resourceVersion)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="Informer"><a href="#Informer" class="headerlink" title="Informer"></a>Informer</h2><p>The Informer is the most important node in the flow and the bridge that ties it together, which is why the whole mechanism is commonly named after it.</p><p>For how to use an Informer, see the official custom Controller <a href="https://github.com/kubernetes/sample-controller/blob/master/main.go#L61">example</a>. The Informer-related operations are extracted here:</p><figure class="highlight go"><figcaption><span>Informer Example</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">clientset, err := kubernetes.NewForConfig(config)</span><br><span class="line">stopCh := <span class="built_in">make</span>(<span class="keyword">chan</span> 
<span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line"><span class="keyword">defer</span> <span class="built_in">close</span>(stopCh)</span><br><span class="line"></span><br><span class="line">informerFactory := informers.NewSharedInformerFactory(clientset, time.Minute)</span><br><span class="line"></span><br><span class="line">informer := informerFactory.Core().V1().Pods().Informer()</span><br><span class="line">informer.AddEventHandler(cache.ResourceEventHandlerFuncs&#123;</span><br><span class="line">   AddFunc: <span class="function"><span class="keyword">func</span><span class="params">(obj <span class="keyword">interface</span>&#123;&#125;)</span></span> &#123;</span><br><span class="line"></span><br><span class="line">   &#125;,</span><br><span class="line">   UpdateFunc: <span class="function"><span class="keyword">func</span><span class="params">(oldObj, newObj <span class="keyword">interface</span>&#123;&#125;)</span></span> &#123;</span><br><span class="line"></span><br><span class="line">   &#125;,</span><br><span class="line">   DeleteFunc: <span class="function"><span class="keyword">func</span><span class="params">(obj <span class="keyword">interface</span>&#123;&#125;)</span></span> &#123;</span><br><span class="line"></span><br><span class="line">   &#125;,</span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line">informerFactory.Start(stopCh)</span><br></pre></td></tr></table></figure><h3 id="资源的-Informer"><a href="#资源的-Informer" class="headerlink" title="资源的 Informer"></a>Per-Resource Informers</h3><p>As the example above shows, the Informer for Pod objects is created with a call like <code>Core().V1().Pods().Informer()</code>, which returns an instance of the <code>cache.SharedIndexInformer</code> interface.</p><p>The core logic for the Pod resource is concentrated in the constructor of that <code>SharedIndexInformer</code> instance:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/informers/core/v1/pod.go:58</span></figcaption><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewFilteredPodInformer</span><span class="params">(client kubernetes.Interface, namespace <span class="type">string</span>, resyncPeriod time.Duration, indexers cache.Indexers, tweakListOptions internalinterfaces.TweakListOptionsFunc)</span></span> cache.SharedIndexInformer &#123;</span><br><span class="line"><span class="keyword">return</span> cache.NewSharedIndexInformer(</span><br><span class="line">&amp;cache.ListWatch&#123;</span><br><span class="line">ListFunc: <span class="function"><span class="keyword">func</span><span class="params">(options metav1.ListOptions)</span></span> (runtime.Object, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">if</span> tweakListOptions != <span class="literal">nil</span> &#123;</span><br><span class="line">tweakListOptions(&amp;options)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> client.CoreV1().Pods(namespace).List(context.TODO(), options)</span><br><span class="line">&#125;,</span><br><span class="line">WatchFunc: <span class="function"><span class="keyword">func</span><span class="params">(options metav1.ListOptions)</span></span> (watch.Interface, <span 
class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">if</span> tweakListOptions != <span class="literal">nil</span> &#123;</span><br><span class="line">tweakListOptions(&amp;options)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> client.CoreV1().Pods(namespace).Watch(context.TODO(), options)</span><br><span class="line">&#125;,</span><br><span class="line">&#125;,</span><br><span class="line">&amp;corev1.Pod&#123;&#125;,</span><br><span class="line">resyncPeriod,</span><br><span class="line">indexers,</span><br><span class="line">)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>SharedIndexInformer</code> is built with the generic factory function <code>cache.NewSharedIndexInformer</code>; the only parts a caller can customize are:</p><ul><li>how to list the data and how to watch for resource changes (<code>ListFunc</code> and <code>WatchFunc</code> are handed to the Reflector to call)</li><li>the resource object type the Informer handles</li><li>the Indexers and the resync period</li></ul><p>Thanks to this clean encapsulation, client-go uses <code>informer-gen</code> to generate the Informer code for all Kubernetes resources.</p><h3 id="Informer-共享机制"><a href="#Informer-共享机制" class="headerlink" title="Informer 共享机制"></a>Informer sharing mechanism</h3><p>As mentioned in the section on creating the Reflector, controllers usually create Informers through <code>SharedInformerFactory</code>. Within a <code>SharedInformerFactory</code>, resources of the same type share a single Informer, and therefore a single Reflector: when an Informer is requested, the factory first looks in its internal cache, creates a new Informer only if none exists for that type, and otherwise reuses the cached one.</p><p><code>SharedInformerFactory</code> caches Informers in the following data structure:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/informers/factory.go</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> sharedInformerFactory <span class="keyword">struct</span> 
&#123;</span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line"></span><br><span class="line">informers <span class="keyword">map</span>[reflect.Type]cache.SharedIndexInformer</span><br><span class="line"><span class="comment">// startedInformers is used for tracking which informers have been started.</span></span><br><span class="line"><span class="comment">// This allows Start() to be called multiple times safely.</span></span><br><span class="line">startedInformers <span class="keyword">map</span>[reflect.Type]<span class="type">bool</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When you call <code>informerFactory.Core().V1().Pods().Informer()</code> to obtain an Informer instance, the call ultimately lands on a method of the <code>podInformer</code> struct:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/informers/core/v1/pod.go:84</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *podInformer)</span></span> Informer() cache.SharedIndexInformer &#123;</span><br><span class="line">   <span class="keyword">return</span> f.factory.InformerFor(&amp;corev1.Pod&#123;&#125;, f.defaultInformer)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *podInformer)</span></span> defaultInformer(client kubernetes.Interface, resyncPeriod time.Duration) cache.SharedIndexInformer &#123;</span><br><span class="line">   <span class="keyword">return</span> NewFilteredPodInformer(client, f.namespace, resyncPeriod, cache.Indexers&#123;cache.NamespaceIndex: 
cache.MetaNamespaceIndexFunc&#125;, f.tweakListOptions)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>As shown, <code>Informer()</code> calls the <code>InformerFor</code> method of <code>SharedInformerFactory</code> to obtain a <code>SharedIndexInformer</code> instance; when a new instance has to be built, the construction is done by <code>NewFilteredPodInformer</code>, invoked through <code>f.defaultInformer</code>.</p><p>Let's look at the <code>InformerFor</code> method in detail:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *sharedInformerFactory)</span></span> InformerFor(obj runtime.Object, newFunc internalinterfaces.NewInformerFunc) cache.SharedIndexInformer &#123;</span><br><span class="line">f.lock.Lock()</span><br><span class="line"><span class="keyword">defer</span> f.lock.Unlock()</span><br><span class="line"></span><br><span class="line"><span class="comment">// Check whether an Informer for this type already exists</span></span><br><span class="line">informerType := reflect.TypeOf(obj)</span><br><span class="line">informer, exists := f.informers[informerType]</span><br><span class="line"><span class="keyword">if</span> exists &#123;</span><br><span class="line"><span class="keyword">return</span> 
informer</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Use the custom resync period for this type if one is configured</span></span><br><span class="line">resyncPeriod, exists := f.customResync[informerType]</span><br><span class="line"><span class="keyword">if</span> !exists &#123;</span><br><span class="line">resyncPeriod = f.defaultResync</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Build a new Informer and cache it</span></span><br><span class="line">informer = newFunc(f.client, resyncPeriod)</span><br><span class="line">f.informers[informerType] = informer</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> informer</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The implementation of <code>InformerFor</code> is simple; it is much like the singleton factory methods we often write, with one cached instance per resource type.</p><h2 id="DeltaFIFO"><a href="#DeltaFIFO" class="headerlink" title="DeltaFIFO"></a>DeltaFIFO</h2><p>DeltaFIFO has been mentioned several times already. It is a buffering queue that holds the data pulled in by the Reflector; each object is converted into a change-event object as it is stored in the DeltaFIFO.</p><p>The <code>DeltaFIFO</code> data structure:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/delta_fifo.go:97</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="keyword">type</span> DeltaFIFO <span class="keyword">struct</span> &#123;</span><br><span class="line"><span class="comment">// lock/cond protects access to &#x27;items&#x27; and &#x27;queue&#x27;.</span></span><br><span class="line">lock sync.RWMutex</span><br><span class="line">cond sync.Cond</span><br><span class="line"></span><br><span class="line"><span class="comment">// `items` maps a key to a Deltas.</span></span><br><span class="line"><span class="comment">// Each such Deltas has at least one Delta.</span></span><br><span class="line">items <span class="keyword">map</span>[<span class="type">string</span>]Deltas</span><br><span class="line"></span><br><span class="line"><span class="comment">// `queue` maintains FIFO order of keys for consumption in Pop().</span></span><br><span class="line"><span class="comment">// There are no duplicates in `queue`.</span></span><br><span class="line"><span class="comment">// A key is in `queue` if and only if it is in `items`.</span></span><br><span class="line">queue []<span class="type">string</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// knownObjects list keys that are &quot;known&quot; --- affecting Delete(),</span></span><br><span class="line"><span class="comment">// Replace(), and Resync()</span></span><br><span class="line">knownObjects KeyListerGetter</span><br><span class="line"></span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>DeltaFIFO stores the operations performed on resource objects, such as <code>Added</code>, <code>Updated</code>, <code>Deleted</code>, and <code>Sync</code>. Each such record is called a <code>Delta</code>, defined as follows:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/delta_fifo.go</span></figcaption><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// DeltaType is the type of a change (addition, deletion, etc)</span></span><br><span class="line"><span class="keyword">type</span> DeltaType <span class="type">string</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Delta is the type stored by a DeltaFIFO. It tells you what change</span></span><br><span class="line"><span class="comment">// happened, and the object&#x27;s state after* that change.</span></span><br><span class="line"><span class="keyword">type</span> Delta <span class="keyword">struct</span> &#123;</span><br><span class="line">Type   DeltaType</span><br><span class="line">Object <span class="keyword">interface</span>&#123;&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Deltas is a list of one or more &#x27;Delta&#x27;s to an individual object.</span></span><br><span class="line"><span class="comment">// The oldest delta is at index 0, the newest delta is the last one.</span></span><br><span class="line"><span class="keyword">type</span> Deltas []Delta</span><br></pre></td></tr></table></figure><p>In the DeltaFIFO struct:</p><ul><li>the <code>queue</code> field stores resource keys, each computed by the <code>KeyOf</code> function;</li><li>the <code>items</code> field maps each key to its <code>Deltas</code> slice, which holds the actual resource events.</li></ul><p>The storage layout is illustrated below:</p><p><img src="https://images.imoe.tech/blog/DeltaFIFO-structure.png" alt="DeltaFIFO 结构"></p><h3 id="生产者逻辑"><a href="#生产者逻辑" class="headerlink" title="生产者逻辑"></a>Producer logic</h3><p>The Reflector adds data to the DeltaFIFO queue by calling 
methods such as <code>Add</code>, <code>Update</code>, and <code>Replace</code>. The Reflector's <code>watchHandler</code> method contains the following code:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/reflector.go:575</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">switch</span> event.Type &#123;</span><br><span class="line"><span class="keyword">case</span> watch.Added:</span><br><span class="line">err := store.Add(event.Object)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;%s: unable to add watch event object (%#v) to store: %v&quot;</span>, name, event.Object, err))</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> watch.Modified:</span><br><span class="line">err := store.Update(event.Object)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;%s: unable to update watch 
event object (%#v) to store: %v&quot;</span>, name, event.Object, err))</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> watch.Deleted:</span><br><span class="line"><span class="comment">// <span class="doctag">TODO:</span> Will any consumers need access to the &quot;last known</span></span><br><span class="line"><span class="comment">// state&quot;, which is passed in event.Object? If so, may need</span></span><br><span class="line"><span class="comment">// to change this.</span></span><br><span class="line">err := store.Delete(event.Object)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;%s: unable to delete watch event object (%#v) from store: %v&quot;</span>, name, event.Object, err))</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> watch.Bookmark:</span><br><span class="line"><span class="comment">// A `Bookmark` means watch has synced here, just update the resourceVersion</span></span><br><span class="line"><span class="keyword">default</span>:</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;%s: unable to understand watch event %#v&quot;</span>, name, event))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This code calls a different DeltaFIFO method depending on the type of change observed; all of these methods end up calling <code>queueActionLocked</code> to enqueue the change.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/delta_fifo.go:413</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *DeltaFIFO)</span></span> queueActionLocked(actionType DeltaType, obj <span class="keyword">interface</span>&#123;&#125;) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// Compute the key of obj</span></span><br><span class="line">id, err := f.KeyOf(obj)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> KeyError&#123;obj, err&#125;</span><br><span class="line">&#125;</span><br><span class="line">oldDeltas := f.items[id]</span><br><span class="line">newDeltas := <span class="built_in">append</span>(oldDeltas, Delta&#123;actionType, obj&#125;)</span><br><span class="line"><span class="comment">// Deduplicate</span></span><br><span class="line">newDeltas = dedupDeltas(newDeltas)</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(newDeltas) &gt; <span class="number">0</span> &#123;</span><br><span class="line"><span class="keyword">if</span> _, exists := f.items[id]; !exists 
&#123;</span><br><span class="line">f.queue = <span class="built_in">append</span>(f.queue, id)</span><br><span class="line">&#125;</span><br><span class="line">f.items[id] = newDeltas</span><br><span class="line"><span class="comment">// Broadcast to wake up waiting consumers</span></span><br><span class="line">f.cond.Broadcast()</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line"><span class="comment">// Unreachable in practice, as the upstream comment below explains</span></span><br><span class="line"><span class="comment">// This never happens, because dedupDeltas never returns an empty list</span></span><br><span class="line"><span class="comment">// when given a non-empty list (as it is here).</span></span><br><span class="line"><span class="comment">// If somehow it happens anyway, deal with it but complain.</span></span><br><span class="line"><span class="keyword">if</span> oldDeltas == <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Impossible dedupDeltas for id=%q: oldDeltas=%#+v, obj=%#+v; ignoring&quot;</span>, id, oldDeltas, obj)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Impossible dedupDeltas for id=%q: oldDeltas=%#+v, obj=%#+v; breaking invariant by storing empty Deltas&quot;</span>, id, oldDeltas, obj)</span><br><span class="line">f.items[id] = newDeltas</span><br><span class="line"><span class="keyword">return</span> fmt.Errorf(<span class="string">&quot;Impossible dedupDeltas for id=%q: oldDeltas=%#+v, obj=%#+v; broke DeltaFIFO invariant by storing empty Deltas&quot;</span>, id, oldDeltas, obj)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>After a successful enqueue, 
<code>f.cond.Broadcast()</code> wakes up the consumers waiting on the queue.</p><h3 id="消费者逻辑"><a href="#消费者逻辑" class="headerlink" title="消费者逻辑"></a>Consumer logic</h3><p>Consumers drain the queue through the <code>Pop</code> method. In the <code>controller</code> started by the Informer, <code>wait.Until</code> repeatedly runs <code>c.processLoop</code> until <code>stopCh</code> is closed, and that loop consumes the data in the DeltaFIFO:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/controller.go:129</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *controller)</span></span> Run(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) &#123;</span><br><span class="line"><span class="keyword">defer</span> utilruntime.HandleCrash()</span><br><span class="line"><span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">&lt;-stopCh</span><br><span class="line">c.config.Queue.Close()</span><br><span class="line">&#125;()</span><br><span class="line"></span><br><span class="line"><span class="comment">// Reflector startup code is omitted here</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Run the DeltaFIFO consumption loop (blocks until stopCh is closed)</span></span><br><span class="line">wait.Until(c.processLoop, time.Second, stopCh)</span><br><span class="line">wg.Wait()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>processLoop</code> consumes the data in the DeltaFIFO queue in an infinite loop:</p><figure class="highlight 
go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/controller.go:186</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *controller)</span></span> processLoop() &#123;</span><br><span class="line"><span class="keyword">for</span> &#123;</span><br><span class="line">obj, err := c.config.Queue.Pop(PopProcessFunc(c.config.Process))</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> err == ErrFIFOClosed &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> c.config.RetryOnError &#123;</span><br><span class="line"><span class="comment">// This is the safe way to re-enqueue.</span></span><br><span class="line">c.config.Queue.AddIfNotPresent(obj)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>Pop</code> takes a handler of type <code>PopProcessFunc</code>, which is called to process an item once it has been dequeued. Here <code>c.config.Process</code> is the <code>s.HandleDeltas</code> method defined on the Informer, passed in when the <code>controller</code> is constructed.</p><p><code>HandleDeltas</code> delegates to <code>processDeltas</code>, which does two things:</p><ul><li>updates the Indexer cache;</li><li>invokes the registered handlers 
to process the event.</li></ul><p>First, let's see how the DeltaFIFO dequeue method <code>Pop</code> is implemented:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *DeltaFIFO)</span></span> Pop(process PopProcessFunc) (<span class="keyword">interface</span>&#123;&#125;, <span class="type">error</span>) &#123;</span><br><span 
class="line">f.lock.Lock()</span><br><span class="line"><span class="keyword">defer</span> f.lock.Unlock()</span><br><span class="line"><span class="keyword">for</span> &#123;</span><br><span class="line"><span class="keyword">for</span> <span class="built_in">len</span>(f.queue) == <span class="number">0</span> &#123;</span><br><span class="line"><span class="comment">// When the queue is empty, invocation of Pop() is blocked until new item is enqueued.</span></span><br><span class="line"><span class="comment">// When Close() is called, the f.closed is set and the condition is broadcasted.</span></span><br><span class="line"><span class="comment">// Which causes this loop to continue and return from the Pop().</span></span><br><span class="line"><span class="keyword">if</span> f.closed &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, ErrFIFOClosed</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Block and wait while the queue is empty</span></span><br><span class="line">f.cond.Wait()</span><br><span class="line">&#125;</span><br><span class="line">isInInitialList := !f.hasSynced_locked()</span><br><span class="line">id := f.queue[<span class="number">0</span>]</span><br><span class="line">f.queue = f.queue[<span class="number">1</span>:]</span><br><span class="line">depth := <span class="built_in">len</span>(f.queue)</span><br><span class="line"><span class="keyword">if</span> f.initialPopulationCount &gt; <span class="number">0</span> &#123;</span><br><span class="line">f.initialPopulationCount--</span><br><span class="line">&#125;</span><br><span class="line">item, ok := f.items[id]</span><br><span class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line"><span class="comment">// This should never happen</span></span><br><span class="line">klog.Errorf(<span class="string">&quot;Inconceivable! 
%q was in f.queue but not f.items; ignoring.&quot;</span>, id)</span><br><span class="line"><span class="keyword">continue</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Remove the cached deltas once the key has been dequeued</span></span><br><span class="line"><span class="built_in">delete</span>(f.items, id)</span><br><span class="line"><span class="comment">// Trace slow processing when the queue is deep</span></span><br><span class="line"><span class="keyword">if</span> depth &gt; <span class="number">10</span> &#123;</span><br><span class="line">trace := utiltrace.New(<span class="string">&quot;DeltaFIFO Pop Process&quot;</span>,</span><br><span class="line">utiltrace.Field&#123;Key: <span class="string">&quot;ID&quot;</span>, Value: id&#125;,</span><br><span class="line">utiltrace.Field&#123;Key: <span class="string">&quot;Depth&quot;</span>, Value: depth&#125;,</span><br><span class="line">utiltrace.Field&#123;Key: <span class="string">&quot;Reason&quot;</span>, Value: <span class="string">&quot;slow event handlers blocking the queue&quot;</span>&#125;)</span><br><span class="line"><span class="keyword">defer</span> trace.LogIfLong(<span class="number">100</span> * time.Millisecond)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Invoke the processing function</span></span><br><span class="line">err := process(item, isInInitialList)</span><br><span class="line"><span class="keyword">if</span> e, ok := err.(ErrRequeue); ok &#123;</span><br><span class="line">f.addIfNotPresent(id, item)</span><br><span class="line">err = e.Err</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Don&#x27;t need to copyDeltas here, because we&#x27;re transferring</span></span><br><span class="line"><span class="comment">// ownership to the caller.</span></span><br><span class="line"><span class="keyword">return</span> item, err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>When the DeltaFIFO 
queue is not empty, <code>Pop</code> takes the first element of <code>f.queue</code> and calls the consumer function to process it; if that function returns an <code>ErrRequeue</code> error, the item is put back into the queue.</p><p>After <code>Pop</code> returns, <code>processLoop</code> checks the error again: if <code>c.config.RetryOnError</code> is enabled, the item is also re-enqueued for a retry.</p><p><code>HandleDeltas</code> itself only type-checks <code>obj</code>; the processing logic lives in <code>processDeltas</code>:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:635</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *sharedIndexInformer)</span></span> HandleDeltas(obj <span class="keyword">interface</span>&#123;&#125;, isInInitialList <span class="type">bool</span>) <span class="type">error</span> &#123;</span><br><span class="line">s.blockDeltas.Lock()</span><br><span class="line"><span class="keyword">defer</span> s.blockDeltas.Unlock()</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> deltas, ok := obj.(Deltas); ok &#123;</span><br><span class="line"><span class="keyword">return</span> processDeltas(s, s.indexer, s.transform, deltas, isInInitialList)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> errors.New(<span class="string">&quot;object given as Process argument is not Deltas&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Before processing, the <code>blockDeltas</code> lock is acquired. This lock makes it possible to pause event distribution while a new event handler is being registered, which avoids concurrency problems; methods such as <code>AddEventHandler</code> take the same lock when registering a handler.</p><p><code>processDeltas</code> iterates over <code>deltas</code>, updating the cache and firing events:</p><figure class="highlight 
go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/controller.go:447</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">processDeltas</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">// Object which receives event notifications from the given deltas</span></span></span><br><span class="line"><span class="params"><span class="function">handler ResourceEventHandler,</span></span></span><br><span class="line"><span 
class="params"><span class="function">clientState Store,</span></span></span><br><span class="line"><span class="params"><span class="function">transformer TransformFunc,</span></span></span><br><span class="line"><span class="params"><span class="function">deltas Deltas,</span></span></span><br><span class="line"><span class="params"><span class="function">isInInitialList <span class="type">bool</span>,</span></span></span><br><span class="line"><span class="params"><span class="function">)</span></span> <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// from oldest to newest</span></span><br><span class="line"><span class="keyword">for</span> _, d := <span class="keyword">range</span> deltas &#123;</span><br><span class="line">obj := d.Object</span><br><span class="line"><span class="keyword">if</span> transformer != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">var</span> err <span class="type">error</span></span><br><span class="line">obj, err = transformer(obj)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">switch</span> d.Type &#123;</span><br><span class="line"><span class="keyword">case</span> Sync, Replaced, Added, Updated:</span><br><span class="line"><span class="keyword">if</span> old, exists, err := clientState.Get(obj); err == <span class="literal">nil</span> &amp;&amp; exists &#123;</span><br><span class="line"><span class="comment">// Update the Indexer</span></span><br><span class="line"><span class="keyword">if</span> err := clientState.Update(obj); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span 
class="line">&#125;</span><br><span class="line"><span class="comment">// Invoke the event handler's OnUpdate method</span></span><br><span class="line">handler.OnUpdate(old, obj)</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line"><span class="keyword">if</span> err := clientState.Add(obj); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">handler.OnAdd(obj, isInInitialList)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> Deleted:</span><br><span class="line"><span class="keyword">if</span> err := clientState.Delete(obj); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">handler.OnDelete(obj)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>handler</code> in this code is the <code>Informer</code> object itself (a <code>sharedIndexInformer</code>). It also implements the <code>ResourceEventHandler</code> interface and delegates each call to its internal <code>processor</code>:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:646</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *sharedIndexInformer)</span></span> OnAdd(obj <span class="keyword">interface</span>&#123;&#125;, isInInitialList <span class="type">bool</span>) 
&#123;</span><br><span class="line"><span class="comment">// Invocation of this function is locked under s.blockDeltas, so it is</span></span><br><span class="line"><span class="comment">// save to distribute the notification</span></span><br><span class="line">s.cacheMutationDetector.AddObject(obj)</span><br><span class="line">s.processor.distribute(addNotification&#123;newObj: obj, isInInitialList: isInInitialList&#125;, <span class="literal">false</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="Processor-逻辑"><a href="#Processor-逻辑" class="headerlink" title="Processor logic"></a>Processor logic</h4><p>As the code above shows, the Informer responds to an event by delegating it to <code>processor</code>. The <code>processor</code> keeps track of the Event Handlers previously registered with the Informer, and <code>distribute</code> forwards each event to them:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:776</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *sharedProcessor)</span></span> distribute(obj <span class="keyword">interface</span>&#123;&#125;, sync <span class="type">bool</span>) &#123;</span><br><span class="line">p.listenersLock.RLock()</span><br><span class="line"><span class="keyword">defer</span> p.listenersLock.RUnlock()</span><br><span class="line"></span><br><span class="line"><span 
class="keyword">for</span> listener, isSyncing := <span class="keyword">range</span> p.listeners &#123;</span><br><span class="line"><span class="keyword">switch</span> &#123;</span><br><span class="line"><span class="keyword">case</span> !sync:</span><br><span class="line"><span class="comment">// non-sync messages are delivered to every listener</span></span><br><span class="line">listener.add(obj)</span><br><span class="line"><span class="keyword">case</span> isSyncing:</span><br><span class="line"><span class="comment">// sync messages are delivered to every syncing listener</span></span><br><span class="line">listener.add(obj)</span><br><span class="line"><span class="keyword">default</span>:</span><br><span class="line"><span class="comment">// skipping a sync obj for a non-syncing listener</span></span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>listener</code> in <code>distribute</code> is a <code>processorListener</code>. It is created in <code>AddEventHandlerWithResyncPeriod</code>, the method invoked when an Event Handler is registered with the Informer:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *sharedIndexInformer)</span></span> AddEventHandlerWithResyncPeriod(handler ResourceEventHandler, resyncPeriod time.Duration) (ResourceEventHandlerRegistration, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// preceding code omitted</span></span><br><span 
class="line"></span><br><span class="line">listener := newProcessListener(handler, resyncPeriod, determineResyncPeriod(resyncPeriod, s.resyncCheckPeriod), s.clock.Now(), initialBufferSize, s.HasSynced)</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> !s.started &#123;</span><br><span class="line"><span class="keyword">return</span> s.processor.addListener(listener), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// remaining code omitted</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>As you can see, each <code>handler</code> gets exactly one <code>listener</code>. However, calling <code>listener.add(obj)</code> does not invoke the <code>handler</code> directly; there is more going on behind the scenes.</p><p>The Informer's startup code launches a goroutine running <code>s.processor.run</code>. Here is that startup code:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:458</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *sharedIndexInformer)</span></span> Run(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) &#123;</span><br><span class="line"><span class="comment">// omitted above</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Separate stop channel because Processor should be stopped strictly after controller</span></span><br><span class="line">processorStopCh := <span 
class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line"><span class="keyword">var</span> wg wait.Group</span><br><span class="line"><span class="keyword">defer</span> wg.Wait()              <span class="comment">// Wait for Processor to stop</span></span><br><span class="line"><span class="keyword">defer</span> <span class="built_in">close</span>(processorStopCh) <span class="comment">// Tell Processor to stop</span></span><br><span class="line">wg.StartWithChannel(processorStopCh, s.cacheMutationDetector.Run)</span><br><span class="line">wg.StartWithChannel(processorStopCh, s.processor.run)</span><br><span class="line"><span class="comment">// omitted below</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This <code>run</code> method starts the <code>pop</code> and <code>run</code> goroutines of every <code>listener</code> and then waits for the stop signal; once the signal arrives, it shuts those goroutines down:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:794</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 
class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *sharedProcessor)</span></span> run(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;) &#123;</span><br><span class="line"><span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">p.listenersLock.RLock()</span><br><span class="line"><span class="keyword">defer</span> p.listenersLock.RUnlock()</span><br><span class="line"><span class="keyword">for</span> listener := <span class="keyword">range</span> p.listeners &#123;</span><br><span class="line">p.wg.Start(listener.run)</span><br><span class="line">p.wg.Start(listener.pop)</span><br><span class="line">&#125;</span><br><span class="line">p.listenersStarted = <span class="literal">true</span></span><br><span class="line">&#125;()</span><br><span class="line">&lt;-stopCh</span><br><span class="line"></span><br><span class="line">p.listenersLock.Lock()</span><br><span class="line"><span class="keyword">defer</span> p.listenersLock.Unlock()</span><br><span class="line"><span class="keyword">for</span> listener := <span class="keyword">range</span> p.listeners &#123;</span><br><span class="line"><span class="comment">// Stop pop</span></span><br><span class="line"><span class="built_in">close</span>(listener.addCh) <span class="comment">// Tell .pop() to stop. 
.pop() will tell .run() to stop</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Wipe out list of listeners since they are now closed</span></span><br><span class="line"><span class="comment">// (processorListener cannot be re-used)</span></span><br><span class="line">p.listeners = <span class="literal">nil</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Reset to false since no listeners are running</span></span><br><span class="line">p.listenersStarted = <span class="literal">false</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Wait for pop and run to exit</span></span><br><span class="line">p.wg.Wait() <span class="comment">// Wait for all .pop() and .run() to stop</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>For each listener, <code>s.processor.run</code> starts two goroutines:</p><ul><li><code>pop</code>: takes pushed events from the <code>p.addCh</code> channel, buffers them in a ring buffer, and passes them to the <code>run</code> goroutine through the <code>p.nextCh</code> channel; when <code>p.addCh</code> is closed, it closes <code>p.nextCh</code>, which tells <code>run</code> to exit;</li><li><code>run</code>: takes items from the <code>p.nextCh</code> channel and calls the registered handler; it exits when <code>p.nextCh</code> is closed.</li></ul><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:933</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *processorListener)</span></span> pop() &#123;</span><br><span class="line"><span class="keyword">defer</span> utilruntime.HandleCrash()</span><br><span class="line"><span class="keyword">defer</span> <span class="built_in">close</span>(p.nextCh) <span class="comment">// Tell .run() to stop</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> nextCh <span class="keyword">chan</span>&lt;- <span class="keyword">interface</span>&#123;&#125;</span><br><span class="line"><span class="keyword">var</span> 
notification <span class="keyword">interface</span>&#123;&#125;</span><br><span class="line"><span class="keyword">for</span> &#123;</span><br><span class="line"><span class="keyword">select</span> &#123;</span><br><span class="line"><span class="keyword">case</span> nextCh &lt;- notification:</span><br><span class="line"><span class="comment">// The previous notification was sent successfully</span></span><br><span class="line"><span class="keyword">var</span> ok <span class="type">bool</span></span><br><span class="line">notification, ok = p.pendingNotifications.ReadOne()</span><br><span class="line"><span class="keyword">if</span> !ok &#123; <span class="comment">// Nothing left in the buffer to send</span></span><br><span class="line">nextCh = <span class="literal">nil</span> <span class="comment">// Disable this select case</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> notificationToAdd, ok := &lt;-p.addCh:</span><br><span class="line"><span class="comment">// Return once p.addCh is closed</span></span><br><span class="line"><span class="keyword">if</span> !ok &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Nothing is waiting to be sent and pendingNotifications is empty</span></span><br><span class="line"><span class="keyword">if</span> notification == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// Make the new item the pending notification directly,</span></span><br><span class="line"><span class="comment">// without going through the buffer</span></span><br><span class="line"><span class="comment">// Optimize the case - skip adding to pendingNotifications</span></span><br><span class="line">notification = notificationToAdd</span><br><span class="line">nextCh = p.nextCh</span><br><span class="line">&#125; <span class="keyword">else</span> &#123; </span><br><span class="line"><span class="comment">// A notification is already pending, so buffer this one</span></span><br><span 
class="line">p.pendingNotifications.WriteOne(notificationToAdd)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *processorListener)</span></span> run() &#123;</span><br><span class="line">stopCh := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line">wait.Until(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="comment">// Read events from p.nextCh and invoke the handler</span></span><br><span class="line"><span class="keyword">for</span> next := <span class="keyword">range</span> p.nextCh &#123;</span><br><span class="line"><span class="keyword">switch</span> notification := next.(<span class="keyword">type</span>) &#123;</span><br><span class="line"><span class="keyword">case</span> updateNotification:</span><br><span class="line">p.handler.OnUpdate(notification.oldObj, notification.newObj)</span><br><span class="line"><span class="keyword">case</span> addNotification:</span><br><span class="line">p.handler.OnAdd(notification.newObj, notification.isInInitialList)</span><br><span class="line"><span class="keyword">if</span> notification.isInInitialList &#123;</span><br><span class="line">p.syncTracker.Finished()</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">case</span> deleteNotification:</span><br><span class="line">p.handler.OnDelete(notification.oldObj)</span><br><span class="line"><span class="keyword">default</span>:</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;unrecognized notification: %T&quot;</span>, next))</span><br><span class="line">&#125;</span><br><span 
class="line">&#125;</span><br><span class="line"><span class="comment">// Exit once p.nextCh is closed</span></span><br><span class="line"><span class="built_in">close</span>(stopCh)</span><br><span class="line">&#125;, <span class="number">1</span>*time.Second, stopCh)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The benefit of combining a multi-branch <code>select</code> with a ring buffer in <code>pop</code> is that when the <code>run</code> goroutine cannot keep up, newly arriving items take the second branch and are stored in <code>pendingNotifications</code>. This ensures that the <code>processor</code> can always accept new data.</p><p>Now let's go back to the <code>distribute</code> method, which is called while consuming the DeltaFIFO, and see how the <code>listener.add(obj)</code> it ends up calling is implemented:</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/shared_informer.go:926</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *processorListener)</span></span> add(notification <span class="keyword">interface</span>&#123;&#125;) &#123;</span><br><span class="line"><span class="keyword">if</span> a, ok := notification.(addNotification); ok &amp;&amp; a.isInInitialList &#123;</span><br><span class="line">p.syncTracker.Start()</span><br><span class="line">&#125;</span><br><span class="line">p.addCh &lt;- notification</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The method sends the given <code>notification</code> straight to the <code>p.addCh</code> channel. The channel is created without a size, so it is unbuffered and a send blocks until a receiver takes the value. <code>pop</code> therefore has to pick items up promptly, otherwise consumption of the DeltaFIFO is held up.</p><p><img src="https://images.imoe.tech/blog/DeltaFIFO-pop-and-run.png" alt="pop and run processing flow"></p><h3 id="Resync-机制"><a href="#Resync-机制" class="headerlink" title="Resync mechanism"></a>Resync mechanism</h3><p>The Resync mechanism of the DeltaFIFO was mentioned earlier in the Reflector's main <code>ListAndWatch</code> flow. It exists to give events whose handling failed another chance to be processed.</p><p>Handling an Informer event callback may fail, and as the <code>run</code> flow discussed above shows, an event whose handler fails is simply skipped and never retried.</p><p>Resync periodically syncs the objects cached in the Indexer back into the DeltaFIFO so that they go through the consumption flow again. The difference from before is that events produced by Resync have the type <code>Sync</code>.</p><figure class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/delta_fifo.go:666</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *DeltaFIFO)</span></span> Resync() <span class="type">error</span> &#123;</span><br><span 
class="line">f.lock.Lock()</span><br><span class="line"><span class="keyword">defer</span> f.lock.Unlock()</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> f.knownObjects == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// List all keys in the Indexer and pass each one to syncKeyLocked</span></span><br><span class="line">keys := f.knownObjects.ListKeys()</span><br><span class="line"><span class="keyword">for</span> _, k := <span class="keyword">range</span> keys &#123;</span><br><span class="line"><span class="keyword">if</span> err := f.syncKeyLocked(k); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(f *DeltaFIFO)</span></span> syncKeyLocked(key <span class="type">string</span>) <span class="type">error</span> &#123;</span><br><span class="line">obj, exists, err := f.knownObjects.GetByKey(key)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Unexpected error %v during lookup of key %v, unable to queue object for sync&quot;</span>, err, key)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125; <span class="keyword">else</span> <span class="keyword">if</span> !exists &#123;</span><br><span class="line">klog.Infof(<span 
class="string">&quot;Key %v does not exist in known objects store, unable to queue object for sync&quot;</span>, key)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// If the DeltaFIFO already holds items for this key, a newer event is pending, so skip it</span></span><br><span class="line">id, err := f.KeyOf(obj)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> KeyError&#123;obj, err&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(f.items[id]) &gt; <span class="number">0</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Enqueue the object as a Sync delta</span></span><br><span class="line"><span class="keyword">if</span> err := f.queueActionLocked(Sync, obj); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> fmt.Errorf(<span class="string">&quot;couldn&#x27;t queue object: %v&quot;</span>, err)</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Events produced by Resync are delivered as <code>updateNotification</code> and ultimately trigger the <code>OnUpdate</code> callback.</p><h2 id="Indexer"><a href="#Indexer" class="headerlink" title="Indexer"></a>Indexer</h2><p>The Indexer is client-go's local cache of Kubernetes resources. When an Informer is created for a resource, the full set of that resource's objects is cached in the corresponding Indexer and kept up to date by the Reflector, which watches for changes. client-go can then serve queries from the local cache first, reducing the load on the Kubernetes APIServer and etcd.</p><p>Internally, the Indexer is built on <code>threadSafeMap</code>, accessed through the <code>ThreadSafeStore</code> interface:</p><figure 
class="highlight go"><figcaption><span>staging/src/k8s.io/client-go/tools/cache/store.go:139</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// `*cache` implements Indexer in terms of a ThreadSafeStore and an</span></span><br><span class="line"><span class="comment">// associated KeyFunc.</span></span><br><span class="line"><span class="keyword">type</span> cache <span class="keyword">struct</span> &#123;</span><br><span class="line"><span class="comment">// cacheStorage bears the burden of thread safety for the cache</span></span><br><span class="line">cacheStorage ThreadSafeStore</span><br><span class="line"><span class="comment">// keyFunc is used to make the key for objects stored in and retrieved from items, and</span></span><br><span class="line"><span class="comment">// should be deterministic.</span></span><br><span class="line">keyFunc KeyFunc</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewIndexer</span><span class="params">(keyFunc KeyFunc, indexers Indexers)</span></span> Indexer &#123;  </span><br><span class="line">   <span class="keyword">return</span> &amp;cache&#123;  </span><br><span class="line">      cacheStorage: NewThreadSafeStore(indexers, Indices&#123;&#125;),  </span><br><span class="line">      keyFunc:      keyFunc,  </span><br><span 
class="line">   &#125;  </span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>NewThreadSafeStore</code> returns a <code>threadSafeMap</code>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewThreadSafeStore</span><span class="params">(indexers Indexers, indices Indices)</span></span> ThreadSafeStore &#123;</span><br><span class="line"><span class="keyword">return</span> &amp;threadSafeMap&#123;</span><br><span class="line">items: <span class="keyword">map</span>[<span class="type">string</span>]<span class="keyword">interface</span>&#123;&#125;&#123;&#125;,</span><br><span class="line">index: &amp;storeIndex&#123;</span><br><span class="line">indexers: indexers,</span><br><span class="line">indices:  indices,</span><br><span class="line">&#125;,</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="threadSafeMap"><a href="#threadSafeMap" class="headerlink" title="threadSafeMap"></a>threadSafeMap</h3><p><code>threadSafeMap</code> is a concurrency-safe map, similar in purpose to Go's <code>sync.Map</code>; <code>sync.Map</code> simply did not exist yet when this part of Kubernetes was written.</p><p><code>threadSafeMap</code> implements the <code>ThreadSafeStore</code> interface. The <code>Indexer</code> interface offers essentially the same operations as <code>ThreadSafeStore</code>: the <code>Indexer</code> implementation wraps a <code>threadSafeMap</code> and adds <code>keyFunc</code>, which derives the storage key from an object. The structure of <code>threadSafeMap</code> is as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// threadSafeMap implements ThreadSafeStore</span></span><br><span class="line"><span class="keyword">type</span> threadSafeMap <span class="keyword">struct</span> &#123;</span><br><span class="line">lock  sync.RWMutex</span><br><span class="line">items <span class="keyword">map</span>[<span class="type">string</span>]<span class="keyword">interface</span>&#123;&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// index implements the indexing functionality</span></span><br><span class="line">index *storeIndex</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> storeIndex <span class="keyword">struct</span> &#123;</span><br><span class="line"><span class="comment">// indexers maps a name to an IndexFunc</span></span><br><span class="line">indexers Indexers</span><br><span class="line"><span class="comment">// indices maps a name to an Index</span></span><br><span class="line">indices Indices</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> Index <span class="keyword">map</span>[<span class="type">string</span>]sets.String</span><br><span class="line"></span><br><span class="line"><span 
class="comment">// Indexers maps a name to an IndexFunc</span></span><br><span class="line"><span class="keyword">type</span> Indexers <span class="keyword">map</span>[<span class="type">string</span>]IndexFunc</span><br><span class="line"></span><br><span class="line"><span class="comment">// Indices maps a name to an Index</span></span><br><span class="line"><span class="keyword">type</span> Indices <span class="keyword">map</span>[<span class="type">string</span>]Index</span><br></pre></td></tr></table></figure><p>The fields of <code>threadSafeMap</code> serve the following purposes:</p><ul><li><code>items</code> holds the cached data: <code>key -&gt; resource object</code>;</li><li><code>indexers</code> maps each index name to the function that computes index values: <code>index name -&gt; index function</code>;</li><li><code>indices</code> maps each index name to the index data built by that function: <code>index name -&gt; index data</code>;</li><li><code>Index</code> holds the index data itself: <code>index value (produced by the index function) -&gt; set of keys</code>.</li></ul><p><img src="https://images.imoe.tech/blog/threadSafeMap-structure.png" alt="threadSafeMap"></p><p>In <code>SharedIndexInformer</code>, the keys of <code>items</code> are generated by <code>DeletionHandlingMetaNamespaceKeyFunc</code> by default, in the form <code>&lt;namespace&gt;/&lt;name&gt;</code>.</p><h3 id="Indexer-索引器"><a href="#Indexer-索引器" class="headerlink" title="Indexer 索引器"></a>Indexer</h3><p>As the <code>NewIndexer</code> function above shows, an Indexer accepts custom index functions. In the earlier discussion of the <a href="#Informer-%E5%85%B1%E4%BA%AB%E6%9C%BA%E5%88%B6">Informer sharing mechanism</a>, the <code>defaultInformer</code> method called when the Pod Informer is created registers a default index function:</p><figure class="highlight go"><figcaption><span>Creating the Indexer in defaultInformer</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cache.Indexers&#123;cache.NamespaceIndex: cache.MetaNamespaceIndexFunc&#125;</span><br></pre></td></tr></table></figure><p>This index function indexes resources by <code>namespace</code> and returns the list of index values for an object. A custom index function can follow the same pattern:</p><figure class="highlight go"><figcaption><span>Using an Indexer</span></figcaption><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">UidIndexFunc</span><span class="params">(obj <span class="keyword">interface</span>&#123;&#125;)</span></span> ([]<span class="type">string</span>, <span class="type">error</span>) &#123;</span><br><span class="line">  pod := obj.(*v1.Pod)</span><br><span class="line">  uid := pod.Annotations[<span class="string">&quot;uid&quot;</span>]</span><br><span class="line">  <span class="keyword">return</span> []<span class="type">string</span>&#123;uid&#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">  index := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers&#123;<span class="string">&quot;userId&quot;</span>: UidIndexFunc&#125;)</span><br><span class="line">  pod := &amp;v1.Pod&#123;ObjectMeta: metav1.ObjectMeta&#123;Name: <span class="string">&quot;test&quot;</span>, Annotations: <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>&#123;<span class="string">&quot;uid&quot;</span>: <span class="string">&quot;id1&quot;</span>&#125;&#125;&#125;</span><br><span class="line"></span><br><span class="line">  index.Add(pod)</span><br><span class="line"></span><br><span class="line">  pods, err := index.ByIndex(<span 
class="string">&quot;userId&quot;</span>, <span class="string">&quot;id1&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The code above uses an Indexer on its own. To combine it with <code>SharedInformerFactory</code>, call the <code>AddIndexers</code> method of <code>SharedIndexInformer</code>. The following usage is taken from <code>external-provisioner</code>:</p><figure class="highlight go"><figcaption><span>vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:687</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> controller.claimInformer != <span class="literal">nil</span> &#123;</span><br><span class="line">controller.claimInformer.AddEventHandlerWithResyncPeriod(claimHandler, controller.resyncPeriod)</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">controller.claimInformer = informer.Core().V1().PersistentVolumeClaims().Informer()</span><br><span class="line">controller.claimInformer.AddEventHandler(claimHandler)</span><br><span class="line">&#125;</span><br><span class="line">err = controller.claimInformer.AddIndexers(cache.Indexers&#123;uidIndex: <span class="function"><span class="keyword">func</span><span class="params">(obj <span class="keyword">interface</span>&#123;&#125;)</span></span> ([]<span class="type">string</span>, <span class="type">error</span>) &#123;</span><br><span class="line">uid, err := getObjectUID(obj)</span><br><span class="line"><span class="keyword">if</span> err != <span 
class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> []<span class="type">string</span>&#123;uid&#125;, <span class="literal">nil</span></span><br><span class="line">&#125;&#125;)</span><br></pre></td></tr></table></figure><h3 id="Indexer-索引查询"><a href="#Indexer-索引查询" class="headerlink" title="Indexer 索引查询"></a>Querying the Indexer</h3><p>Data can be looked up with the Indexer's <code>Get</code> method, but <code>ByIndex</code> is used more often because it works with custom index functions, as in this lookup from <code>external-provisioner</code>:</p><figure class="highlight go"><figcaption><span>vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:1015</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ctrl *ProvisionController)</span></span> syncClaimHandler(ctx context.Context, key <span class="type">string</span>) <span class="type">error</span> &#123;</span><br><span class="line"><span class="comment">// look up objects through the custom index</span></span><br><span class="line">objs, err := ctrl.claimsIndexer.ByIndex(uidIndex, key)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> 
&#123;</span><br><span class="line"><span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">var</span> claimObj <span class="keyword">interface</span>&#123;&#125;</span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(objs) &gt; <span class="number">0</span> &#123;</span><br><span class="line">claimObj = objs[<span class="number">0</span>] <span class="comment">// by design, a uid normally maps to a single object</span></span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">obj, found := ctrl.claimsInProgress.Load(key)</span><br><span class="line"><span class="keyword">if</span> !found &#123;</span><br><span class="line">utilruntime.HandleError(fmt.Errorf(<span class="string">&quot;claim %q in work queue no longer exists&quot;</span>, key))</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">claimObj = obj</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> ctrl.syncClaim(ctx, claimObj)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The <code>ByIndex</code> method takes two parameters:</p><ul><li><code>indexName</code>: the name under which the index function is registered;</li><li><code>indexedValue</code>: the index value to look up.</li></ul><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *threadSafeMap)</span></span> ByIndex(indexName, indexedValue <span class="type">string</span>) ([]<span class="keyword">interface</span>&#123;&#125;, <span class="type">error</span>) &#123;</span><br><span class="line">c.lock.RLock()</span><br><span class="line"><span class="keyword">defer</span> c.lock.RUnlock()</span><br><span class="line"></span><br><span class="line"><span class="comment">// look up the keys stored under this index value for the named index</span></span><br><span class="line">set, err := c.index.getKeysByIndex(indexName, indexedValue)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// resolve each key to the cached resource object</span></span><br><span class="line">list := <span class="built_in">make</span>([]<span class="keyword">interface</span>&#123;&#125;, <span class="number">0</span>, set.Len())</span><br><span class="line"><span class="keyword">for</span> key := <span class="keyword">range</span> set &#123;</span><br><span class="line">list = <span class="built_in">append</span>(list, c.items[key])</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> list, <span class="literal">nil</span></span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(i *storeIndex)</span></span> getKeysByIndex(indexName, indexedValue <span class="type">string</span>) (sets.String, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="comment">// check that an index function is registered under this name</span></span><br><span class="line">indexFunc := i.indexers[indexName]</span><br><span class="line"><span class="keyword">if</span> indexFunc == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;Index with name %s does not exist&quot;</span>, indexName)</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// from the index built by that function, fetch the keys associated with the given index value</span></span><br><span class="line">index := i.indices[indexName]</span><br><span class="line"><span class="keyword">return</span> index[indexedValue], <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The method works as follows:</p><ul><li>check that the given index name is registered;</li><li>fetch the index data stored under that name;</li><li>look up the set of keys associated with the given index value;</li><li>use those keys to fetch the resource objects from <code>items</code> and return them.</li></ul><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Starting from the motivation for the Informer mechanism and its overall architecture, this article examined the mechanism in depth and walked through how the internal components behind it (Reflector, Informer, DeltaFIFO, and Indexer) are implemented.</p><p>As an important part of client-go, the Informer is something almost every custom Kubernetes component depends on. Kubernetes adopted this design primarily to reduce the load on the Kubernetes API Server and etcd, which makes the whole cluster more stable.</p><p><img src="https://images.imoe.tech/blog/pasted-image-20230516143112.png" alt="整体流程"><br><img src="https://images.imoe.tech/blog/informer-component-relation.png" alt="代码调用关系"></p><h2 id="引用"><a href="#引用" class="headerlink" title="引用"></a>References</h2><ol><li><a href="https://cloudnative.to/blog/client-go-informer-source-code/">深入了解 Kubernetes Informer</a></li><li><a 
href="https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/controller/Kubernetes%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0-Controller-P2-Controller%E4%B8%8Einformer.md">P2-Controller 与 informer</a></li><li><a href="https://github.com/cloudnativeto/sig-kubernetes/issues/11">Informer 中为什么需要引入 Resync 机制？</a></li><li><a href="https://github.com/kubernetes/sample-controller/blob/master/docs/controller-client-go.md">client-go under the hood</a></li></ol><article class="message is-info">        <div class="message-header"><p><i class="far fa-edit mr-2"></i>Revision history</p></div>        <div class="message-body">            <ul><li><strong>2023-05-16</strong>: expanded the description of the <code>Reflector</code>'s <code>Watch</code> logic and updated the flow diagram to show the Informer <code>Lister</code> entry point.</li></ul>        </div>    </article>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

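&lt;p&gt;In Kubernetes, kube-apiserver is the brain and heart of the cluster and the entry point for controlling it: every module operates on the cluster through the HTTP REST API it exposes.&lt;/p&gt;
&lt;p&gt;Because it is the hub for data exchange and communication among all modules, having many components query the apiserver directly over HTTP puts it under heavy load. If the apiserver fails, the whole cluster is affected and may even go down.&lt;/p&gt;
&lt;p&gt;Reducing the load on the apiserver as much as possible is therefore essential, and the Informer mechanism is how Kubernetes addresses this problem. An Informer is essentially a local caching mechanism provided by &lt;code&gt;client-go&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a near-real-time copy of Kubernetes resource data is cached locally, and applications query that local copy directly;&lt;/li&gt;
&lt;li&gt;when a resource changes, the change is pushed over a long-lived connection to the local Informer, which updates the local cache;&lt;/li&gt;
&lt;li&gt;after the cache is updated, local handler functions are triggered to run the related business logic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Informer mechanism greatly reduces the communication load between Kubernetes components and the API Server, and it relieves query pressure on etcd as well.&lt;/p&gt;</summary>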
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="client-go" scheme="https://blog.imoe.tech/tags/client-go/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>Leader Election in Kubernetes Core Components</title>
    <link href="https://blog.imoe.tech/2023/01/18/principle-of-kubernetes-leaderelection/"/>
    <id>https://blog.imoe.tech/2023/01/18/principle-of-kubernetes-leaderelection/</id>
    <published>2023-01-18T04:23:18.000Z</published>
    <updated>2023-02-15T03:50:42.025Z</updated>
    
    <content type="html"><![CDATA[<img alt="" a="<" src="https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue"><p>Many core Kubernetes components are stateful and run as multiple instances in a one-leader, many-followers arrangement. Only the leader instance processes data, while the followers stay on hot standby. When the leader fails, a follower campaigns to become the new leader and takes over the work, so this election mechanism is what gives these stateful components high availability.</p><p>For core components such as kube-scheduler and kube-controller-manager, only one instance handles business logic at any given time, so the running instances must elect a leader to decide which one does the work. These components all rely on the <code>leaderelection</code> package provided by client-go, which is the subject of this article.</p><p><code>leaderelection</code> builds on three kinds of resource lock, backed by the Kubernetes <code>Endpoints</code>, <code>ConfigMap</code>, and <code>Lease</code> resources, and elects a leader as follows:</p><ul><li>multiple replicas try to create the lock resource, and the one that succeeds acquires the lock and becomes leader;</li><li>the leader refreshes the lock within its lease term;</li><li>the other replicas compare against the lock's renew time to decide whether to compete for leadership.</li></ul><p>Besides the core components, this package can also be used in our own applications, provided they run in a Kubernetes environment and have permission to operate on the lock resource.</p><span id="more"></span><h2 id="启动选举"><a href="#启动选举" class="headerlink" title="启动选举"></a>Starting the election</h2><p>To elect a leader with <code>leaderelection</code>, first obtain a resource lock object by calling <code>resourcelock.New</code>. Although Kubernetes currently offers only the endpoints, configmap, and lease resource locks, they can be combined.</p><p>The available lock types and combinations are: </p><figure class="highlight go"><figcaption><span>Resource lock combinations</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> (</span><br><span class="line">endpointsResourceLock        = <span class="string">&quot;endpoints&quot;</span></span><br><span class="line">configMapsResourceLock       = <span class="string">&quot;configmaps&quot;</span></span><br><span class="line">LeasesResourceLock           = <span class="string">&quot;leases&quot;</span></span><br><span class="line">EndpointsLeasesResourceLock  = <span class="string">&quot;endpointsleases&quot;</span></span><br><span 
class="line">ConfigMapsLeasesResourceLock = <span class="string">&quot;configmapsleases&quot;</span></span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>A resource lock can also be created with <code>resourcelock.NewFromKubeconfig</code>, which wraps <code>resourcelock.New</code>. kube-controller-manager creates its lock this way:</p><figure class="highlight go"><figcaption><span>Creating the resource lock in kube-controller-manager</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">rl, err := resourcelock.NewFromKubeconfig(resourceLock,  </span><br><span class="line">   c.ComponentConfig.Generic.LeaderElection.ResourceNamespace,  </span><br><span class="line">   leaseName,  </span><br><span class="line">   resourcelock.ResourceLockConfig&#123;  </span><br><span class="line">      Identity:      lockIdentity,  </span><br><span class="line">      EventRecorder: c.EventRecorder,  </span><br><span class="line">   &#125;,  </span><br><span class="line">   c.Kubeconfig,  </span><br><span class="line">   c.ComponentConfig.Generic.LeaderElection.RenewDeadline.Duration)</span><br></pre></td></tr></table></figure><p>In kube-controller-manager, when the <code>resourceLock</code> argument is <code>endpointsleases</code> (the default in older releases, while newer releases default to <code>leases</code>), a combined <code>MultiLock</code> containing an <code>endpointsLock</code> and a <code>LeaseLock</code> is created. Lock operations are applied to the <code>endpointsLock</code> first and, if that succeeds, to the <code>LeaseLock</code>.</p><p>Once the resource lock is ready, call <code>leaderelection.RunOrDie</code> to start the election.</p><figure class="highlight go"><figcaption><span>Starting the election</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig&#123;</span><br><span class="line">  <span class="comment">// the resource lock to use</span></span><br><span class="line">  Lock: lock,</span><br><span class="line">  <span class="comment">// lease duration; non-leader candidates use it to decide whether the lock has expired</span></span><br><span class="line">  LeaseDuration:   <span class="number">60</span> * time.Second,</span><br><span class="line">  <span class="comment">// deadline for the leader to renew the lock</span></span><br><span class="line">  RenewDeadline:   <span class="number">15</span> * time.Second,</span><br><span class="line">  <span class="comment">// interval between attempts to operate on the lock</span></span><br><span class="line">  RetryPeriod:     <span class="number">5</span> * time.Second,</span><br><span class="line">  <span class="comment">// callbacks triggered by election events</span></span><br><span class="line">  Callbacks: leaderelection.LeaderCallbacks&#123;</span><br><span class="line">  OnStartedLeading: <span class="function"><span class="keyword">func</span><span class="params">(ctx context.Context)</span></span> &#123;</span><br><span class="line">  run(ctx)</span><br><span class="line">  &#125;,</span><br><span class="line">  OnStoppedLeading: <span class="function"><span class="keyword">func</span><span class="params">()</span></span> 
&#123;</span><br><span class="line">  klog.Infof(<span class="string">&quot;leader lost: %s&quot;</span>, id)</span><br><span class="line">  os.Exit(<span class="number">0</span>) <span class="comment">// usually exit so the process restarts and campaigns again; otherwise this instance no longer takes part in the election</span></span><br><span class="line">  &#125;,</span><br><span class="line">  OnNewLeader: <span class="function"><span class="keyword">func</span><span class="params">(identity <span class="type">string</span>)</span></span> &#123;</span><br><span class="line">  <span class="keyword">if</span> identity == id &#123;</span><br><span class="line">  <span class="keyword">return</span></span><br><span class="line">  &#125;</span><br><span class="line">  klog.Infof(<span class="string">&quot;new leader elected: %s&quot;</span>, identity)</span><br><span class="line">  &#125;,</span><br><span class="line">  &#125;,</span><br><span class="line">&#125;)</span><br></pre></td></tr></table></figure><h2 id="选举主流程"><a href="#选举主流程" class="headerlink" title="选举主流程"></a>Main election flow</h2><p>After startup, <code>RunOrDie</code> calls <code>le.Run(ctx)</code> to begin the actual election. This method returns only in the following cases:</p><ul><li><code>ctx</code> is cancelled (the caller asks to stop the election);</li><li>the instance was elected leader and its term ended (renewal failed because of network or other problems).</li></ul><p>Otherwise, for example when the current instance has never been elected leader, it stays inside <code>acquire</code> and keeps campaigning.</p><figure class="highlight go"><figcaption><span>le.Run(ctx)</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(le *LeaderElector)</span></span> Run(ctx context.Context) &#123;</span><br><span class="line"><span class="keyword">defer</span> runtime.HandleCrash()</span><br><span class="line"><span class="keyword">defer</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="comment">// when Run returns, notify the callback that leadership has ended</span></span><br><span class="line">le.config.Callbacks.OnStoppedLeading()</span><br><span class="line">&#125;()</span><br><span class="line"></span><br><span class="line"><span class="comment">// start campaigning</span></span><br><span class="line"><span class="keyword">if</span> !le.acquire(ctx) &#123;</span><br><span class="line"><span class="comment">// ctx was cancelled, stop the election</span></span><br><span class="line"><span class="keyword">return</span> <span class="comment">// ctx signalled done</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// elected leader</span></span><br><span class="line">ctx, cancel := context.WithCancel(ctx)</span><br><span class="line"><span class="keyword">defer</span> cancel()</span><br><span class="line"><span class="comment">// notify the callback to start the leader's work</span></span><br><span class="line"><span class="keyword">go</span> le.config.Callbacks.OnStartedLeading(ctx)</span><br><span class="line"><span class="comment">// keep renewing leadership</span></span><br><span class="line">le.renew(ctx)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="竞选"><a href="#竞选" class="headerlink" title="竞选"></a>Campaigning</h2><figure class="highlight go"><figcaption><span>acquire</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(le *LeaderElector)</span></span> acquire(ctx context.Context) <span class="type">bool</span> &#123;</span><br><span class="line">ctx, cancel := context.WithCancel(ctx)</span><br><span class="line"><span class="keyword">defer</span> cancel()</span><br><span class="line"><span class="comment">// defaults to false, which is returned if ctx is cancelled before this instance is elected</span></span><br><span class="line">succeeded := <span class="literal">false</span></span><br><span class="line">desc := le.config.Lock.Describe()</span><br><span class="line">klog.Infof(<span class="string">&quot;attempting to acquire leader lease %v...&quot;</span>, desc)</span><br><span class="line"><span class="comment">// campaign loop</span></span><br><span class="line"><span class="comment">// runs once every RetryPeriod until ctx.Done()</span></span><br><span class="line">wait.JitterUntil(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"><span class="comment">// try to acquire the lock</span></span><br><span class="line">succeeded = le.tryAcquireOrRenew(ctx)</span><br><span class="line"><span class="comment">// report a leader change through the callback</span></span><br><span 
class="line">le.maybeReportTransition()</span><br><span class="line"><span class="comment">// attempt failed, return and wait for the next round</span></span><br><span class="line"><span class="keyword">if</span> !succeeded &#123;</span><br><span class="line">klog.V(<span class="number">4</span>).Infof(<span class="string">&quot;failed to acquire lease %v&quot;</span>, desc)</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// on success, leave the campaign loop</span></span><br><span class="line">le.config.Lock.RecordEvent(<span class="string">&quot;became leader&quot;</span>)</span><br><span class="line">le.metrics.leaderOn(le.config.Name)</span><br><span class="line">klog.Infof(<span class="string">&quot;successfully acquired lease %v&quot;</span>, desc)</span><br><span class="line">cancel()</span><br><span class="line">&#125;, le.config.RetryPeriod, JitterFactor, <span class="literal">true</span>, ctx.Done())</span><br><span class="line"><span class="keyword">return</span> succeeded</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The return value is determined as follows:</p><ul><li><code>succeeded</code> starts as <code>false</code>; if the caller cancels <code>ctx</code> during the campaign loop, <code>wait.JitterUntil</code> stops and <code>false</code> is returned;</li><li>if the campaign succeeds, <code>succeeded</code> is set to <code>true</code> and <code>cancel()</code> is called explicitly to stop the <code>wait.JitterUntil</code> loop.</li></ul><p>So <code>acquire</code> returns in only two cases:</p><ul><li><code>true</code>: this instance was elected leader;</li><li><code>false</code>: the campaign was stopped from outside.</li></ul><p>Candidates that have not been elected keep retrying inside the <code>wait.JitterUntil</code> loop.</p><h2 id="抢锁和续期"><a href="#抢锁和续期" class="headerlink" title="抢锁和续期"></a>Acquiring and renewing the lock</h2><p>Both acquiring the lock and renewing leadership are implemented in the <code>tryAcquireOrRenew</code> method.</p><ul><li>If the current instance does not hold the lock, it tries to acquire it;</li><li>otherwise it renews the lock;</li><li>it returns <code>true</code> on success and <code>false</code> on failure.</li></ul><figure class="highlight go"><figcaption><span>tryAcquireOrRenew</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span 
class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(le *LeaderElector)</span></span> tryAcquireOrRenew(ctx context.Context) <span class="type">bool</span> &#123;</span><br><span class="line">now := metav1.Now()</span><br><span class="line"><span class="comment">// Initialize the record with default values; HolderIdentity is this candidate's identity</span></span><br><span class="line">leaderElectionRecord := rl.LeaderElectionRecord&#123;</span><br><span class="line">HolderIdentity:       le.config.Lock.Identity(),</span><br><span class="line">LeaseDurationSeconds: <span class="type">int</span>(le.config.LeaseDuration / time.Second),</span><br><span class="line">RenewTime:            now,</span><br><span class="line">AcquireTime:          now,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Fetch the current lock record</span></span><br><span class="line">oldLeaderElectionRecord, oldLeaderElectionRawRecord, err := le.config.Lock.Get(ctx)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">if</span> !errors.IsNotFound(err) &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;error retrieving resource lock %v: %v&quot;</span>, le.config.Lock.Describe(), err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// The lock does not exist yet, so try to create it</span></span><br><span class="line"><span class="keyword">if</span> err = le.config.Lock.Create(ctx, leaderElectionRecord); err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;error initially creating leader election record: %v&quot;</span>, err)</span><br><span 
class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Creating the lock means acquiring it; update the locally observed record</span></span><br><span class="line">le.setObservedRecord(&amp;leaderElectionRecord)</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Check whether the fetched record changed; if so, refresh the local cache</span></span><br><span class="line"><span class="comment">// A difference means the lock was updated since the last attempt (new leader or renewal)</span></span><br><span class="line"><span class="keyword">if</span> !bytes.Equal(le.observedRawRecord, oldLeaderElectionRawRecord) &#123;</span><br><span class="line">le.setObservedRecord(oldLeaderElectionRecord)</span><br><span class="line"></span><br><span class="line">le.observedRawRecord = oldLeaderElectionRawRecord</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Check whether the lock is held by someone else and the lease has not expired</span></span><br><span class="line"><span class="comment">// le.observedTime is set by le.setObservedRecord whenever the observed record changes</span></span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(oldLeaderElectionRecord.HolderIdentity) &gt; <span class="number">0</span> &amp;&amp;</span><br><span class="line">le.observedTime.Add(le.config.LeaseDuration).After(now.Time) &amp;&amp;</span><br><span class="line">!le.IsLeader() &#123;</span><br><span class="line">klog.V(<span class="number">4</span>).Infof(<span class="string">&quot;lock is held by %v and has not yet expired&quot;</span>, oldLeaderElectionRecord.HolderIdentity)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span 
class="comment">// Fill in the lock record with the correct data</span></span><br><span class="line"><span class="keyword">if</span> le.IsLeader() &#123;</span><br><span class="line"><span class="comment">// Already the leader, so this is a renewal: keep the acquire time and transition count from the record</span></span><br><span class="line">leaderElectionRecord.AcquireTime = oldLeaderElectionRecord.AcquireTime</span><br><span class="line">leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line"><span class="comment">// Not the leader, so this is an acquisition attempt: increment the leader transition count</span></span><br><span class="line"><span class="comment">// AcquireTime was already defaulted to the current time above</span></span><br><span class="line">leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions + <span class="number">1</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Try to update the lock</span></span><br><span class="line"><span class="keyword">if</span> err = le.config.Lock.Update(ctx, leaderElectionRecord); err != <span class="literal">nil</span> &#123;</span><br><span class="line">klog.Errorf(<span class="string">&quot;Failed to update lock: %v&quot;</span>, err)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// A successful update means this node holds the lock: acquired or renewed</span></span><br><span class="line">le.setObservedRecord(&amp;leaderElectionRecord)</span><br><span class="line"><span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The flow of the code is as follows:</p><p><img src="https://images.imoe.tech/blog/SCR-20230117-f9v.png" alt="tryAcquireOrRenew"></p><h3 id="任期的原理"><a href="#任期的原理" class="headerlink" title="How the lease term works"></a>How the lease term works</h3><p>The check for whether the lease is still valid is based on <code>le.observedTime</code>, which is updated by <code>le.setObservedRecord</code> whenever this node observes a change in the lock record. That happens when:</p>
<ol><li>the lock does not exist and this node creates it successfully;</li><li>the fetched lock record differs from the cached one, meaning the lock was updated since the last attempt (a new leader took over or the current leader renewed it), so the cache is refreshed;</li><li>this node updates the lock successfully, for example by taking it over after the leader failed to renew in time.</li></ol><p>In effect, <code>le.observedTime</code> is the last time this node saw the lock change, so <code>le.observedTime + LeaseDuration</code> is when the current leader's term ends as seen from this node.</p><p>If the leader fails to refresh the lock, the fetched record <code>oldLeaderElectionRawRecord</code> stays identical to the cached one (case 2 never triggers) and <code>le.observedTime</code> stops advancing. Once the current time passes the end of the term, this node falls through to the acquisition logic further down and attempts to update the lock.</p><h3 id="抢锁的原理"><a href="#抢锁的原理" class="headerlink" title="How acquiring the lock works"></a>How acquiring the lock works</h3><p>The lock update attempt (acquiring the lock) relies on the optimistic concurrency control of Kubernetes resources:</p><ul><li><code>le.config.Lock.Get(ctx)</code> fetches the lock together with its latest <code>resourceVersion</code>, which is saved;</li><li>the saved <code>resourceVersion</code> is sent along with the update;</li><li>Kubernetes compares that <code>resourceVersion</code> with the current value. If they match, the update is accepted and succeeds; otherwise it fails.</li></ul><p>While the leader's term is valid, usually only the leader itself updates the lock. During a takeover several nodes may try to update it, but only the first update request to reach Kubernetes is processed. By the time the other requests arrive, the <code>resourceVersion</code> has already changed, so they are rejected.</p><h2 id="续期处理循环"><a href="#续期处理循环" class="headerlink" title="The renewal loop"></a>The renewal loop</h2><p>After acquiring the lock and becoming leader, the node enters the <code>le.renew(ctx)</code> method, which renews the lease periodically.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(le *LeaderElector)</span></span> renew(ctx context.Context) &#123;</span><br><span class="line">ctx, cancel := context.WithCancel(ctx)</span><br><span class="line"><span class="keyword">defer</span> cancel()</span><br><span class="line"><span class="comment">// Renew periodically, once every RetryPeriod</span></span><br><span class="line">wait.Until(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">timeoutCtx, timeoutCancel := context.WithTimeout(ctx, le.config.RenewDeadline)</span><br><span class="line"><span class="keyword">defer</span> timeoutCancel()</span><br><span class="line"><span class="comment">// Keep trying to renew until it succeeds or RenewDeadline times out</span></span><br><span class="line">err := wait.PollImmediateUntil(le.config.RetryPeriod, <span class="function"><span class="keyword">func</span><span class="params">()</span></span> (<span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">return</span> le.tryAcquireOrRenew(timeoutCtx), <span class="literal">nil</span></span><br><span class="line">&#125;, timeoutCtx.Done())</span><br><span class="line"></span><br><span class="line">le.maybeReportTransition()</span><br><span class="line">desc := le.config.Lock.Describe()</span><br><span class="line"><span class="comment">// No error means the renewal succeeded</span></span><br><span class="line"><span class="keyword">if</span> err == <span class="literal">nil</span> &#123;</span><br><span class="line">klog.V(<span class="number">5</span>).Infof(<span class="string">&quot;successfully renewed lease %v&quot;</span>, desc)</span><br><span class="line"><span 
class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// A timeout error means the renewal failed; this node is no longer the leader, so leave the renewal loop</span></span><br><span class="line">le.config.Lock.RecordEvent(<span class="string">&quot;stopped leading&quot;</span>)</span><br><span class="line">le.metrics.leaderOff(le.config.Name)</span><br><span class="line">klog.Infof(<span class="string">&quot;failed to renew lease %v: %v&quot;</span>, desc, err)</span><br><span class="line">cancel()</span><br><span class="line">&#125;, le.config.RetryPeriod, ctx.Done())</span><br><span class="line"></span><br><span class="line"><span class="comment">// if we hold the lease, give it up</span></span><br><span class="line"><span class="keyword">if</span> le.config.ReleaseOnCancel &#123;</span><br><span class="line">le.release()</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If renewal fails, <code>renew</code> simply returns and the election process ends. Other leader election implementations usually turn the current node back into a candidate and keep campaigning, whereas <code>leaderelection</code> just stops.</p><p>To campaign again with <code>leaderelection</code>, you have to call <code>leaderelection.RunOrDie</code> again yourself or restart the program.</p><h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p>In the <code>leaderelection</code> flow only the leader does the actual work; the other nodes are hot standbys that take over promptly when the leader fails.</p><p>Internal Kubernetes components such as <code>kube-controller-manager</code> exit the process when lease renewal fails, rely on Kubernetes to restart them, and then rejoin the election.</p><p>For example, this is how <code>kube-controller-manager</code> configures what happens when renewal fails:</p><figure class="highlight go"><figcaption><span>OnStoppedLeading in kube-controller-manager</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">OnStoppedLeading: <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">klog.ErrorS(<span 
class="literal">nil</span>, <span class="string">&quot;leaderelection lost&quot;</span>)</span><br><span class="line">klog.FlushAndExit(klog.ExitFlushTimeout, <span class="number">1</span>)</span><br><span class="line">&#125;,</span><br></pre></td></tr></table></figure><p>The complete <code>leaderelection</code> flow is shown below:</p><p><img src="https://images.imoe.tech/blog/client-go-leaderelection.png" alt="Complete flow"></p><h2 id="引用"><a href="#引用" class="headerlink" title="References"></a>References</h2><ol><li><a href="https://qingwave.github.io/k8s-leaderelection-code">k8s 基于资源锁的选主分析</a></li><li><a href="https://github.com/yinwenqin/kubeSourceCodeNote/blob/master/controller/Kubernetes%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0-Controller-P1-%E5%A4%9A%E5%AE%9E%E4%BE%8Bleader%E9%80%89%E4%B8%BE.md">多实例 leader 选举</a></li></ol>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/based%20on-kubernetes%20v1.26-blue&quot;&gt;

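&lt;p&gt;Many core Kubernetes components are stateful and run as multiple instances with one leader and several followers. Only the leader processes data, while the followers stay on hot standby. When the leader fails, a follower is elected as the new leader and takes over the work, so this election mechanism is what gives these stateful components high availability.&lt;/p&gt;
&lt;p&gt;Core components such as kube-scheduler and kube-controller-manager have only one instance handling business logic at any moment, so the running instances must elect a leader to decide which one processes tasks. These components all use the &lt;code&gt;leaderelection&lt;/code&gt; utility package provided by client-go, which is the subject of this article.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;leaderelection&lt;/code&gt; relies on three kinds of resource locks provided by Kubernetes, &lt;code&gt;Endpoints&lt;/code&gt;, &lt;code&gt;ConfigMap&lt;/code&gt; and &lt;code&gt;Lease&lt;/code&gt;, and the election is built on top of them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;several replicas try to create the resource, and the one that succeeds holds the lock and becomes the leader;&lt;/li&gt;
&lt;li&gt;the leader refreshes the lock while its lease is valid;&lt;/li&gt;
&lt;li&gt;the other replicas compare the lock's update time to decide whether to compete for leadership.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Besides the core components, this package can also be used in our own applications, provided they run in a Kubernetes environment and have permission to operate on the lock resources.&lt;/p&gt;</summary>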
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Kubernetes" scheme="https://blog.imoe.tech/tags/Kubernetes/"/>
    
    <category term="client-go" scheme="https://blog.imoe.tech/tags/client-go/"/>
    
    <category term="源码学习" scheme="https://blog.imoe.tech/tags/%E6%BA%90%E7%A0%81%E5%AD%A6%E4%B9%A0/"/>
    
  </entry>
  
  <entry>
    <title>Building Multi-Platform Images with Docker</title>
    <link href="https://blog.imoe.tech/2023/01/05/docker-build-multiplatform-image/"/>
    <id>https://blog.imoe.tech/2023/01/05/docker-build-multiplatform-image/</id>
    <published>2023-01-05T15:18:57.000Z</published>
    <updated>2023-01-10T02:47:02.137Z</updated>
    
    <content type="html"><![CDATA[<p>Recently my company has been organizing the developers of each system to carry out Xinchuang (domestic IT stack) adaptation work, part of which is making the systems compatible with ARM CPUs. Our systems all run in containers on Kubernetes, so the applications have to be packaged into images for different architectures.</p><p>Docker supports multiple platforms: images for different architectures can be packaged as a single image, and at deployment time the image matching the runtime architecture is pulled. Multi-platform images can be built with the BuildX component.</p><span id="more"></span><h1 id="Docker-BuildX"><a href="#Docker-BuildX" class="headerlink" title="Docker BuildX"></a>Docker BuildX</h1><p>We can use the <code>docker buildx</code> command to build multi-platform container images. <a href="https://github.com/docker/buildx">Buildx</a> is a Docker component that supports many build features. All builds started through Buildx are executed by the <a href="https://github.com/moby/buildkit">Moby Buildkit</a> build engine. Starting with Docker 23.0, the <code>docker build</code> command also uses Buildx by default.</p><p>By default Buildx builds an image for the architecture of the current machine. To build for a different architecture, use the <code>--platform</code> flag, for example <code>--platform=linux/arm64</code>. To build an image that supports several architectures, separate the values with commas.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">docker buildx build --platform=linux/amd64,linux/arm64 .</span><br></pre></td></tr></table></figure><p>Building multi-platform images with Buildx requires a builder instance that supports the target platforms. As mentioned above, Buildx builds run inside BuildKit, so a suitable builder instance has to exist before building. Some Docker environments ship a default builder that already supports multiple platforms, in which case no extra builder is needed. You can check with the <code>docker buildx ls</code> command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># docker buildx ls</span></span><br><span class="line">NAME/NODE       DRIVER/ENDPOINT             STATUS  BUILDKIT PLATFORMS</span><br><span class="line">m1_builder *    docker-container</span><br><span class="line">  m1_builder0   unix:///var/run/docker.sock running v0.10.5  linux/arm64, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, 
linux/arm/v7, linux/arm/v6</span><br><span class="line">default         docker</span><br><span class="line">  default       default                     running 20.10.21 linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6</span><br><span class="line">desktop-linux   docker</span><br><span class="line">  desktop-linux desktop-linux               running 20.10.21 linux/arm64, linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6</span><br></pre></td></tr></table></figure><p>If none of the builders supports the platforms you need, create one manually.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">docker buildx create --use</span><br></pre></td></tr></table></figure><p>A multi-platform build actually runs once per platform, and the results are combined into a single image at the end. Take the following Dockerfile:</p><figure class="highlight dockerfile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">FROM</span> alpine</span><br><span class="line"><span class="keyword">RUN</span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&quot;Hello&quot;</span> &gt; /hello</span></span><br></pre></td></tr></table></figure><p>During the build, Buildx pulls the alpine image for each architecture, and the RUN step executes the binaries of that architecture.</p><blockquote><p>The takeaway: when building a multi-platform image, the base images it depends on should support those platforms too.</p></blockquote><h1 id="准备-Dockerfile"><a href="#准备-Dockerfile" class="headerlink" title="Preparing the Dockerfile"></a>Preparing the Dockerfile</h1><p>The official Docker article <a href="https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/">Faster Multi-Platform Builds: Dockerfile Cross-Compilation Guide</a> describes a faster way to build multi-platform images.</p><p><img src="https://images.imoe.tech/blog/49-docker-1-1110x587.webp" alt="Build example running on an Apple M1; blue parts contain x86 files, yellow parts contain ARM files"></p><p>The overall idea is to cross-compile the executable for the target architecture directly on the build machine and then copy it into the image. Building on the target architecture instead means BuildKit has to emulate that architecture, and the instruction translation overhead slows the build down considerably.</p>
<p>A multi-stage Dockerfile separates compiling from packaging the runtime image: only the final stage produces the image, and the other stages are intermediate build steps.</p><p>A <code>FROM debian</code> statement in the build stage actually makes the builder pull the <code>Debian</code> image for the target architecture of the current build (as given by the <code>--platform</code> flag). To make the build stage run on the build host's own architecture, the platform has to be specified explicitly. For example, to build on x86 the base image can be configured like this:</p><figure class="highlight dockerfile"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">FROM</span> --platform=linux/amd64 debian as builder</span><br></pre></td></tr></table></figure><p>With this configuration the builder stage uses the x86 base image regardless of whether the image being built targets x86 or ARM.</p><p>Docker provides some predefined global build arguments that describe the current build. The hard-coded <code>linux/amd64</code> above has to be changed whenever the build moves to a different platform; the predefined argument <code>BUILDPLATFORM</code> solves this, because it is filled in automatically with the platform of the build system.</p><p>Commonly used predefined arguments:</p><table><thead><tr><th>Argument</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td>BUILDPLATFORM</td><td>Platform of the build host</td><td>linux/amd64</td></tr><tr><td>BUILDOS</td><td>OS of the build host</td><td>linux</td></tr><tr><td>BUILDARCH</td><td>Architecture of the build host</td><td>amd64, arm64, riscv64</td></tr><tr><td>BUILDVARIANT</td><td>ARM variant of the build host</td><td>v7</td></tr><tr><td>TARGETPLATFORM</td><td>Target platform</td><td>linux/arm64</td></tr><tr><td>TARGETOS</td><td>Target OS</td><td>linux</td></tr><tr><td>TARGETARCH</td><td>Target architecture</td><td>arm64</td></tr></tbody></table><p>With the predefined arguments, the Dockerfile for building the image can now be written as:</p><figure class="highlight dockerfile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">FROM</span> --platform=$BUILDPLATFORM alpine AS build</span><br><span class="line"><span class="comment"># RUN &lt;install build dependencies/compiler&gt;</span></span><br><span 
class="line"><span class="comment"># COPY &lt;source&gt; .</span></span><br><span class="line"><span class="keyword">ARG</span> TARGETPLATFORM</span><br><span class="line"><span class="keyword">RUN</span><span class="language-bash"> compile --target=<span class="variable">$TARGETPLATFORM</span> -o /out/mybinary</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">FROM</span> alpine</span><br><span class="line"><span class="comment"># RUN &lt;install runtime dependencies installed via emulation&gt;</span></span><br><span class="line"><span class="keyword">COPY</span><span class="language-bash"> --from=build /out/mybinary /bin</span></span><br></pre></td></tr></table></figure><p>This Dockerfile has two stages: the <code>build</code> stage and the unnamed final stage, referred to here as <code>runtime</code>.</p><ul><li>The <code>build</code> stage prepares the build environment and cross-compiles the executable for the target platform. Because of Dockerfile <a href="https://docs.docker.com/engine/reference/builder/#scope">scope</a> rules, a predefined global argument must be declared with the <code>ARG</code> instruction inside the stage before it can be used in that stage's commands;</li><li>The <code>runtime</code> stage produces the final container image. A <code>FROM</code> instruction without <code>--platform</code> behaves like <code>FROM --platform=$TARGETPLATFORM</code> and pulls the base image for the build's target architecture, so the <code>--platform</code> flag can be omitted.</li></ul><h1 id="多平台镜像的结构"><a href="#多平台镜像的结构" class="headerlink" title="Structure of a multi-platform image"></a>Structure of a multi-platform image</h1><p>The images built for each platform are finally combined into a single OCI image. BuildKit generates a manifest that references them, which looks like this:</p><figure class="highlight json"><figcaption><span>manifest list</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">   <span class="attr">&quot;mediaType&quot;</span><span class="punctuation">:</span> <span class="string">&quot;application/vnd.docker.distribution.manifest.list.v2+json&quot;</span><span class="punctuation">,</span></span><br><span class="line">   <span class="attr">&quot;schemaVersion&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span></span><br><span class="line">   <span class="attr">&quot;manifests&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">      <span class="punctuation">&#123;</span></span><br><span class="line">         <span class="attr">&quot;mediaType&quot;</span><span class="punctuation">:</span> <span class="string">&quot;application/vnd.docker.distribution.manifest.v2+json&quot;</span><span class="punctuation">,</span></span><br><span class="line">         <span class="attr">&quot;digest&quot;</span><span class="punctuation">:</span> <span class="string">&quot;sha256:33cc662c443c0c3f4bfcdc37d97b3c172d5b4c0bb4e9ed19f2b3d288466d9bfb&quot;</span><span class="punctuation">,</span></span><br><span class="line">         <span class="attr">&quot;size&quot;</span><span class="punctuation">:</span> <span class="number">738</span><span class="punctuation">,</span></span><br><span class="line">         <span class="attr">&quot;platform&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">            <span class="attr">&quot;architecture&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;amd64&quot;</span><span class="punctuation">,</span></span><br><span class="line">            <span class="attr">&quot;os&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linux&quot;</span></span><br><span class="line">         <span class="punctuation">&#125;</span></span><br><span class="line">      <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">      <span class="punctuation">&#123;</span></span><br><span class="line">         <span class="attr">&quot;mediaType&quot;</span><span class="punctuation">:</span> <span class="string">&quot;application/vnd.docker.distribution.manifest.v2+json&quot;</span><span class="punctuation">,</span></span><br><span class="line">         <span class="attr">&quot;digest&quot;</span><span class="punctuation">:</span> <span class="string">&quot;sha256:e969f63ec867a40c8363fb960c8f8d0b7e188ff26064b40c846bc9cd097ff0b3&quot;</span><span class="punctuation">,</span></span><br><span class="line">         <span class="attr">&quot;size&quot;</span><span class="punctuation">:</span> <span class="number">738</span><span class="punctuation">,</span></span><br><span class="line">         <span class="attr">&quot;platform&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">            <span class="attr">&quot;architecture&quot;</span><span class="punctuation">:</span> <span class="string">&quot;arm64&quot;</span><span class="punctuation">,</span></span><br><span class="line">            <span class="attr">&quot;os&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linux&quot;</span></span><br><span class="line">         <span class="punctuation">&#125;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">   <span class="punctuation">]</span></span><br><span 
class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>If you use Harbor as your image registry, you can fetch an image's manifest quickly by visiting the following URL (replace the placeholders in the path as needed).</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">https://registry.imoe.tech/v2/&lt;project-name&gt;/&lt;image-name&gt;/manifests/&lt;tag&gt;</span><br></pre></td></tr></table></figure><p>When building with BuildX, you can also export an OCI image archive directly and inspect its structure.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">docker buildx build \</span><br><span class="line">  --platform linux/amd64,linux/arm64 \</span><br><span class="line">  -f Dockerfile . --output <span class="built_in">type</span>=oci,dest=./dockertest.tar</span><br></pre></td></tr></table></figure><p>The command above generates <code>dockertest.tar</code> in the current directory; extract it to see how an OCI image is laid out.</p><p><img src="https://images.imoe.tech/blog/oci-multiplatform-image.jpeg" alt="Structure of an OCI multi-platform image"></p><h1 id="Podman-构建多平台镜像"><a href="#Podman-构建多平台镜像" class="headerlink" title="Building multi-platform images with Podman"></a>Building multi-platform images with Podman</h1><p>Thanks to the standardization of containers, there are now more container management tools to choose from. Many people use <a href="https://podman.io/">Podman</a> instead of Docker to build and run containers. However, Podman does not support Docker's BuildX component, so the method from the first section cannot be used. The good news is that Podman supports building cross-platform images natively.</p><p>Before building with podman, log in to an OCI-compatible image registry. Here we log in to the Tencent Cloud registry.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">podman login ccr.ccs.tencentyun.com -p Secret -u User</span><br></pre></td></tr></table></figure><h2 id="构建指定架构的镜像"><a href="#构建指定架构的镜像" class="headerlink" title="Building an image for a specific architecture"></a>Building an image for a specific architecture</h2><p>Docker BuildX can generate images for several architectures in a single run and combine them into one image, fully automatically. With Podman the individual images have to be created manually, for example the arm64 and amd64 images here:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">podman build --platform linux/arm64/v8 -t ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-linux-arm64 .</span><br><span class="line">podman build --platform linux/amd64 -t ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-linux-amd64 .</span><br></pre></td></tr></table></figure><p>The commands above use the <code>--platform</code> flag to specify the image architecture. The format is the same as in Docker BuildX, but only one architecture is specified per invocation here, so two separate images are created and distinguished by their tags.</p><p>Use <code>podman image ls</code> to list the generated images.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">$ podman image <span class="built_in">ls</span></span><br><span class="line">REPOSITORY                                    TAG                 IMAGE ID      CREATED         SIZE</span><br><span class="line">ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world         v1.0.0-linux-amd64  de9a0c2f01c8  29 seconds ago  114 MB</span><br><span class="line">ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world         v1.0.0-linux-arm64  0f3a0bc46798  43 seconds ago  136 MB</span><br></pre></td></tr></table></figure><p>Apart from the tags, the listing shows no visible difference between the two. To see an image's architecture, use the <code>inspect</code> command.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">$ podman image inspect ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-linux-amd64 | grep Arch</span><br><span class="line">        <span class="string">&quot;Architecture&quot;</span>: <span class="string">&quot;amd64&quot;</span>,</span><br></pre></td></tr></table></figure><h2 id="合并多个架构镜像"><a href="#合并多个架构镜像" class="headerlink" title="Merging images for multiple architectures"></a>Merging images for multiple architectures</h2>
<p>OCI images use a manifest to manage multi-platform images, and the <code>podman manifest</code> command creates and manages manifests.</p><figure class="highlight bash"><figcaption><span>Merging existing per-platform images</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">podman manifest create \</span><br><span class="line">  ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0 \</span><br><span class="line">  ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-linux-arm64 \</span><br><span class="line">  ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-linux-amd64</span><br><span class="line"></span><br><span class="line">podman manifest push ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0 \</span><br><span class="line">  docker://ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0</span><br><span class="line">podman manifest <span class="built_in">rm</span> ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0</span><br></pre></td></tr></table></figure><p>The first command above creates a manifest named <code>hello-world:v1.0.0</code> and adds two images to it (the last two arguments). The command can also be split up: create the manifest first, then build each image and add it to the manifest.</p><figure class="highlight bash"><figcaption><span>Building a multi-platform image</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td 
class="code"><pre><span class="line">podman manifest create ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> IMAGEARCH <span class="keyword">in</span> amd64 arm64 ; <span class="keyword">do</span></span><br><span class="line">    podman build --platform linux/<span class="variable">$&#123;IMAGEARCH&#125;</span> --layers=<span class="literal">false</span> -f Dockerfile \</span><br><span class="line">        --build-arg=TARGETOS=linux \</span><br><span class="line">        --build-arg=TARGETARCH=<span class="string">&quot;<span class="variable">$&#123;IMAGEARCH&#125;</span>&quot;</span> \</span><br><span class="line">        --build-arg=TARGETPLATFORM=<span class="string">&quot;linux/<span class="variable">$&#123;IMAGEARCH&#125;</span>&quot;</span> \</span><br><span class="line">        --build-arg=BUILDPLATFORM=<span class="string">&quot;linux/amd64&quot;</span> \</span><br><span class="line">      -t ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-<span class="variable">$&#123;IMAGEARCH&#125;</span> .</span><br><span class="line">    podman manifest add --<span class="built_in">arch</span>=<span class="variable">$&#123;IMAGEARCH&#125;</span> --os=linux \</span><br><span class="line">      ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0 \</span><br><span class="line">      docker://ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0-<span class="variable">$&#123;IMAGEARCH&#125;</span></span><br><span class="line"><span class="keyword">done</span></span><br><span class="line"></span><br><span class="line">podman manifest push --all ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0 \</span><br><span class="line">  docker://ccr.ccs.tencentyun.com/&lt;namespace&gt;/hello-world:v1.0.0</span><br></pre></td></tr></table></figure><p>The <code>podman manifest push</code> command pushes the manifest to the registry; once pushed, you can use <code>podman manifest rm</code> 
to delete the local manifest.</p><p><strong>Note 1</strong>: Podman does not provide the global predefined variables, so they have to be set manually via <code>--build-arg</code>.<br><strong>Note 2</strong>: Unlike BuildX, Podman cannot use <code>FROM --platform</code> in the builder stage to pull an image for a platform different from the one given to <code>podman build --platform</code>. Podman always pulls the architecture specified on the <code>podman build</code> command, so in a Podman environment you have to cross-compile the executable before building the image and then use Podman to package it into the image. The Dockerfile needs to be adjusted accordingly:</p><figure class="highlight dockerfile"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">FROM</span> alpine AS build</span><br><span class="line"><span class="keyword">ARG</span> TARGETPLATFORM</span><br><span class="line"><span class="keyword">ADD</span><span class="language-bash"> ./output /output</span></span><br><span class="line"><span class="keyword">RUN</span><span class="language-bash"> <span class="built_in">cp</span> /output/<span class="string">&quot;<span class="variable">$TARGETPLATFORM</span>&quot;</span>/app /app</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">FROM</span> alpine</span><br><span class="line"><span class="comment"># RUN &lt;install runtime dependencies installed via emulation&gt;</span></span><br><span class="line"><span class="keyword">COPY</span><span class="language-bash"> --from=build /app /bin</span></span><br></pre></td></tr></table></figure><p>The <code>build</code> stage above copies all of the build output into a layer and then uses <code>RUN cp</code> to pick out the file for the target platform. The detour is there because instructions such as <code>ADD</code> did not support variables in this setup.</p><p>The advantage of two stages over a single one is that the final stage packages only the executable for the required platform rather than those for every platform, which reduces the image size. Taking the Go example from the next section, the corresponding build script could be:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">for i in &quot;arm64&quot; &quot;amd64&quot; ; do</span><br><span class="line">  echo &quot;building for $i...&quot;</span><br><span class="line">  GOOS=linux GOARCH=&quot;$i&quot; go build -o ./output/linux/$i/app cmd/main.go</span><br><span class="line">  chmod +x ./output/linux/$i/app</span><br><span class="line">done</span><br></pre></td></tr></table></figure><blockquote><p>This can be optimized further: after compiling and before packaging, a script can move the files into place so that the Dockerfile no longer has to select them.</p></blockquote><p>Besides Podman's native support, <a href="https://danmanners.com/posts/2022-01-buildah-multi-arch/">Buildah</a> can achieve the same result; I won't go into detail here.</p><h1 id="Go-构建样例"><a href="#Go-构建样例" class="headerlink" title="Go build example"></a>Go build example</h1><p>For a Java application, the Jar produced by a single build runs everywhere and needs no cross-compilation for different platforms, so a Java application only needs a multi-platform base image to build its container image.</p><p>Applications written in Go are different: they generally have to be compiled into a platform-specific executable, which makes them a good example of the <code>build + package</code> multi-stage Dockerfile mentioned above. Here is an example of a Go application using a multi-platform image.</p><figure class="highlight dockerfile"><figcaption><span>Single-platform image build</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">FROM</span> golang:<span class="number">1.19</span>-alpine AS build</span><br><span class="line"><span class="keyword">WORKDIR</span><span class="language-bash"> /src</span></span><br><span class="line"><span class="keyword">COPY</span><span class="language-bash"> . 
.</span></span><br><span class="line"><span class="keyword">RUN</span><span class="language-bash"> go build -o /out/myapp .</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">FROM</span> alpine</span><br><span class="line"><span class="keyword">COPY</span><span class="language-bash"> --from=build /out/myapp /bin</span></span><br></pre></td></tr></table></figure><p>The Dockerfile above does not build a multi-platform image; it simply builds and packages. Next we modify it to add cross-compilation and multi-platform support.</p><p>Cross-compiling in Go is very simple: just pass the <code>GOOS</code> and <code>GOARCH</code> environment variables when invoking <code>go build</code>. The values of <code>GOOS</code> and <code>GOARCH</code> are the same as those of BuildKit's predefined variables <code>TARGETOS</code> and <code>TARGETARCH</code>.</p><p>So all we need to do is obtain the values with <code>ARG</code> and assign them to <code>GOOS</code> and <code>GOARCH</code>. The modified Dockerfile looks like this:</p><figure class="highlight dockerfile"><figcaption><span>After optimizing for multi-platform builds</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">FROM</span> --platform=$BUILDPLATFORM golang:<span class="number">1.19</span>-alpine AS build</span><br><span class="line"><span class="keyword">WORKDIR</span><span class="language-bash"> /src</span></span><br><span class="line"><span class="keyword">COPY</span><span class="language-bash"> . 
.</span></span><br><span class="line"><span class="keyword">ARG</span> TARGETOS TARGETARCH</span><br><span class="line"><span class="keyword">RUN</span><span class="language-bash"> GOOS=<span class="variable">$TARGETOS</span> GOARCH=<span class="variable">$TARGETARCH</span> go build -o /out/myapp .</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">FROM</span> alpine</span><br><span class="line"><span class="keyword">COPY</span><span class="language-bash"> --from=build /out/myapp /bin</span></span><br></pre></td></tr></table></figure><h1 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h1><p>Docker's support for multi-platform (multi-architecture) images is already very good, and building your own images is easy. When building a multi-platform image, however, make sure the upstream base image you depend on supports multiple platforms; if it does not, find another image that does or build one yourself.</p><p>As cloud vendors such as Oracle have started offering ARM-based cloud hosts, multi-platform images are getting more and more support. The official base images of most major open-source organizations already support them, so when choosing a base image, prefer images from those organizations; it saves a lot of trouble.</p><h1 id="引用"><a href="#引用" class="headerlink" title="References"></a>References</h1><ol><li><a href="https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/">Faster Multi-Platform Builds: Dockerfile Cross-Compilation Guide</a></li><li><a href="https://docs.docker.com/build/building/multi-platform/">Multi-platform images</a></li><li><a href="https://github.com/opencontainers/image-spec">OCI Image Format Specification</a></li><li><a href="https://github.com/opencontainers/image-spec/blob/main/image-index.md">OCI Image Index Specification</a></li><li><a href="https://medium.com/oracledevs/building-multi-architecture-containers-on-oci-with-podman-67d49a8b965e">Building Multi-Architecture Containers on OCI with Podman</a></li></ol>]]></content>
    
    
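    <summary type="html">&lt;p&gt;Recently my company has had the developers of every system working on Xinchuang (domestic IT localization) retrofits, part of which is making the systems compatible with ARM CPUs. All of our systems run in containers on Kubernetes, so the applications need to be packaged into images for different architectures.&lt;/p&gt;
&lt;p&gt;Docker supports multi-platform images: images for different architectures are packaged into a single image, and at deployment time the image matching the host architecture is pulled and run. Multi-platform images can be built with the BuildX component.&lt;/p&gt;</summary>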
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="容器技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="Docker" scheme="https://blog.imoe.tech/tags/Docker/"/>
    
    <category term="Podman" scheme="https://blog.imoe.tech/tags/Podman/"/>
    
    <category term="OCI" scheme="https://blog.imoe.tech/tags/OCI/"/>
    
  </entry>
  
  <entry>
    <title>Monitoring iKuai network status with iKuai Exporter</title>
    <link href="https://blog.imoe.tech/2022/12/25/48-use-ikuai-exporter-to-gather-metrics/"/>
    <id>https://blog.imoe.tech/2022/12/25/48-use-ikuai-exporter-to-gather-metrics/</id>
    <published>2022-12-25T15:44:04.000Z</published>
    <updated>2023-10-16T14:53:34.449Z</updated>
    
    <content type="html"><![CDATA[<img alt="" src="https://shields.imoe.tech/badge/爱快-v3.7.1--v3.7.6-brightgreen"><p>The iKuai software router ships with fairly complete monitoring, but because it is iKuai's own closed system it is inflexible: it only covers basic needs, cannot be used from Prometheus, and alerting is out of the question.</p><p>On top of that, a VM running Wangxinyun (网心云) recently went down and I only noticed days later, so setting up monitoring and alerting for my internal network became urgent. For a home network, the setup needs to meet a few requirements:</p><ol><li>Track each device's daily traffic</li><li>Monitor and alert on public internet connectivity</li><li>Monitor and alert on whether devices are online</li></ol><span id="more"></span><h2 id="部署-Exporter"><a href="#部署-Exporter" class="headerlink" title="Deploying the Exporter"></a>Deploying the Exporter</h2><p>iKuai does not provide a Prometheus exporter, so I wrote one based on iKuai's API. The code is at: <a href="https://github.com/jakeslee/ikuai-exporter">https://github.com/jakeslee/ikuai-exporter</a>.</p><p>It is already packaged as a Docker image, which you can pull with the following command:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">docker pull jakes/ikuai-exporter:latest</span><br></pre></td></tr></table></figure><p>The exporter is configured through environment variables passed to the container. The following parameters are currently available:</p><table><thead><tr><th>Variable</th><th>Description</th></tr></thead><tbody><tr><td>IK_URL</td><td>iKuai address</td></tr><tr><td>IK_USER</td><td>iKuai login user</td></tr><tr><td>IK_PWD</td><td>iKuai login password</td></tr></tbody></table><p>Deploy the exporter with:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">docker run -d -p 9222:9090 -e IK_URL=http://10.10.1.253 -e IK_USER=test -e IK_PWD=test123 \</span><br><span class="line">  jakes/ikuai-exporter:latest</span><br></pre></td></tr></table></figure><h2 id="配置-Prometheus"><a href="#配置-Prometheus" class="headerlink" title="Configuring Prometheus"></a>Configuring Prometheus</h2><p>Add the following to the Prometheus configuration file <code>prometheus.yml</code>:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">scrape_configs:</span></span><br><span class="line">  <span class="bullet">-</span> <span class="attr">job_name:</span> <span class="string">&#x27;ikuai_exporter&#x27;</span></span><br><span class="line">    <span class="attr">scrape_interval:</span> <span class="string">2s</span></span><br><span class="line">    <span class="attr">scrape_timeout:</span> <span class="string">2s</span></span><br><span class="line">    <span class="attr">relabel_configs:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">source_labels:</span> [<span class="string">__address__</span>]</span><br><span class="line">        <span class="attr">target_label:</span> <span class="string">instance</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">target_label:</span> <span class="string">__address__</span></span><br><span class="line">        <span class="attr">replacement:</span> <span class="number">10.10</span><span class="number">.1</span><span class="number">.202</span><span class="string">:9222</span>  <span class="comment"># address of the exporter container</span></span><br><span class="line">    <span class="attr">static_configs:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="attr">targets:</span></span><br><span class="line">          <span class="bullet">-</span> <span class="number">10.10</span><span class="number">.1</span><span class="number">.253</span></span><br></pre></td></tr></table></figure><p>In the configuration above, <code>10.10.1.253</code> is the IP address of the iKuai router and <code>10.10.1.202:9222</code> is the address of the exporter container.</p><p>By default, Prometheus attributes scraped data to the exporter itself. Since our exporter and the iKuai router are separate hosts, the data source would be displayed incorrectly. The relabeling rules fix this: the first copies the target address into the <code>instance</code> label, and the second rewrites <code>__address__</code> so that the scrape actually goes to the exporter.</p><p>With this configuration, metrics are scraped from <code>10.10.1.202:9222</code> while the 
<code>instance</code> label is correctly shown as <code>10.10.1.253</code>.</p><p><img src="https://images.imoe.tech/blog/Snipaste_2023-01-04_10-04-26.png" alt="The configured scrape target"></p><p>The metrics exposed by iKuai Exporter are prefixed with ikuai; once scraping is configured successfully they can be queried in Prometheus.</p><p><img src="https://images.imoe.tech/blog/Snipaste_2023-01-04_10-12-49.png" alt="Imported iKuai metrics"></p><h2 id="配置-Grafana"><a href="#配置-Grafana" class="headerlink" title="Configuring Grafana"></a>Configuring Grafana</h2><h3 id="配置监控面板"><a href="#配置监控面板" class="headerlink" title="Configuring the dashboard"></a>Configuring the dashboard</h3><p>I put together a Grafana dashboard for my needs; it looks like this.</p><p><img src="https://images.imoe.tech/blog/Snipaste_2023-01-04_09-26-42.png" alt="iKuai monitoring dashboard"></p><p>Below is the dashboard I configured. Import it into Grafana and then fine-tune it as needed.</p><figure class="highlight json"><figcaption><span>Dashboard configuration</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span><span class="attr">&quot;annotations&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;list&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;builtIn&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;grafana&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;-- Grafana --&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;enable&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span 
class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;iconColor&quot;</span><span class="punctuation">:</span> <span class="string">&quot;rgba(0, 211, 255, 1)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;name&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Annotations &amp; Alerts&quot;</span><span class="punctuation">,</span><span class="attr">&quot;target&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;limit&quot;</span><span class="punctuation">:</span> <span class="number">100</span><span class="punctuation">,</span><span class="attr">&quot;matchAny&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;tags&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dashboard&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dashboard&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;editable&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span 
class="attr">&quot;fiscalYearStartMonth&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;graphTooltip&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;links&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;liveNow&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;panels&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dp-AfZzIz&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;decimals&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;percentunit&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">3</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;colorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;value&quot;</span><span class="punctuation">,</span><span class="attr">&quot;graphMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;area&quot;</span><span class="punctuation">,</span><span class="attr">&quot;justifyMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;orientation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reduceOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;lastNotNull&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span 
class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;values&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;textMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;avg(ikuai_cpu_usage_ratio&#123;instance=\&quot;$instance\&quot;&#125;)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> 
<span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;CPU Usage&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;stat&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dp-AfZzIz&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;decimals&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;percentunit&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">3</span><span class="punctuation">,</span><span 
class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">3</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">4</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;colorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;value&quot;</span><span class="punctuation">,</span><span class="attr">&quot;graphMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;area&quot;</span><span class="punctuation">,</span><span class="attr">&quot;justifyMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;orientation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reduceOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;lastNotNull&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;values&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;textMode&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_memory_usage_bytes&#123;instance=\&quot;$instance\&quot;&#125;/ikuai_memory_size_bytes&#123;instance=\&quot;$instance\&quot;&#125;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Memory Usage&quot;</span><span class="punctuation">,</span><span class="attr">&quot;transformations&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;stat&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dp-AfZzIz&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dtdhms&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span 
class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;colorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;value&quot;</span><span class="punctuation">,</span><span class="attr">&quot;graphMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;area&quot;</span><span class="punctuation">,</span><span class="attr">&quot;justifyMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;orientation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reduceOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;lastNotNull&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;values&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;textMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_uptime&#123;id=\&quot;host\&quot;, instance=\&quot;$instance\&quot;&#125;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Uptime&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;stat&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;celsius&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">8</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">8</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;colorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;value&quot;</span><span class="punctuation">,</span><span class="attr">&quot;graphMode&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;area&quot;</span><span class="punctuation">,</span><span class="attr">&quot;justifyMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;orientation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reduceOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;lastNotNull&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;values&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;textMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_cpu_temperature&#123;instance=\&quot;$instance\&quot;&#125;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Temperature&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;stat&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;palette-classic&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span 
class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">10</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">14</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;colorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;value&quot;</span><span class="punctuation">,</span><span class="attr">&quot;graphMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;area&quot;</span><span class="punctuation">,</span><span class="attr">&quot;justifyMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;orientation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reduceOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;lastNotNull&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;values&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;textMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_conn_count&#123;id=\&quot;host\&quot;, instance=\&quot;$instance\&quot;&#125;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span 
class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Connection&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;stat&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;align&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;cellOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;inspect&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;minWidth&quot;</span><span class="punctuation">:</span> <span class="number">50</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;bytes&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;上传&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.width&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">98</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;IP&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.width&quot;</span><span 
class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">104</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;下载&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.width&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">112</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span 
class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">17</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;footer&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;countRows&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reducer&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;sum&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;show&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;frameIndex&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;showHeader&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;sortBy&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span 
class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;sort_desc(increase(ikuai_network_send_bytes&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125;[$__range]) * on(id, instance) group_left(comment, ip_addr) ikuai_device_info)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span 
class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;increase(ikuai_network_recv_bytes&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125;[$__range])&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Traffic Statistics&quot;</span><span class="punctuation">,</span><span class="attr">&quot;transformations&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;merge&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;organize&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;excludeByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;indexByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;Value #A&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">,</span><span class="attr">&quot;Value #B&quot;</span><span class="punctuation">:</span> <span class="number">7</span><span class="punctuation">,</span><span class="attr">&quot;comment&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">3</span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="number">4</span><span class="punctuation">,</span><span class="attr">&quot;ip_addr&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;renameByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;Value #A&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Upload&quot;</span><span class="punctuation">,</span><span class="attr">&quot;Value #B&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Download&quot;</span><span class="punctuation">,</span><span class="attr">&quot;comment&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;Comment&quot;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;ip_addr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;IP&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;align&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;cellOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;inspect&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;minWidth&quot;</span><span class="punctuation">:</span> <span class="number">50</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span 
class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;bytes&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">18</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">18</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;footer&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;countRows&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reducer&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;sum&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;show&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;showHeader&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;sort_desc((increase(ikuai_network_send_bytes&#123;id=~\&quot;iface/.*\&quot;, instance=\&quot;$instance\&quot;&#125;[$__range]) and on(id) ikuai_uptime &gt; 0) * on (id) group_left(interface, 
comment) ikuai_iface_info)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;(increase(ikuai_network_recv_bytes&#123;id=~\&quot;iface/.*\&quot;, instance=\&quot;$instance\&quot;&#125;[$__range]) and on(id) ikuai_uptime &gt; 0)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Public Network Traffic Statistics&quot;</span><span class="punctuation">,</span><span class="attr">&quot;transformations&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;merge&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;organize&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span 
class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;excludeByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;indexByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;Value #A&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">,</span><span class="attr">&quot;Value #B&quot;</span><span class="punctuation">:</span> <span class="number">7</span><span class="punctuation">,</span><span class="attr">&quot;comment&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">3</span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="number">4</span><span class="punctuation">,</span><span class="attr">&quot;interface&quot;</span><span class="punctuation">:</span> <span 
class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;renameByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;Value #A&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Upload&quot;</span><span class="punctuation">,</span><span class="attr">&quot;Value #B&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Download&quot;</span><span class="punctuation">,</span><span class="attr">&quot;comment&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Comment&quot;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;ip_addr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;IP&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dp-AfZzIz&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;thresholds&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;align&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;cellOptions&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;inspect&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;minWidth&quot;</span><span class="punctuation">:</span> <span class="number">50</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;decimals&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Bps&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;upload&quot;</span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.cellOptions&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;color-text&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;download&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.cellOptions&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;color-text&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span 
class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;connection&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;unit&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;decimals&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;mac&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.width&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">11</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;footer&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;countRows&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;fields&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;reducer&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;sum&quot;</span><span 
class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;show&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;showHeader&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;sortBy&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;desc&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;displayName&quot;</span><span class="punctuation">:</span> <span class="string">&quot;upload&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;pluginVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;9.4.13&quot;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span 
class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;sort_desc(ikuai_network_send_kbytes_per_second&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125;* on (id, instance) group_left(ip_addr, hostname, comment, mac) ikuai_device_info)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span 
class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_recv_kbytes_per_second&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125; * on (id) group_left(ip_addr, hostname, comment, mac) ikuai_device_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span 
class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_conn_count&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125; * on (id) group_left ikuai_device_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;format&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;__auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;C&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Devices&quot;</span><span class="punctuation">,</span><span class="attr">&quot;transformations&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;merge&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;organize&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;excludeByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;indexByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Time&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;Value #A&quot;</span><span class="punctuation">:</span> <span class="number">9</span><span class="punctuation">,</span><span class="attr">&quot;Value #B&quot;</span><span class="punctuation">:</span> <span class="number">10</span><span class="punctuation">,</span><span class="attr">&quot;Value #C&quot;</span><span class="punctuation">:</span> <span class="number">8</span><span 
class="punctuation">,</span><span class="attr">&quot;comment&quot;</span><span class="punctuation">:</span> <span class="number">4</span><span class="punctuation">,</span><span class="attr">&quot;hostname&quot;</span><span class="punctuation">:</span> <span class="number">3</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;ip_addr&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="number">7</span><span class="punctuation">,</span><span class="attr">&quot;mac&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;renameByName&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;Value #A&quot;</span><span class="punctuation">:</span> <span class="string">&quot;upload&quot;</span><span class="punctuation">,</span><span class="attr">&quot;Value #B&quot;</span><span class="punctuation">:</span> <span class="string">&quot;download&quot;</span><span class="punctuation">,</span><span class="attr">&quot;Value #C&quot;</span><span class="punctuation">:</span> <span class="string">&quot;connection&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hostname&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span 
class="attr">&quot;instance&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;ip_addr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ip&quot;</span><span class="punctuation">,</span><span class="attr">&quot;job&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;palette-classic&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span><span class="attr">&quot;axisCenteredZero&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;axisColorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;text&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisLabel&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisPlacement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;auto&quot;</span><span class="punctuation">,</span><span class="attr">&quot;barAlignment&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;drawStyle&quot;</span><span class="punctuation">:</span> <span class="string">&quot;line&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fillOpacity&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;gradientMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hideFrom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;viz&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;lineInterpolation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">,</span><span class="attr">&quot;lineWidth&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;pointSize&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;scaleDistribution&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;showPoints&quot;</span><span class="punctuation">:</span> <span class="string">&quot;never&quot;</span><span class="punctuation">,</span><span class="attr">&quot;spanNulls&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;stacking&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;group&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">,</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;thresholdsStyle&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;off&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span 
class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Bps&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;byName&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;netin&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.transform&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="string">&quot;negative-Y&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">10</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;max&quot;</span><span class="punctuation">,</span> <span class="string">&quot;mean&quot;</span><span class="punctuation">,</span> <span class="string">&quot;last&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;displayMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;placement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;right&quot;</span><span class="punctuation">,</span><span class="attr">&quot;showLegend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;single&quot;</span><span class="punctuation">,</span><span class="attr">&quot;sort&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_send_kbytes_per_second&#123;id=\&quot;host\&quot;, instance=\&quot;$instance\&quot;&#125;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;netout&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_recv_kbytes_per_second&#123;id=\&quot;host\&quot;, instance=\&quot;$instance\&quot;&#125;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;netin&quot;</span><span 
class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Host Network IO/s&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;timeseries&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;palette-classic&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span 
class="punctuation">&#123;</span><span class="attr">&quot;axisCenteredZero&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;axisColorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;text&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisLabel&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisPlacement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;left&quot;</span><span class="punctuation">,</span><span class="attr">&quot;barAlignment&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;drawStyle&quot;</span><span class="punctuation">:</span> <span class="string">&quot;line&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fillOpacity&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;gradientMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hideFrom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;viz&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;lineInterpolation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">,</span><span class="attr">&quot;lineStyle&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;fill&quot;</span><span class="punctuation">:</span> <span class="string">&quot;solid&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;lineWidth&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;pointSize&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;scaleDistribution&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;showPoints&quot;</span><span class="punctuation">:</span> <span class="string">&quot;never&quot;</span><span class="punctuation">,</span><span class="attr">&quot;spanNulls&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;stacking&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;group&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">,</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;thresholdsStyle&quot;</span><span 
class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;off&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;decimals&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;Bps&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byRegexp&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;.*netin&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.transform&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="string">&quot;negative-Y&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">6</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">12</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span 
class="punctuation">:</span> <span class="number">10</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">15</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;max&quot;</span><span class="punctuation">,</span> <span class="string">&quot;last&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;displayMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;placement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;right&quot;</span><span class="punctuation">,</span><span class="attr">&quot;showLegend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;multi&quot;</span><span class="punctuation">,</span><span class="attr">&quot;sort&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span 
class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;exemplar&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;(ikuai_network_send_kbytes_per_second&#123;id=~\&quot;iface/.*\&quot;&#125; and on(id) ikuai_uptime &gt; 0) * on (id) group_left(interface, comment) ikuai_iface_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;instant&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&#123;&#123;comment&#125;&#125;-netout&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;(ikuai_network_recv_kbytes_per_second&#123;id=~\&quot;iface/.*\&quot;&#125; and on(id) ikuai_uptime &gt; 0) * on (id) group_left(interface, comment) ikuai_iface_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&#123;&#123;comment&#125;&#125;-netin&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;timeseries&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span 
class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;palette-classic&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;axisCenteredZero&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;axisColorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;text&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisLabel&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisPlacement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;left&quot;</span><span class="punctuation">,</span><span class="attr">&quot;barAlignment&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;drawStyle&quot;</span><span class="punctuation">:</span> <span class="string">&quot;line&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fillOpacity&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;gradientMode&quot;</span><span class="punctuation">:</span> 
<span class="string">&quot;none&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hideFrom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;viz&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;lineInterpolation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">,</span><span class="attr">&quot;lineStyle&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;fill&quot;</span><span class="punctuation">:</span> <span class="string">&quot;solid&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;lineWidth&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;pointSize&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;scaleDistribution&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;showPoints&quot;</span><span class="punctuation">:</span> <span class="string">&quot;never&quot;</span><span 
class="punctuation">,</span><span class="attr">&quot;spanNulls&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;stacking&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;group&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">,</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;thresholdsStyle&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;off&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;decimals&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span 
class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Bps&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byRegexp&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;.*recv&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.transform&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="string">&quot;negative-Y&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">8</span><span class="punctuation">,</span><span 
class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">13</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">16</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">13</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;max&quot;</span><span class="punctuation">,</span> <span class="string">&quot;lastNotNull&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;displayMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;placement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;right&quot;</span><span class="punctuation">,</span><span class="attr">&quot;showLegend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;sortBy&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Last *&quot;</span><span class="punctuation">,</span><span class="attr">&quot;sortDesc&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span 
class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;single&quot;</span><span class="punctuation">,</span><span class="attr">&quot;sort&quot;</span><span class="punctuation">:</span> <span class="string">&quot;desc&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_send_kbytes_per_second&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125; * on (id) group_left(ip_addr, comment) ikuai_device_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&#123;&#123;ip_addr&#125;&#125;-&#123;&#123;comment&#125;&#125;-send&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_recv_kbytes_per_second&#123;id=~\&quot;device/.*\&quot;, instance=\&quot;$instance\&quot;&#125; * on (id) group_left(ip_addr, comment) ikuai_device_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&#123;&#123;ip_addr&#125;&#125;-&#123;&#123;comment&#125;&#125;-recv&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Device Network IO/s&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> 
<span class="string">&quot;timeseries&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;description&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fieldConfig&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;defaults&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;palette-classic&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;custom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;axisCenteredZero&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;axisColorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;text&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisLabel&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;axisPlacement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;left&quot;</span><span 
class="punctuation">,</span><span class="attr">&quot;barAlignment&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;drawStyle&quot;</span><span class="punctuation">:</span> <span class="string">&quot;line&quot;</span><span class="punctuation">,</span><span class="attr">&quot;fillOpacity&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;gradientMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hideFrom&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;viz&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;lineInterpolation&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">,</span><span class="attr">&quot;lineStyle&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;fill&quot;</span><span class="punctuation">:</span> <span class="string">&quot;solid&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;lineWidth&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span 
class="attr">&quot;pointSize&quot;</span><span class="punctuation">:</span> <span class="number">5</span><span class="punctuation">,</span><span class="attr">&quot;scaleDistribution&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linear&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;showPoints&quot;</span><span class="punctuation">:</span> <span class="string">&quot;never&quot;</span><span class="punctuation">,</span><span class="attr">&quot;spanNulls&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;stacking&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;group&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">,</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;thresholdsStyle&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;off&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;decimals&quot;</span><span class="punctuation">:</span> <span class="number">2</span><span class="punctuation">,</span><span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span 
class="attr">&quot;thresholds&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;absolute&quot;</span><span class="punctuation">,</span><span class="attr">&quot;steps&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;green&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">null</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;color&quot;</span><span class="punctuation">:</span> <span class="string">&quot;red&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="number">80</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;unit&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Bps&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;overrides&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;matcher&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;byRegexp&quot;</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="string">&quot;.*recv&quot;</span><span 
class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;custom.transform&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="string">&quot;negative-Y&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;gridPos&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;h&quot;</span><span class="punctuation">:</span> <span class="number">8</span><span class="punctuation">,</span><span class="attr">&quot;w&quot;</span><span class="punctuation">:</span> <span class="number">11</span><span class="punctuation">,</span><span class="attr">&quot;x&quot;</span><span class="punctuation">:</span> <span class="number">13</span><span class="punctuation">,</span><span class="attr">&quot;y&quot;</span><span class="punctuation">:</span> <span class="number">16</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;id&quot;</span><span class="punctuation">:</span> <span class="number">11</span><span class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;legend&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;calcs&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;max&quot;</span><span 
class="punctuation">,</span> <span class="string">&quot;mean&quot;</span><span class="punctuation">,</span> <span class="string">&quot;last&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;displayMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;table&quot;</span><span class="punctuation">,</span><span class="attr">&quot;placement&quot;</span><span class="punctuation">:</span> <span class="string">&quot;right&quot;</span><span class="punctuation">,</span><span class="attr">&quot;showLegend&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;tooltip&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;multi&quot;</span><span class="punctuation">,</span><span class="attr">&quot;sort&quot;</span><span class="punctuation">:</span> <span class="string">&quot;none&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;targets&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;ikuai_network_send_kbytes_per_second&#123;id=~\&quot;iface/.*\&quot;, instance=\&quot;$instance\&quot;&#125; * on (id) group_left(interface, comment) ikuai_iface_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&#123;&#123;interface&#125;&#125;-&#123;&#123;comment&#125;&#125;-send&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;A&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="punctuation">&#123;</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;editorMode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;code&quot;</span><span class="punctuation">,</span><span class="attr">&quot;expr&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ikuai_network_recv_kbytes_per_second&#123;id=~\&quot;iface/.*\&quot;, instance=\&quot;$instance\&quot;&#125; * on (id) group_left(interface, comment) ikuai_iface_info&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;legendFormat&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;&#123;&#123;interface&#125;&#125;-&#123;&#123;comment&#125;&#125;-recv&quot;</span><span class="punctuation">,</span><span class="attr">&quot;range&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;B&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;Interface Network IO/s&quot;</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;timeseries&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;refresh&quot;</span><span class="punctuation">:</span> <span class="string">&quot;5s&quot;</span><span class="punctuation">,</span><span class="attr">&quot;revision&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;schemaVersion&quot;</span><span class="punctuation">:</span> <span class="number">38</span><span class="punctuation">,</span><span class="attr">&quot;style&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dark&quot;</span><span class="punctuation">,</span><span class="attr">&quot;tags&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;templating&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;list&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span 
class="punctuation">&#123;</span><span class="attr">&quot;current&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;selected&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;text&quot;</span><span class="punctuation">:</span> <span class="string">&quot;10.0.1.253&quot;</span><span class="punctuation">,</span><span class="attr">&quot;value&quot;</span><span class="punctuation">:</span> <span class="string">&quot;10.0.1.253&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;datasource&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;prometheus&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;dp-AfZzIz&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;definition&quot;</span><span class="punctuation">:</span> <span class="string">&quot;label_values(ikuai_version, instance)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;hide&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;includeAll&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;multi&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;name&quot;</span><span class="punctuation">:</span> <span class="string">&quot;instance&quot;</span><span 
class="punctuation">,</span><span class="attr">&quot;options&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span><span class="attr">&quot;query&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;query&quot;</span><span class="punctuation">:</span> <span class="string">&quot;label_values(ikuai_version, instance)&quot;</span><span class="punctuation">,</span><span class="attr">&quot;refId&quot;</span><span class="punctuation">:</span> <span class="string">&quot;StandardVariableQuery&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;refresh&quot;</span><span class="punctuation">:</span> <span class="number">1</span><span class="punctuation">,</span><span class="attr">&quot;regex&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;skipUrlSync&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span><span class="attr">&quot;sort&quot;</span><span class="punctuation">:</span> <span class="number">0</span><span class="punctuation">,</span><span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;query&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">]</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;time&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="attr">&quot;from&quot;</span><span class="punctuation">:</span> <span class="string">&quot;now-6h&quot;</span><span class="punctuation">,</span><span class="attr">&quot;to&quot;</span><span class="punctuation">:</span> <span 
class="string">&quot;now&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;timepicker&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span><span class="attr">&quot;timezone&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">,</span><span class="attr">&quot;title&quot;</span><span class="punctuation">:</span> <span class="string">&quot;爱快网关监控&quot;</span><span class="punctuation">,</span><span class="attr">&quot;uid&quot;</span><span class="punctuation">:</span> <span class="string">&quot;IhtQfGZ4k&quot;</span><span class="punctuation">,</span><span class="attr">&quot;version&quot;</span><span class="punctuation">:</span> <span class="number">4</span><span class="punctuation">,</span><span class="attr">&quot;weekStart&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&quot;</span><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><h3 id="告警配置"><a href="#告警配置" class="headerlink" title="告警配置"></a>Alert configuration</h3><p>This example creates an alert that monitors the public network egress. Create two expressions, A and B: A is the query and B is the condition.</p><p>Set A's <code>Example Range</code> to <code>now-10m to now</code>, with the following <code>Query</code>:</p><figure class="highlight plaintext"><figcaption><span>A</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">absent(ikuai_iface_info&#123;id=&quot;iface/adslct&quot;, internet=&quot;PPPOE&quot;&#125;) OR on() vector(0)</span><br></pre></td></tr></table></figure><p>Under <code>Options</code>, set <code>type</code> to <code>Instant</code>.</p><p>For B, set the operation to <code>Math</code> and the expression to <code>$A == 1</code>.</p><p>Then select <code>B-expression</code> as the <code>Alert Condition</code>, so that the result of B is used as the alert condition.</p><p>Set <code>Alert evaluation behavior</code> to <code>1m for 5m</code>: the rule is evaluated once every minute and fires once the condition has held for
5 minutes.</p><p>Under <code>Add details for your alert</code>, set <code>Summary</code> to:</p><figure class="highlight text"><figcaption><span>Summary</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">公网出口 &#123;&#123; $labels.id &#125;&#125; 下线超过 5 分钟！</span><br></pre></td></tr></table></figure><p>Configure the remaining fields as needed. Once the settings described above are saved, the rule takes effect. The final result looks like this:</p><p><img src="https://images.imoe.tech/blog/Snipaste_2023-01-04_12-40-55.png" alt="出口监控效果"></p><p>The alerting built into Grafana works much like Alertmanager, so it is easier to configure if you already know Alertmanager.</p><p><img src="https://images.imoe.tech/blog/Snipaste_2023-01-04_12-49-20.png" alt="告警邮件"></p><article class="message is-info">        <div class="message-header"><p><i class="far fa-edit mr-2"></i>Revision history</p></div>        <div class="message-body">            <ul><li><strong>2023-09-28</strong>: Fixed multi-instance compatibility of the Grafana dashboard. Issued by @scoryyi</li><li><strong>2023-08-30</strong>: Added an explanation of the Options parameter in the alert settings.</li><li><strong>2023-08-26</strong>: Fixed the data source error after importing the dashboard. Issued by @kun</li><li><strong>2023-04-27</strong>: Fixed an error in the exporter start command. Issued by @cybertech</li></ul>        </div>    </article>]]></content>
    
    
    <summary type="html">&lt;img alt=&quot;&quot; a=&quot;&lt;&quot; src=&quot;https://shields.imoe.tech/badge/爱快-v3.7.1--v3.7.6-brightgreen&quot;&gt;

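&lt;p&gt;The iKuai soft router ships with fairly complete monitoring, but it is iKuai's own closed system and not very flexible: it covers basic use only, cannot feed Prometheus, and is even less suited to alerting.&lt;/p&gt;
&lt;p&gt;On top of that, a VM running 网心云 recently stayed down for several days before I noticed, so setting up monitoring and alerting for the home network became urgent. It needs to meet a few requirements:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Track each device's daily traffic&lt;/li&gt;
&lt;li&gt;Monitor and alert on public network connectivity&lt;/li&gt;
&lt;li&gt;Monitor and alert on device online status&lt;/li&gt;
&lt;/ol&gt;</summary>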
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="工程技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%B7%A5%E7%A8%8B%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="实践" scheme="https://blog.imoe.tech/tags/%E5%AE%9E%E8%B7%B5/"/>
    
    <category term="HomeLab" scheme="https://blog.imoe.tech/tags/HomeLab/"/>
    
  </entry>
  
  <entry>
    <title>Session Authentication with JWT</title>
    <link href="https://blog.imoe.tech/2022/11/14/47-jwt-based-token-validator/"/>
    <id>https://blog.imoe.tech/2022/11/14/47-jwt-based-token-validator/</id>
    <published>2022-11-13T16:00:55.000Z</published>
    <updated>2023-01-03T15:43:13.173Z</updated>
    
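    <content type="html"><![CDATA[<p>Implementing Token authentication with JWT is simple: the relatively complex signing and verification algorithms are already packaged in libraries and are easy to use.</p><p>The real difficulty in applying JWT lies in fitting it to the business, because not every business suits JWT's native characteristics. We often choose JWT for being stateless and easy to verify, yet real business scenarios frequently need Revoke.</p><p>Supporting Revoke makes the Token stateful, which creates a dilemma. If I am stateful anyway, why use JWT at all? Wouldn't it be simpler to generate an ID and store it in Redis, let the Redis entry control whether the Token is valid, and delete it on expiry?</p><p>In the end I chose JWT as the Token, mainly for these reasons:</p><ol><li>JWT implementations are mature, with plenty of libraries;</li><li>there is no need to design my own Token structure and verification algorithm;</li><li>JWT allows pre-validation (well-formedness and expiry), which avoids querying the cache on every request;</li><li>JWT can carry data in the Payload, which is flexible and lets some data travel in the Token instead of being looked up;</li><li>JWT supports many signing methods, which leaves room for future evolution and allows issuing in one place and verifying in many.</li></ol><span id="more"></span><h2 id="功能点"><a href="#功能点" class="headerlink" title="功能点"></a>Features</h2><p>Using JWT as the authentication Token should meet the following requirements (some are specific to my current business and uncommon in other systems).</p><h3 id="Token-包含用户基本信息"><a href="#Token-包含用户基本信息" class="headerlink" title="Token 包含用户基本信息"></a>Token carries basic user information</h3><p>Carrying basic user information lets the system identify the user from the Token during authentication. The JWT Payload can carry data, so we can store the user's uid and similar information there.</p><h3 id="防止篡改"><a href="#防止篡改" class="headerlink" title="防止篡改"></a>Tamper protection</h3><p>An attacker must not be able to change the identity a Token represents by forging or modifying the Payload. JWT verification is based on a signature: if the content is altered, signature verification fails, so tampering is effectively prevented.</p><h3 id="支持不同有效期"><a href="#支持不同有效期" class="headerlink" title="支持不同有效期"></a>Support for different validity periods</h3><p>The system I am building needs two kinds of Token: short-term Tokens, which users can generate through the API, and long-term Tokens, which administrators generate in the management system.</p><p>JWT lets you specify the expiry at generation time, so both kinds can be produced.</p><h3 id="长期-Token-需要支持-Revoke"><a href="#长期-Token-需要支持-Revoke" class="headerlink" title="长期 Token 需要支持 Revoke"></a>Long-term Tokens must support Revoke</h3><p>JWTs are usually stateless because they are typically issued as short-term Tokens valid for only a few hours. Even if a Token leaks, the impact is brief: it becomes invalid once it expires, so Revoke is not needed.</p><p>This is one of JWT's strengths: Tokens need no dedicated maintenance or storage. Here, however, long-term Tokens are required, and a stateless long-term Token cannot be kept safe: if it is stolen and cannot be invalidated, the security risk persists.</p><p>Implementing Revoke clearly requires persisting Tokens. Tokens can be divided by validity period:</p><ol><li>Short-term Token: stateless, generated only through the API, no Revoke;</li><li>Long-term Token: stateful, generated by an administrator in the admin console, written to a database table on creation, cached in Redis, and revocable.</li></ol><p>When a long-term Token is revoked, its state is updated in both the table and the Token cache, so it is invalidated immediately.</p><p>Token validation proceeds in these steps:</p><ol><li>Verify the JWT signature first.</li><li>Based on the
Token's expiry time, if it is a long-term Token, check the cache to see whether it has been invalidated.</li></ol><h2 id="实现"><a href="#实现" class="headerlink" title="实现"></a>Implementation</h2><p>The JWT implementation relies on the third-party Go package <code>github.com/golang-jwt/jwt/v4</code>, which can be fetched with:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">go get github.com/golang-jwt/jwt/v4</span><br></pre></td></tr></table></figure><p>First, define the JWT Payload structure that holds our Payload data. In this package the Payload is called Claims, which is an interface.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Token <span class="keyword">struct</span> &#123;</span><br><span class="line">jwt.RegisteredClaims</span><br><span class="line"></span><br><span class="line">Sub     <span class="type">string</span> <span class="string">`json:&quot;sub&quot;`</span>      <span class="comment">// User Name</span></span><br><span class="line">Uid     <span class="type">uint64</span> <span class="string">`json:&quot;uid&quot;`</span>      <span class="comment">// App Id</span></span><br><span class="line">TokenId <span class="type">uint64</span> <span class="string">`json:&quot;token_id&quot;`</span> <span class="comment">// Token Id, uniquely identifies the Token</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Besides user information, the Payload includes a Token Id that identifies the Token. It allows the Token's cache entry to be located quickly when checking validity.</p><p><code>jwt.RegisteredClaims</code> is a struct from the JWT package that implements the <code>Claims</code> interface. Embedding it makes our struct a <code>Claims</code> as well, so jwt writes the Payload data into our fields automatically when parsing.</p><h3 id="校验"><a href="#校验" class="headerlink" title="校验"></a>Verification</h3><p>The method is implemented as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 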
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(j *JwtChecker)</span></span> Verify(token <span class="type">string</span>) (*ptoken.Token, <span class="type">error</span>) &#123;</span><br><span class="line">tokenObj, err := jwt.ParseWithClaims(token, &amp;ptoken.Token&#123;&#125;, <span class="function"><span class="keyword">func</span><span class="params">(token *jwt.Token)</span></span> (<span class="keyword">interface</span>&#123;&#125;, <span class="type">error</span>) &#123;</span><br><span class="line"><span class="keyword">if</span> _, ok := token.Method.(*jwt.SigningMethodHMAC); !ok &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;unexpected signing method: %v&quot;</span>, token.Header[<span class="string">&quot;alg&quot;</span>])</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> []<span class="type">byte</span>(j.secret), <span class="literal">nil</span></span><br><span class="line">&#125;, jwt.WithJSONNumber())</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, 
err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> claims, ok := tokenObj.Claims.(*ptoken.Token); ok &amp;&amp; tokenObj.Valid &#123;</span><br><span class="line"><span class="keyword">return</span> claims, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;invalid token: %v&quot;</span>, token)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ol><li>The method first uses the <code>Keyfunc</code> to check that the signing method is HMAC.</li><li>If <code>ParseWithClaims</code> returns no error, the Token was parsed successfully.</li><li>It then checks that the Token is valid and converts the claims to our type.</li></ol><h3 id="长期-Token-判断"><a href="#长期-Token-判断" class="headerlink" title="长期 Token 判断"></a>Detecting long-term Tokens</h3><p>The Token's expiry time can be read directly from the parsed Token object.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Token)</span></span> IsLongToken() <span class="type">bool</span> &#123;</span><br><span class="line"><span class="keyword">return</span> IsTimeLongToken(t.ExpiresAt.Time)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">IsTimeLongToken</span><span class="params">(expireTime time.Time)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">now := time.Now()</span><br><span class="line"><span class="keyword">return</span> 
expireTime.After(now.Add(DefaultShortTermTokenPeriod))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>If the Token is a long-term Token, we then query the cache to check whether it is still valid.</p><h3 id="生成-Token"><a href="#生成-Token" class="headerlink" title="生成 Token"></a>Generating a Token</h3><p>Generating a Token is simple: calling <code>NewWithClaims</code> and then <code>SignedString</code> produces a signed Token. HMAC-SHA256 (<code>HS256</code>) is used as the signing algorithm here.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">DoGenerateToken</span><span class="params">(token *Token, secret <span class="type">string</span>, expire time.Duration)</span></span> (*TokenHolder, <span class="type">error</span>) &#123;</span><br><span class="line">now := time.Now()</span><br><span class="line">token.IssuedAt = jwt.NewNumericDate(now)</span><br><span class="line">token.ExpiresAt = jwt.NewNumericDate(now.Add(expire))</span><br><span class="line"></span><br><span class="line">jwtToken := jwt.NewWithClaims(jwt.SigningMethodHS256, token)</span><br><span class="line">signed, err := jwtToken.SignedString([]<span class="type">byte</span>(secret))</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span 
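class="line"><span class="keyword">return</span> &amp;TokenHolder&#123;</span><br><span class="line">Token:    token,</span><br><span class="line">JwtToken: signed,</span><br><span class="line">&#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The Token's expiry time must be set manually on the <code>ExpiresAt</code> field before signing. The library calls do not take it as a parameter.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>What this post describes differs little from ordinary JWT usage, which is how most people use JWT anyway. For the Revoke requirement, I use the validity period to split Tokens into two cases.</p><blockquote><p>Short-term Tokens follow the conventional JWT approach; long-term Tokens are persisted, and their information is cached at multiple levels.</p></blockquote><p>In our scenario long-term Tokens are rare and revoking them is rarer still, so lookups of persisted Token information are infrequent.</p><p>If long-term Tokens come into heavy use in the future, the Token cache can be changed into a Revoke blacklist that records and queries only revoked Tokens, since revoked Tokens will always be a small minority.</p>]]></content>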
    
    
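    <summary type="html">&lt;p&gt;Implementing Token authentication with JWT is simple: the relatively complex signing and verification algorithms are already packaged in libraries and are easy to use.&lt;/p&gt;
&lt;p&gt;The real difficulty in applying JWT lies in fitting it to the business, because not every business suits JWT's native characteristics. We often choose JWT for being stateless and easy to verify, yet real business scenarios frequently need Revoke.&lt;/p&gt;
&lt;p&gt;Supporting Revoke makes the Token stateful, which creates a dilemma. If I am stateful anyway, why use JWT at all? Wouldn't it be simpler to generate an ID and store it in Redis, let the Redis entry control whether the Token is valid, and delete it on expiry?&lt;/p&gt;
&lt;p&gt;In the end I chose JWT as the Token, mainly for these reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;JWT implementations are mature, with plenty of libraries;&lt;/li&gt;
&lt;li&gt;there is no need to design my own Token structure and verification algorithm;&lt;/li&gt;
&lt;li&gt;JWT allows pre-validation (well-formedness and expiry), which avoids querying the cache on every request;&lt;/li&gt;
&lt;li&gt;JWT can carry data in the Payload, which is flexible and lets some data travel in the Token instead of being looked up;&lt;/li&gt;
&lt;li&gt;JWT supports many signing methods, which leaves room for future evolution and allows issuing in one place and verifying in many.&lt;/li&gt;
&lt;/ol&gt;</summary>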
    
    
    
    <category term="Tech" scheme="https://blog.imoe.tech/categories/Tech/"/>
    
    <category term="工程技术" scheme="https://blog.imoe.tech/categories/Tech/%E5%B7%A5%E7%A8%8B%E6%8A%80%E6%9C%AF/"/>
    
    
    <category term="技术" scheme="https://blog.imoe.tech/tags/%E6%8A%80%E6%9C%AF/"/>
    
    <category term="实践" scheme="https://blog.imoe.tech/tags/%E5%AE%9E%E8%B7%B5/"/>
    
  </entry>
  
</feed>
