---
title: "My first deploys for a new Kubernetes cluster"
desc: "This is documentation for myself, but you may enjoy it too"
date: 2024-11-03
hero:
  ai: Photo by Xe Iaso, iPhone 13 Pro
  file: cloudfront
  prompt: "An airplane window looking out to cloudy skies."
---

I'm setting up some cloud Kubernetes clusters for a bit coming up on the blog. As a result, I need some documentation on what a "standard" cluster looks like. This is that documentation.

<Conv name="Mara" mood="hacker">
  Every Kubernetes term is WrittenInGoPublicValueCase. If you aren't sure what
  one of those terms means, google "site:kubernetes.io KubernetesTerm".
</Conv>

I'm assuming that the cluster is named `mechonis`.

For the "core" of a cluster, I need these services set up:

- Secret syncing with the [1Password operator](https://developer.1password.com/docs/k8s/k8s-operator/)
- Certificate management with [cert-manager](https://cert-manager.io/)
- DNS management with [external-dns](https://kubernetes-sigs.github.io/external-dns/v0.15.0/)
- HTTP ingress with [ingress-nginx](https://kubernetes.github.io/ingress-nginx/)
- High-latency high-volume storage with [csi-s3](https://github.com/yandex-cloud/k8s-csi-s3) pointed to [Tigris](https://tigrisdata.com) (technically optional, but including it for consistency)
- The [metrics-server](https://github.com/kubernetes-sigs/metrics-server) so [k9s](https://k9scli.io) can see how much free CPU and RAM the cluster has

These cover the three core features of any cloud deployment: compute, network, and storage. Most of my data will live in the default StorageClass implementation provided by the platform (or, on baremetal clusters, something like [Longhorn](https://longhorn.io)), so the csi-s3 StorageClass is more of an "I need lots of data but am cheap" option than anything.

Most of this will be managed with [helmfile](https://github.com/helmfile/helmfile), but 1Password can't be.

## 1Password

The most important thing at the core of my k8s setups is the [1Password operator](https://developer.1password.com/docs/k8s/k8s-operator/). This syncs 1Password secrets to my Kubernetes clusters, so I don't need to define them in Secrets manually or risk putting the secret values into my OSS repos. This is done separately because I'm not able to use helmfile for it.

After you have [the `op` command set up](https://developer.1password.com/docs/cli/get-started/), create a new Connect server with access to the `Kubernetes` vault:

```
op connect server create mechonis --vaults Kubernetes
```

Then install the 1Password Connect Helm release with `operator.create` set to `true`, using the `1password-credentials.json` file the previous command wrote:

```
helm repo add \
  1password https://1password.github.io/connect-helm-charts/
helm install \
  connect \
  1password/connect \
  --set-file connect.credentials=1password-credentials.json \
  --set operator.create=true \
  --set operator.token.value=$(op connect token create --server mechonis --vault Kubernetes)
```

Now you can deploy OnePasswordItem resources as normal:

```yaml
apiVersion: onepassword.com/v1
kind: OnePasswordItem
metadata:
  name: falin
spec:
  itemPath: vaults/Kubernetes/items/Falin
```
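The operator syncs each OnePasswordItem into a Secret with the same name, so workloads consume it like any other Secret. Here's a sketch of what that looks like (the `falin` name matches the example above; the Deployment itself is hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app # hypothetical workload
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: web
          image: nginx
          envFrom:
            # every field of the 1Password item becomes an environment variable
            - secretRef:
                name: falin
```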

## cert-manager, ingress-nginx, metrics-server, and csi-s3

In the cluster folder, create a file called `helmfile.yaml`. Copy these contents:

<details>
<summary>helmfile.yaml</summary>

```yaml
repositories:
  - name: jetstack
    url: https://charts.jetstack.io
  - name: csi-s3
    url: cr.yandex/yc-marketplace/yandex-cloud/csi-s3
    oci: true
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
  - name: metrics-server
    url: https://kubernetes-sigs.github.io/metrics-server/

releases:
  - name: cert-manager
    kubeContext: mechonis
    chart: jetstack/cert-manager
    createNamespace: true
    namespace: cert-manager
    version: v1.16.1
    set:
      - name: installCRDs
        value: "true"
      - name: prometheus.enabled
        value: "false"
  - name: csi-s3
    kubeContext: mechonis
    chart: csi-s3/csi-s3
    namespace: kube-system
    set:
      - name: "storageClass.name"
        value: "tigris"
      - name: "secret.accessKey"
        value: ""
      - name: "secret.secretKey"
        value: ""
      - name: "secret.endpoint"
        value: "https://fly.storage.tigris.dev"
      - name: "secret.region"
        value: "auto"
  - name: ingress-nginx
    chart: ingress-nginx/ingress-nginx
    kubeContext: mechonis
    namespace: ingress-nginx
    createNamespace: true
  - name: metrics-server
    kubeContext: mechonis
    chart: metrics-server/metrics-server
    namespace: kube-system
```

</details>

Create a new admin access token in the [Tigris console](https://console.tigris.dev) and copy its access key ID and secret access key into `secret.accessKey` and `secret.secretKey` respectively.
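If you'd rather not commit the keys to the repo, helmfile renders `helmfile.yaml` as a template, so you can pull them from the environment instead. A sketch of the relevant `set:` entries (the environment variable names are my own choice):

```yaml
    set:
      - name: "secret.accessKey"
        value: '{{ requiredEnv "TIGRIS_ACCESS_KEY_ID" }}'
      - name: "secret.secretKey"
        value: '{{ requiredEnv "TIGRIS_SECRET_ACCESS_KEY" }}'
```

`requiredEnv` makes helmfile fail loudly if the variable is unset, which beats silently deploying empty credentials.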

Run `helmfile apply`:

```
$ helmfile apply
```

This will take a second to think, and then everything should be set up. The LoadBalancer Service may take a minute or ten to get a public IP depending on which cloud you are setting things up on, but once it's done you can proceed to setting up DNS.
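Once the release settles, the `tigris` StorageClass from csi-s3 can back a PersistentVolumeClaim. A minimal sketch (the claim name is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: big-cheap-data # hypothetical name
spec:
  accessModes:
    - ReadWriteMany # csi-s3 volumes can be mounted by many Pods at once
  storageClassName: tigris
  resources:
    requests:
      storage: 100Gi
```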

## external-dns

The next kinda annoying part is getting [external-dns](https://kubernetes-sigs.github.io/external-dns/latest/) set up. It's something that looks like it should be packageable with something like Helm, but realistically it's such a generic tool that you're really better off making your own manifests and deploying it by hand. In my setup, I use these features of external-dns:

- The [AWS Route 53](https://aws.amazon.com/route53/) DNS backend
- The [AWS DynamoDB](https://aws.amazon.com/dynamodb/) registry to remember what records should be set in Route 53

You will need two DynamoDB tables:

- `external-dns-crd-mechonis`: for records created with DNSEndpoint resources
- `external-dns-ingress-mechonis`: for records created with Ingress resources

Create a Terraform configuration for setting up these DynamoDB tables:

<details>
<summary>main.tf</summary>

```hcl
terraform {
  backend "s3" {
    bucket = "within-tf-state"
    key    = "k8s/mechonis/external-dns"
    region = "us-east-1"
  }
}

resource "aws_dynamodb_table" "external_dns_crd" {
  name           = "external-dns-crd-mechonis"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  table_class    = "STANDARD"

  attribute {
    name = "k"
    type = "S"
  }

  hash_key = "k"
}

resource "aws_dynamodb_table" "external_dns_ingress" {
  name           = "external-dns-ingress-mechonis"
  billing_mode   = "PROVISIONED"
  read_capacity  = 1
  write_capacity = 1
  table_class    = "STANDARD"

  attribute {
    name = "k"
    type = "S"
  }

  hash_key = "k"
}
```

</details>

Create the tables with `terraform apply`:

```
terraform init
terraform apply --auto-approve # yolo!
```

While that cooks, head over to `~/Code/Xe/x/kube/rhadamanthus/core/external-dns` and copy the contents to `~/Code/Xe/x/kube/mechonis/core/external-dns`. Then open `deployment-crd.yaml` and replace the DynamoDB table in the `crd` container's args:

```diff
         args:
         - --source=crd
         - --crd-source-apiversion=externaldns.k8s.io/v1alpha1
         - --crd-source-kind=DNSEndpoint
         - --provider=aws
         - --registry=dynamodb
         - --dynamodb-region=ca-central-1
-        - --dynamodb-table=external-dns-crd-rhadamanthus
+        - --dynamodb-table=external-dns-crd-mechonis
```

And in `deployment-ingress.yaml`:

```diff
         args:
         - --source=ingress
-        - --default-targets=rhadamanthus.xeserv.us
+        - --default-targets=mechonis.xeserv.us
         - --provider=aws
         - --registry=dynamodb
         - --dynamodb-region=ca-central-1
-        - --dynamodb-table=external-dns-ingress-rhadamanthus
+        - --dynamodb-table=external-dns-ingress-mechonis
```

Apply these configs with `kubectl apply`:

```
kubectl apply -k .
```

Then write a DNSEndpoint pointing to the created LoadBalancer. You may have to look up the IP addresses in the admin console of the cloud platform in question.

<details>
<summary>load-balancer-dns.yaml</summary>

```yaml
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: load-balancer-dns
spec:
  endpoints:
    - dnsName: mechonis.xeserv.us
      recordTTL: 3600
      recordType: A
      targets:
        - whatever.ipv4.goes.here
    - dnsName: mechonis.xeserv.us
      recordTTL: 3600
      recordType: AAAA
      targets:
        - 2000:something:goes:here:lol
```

</details>

Apply it with `kubectl apply`:

```
kubectl apply -f load-balancer-dns.yaml
```

This will point `mechonis.xeserv.us` to the LoadBalancer, which will point to ingress-nginx based on Ingress configurations, which will route to your Services and Deployments, using Certs from cert-manager.

## cert-manager ACME issuers

Copy the contents of `~/Code/Xe/x/kube/rhadamanthus/core/cert-manager` to `~/Code/Xe/x/kube/mechonis/core/cert-manager`. Apply them as-is, no changes are needed:

```
kubectl apply -k .
```

This will create `letsencrypt-prod` and `letsencrypt-staging` ClusterIssuers, which will allow the creation of Let's Encrypt certificates in their production and staging environments. 9 times out of 10, you won't need the staging environment, but when you are doing high-churn things involving debugging the certificate issuing setup, the staging environment is very useful because it has a [much higher rate limit](https://letsencrypt.org/docs/staging-environment/) than [the production environment](https://letsencrypt.org/docs/rate-limits/) does.
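When you're debugging issuance, you can point a one-off Certificate at the staging issuer instead of churning through Ingress annotations. A sketch (hostname reused from this post; the names are my own):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hello-staging-test # hypothetical name
spec:
  # cert-manager writes the issued cert into this Secret
  secretName: hello-staging-test-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-staging
  dnsNames:
    - hello.mechonis.xeserv.us
```

Once the staging path works end to end, switch the issuer to `letsencrypt-prod` and delete the test resources.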

## Deploying a "hello, world" workload

<Conv name="Mara" mood="hacker">
  Nearly every term for "unit of thing to do" is taken by different aspects of
  Kubernetes and its ecosystem. The only one that isn't taken is "workload". A
  workload is a unit of work deployed somewhere, in practice this boils down to
  a Deployment, its Service, any PersistentVolumeClaims, Ingresses, or other
  resources that it needs in order to run.
</Conv>

Now you can put everything to the test by making a simple "hello, world" workload. This will include:

- A ConfigMap to store HTML to show to the user
- A Deployment to run nginx pointed at the contents of the ConfigMap
- A Service to give an internal DNS name for that Deployment's Pods
- An Ingress to route traffic to that Service from the public Internet

Make a folder called `hello-world` and put these files in it:

<details>
<summary>configmap.yaml</summary>

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hello-world
data:
  index.html: |
    <html>
    <head>
      <title>Hello World!</title>
    </head>
    <body>Hello World!</body>
    </html>
```

</details>
<details>
<summary>deployment.yaml</summary>

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  selector:
    matchLabels:
      app: hello-world
  replicas: 1
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: html
          configMap:
            name: hello-world
```

</details>
<details>
<summary>service.yaml</summary>

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  ports:
    - port: 80
      protocol: TCP
  selector:
    app: hello-world
```

</details>
<details>
<summary>ingress.yaml</summary>

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - hello.mechonis.xeserv.us
      secretName: hello-mechonis-xeserv-us-tls
  rules:
    - host: hello.mechonis.xeserv.us
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-world
                port:
                  number: 80
```

</details>
<details>
<summary>kustomization.yaml</summary>

```yaml
resources:
  - configmap.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
```

</details>

Then apply it with `kubectl apply`:

```
kubectl apply -k .
```

It will take a minute or two to converge. Here's what happens, in order, so you can validate each step:

- The Ingress object has the `cert-manager.io/cluster-issuer: "letsencrypt-prod"` annotation, which triggers cert-manager to create a Cert for the Ingress
- The Cert notices that there's no data in the Secret `hello-mechonis-xeserv-us-tls` in the default Namespace, so it creates an Order for a new certificate from the `letsencrypt-prod` ClusterIssuer (set up in the cert-manager apply step earlier)
- The Order creates a new Challenge for that certificate, setting a DNS record in Route 53 and then waiting until it can validate that the Challenge matches what it expects
- cert-manager asks Let's Encrypt to check the Challenge
- The Order succeeds and the certificate data is written to the Secret `hello-mechonis-xeserv-us-tls` in the default Namespace
- ingress-nginx is informed that the Secret has been updated and reloads its configuration accordingly
- HTTPS routing is set up for the `hello-world` service so every request to `hello.mechonis.xeserv.us` points to the Pods managed by the `hello-world` Deployment
- external-dns checks for the presence of newly created Ingress objects it doesn't know about, and creates Route 53 entries for them

This takes the `hello-world` workload from nothing to fully working in about five minutes tops, often less depending on how lucky you get with the response time of the Route 53 API. If it doesn't work, run through these resources in order in [k9s](https://k9scli.io/):

- The `external-dns-ingress` Pod logs
- The `cert-manager` Pod logs
- Look for the Cert: is it marked as Ready?
- Look for that Cert's Order: does it show any errors in its list of events?
- Look for that Order's Challenge: does it show any errors in its list of events?

<Conv name="Mara" mood="hacker">
  By the way: k9s is fantastic. You should have it installed if you deal with
  Kubernetes. It should be baked into kubectl. It's a near perfect tool.
</Conv>

## Conclusion

From here you can deploy anything else you want, as long as the workload configuration kinda looks like the `hello-world` configuration. Namely, you MUST have the following things set:

- Ingress objects MUST have the `cert-manager.io/cluster-issuer: "letsencrypt-prod"` annotation; if they don't, no TLS certificate will be minted
- Ingress objects MUST have the `nginx.ingress.kubernetes.io/ssl-redirect: "true"` annotation to ensure that all plain HTTP traffic is upgraded to HTTPS
- Sensitive data MUST be managed in 1Password via OnePasswordItem objects

<Conv name="Cadey" mood="enby">
  If you work at a cloud provider that offers managed Kubernetes, I'm looking
  for a new place to put my website, sponsorship would be greatly appreciated!
</Conv>

Happy kubeing all!