
Kubernetes

Core Concepts

Core components:

  • master nodes
    • etcd - distributed, reliable key-value store that is simple, secure and fast
    • kube-apiserver - only this component communicates with etcd
    • kube-scheduler
    • controller-manager
      • Node Controller
      • Replication Controller
  • worker nodes
    • kubelet
    • kube-proxy
    • container runtime engine: Docker, rkt

etcd

View demo here

To play with Docker:

  • run etcd as container:

    ETCD_DOCKER_IMAGE='quay.io/coreos/etcd:v3.3.25@sha256:ff9226afaecbe1683f797f84326d1494092ac41d688b8d68b69f7a6462d51dc9'
    docker run -it --name=etcd --rm "${ETCD_DOCKER_IMAGE}"
    
  • run commands:

    docker exec -e ETCDCTL_API=3 -it etcd etcdctl put name czerasz
    
    docker exec -e ETCDCTL_API=3 -it etcd etcdctl get name
    

List all keys:

ETCDCTL_API=3 etcdctl get / --prefix --keys-only

To talk to etcd on the master node, do NOT forget to specify the certificates:

kubectl -n kube-system exec etcd-master -- sh -c "ETCDCTL_API=3 etcdctl get / --prefix --keys-only --limit=10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt  --key /etc/kubernetes/pki/etcd/server.key"

NOTE

etcd is distributed as binary or can be run as Docker container (kubeadm)

YAML

Almost all resources consist of:

apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    name: app
spec:
  ...

Kubectl Basic Commands

Get resource:

kubectl get configmap

Get by label:

kubectl get configmap -l tier=frontend
kubectl get pods -l env=production,tier=frontend
kubectl get pods -l 'env in (production, qa),tier notin (frontend)'

Get pods and display their labels:

kubectl get pods --show-labels

Get pods and display specific label:

kubectl get pods -L version

Get pods sorted by created timestamp:

kubectl get pods --sort-by=.metadata.creationTimestamp

Get pods and show specific columns:

kubectl get pods -o=custom-columns="NAME:.metadata.name,STATUS:.status.containerStatuses[].state"

Describe resource:

kubectl describe configmap app
kubectl get configmap app -o yaml

Edit resource:

kubectl edit configmap app

Useful Commands

Get nodes:

kubectl get node -o wide

Get cluster metrics:

kubectl top node
kubectl top pod

NOTE

For these commands to work, the metrics-server needs to be installed.

The simplest way to create a pod YAML definition file:

kubectl run nginx --image=nginx --dry-run -o yaml > pod.yaml

Create a pod YAML file:

kubectl run --generator=run-pod/v1 nginx --image=nginx --dry-run -o yaml > pod.yaml

Create a deployment YAML file:

kubectl create deployment --image=nginx nginx --dry-run -o yaml > deployment.yaml

Get endpoints:

kubectl get endpoints

Delete pod with force:

kubectl delete pod app --grace-period=0 --force

NOTE

The related containers could still be running on the worker node.

Create deployment and then scale it imperatively:

kubectl create deployment app --image=nginx
kubectl scale deployment/app --replicas=2

Create a pod and expose it via service:

$ kubectl run httpd --image=httpd:alpine --port=80 --expose
service/httpd created
pod/httpd created

Create a service called app to expose the web application within the cluster on port 8080:

kubectl expose pod web --port=8080 --name app

Check previous logs of the crashed pod:

kubectl logs app --previous

Wait for deployment to be available:

while ! kubectl wait --for=condition=available --timeout=600s deployment/kiali -n istio-system; do sleep 1; done

Client Communication

Clients always talk to the kube-apiserver. For the communication one can use:

  • kubectl

  • curl

    curl 
    
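A minimal curl sketch against the API server, assuming the CA and client certificates were extracted from the kubeconfig (file names and the server address are assumptions):

curl --cacert ca.crt \
  --cert client.crt \
  --key client.key \
  https://master-node-ip:6443/api/v1/pods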

Scheduling

The scheduler uses the following algorithm to decide where pods should be placed:

  • filter nodes:

    • filter out nodes which don't meet the resource requirements - too little available CPU or memory (see the sketch after this list)
    • filter nodes based on taints and tolerations
    • filter nodes based on affinity rules
  • rank nodes: the rank is a number between 0 and 10

    A node with more available CPU or memory gets a higher rank.
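
The resource filter works against the pod's resource requests - a minimal sketch of a pod that only fits on nodes with at least 1 CPU and 1Gi of memory still available:

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "1"
          memory: 1Gi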

Get Kubernetes events:

kubectl get events

Static Pods

The path of the directory holding static pod definition files can be found with:

$ grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests
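
To create a static pod, drop a plain pod manifest into that directory - the kubelet on that node picks it up without going through the kube-apiserver (the file name below is an assumption):

cat <<EOF > /etc/kubernetes/manifests/static-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-app
spec:
  containers:
    - name: app
      image: nginx
EOF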

Manual Pod Scheduling

pod-bind-definition.yml:

apiVersion: v1
kind: Binding
metadata:
  name: 
target:
  apiVersion: v1
  kind: Node
  name: node01
curl -XPOST -H 'content-type: application/json' \
  --data "$(cat pod-bind-definition.yml | tojson)" \
  "https://${SERVER}/api/v1/namespaces/default/pods/${POD}/binding/"

Simple node assignment:

apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    name: app
spec:
  nodeName: node01
  ...

Taints

NOTE

Pods which tolerate a taint can still be scheduled on other nodes which are NOT tainted at all - a toleration does not pin the pod to the tainted node.

Set taint on a node:

kubectl taint nodes node1 key=value:taint-effect

The taint effect can be one of:

  • NoSchedule
  • PreferNoSchedule
  • NoExecute - will apply to existing pods. Existing pods will be evicted if they don’t tolerate the taint

Example:

kubectl taint nodes node1 tier=backend:NoSchedule

Add toleration to a pod:

apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    name: app
spec:
  containers:
    - name: app
      image: "nginx:latest"
  tolerations:
    - key: tier
      operator: Equal
      value: backend
      effect: NoSchedule

View the master node's taint:

$ kubectl describe node master | grep -i taint
Taints:             node-role.kubernetes.io/master:NoSchedule

An empty key with operator Exists matches all keys, values and effects which means this will tolerate everything:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  labels:
    name: filebeat
spec:
  selector:
    matchLabels:
      name: filebeat
  template:
    metadata:
      name: filebeat
      labels:
        name: filebeat
    spec:
      containers:
        - name: filebeat
          image: "filebeat:latest"
      tolerations:
        - operator: "Exists"


Node Affinity

Add test-node-affinity=test label to node:

kubectl label nodes node1 test-node-affinity=test

Specify nodeAffinity in a DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  labels:
    name: filebeat
spec:
  selector:
    matchLabels:
      name: filebeat
  template:
    metadata:
      name: filebeat
      labels:
        name: filebeat
    spec:
      containers:
        - name: filebeat
          image: "filebeat:latest"
      affinity:
        nodeAffinity:
          # requiredDuringScheduling - the pod must be scheduled to node(s) that match the expressions listed under matchExpressions
          # IgnoredDuringExecution - node affinity only applies during pod scheduling, it doesn't apply to already running pods
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: test-node-affinity
                    operator: In
                    values:
                      - test

Types of node affinity

  • preferredDuringSchedulingIgnoredDuringExecution
  • requiredDuringSchedulingIgnoredDuringExecution
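
The preferred variant is only a preference with a weight (1-100), not a hard requirement - a sketch of the relevant pod spec fragment, reusing the test-node-affinity label from above:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: test-node-affinity
              operator: In
              values:
                - test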

Pod Affinity and Anti-Affinity

Put pods close to other pods:

  ...
  template:
    metadata:
      labels:
        name: test
    spec:
      affinity:
        podAntiAffinity:
        # or
        # podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: my-label
                operator: In
                values:
                - test
            # topologyKey defines the topology scope - here placement is decided per node (hostname)
            topologyKey: kubernetes.io/hostname

Node Selector

Add disktype=ssd label to node:

kubectl label nodes node1 disktype=ssd

Use nodeSelector in the pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: server
    image: nginx
  nodeSelector:
    disktype: ssd

Security

Authentication - who can access?

  • Service Accounts - for machines

  • Identity Services - External Authentication Providers

  • Certificates

  • Static password file - simple file with passwords

    Use the following flag when launching the kube-apiserver:

    --basic-auth-file=user-pass.csv
    

    user-pass.csv looks like:

    password1,user1,userID
    password2,user2,u0002
    ...
    

    or

    password1,user1,userID,groupID
    password2,user2,u0002,g0002
    ...
    

    Communicating via curl:

    curl -k https://master-node-ip:6443/api/v1/pods -u "user:password"
    
  • Static token file - a simple file with tokens; works the same way, just with --token-auth-file

Authorisation - what can they do?

  • RBAC Authorisation
  • ABAC Authorisation
  • Node Authorisation
  • Webhook Mode

All communication with the Kube API Server is encrypted using TLS.

Clients who access the cluster:

  • Administrators
  • Developers
  • End Users - access managed by the applications
  • Bots - Service Accounts

TLS Certificates

A Private Key is related to a Public Key (the public lock).

Simple Analogy

The public lock (padlock) can only be unlocked with the matching private key.

Both the private and public keys can encrypt data.
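
A minimal sketch of generating such a key pair with openssl, here for a CA (the CN is an assumption):

# generate the private key
openssl genrsa -out ca.key 2048
# create a certificate signing request
openssl req -new -key ca.key -subj "/CN=KUBERNETES-CA" -out ca.csr
# self-sign it - the resulting certificate carries the public key
openssl x509 -req -in ca.csr -signkey ca.key -out ca.crt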

For the core API group leave apiGroups blank:

kind: Role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"]
    resourceNames: ["blue", "orange"]
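
To hand this Role to a user it has to be bound with a RoleBinding - a minimal sketch, assuming the Role is named developer and the user is dev-user:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-user-binding
subjects:
  - kind: User
    name: dev-user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io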

Test user permissions:

kubectl auth can-i create pods --as dev-user

View resources:

  • kubectl api-resources --namespaced=true - can be used with Roles and ClusterRoles (across all namespaces)
  • kubectl api-resources --namespaced=false - can be used with ClusterRoles only

ConfigMap

Imperative way:

kubectl create configmap my-app --from-literal=APP_ENV=dev --from-literal=APP_PORT=8080
kubectl create configmap my-app --from-file=./app.properties

Declarative way:

app.configmap.yml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_ENV: dev
  APP_PORT: "8080"  # values must be strings

Apply it with:

kubectl apply -f app.configmap.yml

Pod integration:

apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    name: app
spec:
  containers:
    - name: app
      image: "ubuntu:latest"
      envFrom:
        - configMapRef:
            name: app-config
      ...
      env:
        - name: APP_ENV
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: APP_ENV
  ...
  volumes:
    - name: app-config-volume
      configMap:
        name: app-config

Secrets

Imperative way:

kubectl create secret generic my-app --from-literal=DB_PASSWORD=passwd --from-literal=API_TOKEN=token

Declarative way:

app.secret.yml:

apiVersion: v1
kind: Secret
metadata:
  name: app-secret
data:
  DB_PASSWORD: cGFzc3dk  # base64 of "passwd" - values under data must be base64-encoded
  API_TOKEN: dG9rZW4=    # base64 of "token"

Apply it with:

kubectl apply -f app.secret.yml

Pod integration:

apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    name: app
spec:
  containers:
    - name: app
      image: "ubuntu:latest"
      envFrom:
        - secretRef:
            name: app-secret
      ...
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secret
              key: DB_PASSWORD
  ...
  volumes:
    - name: app-secret-volume
      secret:
        secretName: app-secret

The volume contains the following files:

$ ls -l /opt/app-secret-volume
-rw-rw-r-- 1 root root  253 Sep 14 11:05 DB_PASSWORD
-rw-rw-r-- 1 root root  253 Sep 14 11:05 API_TOKEN

Pods

Multicontainer patterns:

  • side car - uses a helper container to assist the primary container (see the sketch after this list):
    • logging agents
    • file syncing
    • watchers
  • ambassador - proxy container used to communicate from and to the primary container
    • commonly used to communicate with the DB
  • adapter - presents a standardized interface across multiple pods:
    • commonly used for normalizing output logs and monitoring data
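
A minimal side car sketch - the primary container writes a log file into a shared emptyDir volume and the helper container streams it (names, image and paths are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  volumes:
    - name: logs
      emptyDir: {}
  containers:
    # primary container - writes the log file
    - name: app
      image: busybox
      command: ["/bin/sh", "-c", "while true; do date >> /var/log/app.log; sleep 5; done"]
      volumeMounts:
        - name: logs
          mountPath: /var/log
    # side car - streams the log file
    - name: log-agent
      image: busybox
      command: ["/bin/sh", "-c", "touch /var/log/app.log && tail -f /var/log/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log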

Init Containers

Each init container is run one at a time in sequential order.

If any of the initContainers fail to complete, Kubernetes restarts the Pod repeatedly until the init container succeeds.

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'git clone <some-repository-that-will-be-used-by-application>']
  containers:
  - name: myapp-container
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']

Liveness and Readiness Probes
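
A minimal sketch of both probe types on an nginx container (path, port and timings are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx
      # readinessProbe - the pod only receives traffic once this succeeds
      readinessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10
      # livenessProbe - the container is restarted when this keeps failing
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 15
        periodSeconds: 20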

Deployments

View rollout status:

kubectl rollout status deployment/myapp-deployment

View rollout history and revision:

kubectl rollout history deployment/myapp-deployment

Deployment strategies:

  • rolling-update

    strategy:
      rollingUpdate:
        maxSurge: 25% # how many pods we can add at a time
        maxUnavailable: 25% # how many pods can be unavailable during the rolling update
      type: RollingUpdate
    
  • recreate
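
    With recreate all existing pods are killed before new ones are created, which causes a brief downtime:

    strategy:
      type: Recreate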


Volumes

Empty volume:

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo 'Hi I am from Main container' >> /var/log/index.html; sleep 5;done"]
    name: app
    volumeMounts:
    - name: logs
      mountPath: /var/log

Cluster Maintenance

If a node is down for more than 5 minutes (the default --pod-eviction-timeout on the kube-controller-manager), the pods on that node are terminated.

Remove pods from node:

kubectl drain node01

The drained node is cordoned (marked as unschedulable).

Uncordon the node:

kubectl uncordon node01

Cordon the node (mark node as unschedulable):

kubectl cordon node01

See specific node Kubernetes version:

kubectl get nodes

Components which follow the same Kubernetes versioning:

  • kube-apiserver - version X
  • controller-manager - can be at version [X-1; X]
  • kube-scheduler - can be at version [X-1; X]
  • kubelet - can be at version [X-2; X]
  • kube-proxy - can be at version [X-2; X]
  • kubectl - can be at version [X-1; X+1]

Components which do NOT follow the Kubernetes versioning: etcd and CoreDNS, which have their own release cycles.

Kubernetes supports up to 3 minor versions. The recommended way is to update 1 minor version at a time. First update the master nodes, then the worker nodes.

Update via kubeadm

Update master nodes:

  • update kubeadm

  • update the cluster

    kubeadm upgrade apply v1.12.0
    
  • kubeadm does not update the kubelet

    kubectl get nodes will still show the old version.

  • update the kubelet manually:

    apt-get upgrade -y kubelet=1.12.0-00
    

    Restart the kubelet service

    systemctl restart kubelet
    

Update worker nodes:
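
A sketch of the usual per-worker procedure (package versions match the master example above and are assumptions):

kubectl drain node01 --ignore-daemonsets

# on the worker node itself
apt-get upgrade -y kubeadm=1.12.0-00
kubeadm upgrade node
apt-get upgrade -y kubelet=1.12.0-00
systemctl restart kubelet

# once the node reports the new version
kubectl uncordon node01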


Backup

  • Backup all Kubernetes resources:

    kubectl get all --all-namespaces -o yaml > all-resources.yml
    

    One can also use velero to backup and migrate Kubernetes resources and persistent volumes.

  • snapshot etcd:

    ETCDCTL_API=3 etcdctl snapshot save ./snapshot.db
    

    or take a volume snapshot of /var/lib/etcd (default for --data-dir of etcd).

    NOTE

    View snapshot status with:

    ETCDCTL_API=3 etcdctl snapshot status ./snapshot.db
    

    View demo here
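
    To restore from such a snapshot, a minimal sketch (the target data directory is an assumption; etcd's --data-dir then has to point at it):

    ETCDCTL_API=3 etcdctl snapshot restore ./snapshot.db \
      --data-dir /var/lib/etcd-from-backup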

Ports

Component                   Port
etcd (client API)           2379
etcd (peer API)             2380
kube-apiserver              6443
kubelet                     10250
kube-scheduler              10251
kube-controller-manager     10252
exposed services (NodePort) 30000-32767

Kubernetes: required ports

DNS

Get the app service in test namespace through following domains:

  • app
  • app.test
  • app.test.svc
  • app.test.svc.cluster.local - FQDN (Fully Qualified Domain Name)

NOTE

test.svc.cluster.local, svc.cluster.local and cluster.local are defined as search domains in each pod's /etc/resolv.conf

Get the pod with IP 10.244.10.5 in test namespace through following domains:

  • 10-244-10-5
  • 10-244-10-5.test
  • 10-244-10-5.test.pod
  • 10-244-10-5.test.pod.cluster.local - FQDN (Fully Qualified Domain Name)
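
To verify resolution from inside the cluster, a quick sketch using a throwaway pod (busybox:1.28 ships a working nslookup):

kubectl run test-dns --image=busybox:1.28 --rm -it --restart=Never -- nslookup app.test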

Each pod has a mounted /etc/resolv.conf file which points to the DNS server:

$ cat /etc/resolv.conf
...
nameserver 10.80.0.6

The IP specified for the nameserver is a cluster IP defined for the kube-dns service:

kubectl -n kube-system get svc kube-dns

The DNS server is deployed within the cluster. Kubernetes usually comes with CoreDNS, deployed as a Deployment.

The /etc/coredns/Corefile configuration file is stored within a ConfigMap:

kubectl -n kube-system get configmap coredns

Ingress

Ingress is a layer 7 load balancer built into Kubernetes.
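
A minimal Ingress sketch that routes one host to the app service on port 8080 (host, ingress class and service names are assumptions; an ingress controller has to be installed separately):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 8080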

Environment Variables

...
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace

Pod Disruption Budget

Create PodDisruptionBudget with minimum 2 pods available:

kubectl create pdb pdb-name --min-available 2 --selector app=test

Create PodDisruptionBudget with minimum 50% of pods available:

kubectl create pdb pdb-name --min-available 50% --selector app=test

The declarative equivalent:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: pdb-name
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: test

NOTE

A Pod Disruption Budget can't be edited. One can only recreate it - delete and create it again.

Kubernetes Dashboard

Create resource definition file:

cat <<EOF > kube-dashboard-access.yaml
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: czerasz-dashboard-access
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: czerasz-dashboard-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: czerasz-dashboard-access
  namespace: kube-system
EOF

Create ServiceAccount and ClusterRoleBinding:

$ kubectl apply -f kube-dashboard-access.yaml
serviceaccount/czerasz-dashboard-access created
clusterrolebinding.rbac.authorization.k8s.io/czerasz-dashboard-access created

View the token:

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep czerasz-dashboard-access | awk '{print $1}')

Expose the proxy locally:

kubectl proxy --address=0.0.0.0

Visit http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ and authenticate with the token.

Find more in the documentation.

Prometheus Queries

K8s Monitor Pod CPU and memory usage with Prometheus

Pod/Container Logs

$ ls -1 /var/log/containers/
czerasz-fluentbit-debugging_monitoring_app-879869c3c864e0ebaa519f9d5db10a93a8f29a965c7446d8cae48c0adc88d2ea.log
fluentbit-fluent-bit-6blmm_monitoring_fluent-bit-696c6ef4298cfb80ad37a72a765049e62647b44ff050d52bc59e10d51346e88e.log
vault-2_vault_vault-317f59eadbfb60fa9aa2fe0860082c5238eaf6529b7aa206b8e5a01cab40f55d.log

Debug EKS access to AWS via Service Account

kubectl -n atlantis run --generator=run-pod/v1 awscli --rm -it \
 --image amazon/aws-cli:latest \
 --env=http_proxy=http://proxy.example.com:8888 \
 --env=https_proxy=http://proxy.example.com:8888 \
 --env=no_proxy=10.10.0.1/16,localhost,127.0.0.1,169.254.169.254,.internal,.s3.eu-central-1.amazonaws.com \
 --serviceaccount=atlantis  \
 --command bash

Make sure the ServiceAccount has the required annotation:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/oidc-atlantis
...
