Kubernetes
Core components:
- master nodes
  - etcd - distributed, reliable key-value store that is simple, secure and fast
  - kube-apiserver - only this component communicates with etcd
  - kube-scheduler
  - controller-manager
    - Node Controller
    - Replication Controller
- worker nodes
  - kubelet
  - kube-proxy
  - container runtime engine: Docker, rkt
To play with etcd in Docker:
- run etcd as a container:
  ETCD_DOCKER_IMAGE='quay.io/coreos/etcd:v3.3.25@sha256:ff9226afaecbe1683f797f84326d1494092ac41d688b8d68b69f7a6462d51dc9' docker run -it --name=etcd --rm "${ETCD_DOCKER_IMAGE}"
- run commands:
  docker exec -e ETCDCTL_API=3 -it etcd etcdctl put name czerasz
  docker exec -e ETCDCTL_API=3 -it etcd etcdctl get name
List all keys:
ETCDCTL_API=3 etcdctl get / --prefix --keys-only
To talk to the etcd on the master do NOT forget to specify the certificates:
kubectl -n kube-system exec etcd-master -- sh -c "ETCDCTL_API=3 etcdctl get / --prefix --keys-only --limit=10 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key"
NOTE
etcd is distributed as a binary, or can be run as a container (which is what kubeadm does)
Almost all resources consist of the same top-level fields: apiVersion, kind, metadata and spec.
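For example, a minimal Pod manifest (the name and image are only illustrative) shows these top-level fields:
apiVersion: v1          # API version of the resource
kind: Pod               # resource type
metadata:
  name: example         # name and labels of the resource
  labels:
    app: example
spec:                   # desired state, specific to the resource type
  containers:
  - name: app
    image: nginx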
Get resource:
kubectl get configmap
Get by label:
kubectl get configmap -l tier=frontend
kubectl get pods -l env=production,tier=frontend
kubectl get pods -l 'env in (production, qa),tier notin (frontend)'
Get pods and display their labels:
kubectl get pods --show-labels
Get pods and display specific label:
kubectl get pods -L version
Get pods sorted by created timestamp:
kubectl get pods --sort-by=.metadata.creationTimestamp
Get pods and show specific columns:
kubectl get pods -o custom-columns="NAME:.metadata.name,STATUS:.status.containerStatuses[].state"
Describe resource:
kubectl describe configmap app
Get the full resource definition as YAML:
kubectl get configmap app -o yaml
Edit resource:
kubectl edit configmap app
Get nodes:
kubectl get node -o wide
Get cluster metrics:
kubectl top node
kubectl top pod
NOTE
For these commands to work, the metrics-server needs to be installed.
The simplest way to create a pod YAML definition file:
kubectl run nginx --image=nginx --dry-run -o yaml > pod.yaml
Create a pod YAML file:
kubectl run --generator=run-pod/v1 nginx --image=nginx --dry-run -o yaml > pod.yaml
Create a deployment YAML file:
kubectl create deployment --image=nginx nginx --dry-run -o yaml > deployment.yaml
Get endpoints:
kubectl get endpoints
Delete pod with force:
kubectl delete pod app --grace-period=0 --force
NOTE
The related containers could still run on the worker instance.
Create deployment and then scale it imperatively:
kubectl create deployment app --image=nginx
kubectl scale deployment/app --replicas=2
Create a pod and expose it via service:
$ kubectl run httpd --image=httpd:alpine --port=80 --expose
service/httpd created
pod/httpd created
Create a service called app to expose the web application within the cluster on port 8080:
kubectl expose pod web --port=8080 --name app
Check previous logs of the crashed pod:
kubectl logs app --previous
Wait for deployment to be available:
while ! kubectl wait --for=condition=available --timeout=600s deployment/kiali -n istio-system; do sleep 1; done
Clients always talk to the kube-apiserver. For the communication one can use:
- kubectl
- curl
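For example, curl can be used through kubectl proxy, which handles the authentication locally (the port and resource path below are just an illustration):
# start a local, unauthenticated proxy to the kube-apiserver
kubectl proxy --port=8001 &

# query the API with plain curl through the proxy
curl http://localhost:8001/api/v1/namespaces/default/pods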
The scheduler uses the following algorithm to decide where pods should be placed:
- filter nodes:
  - filter out nodes which don't meet the resource requirements - too little available CPU or memory
  - filter nodes based on taints and tolerations
  - filter nodes based on affinity rules
- rank nodes: the rank is a number between 0-10; if a node has more available CPU or memory left, it gets a higher rank (see the sketch below).
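Both the resource filter and the ranking work against the pod's resource requests; a sketch with illustrative values:
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"      # nodes with less free CPU than this are filtered out
        memory: "256Mi"  # nodes with less free memory than this are filtered out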
Get Kubernetes events:
kubectl get events
The path of the directory holding static pod definition files can be found with:
$ grep staticPodPath /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests
pod-bind-definition.yml:
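The original file is not reproduced here; a minimal sketch of a Binding object (pod and node names are assumptions) which assigns an already created pod to a node:
apiVersion: v1
kind: Binding
metadata:
  name: nginx        # name of the pod to bind
target:
  apiVersion: v1
  kind: Node
  name: node02       # node the pod should land on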
Simple node assignment:
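A sketch of the simplest assignment via spec.nodeName (the node name is an assumption):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeName: node02   # skip the scheduler and place the pod directly on this node
  containers:
  - name: nginx
    image: nginx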
NOTE
Pods which tolerate a taint can be scheduled on a different node, which is NOT tainted at all.
Set taint on a node:
kubectl taint nodes node1 key=value:taint-effect
The taint effect can be one of:
- NoSchedule
- PreferNoSchedule
- NoExecute - will apply to existing pods; existing pods will be evicted if they don't tolerate the taint
Example:
kubectl taint nodes node1 tier=backend:NoSchedule
Add toleration to a pod:
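The original snippet is not shown here; a sketch of a toleration matching the tier=backend:NoSchedule taint from the example above:
spec:
  tolerations:
  - key: "tier"
    operator: "Equal"
    value: "backend"
    effect: "NoSchedule"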
View master nodes taint:
$ kubectl describe node master | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule
An empty key with operator Exists matches all keys, values and effects which means this will tolerate everything:
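A sketch of such a catch-all toleration:
tolerations:
- operator: "Exists"   # empty key with Exists tolerates every taint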
Resources:
- stackoverflow.com: Running a daemonset on all nodes of a kubernetes cluster
- Taints and tolerations, pod and node affinities demystified
Add test-node-affinity=test label to node:
kubectl label nodes node1 test-node-affinity=test
Specify nodeAffinity in a DaemonSet:
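The original manifest is not shown here; a sketch of a DaemonSet limited to nodes carrying the test-node-affinity=test label added above (DaemonSet name and image are assumptions):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: test-node-affinity
                operator: In
                values:
                - test
      containers:
      - name: app
        image: nginx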
Types of node affinity:
- requiredDuringSchedulingIgnoredDuringExecution
- preferredDuringSchedulingIgnoredDuringExecution
Put pods close to other pods:
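The original example is not shown here; a sketch using podAffinity to schedule the pod onto the same node as pods labelled app=cache (the label and the topology key choice are assumptions):
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cache
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx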
Add disktype=ssd label to node:
kubectl label nodes node1 disktype=ssd
Use nodeSelector in the pod definition:
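A sketch of a pod using the disktype=ssd label added above:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: nginx
    image: nginx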
Authentication - who can access?
- Service Accounts - for machines
- Identity Services - external authentication providers
- Certificates
- Static password file - simple file with passwords
Use the following flag when launching the kube-apiserver:
--basic-auth-file=user-pass.csv
user-pass.csv looks like:
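The original sample is not shown here; the file format is password,user,uid with an optional group column (hence the two variants); the values below are illustrative:
password123,user1,u0001
password123,user2,u0002,"group1,group2"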
Communicating via curl:
curl -k https://master-node-ip:6443/api/v1/pods -u "user:password"
Static token file - a simple file with tokens; works the same way, just with the --token-auth-file flag.
Authorisation - what can they do?
- RBAC Authorisation
- ABAC Authorisation
- Node Authorisation
- Webhook Mode
All communication with the kube-apiserver is encrypted using TLS.
Clients who access the cluster:
- Administrators
- Developers
- End Users - access managed by the applications
- Bots - Service Accounts
Private Key is related to a Public Key (public lock).
Simple Analogy
The public lock (padlock) can only be unlocked with the matching private key.
Both the private and public keys can encrypt data.
For the core group leave the apiGroup blank:
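A sketch of a Role for a core-group resource (role name, namespace and verbs are assumptions):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]        # "" selects the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]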
Test user permissions:
kubectl auth can-i create pods --as dev-user
View resources:
- kubectl api-resources --namespaced=true - these resources can be used with Roles and with ClusterRoles (across all namespaces)
- kubectl api-resources --namespaced=false - these resources can be used with ClusterRoles only
Imperative way:
kubectl create configmap my-app --from-literal=APP_ENV=dev --from-literal=APP_PORT=8080
kubectl create configmap my-app --from-file=./app.properties
Declarative way:
app.configmap.yml:
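The original file is not shown here; a sketch consistent with the imperative example above:
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app
data:
  APP_ENV: dev
  APP_PORT: "8080"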
kubectl apply -f app.configmap.yml
Pod integration:
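The original snippet is not shown here; a sketch which injects the whole ConfigMap as environment variables (single keys can also be picked with valueFrom/configMapKeyRef):
spec:
  containers:
  - name: app
    image: nginx
    envFrom:
    - configMapRef:
        name: my-app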
Imperative way:
kubectl create secret generic my-app --from-literal=DB_PASSWORD=passwd --from-literal=API_TOKEN=token
Declarative way:
app.secret.yml:
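The original file is not shown here; a sketch with base64-encoded values matching the imperative example:
apiVersion: v1
kind: Secret
metadata:
  name: my-app
type: Opaque
data:
  DB_PASSWORD: cGFzc3dk   # base64 of "passwd"
  API_TOKEN: dG9rZW4=     # base64 of "token"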
kubectl apply -f app.secret.yml
Pod integration:
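The original snippet is not shown here; a sketch which mounts the Secret as a volume, consistent with the file listing below:
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: app-secret-volume
      mountPath: /opt/app-secret-volume
      readOnly: true
  volumes:
  - name: app-secret-volume
    secret:
      secretName: my-app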
Volume contains the following files:
$ ls -l /opt/app-secret-volume
-rw-rw-r-- 1 root root 253 Sep 14 11:05 DB_PASSWORD
-rw-rw-r-- 1 root root 253 Sep 14 11:05 API_TOKEN
Multicontainer patterns:
- sidecar - uses a helper container to assist the primary container:
- logging agents
- file syncing
- watchers
- ambassador - proxy container used to communicate from and to the primary container
- commonly used to communicate with the DB
- adapter - present a standardized interface across multiple pods:
- commonly used for normalizing output logs and monitoring data
Each init container is run one at a time in sequential order.
If any of the initContainers fail to complete, Kubernetes restarts the Pod repeatedly until the init container succeeds.
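The original example is not shown here; a sketch with an init container that blocks until a db service resolves (the service name is an assumption):
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    command: ["sh", "-c", "until nslookup db; do echo waiting for db; sleep 2; done"]
  containers:
  - name: app
    image: nginx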
View rollout status:
kubectl rollout status deployment/myapp-deployment
View rollout history and revision:
kubectl rollout history deployment/myapp-deployment
Deployment strategies:
- rolling-update:
  strategy:
    rollingUpdate:
      maxSurge: 25%        # how many pods we can add at a time
      maxUnavailable: 25%  # how many pods can be unavailable during the rolling update
    type: RollingUpdate
- recreate
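For comparison, a sketch of the recreate strategy, where all old pods are terminated before new ones are created:
strategy:
  type: Recreate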
Empty volume:
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo 'Hi I am from Main container' >> /var/log/index.html; sleep 5; done"]
    name: app
    volumeMounts:
    - name: logs
      mountPath: /var/log
If a node is down for more than 5 minutes (default --pod-eviction-timeout on kube-controller-manager) the pods are terminated from that node.
Remove pods from node:
kubectl drain node01
The drained node is cordoned (marked as unschedulable).
Uncordon the node:
kubectl uncordon node01
Cordon the node (mark node as unschedulable):
kubectl cordon node01
See specific node Kubernetes version:
kubectl get nodes
Components which follow the same Kubernetes versioning:
- kube-apiserver - version X
- controller-manager - can be at version [X-1; X]
- kube-scheduler - can be at version [X-1; X]
- kubelet - can be at version [X-2; X]
- kube-proxy - can be at version [X-2; X]
- kubectl - can be at version [X-1; X+1]
Components which do NOT follow the Kubernetes versioning:
- etcd
- CoreDNS
Kubernetes supports only the 3 most recent minor versions.
The recommended way is to update 1 minor version at the time.
First update the master nodes, then the worker nodes.
Update master nodes:
- update kubeadm
- update the cluster:
  kubeadm upgrade apply v1.12.0
- kubeadm does not update the kubelet - kubectl get nodes will still show the old version
- update the kubelet manually:
  apt-get upgrade -y kubelet=1.12.0-00
- restart the kubelet service:
  systemctl restart kubelet
Update worker nodes:
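The original steps are not shown here; a sketch of the usual kubeadm worker-node flow (node name and versions are assumptions; kubeadm upgrade node is the subcommand on newer kubeadm releases):
# from a machine with kubectl access: move workloads away
kubectl drain node01 --ignore-daemonsets

# on the worker node: upgrade kubeadm and the node configuration
apt-get upgrade -y kubeadm=1.12.0-00
kubeadm upgrade node

# on the worker node: upgrade and restart the kubelet
apt-get upgrade -y kubelet=1.12.0-00
systemctl restart kubelet

# from a machine with kubectl access: allow scheduling again
kubectl uncordon node01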
Backup options:
- Backup all Kubernetes resources:
  kubectl get all --all-namespaces -o yaml > all-resources.yml
  One can also use velero to back up and migrate Kubernetes resources and persistent volumes.
- Snapshot etcd:
  ETCDCTL_API=3 etcdctl snapshot save ./snapshot.db
  or take a volume snapshot of /var/lib/etcd (the default for etcd's --data-dir).
NOTE
View the snapshot status with:
ETCDCTL_API=3 etcdctl snapshot status ./snapshot.db
| Component | Port |
|---|---|
| etcd clients | 2379 |
| etcd peers | 2380 |
| kube-api | 6443 |
| kubelet | 10250 |
| kube-scheduler | 10251 |
| kube-controller-manager | 10252 |
| exposed services | 30000-32767 |
Reach the app service in the test namespace through the following domains:
- app
- app.test
- app.test.svc
- app.test.svc.cluster.local - FQDN (Fully Qualified Domain Name)
NOTE
test, test.svc and test.svc.cluster.local are defined as search domains in each pod's /etc/resolv.conf
Reach the pod with IP 10.244.10.5 in the test namespace through the following domains:
- 10-244-10-5
- 10-244-10-5.test
- 10-244-10-5.test.pod
- 10-244-10-5.test.pod.cluster.local - FQDN (Fully Qualified Domain Name)
Each pod has a mounted /etc/resolv.conf file which points to the DNS server:
$ cat /etc/resolv.conf
...
nameserver 10.80.0.6
The IP specified for the nameserver is a cluster IP defined for the kube-dns service:
kubectl -n kube-system get svc kube-dns
The DNS server is deployed within the cluster. Kubernetes usually comes with CoreDNS deployed as a Deployment (which manages a ReplicaSet).
The /etc/coredns/Corefile configuration file is stored within a ConfigMap:
kubectl -n kube-system get configmap coredns
Ingress is a layer 7 load balancer built into Kubernetes.
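A minimal sketch of an Ingress resource (host, backend service and the networking.k8s.io/v1 API version are assumptions; older clusters use networking.k8s.io/v1beta1 or extensions/v1beta1):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 8080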
Create PodDisruptionBudget with minimum 2 pods available:
kubectl create pdb pdb-name --min-available 2 --selector app=test
Create PodDisruptionBudget with minimum 50% of pods available:
kubectl create pdb pdb-name --min-available 50% --selector app=test
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: pdb-name
spec:
  minAvailable: 50%
  selector:
    matchLabels:
      app: test
NOTE
A PodDisruptionBudget can't be edited. One can only recreate it - delete and create it again.
Create resource definition file:
cat <<EOF > kube-dashboard-access.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: czerasz-dashboard-access
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: czerasz-dashboard-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: czerasz-dashboard-access
  namespace: kube-system
EOF
Create ServiceAccount and ClusterRoleBinding:
$ kubectl apply -f kube-dashboard-access.yaml
serviceaccount/czerasz-dashboard-access created
clusterrolebinding.rbac.authorization.k8s.io/czerasz-dashboard-access created
View the token:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep czerasz-dashboard-access | awk '{print $1}')
Expose the proxy locally:
kubectl proxy --address=0.0.0.0
Visit http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ and authenticate with the token.
Find more in the documentation.
K8s Monitor Pod CPU and memory usage with Prometheus
$ ls -1 /var/log/containers/
czerasz-fluentbit-debugging_monitoring_app-879869c3c864e0ebaa519f9d5db10a93a8f29a965c7446d8cae48c0adc88d2ea.log
fluentbit-fluent-bit-6blmm_monitoring_fluent-bit-696c6ef4298cfb80ad37a72a765049e62647b44ff050d52bc59e10d51346e88e.log
vault-2_vault_vault-317f59eadbfb60fa9aa2fe0860082c5238eaf6529b7aa206b8e5a01cab40f55d.log
Run a temporary awscli pod behind an HTTP proxy, using the atlantis ServiceAccount:
kubectl -n atlantis run --generator=run-pod/v1 awscli --rm -it \
--image amazon/aws-cli:latest \
--env=http_proxy=http://proxy.example.com:8888 \
--env=https_proxy=http://proxy.example.com:8888 \
--env=no_proxy=10.10.0.1/16,localhost,127.0.0.1,169.254.169.254,.internal,.s3.eu-central-1.amazonaws.com \
--serviceaccount=atlantis \
--command bash
Make sure the ServiceAccount has the required annotation:
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::XXXXXXXXXXXX:role/oidc-atlantis
  ...