Kubernetes DaemonSet Not Running on All Nodes

Kubernetes DaemonSets are a powerful resource type that ensure a copy of a Pod runs on all selected nodes within a cluster. They are commonly used for tasks like monitoring agents, log collectors, and network plugins. However, it’s not uncommon to encounter situations where a DaemonSet is not running on all nodes as expected. This blog post will delve into the core concepts, provide typical usage examples, discuss common practices, and offer best practices to help intermediate-to-advanced software engineers understand and troubleshoot this issue.

Table of Contents

  1. Core Concepts
  2. Typical Usage Example
  3. Common Reasons for DaemonSet Not Running on All Nodes
  4. Troubleshooting and Common Practices
  5. Best Practices
  6. Conclusion

Core Concepts

DaemonSet Basics

A DaemonSet is a Kubernetes resource that ensures a copy of a Pod runs on every eligible node in the cluster. By default that means all nodes; if a node selector or affinity rule is specified, only matching nodes are targeted. When a new node joins the cluster, the DaemonSet controller automatically schedules a Pod on it, and when a node is removed, the associated Pod is garbage-collected.

Node Selection

DaemonSets use node selectors or node affinity to determine which nodes should run the Pods. Node selectors are simple key-value pairs that match labels on nodes. Node affinity provides more flexibility, allowing for more complex matching rules.

Tolerations

Tolerations are used to allow Pods to be scheduled on nodes with taints. Taints are a way to mark nodes so that certain Pods cannot be scheduled on them. By adding tolerations to a DaemonSet, you can ensure that the Pods are scheduled on nodes with specific taints.
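
For example, a DaemonSet Pod template might carry a toleration like the following sketch. The key, value, and effect here are illustrative; they must match the taints actually present on your nodes:

```yaml
# Fragment of a DaemonSet Pod template spec (illustrative values)
tolerations:
- key: "dedicated"       # must match the taint key on the node
  operator: "Equal"
  value: "logging"       # must match the taint value
  effect: "NoSchedule"   # must match the taint effect
```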

Typical Usage Example

Let’s consider a scenario where you want to run a log collector on all nodes in your Kubernetes cluster. Here is an example DaemonSet YAML file:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector-daemonset
spec:
  selector:
    matchLabels:
      name: log-collector
  template:
    metadata:
      labels:
        name: log-collector
    spec:
      containers:
      - name: log-collector-container
        image: log-collector-image:latest  # placeholder image; pin a specific tag in production
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

In this example, the DaemonSet named log-collector-daemonset ensures that a Pod with the log-collector label is running on all nodes in the cluster. The Pod mounts the host’s /var/log directory to collect logs.
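
After applying the manifest, you can compare how many Pods the controller wants against how many are actually running. The file name below is assumed, and the DaemonSet is assumed to be in the current namespace:

```shell
kubectl apply -f log-collector-daemonset.yaml
# DESIRED should equal the number of eligible nodes; READY should match DESIRED
kubectl get daemonset log-collector-daemonset -o wide
```

If DESIRED is lower than your node count, or READY lags behind DESIRED, one of the causes in the next section is usually at work.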

Common Reasons for DaemonSet Not Running on All Nodes

Node Selector and Affinity Mismatch

If the node selector or affinity rules defined in the DaemonSet do not match the labels on some nodes, the Pods will not be scheduled on those nodes. For example, if your DaemonSet has a node selector env: production, but some nodes do not have the env=production label, the Pods will not run on those nodes.
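
You can confirm the mismatch, and fix it by labeling the node, with commands like these (the node name and label are illustrative):

```shell
# List the nodes that carry the label the DaemonSet selects on
kubectl get nodes -l env=production
# Add the missing label so the DaemonSet schedules a Pod there
kubectl label nodes worker-2 env=production
```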

Taints and Tolerations

Nodes with taints can prevent DaemonSet Pods from being scheduled if the DaemonSet does not have the matching tolerations. For instance, if a control-plane node carries the taint node-role.kubernetes.io/control-plane:NoSchedule (node-role.kubernetes.io/master:NoSchedule on older clusters), a DaemonSet without a toleration for that taint will not run Pods on that node.
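
To let the DaemonSet run on control-plane nodes as well, you could add a toleration like this to the Pod template (a sketch; add the legacy master key too if your cluster still uses it):

```yaml
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
```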

Resource Constraints

If a node does not have enough allocatable resources (CPU, memory, etc.) to satisfy the Pod's requests, the DaemonSet Pod will not be scheduled on that node. Unlike a Deployment, a DaemonSet Pod is bound to a specific node, so it remains Pending rather than being scheduled elsewhere.

Pod Unschedulable Conditions

A node may also report conditions that make it unschedulable, such as DiskPressure, MemoryPressure, PIDPressure, or NotReady. Kubernetes translates these conditions into taints (for example, node.kubernetes.io/disk-pressure), which block DaemonSet Pods that do not tolerate them.

Troubleshooting and Common Practices

Check Node Labels and Taints

Use the following commands to check the labels and taints on nodes:

kubectl get nodes --show-labels
kubectl describe node <node-name> | grep Taints

If the node labels or taints do not match the DaemonSet’s requirements, you can either update the node labels or add tolerations to the DaemonSet.

Check Resource Utilization

Use the following command to check the resource utilization of nodes:

kubectl top nodes

Note that kubectl top requires the Metrics Server add-on to be installed in the cluster. If a node is running out of resources, you can either add capacity (larger or additional nodes) or reduce the resource requests of the DaemonSet Pods.
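
Lowering the Pod's requests is a container-level change in the DaemonSet spec. The values below are illustrative and should be sized from observed usage:

```yaml
# Fragment of the DaemonSet container spec
resources:
  requests:
    cpu: "50m"
    memory: "64Mi"
  limits:
    cpu: "200m"
    memory: "128Mi"
```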

Check Pod Events

Use the following command to check the events of the DaemonSet Pods:

kubectl describe pod <pod-name>

The Events section often states exactly why a Pod is Pending (for example, a FailedScheduling event listing the unmatched taint or the insufficient resource) or why it is in a failed state.
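
When debugging many nodes at once, it can be faster to diff the DaemonSet's status counters than to describe Pods one by one. A minimal sketch, assuming you feed it the .status object from kubectl get ds <name> -o json (the field names used are the actual DaemonSetStatus fields):

```python
def daemonset_gap(status: dict) -> dict:
    """Summarize scheduling gaps from a DaemonSetStatus object.

    `status` is the `.status` field of `kubectl get ds <name> -o json`.
    """
    desired = status.get("desiredNumberScheduled", 0)
    current = status.get("currentNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    misscheduled = status.get("numberMisscheduled", 0)
    return {
        "nodes_missing_pod": desired - current,  # eligible nodes with no Pod yet
        "pods_not_ready": current - ready,       # Pods scheduled but not Ready
        "misscheduled": misscheduled,            # Pods on nodes that should not have one
    }


# Example: 5 eligible nodes, 4 Pods scheduled, 3 Ready
example_status = {
    "desiredNumberScheduled": 5,
    "currentNumberScheduled": 4,
    "numberReady": 3,
    "numberMisscheduled": 0,
}
print(daemonset_gap(example_status))
# {'nodes_missing_pod': 1, 'pods_not_ready': 1, 'misscheduled': 0}
```

A non-zero nodes_missing_pod value points at scheduling blockers (labels, taints, resources); a non-zero pods_not_ready value points at Pods that scheduled but are crashing or failing probes.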

Best Practices

Use Flexible Node Selection

Instead of strict node selectors, consider node affinity, which supports operators such as In, NotIn, and Exists, and can express both hard (requiredDuringSchedulingIgnoredDuringExecution) and soft (preferredDuringSchedulingIgnoredDuringExecution) rules. This gives you finer control over which nodes the DaemonSet Pods are scheduled on.
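
For instance, a required node affinity rule with an In operator can match several environments at once, where a plain node selector would need an exact label match (the label key and values here are illustrative):

```yaml
# Fragment of the DaemonSet Pod template spec
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: env
          operator: In
          values: ["production", "staging"]
```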

Add Appropriate Tolerations

If your cluster has nodes with taints, make sure to add the appropriate tolerations to the DaemonSet. This ensures that the Pods are scheduled on all nodes, including those with taints.

Monitor Resource Utilization

Regularly monitor the resource utilization of your nodes to ensure that there are enough resources available for the DaemonSet Pods. You can use tools like Prometheus and Grafana for monitoring.

Test DaemonSet Deployments

Before deploying a DaemonSet to a production cluster, test it in a staging or development environment. This helps you identify and fix any issues before they affect the production environment.

Conclusion

In conclusion, a Kubernetes DaemonSet not running on all nodes can be caused by various factors, including node selector and affinity mismatch, taints and tolerations, resource constraints, and pod unschedulable conditions. By understanding the core concepts, following common practices, and implementing best practices, you can effectively troubleshoot and ensure that your DaemonSet Pods are running on all desired nodes in your Kubernetes cluster.
