How to debug a Docker Swarm service using Docker's experimental features
Debugging docker swarm services is no easy task.
I was trying to set up Jenkins as a swarm service using a compose stack deployment. After deploying the stack, the jenkins service was running but its containers kept crashing. Because the stack was deployed in swarm mode, the service kept spinning up replacement containers automatically. However, the containers exited before their logs could be viewed.
In such scenarios it is difficult to find the underlying cause of the crash. We can see the status of the service with 'docker service ls' or 'docker service ps', but these only give an overall idea of the state of the service. To identify the exact cause we need to be able to view the detailed logs.
Luckily, recent Docker versions introduce 'docker service logs', which gives a multiplexed stream of logs from all the containers spun up by the service. At the time of writing it is available only as an experimental feature.
In the snippet below I try to check the logs of my jenkins service, and the default daemon configuration rejects the command.
$ docker service logs t7w
only supported with experimental daemon
Enable Docker Daemon experimental features
Edit the daemon configuration as shown below to add the experimental option. If the file '/etc/docker/daemon.json' does not exist, create it.
root@ip-172-31-22-115:/etc/docker# cat daemon.json
{
  "experimental": true
}
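For readers who prefer to script this step, here is a minimal sketch that creates the file only when it is missing, so an existing configuration is never overwritten. The helper name enable_experimental is hypothetical (it is not a Docker command), and the default path is the one used above; writing to it normally requires root.

```shell
#!/usr/bin/env bash
# Sketch: create a daemon.json that enables experimental mode, but only when
# the file does not exist yet, so existing settings are never clobbered.
# enable_experimental is a hypothetical helper name, not part of Docker.
enable_experimental() {
  local config="${1:-/etc/docker/daemon.json}"
  if [ -e "$config" ]; then
    # An existing file may hold other options; leave the merge to a human.
    echo "$config already exists; add \"experimental\": true to it by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$config")"
  printf '{\n  "experimental": true\n}\n' > "$config"
}
```

Calling enable_experimental with no argument targets /etc/docker/daemon.json; passing a path lets you try it against a scratch file first.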
Restart the Docker service, then confirm that the daemon reports experimental mode.
root@ip-172-31-22-115:/etc/docker# docker version -f '{{.Server.Experimental}}'
true
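The restart command itself is not shown above. On a host managed by systemd it is typically 'sudo systemctl restart docker', but that is an assumption about the host. The check can also be wrapped in a small helper for use in scripts; experimental_enabled below is a hypothetical name, and it relies only on the 'docker version -f' call shown above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only when the running daemon reports experimental mode.
# experimental_enabled is a hypothetical helper name, not part of Docker.
experimental_enabled() {
  [ "$(docker version -f '{{.Server.Experimental}}' 2>/dev/null)" = "true" ]
}

# Usage (the restart command assumes a systemd host):
#   sudo systemctl restart docker
#   experimental_enabled && echo "experimental features are on"
```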
Debug the docker service logs
You can now check the docker service logs.
The snippet below lists the service's tasks and shows that they are spawned, shut down and marked failed within seconds. The error reported here, "task: non-zero exit (1)", gives no details of the underlying cause, and it could have many different reasons. Clueless!
root@ip-172-31-22-115:~/jenkins# docker service ps t7w
ID            NAME                   IMAGE                   NODE              DESIRED STATE  CURRENT STATE          ERROR                      PORTS
tsv42txxhkn9  jenkins_jenkins.1      bitnami/jenkins:latest  ip-172-31-22-115  Ready          Ready 3 seconds ago
gmqgkfsndw50   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 3 seconds ago   "task: non-zero exit (1)"
oz2ctrxi74qk   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 11 seconds ago  "task: non-zero exit (1)"
pgtoo3eaeqj7   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 17 seconds ago  "task: non-zero exit (1)"
kheptpifm3v9   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 23 seconds ago  "task: non-zero exit (1)"
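When the ERROR column is cut short, the task list can be printed without truncation. The sketch below wraps 'docker service ps' with its --no-trunc and --format options; task_errors is a hypothetical helper name, and --format may not be available on older Docker versions. In this particular case the message is already short, so it is unlikely to reveal more than the table above.

```shell
#!/usr/bin/env bash
# Sketch: print each task's name, current state and full error message.
# task_errors is a hypothetical helper name; --no-trunc and --format are
# options of `docker service ps` (--format needs a reasonably recent Docker).
task_errors() {
  local service="$1"
  docker service ps --no-trunc \
    --format '{{.Name}} | {{.CurrentState}} | {{.Error}}' "$service"
}

# Usage: task_errors t7w
```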
It’s time to check the logs and identify the root cause of the issue.
The snippet below shows the logs of the tasks that my service spun up and that failed. Each line is prefixed with the task name, the task ID and the node, so observe the task IDs.
root@ip-172-31-22-115:~/jenkins# docker service logs t7w
jenkins_jenkins.1.tsv42txxhkn9@ip-172-31-22-115 | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.oz2ctrxi74qk@ip-172-31-22-115 | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.gmqgkfsndw50@ip-172-31-22-115 | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.pgtoo3eaeqj7@ip-172-31-22-115 | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.c9lw4mx5igzs@ip-172-31-22-115 | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
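With a service that restarts every few seconds, the same message repeats once per task and the output grows quickly. As a rough sketch, the multiplexed output can be collapsed into distinct messages with a count for each. summarize_logs is a hypothetical helper name; it assumes each line has the "task@node | message" shape seen above and uses only standard sed, sort and uniq.

```shell
#!/usr/bin/env bash
# Sketch: collapse multiplexed `docker service logs` output (read on stdin)
# into distinct messages, most frequent first.
# summarize_logs is a hypothetical helper name, not part of Docker.
summarize_logs() {
  # Strip the "<task>@<node> | " prefix (everything up to the first "| "),
  # then count identical messages.
  sed 's/^[^|]*| //' | sort | uniq -c | sort -rn
}

# Usage (2>&1 in case the messages arrive on stderr):
#   docker service logs t7w 2>&1 | summarize_logs
```

If every task fails for the same reason, as it does here, the summary shrinks to a single line.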
I can see above that in every task the start-up script defined in my Dockerfile (/app-entrypoint.sh) fails at line 3 because it cannot find the path /opt/bitnami/base/functions inside the container. Once I rectify this, the service should be up and running.
As simple as that! Service logs let us debug swarm service failures and identify their exact root cause.
Hope this helps for quick troubleshooting!