Ran into a situation today where a colleague saw build’s tests were failing when they were run on the Jenkins slave farm that ran containers, but worked just fine on the Jenkins slave farm that used physical machines.
Check the strace. Nothing useful in there. Check the test traceback. Seems to fail because a C++ String had a NULL byte in it.
After poring over the source, my colleague found that the string being created was attempting to use
the environment variable,
LOGNAME to create a path string. Now why the test tries to do this is
something that I won’t go into, it ideally could’ve used something a lot more reliable such as
getuid(), avoiding the problem entirely.
Now to the interesting bit, why would
$LOGNAME be defined on the physical machines, but not in the
The answer, is… it doesn’t have anything to do with the container or the physical machine itself, but how they are being used.
In the physical machine farm, the way jenkins slaves are run is that master launches slave agent
basically the master SSH’s into the physical machine and fires up the slave agent with something
java -jar slave.jar. The important thing to note is that the master SSH’s into the physical
machine, we’ll come back to why this is important.
In the container farm, the way jenkins slaves are run is that slaves are launched
what this means is that when the container is launched, the first executable that the container runs
is essentially the
slave.jar itself or a wrapper script that calls
slave.jar after some
slave.jar executable then dials home to actually connect to the Jenkins
master, which it knows how to find using certain pre-set environment variables. Note there was no
Now what’s different between the first and the second case? Well the difference lies in the kind
of shell that the slave agent is spawned from. There are different modes a shell (bash in our
particular case), can run in. They are
non-login. Any combination of the four possible combinations of [login,non-login] and
[interactive,non-interactive] is possible, although some combinations are a lot more common than
One of the important aspects of these modes are:
- Login shells will read and load the
profilefamily of files, such as
- Interactive shells will read and load the
rcfamily of files, such as
When you normally SSH into a machine for example, you will get an
interactive login shell.
When you try to execute a command over SSH, with something like
ssh user@host '/usr/bin/binary',
you get a non-interactive non-login shell, BUT SSH sets some additional environment variables of
its own accord. When you start a binary with a script outside a shell (as the init process for
example), it seems to start it in a non-interactive non-login shell, which means
Practically what this means is, the physical machines, with their
slave.jar started via SSH, have
$LOGNAME variable set, because SSH sets that always. This is inherited by the
For the container machines, this variable is not set. Because it is usually set in
with something like
$USER is set by bash for login shells and since the
container runs the
slave.jar in a non-interactive, non-login shell that is not spawned by SSH,
it doesn’t source
/etc/profile and doesn’t set
$LOGNAME on its own.
Not having LOGNAME set caused crypitc build failures.
Moral of the story: Don’t rely on environment variables being set in compiled binaries unless you
explictly state that you need or use them. In this case, the right way would’ve been simply to use
getuid() function instead.