Poor Man's Parallelism
I really like orchestration tools such as Ansible or SaltStack. They can make running tasks on a group of machines a breeze. But sometimes you can’t or don’t want to install these tools on a machine. In cases like these, it is helpful to know how to parallelize some tasks in the shell.
You can do this via Unix/shell job control:
cmd="systemctl enable --now docker.service"
hosts=(host{1..4})
for host in "${hosts[@]}"
do
ssh "$host" "$cmd" &
done
wait # block until all of the backgrounded ssh commands finish
However, from experience, this can be very error-prone.
For example, the placement of the & is important: it has to background the ssh command itself, not the command on the remote machine.
Additionally, what if you had a lot of hosts and you didn't want to run them all at once?
Instead, you want to utilize a bounded pool of processes.
There are a few ways of doing this, but most of them are messy or fairly non-portable.
On systems with util-linux installed you might use flock or lockfile, but then you essentially have to implement a semaphore out of mutex locks and shell arithmetic.
If you don't have util-linux, you can accomplish the same thing by taking advantage of the atomicity of mkdir on most (but not all) file systems.
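A rough sketch of what that looks like, assuming bash and using made-up /tmp/poolslot.* directories as the slots (a flock version is structurally the same, with flock -n on a lock file standing in for mkdir):

cmd="systemctl enable --now docker.service"
hosts=(host{1..16})
slots=4
for host in "${hosts[@]}"
do
    (
        # mkdir fails if the directory already exists, so creating a slot
        # directory is an atomic test-and-set: whoever creates it owns that slot
        while true
        do
            for ((i = 0; i < slots; i++))
            do
                if mkdir "/tmp/poolslot.$i" 2>/dev/null
                then
                    ssh "$host" "$cmd"
                    rmdir "/tmp/poolslot.$i" # release the slot
                    exit
                fi
            done
            sleep 1 # every slot is busy; poll again
        done
    ) &
done
wait

Note that this still spawns a shell per host up front and busy-waits for a free slot, which is exactly the sort of mess alluded to above.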
However, rather than doing this, take a look at the fairly pervasive xargs command.
According to the manpage, xargs “builds and executes command lines from standard input.”
It has an option, -P <NUM_PROCS>, that determines how many commands to run in parallel.
With this, it is just a matter of formatting commands in a way that xargs understands.
cmd="systemctl enable --now docker.service"
hosts=(host{1..4})
numprocs=8
echo -n "${hosts[@]}" | xargs -d" " -P $numprocs -I{} ssh {} "$cmd"
Admittedly this looks a bit cryptic.
It helps to know that -d changes the delimiter from the default newline to a space, and
-I <REPLACE_STR> sets the replacement string, so that ssh {} $cmd becomes ssh host1 $cmd for the first host, ssh host2 $cmd for the second, and so on.
The -I option also implies one input item per command; if you were appending arguments instead of substituting them, -n <NUM_ARGS> would control how many arguments each command receives.
The xargs command also accepts an input file option (-a <file>), where we could put each host on its own line to simplify the call.
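For example, a quick sketch assuming a file named hosts.txt (a made-up name) with one host per line:

printf '%s\n' host{1..4} > hosts.txt
cmd="systemctl enable --now docker.service"
xargs -a hosts.txt -P 8 -I{} ssh {} "$cmd"

Because the file is already newline-delimited, the -d option is no longer needed.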
Now we can easily create process pools in a mostly portable fashion in shell scripts. There are lots of useful things you could do with this, but here are two recipes that I came up with:
#copy a file to the same destination on many nodes
function pcopy(){
local filename=$1
local dest=$2
shift 2
echo -n "$@" | xargs -d" " -P 8 -I{} scp "$filename" {}:"$dest"
}
pcopy somefile.txt /tmp host{1..8}
#retrieve a file from many nodes, suffixing each local copy with its host name
function pget(){
local filename=$1
shift
echo -n "$@" | xargs -d" " -P 8 -I{} scp {}:"$filename" "$(basename "$filename").{}"
}
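A usage sketch, mirroring the pcopy call above (the path is only an illustration):

pget /tmp/somefile.txt host{1..8}

This leaves local copies named somefile.txt.host1 through somefile.txt.host8 in the current directory.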
Happy shell scripting!