profiles/cluster_server: Re-add nodes to cluster after reboot
Description of changes
Slurm kicks out cluster nodes when they reboot "unexpectedly", i.e. without going through `scontrol reboot` or similar. This includes any plain `reboot` invocation and any reboot triggered from a desktop environment.
I doubt anyone has the patience to keep reacting to these events by bringing nodes back up by hand.
This MR changes Slurm's behavior so that any node that registers with a valid configuration is always brought back up. However, this also automatically brings back nodes that went down for legitimate reasons. In the future, it is probably wise to investigate undoing this change and routing all regular reboots through Slurm instead, so that nodes come back up automatically in those cases without this setting.
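For reviewers, the behavior described above matches what Slurm documents for `ReturnToService=2` in `slurm.conf`. The fragment below is a sketch that assumes this is the option the profile sets. The profile's own syntax for emitting it is not shown here.

```
# slurm.conf fragment (sketch; assumes the profile renders this option)
# ReturnToService=2: a DOWN node becomes available for use again as soon as
# it registers with a valid configuration, no matter why it was set DOWN.
ReturnToService=2
```

The more conservative default, `ReturnToService=1`, only returns nodes that were set DOWN for being non-responsive. That is why an unexpected reboot leaves them down. Reboots issued through `scontrol reboot` (optionally with `nextstate=RESUME`) are tracked by Slurm itself, which is the route the follow-up work above would rely on.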
Things done
- Tested
- Updated documentation (Wiki/NetBox)
- Breaking change