profiles/cluster/common: Fix race condition in enable/disable-linger
Description of changes
Fixes a synchronization issue where disable-linger could be called before all of a user's jobs on a node had completed, removing /run and potentially causing havoc.
Note that calling disable-linger may be a bad idea altogether, given the 100% CPU usage problem in logind described in !120 (closed), which removes the calls to it.
!121 (merged) may help provide more information on this issue.
This MR is complemented by !120 (closed) because lingering is a persistent operation: after a forced shutdown, users will still be lingering when the node is rebooted. However, merging the two may be difficult, since !120 (closed) removes the Slurm epilog in favour of lingering.
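For reviewers, the kind of synchronization described above can be sketched as a per-user job counter guarded by a file lock. This is only an illustration and not the code in this MR: the `update_linger` function, the `STATE_DIR` path, and the idea of calling it with +1 from a prolog and -1 from an epilog are all assumptions. The only external commands used are `loginctl enable-linger` and `loginctl disable-linger`.

```python
import fcntl
import os
import subprocess
from pathlib import Path

# Hypothetical location for the per-user counters (not taken from this MR).
STATE_DIR = Path("/run/lock/linger-refcount")


def _loginctl(action, user):
    """Run `loginctl enable-linger|disable-linger USER`."""
    subprocess.run(["loginctl", action, user], check=True)


def update_linger(user, delta, state_dir=STATE_DIR, run=_loginctl):
    """Adjust the user's job count by delta and return the new count.

    Assumed usage: delta=+1 when a job starts, delta=-1 when it ends.
    Lingering is enabled on the 0 -> 1 transition and disabled only on
    the 1 -> 0 transition, all while holding an exclusive lock.
    """
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{user}.count"
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        # Exclusive lock; released when the file is closed.
        fcntl.flock(f, fcntl.LOCK_EX)
        count = int(f.read() or 0)
        new = max(count + delta, 0)
        if count == 0 and new > 0:
            run("enable-linger", user)
        elif count > 0 and new == 0:
            run("disable-linger", user)
        # Overwrite the stored counter with the new value.
        f.seek(0)
        f.truncate()
        f.write(str(new))
        return new
```

Because the counter and the `loginctl` call happen under the same lock, a second job finishing cannot trigger disable-linger while a first job of the same user is still counted. A counter kept under /run would be lost on reboot while lingering persists, which is the same caveat noted above for forced shutdowns.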
Things done
- Tested
- Updated documentation (Wiki/NetBox)
- Breaking change
Edited by Carlos Jorge Simão Nogueira Vaz