
profiles/cluster/common: Fix race condition in enable/disable-linger

André Breda requested to merge ist189409/nixrnl:flock-linger into master

Description of changes

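Fixes a synchronization issue where disable-linger could be called on a node before all of a user's jobs there had completed. This removes the user's runtime directory under /run while the remaining jobs may still depend on it, potentially causing havoc.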
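To illustrate the kind of serialization the branch name (flock-linger) suggests, here is a minimal sketch rather than the code in this MR. It assumes a per-user job counter guarded by `flock`, so that linger is only toggled on the first job start and the last job end. The function names, the `LINGER_STATE_DIR` and `LOGINCTL` variables, and the counter files are all hypothetical.

```shell
# Hypothetical sketch: names, paths and the counter scheme are assumptions,
# not the implementation from this merge request.

# Both are overridable so the sketch can be tried without touching logind.
: "${LINGER_STATE_DIR:=/run/linger-jobs}"
: "${LOGINCTL:=loginctl}"

# _linger_update USER DELTA
# Under an exclusive lock, adjust USER's job count by DELTA and toggle
# linger only when the count goes from 0 to 1 or drops back to 0.
_linger_update() {
    user="$1"
    delta="$2"
    count_file="$LINGER_STATE_DIR/$user.count"
    mkdir -p "$LINGER_STATE_DIR"
    (
        # Wait until no other prolog/epilog for this user holds the lock.
        flock -x 9
        count=0
        if [ -s "$count_file" ]; then
            count=$(cat "$count_file")
        fi
        count=$((count + delta))
        if [ "$count" -lt 0 ]; then
            count=0
        fi
        echo "$count" > "$count_file"
        if [ "$delta" -gt 0 ] && [ "$count" -eq 1 ]; then
            "$LOGINCTL" enable-linger "$user"
        elif [ "$delta" -lt 0 ] && [ "$count" -eq 0 ]; then
            "$LOGINCTL" disable-linger "$user"
        fi
    ) 9>"$LINGER_STATE_DIR/$user.lock"
}

# Intended to be called from a job prolog and epilog respectively.
linger_job_start() { _linger_update "$1" 1; }
linger_job_end()   { _linger_update "$1" -1; }
```

Because the read, update and `loginctl` call all happen while the lock is held, an epilog for one job cannot disable lingering between another job's prolog check and its enable-linger call. A counter kept under /run is lost on reboot, while lingering itself persists, which is the gap described below.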

Note that calling disable-linger may be a bad idea altogether because of the 100% CPU usage problem in logind described in !120 (closed), which removes the calls to disable-linger. !121 (merged) may help provide more information on this issue.

This MR is complemented by !120 (closed) because lingering is a persistent operation: in the event of a forced shutdown, users will remain lingering when the node is rebooted. However, merging the two may be difficult since !120 (closed) removes the slurm epilog in favour of lingering.

Things done

  • Tested
  • Updated documentation (Wiki/NetBox)
  • Breaking change
Edited by Carlos Jorge Simão Nogueira Vaz