profiles/cluster: Avoid calling disable-linger

André Breda requested to merge ist189409/nixrnl:more-linger into master

Description of changes

Previously, we called loginctl disable-linger in the Slurm epilogue (which runs on every node used by a job when the job ends). However, this has a synchronization issue when a user has multiple jobs running on the same node: lingering may be disabled too soon, when the first job completes. Additionally, logind/systemd seems to occasionally flood D-Bus and get stuck at 100% CPU usage, which may be related to the old epilogue.

So, we now avoid calling disable-linger and just defer it to the next reboot.

This has some consequences:

  • /run/user/<uid> stays created until reboot
  • systemd --user stays running until reboot
  • (only relevant if KillUserProcesses is enabled in logind) User processes/services will continue running after logout
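
One way to picture deferring the cleanup to the next reboot is a boot-time step that clears the persisted linger state. The sketch below is an assumption, not the code in this MR: it relies on systemd-logind keeping one flag file per lingering user under /var/lib/systemd/linger, and the function name clear_linger_flags is hypothetical. It would need to run early at boot, before systemd-logind starts.

```shell
#!/bin/sh
# Sketch only: remove persisted linger flags so that no user is lingering
# after boot. Assumes logind's default state directory; a different path
# can be passed as the first argument (useful for testing).
clear_linger_flags() {
    linger_dir="${1:-/var/lib/systemd/linger}"

    # Nothing to do if the directory does not exist (no user ever lingered).
    [ -d "$linger_dir" ] || return 0

    # Each regular file is named after a lingering user; delete them all.
    for flag in "$linger_dir"/*; do
        if [ -f "$flag" ]; then
            rm -f -- "$flag"
        fi
    done
    return 0
}
```

Calling clear_linger_flags with no argument targets the real state directory, so it should only be run as part of the boot sequence.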

The rationale behind this MR is incompatible with that of !129 (merged), which implements a semaphore with flock to avoid the synchronization issue. However, this change is still useful as a complement to it, since it un-lingers users whose lingering was persisted before a forced poweroff.
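
For context, the general idea of guarding disable-linger with flock can be sketched as a per-user job counter. This is not the code from !129, which is not reproduced here; STATE_DIR, LOGINCTL and the function names are hypothetical, and flock(1) from util-linux is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: a per-user job counter guarded by flock(1), so that only the
# last job to finish on a node disables lingering.
# STATE_DIR, LOGINCTL and all function names are hypothetical.
STATE_DIR="${STATE_DIR:-/run/linger-jobs}"
LOGINCTL="${LOGINCTL:-loginctl}"

# Print the user's current job count (0 if no counter file exists yet).
read_count() {
    cat "$STATE_DIR/$1.count" 2>/dev/null || echo 0
}

# Prolog side: count the new job and enable lingering, all under the lock.
job_started() {
    mkdir -p "$STATE_DIR"
    (
        flock -x 9
        count=$(( $(read_count "$1") + 1 ))
        echo "$count" > "$STATE_DIR/$1.count"
        "$LOGINCTL" enable-linger "$1"
    ) 9>"$STATE_DIR/$1.lock"
}

# Epilogue side: decrement the counter and disable lingering only when no
# other job of the same user is left on this node.
job_finished() {
    mkdir -p "$STATE_DIR"
    (
        flock -x 9
        count=$(( $(read_count "$1") - 1 ))
        if [ "$count" -lt 0 ]; then
            count=0
        fi
        echo "$count" > "$STATE_DIR/$1.count"
        if [ "$count" -eq 0 ]; then
            "$LOGINCTL" disable-linger "$1"
        fi
    ) 9>"$STATE_DIR/$1.lock"
}
```

In a scheme like this, a forced poweroff means the epilogue never runs, so the linger flag persisted on disk survives the reboot. That is the leftover state a reboot-time cleanup takes care of.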

Things done

  • Tested
    • Should be enough to put it on a single node and confirm that 1) it can reboot and 2) users get un-lingered after reboot
  • Updated documentation (Wiki/NetBox)
  • Breaking change
Edited by André Breda
