profiles/cluster: Avoid calling disable-linger
Description of changes
Previously, we called `loginctl disable-linger` in the Slurm epilogue (which runs on all nodes used by a job when it ends).
However, this causes a synchronization issue when a user has multiple jobs running on the same node: lingering may be disabled as soon as the first job completes, while the others are still running.
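To make the race concrete, here is a minimal Python sketch (not code from this repository) that replays start/end events for a single node. It reports every point where the old, unconditional `loginctl disable-linger` would have run while the same user still had another job on that node. The event format and the function name `find_premature_disables` are hypothetical.

```python
from collections import defaultdict


def find_premature_disables(events):
    """Replay ("start" | "end", user, job_id) events for a single node.

    Returns (user, finished_job, jobs_still_running) for every job end at which
    the old epilogue would have disabled lingering although the same user still
    had other jobs running on the node.
    """
    running = defaultdict(set)  # user -> IDs of that user's jobs on this node
    premature = []
    for action, user, job_id in events:
        if action == "start":
            running[user].add(job_id)
        elif action == "end":
            running[user].discard(job_id)
            # The old epilogue ran `loginctl disable-linger` here unconditionally.
            if running[user]:
                premature.append((user, job_id, sorted(running[user])))
        else:
            raise ValueError(f"unknown action: {action!r}")
    return premature


events = [
    ("start", "alice", 1),
    ("start", "alice", 2),
    ("end", "alice", 1),  # epilogue of job 1 disables linger...
    ("end", "alice", 2),  # ...although job 2 was still running until here
]
print(find_premature_disables(events))  # [('alice', 1, [2])]
```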
Additionally, logind/systemd seems to occasionally flood D-Bus and get stuck at 100% CPU usage, which may be related to the old epilogue.
So, we now avoid calling `disable-linger` altogether and defer un-lingering users to the next reboot.
This has some consequences:
- `/run/user/<uid>` stays created until reboot.
- `systemd --user` stays running until reboot.
- (Only relevant if `KillUserProcesses` is enabled in logind) User processes/services will continue running after logout.
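Since lingering is now only cleared across reboots, the persisted linger flags have to be dropped at boot. As an illustration only (the MR's actual mechanism may differ), here is a minimal Python sketch. It assumes systemd-logind keeps one flag file per lingering user under `/var/lib/systemd/linger/`, which is where `loginctl enable-linger` records it. The function name `clear_linger_flags` is hypothetical, and to take effect it would have to run early in boot, before systemd-logind reads that directory.

```python
from pathlib import Path

# Assumption: systemd-logind stores one flag file per lingering user here.
DEFAULT_LINGER_DIR = Path("/var/lib/systemd/linger")


def clear_linger_flags(linger_dir=DEFAULT_LINGER_DIR):
    """Delete every per-user linger flag and return the affected user names.

    Meant to run once per boot, before systemd-logind starts, so that users
    lingering before a reboot (or a forced poweroff) are no longer lingering.
    """
    linger_dir = Path(linger_dir)
    if not linger_dir.is_dir():
        return []  # nothing has ever been lingering on this node
    removed = []
    for entry in sorted(linger_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Calling `clear_linger_flags()` with no argument targets the real directory and therefore needs root. Passing a different directory makes it easy to try it out on a scratch copy first.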
The rationale behind this MR is incompatible with !129 (merged), which implements a semaphore with `flock` to avoid the synchronization issue. However, this MR is still a useful complement to it, since it removes lingering users left over from before a forced poweroff.
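A `flock`-based semaphore of the kind !129 describes can be sketched as a per-user job counter. The counter is only modified under an exclusive lock, so only the epilogue of a user's last job on the node would disable lingering. The Python sketch below is a hypothetical illustration of that idea, not the code from !129. The counter-file layout and the helper names `job_started`/`job_ended` are assumptions.

```python
import fcntl
import os


def _add_to_counter(counter_path, delta):
    """Add `delta` to the integer stored in `counter_path` and return the result.

    The exclusive flock() serializes concurrent prologs/epilogs of the same user
    on this node, so no two of them can read the same old value.
    """
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until we hold the lock
        raw = os.read(fd, 64).decode().strip()
        count = max((int(raw) if raw else 0) + delta, 0)  # never below zero
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(count).encode())
        return count
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


def job_started(counter_path):
    """Prolog side: one more job of this user is now running on the node."""
    return _add_to_counter(counter_path, +1)


def job_ended(counter_path):
    """Epilog side: True only if this was the user's last job on the node,
    i.e. the only point where `loginctl disable-linger` would be safe."""
    return _add_to_counter(counter_path, -1) == 0
```

For two overlapping jobs of the same user, the calls return 1 and 2 for the two `job_started` calls, then `False` and `True` for the two `job_ended` calls.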
Things done
- Tested: it should be enough to deploy this on a single node and confirm that 1) it can reboot and 2) users get un-lingered after reboot
- Updated documentation (Wiki/NetBox)
- Breaking change