Can't login after openssh 9.8p1-1 upgrade, MUST restart sshd

added priority3-normal severity4-low statusunconfirmed labels

assigned to @freswa, @anthraxx, @toolybird, and @gromit

The old (not restarted) sshd master process forks a sshd child, which no longer works:

[pid 884906] execve("/usr/bin/sshd", ["/usr/bin/sshd", "-D", "-R"], 0x59b6f18de320 /* 8 vars */) = 0

The new sshd master process invokes a new sshd-session handler instead:

[pid 894412] execve("/usr/lib/ssh/sshd-session", ["/usr/lib/ssh/sshd-session", "-D", "-R"], 0x59eb07c6f320 /* 9 vars */ <unfinished ...>

From https://www.openssh.com/releasenotes.html#9.8p1:

 * sshd(8): the server has been split into a listener binary, sshd(8),
   and a per-session binary "sshd-session". This allows for a much
   smaller listener binary, as it no longer needs to support the SSH
   protocol. As part of this work, support for disabling privilege
   separation (which previously required code changes to disable) and
   disabling re-execution of sshd(8) has been removed. Further
   separation of sshd-session into additional, minimal binaries is
   planned for the future.

added scopebug statusconfirmed labels and removed statusunconfirmed label

assigned to @grazzolini, @dvzrv, and @lfleischer and unassigned @freswa, @toolybird, and @gromit

I think it has something to do with this change to the sshd.service file.

I think it's unrelated. The functionality of the sshd server binary has been reduced (see the ChangeLog entry above), so it needs to be restarted before it starts spawning the new sshd-session handlers.

It'd probably be good to add a post_upgrade warning about this.

I guess it's a bit too late for that.

A pkgrel bump with a post_upgrade warning would still be valuable for people who have not upgraded yet.

The update was made only a few hours ago...

Yes, but now we need to do damage control. I am also affected: it broke my remote server access, and getting the sshd service restarted somehow will now be time-consuming work.

It also broke ssh remote unlock if I read it correctly so a remote restart is not an option. https://github.com/dracutdevs/dracut/issues/2661

I mean, after you updated the package, it's already too late for this message. Your running sshd service is already broken at this point.

It's a shame really that the update wasn't in testing before rolling it out. In fact this warning is probably worth being on the main news page.

If you can somehow remote reboot your server, you should be fine.

The issue is only with old (non restarted) sshd master binary spawning new sshd forks, which are not compatible anymore. Restarting sshd (or rebooting the whole server) will fix the issue.

The dracut issue you linked to is missing the new sshd-session executable in its (limited) boot environment, which is not the use case here.

We can add to the install file a restarting of the daemon, just for a while. I don't really like it, but I guess it is needed. @anthraxx what do you think? I can add it if you can't right now.

I mean, when you update the package, it's already too late for this message. Your running sshd service is already broken at this point.

No, it's not. The shell where you ran pacman still works, this running sshd instance is not affected. You can restart sshd from there.

Only NEW connections won't work until you restart sshd.
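For anyone unsure whether their listener still predates the upgrade, here is a rough sketch of one way to check. It assumes the unit is named sshd.service and relies on Linux marking the /proc/<pid>/exe link of a replaced binary with " (deleted)"; the function names are made up for illustration, and reading another user's /proc entry normally needs root.

```shell
# Returns 0 if a /proc/<pid>/exe link target belongs to a binary that was
# replaced on disk (Linux appends " (deleted)" to the target in that case).
exe_is_stale() {
  case "$1" in
    *" (deleted)") return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reports whether the running sshd listener still uses
# the pre-upgrade binary. Assumes the unit name sshd.service.
check_sshd_listener() {
  local pid target
  pid=$(systemctl show -p MainPID --value sshd.service) || return 1
  if [ "${pid:-0}" -le 0 ]; then
    echo "sshd is not running"
    return 0
  fi
  target=$(readlink "/proc/$pid/exe") || return 1
  if exe_is_stale "$target"; then
    echo "sshd (pid $pid) still runs the old binary; restart it"
  else
    echo "sshd (pid $pid) matches the installed binary"
  fi
}
```

Running check_sshd_listener as root from the still-open session should tell you whether a restart is pending.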

It's a shame really that the update wasn't in testing before rolling it out. In fact this warning is probably worth being on the main news page.

Yes, since openssh is a core package, it should have gone through testing first... @anthraxx?

No, it's not. The shell where you ran pacman still works, this running sshd instance is not affected. You can restart sshd from there.

Only if it was the case... I run it as ssh 'server' pacman -Syu, without starting a dedicated session. That way only runs a single command and then exits. And chances are a lot of other people do that too.
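For the single-command workflow described above, one possible workaround is to chain the restart into the same SSH invocation, so the listener is refreshed before the session exits. This is only a sketch: remote_upgrade is a hypothetical helper name, the unit is assumed to be sshd.service, and it assumes (as stated elsewhere in this thread) that restarting the listener does not drop the session running the command.

```shell
# Hypothetical helper: upgrade a remote host and refresh its sshd listener
# within one SSH command, so no second login is needed afterwards.
remote_upgrade() {
  local host="$1"
  # try-restart only acts on a unit that is already running.
  ssh "$host" 'pacman -Syu && systemctl try-restart sshd.service'
}
```

Usage would be remote_upgrade server, in place of ssh 'server' pacman -Syu.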

The [testing] first then [core] rule can be bent in case of high profile security issues like this one. I don't think what @anthraxx did was wrong; it would be hard to get 2 approvals in a short time. Also, I think a post-upgrade note plus a restart is better than just a note, because a lot of people don't read pacman's output.

But this is a major upgrade (which altered an internal interface between sshd and its children), not just a security update...

The package went through [testing] and received 2 approvals within minutes. archlinux/packaging/state@22b54835

I see, but we also don't want rubber stamping things. Anyway, I don't think this is newsworthy if we add a restart to post_upgrade.

We can add to the install file a restarting of the daemon

I think it is a good idea in general. Outdated daemon could break things and I personally tend to restart it after update anyway. That doesn't drop existing sessions.

FWIW, that's what gentoo did: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=b9aab3e

I'm personally not completely sold on that though... This is rather unconventional for Arch. IMO a message should be enough, the fact that users are supposed to read pacman output for such messages and should restart upgraded services is already documented.
Although no strong feeling... Whatever feels better :)

The sshd package upgrade should always restart sshd after a successful config test via sshd -t, and only after checking that the sshd service is actually running, to avoid starting it where it previously wasn't.

It can only ever fix things, it does not disconnect existing sessions, and it ensures the server is up to date and not vulnerable due to hanging previous versions.

This is not the first time Arch has broken sshd this way, something should be done to prevent this, and I think restarting on upgrades is the way to go.

I'll work on it after lunch, but if someone beats me to it, make a MR.

@c0rn3j

The sshd package upgrade should always restart sshd after a successful config test via sshd -t, and only after checking that the sshd service is actually running, to avoid starting it where it previously wasn't.

Should the package upgrade do it or should users do it?

This is not the first time Arch has broken sshd this way

Arch did not break anything; this is not specific to Arch (cf. the Gentoo link I posted above), and it is a similar situation to the grub one. People are expected to run grub-install [...] after grub updates. Yet some don't and end up with a borked grub installation. This is not Arch breaking grub though.

@antiz

The upgrade should do it.

Arch did break it, post-update without a restart it breaks, and this is not the expected behavior or mentioned anywhere (last time it ended up in the News, eventually). This has happened twice so far to my knowledge.

Arch also has a history of trying to do the restarts automatically when needed, so this wouldn't really be anything new anyway.

https://archlinux.org/news/sshd-needs-restarting-after-upgrading-to-openssh-82p1/

Maybe sshd is an exception due to the remote lockout potential, but I'm not a fan of unconditional/unsupervised service restarts, as it could break other things (eg. due to configuration incompatibilities).

While I always advocate users should read the documentation and follow the distro best practices, I am also a pragmatist, I think that adding a restart on major versions of openssh, like gentoo did, wouldn't hurt.

@ghen

Restarting on configuration incompatibilities is prevented with my suggestion to do sshd -t beforehand, and is how Gentoo does it too, apparently.

@grazzolini Please create an MR if you want to make changes to the package. You haven't packaged too much in the past months, so please give the other maintainers some time to discuss your idea.

Make sure to use systemctl try-restart (not restart), to avoid accidentally starting sshd on client-only machines.

@freswa I can make a MR, sure, but I fail to see how packaging activity relates to this. I suggest keeping things on topic.

@c0rn3j

Arch did break it, post-update without a restart it breaks, this has happened twice so far to my knowledge.

I suppose Arch's guiltiness is a matter of interpretation (given that the restart is required because of an upstream change) but this debate is not really useful regarding our common goal to solve the issue so nevermind ^^

Arch also has a history of trying to do the restarts automatically when needed, so this wouldn't really be anything new anyway.

https://archlinux.org/news/sshd-needs-restarting-after-upgrading-to-openssh-82p1/

Thanks for the relevant info! I missed that somehow at the time (or forgot about it), I guess using the same mechanism that was used for the 8.2p1 upgrade seems reasonable then.

We can do a post_upgrade that only does a restart if:

the version is 9.8 and only 9.8, next major release won't have automatic restart
we check with sshd -t
we use try-restart
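The three conditions above could be sketched roughly like this. To be clear, this is not the script that was packaged (none was merged in this thread), just an illustration; it assumes pacman's usual install-script interface (post_upgrade receives the new version as $1 and the old one as $2), the vercmp tool shipped with pacman, and the unit name sshd.service.

```shell
# Hypothetical openssh.install fragment, not the script that was packaged.
# pacman calls post_upgrade with $1 = new version and $2 = old version.
post_upgrade() {
  local new_version="$1" old_version="$2"

  # Only the 9.8 release gets the automatic restart.
  case "$new_version" in
    9.8p*) ;;
    *) return 0 ;;
  esac

  # Skip if the old version already had the sshd/sshd-session split.
  # vercmp prints -1 when its first argument is the older version.
  [ "$(vercmp "$old_version" 9.8p1)" -lt 0 ] || return 0

  # Refuse to restart with a configuration the new binary rejects.
  if ! sshd -t; then
    echo "sshd -t failed; fix the configuration, then restart sshd manually."
    return 0
  fi

  # try-restart only restarts a running unit; it never starts a stopped one.
  systemctl try-restart sshd.service
}
```

Upgrades between two 9.8 pkgrels, or to any later release, fall through without touching the service.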

I think this would satisfy expectations of arch not restarting things on whim. As for a news entry, if we do it fast, I don't see the need for one.

I posted a note on the forum anyway, but that won't have as much visibility.

@grazzolini Sounds good to me (for what it's worth). As for the news entry, I'd say it doesn't hurt (at least to make sure people that already got hit by the issue know why and how to fix it). I don't mind writing it.

@antiz Sure, don't forget the protocol to send it to staff@ and mention it's urgent. I'll work on the MR in the next hour or so.

Generally, I'd be happy about a news item first of all. After that we can work on improving the package as needed and without time pressure.

That said: I'm not a fan of a rushed addition that potentially breaks further/ other things. If we indeed want to restart a central, running service such as sshd, we need to get it right.

I am currently fairly afk until tomorrow and won't be able to assist with this much though, but if I can, I will review a MR.

@dvzrv I submitted a news draft to ADP, based on the one sent for openssh-8.2p1 back in the day. I'm fine posting it right now, but in that case, I would prefer not mentioning the future restart fix until it is validated and released.

@dvzrv I agree with you, which is why I'm going to do a MR and not straight merge and package. I don't think we should restart it for future versions, just for this one.

News posted: https://archlinux.org/news/the-sshd-service-needs-to-be-restarted-after-upgrading-to-openssh-98p1/

Can we still add a post_upgrade warning to the package itself as well, while debating/designing the automatic restart?

IMHO, I think it is best if the post_upgrade simply warns the user that they must restart the daemon if they want OpenSSH to continue working. I am not in favor of restarting the daemon automatically because I understand that it is something that the user does not expect to happen in an upgrade since it has never happened before.

IMHO, I think it is best if the post_upgrade simply warns the user that they must restart the daemon if they want OpenSSH to continue working

With the sheer amount of packages that get upgraded (especially if you have any Haskell things on the system...), it is so easy to miss post_install/post_upgrade messages that I don't think they're suitable for messages of any importance. (Though on the other hand, they let you blame the user for being so dumb as to miss one super-important gray line of text amidst 100 other gray lines of text, so who's to say that they're bad)

A post_upgrade warning is still better than nothing, to be sure – but I don't like it continuously being the main method of such warnings; I've always wished Arch had some kind of post-transaction hook that would gather up such messages from all packages and display them at the end. Maybe even in a way that lets you re-read them later in case you missed some, without having to grep /var/lib/pacman; wouldn't that be nice.

Maybe in the future, package post_upgrades could do a systemctl set-property Markers=needs-restart and then a hook at the end of transaction could warn in bold that you need to systemctl reload-or-restart --marked.
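A minimal sketch of that marker idea, using the set-property syntax exactly as written in the comment above (I have not verified it against every systemd version). The two function names are hypothetical, and nothing like this exists in pacman today.

```shell
# Hypothetical: called from a package's post_upgrade to flag its unit
# instead of restarting it immediately.
mark_for_restart() {
  systemctl set-property "$1" Markers=needs-restart
}

# Hypothetical: called once from an end-of-transaction hook (or by the
# admin) to restart or reload everything that was flagged.
restart_marked() {
  systemctl reload-or-restart --marked
}
```

A package would call mark_for_restart sshd.service, and the end-of-transaction hook could either run restart_marked or just print a bold reminder to do so.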

I am not in favor of restarting the daemon automatically because I understand that it is something that the user does not expect to happen in an upgrade since it has never happened before.

It has – Arch currently restarts cron and atd every time glibc is upgraded (because otherwise they fail horribly; it used to be that you'd do a -Syu and then weeks later realize all of your cron jobs had been failing since).

Arch also reexecs systemd (init) after each upgrade, which happens to solve a very similar issue with systemd v255->v256 (with an old systemd-executor failing to load new libraries, and therefore completely failing to start any new service units after upgrade).

Both of those restarts are unnoticed since they don't interrupt anything, and normally that should apply to restarting sshd as well.

Makes me wonder if there could be an opt-in/opt-out system that works a little like systemd presets (which is how each distro defines which units should be enabled by default) – e.g. personally I have custom hooks to restart smbd and php-fpm, even though those restarts are slightly disruptive, they're still less disruptive than not restarting. Debian loves to start/restart everything but has an opt-out via policy-rc.d, though it is a bit annoying to deploy.

This could be debated for "stateless" daemons like cron, systemd, sshd, ... that typically won't require any manual intervention, but what about complex databases like MySQL or Postgres, where (major) upgrades often require other (manual) steps to upgrade data or config first?

As far as I know they don't need to be urgently restarted after an upgrade, so they wouldn't be part of this discussion in the first place.

Indeed, those are not critical for normal OS operation, so that may be a good demarcator for what to auto-restart (systemd, sshd, cron) and what not (apache, mysql, ...)

Right, though my intention was more to distinguish between "old process continues to work fine after upgrade" and "old process stops working after upgrade".

(E.g. in my case, smbd fails to accept new connections after upgrade because it can't load vfs modules anymore, so I would opt in to auto-restarting it regardless of it interrupting connections. Apache is in the middle, it'll continue working until e.g. certbot tries to reload it without restarting, which will fail.)

If there's a possibility that the currently running sshd master binary is vulnerable, and you applied an update to it without restarting, will it still be vulnerable until restarted? If so, must the user or the update procedure restart it?

I don't think this security issue should force the Arch ecosystem to restart this service against all its principles, as I don't think anyone expects Arch to restart its services after an upgrade.

Unless there are already packages which get restarted after upgrading that I'm not aware of. Are there any?

I don't think this security issue

AFAIU, the restart isn't because of a security issue – it's about an incompatibility issue that leads to sshd no longer working at all until restarted.

Unless there are already packages which get restarted after upgrading that I'm not aware of. Are there any?

It restarts cronie, atd, and systemd (already mentioned in another subthread here), the former two for the same reason that every libc upgrade makes them stop working until restarted; the latter for other reasons but incidentally starting with v256 it solves the same kind of incompatibility problem as well.

A few more have their configs reloaded without restart (systemd again, also udev, dbus-daemon).

For follow-up: I've opened https://bugzilla.mindrot.org/show_bug.cgi?id=3706 upstream to try to keep this from happening again in the future.

(at first tried a few things to make the new binary work with the old sshd running, but that doesn't appear to be likely to work -- if someone has ideas any answer is welcome)

@ghen Given that a news entry was added, and that a larger discussion about restarting the daemon on upgrades is needed, I think we can close this issue. I started working on a post_upgrade script to restart just for the 9.8 version, but if this needs a bigger discussion first, that kind of defeats the purpose.

closed
