Friday, 17 May 2002
A trick I just discovered for when Mac OS X gets completely stuck.
I was trying to switch from a Classic application to Mozilla when the window server crashed. It restarted, as usual, without killing all its children, and so all my Mac OS X applications except ATSServer and TruBlueEnvironment (Classic) died. The screen displayed the login screen background with MacsBug over the top of it, and the color wheel of death™. I couldn't interact with MacsBug, so, as I do about once a day when OS X crashes, I ssh'ed in from the O2 and killed the processes I was still running. Sometimes this works; sometimes not. Next I killed loginwindow and the Window Server, but all that did was give me a plain blue screen with the color wheel. Again, nothing unusual; this happens all the time (much as I wish it didn't!).
Next I tried the 'shutdown now' trick I had learned from a Mac OS X Hints article, which should (theoretically) kill all processes and bring the machine to single-user mode on the console. But even that didn't work: the shutdown process kept running, blocked on something or other, so I started killing all nonessential processes (pretty much everything but the pager and lookupd).
Still, the window server kept relaunching itself and dying immediately afterward. What was doing that? The -j flag to ps displays parent PIDs: loginwindow was started by Window Server and Window Server was started by /sbin/init. Since I had nothing else to lose, I killed /sbin/init (its pid is 1). This kicked me off my SSH session. I switched back to OS X and saw the spinning wheel… yet a few seconds later, I got the 'hostname#' prompt that indicated I was in single-user mode. A quick control-D and a few more seconds later, I was back looking at a now-functional loginwindow. This might be attributable to the still-running shutdown process timing out and deciding to do something more drastic, but I'd prefer to think it was my action that did it.
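For anyone who wants to trace "who started what" without reading the ps table by eye, here is a rough sketch. The function name ancestry and the PIDs in the usage note are made up for illustration. It assumes the input is one process per line in the form "pid ppid command", which is roughly what a ps invocation with those three columns produces (the exact flags vary between systems).

```shell
# ancestry PID: read "pid ppid command" lines on stdin and print the chain
# of parents from PID upward, one "pid command" line per process.
# (Hypothetical helper; not part of any system.)
ancestry() {
    awk -v start="$1" '
        {
            # Everything after the first two numeric fields is the command,
            # so names containing spaces ("Window Server") survive intact.
            cmd = $0
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", cmd)
            ppid[$1] = $2
            name[$1] = cmd
        }
        END {
            pid = start
            while (pid in ppid) {
                print pid, name[pid]
                if (ppid[pid] == pid) break   # stop on a self-parented entry
                pid = ppid[pid]
            }
        }'
}
```

Something like `ps -axo pid,ppid,command | ancestry 71` (with 71 standing in for loginwindow's real PID) should then print loginwindow, its parent, and so on up to /sbin/init. A header line in the ps output is harmless, since it never matches a numeric PID.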
So my procedure to recover from OS X window server failures is now:
- Kill everything owned by the current user.
- As root, kill the Window Server, ATSServer, and any other graphical-type servers that are running.
- shutdown now
- kill 1
If a step doesn't help, try the next one. Depending on the severity of the problem, I've met success at each step.
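The four steps above could be wrapped up as a small set of shell functions to paste into a root shell over ssh. This is only a sketch: the function names, the DRY_RUN switch, and the step numbering are my own inventions, and it assumes a ps that accepts -U and -o and a killall that takes a process name. DRY_RUN defaults to 1, so nothing is actually killed until you turn it off.

```shell
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to act.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: kill everything owned by the given user, sparing this shell.
kill_user_processes() {
    for pid in $(ps -U "$1" -o pid | grep -v PID); do
        [ "$pid" = "$$" ] || run kill "$pid"
    done
}

# Step 2: kill the graphical servers, named by the caller.
kill_servers() {
    for name in "$@"; do
        run killall "$name"
    done
}

# Run one step at a time; move to the next only if this one did not help.
recover_step() {
    step=$1
    shift
    case "$step" in
        1) kill_user_processes "$1" ;;
        2) kill_servers "$@" ;;
        3) run shutdown now ;;
        4) run kill 1 ;;
        *) echo "usage: recover_step 1 USER | 2 NAME... | 3 | 4" >&2
           return 2 ;;
    esac
}
```

For example, `recover_step 2 "Window Server" ATSServer` shows what step 2 would kill, and `DRY_RUN=0` followed by `recover_step 3` actually runs shutdown now. The process names are whatever ps shows on your machine; the ones here are just the names used in this post.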
The other common OS X-crashing annoyance I experience stems from lookupd dying. If the screensaver isn't active, I can usually manage to kill everything and restart gracefully, but if the screensaver is running, I have no choice but to hit the reset button. No lookupd, no password (no username lookup even—I can tell that's the problem when the screensaver unlock panel displays my username instead of my full name).