People like Joe Armstrong have done really good work with the sharednothing approach. You see that a lot in custom systems in browser implementations. Chrome is big on it. We do it our own way in our JavaScript implementation. And shared nothing is not even interesting to academics, I think. Transactional memory is more interesting, especially with the sort of computer-architecture types because they can figure out ways to make good instructions and hardware support for it. But it’s not going to solve all the problems we face.
I think there will be progress and it should involve programming languages. That’s why I do think the talk about the second golden age isn’t wrong. It’s just that we haven’t connected the users of the languages with the would-be developers with the academics who might research a really breakthrough language.
Seibeclass="underline" You got a Masters but not a PhD. Would you generally recommend that people who want to be programmers should go get a PhD in computer science? Or should only certain kinds of people do that?
Eich: I think only certain kind of people. It takes certain skills to do a PhD, and sometimes you wonder if it’s ultimately given just because you endured. But then you get the three letters to put after your name if you want to. And that helps you open certain doors. But my experience in the Valley in this inflationist boom of 20 years or so that we’ve been living through—though that may be coming to an end—was certainly it wasn’t a good economic trade-off. So I don’t have regrets about that.
The ability to go study something in a systematic, and maybe even leisurely, way is attractive. The go-to-market, ride Moore’s law, and compete and deal with fast product cycles and sometimes throwaway software—seems like a shame if that’s all everybody does. So there’s a role for people who want to get PhDs, who have the skills for it. And there is interesting research to do. One of the things that we’re pushing at Mozilla is in between what’s respected in academic research circles and what’s already practice in the industry. That’s compilers and VM stuff, debuggers even—things like Valgrind—profiling tools. Underinvested-in and not sexy for researchers, maybe not novel enough, too much engineering, but there’s room for breakthroughs. We’re working with Andreas Gal and he gets these papers rejected because they’re too practical.
Of course, we need researchers who are inclined that way, but we also need programmers who do research. We need to have the programming discipline not be just this sort of blue-collar thing that’s cut off from the people in the ivory towers.
Seibeclass="underline" How do you feel about proofs?
Eich: Proofs are hard. Most people are lazy. Larry Wall is right. Laziness should be a virtue. So that’s why I prefer automation. Proofs are something that academics love and most programmers hate. Writing assertions can be useful. In spite of bad assertions that should’ve been warnings, we’ve had more good assertions over time in Mozilla. From that we’ve had some illumination on what the invariants are that you’d like to express in some dream type system.
I think thinking about assertions as proof points helps. But not requiring anything that pretends to be a complete proof—there are enough proofs that are published in academic papers that are full of holes.
Seibeclass="underline" On a completely different topic, what’s the worst bug you ever had to track down?
Eich: Oh, man. The worst bugs are the multithreaded ones. The work I did at Silicon Graphics involved the Unix kernel. The kernel originally started out, like all Unix kernels of the day, as a monolithic monitor that ran to completion once you entered the kernel through a system call. Except for interrupts, you could be sure you could run to completion, so no locks for your own data structure. That was cool. Pretty straightforward.
But at SGI the bright young things from HP came in. They sold symmetric multiprocessing to SGI. And they really rocked the old kernel group. They came in with some of their new guys and they did it. They stepped right up and they kept swinging until they knocked the ball pretty far out of the field. But they didn’t do it with anything better than C and semaphores and spin locks and maybe monitors, condition variables. All hand-coded. So there were tons of bugs. It was a real nightmare.
I got a free trip to Australia and New Zealand that I blogged about. We actually fixed the bug in the field but it was hellish to find and fix because it was one of these bugs where we’d taken some single-threaded kernel code and put it in this symmetric multiprocessing multithreaded kernel and we hadn’t worried about a particular race condition. So first of all we had to produce a test case to find it, and that was hard enough. Then under time pressure, because the customer wanted the fix while we were in the field, we had to actually come up with a fix.
Diagnosing it was hard because it was timing-sensitive. It had to do with these machines being abused by terminal concentrators. People were hooking up a bunch of PTYs to real terminals. Students in a lab or a bunch of people in a mining software company in Brisbane, Australia in this sort of ’70s sea of cubes with a glass wall at the end, behind which was a bunch of machines including the SGI two-processor machine. That was hard and I’m glad we found it.
These bugs generally don’t linger for years but they are really hard to find. And you have to sort of suspend your life and think about them all the time and dream about them and so on. You end up doing very basic stuff, though. It’s like a lot of other bugs. You end up bisecting—you know “wolf fence.” You try to figure out by monitoring execution and the state of memory and try to bound the extent of the bug and control flow and data that can be addressed. If it’s a wild pointer store then you’re kinda screwed and you have to really start looking at harder-to-use tools, which have only come to the fore recently, thanks to those gigahertz processors, like Valgrind and Purify.
Instrumenting and having a checked model of the entire memory hierarchy is big. Robert O’Callahan, our big brain in New Zealand, did his own debugger based on the Valgrind framework, which efficiently logs every instruction so he can re-create the entire program state at any point. It’s not just a time-traveling debugger. It’s a full database so you see a data structure and there’s a field with a scrogged value and you can say, “Who wrote to that last?” and you get the full stack. You can reason from effects back to causes. Which is the whole game in debugging. So it’s very slow. It’s like a hundred times slower than real time, but there’s hope.
Or you can use one of these faster recording VMs—they checkpoint only at system call and I/O boundaries. They can re-create corrupt program states at any boundary but to go in between those is harder. But if you use that you can probably close in quickly at near real time and then once you get to that stage you can transfer it into Rob’s Chronomancer and run it much slower and get all the program states and find the bug.
Debugging technology has been sadly underresearched. That’s another example where there’s a big gulf between industry and academia: the academics are doing proofs, sometimes by hand, more and more mechanized thanks to the POPLmark challenge and things like that. But in the real world we’re all in debuggers and they’re pieces of shit from the ’70s like GDB.
Seibeclass="underline" In the real world one big split is between people who use symbolic debuggers and people who use print statements.
Eich: Yeah. So I use GDB, and I’m glad GDB, at least on the Mac, has a watch-point facility that mostly works. So I can watch an address and I can catch it changing from good bits to bad bits. That’s pretty helpful. Otherwise I’m using printfs to bisect. Once I get close enough usually I can just try things inside GDB or use some amount of command scripting. But it’s incredibly weak. The scripting language itself is weak. I think Van Jacobson added loops and I don’t even know if those made it into the real GDB, past the FSF hall monitors.