Computer failures can happen any time, but it’s been so long since I’ve had a hard disk failure that I rarely worry about such problems. Part of my relaxed stance has to do with backups, which I always keep in triplicate, so when I discovered Friday afternoon that one of my hard disks had failed — quickly and catastrophically — it was more of a nuisance than anything else. It meant taking out the old disk, going out to buy a new one and installing same, and then loading an operating system on it. Because I do 90 percent of my work in Linux, I opted for Linux Mint as a change of pace from Ubuntu, making it the tenth version of Linux I’ve used over the years.

My weekend was mildly affected, but the new disk went in swiftly and the operating system load went without incident, so I was still able to get to two concerts, one of them an absolutely brilliant handling of Elgar’s ‘Enigma Variations,’ and to see the new Tommy Lee Jones movie ‘Emperor.’ Hardware failures in the midst of an urban environment, and with adequate backups on hand, are thus easily handled. But then I started thinking about robotics and deep space. Ponder the hardware failures that are inevitable on missions lasting decades or even centuries. An unexpected failure in a key circuit could wreck a lot more than a weekend on such a probe.

From Wardens to Self-Healing

Remember the ‘wardens’ that were built into the Project Daedalus plan? They were designed to take care of the vessel on its 50-year run to Barnard’s Star, an acknowledgment of what happens to complex systems over time. These days we’re focusing on self-healing electronics that can repair themselves in microseconds, integrated chips that spring back from potential disaster, rebuilding themselves faster than any human intervention could manage. Members of the High-Speed Integrated Circuits laboratory at Caltech have been experimenting with self-healing integrated chips that can recover all but instantaneously from serious levels of damage.


Image: Some of the damage Caltech engineers intentionally inflicted on their self-healing power amplifier using a high-power laser. The chip was able to recover from complete transistor destruction. This image was captured with a scanning electron microscope. Credit: Caltech.

The chips in question are high-frequency power amplifiers useful for communications, imaging, sensing and other applications. Each of these chips holds more than 100,000 transistors along with a custom-made application-specific integrated-circuit (ASIC) that monitors the amplifier’s performance and adjusts the system’s actuators when changes are called for. The idea is to let the system itself determine the need to use the actuators without humans overseeing the process. Researchers therefore target the chips with a high-power laser over and over again, observing the chips as they come up with split-second workarounds to the damage.
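To make the sense-and-adjust loop described above a little more concrete, here is a minimal Python sketch. It is a toy model, not the Caltech team's algorithm: the function names, the square-root response of each stage, and the fixed bias budget are all assumptions made purely for illustration. The idea is simply that a controller reads a sensor, nudges its actuators, and keeps any change that improves the reading.

```python
import math

def measured_output(biases, health):
    """Hypothetical sensor reading: each stage contributes health * sqrt(bias).

    health[i] is 1.0 for an intact stage and 0.0 for a destroyed one.
    The square-root response is an assumption chosen for illustration only.
    """
    return sum(h * math.sqrt(b) for h, b in zip(health, biases))

def heal(biases, health, step=1):
    """Greedy search over actuator settings under a fixed total bias budget.

    Repeatedly shifts `step` units of bias from one stage to another and
    keeps the move only if the sensed output strictly improves.
    """
    biases = list(biases)
    best = measured_output(biases, health)
    improved = True
    while improved:
        improved = False
        for src in range(len(biases)):
            for dst in range(len(biases)):
                if src == dst or biases[src] < step:
                    continue
                trial = list(biases)
                trial[src] -= step
                trial[dst] += step
                out = measured_output(trial, health)
                if out > best + 1e-12:  # ignore ties so the loop terminates
                    biases, best, improved = trial, out, True
    return biases, best

# Four healthy stages sharing a bias budget of 16 units.
biases = [4, 4, 4, 4]
healthy = [1.0, 1.0, 1.0, 1.0]
# Simulated laser damage: stages 2 and 3 are destroyed.
damaged = [1.0, 1.0, 0.0, 0.0]

before = measured_output(biases, healthy)
after_damage = measured_output(biases, damaged)
healed_biases, after_healing = heal(biases, damaged)
print(before, after_damage, healed_biases, after_healing)
```

In this toy the controller moves the bias budget away from the dead stages and onto the surviving ones, recovering part of the lost output without anyone telling it which stages failed. A real chip has to work from noisy on-chip sensor readings and a far larger space of actuator settings, but the closed loop of measure, adjust, and re-measure is the same idea.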

“It was incredible the first time the system kicked in and healed itself. It felt like we were witnessing the next step in the evolution of integrated circuits,” says Ali Hajimiri (Caltech). “We had literally just blasted half the amplifier and vaporized many of its components, such as transistors, and it was able to recover to nearly its ideal performance.”

This Caltech news release compares the healing properties of these integrated-circuit chips to the human immune system, which can likewise respond quickly to a wide range of attacks. Interestingly, the team discovered that the amplifiers with self-healing capacity consumed about half as much power as standard amplifiers, while their overall performance was more predictable. By demonstrating self-healing in a highly complex system like this one, the Caltech researchers have shown that it can be extended to many other electronic systems.

All this is good news for our starship. We naturally think about catastrophic problems that damage parts of the circuits, but when we’re thinking long-term, the issues are likely to be more subtle. Problems can emerge as continual load stresses the system and causes changes to its internal properties, while variations in temperature and supply voltage can also degrade operations. For that matter, variation across components can play a role, making an electronic system with a built-in immune function an insurance policy for deep space robotic missions.

Meanwhile, my own computer operations continue with extensive human intervention, though I’m pleased to see that the new hard disk I installed checks out perfectly. We are all learning through experience how our lives are supplemented and changed by digital technologies. But robotic probes operating at the edge of the Solar System and beyond have no repair team on staff to open up a housing and plug in a new chip. We’re now learning that beyond redundancy and backups, a new set of tools is emerging that will keep long-haul missions healthy.

The paper is Bowers et al., “Integrated Self-Healing for mm-Wave Power Amplifiers,” IEEE Transactions on Microwave Theory and Techniques Vol. 61, Issue 3 (2013), pp. 1301-1315 (abstract). Thanks to Eric Davis for the pointer to this work.