Buggy Software: Achilles Heel of Big-Data-Powered Science?

The Atlantic

We've heard a lot about scientific fraud recently, and it's a serious concern. But how reliable are honest research results? On the science website iSGTW, the journalist Adrian Giordani points to a growing concern with software defects:

In October 2012, a workshop about maintainable software practices in e-science highlighted that unchecked errors in code have caused retractions in major research papers. For example, in December 2006, Geoffrey Chang from the Department of Molecular Biology at the Scripps Research Institute, California, US, was horrified when a hand-written program flipped two columns of data, inverting an electron-density map. As a result, a number of papers had to be retracted from the journal Science.
In an earlier paper that Giordani co-authored with Leslie Hatton, a professor of forensic software engineering, Hatton wrote:

I (Hatton) have worked for 40 years in meteorology, seismology, and computing, and most of the software I've used has been corrupted to some extent by such defects - no matter how earnestly the programmers performed their feats of testing. The defects, when they eventually surface, always seem to come as a big surprise.

The defects themselves arise from many causes, including: a requirement might not be understood correctly; the physics could be wrong; there could be a simple typographical error in the code, such as a + instead of a - in a formula; the programmer may rely on a subtle feature of a programming language which is not defined properly, such as uninitialized variables; there may be numerical instabilities such as over-flow, under-flow or rounding errors; or basic logic errors in the code. The list is very large. All are essentially human in one form or another but are exacerbated by the complexity of programming languages, the complexity of algorithms, and the sheer size of the computations.
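Two of the causes Hatton lists, rounding errors and a + typed in place of a -, are easy to reproduce. The Python sketch below is only an illustration: the function names and numbers are hypothetical and do not come from Hatton's paper or from any of the retracted studies.

```python
import math

# Rounding error: 0.1 has no exact binary representation, so adding it
# ten times in a plain loop does not land exactly on 1.0.
values = [0.1] * 10
naive_total = 0.0
for v in values:
    naive_total += v

# math.fsum tracks the lost low-order bits and returns the correctly
# rounded sum of the same ten values.
careful_total = math.fsum(values)


def net_force_correct(thrust, drag):
    """Intended formula (hypothetical example): thrust minus drag."""
    return thrust - drag


def net_force_typo(thrust, drag):
    """Same formula with the sign typo: '+' where '-' was meant."""
    return thrust + drag


print(naive_total == 1.0)    # False
print(careful_total == 1.0)  # True

# A test that happens to use drag = 0 cannot tell the two versions apart...
print(net_force_typo(10.0, 0.0) == net_force_correct(10.0, 0.0))  # True
# ...while any nonzero drag exposes the typo.
print(net_force_typo(10.0, 4.0), net_force_correct(10.0, 4.0))    # 14.0 6.0
```

The zero-drag case suggests why such defects can survive earnest testing: a test passes whenever its inputs happen not to exercise the faulty term.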

A site called RunMyCode, developed by the Columbia University computer scientist Victoria Stodden, helps scientists discover errors by sharing code and data, accelerating replication of experiments. And fortunately many software glitches, as in more familiar consumer products, occur only occasionally and don't seriously affect functionality most of the time. The real problem is that once in a while a software error can turn lethal, as it did notoriously in radiation therapy in the 1980s.

Replicability can help scientists correct inevitable bugs. But what about the many other programs that govern everyday life, from forensic lab tests to credit scores and anti-terrorism watch lists, whose code is often a commercial or national security secret? In the Defense Department's troubled F-35 jet program,

the "gorilla in the room," [the project manager Air Force Major] General [Christopher] Bogdan said, is testing and securing the 24 million lines of software code for the plane and its support systems, a mountain of instructions that goes far beyond what has been tried in any plane.

In civilian life, for example in credit score calculations, there is often no effective appeal from these programs' results. The question now is whether we can develop better tools for catching false positives and false negatives before they do serious damage.
