Strong Typing is for People With Weak Memories

Written by dave.d.dixon | Published 2017/12/06


I’m an “old guy” in the software world, old enough to have had to enter code on punch cards and be forced to debug assembly by flipping through a 12-inch-thick fan-fold printout. In my first real job at IBM, one of the programmers I worked with (who was an “old guy” even back then) gave me a paper copy of what would now be called a “meme”. It was titled “Real Programmers Don’t Write Specs” (here’s the closest thing I could find online: http://multicians.org/thvv/realprogs.html). The last quip always springs to mind when the static vs. dynamic language “debate” rears its head:

  • Real Programmers don’t write in PASCAL, or BLISS, or ADA, or any of those pinko computer science languages. Strong typing is for people with weak memories.

Anybody out there remember BLISS? Anybody? Bueller?

Anyway, that one is always worth a chuckle. I don’t entirely agree with it, but as with the best humor, there’s a hard kernel of truth lurking there. Static type analysis is certainly a useful tool, with the usual disclaimer that its utility is a function of the problem. I do think we’re missing the point when the main argument in favor of static analysis is catching things like function calls that don’t match the function signature. Ensuring correct function calls is not the long pole in the tent of software quality. More broadly, we need to be cognizant of the array of tools available to us and the trade-offs they present, rather than focusing on any single approach. Static type systems can be a useful tool when applied to the right problems, but they are not the only tool, nor will they ever be a complete solution (and yes, the sideways nerdy reference to Turing completeness was absolutely intentional).

The tl;dr here is that software quality is generally about enforcing constraints, and the constraints themselves are what’s important, not the specifics of how they are applied to a particular problem. I was a research physicist for a number of years. Physics spends a lot of time being concerned with “symmetries”, the idea that I can change something while other quantities remain invariant. Generally, symmetries imply “conservation laws”: for example, the equations describing the motion of matter in a galaxy are invariant under rotations, and that rotational invariance in turn implies the conservation of angular momentum. That conservation law can then enter as a constraint for solving problems. For instance, consider the motion of a large spherical cloud of matter like a proto-galaxy, with some net angular momentum, undergoing internal friction. To solve for how the matter distribution evolves over time, we apply the constraint that angular momentum is conserved, and find that the spherical cloud evolves to a rotating disk. The specific algorithm used to find that solution isn’t important, provided the answer it produces satisfies our constraints.

So constraints are important. In the galaxy example, the constraint guided us to a particular function describing the ultimate distribution of matter in a galaxy. Software engineering is really all about constraints too. Programs are a way of computing functions, where some input value leads to a particular output value. Assuming something like a Turing machine model, we have an entire space of functions which can (theoretically) be expressed as computer programs. But only some of those functions are useful to us, and “useful” means that they satisfy the constraints that define whatever problems we’re trying to solve. Those constraints might be simple (this function takes two integers and returns an integer), or they might carry more complex domain information: given today’s date, send an email to every customer in our database whose birthday is that day, handling leap years and so on.

Static types and the associated static analysis are clearly one way to express and enforce constraints. It is, in some sense, the most powerful way, because “static analysis” means that we can prove certain propositions about our program for all possible inputs. That’s the siren song of static typing: static types provide absolute certainty as to the truth of certain propositions about your code. And if we know that pieces of code, like functions, satisfy our constraints, we can deduce that compositions of those pieces will also satisfy those constraints (or not, in which case the compiler tells you about it).
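To make that concrete, here’s a minimal Haskell sketch (my illustration, not anything from the original article; the function names are made up). The signatures are the constraints, the compiler checks them for every possible input, and a composition that doesn’t line up simply won’t compile:

```haskell
-- Hypothetical example: the type signatures are the constraints.
parseAge :: String -> Maybe Int
parseAge s = case reads s of
  [(n, "")] | n >= 0 -> Just n
  _                  -> Nothing

describeAge :: Int -> String
describeAge n = "age: " ++ show n

-- Writing `describeAge . parseAge` would be rejected at compile time,
-- because String -> Maybe Int does not line up with Int -> String;
-- the Maybe has to be handled before the pieces compose.
describeInput :: String -> String
describeInput = maybe "invalid age" describeAge . parseAge

main :: IO ()
main = putStrLn (describeInput "42")
```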

Where static typing applies, it’s a powerful tool. The problem is that you can mathematically prove that static analysis is intrinsically limited for Turing-complete languages. Flipping that around, the language used to express types is necessarily not Turing-complete, or else you couldn’t do static analysis. An interesting example of this pops up with monads in Haskell. Monads are types (more precisely, instances of the Monad type class), and Haskell has its own embedded sublanguage for specifying types. Code written using monads can be statically verified, even though it often incorporates what in other languages would be considered “side-effects”, things outside the purview of the compiler to verify. So that’s pretty cool. But when you write your own monad type, Haskell can’t statically verify that your monad satisfies the monad laws. That’s on you, either to eyeball it and convince yourself, or perhaps to do simulation testing to at least have some level of confidence. This is not a deficiency in Haskell. It just isn’t mathematically possible in the general case, because Haskell is Turing-complete.
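Here’s a hedged sketch of what that looks like in practice (my example, not the author’s): a tiny hand-rolled Writer-style monad. GHC happily checks that the instance type-checks, but the monad laws exist only as comments, and a buggy bind that, say, dropped the log would type-check just as well.

```haskell
-- Hypothetical example: a tiny Writer-like monad that accumulates log messages.
newtype Logged a = Logged { runLogged :: (a, [String]) }

instance Functor Logged where
  fmap f (Logged (a, w)) = Logged (f a, w)

instance Applicative Logged where
  pure a = Logged (a, [])
  Logged (f, w1) <*> Logged (a, w2) = Logged (f a, w1 ++ w2)

instance Monad Logged where
  Logged (a, w) >>= f =
    let Logged (b, w') = f a
    in Logged (b, w ++ w')

-- Monad laws the compiler does NOT verify:
--   return a >>= f    ==  f a                       (left identity)
--   m >>= return      ==  m                         (right identity)
--   (m >>= f) >>= g   ==  m >>= (\x -> f x >>= g)   (associativity)
-- A bind that silently discarded `w` would still compile.
```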

What are our other options for verifying constraints on our code? If you want, with full generality, to stay in the realm of absolute certainty, where reasoning about code becomes logical deduction, your only choice is to give up Turing-completeness (at least, I don’t know of any other options). That’s not as crazy as it might sound on the surface. SQL, arguably one of the most successful languages, was not originally Turing-complete. That in turn allowed for powerful query optimizers, which could analyze mathematically equivalent forms of a query and pick the one which executed most efficiently, and that at least partly drove the growth and dominance of relational databases for several decades. Later extensions to SQL made it Turing-complete, for better or worse. Datalog, another query language, is based on a subset of Prolog which allows for writing logic programs that are guaranteed to terminate (though not necessarily in a reasonable amount of time). If you poke around Google, you can probably find other examples of DSLs which are not Turing-complete, for defining configuration tasks (e.g. Dhall) and so on. Tangent: if you are considering writing a customer-facing DSL for your product, to allow scripting or some other customization, do not make it Turing-complete. If you do, you might as well just turn them loose with JavaScript or C#. Either way, you’re handing them the same box of hand-grenades. With all the pins pulled.
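For that tangent, here is a rough sketch (entirely hypothetical, in Haskell) of the alternative: an embedded rule language with no recursion and no loops, so evaluating a customer’s rule always terminates.

```haskell
-- Hypothetical example of a deliberately non-Turing-complete DSL:
-- customers can combine conditions, but there is no recursion or looping
-- construct, so evaluation always terminates.
data Rule
  = FieldEquals String String   -- field name, expected value
  | And Rule Rule
  | Or Rule Rule
  | Not Rule
  deriving Show

type Record = [(String, String)]

evalRule :: Record -> Rule -> Bool
evalRule rec (FieldEquals k v) = lookup k rec == Just v
evalRule rec (And a b)         = evalRule rec a && evalRule rec b
evalRule rec (Or a b)          = evalRule rec a || evalRule rec b
evalRule rec (Not a)           = not (evalRule rec a)

-- Every Rule is a finite tree, so evalRule terminates on all inputs --
-- a guarantee you give up the moment you add user-defined recursion.
main :: IO ()
main = print $ evalRule [("country", "US"), ("plan", "pro")]
                        (And (FieldEquals "plan" "pro")
                             (Not (FieldEquals "country" "DE")))
```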

A really interesting idea is that of total functional programming. We are still giving up Turing-completeness, but in return we get much stronger static guarantees about our code. The key realization is that Turing-complete languages necessarily don’t specify the entire domain of the functions expressed in them. You can specify the signature of a function as int -> int -> int in Haskell or any other statically typed Turing-complete language, but all that guarantees is that if the function ever returns a value, it will be an int. You can’t guarantee that the function ever stops, or that it doesn’t die with some sort of error like divide-by-zero. Total functional languages require complete specification of a function’s domain, and allow only a restricted form of recursion which can be statically verified to terminate. The upside is that you get much more powerful guarantees that your code conforms to your constraints. For example, the far-too-common and dreaded buffer overflow goes away. That said, you do have to work within the constraints of the tool itself, and in particular iterative algorithms require more head-scratching. And of course you’re giving up Turing completeness, but as the original paper argues, that may not be terribly limiting. Ethereum is moving toward using a total language for contract specification, which certainly seems compelling, because I can’t see how the hell you can have a system that verifies contractually constrained transactions if the language used to define the contracts is not 100% verifiable.
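A rough illustration, written in Haskell syntax even though Haskell itself doesn’t enforce totality (languages like Idris and Agda do): the first function recurses on a structurally smaller argument, so a totality checker can accept it; the second does not, and in fact whether it terminates for all positive inputs is literally an open problem.

```haskell
-- Structural recursion: each call peels one constructor off the list,
-- so termination can be verified mechanically by a totality checker.
totalLength :: [a] -> Int
totalLength []       = 0
totalLength (_ : xs) = 1 + totalLength xs

-- General recursion: the argument does not shrink structurally. A totality
-- checker would reject this; it loops forever on n <= 0, and termination
-- for all positive n is the (unproven) Collatz conjecture.
collatzSteps :: Integer -> Integer
collatzSteps 1 = 0
collatzSteps n
  | even n    = 1 + collatzSteps (n `div` 2)
  | otherwise = 1 + collatzSteps (3 * n + 1)

main :: IO ()
main = print (totalLength "hello", collatzSteps 27)
```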

If you don’t want to give up Turing completeness, you can give up certainty. Our constraints are no longer 100% verifiable, but we can still inform our degree of belief that the code respects them. Unit tests are one example of this approach, albeit a pretty weak one. A unit test verifies that a single set of inputs yields the “correct” output. So now you have at least one data point which increases your belief that the code is correct for all inputs. Simulation testing (e.g. Haskell’s QuickCheck) takes this a step further, testing a large number of different input values. So now you have a lot of data, and the generators of the inputs can be tuned to ensure they hit “interesting” points. A function which takes an int should probably be tested with a wide sampling of integer values, including typical “problem” points like zero and the min/max values for the particular integer representation. Such test-based approaches can be augmented with other tools, such as test coverage, which further inform your belief that the code will satisfy the constraints for all possible inputs.
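As a concrete (and entirely hypothetical) example of the simulation-testing approach, here’s a small QuickCheck property suite in Haskell; `clampToByte` is a made-up function, and QuickCheck throws a hundred random integers at each property by default:

```haskell
import Test.QuickCheck

-- Hypothetical function under test.
clampToByte :: Int -> Int
clampToByte n = max 0 (min 255 n)

-- Property: the result is always in range, for any Int QuickCheck generates.
prop_inRange :: Int -> Bool
prop_inRange n = let c = clampToByte n in c >= 0 && c <= 255

-- Property: values already in range pass through unchanged.
prop_identityInRange :: Int -> Property
prop_identityInRange n = (n >= 0 && n <= 255) ==> clampToByte n == n

main :: IO ()
main = do
  quickCheck prop_inRange          -- 100 random inputs by default
  quickCheck prop_identityInRange
```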

I think we can start to get an idea of why the “static vs. dynamic” language debate is so polarized. Static analysis is grounded in math and gives definitive answers, something which some people no doubt find extremely compelling. I remember seeing a comment somewhere like StackOverflow that using monads without static typing was “insanity”, or something like that. My experience has been more like this guy’s. There is clearly some advantage to typed monads; as Brian Beckman says, you just “line up the types” in your computation. But I’ve used monads in Clojure for some pretty complex stuff, and even without types it proved to be a useful pattern. I think I’m probably doing a lot of the type analysis in my head. Perhaps those drawn to dynamic languages have that sort of skill for mapping lots of code onto the constraints in their brain.

Where does that leave us? Static types and the associated analysis give exact answers to some set of propositions about our code and the constraints to which that code should adhere. They thus provide a way of reasoning about code which is purely deductive logic, and which can therefore be performed by the computer. The problem is that relatively few propositions can be proven in this way. We then turn to less exact approaches, like unit tests (weak) or simulation testing (better), which give us at least some evidence that our constraints are satisfied, turning our process of reasoning from deduction to inference. This inference is currently done largely in people’s heads, rather than with a software tool, trading off exactness and transparency for the ability to reason about a much wider range of constraints. How can we improve the situation? Certainly one approach is to broaden the scope of the deductive approach as much as possible with techniques like total functional programming. But I’ve rarely (ever?) seen any discussion of putting the inferential approach on more solid quantitative ground, increasing the transparency and allowing automated tools to perform this reasoning, rather than having it slosh around in someone’s brain.

What I’m going to describe here is pure spit-ball-in-the-wide-blue-sky conjecture, but I think it’s worth exploring. To motivate that belief, I’ll give you another example from my time in astrophysics research. A lot of my work was around image reconstruction for gamma-ray telescopes. Gamma-ray telescopes, like a Compton telescope, work via quantum optics. You don’t get to use a lens to focus gamma-ray photons onto a nice pixel array and make a picture. Instead you have to measure quantum interactions of the gamma-ray photon with the matter in your detector, and then try to back out the original direction and energy of that photon. Further, the general rule in astronomy is that the higher you go in energy, the fewer photons are available, and on top of that you have to make the observations in space (or near-space), where the non-photon cosmic-ray background adds a lot of noise. Backing out an image of the gamma-ray sky is thus a fairly difficult statistical inverse problem, and must utilize regularization techniques similar to what’s commonly done with neural networks.

The problem was that (just as with neural networks) the choice of regularization was arbitrary, and had nothing to do with the scientific questions at hand. For example, one bit of science to be performed by COMPTEL was to determine the distribution of the 1.8 MeV gamma-rays produced by the decay of aluminum-26. Aluminum-26 is a radioactive isotope of aluminum which can be made by various stellar processes such as supernovae. These various sources have different spatial distributions in the galaxy (e.g. “clumpy”, “smooth”), and if you can analyze the spatial distribution of 1.8 MeV gamma radiation from the Milky Way, you get a lot of information about heavy element nucleosynthesis. Great stuff, except that picking some arbitrary regularization technique now gives you a knob to turn such that the reconstructed gamma-ray image can be pretty much as smooth or as clumpy as you want, with no real way to ask a quantifiable scientific question about the various hypotheses for heavy element nucleosynthesis.

One of my last research projects before leaving for industry was an attempt to address this problem. Guess what? It involved applying constraints. Those constraints could be derived from the competing scientific hypotheses and used (via the method of maximum entropy) to derive a prior on the spatial distribution of gamma-rays, which leads to things like posterior odds-ratio analyses comparing the different hypotheses. The particular form of those constraints had to do with quantifying the regularity of the function describing the intensity of gamma-rays on the sky. This “regularity” is very much tied to the “regularization” used to stabilize the statistical inverse problem: regularization procedures are essentially just a way of enforcing constraints on the derived function, where “regularity” is some quantitative measure which classifies similar functions. My particular choice was the Besov norm, which can be thought of as a sort of generalized measure of how “smooth” (really, how differentiable) a function is. The Besov norm has a fairly simple expression in terms of the wavelet coefficients of the image, which in turn leads to a nice closed-form prior distribution.
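For the mathematically curious, here is a schematic of the kind of expression involved, taken from standard wavelet theory rather than from the specific analysis described above (coarse-scale terms and constants are omitted): the Besov norm of the image f is equivalent to a weighted sum over its wavelet coefficients, and a maximum-entropy-style prior then penalizes that norm.

```latex
% Schematic only: wavelet characterization of the Besov norm, and a prior
% that penalizes it. Here d is the dimension of the image domain and
% \psi_{j,k} is the wavelet at scale j and location k.
\|f\|_{B^{s}_{p,q}} \;\asymp\;
  \left( \sum_{j \ge 0}
    2^{\,jq \left( s + d\left(\tfrac{1}{2} - \tfrac{1}{p}\right) \right)}
    \Big( \sum_{k} \bigl|\langle f, \psi_{j,k} \rangle\bigr|^{p} \Big)^{q/p}
  \right)^{1/q},
\qquad
p(f) \;\propto\; \exp\!\bigl( -\lambda \, \|f\|_{B^{s}_{p,q}} \bigr)
```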

Anyway, that’s a long and super-geeky digression, but I promise I have a point, which is that I have a hunch this notion of regularity -> prior -> posterior may be a path toward quantifying inference about programs. Think about the mental process that has to occur to have confidence in simulation testing, where you ultimately hit only a very small fraction of the possible inputs, yet come away with a strong belief that your code generally satisfies the desired constraints. A key to that process has to be assumptions about the regularity of the functions in your program. Let’s take our simple int -> int -> int function again. Assuming 64-bit integers, there are 2¹²⁸ possible inputs, of which maybe you test, I don’t know, a million or so? To have any belief that your code is correct for all 2¹²⁸ possibilities means that you believe strongly in the regularity of your function, which is to say that you feel it behaves roughly the same for all inputs and does not blow up for some special value. The generator used to test that function reflects this, which is why zero and the min/max int values are generated with high probability, and values that are otherwise common in code (like +/- 1) are perhaps generated with higher probability than some other specific value like 188366531. If you can quantify that regularity belief, then you can in principle derive a prior, which, when combined with the “data” from simulation testing, leads to a posterior probability that your function is correct. Doing that for individual functions then lets you perform inference over compositions of functions, and ultimately over your whole program. I think you can even combine this with static analysis to reach still higher degrees of belief in code correctness. Indeed, it would give you a quantified assessment of the benefit static typing adds to your code.
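None of this machinery exists as an off-the-shelf tool, but the prior is already implicit in how property-testing generators are weighted. A hedged sketch (the weights, `edgeBiasedInt`, and `safeAdd` are all invented for illustration) of encoding that regularity belief explicitly with QuickCheck:

```haskell
import Test.QuickCheck

-- Hypothetical generator: over-weight the inputs we suspect are special
-- (0, +/-1, minBound, maxBound), i.e. make the regularity belief explicit.
edgeBiasedInt :: Gen Int
edgeBiasedInt = frequency
  [ (3, elements [0, 1, -1, minBound, maxBound])  -- suspected "problem" points
  , (7, arbitrary)                                -- everything else
  ]

-- Hypothetical function and property, tested against the biased generator.
safeAdd :: Int -> Int -> Int
safeAdd a b = a + b

prop_commutes :: Property
prop_commutes =
  forAll edgeBiasedInt $ \a ->
  forAll edgeBiasedInt $ \b ->
    safeAdd a b == safeAdd b a

main :: IO ()
main = quickCheck prop_commutes
```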

Admittedly, it’s all hand-waving at this point. But I think this sort of inference-based approach is another worthwhile avenue of research, because we already know that deductive methods are intrinsically limited. Quantifying this inference makes it much more powerful, leading to automated tools and avoiding the errors and biases which are inevitable if humans simply perform inference by intuition. Finally, we may be forced into this approach anyway, as a way of dealing with the coming revolution of quantum computing.

