
Programming language evolution

It’s been a long time. I got married, and now have even less time to write on the net than I had before. But here I am, sharing another thought now that I have a little bit of time on my hands.

This is inspired by Uncle Bob’s post Living on the Plateau. He speaks of “the metal”, and how the metal drove the evolution of programming languages. Indeed, we jumped from assembly to C because we could afford that level of abstraction and get rid of the low-level dependency on particular hardware. Then we could afford garbage collection and virtual machines, so we did just that. Then Robert states that functional languages were driven by the need to make use of multiple CPU cores, which is probably true (I’m not an expert on functional languages).

But there’s one thing that has always nagged me a bit. If we can afford to think of our resources as infinite (unless we’re trying to solve NP-complete problems of huge sizes or load petabytes of data into RAM), then why on earth should the evolution of our languages be driven by the metal? Robert speaks of language evolution turning into craftsmanship evolution. While craftsmanship evolution is definitely a good thing, I still think we could use better tools to become better craftsmen. And, surprisingly, we’re still too close to the metal for that.

Think about it. We have very powerful hardware, we have virtual machines and interpreted languages. But take Java or C#, for example. In Java, we still have these bare-metal types, such as int or double. They are even called “primitive”! In C#, they are no longer quite that primitive, in the sense that they at least inherit from Object, which is definitely a good thing (just look at the java.util.stream API to understand what I’m talking about). But they are still too close to the metal. Why on earth, from the user’s perspective, should 2147483647 + 1 equal −2147483648? OK, Python goes further and makes integer types virtually infinite, but… if you think about it, these are just very tiny steps toward what I think should be the next step in language evolution.
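If you’ve never run into this particular bit of metal, here it is in two lines of Java:

    int max = Integer.MAX_VALUE;     // 2147483647, the largest value a 32-bit int can hold
    System.out.println(max + 1);     // prints -2147483648: the value silently wraps around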

Look at this code:
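Something along these lines, say (all the names here are purely illustrative):

    List<User> users = loadUsers();
    int userIndex = users.indexOf(currentUser);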

Makes sense? Not to me. Why on earth does userIndex have the int type? The type should reflect what the thing is, and a user index is definitely not just a random integer. Indeed, this would compile just fine:
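For instance (again, purely illustrative names):

    int userIndex = currentUser.getAge();   // an age is not an index
    int itemIndex = userIndex * 60 + 42;    // arbitrary arithmetic, and the compiler is perfectly happy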

It compiles, but it feels so terribly wrong that one might wonder if there’s something wrong with our language. And I think there is. The root of this problem is that we’re still thinking too close to the metal. Much closer than we can afford to, given the sheer computing power available to us.

Indeed, I think that low-level types, limited in size and capabilities, should still be available, but they must be very rare and exotic, not ubiquitous like they are in today’s software. For example, the code above might look like this:
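Perhaps something like this (Index is a made-up type here, and the list’s indexOf is assumed to return it):

    Index userIndex = users.indexOf(currentUser);
    // int i = userIndex;                      // would not compile
    // Index ageIndex = currentUser.getAge();  // neither would this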

Now that’s better, but it still feels wrong. Why on earth should all lists use the same type for indexing?
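One way to picture it (again, wishful types rather than anything Java offers today):

    Index<User> userIndex = users.indexOf(currentUser);
    User found = users.get(userIndex);
    // items.get(userIndex);   // would not compile: this is an index into users, not items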

Now that’s something that makes sense to me.

OK, but what if we’re working on a pure math problem? Shouldn’t we use raw types then, supposing they’re not prone to low-level effects like overflow, as in Python? No. Even in math, there are all kinds of integers that are not exactly interchangeable. You wouldn’t assign an X coordinate to a variable holding a Y coordinate unless you’re performing some kind of transpose operation, in which case your code should probably convert these types explicitly. Working with doubles, you wouldn’t assign an angle to a coordinate. Come to think of it, the very word double comes from the metal: it merely reflects hardware precision.
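Even today you can fake this with hand-rolled wrapper types (the names are made up, and ideally the language would make this effortless rather than verbose):

    // Both hold a double, but the compiler keeps them apart.
    final class X { final double value; X(double value) { this.value = value; } }
    final class Y { final double value; Y(double value) { this.value = value; } }

    X x = new X(3.0);
    Y y = new Y(4.0);
    // x = y;                         // does not compile: incompatible types
    X transposed = new X(y.value);    // a transpose is an explicit, deliberate conversion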

I was recently explaining a simple LeetCode problem to my wife (she isn’t a programmer at all). I was able to explain this code to her pretty well:
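It went something like this (a stand-in for the actual problem; the shape of the code is what matters): you have a pile of m candies, you keep splitting it in half, and whenever the pile is odd you eat the leftover candy; how many candies do you get to eat?

    static int candiesEaten(int m) {
        int eaten = 0;
        while (m > 0) {
            if ((m & 1) == 1) {   // the pile is odd
                eaten++;
            }
            m /= 2;               // split the pile in half
        }
        return eaten;
    }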

But she was stuck at this bit: (m & 1) == 1. And indeed, it just doesn’t look right. If m is just an integer, in the math sense, then why on earth am I fiddling with bits instead of just saying m.isOdd()? Now that would look much cleaner. Indeed, I’d expect this code to look more like this:
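Something like the following, say. This is wishful thinking rather than valid Java, of course: no Java integer has an isOdd() method. I’ve also given the number a name that says what it actually is, a count of candies:

    static int candiesEaten(int count) {
        int eaten = 0;
        while (count > 0) {
            if (count.isOdd()) {   // sadly, no such method exists today
                eaten++;
            }
            count /= 2;
        }
        return eaten;
    }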

I’m still using some literals here, and I’m not sure what to do with them. Perhaps our language should be able to implicitly convert them to the needed types, as long as it makes sense (if that’s even possible to detect at compile time). One thing that still feels terribly wrong about this code is count /= 2. Why on earth should we assume that division rounds down by default? That is another low-level bit that comes back to bite us. There’s probably more, but my mindset is still too low-level to figure it all out.

Nowadays some libraries, especially those with so-called fluent APIs, go some way down the route I’m talking about. For example, in AssertJ, we often write:
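Something like this (the names are illustrative):

    import static org.assertj.core.api.Assertions.assertThat;

    // assertThat() is overloaded and returns a type-specific assertion object,
    // so only the assertions that make sense for the argument are available:
    assertThat(userName).startsWith("J").hasSize(4);
    assertThat(users).hasSize(3).contains(admin);
    // assertThat(42).startsWith("J");   // does not even compile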

Not only does it help make the code more readable and resolve possible signature clashes, it also prevents wrong code from compiling.

But these are again very small steps. These tricks exist only at the API level, whereas even the standard library continues to use ubiquitous ints, longs and whatnot. In the next stage of programming language evolution, I’m eager to see higher-level concepts introduced at the language level, as the standard. We must stop thinking in bits and bytes and start thinking in concepts. And even at a low level, wouldn’t it be better to see
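something along these lines (Offset and Length being made-up types):

    // Hypothetical: the starting position and the length have different types,
    // so they simply cannot be swapped or confused with a "to" index.
    copyOfRange(array, Offset.of(index), Length.of(size));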

rather than
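the way it actually looks today, with plain ints for both boundaries:

    // Real Java: from and to are both ints, and nothing stops you from
    // passing size where index + size was meant.
    Arrays.copyOfRange(array, index, index + size);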

If anything, it would stop us from making stupid bugs when we accidentally mix up the order of the arguments or confuse from, to with from, length styles of arguments. How often did you write Arrays.copyOfRange(array, index, size) instead of Arrays.copyOfRange(array, index, index + size)? I know I did that many times. Unit tests catch these pretty quickly, but wouldn’t it be wonderful if it were plain impossible to even make this kind of mistake?

And while we’re not quite there yet, I encourage all developers of higher-level software to employ tricks like the ones above to introduce more custom types, to make code look cleaner, easier to understand and harder to get wrong. Don’t think like a programmer unless you have to! Think like a business area specialist! Falling back to raw types “for performance” is just premature optimization: unless you have proven that a custom type slows your software down terribly, don’t go for plain int and string. You don’t work with separate bits when you can work with whole bytes, and you don’t work with whole bytes when you can work with integers, right? So don’t work with integers when you can work with list indexes, coordinates, object counts and other high-level stuff that really makes sense in the application area you’re working with.