Rule of Representation from The Art of Unix Programming

Tremble in fear, oh lj-cut worshippers! Especially the “Russian-only” ones. Today is another day I step on your beliefs. Well, sorry about that (as if!).

Once, I had to implement a C++ wrapper class over a poorly designed library so that its poor design would not affect the rest of my application. This library uses a lot of callbacks, each of which may be called in different situations and therefore has to be interpreted differently. For example, a “serviceStopped()” callback can mean that the service has just been stopped (surprised?), but it can just as well mean that the service has failed to start (surprised!). Another really “nice” thing was that most calls in this library are non-blocking and use callbacks to notify the application about events, but the “connect()” call, used to establish a network connection, was blocking. Great! Now I have to start a separate thread just to establish a connection!
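One way to make a blocking connect() fit the callback style of the rest of the library is to run it on a worker thread and report the result through a callback. A minimal sketch, assuming the library call can be wrapped in a function that returns success or failure; `connectAsync` and its parameters are hypothetical names, not the library's API:

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper: runs a blocking connect on its own thread and reports
// the result through a callback, like the library's non-blocking calls do.
// `blockingConnect` stands in for the library's connect(); true means success.
std::thread connectAsync(std::function<bool()> blockingConnect,
                         std::function<void(bool)> onDone) {
    return std::thread([blockingConnect = std::move(blockingConnect),
                        onDone = std::move(onDone)] {
        // Note: onDone runs on the worker thread, not on the caller's thread.
        onDone(blockingConnect());
    });
}
```

The caller keeps the returned std::thread and joins it (for example, on shutdown). Since onDone runs on the worker thread, it should only hand the result over to the main thread rather than modify the wrapper's state directly.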

And so, to handle every possible situation correctly (and that really matters in real-time telemetry transmission software, you know!), I created this wrapper class and introduced a few concepts: state, target and operation. State is where exactly we are now: the whole thing can be in one of three stable states (“no connection” or “idle”, “connection, but no service”, and “connection and service”) or one of four intermediate states (“connecting”, “starting service”, “stopping service”, “disconnecting”). Target is what we want to achieve; the targets are actually a subset of the states: “disconnected” (we want no connection), “connected” (we want a connection, but no service) and “running” (we want both a connection and the service). And operation is what exactly is going on right now: the operations correspond to the four intermediate states, plus “no operation”, which means that we are not actively doing anything at the moment. The difference between the “connecting” state and the “connecting” operation, for example, is this: we can be stuck in the “connecting” state, which means that we are trying to establish a connection, while the current operation is either “connecting” (if a connection attempt is in progress) or “no operation” (if we are waiting between connection attempts). We really need the “operation” concept because if we suddenly want to cancel the connection attempt while a “connecting” operation is in progress, we can do nothing but wait until it completes (simply because the library provides no way to cancel it). But if we are in the “connecting” state with “no operation”, we can just switch the state to “idle” and forget about the pending connection attempt. Get it?
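To make the three concepts concrete, here is one possible way to spell them out in C++. The enumerator names and the helper `canDropConnectionAttempt()` are illustrative assumptions, not the original code; the helper encodes the difference between the “connecting” state and the “connecting” operation described in the paragraph above:

```cpp
#include <cassert>

// Illustrative names for the receiving side's concepts.
enum class State {
    Idle,             // stable: no connection
    Connected,        // stable: connection, but no service
    Running,          // stable: connection and service
    Connecting,       // intermediate
    StartingService,  // intermediate
    StoppingService,  // intermediate
    Disconnecting     // intermediate
};

// Targets are a subset of the states: the stable ones we may want to reach.
enum class Target { Disconnected, Connected, Running };

// One operation per intermediate state, plus "no operation".
enum class Operation { None, Connecting, StartingService, StoppingService, Disconnecting };

// A pending connection attempt can be dropped at once only while we are
// waiting between attempts; a connect already in progress cannot be
// cancelled, so we have to wait for it to complete.
bool canDropConnectionAttempt(State state, Operation op) {
    return state == State::Connecting && op == Operation::None;
}
```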

Now for the implementation. Okay, my class has to have a lot of callbacks. Only a couple of them do really useful and significant work; the others are just there to watch over the library and make sure things do not break completely just because something nasty happened. Not an easy task if you do not know (and I really do not) what to expect! So I ended up with a lot of “switch (state)”, “switch (operation)” and the like: about 800-900 lines of code. Well, that is not very much, but still a lot. But then I thought: “well, it is much better than having to deal with this library directly, and maybe it is just the price I have to pay to avoid all that mess”.
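For illustration, this is roughly what the switch-based style looks like for the ambiguous “serviceStopped()” callback. It is a hypothetical sketch (the `Receiver` struct and its `lastEvent` field are invented for the example); a real handler would also have to look at the operation, which is where the nested switches come from:

```cpp
#include <cassert>
#include <string>

enum class State { Idle, Connected, Running, Connecting, StartingService, StoppingService, Disconnecting };
enum class Operation { None, Connecting, StartingService, StoppingService, Disconnecting };

// Switch-based style: the same library callback means different things
// depending on where we are, so every callback becomes a switch over the state.
struct Receiver {
    State state = State::Idle;
    Operation op = Operation::None;
    std::string lastEvent;  // hypothetical, for illustration only

    void serviceStopped() {
        switch (state) {
        case State::StartingService:
            // Here "stopped" actually means "failed to start".
            lastEvent = "service failed to start";
            state = State::Connected;
            op = Operation::None;
            break;
        case State::StoppingService:
            lastEvent = "service stopped";
            state = State::Connected;
            op = Operation::None;
            break;
        case State::Running:
            lastEvent = "service stopped unexpectedly";
            state = State::Connected;
            break;
        default:
            // Nothing sensible to do; just note that it happened.
            lastEvent = "unexpected serviceStopped()";
            break;
        }
    }
};
```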

Now for the other part. Everything above was about receiving telemetry. But once it has been received, I must convert it into another format (a big problem too, but outside the scope of this post) and send it to the processing server. The good thing about the processing server is that I do not need a special library to talk to it: just connect to the specified TCP port, send a little magic data, and go on! Good. The bad thing is that there may be problems on the processing server, and they have to be treated with care, because simply ignoring them could easily break everything on the server, including possibly ongoing ISS telemetry sessions. Okay, so what am I to do? Well, the first thing I did was to implement a kind of library for sending telemetry to the processing server. It is much simpler and much cleaner than the one I have to use to receive telemetry. That was the easy part: just a simple mechanism, nothing more.

Okay, now I have to plug it into my application. But I still have to control it! I have to make different decisions based on what is going on right now. Well, I know better what is going on (because I wrote the library myself), but that does not free me from making decisions. Okay. Having had a (relatively) positive experience with the receiving part, I implemented similar concepts for the sending class, except that there was no “service” concept, so there were only four states (two stable and two intermediate), two operations (three including “no operation”) and two targets. Nice. Then I started to implement the “decision making”, but quickly realised that it gave me a bit too many of those switch statements. While I thought that was acceptable for the receiving part, here, where I have complete control over the code, it became clear that a simpler but still clean and powerful mechanism would definitely be better. But how do I make decisions without using “if” or “switch” statements?!

Fortunately, a few days ago I stumbled upon a great book called “The Art of Unix Programming”. I have only just started reading it, but at the very beginning I found the “Rule of Representation”: “Fold knowledge into data so program logic can be stupid and robust.” And then I asked myself a question: what exactly determines what I should do at a given point of execution? I thought for a while and realised that it is the operation-state-target combination. I thought a little more and found out that the time when the current operation started (or when the last operation completed, if the current operation is “no operation”) also matters. For example, what do we do in the “connecting” state, with the “connecting” operation and the “connected” target? Nothing, obviously: just wait until the connection operation completes… Oops, did I just say “wait”? But for how long? What if this operation never completes? So we have to check the time, and if too much of it has passed since the operation started, we had better just abort the connection attempt, print something in a red font, and switch to the “disconnected” target. And if the time has not run out yet, well, we have to come back to decision making when it does, unless something happens earlier (like the operation completing or failing). Clear enough, isn't it?
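The “connecting state, connecting operation, connected target” example boils down to a small time check: either give up now, or come back when the time runs out. A sketch under assumed names (`decideWhileConnecting`, `Decision`) and an assumed 30-second limit, which is not a value from the original code:

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Outcome of one decision for the connecting/connecting/connected case.
struct Decision {
    bool giveUp;           // true: abort the connection attempt now
    milliseconds recheck;  // otherwise: decide again after this long at the latest
};

Decision decideWhileConnecting(milliseconds sinceOpStart,
                               milliseconds timeout = milliseconds(30000)) {
    if (sinceOpStart >= timeout)
        return {true, milliseconds::zero()};
    // Come back when the time runs out, unless an event (the operation
    // completing or failing) makes us decide earlier.
    return {false, timeout - sinceOpStart};
}
```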

So I came up with a solution in which “what to do” is just a structure (called “action”) with two (yes, that many!) fields: one is the function to call, and the other is the time that must pass after the beginning of the current operation or the end of the last one. If the function is NULL, do nothing. If the time is zero, call the function immediately. If it is not, check the time: if it has already run out, call the function; otherwise, wait for the remaining time and then go back to decision making. Then I created a three-dimensional (operation-state-target) array filled with these structures, writing “errorAction” as the function in each “impossible” element (like the “starting” operation in the “stopping” state). This “errorAction()” prints out the current operation-state-target combination, so I can easily debug it. And I implemented a “takeActions()” function that does this “decision making” using the “actions” array. I had to add a few other things to it, like updating the current state, handling program shutdown and so on, but the function was still small enough to fit on a single screen! And there is not a single “switch” statement in the whole class! Only a few “ifs”, and those are not worth fighting over in the name of the “Rule of Representation”. As a result, the class has about 350-400 lines of code: yes, less than half of the receiving class. Well, it does not have the “service” concept and does not have to deal with that messy library, but it still looks surprisingly small, for example, compared to just a draft of the switch-based implementation. Just think about it: trading a lot of huge switches for one array of two-field structures! Another good thing is that if I realise that something else affects my decisions or describes what exactly to do, no problem! Just add another field to the action structure, or introduce another state, operation or target, and change one function a little.
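Here is a minimal sketch of the table-driven approach for the sending side, assuming C++17 and `std::chrono::steady_clock` for timing. All names, the 5-second retry delay and the 30-second connect timeout are hypothetical, and the completion callbacks stand in for whatever the real sending library reports. Unlike the description above, this `takeActions()` receives the current time as a parameter (which makes it easy to test) and returns how long to wait instead of waiting itself; `errorAction()` also resets to a safe state, an assumption made so that the loop cannot spin:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <optional>

using Clock = std::chrono::steady_clock;
using ms = std::chrono::milliseconds;

// Hypothetical sending-side concepts: 4 states, 3 operations, 2 targets.
enum Operation { OpNone, OpConnecting, OpDisconnecting, kOperations };
enum State { StIdle, StConnecting, StConnected, StDisconnecting, kStates };
enum Target { TgDisconnected, TgConnected, kTargets };

class Sender;
using Handler = void (*)(Sender&, Clock::time_point);

// "What to do": a function (nullptr = do nothing) and the time that must pass
// since the current operation started or the last one ended (0 = immediately).
struct Action {
    Handler fn;
    ms delay;
};

class Sender {
public:
    Operation op = OpNone;
    State state = StIdle;
    Target target = TgDisconnected;
    Clock::time_point opTime{};  // start of current op, or end of the last one

    // Runs every action that is due and returns how long to wait before
    // deciding again, or std::nullopt to wait for the next library event.
    // Every action in the table changes the combination, so the loop ends.
    std::optional<ms> takeActions(Clock::time_point now) {
        for (;;) {
            const Action& a = kTable[op][state][target];
            if (a.fn == nullptr) return std::nullopt;
            auto elapsed = std::chrono::duration_cast<ms>(now - opTime);
            if (elapsed < a.delay) return a.delay - elapsed;
            a.fn(*this, now);
        }
    }

    // Hypothetical completion callbacks from the sending library.
    void onConnected(Clock::time_point now) { op = OpNone; state = StConnected; opTime = now; }
    void onConnectFailed(Clock::time_point now) { op = OpNone; opTime = now; }
    void onDisconnected(Clock::time_point now) { op = OpNone; state = StIdle; opTime = now; }

private:
    static const Action kTable[kOperations][kStates][kTargets];
};

void startConnect(Sender& s, Clock::time_point now) {
    s.op = OpConnecting; s.state = StConnecting; s.opTime = now;  // real code starts connecting here
}
void startDisconnect(Sender& s, Clock::time_point now) {
    s.op = OpDisconnecting; s.state = StDisconnecting; s.opTime = now;
}
void giveUp(Sender& s, Clock::time_point) { s.state = StIdle; }  // drop a pending retry
void abortConnect(Sender& s, Clock::time_point now) {
    std::fprintf(stderr, "connection attempt timed out\n");
    s.op = OpNone; s.state = StIdle; s.target = TgDisconnected; s.opTime = now;
}
void errorAction(Sender& s, Clock::time_point now) {
    std::fprintf(stderr, "impossible combination: op=%d state=%d target=%d\n",
                 static_cast<int>(s.op), static_cast<int>(s.state), static_cast<int>(s.target));
    s.op = OpNone; s.state = StIdle; s.target = TgDisconnected; s.opTime = now;  // assumed recovery
}

const Action kNop{nullptr, ms(0)};
const Action kErr{errorAction, ms(0)};

const Action Sender::kTable[kOperations][kStates][kTargets] = {
    // OpNone:                 target Disconnected         target Connected
    {/* StIdle          */ {kNop,                       {startConnect, ms(0)}},
     /* StConnecting    */ {{giveUp, ms(0)},            {startConnect, ms(5000)}},
     /* StConnected     */ {{startDisconnect, ms(0)},   kNop},
     /* StDisconnecting */ {kErr,                       kErr}},
    // OpConnecting: only possible in StConnecting; give up after 30 s.
    {{kErr, kErr}, {{abortConnect, ms(30000)}, {abortConnect, ms(30000)}}, {kErr, kErr}, {kErr, kErr}},
    // OpDisconnecting: only possible in StDisconnecting; wait for completion.
    {{kErr, kErr}, {kErr, kErr}, {kErr, kErr}, {kNop, kNop}},
};
```

The caller invokes `takeActions()` after every library event and whenever the returned delay expires. Supporting a new combination means editing the table, not the logic.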

Now I am wondering whether it is worth reimplementing the receiving part using a similar mechanism. Probably it is, and maybe it would be nice to implement this logic at a more abstract level, without tying it to a specific problem: after all, nothing in the “takeActions()” function is related to sending telemetry at all!
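A speculative sketch of that more abstract version: a template that knows nothing about telemetry, only about a table indexed by operation, state and target. The `ActionTable` template, its `step()` function and the `Toy` context are hypothetical; any context type with `op`, `state`, `target` and `opStart` members would do:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>

// Problem-independent decision table. Ctx must have `op`, `state` and `target`
// members usable as array indices, and an `opStart` time point.
template <class Ctx, std::size_t NOps, std::size_t NStates, std::size_t NTargets>
struct ActionTable {
    using Clock = std::chrono::steady_clock;
    struct Action {
        void (*fn)(Ctx&, Clock::time_point);  // nullptr: nothing to do
        Clock::duration delay;                // time to wait since opStart
    };
    Action table[NOps][NStates][NTargets];

    // Performs the due action, if any. Returns std::nullopt to wait for an
    // event, otherwise how long to wait before calling step() again.
    std::optional<Clock::duration> step(Ctx& ctx, Clock::time_point now) const {
        const Action& a = table[ctx.op][ctx.state][ctx.target];
        if (a.fn == nullptr) return std::nullopt;
        Clock::duration elapsed = now - ctx.opStart;
        if (elapsed < a.delay) return a.delay - elapsed;
        a.fn(ctx, now);
        return Clock::duration::zero();  // an action ran: decide again right away
    }
};

// Toy context for the usage example: one operation, two states, one target.
struct Toy {
    std::size_t op = 0, state = 0, target = 0;
    std::chrono::steady_clock::time_point opStart{};
    int calls = 0;
};
```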

Summary: one may be a good coder, but there is much more to software development than just coding. One has to be able to find better solutions, and for that one has to be able to smell “bad” solutions (like that switch-based one). To learn this, there are valuable sources of information, like “The Art of Unix Programming” and other works by experienced developers (like Joel, whom I mentioned earlier), who really know what they are talking about, unlike those who only theorize about everything without actually trying it and living it.
