MVC, MVP and MVVM, pt. 1: The Ideas

There is a lot of confusion about GUI design patterns such as Model–View–Controller, Model–View–Presenter and Model–View–View Model. I’m starting this series of blog posts to share my own knowledge and experience with these patterns, hoping to clear things up a bit. I’m not going to dive deep into the history behind these patterns. Instead, I’m going to concentrate on things as they are today.

I’ll start with the ideas behind these patterns. There is one single idea behind them all: separation of concerns. It’s a well-known principle, closely related to the single responsibility principle, the S part of the SOLID principles. The clearest form of the latter says: there should be only one reason for a class to change. Separation of concerns takes that to the architecture level: there should be only one reason for a layer to change. The granularity of that reason is different, though. One may say: the Money class should only change if the logic of working with money changes. On the architecture level one would say instead: the view layer should only change when the appearance should change (for example, money should now be displayed using a fixed-width font). In particular, the view layer should not change if the business logic changes (money should now be calculated to the 2nd digit after the decimal point) or if the presentation logic changes (money should now be formatted with 1 digit after the decimal point).

Model–View–Controller

With these ideas in mind, let’s go over the three patterns, starting with MVC. It’s probably the most confusing of them all, and I think it’s mainly because separation of concerns is not complete in MVC.

[Diagram: the MVC pattern]

To add to the confusion, there are many variations of MVC, and there is no single agreement on what exactly the components do. The view is the easiest part: its job is to display things and receive interactions from the user. You can’t really separate these two concerns: how would you separate displaying the text that the user is editing from actually editing that text? There has to be a single graphical component that does both. You can do the next best thing, though: delegate user interactions to another component. And this is where the controller comes in.

The controller receives user interactions from the view and processes them. Depending on the interface between the view and the controller, you may be able to reduce the coupling between them, and that’s a good thing. Suppose your view is implemented in Swing, and there is an Apply button. Instead of making the controller implement ActionListener, implement the listener inside the view and delegate the button click to the apply method of the controller, which is UI-agnostic (it doesn’t depend on Swing at all).
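To make this concrete, here is a minimal sketch of such delegation, assuming Swing; all class and method names here are mine, not from any framework:

    import javax.swing.JButton;

    interface ApplyController {
        void apply(); // UI-agnostic: no Swing types anywhere
    }

    class SettingsView {
        private final JButton applyButton = new JButton("Apply");

        SettingsView(ApplyController controller) {
            // The view owns the Swing-specific listener and merely delegates,
            // so the controller never has to know about ActionListener.
            applyButton.addActionListener(e -> controller.apply());
        }
    }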

That was the easy part. But what happens next? The controller acts on the model, which contains the actual data the application works with. Then, at some point, the updated data may need to be displayed back to the user. Here is where the confusion starts. One possibility is the observer pattern acting between the view and the model. In this case, the view subscribes to certain events of the model, and the model either sends the updated data to the view (the push model of the observer pattern) or just the events themselves (the pull model). In the latter case, the view pulls the necessary data whenever it receives the appropriate event.

Note that even though the model sends data to the view, it has no idea of the view’s existence, thanks to the observer pattern. This is especially important if the model is in fact the domain model, which should be isolated as much as possible. It should only communicate with the outside world through clean interfaces that belong to the model itself.
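Here is a rough sketch of the pull variant, with made-up names, just to illustrate the shape of it:

    import java.math.BigDecimal;
    import java.util.ArrayList;
    import java.util.List;

    interface ModelListener {
        void moneyChanged(); // just an event, no data: the pull model
    }

    class MoneyModel {
        private final List<ModelListener> listeners = new ArrayList<>();
        private BigDecimal amount = BigDecimal.ZERO;

        void addListener(ModelListener listener) { listeners.add(listener); }

        BigDecimal getAmount() { return amount; } // the view pulls data here

        void setAmount(BigDecimal amount) {
            this.amount = amount;
            // The model notifies listeners without knowing who they are.
            listeners.forEach(ModelListener::moneyChanged);
        }
    }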

Another variation of the MVC pattern is often seen in web frameworks, such as Spring MVC. In this case, the model is a simple DTO (data transfer object), basically a hash map, easily serialized into JSON or the like. The controller prepares the model and sends it to the view. Sometimes it’s just a matter of passing an object within a single process, but sometimes the model is literally sent over the wire. This is different from the typical desktop observer approach, where the controller doesn’t even know anything about the view. To keep this coupling loose, the controller often doesn’t send the model directly to the view, but rather hands it to the framework, which then picks an appropriate view and passes the model to it.

[Diagram: MVC in web frameworks]

What makes this pattern especially confusing is that the model is no longer the domain model. Rather, we have two models now: the M part of the MVC pattern is the data transfer model, whereas the controller acts on the domain model (maybe indirectly, through a service layer), gets back the updated data, then packs that data into a DTO and passes it to the view to display. This very idea of a data transfer model is exactly what makes this pattern so suitable for web applications, where the controller may not even know in advance what to do with the data: you may have to wrap it into HTML and send it to the browser, or maybe you serialize it into JSON and send it as a REST response.
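A sketch of this flavor, in the style of a Spring MVC controller (the service and attribute names are made up):

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.GetMapping;

    @Controller
    class MoneyController {
        private final MoneyService service; // the domain side, hypothetical here

        MoneyController(MoneyService service) { this.service = service; }

        @GetMapping("/money")
        String showMoney(Model model) {
            // Pack domain data into the transfer model; the framework then
            // picks the "money" view and hands the model over to it.
            model.addAttribute("amount", service.currentAmount());
            return "money";
        }
    }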

Either way, one problem with MVC is that the view is too complicated. One thing about UI code is that it tends to be both heavyweight and volatile, so you usually want to keep it as clean as possible. In MVC, the view doesn’t only display data; it also has the responsibility of pulling that data from the model. That means the view has two reasons to change: either the requirements for displaying data change or the representation of that data changes. If the model is the domain model, that’s especially bad: the UI should not depend on how data is organized in the domain model. If the model is a DTO model, it’s not as bad, but the DTO can still change, for example, to accommodate the needs of a new view (or a REST client). Still, MVC is often the best choice for web applications, and it is therefore the primary pattern of many web frameworks.

One major disadvantage of MVC is that the view is not completely devoid of logic, and therefore it can be hard to test, especially when it comes to unit tests. Another disadvantage is that you have to reimplement all that logic if you’re porting your view to another tech (say, Swing to JavaFX).

Model–View–Presenter

One natural way to improve MVC is to reduce the coupling between the view and the model. If we make a rule that all interactions between the view and the model must go through the controller, the controller becomes the single place for implementing presentation logic. That’s what we call a presenter. The term presentation logic refers to any kind of logic that is directly related to the UI but not to the way the components actually look (that’s the view’s responsibility). For example, we may have a rule that if a certain value exceeds a certain threshold, it should be displayed in red. We split this rule into three parts:

  1. If a value exceeds a certain threshold, then it’s too high.
  2. If it’s too high, then it should be displayed in a special way.
  3. If it’s too high, then it should be displayed in red.

The first part is the domain logic. It could be implemented, say, with an isTooHigh method, but that really depends on the domain. The second part is the presentation logic, and if it looks like a generalization of the third part, that’s exactly what it is. The presenter knows from the model that the value is too high and therefore passes it to the view with some kind of Status.TOO_HIGH enum constant. Note that it has no idea of colors yet. It’s the job of the view to actually map that constant to a color. Or it could be something other than a color, like a warning sign next to the value.
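A sketch of how this split might look in code (all the names, including the Status enum, are illustrative; Money is a hypothetical domain class):

    enum Status { NORMAL, TOO_HIGH }

    interface ValueView { // an interface defined by the presenter, not the view
        void displayValue(String text, Status status);
    }

    class ValuePresenter {
        private final ValueView view;

        ValuePresenter(ValueView view) { this.view = view; }

        void valueUpdated(Money value) {
            // Presentation logic: the presenter knows "too high", not "red".
            Status status = value.isTooHigh() ? Status.TOO_HIGH : Status.NORMAL;
            view.displayValue(value.toString(), status);
        }
    }

It’s then up to a concrete view implementation to map Status.TOO_HIGH to a red foreground, a warning icon, or whatever fits the toolkit.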

[Diagram: the MVP pattern]

In the MVP pattern, the view is completely decoupled from the model. The presenter is something similar to the mediator pattern. In fact, if the view is implemented as a set of independent graphical components (like multiple windows), and the model also consists of multiple objects (as is almost always the case), it would be exactly the mediator pattern.

Unlike MVC, there is no observer pattern between the presenter and the view. The reason is that the view contains so little logic that there is no place for any event handling there, except for view-specific UI events (button clicks etc.). It’s the presenter’s job to figure out when to update the view, and it does so by calling the appropriate methods. These methods typically belong to an interface fully defined by the presenter, which is an excellent example of the dependency inversion principle (the D in SOLID). The presenter doesn’t depend on any technologies the view uses. Well, in theory at least. It would be very tricky to implement exactly the same interface with Swing, JavaFX and HTML, for example. How do you call methods on HTML? You could have some server-side object that sends the data to the browser using AJAX or even a WebSocket, I suppose, but it would be very tricky and at the same time not as powerful as MVC, where the controller is free of presentation logic and can therefore be shared between views with different presentation needs (such as HTML and JSON).

The positive side is that since all presentation logic is in one place, porting to another view tech is a breeze. You just reimplement your view interface with the new tech, and you’ve got it. That’s the theory, at least. In practice you may run into various problems. Threading, for example: who is responsible for making sure view methods are only called on the appropriate threads? Should the view enforce that? Probably yes, because the presenter has no way of figuring out which thread is right if it has no idea what GUI framework is used in the first place. But that imposes an additional burden on the view. Still, MVP is probably as close as you can get to total independence from the GUI framework used.
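For instance, a Swing implementation of the view interface from the earlier sketch could enforce the thread itself, keeping the presenter blissfully unaware of the EDT:

    import java.awt.Color;
    import javax.swing.JLabel;
    import javax.swing.SwingUtilities;

    class SwingValueView implements ValueView {
        private final JLabel label = new JLabel();

        @Override
        public void displayValue(String text, Status status) {
            // Hop onto the Event Dispatch Thread regardless of the caller.
            SwingUtilities.invokeLater(() -> {
                label.setText(text);
                label.setForeground(status == Status.TOO_HIGH ? Color.RED : Color.BLACK);
            });
        }
    }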

The bad news is that the presenter now contains a lot of boilerplate code. It was part of the view before, so it’s not like it became any worse than it was with MVC, but it’s always a nice idea to get rid of as much boilerplate code as possible. That’s where MVVM comes into the picture.

Model–View–View Model

[Diagram: the MVVM pattern]

MVVM is basically the same thing as MVP, except for one major difference. In MVP, the view only delegates user interactions to the presenter. Whenever feedback is needed, it’s the presenter who takes action, and it does so by literally calling methods on the view, such as displayFilesList(files), setApplyEnabled(true), setConnectionStatus(ConnectionStatus.GOOD) and so on. That’s boilerplate code. With MVVM, the presenter becomes the view model, that is, a model that provides access to ready-to-display data through the observer pattern, much like in desktop MVC. Except that now the view model can really prepare that data for display by filtering, sorting, formatting etc. So whatever presentation logic was in the view in MVC is now in the view model. And while in MVP the presenter pushed that data to the view, in MVVM the view pulls that data from the view model. This sounds like we’re adding responsibility to the view, and that’s a Bad Thing, right? Well, to a certain extent, yes. But the point is, this responsibility is typically almost entirely implemented by the framework. This is done through data binding, where you just specify that this component should display that property of the view model, and that’s basically it.

When your framework doesn’t support data binding, it’s usually a bad idea to use MVVM because you’ll essentially be moving the boilerplate code from the presenter to the view, which is indeed a Bad Thing. And even if you have data binding, it’s usually not that simple. Sometimes you have complicated structures to bind. Sometimes the order of updates is important and you have race conditions in your UI. Sometimes you have values of custom types that are tricky to display directly, you need to employ some sort of converters for that.

The good part is that with MVVM you typically only have problems when you have a non-standard situation. For most cases, it really decreases the amount of boilerplate code, and displaying a person’s name in a text field becomes as simple as writing Text="{Binding Person.Name}" in XAML.
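For a rough Java analogy, JavaFX properties allow much the same thing (the view model class here is made up):

    import javafx.beans.property.SimpleStringProperty;
    import javafx.beans.property.StringProperty;

    class PersonViewModel {
        private final StringProperty name = new SimpleStringProperty();
        StringProperty nameProperty() { return name; }
    }

    // Somewhere in the view:
    //     TextField nameField = new TextField();
    //     nameField.textProperty().bindBidirectional(viewModel.nameProperty());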

Moreover, delegating user interactions is often implemented with data binding too. Well, as I say “often”, I really can’t think of any other MVVM implementation than .Net/WPF, so I guess it’s 100% of all cases, even though there is only one case in total! Nevertheless, using the command pattern, we can expose possible interactions as properties of the view model. The view then binds to them and executes appropriate commands when the user does something. One big advantage of it is that we can easily change these commands dynamically and the view will automatically update its interactions.

When choosing between MVVM and MVP, it’s important to consider several factors:

  • If your framework doesn’t support data binding, MVVM is probably a bad idea.
  • If it does, then how likely is it that you’ll want to switch UIs? How painful is that likely to be?
  • How difficult would it be to port your application from MVVM to MVP or MVC, or vice versa?

All things being equal, reimplementing the view interface for a new framework in the MVP pattern is often about as hard as switching from MVVM to MVP or whatever. In that case, it’s probably worth using MVVM if that’s what your framework offers. The same really goes for MVC. When your framework offers you MVC, you probably don’t want to force yourself to use MVP instead, unless you really plan to switch frameworks and design for it beforehand. Say, you’re using Swing right now, but you know you’ll have to move to JavaFX within 5 years.

One last thing to note is that it is possible to combine these patterns, although in most cases this is likely to lead to over-engineering. For example, if your framework forces you to use MVC, you can turn your controller into a view model and then consider the whole view–view model pair to be just a view for the MVP pattern. So when the user does something, the view delegates that to the controller, which immediately delegates to the presenter. When the presenter gets updated data from the domain model, it sends that data to the controller (through a clean interface), which then stores it locally and fires an event to the actual view, which pulls that data to actually display it. Sounds crazy enough as it is, doesn’t it? But sometimes it may be worth it; only experience can tell. It’s probably best to start with the simplest thing possible and complicate things only when you actually need to.

That’s it for now. I plan later to demonstrate all three patterns with a simple application. I’ll probably use Java for that, even though implementing MVVM would be tricky. But there is some limited data binding in Java, so it could actually work, if only for demonstration purposes.

Hide and show JTable columns, pt. 2

Now that we’re done with the menu, let’s make it work. The simplest test I can think of clicks on a menu item and checks that the number of visible columns has decreased:
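The original listing is not preserved here, so this is my reconstruction of what such a test looks like (getMenuItems is the test helper the next paragraph mentions; it digs the items out of the header’s popup menu):

    @Test
    public void hidesColumnWhenMenuItemClicked() {
        JTable table = new JTable(new DefaultTableModel(1, 3));
        new JTableColumnSelector().install(table);
        getMenuItems(table).get(0).doClick(); // programmatically "click" the item
        assertThat(table.getColumnCount()).isEqualTo(2);
    }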

I’ve changed getMenuItems to accept JTable instead of JPopupMenu to make the code less verbose. This test obviously fails as the menu does nothing yet. On one hand, the fix is not so simple. We need to add appropriate listeners to the items. On the other hand, something as silly as this will do the trick:
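Something along these lines, that is, hiding a hardcoded column no matter which item was clicked (my reconstruction):

    item.addActionListener(e ->
            table.removeColumn(table.getColumnModel().getColumn(0)));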

Okay, this is really silly. But our test passes now, and that means the problem is with our test. Why not use a data provider to parameterize our test?
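Here is a sketch of the parametrized version (DP_COLUMN_INDEXES is explained just below):

    @DataProvider(name = DP_COLUMN_INDEXES)
    public Object[][] columnIndexes() {
        return new Object[][] {{0}, {1}, {2}};
    }

    @Test(dataProvider = DP_COLUMN_INDEXES)
    public void hidesColumnWhenMenuItemClicked(int columnIndex) {
        JTable table = new JTable(new DefaultTableModel(1, 3));
        new JTableColumnSelector().install(table);
        getMenuItems(table).get(columnIndex).doClick();
        assertThat(table.getColumnCount()).isEqualTo(2);
    }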

This is better. Now it fails again. If you’re wondering about DP_COLUMN_INDEXES, it’s just a constant set to “columnIndexes”. Why bother? Because I don’t like hardcoding function names. What if I need to rename the provider later? This way, I can name the provider function anything I want. Of course, it’s better to rename and change the constant as well, just to keep things consistent. But that’s just three changes: the function name, the constant name and the constant literal itself. And even if I forget to touch the constant, nothing breaks.

OK, so now the fix looks like this:
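Presumably something like this, capturing the loop index (my reconstruction; note that it still ignores the view/model distinction):

    final int index = i; // the loop variable, copied to be effectively final
    item.addActionListener(e ->
            table.removeColumn(table.getColumnModel().getColumn(index)));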

But it’s still obviously wrong. Well, maybe not so obviously, but remember that Swing has two coordinate systems: the view and the model. Columns can be rearranged in the view, so the mapping can change, even though it’s the identity by default. So we need yet another test. But first, we need to decide whether we want the menu items to rearrange when the columns are rearranged. I think not. First of all, it’s too much of a hassle. And it can be confusing for the user too. Besides, the menu contains all columns, and the view only some of them. Where do we put the hidden columns in the menu if they are not a part of the view order? So let’s keep them in the model order.

Now I have refactored the setup code into a separate method. I haven’t annotated it with @BeforeMethod because it’s not a universal test fixture: the install test checks that the menu is not null, so obviously we can’t run the setup before that test even starts. I’ve introduced an @AfterMethod cleanup method, though (not shown), that just sets everything to null, to be 100% sure that one test can’t possibly use something created by another.

Now to the fix:
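The conversion has to happen inside the listener, at click time (note the Hungarian-style names, which the next paragraph justifies):

    item.addActionListener(e -> {
        int vIndex = table.convertColumnIndexToView(mIndex); // mIndex is the model index
        table.removeColumn(table.getColumnModel().getColumn(vIndex));
    });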

I actually got it wrong on the first try and put the conversion call outside the listener. That froze vIndex forever, which is obviously not what we want. Note that this is one of the cases where Hungarian notation is tremendously useful: both model and view indexes have the same type, so Hungarian notation lets us see immediately what kind of index we’re looking at.

Now I don’t really like the looks of the install method. And we haven’t even got to showing hidden columns. So let’s refactor it a little bit.
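Here is roughly what the refactored version could look like; the private method names are my guesses, not necessarily the original ones:

    public void install(JTable table) {
        this.table = table;
        table.getTableHeader().setComponentPopupMenu(createHeaderMenu());
    }

    private JPopupMenu createHeaderMenu() {
        JPopupMenu menu = new JPopupMenu();
        for (int i = 0; i < table.getModel().getColumnCount(); ++i) {
            menu.add(createColumnItem(i));
        }
        return menu;
    }

    private JCheckBoxMenuItem createColumnItem(int mIndex) {
        JCheckBoxMenuItem item = new JCheckBoxMenuItem(
                table.getModel().getColumnName(mIndex), true);
        item.addActionListener(e -> hideColumn(mIndex));
        return item;
    }

    private void hideColumn(int mIndex) {
        int vIndex = table.convertColumnIndexToView(mIndex);
        table.removeColumn(table.getColumnModel().getColumn(vIndex));
    }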

This is much better. More methods, the code is much longer, but each method is pretty clean and readable. Note that I’ve created a field for the JTable. It was captured by the lambda anyway, so I haven’t actually introduced any new state here. Just moved it from the lambda to the top-level class.

Now let’s check that the columns are properly shown (which, of course, they aren’t).
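A sketch of the test: click the same item twice and expect the column back:

    @Test(dataProvider = DP_COLUMN_INDEXES)
    public void showsColumnWhenMenuItemClickedTwice(int columnIndex) {
        JTable table = new JTable(new DefaultTableModel(1, 3));
        new JTableColumnSelector().install(table);
        getMenuItems(table).get(columnIndex).doClick();
        getMenuItems(table).get(columnIndex).doClick();
        assertThat(table.getColumnCount()).isEqualTo(3);
    }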

This fails, but with a very obscure message: java.lang.ArrayIndexOutOfBoundsException: -1. Why? Because the second click is trying to hide an already hidden column. Can we do something about this message? Well, not by modifying the testing code. But we can modify the hideColumn method! What should it do when the column is already hidden? Nothing? Or throw an exception? And should we even bother at all? Probably not at this point. Later on, we may want to make that method public or add some other API to hide and show columns programmatically. Then we’ll have to solve this problem. For now, let’s bear with the obscure message and fix the class.

The change is not trivial. I had to add a hash map of removed columns so we can show them later. The key is the model index.
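A sketch of the change (hiddenColumns and the method names are mine):

    private final Map<Integer, TableColumn> hiddenColumns = new HashMap<>();

    private void hideColumn(int mIndex) {
        int vIndex = table.convertColumnIndexToView(mIndex);
        TableColumn column = table.getColumnModel().getColumn(vIndex);
        hiddenColumns.put(mIndex, column); // the key is the model index
        table.removeColumn(column);
    }

    private void showColumn(int mIndex) {
        TableColumn column = hiddenColumns.remove(mIndex);
        if (column != null) {
            table.addColumn(column); // appends to the end; fixed below
        }
    }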

The action listener became a bit large, so I should either refactor it into an inner class or make it smaller. It’s not large enough to justify a class, so I’ll opt for another private method:
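Something like this, with the toggle extracted (again, the names are my own):

    private JCheckBoxMenuItem createColumnItem(int mIndex) {
        JCheckBoxMenuItem item = new JCheckBoxMenuItem(
                table.getModel().getColumnName(mIndex), true);
        item.addActionListener(e -> setColumnVisible(mIndex, item.isSelected()));
        return item;
    }

    private void setColumnVisible(int mIndex, boolean visible) {
        if (visible) {
            showColumn(mIndex);
        } else {
            hideColumn(mIndex);
        }
    }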

This is better.

We’ve got it almost working! One last obvious problem is that we append the column to the end when we show it. This is not good. But where should we put it? The order of columns could have changed since we hid it. There is no perfect solution, so I’ll choose a simple one: put the column where it belongs if the columns have not been rearranged, otherwise put it somewhere reasonable. With this vague requirement we only need to test that the column reappears in its place if we keep the model order, and that it reappears at all otherwise.

Let’s start with the simple case.
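A sketch of the test for the simple case:

    @Test(dataProvider = DP_COLUMN_INDEXES)
    public void showsColumnInItsOriginalPlace(int columnIndex) {
        JTable table = new JTable(new DefaultTableModel(1, 3));
        new JTableColumnSelector().install(table);
        getMenuItems(table).get(columnIndex).doClick();
        getMenuItems(table).get(columnIndex).doClick();
        assertThat(table.getColumnModel().getColumn(columnIndex).getModelIndex())
                .isEqualTo(columnIndex);
    }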

And the fix is:
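A sketch of the fix: append, then move the column back to where it belongs:

    table.addColumn(column);
    table.moveColumn(table.getColumnCount() - 1, column.getModelIndex());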

Now this can easily break if the model index is not a valid view index. How? Well, imagine that we removed some other columns, and the column count is now very low. If the removed column was somewhere near the end, its model index may very well be out of bounds now. Let’s test it.
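For example (my reconstruction): hide another column first, then hide and show the last one, whose model index is now out of bounds:

    @Test
    public void showsColumnEvenWhenModelIndexIsOutOfBounds() {
        JTable table = new JTable(new DefaultTableModel(1, 3));
        new JTableColumnSelector().install(table);
        getMenuItems(table).get(0).doClick(); // hide some other column first
        getMenuItems(table).get(2).doClick(); // hide the last column
        getMenuItems(table).get(2).doClick(); // now show it again
        assertThat(table.getColumnCount()).isEqualTo(2);
    }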

The fix:
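Clamping the target index makes it always valid (a sketch):

    table.addColumn(column);
    int lastIndex = table.getColumnCount() - 1;
    int target = Math.min(column.getModelIndex(), lastIndex);
    table.moveColumn(lastIndex, target);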

At this point I felt like this thing is going to work now. So I ran the demo I created last time. But as I tried to hide a column I got this:

The funny thing is, none of our methods appear in this trace! That shows clearly enough that TDD is not a magic silver bullet. Even though our tests led us to expect that our code works fine at least under normal circumstances, it crashes immediately. Why? After investigating a little bit, I think it’s because of a subtle bug in Swing. When we right-click on a column to show the menu, Swing thinks we’re about to drag that column. Indeed, dragging with the right mouse button works, sort of. Sometimes it leaves the column floating in mid-drag. Then, after the column is removed, dragging breaks because the column has no valid view index (hence the index out of bounds exception).

There are different possible workarounds. We could try to install a custom table header that would override the getDraggedColumn method and return null if the column is hidden. But that would prevent users from using their own header. Of course, we could wrap the existing header into our own. But that would require delegating all of its methods to the wrapped instance, and there are a lot of them.

Another possible way is to consume the right click to prevent it from dragging anything. Alas, the default event handler is first in line. By the time we consume the event, it’s too late.

A really silly way is to just set dragged column to null whenever we hide it. It’s so simple and stupid it might actually work. Let’s try it.
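The whole workaround is one line in hideColumn, done before removing the column:

    table.getTableHeader().setDraggedColumn(null); // stop the phantom right-button drag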

Yay! It works, and with no visible glitches too.

At this point I’d like to conclude this self-educating tutorial. Of course, there are a lot of things still to be done: provide a way to uninstall it, check what happens if we install it twice on the same table or try to reinstall it on another, provide an API to show and hide columns from code, prevent the user from hiding the last visible column (or the header will disappear, and it will be impossible to get the columns back). These are just the ones I can think of right off the bat.

For those interested to improve it or study the full history of its evolution, the code is available at

https://github.com/stachenov/jtable-column-selector

The point at which this tutorial ends is tagged tutorial-pt2.

Hide and show JTable columns

Swing is getting old but is still widely used. To my surprise, it turns out that its JTable doesn’t support hiding and showing columns at the user’s whim. Well, it’s time to fix that!

Note that similar work has already been done, so the purpose of this post is mainly educational. I’m going to do it using TDD, keeping the code as clean as I can. But first, it’s time for some design.

What we need is a menu. So it looks reasonable to extend JPopupMenu. However, we need more than that. We also need some boilerplate logic that will bind that menu together with JTable. We can extend JTable to do that, but that doesn’t sound like a good idea because that would prevent anyone with their own derivatives of JTable from using our code.

So it looks like ideally we would like to have a class that we could instantiate and install on a JTable to handle all that logic. Perhaps it will use a separate class for a menu, perhaps some other classes. Let’s not bother with these details for now. Instead, we should think of a name. JTableColumnSelector sounds fine: the J hints that it’s a Swing class, and it openly tells us that it is used to select columns. Maybe it is not very clear that it hides or shows columns, but JTableColumnHiderShower just doesn’t sound right, and besides, shower is something entirely different.

Before I begin, I should mention that for TDD I’m using TestNG, AssertJ and Mockito. That’s my usual set of tools.

The first TDD iteration looks rather stupid: write a single line test with new JTableColumnSelector(), then make it compile by creating an empty class. At this point I’m making an important design decision: by choosing to use a no-args constructor, I’m making life easier for anyone willing to extend my class. Because I’m going to have a separate install method instead of passing a JTable directly to the constructor, it is guaranteed that install will only be called after the object is fully initialized.

Speaking of the install method, we need another test:
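My reconstruction of that test:

    @Test
    public void installs() {
        JTableColumnSelector selector = new JTableColumnSelector();
        selector.install(new JTable());
    }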

What I like about Swing here is that it doesn’t really care that we’re calling its methods from a random testing thread. As long as it’s just one random thread, it runs just fine. What I don’t like about it, though, is that this very same feature makes it fail mysteriously at random moments when you call its methods from a wrong thread, thus violating the rule of repair terribly. Useful for testing, dangerous in production, as it often happens. Ideally, there should be a way to control this behavior, something like -Djavax.swing.allowCallsFromAnyThread.

But let’s get back to TDD. To make the test above pass, we need the appropriate method. And I also correct the constructor javadoc while I’m at it:
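Presumably just an empty method at this point (the javadoc wording here is mine):

    /**
     * Installs this column selector on the specified table.
     *
     * @param table the table to install this selector on
     */
    public void install(JTable table) {
    }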

Now we need to test that it does what it should do. What should it do? Well, for one thing, it must create a popup menu on the table header, so let’s test it:
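A sketch of such a test:

    @Test
    public void installsPopupMenu() {
        JTable table = new JTable();
        new JTableColumnSelector().install(table);
        assertThat(table.getTableHeader().getComponentPopupMenu()).isNotNull();
    }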

It fails. Good! Now let’s fix it:
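The fix amounts to attaching a fresh menu to the table header (a sketch):

    public void install(JTable table) {
        table.getTableHeader().setComponentPopupMenu(new JPopupMenu());
    }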

Now we need to check that the menu contains… what? Obviously, a list of items. There should be as many of them as there are columns in the model. Wait, our table doesn’t have a model yet. So maybe here is where we should start using Mockito:
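Something along these lines, before the readability refactoring mentioned below (A_REASONABLE_COLUMN_COUNT is explained right after):

    @Test
    public void installCreatesMenuItemsForAllColumns() {
        TableModel model = mock(TableModel.class);
        when(model.getColumnCount()).thenReturn(A_REASONABLE_COLUMN_COUNT);
        JTable table = new JTable(model);
        new JTableColumnSelector().install(table);
        JPopupMenu menu = table.getTableHeader().getComponentPopupMenu();
        assertThat(menu.getComponentCount()).isEqualTo(A_REASONABLE_COLUMN_COUNT);
    }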

Here, I set A_REASONABLE_COLUMN_COUNT to 10. The test fails, but isn’t terribly readable, so I’m going to refactor it a bit first.

This looks a bit better. Now we need to make it pass.
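A sketch of the passing version, asking the table for the column count (which the next paragraph questions):

    public void install(JTable table) {
        JPopupMenu menu = new JPopupMenu();
        for (int i = 0; i < table.getColumnCount(); ++i) {
            menu.add(new JMenuItem());
        }
        table.getTableHeader().setComponentPopupMenu(menu);
    }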

OK, what next? I’m worried about two things now. First, the model might be null. Will getColumnCount() properly return zero or will it just throw a NullPointerException? And is it even a good idea to ask the table about column count? Shouldn’t we ask the model instead? What if some columns are already hidden by some other code? Should we display them in our menu? Let’s assume for now that we want to list all model columns. But then the code is wrong and we need a test that shows it.

Another thing I’m worried about is that we incorrectly created JMenuItems, while we should have used JCheckBoxMenuItem or whatever it’s really called. But that should become apparent later, when we start selecting menu items. So let’s deal with column counts now.

It fails. Cool. Let’s fix:
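The fix is to ask the model rather than the table (the items are still unlabeled at this point; a sketch):

    public void install(JTable table) {
        TableModel model = table.getModel();
        JPopupMenu menu = new JPopupMenu();
        for (int i = 0; i < model.getColumnCount(); ++i) {
            menu.add(new JMenuItem());
        }
        table.getTableHeader().setComponentPopupMenu(menu);
    }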

Now we have a real problem if model is null. We need another test for that:
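My reconstruction of that test, under its original (pre-rename) name:

    @Test
    public void installsProperlyWhenModelIsNull() {
        JTable table = new JTable((TableModel) null);
        new JTableColumnSelector().install(table);
    }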

Hmm… It passes! Why? Oh, I forgot that the model can’t be null! The table creates a default empty model for that case. Good. I hate nulls. But then we need to rename our test. installsProperlyWhenTableHasDefaultEmptyModel is a bit too long, but descriptive enough, so I’ll keep it.

Cool. Now let’s get back to the install test. We need to check that all menu items have the right labels. But first we need to make our mock return those labels. Unfortunately, it isn’t terribly easy to do with Mockito. No, wait, it’s actually easy, but not very elegant:
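Roughly like this, with an anonymous Answer (hence the remark about refactoring it into a nested static class below):

    when(model.getColumnName(anyInt())).thenAnswer(new Answer<String>() {
        @Override
        public String answer(InvocationOnMock invocation) {
            return "Column " + invocation.getArguments()[0];
        }
    });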

Maybe I should have used a real model instead of the mock. But it doesn’t look that bad, so I’ll keep it like this for now and only refactor this ugly class into a nested static class.

Now we need a couple of helper methods to extract column names from both the model and the menu. The lists should be the same. I’m feeling functional, so these methods turned out like this:
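My reconstruction of those helpers, using streams:

    private List<String> getModelColumnNames(TableModel model) {
        return IntStream.range(0, model.getColumnCount())
                .mapToObj(model::getColumnName)
                .collect(Collectors.toList());
    }

    private List<String> getMenuItemLabels(JPopupMenu menu) {
        return getMenuItems(menu).stream()
                .map(JMenuItem::getText)
                .collect(Collectors.toList());
    }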

And the new test is:
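The assertion itself could be as simple as:

    JPopupMenu menu = table.getTableHeader().getComponentPopupMenu();
    assertThat(getMenuItemLabels(menu)).isEqualTo(getModelColumnNames(model));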

This is appended to the end of the install test, but I’m not repeating everything again and again. The proponents of the one-assert-per-test idiom are probably cursing me now, but I think I’m doing the right thing here: I’m still testing that this thing installs properly. If I need three asserts for that, so be it!

Of course the test fails, and with a clear message too, except that it’s too long. So I’ve changed A_REASONABLE_COLUMN_COUNT to 3. Now we have to fix the code. Just one line has to be changed:
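The line in question creates the item with the column name (a sketch):

    menu.add(new JMenuItem(model.getColumnName(i)));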

And now for the last piece of installation. We need to check that all of the menu items are selected. We need another helper method for that. Or maybe I’ll refactor this one:

And the test is now:

Oops! Looks like JMenuItem has isSelected method too, just like JCheckBoxMenuItem or whatever. Well, for now let’s just fix the test:
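Presumably a one-liner:

    item.setSelected(true); // newly created menu items aren't selected by default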

This goes into the loop of the install method. OK, what about the wrong class? We could just continue and then let it surface later, perhaps during manual testing. But I find it rather silly. Since I’ve noticed it already, why not fix it now? Changing JMenuItem to JCheckBoxMenuItem everywhere in the test seems to do the trick. Now the test fails with a clear message: javax.swing.JMenuItem cannot be cast to javax.swing.JCheckBoxMenuItem. Cool.

That’s it for today, except that I want to see how it looks on the screen, so I create a very simple demo:
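Something along these lines (my reconstruction):

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JTable table = new JTable(new DefaultTableModel(
                    new Object[][] {{1, 2, 3}}, new Object[] {"A", "B", "C"}));
            new JTableColumnSelector().install(table);
            JFrame frame = new JFrame("JTableColumnSelector demo");
            frame.add(new JScrollPane(table));
            frame.pack();
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }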

Aaaaaand… it works!

[Screenshot: the column selection menu on the demo table]

Next time we will add some logic to make it really work and do what it’s supposed to.

Getting started with JavaFX 8 custom controls

I need to develop a custom control for JavaFX 8. Unfortunately, most of the tutorials concentrate on the FXML way of doing things, but I need to do some custom painting in code.

How would I do it in Swing? Extend some base class and override paint. That’s it. In JavaFX, the right way seems to be overriding two classes: the control itself and the skin. OK, this actually looks like a good idea: the control is responsible for behavior, and the skin is responsible for the painting. So let’s look at the skin API:
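The whole interface boils down to this (paraphrasing the javadoc):

    public interface Skin<C extends Skinnable> {
        C getSkinnable(); // the control this skin is attached to
        Node getNode();   // the root node of the skin
        void dispose();   // detaches the skin from the control
    }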

What? Where is the paint method? According to the docs, getSkinnable() simply returns the associated control, dispose() detaches the skin from the control and getNode() “Gets the Node which represents this Skin”. What the…? So we have one node that is the control itself and another node which is the skin? I hope we don’t need to skin the skin, considering that it’s a kind of node itself!

After looking at some examples, I got the general idea. The skin is just a bunch of nodes, and getNode() just returns the root node. If you want to really customize your painting, you can always use a canvas as a skin. But I decided to try some shape nodes instead.

OK, I can create some shapes, put them into a Group, for example, and then what? The skin obviously needs to handle resizing. But how does it know when to resize exactly? I could just subscribe to the control’s width and height properties (and unsubscribe in dispose). But that feels ugly. Still, Han Solo himself does exactly that, so maybe it’s the right way after all?

After trying a lot of various things, I still couldn’t get it right:

  • If I just put my shapes into a Group, the control doesn’t resize properly.
  • If I put my shapes into a Group and inherit from SkinBase instead of implementing Skin, the control does resize, but…
  • All shapes are centered and I can’t position them. Looking at the SkinBase sources, it turns out this is hardcoded.
  • If I draw a vertical line of length exactly equal to the control’s height, the control automatically increases its size by one pixel at each repaint. So if I keep resizing it horizontally, for example, it keeps growing vertically forever.

All of that didn’t make any sense. After further studying the SkinBase sources, I got the feeling that a skin acts like a layout manager. That is, it’s responsible for managing the relative positions of its children. This is done by applying the appropriate transformations, the results of which can be queried by calling getLayoutX() and getLayoutY() on the children.

Another thing is that SkinBase cheats around getChildren() being protected in the Control class. That allows it to directly manipulate the children of the control—no Group needed.

So in the end I concluded that:

  • A skin is best implemented by inheriting SkinBase.
  • To add components, just call getChildren().addAll(children).
  • To position the components needed to draw the skin, override layoutChildren. From it, call layoutInArea for every child that needs to be positioned.
  • All shapes should be drawn in an imaginary coordinate system that is tied to the shape itself. If you need a line, you might as well start it from (0, 0). layoutInArea will move it to the required position anyway, so the lines (0, 0)–(10, 10) and (10, 10)–(20, 20) will look exactly the same in the end.

The resulting control prototype is this:
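The original listing isn’t reproduced here, so the following is a stripped-down sketch along those lines; the names and the single line shape are mine:

    import javafx.geometry.HPos;
    import javafx.geometry.VPos;
    import javafx.scene.control.Control;
    import javafx.scene.control.Skin;
    import javafx.scene.control.SkinBase;
    import javafx.scene.shape.Line;

    public class LineControl extends Control {
        @Override
        protected Skin<?> createDefaultSkin() {
            return new LineControlSkin(this);
        }
    }

    class LineControlSkin extends SkinBase<LineControl> {
        private final Line line = new Line(0, 0, 0, 0); // its own coordinate system

        LineControlSkin(LineControl control) {
            super(control);
            getChildren().add(line); // no Group needed
        }

        @Override
        protected void layoutChildren(double x, double y, double w, double h) {
            line.setEndY(h - 1); // stay within the content area
            layoutInArea(line, x, y, w, h, 0, HPos.LEFT, VPos.TOP);
        }

        // Return something sensible instead of the children's current sizes,
        // which fixes the ever-growing control described in the P.S. below.
        @Override
        protected double computePrefWidth(double height, double topInset,
                double rightInset, double bottomInset, double leftInset) {
            return 100;
        }
    }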

The resulting graphics:

[Screenshot: the resulting control prototype]

As you can see, it resizes nicely and the lines are positioned exactly as I want them.

P. S. Further prototyping revealed that it still resizes randomly sometimes, especially as I update values and/or resize a window with lots of controls in it. The reason is that, by default, SkinBase calculates its preferred width/height based on the preferred widths/heights of its children. The problem is that the preferred width/height of a primitive equals its actual size (since it’s not directly resizable). Therefore, once a control is resized, its preferred size is now different. If it was the same size as the other controls before, not only is that no longer the case, but all the preferred sizes now differ, so the layout gives different sizes to different controls. This repeats on each resize, which leads to a funny “rich get richer” scenario where bigger controls are given more and more space because their preferred size is greater. This issue is fixed by overriding computePrefWidth/Height to return something sensible.

Replacing javadoc for a Maven artifact

I was playing around with a library depending on the JMS API. It downloaded geronimo-jms_1.1_spec-1.1.1.jar as a dependency. Unfortunately, this JAR goes with a javadoc JAR that is virtually empty! It is actually there, but I couldn’t find anything useful in it. And NetBeans insists on displaying javadoc from that JAR and nowhere else.

Turns out it is quite easy to replace this abomination with docs from the Java EE SDK:

  1. Download the Java EE SDK.
  2. Pack the contents of the glassfish4\docs\api dir into a ZIP (the root must contain the package-list file).
  3. Rename it to geronimo-jms_1.1_spec-1.1.1-javadoc.jar.
  4. Move it to %USERPROFILE%\.m2\repository\org\apache\geronimo\specs\geronimo-jms_1.1_spec\1.1.1\.
  5. Nuke the geronimo-jms_1.1_spec-1.1.1-javadoc.jar.sha1 file there, just in case somebody checks the hash and finds out it’s wrong. Or recompute it and edit the file if you feel like it, but it seemed to work fine for me without the file.
  6. Restart the IDE and enjoy the well-written docs.

Of course, it should work just as well with any random JAR file. And yes, Maven can re-download the file, but why would it? Unless you move to another version, everything should be fine.

TestNG and NetBeans: testing exceptions with externalized messages

It all started with two things that should have been obvious for me for a long time:

  1. Every message in a program should be localizable. For Java programs, that means externalized using a resource bundle. And that includes exceptions. Even if those are never shown to users (it’s debatable whether it’s a good or bad idea), keeping them separated from code is still a good idea. For one thing, it makes documenting error messages much easier!
  2. Every unit should have a unit test. In the extreme case, adhering to XP/TDD principles, those should even be written before the actual code.

Now, I’ve been doing unit testing for quite a long time using JUnit, but I never got around to externalizing messages because our software is used internally (and nobody cares). Now, for my open source project (no link, because there’s nothing there yet), I decided to try TestNG because of its very nice parametrization of tests, and at the same time to externalize messages (because who knows, maybe someone someday will want to translate it).

Turned out that these two things don’t mix easily. Here is a short summary of what needs to be done to get everything running. I’m using NetBeans and Maven, but the approach itself may be applied to any setup that uses TestNG and resource bundles. If you’re experienced with your IDE and TestNG you may wish to skip to step 4, where I explain how to use annotation transformers for testing localized exception messages.

Step 1. Configuring NetBeans

This one is tricky for the current NetBeans version, which is 8.1, and the current Maven Surefire plugin version, which is 2.19.1. Assuming you’ve got everything installed (the bundled Maven will do), let’s start by creating an example project. Press Ctrl+Shift+N and select Maven—Java Application (the simplest project that can be created). Choose a location, a sensible group id like name.yourdomain or name.yourdomain.testproject, a sensible artifact name like test, and a sensible package name similar to the group id, like this:

[Screenshots: the New Project wizard pages]

Now switch to the Files tab and create a new folder under src called test. Create a subfolder in it called java:

[Screenshot: creating the src/test/java folder]

Now it should look like this:

[Screenshot: the project’s file structure with src/test/java]

Switching back to the Projects tab, you’ll now see the Test Packages element in the tree:

[Screenshot: Test Packages in the project tree]

These steps were needed to get on with the TDD approach, creating the first test before coding anything. So far it is all easy, as long as you follow the Maven conventions (that’s why it’s src/test/java and not something else; it is just a magical incantation). Now create a package under Test Packages with exactly the same name as the source package (name.tachenov.test in my case). Create a class there, say, MyClassTest or MyClassNGTest (both names will allow you to use Ctrl+Shift+T to navigate between MyClass and the test, once both are created), which should look like this:
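Something like this (the method body is a guess; any call into MyClass will do):

    public class MyClassTest {
        @Test
        public void doesSomething() {
            MyClass myClass = new MyClass();
            myClass.doSomething(new Object());
        }
    }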

Now it doesn’t even compile, for two reasons: there is no MyClass, and, far more serious, NetBeans has no idea what @Test is. So ask it to create a stub for MyClass inside the source packages (a nice feature available since NetBeans 8.1) and add a dependency on TestNG.

[Screenshot: adding the TestNG dependency]

[Screenshot: the Add Dependency dialog]

Note that I set Scope to test because we obviously don’t need a testing library unless we’re testing. Now, this step is somewhat tricky: if you used NetBeans to generate tests in the regular way (code-first), it would automatically add the TestNG dependency… only it would pick whatever version NetBeans thinks is right. Well, it turns out you need to do slightly better than that! The version that NetBeans picks doesn’t work quite right with the other tools we need.

Now add an import for org.testng.annotations.Test and hit Alt+F6. You should see something like this:

[Screenshot: the failing test results]

OK, TDD step 1 complete! We’ve got a failing test. Now we have to fix it! Once you do something like this, the test should run fine:
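For example, a method that simply does nothing yet (my assumption is that the NetBeans-generated stub was throwing UnsupportedOperationException):

    public void doSomething(Object object) {
        // nothing to do yet, just don't throw
    }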

At this time, even if you screw something up, the test will probably run fine nevertheless. Things start to get weird a bit later.

Step 2. Testing exception message

Now we’ve got to break the test again:
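My reconstruction of the breaking test (the exact message is an assumption):

    @Test(expectedExceptions = IllegalArgumentException.class,
          expectedExceptionsMessageRegExp = "The object must not be null")
    public void throwsOnNull() {
        new MyClass().doSomething(null);
    }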

Let’s fix it:
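And the corresponding fix in MyClass:

    public void doSomething(Object object) {
        if (object == null) {
            throw new IllegalArgumentException("The object must not be null");
        }
    }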

Now you have this bad feeling about copy-pasting a message like that. Not only is it in our code, it’s in two different places! What if we want to change it? What if we want to translate it? Although Java provides a weird getLocalizedMessage() method for exceptions, it seems to be a real pain to get it working, because there is no way to set that localized message! So we might as well just localize the non-localized message, even though it feels weird; that’s what Brian Goetz himself actually recommends!

Step 3. Externalizing the message

Without touching the test, let’s externalize the message in our class. That’s allowed by TDD, since it’s the refactoring step: we aren’t changing any logic, we are only moving things around.

[Screenshot: invoking the Internationalize refactoring]

In the internationalization dialog, press Select, type in a reasonable name, say, “exceptions”, and then press Create New:

[Screenshot: creating the exceptions bundle]

Now enter a reasonable key for the message, say, MyClass.objectIsNull, and press Replace and Close.

[Screenshot: the key entered in the internationalization dialog]

After a bit of further refactoring (extracting constants, adding imports) you should get something like this:
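Roughly like this (the bundle base name follows the package from step 1):

    public class MyClass {
        private static final ResourceBundle EXCEPTIONS =
                ResourceBundle.getBundle("name.tachenov.test.exceptions");
        private static final String EX_OBJECT_IS_NULL =
                EXCEPTIONS.getString("MyClass.objectIsNull");

        public void doSomething(Object object) {
            if (object == null) {
                throw new IllegalArgumentException(EX_OBJECT_IS_NULL);
            }
        }
    }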

One last step is to move the newly created exceptions.properties from src/main/java to src/main/resources, under the same package it was in. Your file structure should look like this:

[Screenshot: exceptions.properties under src/main/resources]

This is needed because Maven looks for resources in src/main/resources, but NetBeans created our bundle in src/main/java. So we fix that and run our test again. It passes, so it’s time to refactor the test.

Step 4. Making the test work with externalized messages

Our first idea may look like this:

We face two problems here. One is that EX_OBJECT_IS_NULL is private. Well, no big deal—just make it package-private. It feels weird exposing internals just like that, but package-private is still private, and it’s just a constant, so no harm done!

The other problem is far more serious. We get the “element value must be a constant expression” error. Makes sense, since we need it at compile time! But how on earth…? That’s what I asked on Stack Overflow. Thanks to the quick answer I got there, I was able to implement a very nice solution. As the answer says, create a new annotation type. I call it ExpectedExceptionMessageKey.

And create a class called, say, ExceptionRegExpTransformer in our test package.
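Here is a sketch of both, following the approach from that answer (the bundle name is the one we created above):

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface ExpectedExceptionMessageKey {
        String value(); // the resource bundle key
    }

    public class ExceptionRegExpTransformer implements IAnnotationTransformer {
        @Override
        public void transform(ITestAnnotation annotation, Class testClass,
                Constructor testConstructor, Method testMethod) {
            if (testMethod == null) {
                return;
            }
            ExpectedExceptionMessageKey key =
                    testMethod.getAnnotation(ExpectedExceptionMessageKey.class);
            if (key != null) {
                String message = ResourceBundle
                        .getBundle("name.tachenov.test.exceptions")
                        .getString(key.value());
                annotation.setExpectedExceptionsMessageRegExp(Pattern.quote(message));
            }
        }
    }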

Now our test should look like this:
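Something like this:

    @Test(expectedExceptions = IllegalArgumentException.class,
          expectedExceptionsMessageRegExp = "certainly not the right message")
    @ExpectedExceptionMessageKey("MyClass.objectIsNull")
    public void throwsOnNull() {
        new MyClass().doSomething(null);
    }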

I intentionally left expectedExceptionsMessageRegExp there, but changed the message. This allows us to check whether the new way of testing really works. Turns out it doesn’t! Well, no big surprise, since we never told TestNG to actually use our annotation transformer! At this point, it would be really nice to have it injected magically by some sort of @AnnotationTransformer annotation on the transformer class. Alas, there is no such magic! Or at least I haven’t found any. So we have to configure TestNG manually through the Surefire plugin. Open pom.xml under Project Files in the Projects view. Insert something like this after </dependencies>:
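The configuration looks something like this (my reconstruction; the listener property is the standard Surefire way to register TestNG listeners):

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.18.1</version>
                <configuration>
                    <argLine>-Dfile.encoding=UTF-8</argLine>
                    <properties>
                        <property>
                            <name>listener</name>
                            <value>name.tachenov.test.ExceptionRegExpTransformer</value>
                        </property>
                    </properties>
                </configuration>
            </plugin>
        </plugins>
    </build>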

Whew. This looks neither elegant, nor short, nor intuitive. For some reason NetBeans stops auto-completing tags under properties, so you actually have to type that from memory, although by that point it’s intuitive (property/name/value—makes sense). The argLine tag is only needed to avoid an annoying warning about encoding.

The important part here is the version, 2.18.1! At the time of writing, the latest version is 2.19.1, and NetBeans 8.1 uses 2.10 by default, for whatever reason. But if you specify 2.19.1, you get no beautiful green test results window! That is a known bug. Another known bug is that if you use an older version of TestNG (see step 1), for example the 6.8.1 that NetBeans 8.1 uses by default, you may get a “test skipped” window with exceptions in the output window saying that your transformer class can’t be loaded. It doesn’t mean there’s a problem with your class! It means that those versions don’t work well together. By experimenting, I found that TestNG 6.9.10 and Surefire 2.18.1 work together perfectly.

Step 5. Parametrizing messages

We’re almost done. But sometimes we want our message to contain parameters. Java provides a reasonable way to do it by using MessageFormat. Let’s break our TDD thing here and start with modifying the class (or just consider it refactoring):
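A sketch of the parametrized version (the parameter name passed to the format is an assumption):

    public void doSomething(Object object) {
        if (object == null) {
            throw new IllegalArgumentException(
                    MessageFormat.format(EX_OBJECT_IS_NULL, "object"));
        }
    }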

And the resource bundle is now
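Presumably something like this (the exact wording is my reconstruction):

    MyClass.objectIsNull=The ''{0}'' parameter must not be null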

The doubled single quotes are needed to keep MessageFormat from treating {0} as quoted literal text; with single quotes the message would be literally “The {0} parameter must not be null”, which is obviously not what we want. This is a confusing syntax, but it’s there due to historical reasons.

Now if we run our test, we find out that our “refactoring” broke something. But we didn’t break the class, we broke the test! It is now expecting the message to be exactly what is in the resource bundle. At this point we need to decide whether to test that the message looks like what it’s supposed to look like, or that it is exactly what it’s supposed to be. What if our test method itself is parametrized? What if we test for IllegalArgumentException and pass various invalid values? Do we need to check that the message actually contains the invalid value? We probably do, but that is, unfortunately, impossible to achieve with annotation transformers. The reason is that they do just that: transform annotations. And that happens only once. So if our method is called 200 times, it will have the same annotation over and over again, so it would expect the same message. In this case, it may be simpler to just do a try/catch and check the message manually with assertEquals (and add fail in case there is no exception).
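For completeness, the manual version could look like this:

    try {
        new MyClass().doSomething(null);
        fail("expected an IllegalArgumentException");
    } catch (IllegalArgumentException e) {
        assertEquals(e.getMessage(),
                MessageFormat.format(EX_OBJECT_IS_NULL, "object"));
    }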

But if we need to check that the exception looks like what we expect it to look, then there is a solution. To do it right, we really should have two annotations, like @ExpectedExceptionsMessageKey and @ExpectedExceptionsMessageFormatKey. And the second one should really parse format in a manner similar to how MessageFormat.applyPattern does it. But we can probably get away with just this ugly hack for some time:
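A sketch of such a hack, applied in the transformer instead of plain Pattern.quote: un-double the quotes and let every {n} placeholder match anything:

    private static String formatToRegExp(String format) {
        String literal = format.replace("''", "'"); // un-double the quotes
        return Arrays.stream(literal.split("\\{\\d+\\}", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
    }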

When will it break? Well, obviously, when a real message (not a message format) contains something like {0} or a doubled single quote. But, really, how often do we see those? And if it ever happens, we can easily fix it, probably with another ugly hack.

Another case where it will break is if the format is something more than just an argument number. But then again, even though we often see those in UI messages, exception messages are usually much simpler, along the lines of “Value {0} is invalid” or “Couldn''t open {0}: {1}”.

But if you feel like it, by all means, do it right! I’ve shown you a great tool, it’s up to you how to use it!

Given 12 coins, find one that is lighter or heavier

This is a problem of a well-known type: you’re given some coins (balls, rings), one of which is lighter, heavier, or maybe either lighter or heavier. You need to find that one using no more than some specified number of weighings. In this particular example we have 12 coins; one of them is forged and may be heavier or lighter, and we don’t know which. We need to find it in no more than 3 weighings.

In simpler problems, like when you’re given 8 balls, one of them heavier, and you’re limited to two weighings, you can often succeed with a brute-force approach. Try splitting them 4/4, and you quickly figure out that you can’t find one among 4 in a single weighing. Well, just try a 3/3 split instead, and you’re done.

With 12 coins it is not so simple. You could guess that a 4/4/4 first split is a good idea, but then you could spend hours trying to figure out the rest. So a smarter approach would be useful, and this is exactly what I’m about to show. But first, let’s think about bounds.

Lower bound of the number of weighings

What is the lower bound on the number of weighings for this problem? There are 12 possible answers, and each weighing gives us one of three possible results. This leads to a ternary decision tree that represents our algorithm. To minimize the number of weighings we should minimize the height of the tree by balancing it out. The height of such a tree is \lceil \log_3 12 \rceil=3. Makes sense. But that is only if our algorithm merely tells us which coin is forged without actually determining whether it’s lighter or heavier. Such an algorithm is hard to imagine, and if it gives us the complete answer, then there are a total of 24 answers. Not that it changes the lower bound, though, as \lceil \log_3 24 \rceil is also 3.

Note that it is only a lower bound. It may or may not be actually reachable, and that’s much harder to prove (but the proof does exist, with exact numbers). Obviously, if we can find an algorithm that works, then it’s reachable. But if we can’t find one, it doesn’t mean that it doesn’t exist—unless we’ve actually brute-forced all of them. That isn’t so hard for small problems, and I’ve actually done it for 12 and 13 coins. Turns out the lower bound is reachable in the case of 12 coins, but in the case of 13 coins it’s only reachable if we don’t need to know whether the forged coin is lighter or heavier. So even though \lceil \log_3 26 \rceil = 3, we can’t build an algorithm that finds all 26 answers.

A heuristic for building the right solution

How do we produce an algorithm for a given problem without brute-forcing it? I hereby present a heuristic that gives three hints as to what to do next.

U -> L, H, G; L -> G; H -> G

Possible coin transitions

Let’s start by partitioning coins into four groups. Initially all coins belong to the group U, which is the group of coins we know nothing of. Then there are groups L and H which contain coins that may be lighter (but not heavier) and vice versa. If we weigh some coins from U and they don’t balance out, then the heavier pile goes to H and the lighter one goes to L. The rest of the coins, which did not participate in the weighing, are obviously genuine so they go to the last group, G. When we weigh suspicious coins with some genuine ones and they balance out, then the suspicious coins go to G no matter which group they belonged to.

Our goal is to maximize the information gained by each weighing. And since we have no idea what the result of the weighing will be, we should try to maximize worst-case results, much like in the minimax algorithm.

But how do we measure information gained? Obviously each coin promoted from one group to another is something. But if a coin jumps from U to G directly, it’s even better than a coin going from U to L or H, or from L or H to G. So here is the first hint:

We should pick the weighing that maximizes the minimum of N_{U \rightarrow L} + N_{U \rightarrow H} + N_{L \rightarrow G} + N_{H \rightarrow G} + 2N_{U \rightarrow G} over all possible results of the weighing. In other words, each coin jumping from U to G gives us two points, and all other movements give us one point each.

The second hint is somewhat obvious. Whenever you’re facing a weighing that may only give you two results instead of three (e. g. it’s impossible for the coins to balance out), you’re probably doing something wrong, unless it’s the last weighing.

The third hint is less obvious: if, at some point, you hit a decision tree branch that allows you to figure out the answer in fewer weighings than the given limit, you’re probably doing something wrong.

The last two hints are actually the two sides of the same coin, no pun intended: they both tell you that you’re going to create an unbalanced decision tree. In a well-balanced ternary tree most non-leaf nodes should have all three children, and the tree should ideally have the same height everywhere. Of course, that depends on the exact problem: if you’re allowed to do 100 weighings for 12 coins, you probably end up with lots of branches much shorter than that. But for three weighings it’s really desirable to balance everything out.

The last two hints are mostly redundant. If you’re squeezing as much information as possible, you would probably create a well-balanced tree anyway. So they just serve as additional warnings. But sometimes they simplify the math.

Back to the problem

So let’s try to build a good algorithm for 12 coins and 3 weighings. A 6/6 split is a bad idea because of the second hint: with all the coins on the scale, they can never balance out. That’s how the hints simplify the math, and we can skip straight to an n/n split, where n<6. Since the situation is symmetrical for now, we don’t need to consider the lighter/heavier results separately, so that leaves us with two possible results of the weighing:

  1. The chosen coins balance out and therefore jump right to G. The rest still belong to U. That gives us 4n points (2n coins jumping from U to G).
  2. They don’t balance out. Therefore the lighter group goes to L, the heavier one to H and the rest straight to G because we know that the forged coin is among the weighed piles. That gives us 2n+2(12−2n) = 24−2n points.

The first expression increases with n, the second one decreases. If we find the point where they are equal, the minimum will clearly decrease in both directions from there: to the left 4n becomes smaller, to the right 24−2n becomes smaller. That gives us the following equation:

4n = 24−2n, 6n = 24, n = 4

Then we have two cases to consider.

Four coin piles balance out

The easiest case is when the coins balance out, so we now have just 4 U-coins. Intuitively, one may think of comparing two of them. If they balance out, we compare one of them with one of the remaining ones. If they balance out again, then the remaining one must be the forged one, but we don’t know whether it’s lighter or heavier. If the third weighing doesn’t balance out, then the third coin is the forged one (and we now know whether it’s lighter or heavier). If the second weighing doesn’t balance out, then we again compare one of them with one of the remaining ones, this time identifying not only which one is forged, but also whether it’s heavier or lighter. But there was still one case when we couldn’t do it. So let’s try to use our heuristics.

It makes no sense to compare good coins with good coins, so the only way we may end up putting good coins on the balance scale is to compare them with suspicious ones. This means that only one pan should contain good coins (if both of them do, we can just remove the smaller number of good coins from both pans). So we can have n1 U-coins on the left pan, and n2 U-coins plus n1−n2 G-coins on the right pan. Here, n2 can be zero (comparing suspicious coins with good ones), but then n1 can’t be 4, because 4 U-coins weighed against 4 G-coins can’t possibly balance out, and the second hint tells us that’s probably a bad thing. n1+n2 can’t be 4 either, for the same reason. So we have three cases now:

  1. Balancing out: 2(n1+n2) points for coins now promoted to G.
  2. The left side is lighter: n1+n2 points for promoting n1 coins to L and n2 coins to H. 2(4−n1−n2) points for promoting the rest to G.
  3. The left side is heavier: n1+n2 points for promoting n2 coins to L and n1 coins to H. 2(4−n1−n2) points for promoting the rest to G.

Note that even though cases 2 and 3 give an equal number of points, the resulting group configurations are different if n1 ≠ n2. Now let n=n1+n2; then we have the following equation:

2n = n + 2(4−n), 3n = 8, n = 8/3 ≈ 2.67

Let’s pick the closest integer value, n = 3. Note that n1 and n2 by themselves don’t really matter here, so let’s pick n1=3, n2=0. Going back to the three cases:

  1. Balancing out: the three weighed coins are genuine, which means the remaining one is forged. We even have a spare weighing to determine whether it’s lighter or heavier. Here is where the 13-coin case breaks (the algorithm up to this point is the same): we can still determine which of the remaining coins is forged, but with probability 1/2 we won’t know whether it’s lighter or heavier.
  2. Three U-coins are lighter: it means that the forged coin is among them and it’s lighter. Comparing any two of them will give us the answer.
  3. Three U-coins are heavier: the same as case 2, but the forged coin must be heavier.

See how using our heuristic allowed us to build a much simpler algorithm than one may come up with intuitively, and how it’s also more powerful?

Four coins do not balance out

This case is much harder because we now have 4 L-coins, 4 H-coins and 4 G-coins, which gives a mess of possible combinations. Let’s pick n1 L-coins and m1 H-coins for the left pan, and n2 L-coins and m2 H-coins for the right pan, possibly adding n1+m1−n2−m2 G-coins to the right pan. Obviously, we can’t use all the L and H coins, or else the pans can’t possibly balance out. The three cases now are:

  1. Balancing out: n1+m1+n2+m2 points for new G-coins.
  2. Pan 1 is lighter: m1+n2 points, plus 8−n1−m1−n2−m2 points for the non-participating coins. Where did the m1+n2 points come from? Well, the m1 coins were from the H group and are now on the lighter side. The forged coin can’t possibly be both heavier and lighter, so it’s obvious that these m1 coins are all genuine. Ditto for n2.
  3. Pan 1 is heavier: m2+n1+8−n1−m1−n2−m2. The same reasoning.

We now have three expressions and must do our best to balance them out. Let’s put it this way:

n1+m1+n2+m2=m1+n2+8−n1−m1−n2−m2

n1+m1+n2+m2=m2+n1+8−n1−m1−n2−m2

Subtracting one from the other, we get m1+n2=m2+n1. Let k denote this; that turns the first expression into 2k and the second and third ones into 8−k. Solving 2k=8−k, we get 3k = 8, much like in the previous case. And again, we have some freedom, but we can’t pick n1=3, m1=0, n2=3, m2=0, for example, because then n1+n2=6 and we only have 4 L-coins. So let’s pick n1=3, m1=2, n2=1, m2=0. We’ll also need to add all 4 genuine coins to the right pan.

  1. Balancing out: we’re now left with 2 H-coins. Finding out which one is forged is a piece of cake.
  2. The left pan is lighter: we’re left with 3 L-coins. The right pan did not contain any H-coins, so it’s obvious that the forged one is among those 3. Again, it’s trivial to solve.
  3. The left pan is heavier: we’re left with 2 H-coins from the left side and 1 L-coin from the right side. Comparing the two H-coins, we will instantly know which of them is heavier and therefore forged, in case they don’t balance out. If they do, then the forged one must be the remaining L-coin.

Problem solved! We didn’t even have to try different variations (but you can, and you’ll get correct algorithms). And we also didn’t have to use the third hint.

Encodings, Unicode and broken code

This is another sad tale of character encodings. Consider this LeetCode problem that asks to check whether two given strings are isomorphic. Isomorphic strings are defined as strings of the same length with a bijective mapping between their characters. For example, “aba” is isomorphic to “ava” with the mapping a ↔ a, b ↔ v, and to “mlm” with the mapping a ↔ m, b ↔ l, but not to “aaa” (no bijection, since both “a” and “b” would map to “a”).

Now consider possible solutions in Java. One obvious solution:
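Something along these lines, sketched with two HashMaps, one per direction of the bijection (assuming the usual LeetCode method signature and java.util imports; the exact code may have differed):

```java
public boolean isIsomorphic(String s, String t) {
    // LeetCode guarantees the strings are of equal length
    Map<Character, Character> map = new HashMap<>();
    Map<Character, Character> reverse = new HashMap<>();
    for (int i = 0; i < s.length(); i++) {
        char a = s.charAt(i), b = t.charAt(i);
        Character prev = map.put(a, b);
        if (prev != null && prev != b) return false;   // a already maps to another char
        Character prevR = reverse.put(b, a);
        if (prevR != null && prevR != a) return false; // b is already mapped from another char
    }
    return true;
}
```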

Runs in 36 ms, certainly not the fastest submission. One way to “optimize” it:
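That is, replace the maps with flat arrays indexed by char, using NUL (0) as the “not mapped yet” marker (again, a sketch):

```java
public boolean isIsomorphic(String s, String t) {
    char[] map = new char[65536];     // one slot per possible char value
    char[] reverse = new char[65536];
    for (int i = 0; i < s.length(); i++) {
        char a = s.charAt(i), b = t.charAt(i);
        if (map[a] == 0 && reverse[b] == 0) {
            map[a] = b;               // establish a new mapping
            reverse[b] = a;
        } else if (map[a] != b || reverse[b] != a) {
            return false;             // contradicts an existing mapping
        }
    }
    return true;
}
```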

This runs in 12 ms. Three times faster! Beating 92%! And here is yet another version:
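Same thing, but with tables sized for US-ASCII (a sketch of the C-flavored mindset):

```java
public boolean isIsomorphic(String s, String t) {
    char[] map = new char[128];       // "characters are 7-bit, right?"
    char[] reverse = new char[128];
    for (int i = 0; i < s.length(); i++) {
        char a = s.charAt(i), b = t.charAt(i); // blows up for anything non-ASCII
        if (map[a] == 0 && reverse[b] == 0) {
            map[a] = b;
            reverse[b] = a;
        } else if (map[a] != b || reverse[b] != a) {
            return false;
        }
    }
    return true;
}
```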

Now, let me ask a question: which of the solutions above is the best one?

The last one is something an English speaker with C background might come up with. It will obviously break for any characters outside US-ASCII, including Cyrillic, Chinese, Hebrew or even German or Irish (because of the umlauts and fada). So obviously it’s not acceptable.

The second one is trickier. One thing is that it might break if one of the strings contains NUL characters, because we abuse NUL as the “character not mapped” special value. Another thing is that initializing the whole array with zeroes takes O(65536) time, which could make it a poor choice for short strings.

So it looks like the first one is the best, right? It scales nicely to any length, and even though it’s slower, it handles NULs properly.

Well, the answer is: all of them are wrong! One test case that none of the solutions above will pass is “ab”, “冬b”. In case you can’t see it, here is a picture:

kanji

That’s right, that one weird Chinese character is enough to break all of the solutions above. Moreover, it breaks the LeetCode testing system as well (just like Cyrillic or anything non-ASCII does), and the LeetCode Discuss forums too (unlike Cyrillic and many other non-ASCII symbols). Why? What’s wrong with that particular character? Java stores strings using Unicode, right? That’s why char is two bytes, after all! So it should be able to handle any characters without any problems! The dark age of terrible national encodings is over!

In order to understand it, we must look back at the history of encodings and Unicode.

It all started in the 1960s, or even long before that (Morse code came into existence long before the first computer). But it’s in the 1960s that all hell broke loose. In 1963 both ASCII and EBCDIC were introduced. While even EBCDIC is apparently still in use today, it’s ASCII that became widespread, and the fact that ASCII was a 7-bit encoding meant that there was one “free” bit and 128 unused codes in the 128–255 range. That, and the lack of any letters except basic Latin, immediately gave birth to a myriad of national encodings. Worse, multiple encodings were sometimes used for the same language. I know of four for Russian, for example: code page 866 (the “MS-DOS” encoding), code page 1251 (the “Windows” or ANSI encoding), KOI8-R (a really weird encoding that arranges letters according to the English alphabet, not the Russian one; it was really widespread in the early days of the Russian Internet) and the “standard” ISO-8859-5, which was rarely used at all. This is still a major source of trouble: when you run a program in a console window, you have no idea which encoding will be used, so you have about a 50% chance of getting garbage (less in practice, because most programs will use the MS-DOS encoding). And nobody plans to fix it, because it is impossible, and because nobody cares about console windows nowadays.

Chinese and Japanese users got it even worse: 128 values are obviously not enough to represent the roughly 2000 ideographs in common Japanese use (and that’s only a subset of Chinese!), so they went ahead and invented two-byte encodings, which made things much worse: now, given some bytes, you couldn’t even determine the string length if you had no idea which encoding was used.

Then Unicode came into being. The first standard was published in 1991, and it introduced a 16-bit encoding intended for universal use, which included all characters deemed reasonable. Unfortunately, the bunch of Old Evil Encodings didn’t disappear at that very moment, so the only thing that really happened that day is that the world now had one more encoding to deal with. No, wait, make it two encodings, because Unicode defined characters as 16-bit units, and those can be represented with bytes using either little-endian or big-endian order.

Even worse, Unicode apparently failed to consider some important characters like rarely used ideographs (like that 冬), even though they are a part of personal names and names of places. Imagine you can’t type your own name as you’re trying to use some software! So apparently some extension was needed. That is how Unicode transformed from a single 16-bit encoding into a whole standard of concepts and encodings.

The core concept is the code point. A code point is a 21-bit number corresponding to some character, typically represented as a 32-bit integer in memory and as something like U+00B0 in writing, where 00B0 is the hexadecimal value of the code point (in this case it’s the degree sign: °). The current range for code points is U+0000–U+10FFFF, hence 21 bits (but it’s extendable). So, you see, to say that Unicode is a 16-bit encoding is wrong in several ways: Unicode is a standard (defining multiple encodings), not an encoding, and not all Unicode encodings are 16-bit.

The code points defined in the first Unicode standard now belong to the so-called Basic Multilingual Plane (BMP), which includes the code points in the range U+0000–U+FFFF. That covers Latin (and thus English), Arabic, Hebrew, most Chinese and Japanese, and lots of other useful things. However, there are some Chinese characters outside the BMP; they belong to the so-called supplementary planes, and code points U+10000 and above are called supplementary code points (or supplementary characters).

There are three main encodings in the current Unicode standard. By “main” I mean that they are both part of the standard and are widely used. These are:

  1. UTF-8, which is a variable-width character encoding, where a code point can be represented by one to four bytes (up to six bytes if we ever need code points beyond the current range). A good thing about it is that the NUL byte is only used to represent the NUL code point, so UTF-8 strings can be NUL-terminated. Another good thing is that ASCII characters are represented by single bytes identical to their ASCII representation.
  2. UTF-16, which is also (surprise!) a variable-width character encoding, where a code point can be represented by one or two 16-bit code units (which, in turn, can be represented as two bytes each, using either BE or LE byte order; that makes UTF-16LE and UTF-16BE). BMP code points are represented by one code unit; supplementary code points are represented by so-called surrogate pairs, which consist of the first (high) surrogate and the second (low) surrogate. The high/low concept has nothing to do with byte ordering here: the surrogates encode the higher and lower bits of the code point, and the high surrogate always comes first regardless of the byte order.
  3. UTF-32 is a fixed width character encoding where each code point is encoded as a single 32-bit number (which, again, makes it UTF-32LE or UTF-32BE depending on the byte order).

As you can see, UTF-16 is pretty messed up, and if you consider it a fixed-width character encoding, you may end up in trouble. In fact, when I finally figured all this out, I started to think that UTF-16 is outright evil: it doesn’t have the nice properties of UTF-8 (like NUL-termination and ASCII compatibility), and its only advantage over UTF-32 is lower memory consumption, but with modern amounts of RAM that shouldn’t be a real problem any more. And the fact that it’s a variable-width encoding screws up almost any text processing algorithm you can think of. Here is a correct solution for the mentioned LeetCode problem, for example:
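A sketch of that approach: compare code points, not chars (the actual submission may have differed in details):

```java
public boolean isIsomorphic(String s, String t) {
    int[] sp = s.codePoints().toArray();
    int[] tp = t.codePoints().toArray();
    // equal char counts do not imply equal code point counts
    if (sp.length != tp.length) return false;
    Map<Integer, Integer> map = new HashMap<>();
    Map<Integer, Integer> reverse = new HashMap<>();
    for (int i = 0; i < sp.length; i++) {
        Integer prev = map.put(sp[i], tp[i]);
        if (prev != null && prev != tp[i]) return false;
        Integer prevR = reverse.put(tp[i], sp[i]);
        if (prevR != null && prevR != sp[i]) return false;
    }
    return true;
}
```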

It’s certainly not as efficient as the others, but it’s the one that really works (and no, you can’t say it works unless it handles all possible inputs correctly). Some useful String and Character methods include:

  • String.codePointCount: returns the number of code points between the specified indexes. This is the true length of the string (not the number returned by String.length).
  • String.offsetByCodePoints: “adds” two indexes together, where one is a char index and the other is measured in code points, returning the resulting char index. For example, if you have the string “冬b”, then offsetByCodePoints(0, 1) would return 2, because “b” is located at index 2, not 1. A call to offsetByCodePoints(2, 1) would return 3 (the end of the string), because “b” is only a single code unit. This method is a kind of reverse of the previous one.
  • CharSequence.codePoints: returns an IntStream of code points.
  • Character.codePointAt, Character.codePointCount: same as the String methods, only for character arrays.
  • Character.highSurrogate, Character.lowSurrogate: return the respective surrogate for a given code point.
  • Character.isHighSurrogate, Character.isLowSurrogate: for a given code unit, check whether it may be part of a surrogate pair. These are very important methods for the many cases where you need to distinguish surrogate pairs from BMP characters. For example, StringBuilder.reverse uses them to properly reverse a string containing surrogates (because the two units of a pair must not be swapped).
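A quick illustration of the first three (the ideograph U+20BB7 here is just an arbitrary supplementary character picked for the example):

```java
String s = "\uD842\uDFB7b"; // U+20BB7 as a surrogate pair, followed by 'b'
System.out.println(s.length());                      // 3 code units
System.out.println(s.codePointCount(0, s.length())); // 2 code points: the true length
System.out.println(s.offsetByCodePoints(0, 1));      // 2: 'b' starts at char index 2
s.codePoints().forEach(
        cp -> System.out.println(Integer.toHexString(cp))); // 20bb7, then 62
```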

On top of that, many methods have two variants: one accepting a char, the other accepting a code point. Those accepting chars should really be deprecated, because they actually encourage writing buggy code.

Considering all that, we must conclude that while Unicode indeed made life much easier than it was in the Dark Ages, it must be handled properly, unless we want to enter another dark age where a person may fail to register an account on some site simply because their name happens to contain a supplementary character. Or wait a minute. We have already entered it. Now we must get out, so we’d all better start writing bug-free code!

Best Time to Buy and Sell Stock IV

Another awesome problem on LeetCode deals with buying and selling stock. You are given a list of prices, are allowed to only open long positions and you must close one before opening a new one. This would be trivial (buy on low, sell on high), but you’re limited to a total of k buy/sell transaction pairs. You need to return the maximum profit.

At first I thought it was a dynamic programming problem. A quick peek at the tags confirmed that suspicion. I quickly realized that the subproblems require determining the maximum profit for the first d days, starting with d = 2 (you can’t make a profit on a single day because only one price per day is given). Moreover, it is quite obvious that we have to determine maximum profits for limited numbers of transactions in the range 0..k. This requires only O(k) memory, because you only need the profits for the previous day to compute tomorrow’s profits. You need to consider, though, that you may have left your position open, so some potential profit may exist. In this case you also have to record the opening price.

In the end it looked like this:
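(What follows is a compact equivalent rather than the original listing: profitOpen[j] and openPrice[j], described below, are folded into a single open[j] = profitOpen[j] − openPrice[j]; the LeetCode signature and java.util imports are assumed.)

```java
public int maxProfit(int k, int[] prices) {
    int n = prices.length;
    if (n < 2 || k == 0) return 0;
    if (k >= n / 2) { // enough for every profitable move: the unrestricted case
        int profit = 0;
        for (int i = 1; i < n; i++) profit += Math.max(0, prices[i] - prices[i - 1]);
        return profit;
    }
    int[] open = new int[k + 1];   // best profit with j transactions, position open
    int[] closed = new int[k + 1]; // best profit with j transactions, nothing open
    Arrays.fill(open, Integer.MIN_VALUE / 2); // "no such state yet"
    for (int price : prices) {
        for (int j = k; j >= 1; j--) {
            closed[j] = Math.max(closed[j], open[j] + price);   // sell today
            open[j] = Math.max(open[j], closed[j - 1] - price); // buy today
        }
    }
    return closed[k];
}
```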

Not very elegant, but pretty straightforward. profitClosed[j] is the maximum profit that may be made by today with j buys/sells, while leaving no open position. profitOpen[j] is the same thing, but with an open position, and openPrice[j] denotes the best buying price so far, so we can instantly gain profitOpen[j] + current price − openPrice[j] by selling today. And that’s what we do, but only if the resulting profit is more than we could gain by performing fewer transactions. When we do it, the number of elements in the profitClosed array becomes greater than the number of elements in profitOpen, so we immediately open a new position on the next loop iteration.

Then we update our profits in the inner loop. We re-close an open position if we can get a better profit, and we reopen if we can get a better profit plus the potential profit! That is important, because that’s what an open position is about: if we can instantly gain that much profit, it would be a waste to reopen at the current price even if we could get a better profit today; it would still bite us in the future.

This solution runs in 4 to 6 ms, depending on the weather on Mars and the number of holes in the cheese. For example, replacing > profitClosed[lClosed] with > 0 in the pre-loop close-another-position condition speeds it up (branch prediction?), even though it means opening positions that aren’t very profitable.

There are much better solutions using the same idea. This one is particularly awesome, as it does both loops in just four lines.

However, I didn’t particularly like the DP idea here. It felt like this problem should have a better solution, so I kept on digging. After seeing this one, I decided to do the same thing in Java:

Not terribly concise, but pretty straightforward. The idea is that we calculate the profit that we’ll lose if we do nothing on a particular day, thus reducing the number of transactions. Then we keep throwing away days, starting with those that cost the least profit. Obviously, throwing away days on monotonic intervals won’t affect the profit at all, so we don’t even add those to begin with. We are only interested in “peaks” and “valleys” (plus possibly the first and the last days).

Maybe replacing index arrays with a linked list was a bad idea; it’s definitely worth trying arrays instead. As it is, it runs in 20 ms. Not so terribly efficient. But I don’t think switching to arrays would improve it that much, because the main slowdown here is the tree. Even though I only add peak/valley days, and only if transaction reduction is needed, it is still a pretty slow thing. And it still felt like not the totally right thing.

Then I saw this solution based on another one. And I must admit, it really took me a while to figure these out. Especially the loop invariants. So I hereby present my own implementation of the same ideas; it runs in 3 ms and is O(n) (if we assume quickselect is linear, which is a fair assumption, since randomization makes the worst case extremely unlikely).
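This is a reconstruction of it rather than the verbatim code (names are mine, and the final top-k selection is shown as a sort for brevity; quickselect changes only the complexity, not the answer):

```java
import java.util.Arrays;

public class Solution {
    public int maxProfit(int k, int[] prices) {
        int n = prices.length;
        int[] valleys = new int[n / 2 + 2];
        int[] peaks = new int[n / 2 + 2];
        int[] profits = new int[n / 2 + 2]; // a spare slot for a possible zero pair
        int top = -1, count = 0, i = 0;
        while (i < n - 1) {
            while (i < n - 1 && prices[i] >= prices[i + 1]) i++; // find the next valley
            int v = prices[i];
            while (i < n - 1 && prices[i] <= prices[i + 1]) i++; // and the next peak
            int p = prices[i];
            // cases 2, 4 and 6: pop transactions that can't be combined with (v, p)
            while (top >= 0 && valleys[top] > v)
                profits[count++] = peaks[top] - valleys[top--];
            // cases 1 and 3: combine, recording an imaginary "short" transaction
            while (top >= 0 && peaks[top] <= p) {
                profits[count++] = peaks[top] - v; // sell at the old peak, buy at v
                v = valleys[top--];                // the current transaction grows
            }
            valleys[++top] = v;
            peaks[top] = p;
        }
        while (top >= 0) // whatever is left on the stack is a real transaction
            profits[count++] = peaks[top] - valleys[top--];
        Arrays.sort(profits, 0, count); // the real thing uses quickselect here
        int total = 0;
        for (int j = count - 1; j >= 0 && j >= count - k; j--) total += profits[j];
        return total;
    }
}
```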

And here comes the explanation.

stocks

Consider this price chart. Let’s consider only closing prices (80 for the first day, 70 for the 2nd, 75 for the 3rd and so on). For a “bullish” day (green) the closing price is the top of the candle, for a “bearish” one it’s the bottom. If we were to aim for the maximum profit, we’d perform the following transactions:

  1. Days 2–6, prices 70–100, profit 30.
  2. Days 8–11, prices 50–80, profit 30.
  3. Days 13–17, prices 20–60, profit 40.
  4. Days 19–21, prices 40–50, profit 10.
  5. Days 23–27, prices 30–70, profit 40.
  6. Days 29–31, prices 10–100, profit 90.

The total profit is 30+30+40+10+40+90=240.

Now, we are limited to some number of transactions. If that’s only one, then the best we can get is the last transaction, that is, 90. If two, then the best is to buy on the 13th day, sell on the 27th (profit 50), buy on the 29th and finally sell on the 31st, for a total profit of 140. Note that the first transaction is not even in the list of transactions we’d perform if we weren’t limited. That is because the first three transactions (green runs on the chart) are best united into a single one. So we need to figure out which transactions to unite.

Consider the following problem. For given intervals [v1, p1], [v2, p2], v1 < p1, v2 < p2, where “v” stands for a valley and “p” for a peak, what are the possible relationships between them? There are six:

  1. They don’t overlap, and p1 ≤ v2. For example, days 13–15 and 15–17 with prices [20,40], [40,60].
  2. They don’t overlap, and v1 ≥ p2. For example, days 2–6 and 13–17 with prices [70,100], [20,60].
  3. They do overlap, and the second one is higher: v1 ≤ v2 ≤ p1 ≤ p2. For example, days 13–17 and 23–27 with prices [20,60], [30,70].
  4. They overlap, and the second one is lower: v2 ≤ v1 ≤ p2 ≤ p1. For example, days 2–6 and 8–11 with prices [70,100], [50,80].
  5. The second interval is fully included in the first one. For example, days 13–17 and 19–21 with prices [20,60], [40, 50].
  6. The first interval is fully included in the second one. For example, days 23–27 and 29–31 with prices [30,70], [10,100].

I’m not totally rigorous here about the strictly-less/less-than-or-equal thing. It doesn’t matter, though, because corner cases when something is equal to something else may be handled as either—they are kind of “at the border” between the two and belong to both sets. For example, if two transactions have exactly the same price range, then it doesn’t really matter whether you think of the second one included in the first one or vice versa. Or you may even want to consider them to be overlapping.

Now we need to ask ourselves a question: if we have two transactions and are allowed to make only one, how to get the maximum profit? Let’s consider all six cases.

Two transactions: both giving 20, both on the increasing interval

Case 1: transactions (1) and (2) are combined into (3)

The first case is not really possible for adjacent transactions, if we only consider valley-to-peak transactions to begin with (the end of the first transaction wouldn’t really be a peak, and the beginning of the second one wouldn’t be a valley). Indeed, if two intervals form a monotonic non-decreasing sequence, why bother splitting them into two intervals at all? However, for transactions far apart, this case is still possible (although not in our example). Anyway, in this case the answer is quite obvious: just combine the two transactions into a single one, buying for v1 and selling for p2.

Case 2: one transaction gives 30, another 40, but it's way below the first one, so we can't combine them

Case 2: between transactions (1) and (2) we pick (2) because it’s more profitable

The second case is also obvious: just pick the most profitable one. We can’t unite them, because the starting price of the first one is greater than the selling price of the second one. If you buy on the 2nd day and sell on the 17th, you’ll get a loss of 10 instead of any profit.

Two transactions overlap and the best profit is from the start of the first one to the very end of the second one

Case 3: two overlapping transactions (1) and (2) are transformed into one long transaction (3) and an imaginary short transaction (4)

The third case is the most tricky one! Since the lowest price is v1, and the highest price is p2, then the best is to combine them into a single transaction. This sounds like it increases the transaction count tremendously (as we have to consider all possible combinations), but in fact it doesn’t. We do a really amazing trick here: instead of considering them as two separate transactions to begin with, we instead think of them as one “long” transaction (buying at v1, selling at p2) and one “short” transaction (selling at p1 and buying later at v2), even though short transactions aren’t technically allowed by this problem! This works because we pick transactions starting with the ones giving us the most profit. In this case, the long one is guaranteed to give us more profit than the short one, so we’ll either pick the long one (without violating anything) or pick both. And since both transactions give us exactly the same profit as two long transactions we had to begin with (p1-v1+p2-v2=p2-v1+p1-v2), then it’d appear in the net result as if we performed two separate long transactions.

Two transactions overlap, but the combined one gives less profit

Case 4: between transactions (1) and (2) we choose the most profitable (in this case any of them) because the combined transaction (3) is the least profitable

The fourth case is trivial: since the combined transaction is the least profitable, we just pick the most profitable one of the two. In our example they are equally profitable, though.

Two transactions, the second one has higher valley and lower peak

Case 5: the second transaction is included in the first, so the first one gives the best profit we can get

The fifth case is even more boring: the first transaction has both the lowest price and the highest price, so we just pick the first one.

Two transactions, the second one has the lowest and the highest prices

Case 6: the second transaction gives the most profit

The sixth case is the mirror image of the fifth one. Just pick the second transaction.

So we have one case when we should combine transactions unconditionally, two cases when we should choose the most profitable of the two, two cases when the most profitable one is obvious and one tricky case when we transform two long transactions into one long and one imaginary short one. Now to the algorithm.

The algorithm preserves the following invariant. When an outer loop iteration finishes, the valleys/peaks stack contains transactions that are related according to case 5 above, with the latest transaction on the top. The invariant is obviously true at the beginning since the stack is empty, and any proposition is true for the elements of an empty set (all humans that have visited other galaxies have green ears—there are none at the moment of writing, so I’m not wrong in saying that “all” (zero) of them have green ears).

The invariant is preserved by these two loops:
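Excerpted from the sketch above:

```java
// cases 2, 4 and 6: the top of the stack can't be combined with (v, p)
while (top >= 0 && valleys[top] > v)
    profits[count++] = peaks[top] - valleys[top--];
// cases 1 and 3: combine, recording an imaginary "short" transaction
while (top >= 0 && peaks[top] <= p) {
    profits[count++] = peaks[top] - v; // sell at the old peak, buy at the new valley
    v = valleys[top--];                // the current transaction becomes the long one
}
```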

The first loop just pops transactions with v1 > v2, which corresponds to cases 2, 4 and 6. In all these cases we don’t combine transactions, so it’s fine to just pop them and consider separately. The termination condition guarantees that v1 ≤ v2 at the end of that loop, assuming there are any elements left in the stack.

The second loop handles the tricky case 3. We pop a transaction, then generate a “short” transaction and put it into the profits array. Then we set the current valley to the one popped from the stack, so the current transaction now corresponds to the long one. Then we continue the loop because it may or may not be possible to combine it with the previous one and so on. Note that the termination condition guarantees that p1 > p2 at the end of the loop, assuming there are any elements left. Assuming the invariant held true at the start of the outer loop, popping some elements could not have increased the valley at the top of the stack because it may only decrease as we go deeper.

Note that we don’t consider case 1 separately. Instead, we treat it as a special subcase of case 3 where the “short” transaction gives us negative profit. Since we’re going to pick only the highest profits anyway, this is fine. One may think that it may have negative impact on the total profit in the case where k is high enough to allow all transactions to complete, but in fact it may not. This is because two adjacent transactions can’t form case 1 anyway, remember? So when we have transactions like this, it means that they are separated by some transactions in the middle. And if we’re going to pick every profitable transaction, then we can’t really combine those two because we aren’t allowed to engage in multiple transactions. So this negative profit corresponds to the profit loss caused by the fact that we need to sell first in order to buy again.

Lastly, since case 5 corresponds to the items in the stack, we pick top k profits using randomized quickselect and sum them up.

One last note before we get to the example: the loop may generate one false peak/valley pair at the end of the input array if the prices list ends with a valley. This corresponds to a zero-length transaction giving zero profit, so instead of checking for this corner case we can allocate one more element in the profits array to hold this zero. It won’t affect the result in any way.

Now let’s see how the algorithm works on our example. Our valleys/peaks are: 70–100, 50–80, 20–60, 40–50, 30–70, 10–100.

Step 1 (70–100). The stack is empty, so the inner loops don’t execute. The stack becomes (bottom-to-top): [70, 100].

Step 2 (50–80). The first loop pops the transaction because 70 > 50 (case 4), adding its profit of 30 to the profits. The stack becomes empty, therefore the second loop doesn’t execute. The next transaction is pushed into the stack.

Step 3 (20–60). The first loop pops because 50 > 20 (case 4), adding another 30. The rest is just like the previous step.

Step 4 (40–50). The first loop doesn’t work because 20 < 40. The second loop doesn’t work either because 60 > 50. Case 5.

Step 5 (30–70). The first loop pops only [40,50] because 40 > 30 (case 6), adding its profit of 10; but 20 < 30, so [20,60] stays. The second loop then transforms [20,60] and [30,70] into [60,30] (a short transaction with profit 30, added to the profits) and [20,70] (which becomes the current transaction). This is case 3.

Step 6 (10–100). The first loop pops because 20 > 10 (case 6), adding 50 for [20,70]. The second loop doesn’t work because the stack is now empty. The last transaction is pushed into the stack.

Lastly, we pop the stack, adding 90 for [10,100], and we now have the profits 30, 30, 10, 30, 50 and 90.

A total of six transactions! Now for the different values of k we have:

  1. Just pick up the last transaction.
  2. Pick up 50 and 90, where 50 corresponds to 20–70 produced on the 5th step (buy on the 13th, sell on the 27th).
  3. Pick up 50, 90 and any of the 30s, where the 30s can be one of the first two transactions or the imaginary transaction that splits our 50 into 20–60 (days 13–17) and 30–70 (days 23–27), which gives 40+40 = 50+30.
  4. Pick up 50, 90 and any two of the 30s.
  5. Pick up 50, 90 and all of the 30s (meaning 50+30 now definitely means two real transactions 40+40).
  6. Pick all of them (the trivial unrestricted case).

Isn’t this awesome?

Closest Binary Search Tree Value II

Continuing on to the next interesting problem, which took me a while to solve because I had never actually done iterative tree traversals before, although I was aware that there is such a thing and knew the general idea of how to do it.

The problem is to find k values in a BST that are closest to the given target. The target is a double, and the nodes are integers, so the tree may or may not contain the target itself.

The linear solution is so trivial that I didn’t even try to do it. Indeed, just perform an in-order traversal, keep the last k values in some sort of ring buffer array and terminate when the next value is worse than the worst so far.
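I didn’t write it, but it could look something like this (assuming LeetCode’s usual TreeNode class and java.util imports):

```java
public List<Integer> closestKValues(TreeNode root, double target, int k) {
    Deque<Integer> window = new ArrayDeque<>(); // the "ring buffer" of candidates
    inorder(root, target, k, window);
    return new ArrayList<>(window);
}

// returns false when the traversal can be terminated early
private boolean inorder(TreeNode node, double target, int k, Deque<Integer> window) {
    if (node == null) return true;
    if (!inorder(node.left, target, k, window)) return false;
    if (window.size() == k) {
        if (Math.abs(node.val - target) >= Math.abs(window.peekFirst() - target))
            return false; // every value from here on is worse than the worst kept
        window.pollFirst();
    }
    window.addLast(node.val);
    return inorder(node.right, target, k, window);
}
```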

What was interesting is how to do that faster. Or not really faster because the test cases seem to be tailored for the linear solution, but still, how to get better time complexity?

The idea is pretty obvious. We need to find the target or some value close to the target (the previous or the next one, doesn’t matter) and then look in the neighborhood for the k closest values. But how do we look in the neighborhood? If we do a recursive binary search, then we’ll lose all information about where we found it once the recursion returns. The best we can get is a reference to the found node, but no way to get back up. So it looks like we need an iterative binary search.

The iterative binary search itself is very, very easy. But we also need some way to keep track of where we are, so we need a stack. The first part would look somewhat like this then:
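Something like this (a sketch, not necessarily the original code):

```java
// push the path as we descend; stop at the target or right next to it
Deque<TreeNode> stack = new ArrayDeque<>();
TreeNode node = root;
while (node != null) {
    if (node.val == target) break;
    TreeNode next = target < node.val ? node.left : node.right;
    if (next == null) break; // node now holds a value adjacent to the target
    stack.push(node);
    node = next;
}
```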

This locates either the target or the next value before or after the target. So now what? And here is where I got lost. A typical iterative in-order traversal would look like this, if starting from the root:
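Sketched, with process() standing in for whatever we do with each value:

```java
Deque<TreeNode> stack = new ArrayDeque<>();
TreeNode node = root;
while (node != null || !stack.isEmpty()) {
    if (node != null) {
        stack.push(node);   // we'll have to come back after the left subtree is done
        node = node.left;
    } else {
        node = stack.pop(); // the second branch: visit, then turn right
        process(node.val);
        node = node.right;
    }
}
```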

However, the stack here and the stack I got from the binary search above are not the same stack! Here we save on stack space by only pushing previous elements if we know we’ll have to get back to them eventually. And we know that only when we go left, because when we go right, we won’t have to return to the already processed elements. So when we hit a dead end going right and pop the next element from the stack, it magically takes us to the parent of the current subtree, not the parent of the current element.

For example, consider the following tree.

tree

If we traverse this tree using the algorithm above, we first push left children into the stack until we hit null. That gives us the following pictures:

tree1 tree2

tree3

And then we start popping elements and processing them (the second branch). The first element is a leaf, so it doesn’t have a right child, and therefore on the next iteration we pop another one, thus processing -3, -2:

tree4 tree5

The current element (-2) now has a right child, so we go there instead and push it onto the stack before going left. However, since there is nowhere to go, we immediately pop it back and process it:

tree6 tree7

Now look at this! The current element is -1, but the stack only contains the root! So the next thing we do is pop it and jump all the way back up, just right to the next element in the sequence.

However, when performing a binary search and pushing elements to the stack, we never get a stack that looks like this. In fact, the stack always contains the whole path from the current element up to the root, node by node. So at first glance, the stack I got by performing the binary search was not very suitable for this in-order traversal algorithm. Indeed, if I were looking for, say, -1.5, then I’d end up with something like this:

treeb

Here the green element is the current element at which the search stops. In fact, I now have the two closest elements to the target: one is at the top of the stack, and the other is the leaf I’m at. However, if I were to perform an in-order traversal using the algorithm above, I’d quickly end up in trouble. What would the algorithm do? It would first push -1 to the stack. Then it’d pop and process it. Then it’d try to go right, but there is nowhere to go. So it’d pop another element from the stack. But that’s -2, which doesn’t come in the right order!

Now at this point, what I was supposed to do, and what most people at LeetCode did, was to create two stacks instead of one when performing the binary search. Then the whole thing would look like this:
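A sketch of that approach (assuming LeetCode’s TreeNode; stackLE and stackGT are the names referred to below):

```java
public List<Integer> closestKValues(TreeNode root, double target, int k) {
    Deque<TreeNode> stackLE = new ArrayDeque<>(); // path of nodes <= target
    Deque<TreeNode> stackGT = new ArrayDeque<>(); // path of nodes > target
    for (TreeNode node = root; node != null; ) {
        if (node.val > target) {
            stackGT.push(node); // pushed when going left
            node = node.left;
        } else {
            stackLE.push(node); // pushed when going right
            node = node.right;
        }
    }
    List<Integer> result = new ArrayList<>(k);
    while (result.size() < k) { // k never exceeds the number of nodes
        if (stackGT.isEmpty() || !stackLE.isEmpty()
                && target - stackLE.peek().val < stackGT.peek().val - target) {
            result.add(nextSmaller(stackLE));
        } else {
            result.add(nextLarger(stackGT));
        }
    }
    return result;
}

// pop the next value <= target, keeping the reverse in-order invariant
private int nextSmaller(Deque<TreeNode> stackLE) {
    TreeNode node = stackLE.pop();
    for (TreeNode n = node.left; n != null; n = n.right) stackLE.push(n);
    return node.val;
}

// pop the next value > target, keeping the in-order invariant
private int nextLarger(Deque<TreeNode> stackGT) {
    TreeNode node = stackGT.pop();
    for (TreeNode n = node.right; n != null; n = n.left) stackGT.push(n);
    return node.val;
}
```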

Now why does this work? It works because the two stacks follow the same rule as the stack used during the classic iterative traversal: when we do an in-order traversal, we only push elements to the stack when we go left, and that’s exactly the rule stackGT follows here, so it can later be used to perform an in-order traversal starting at the point where we stopped. The same goes for stackLE, but for reverse in-order.

But as I’ve said, I didn’t quite get it at the time, so I invented a rather unorthodox approach. I noticed that even though the stack I got was unsuitable for the typical space-saving concise iterative algorithm, it was just the type of stack used by recursive solutions! Indeed, recursion always unavoidably stores everything in the stack, simply because there’s no way around that. Consider a typical recursive algorithm:
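With process() as before:

```java
void inorder(TreeNode node) {
    if (node == null) return;
    inorder(node.left);  // the implicit return address says: process node next
    process(node.val);
    inorder(node.right); // the implicit return address says: we're done with node
}
```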

How does it work for the tree above? It pushes 0 first. Then it goes left. There it pushes -2. Goes left again. Pushes -3. There is no left, so it processes -3 and tries to go right. But there is no right, so it pops -3 and goes up. So far it’s no different from the iterative approach.

But then things change. When it returns to -2, it processes it without actually popping it, and then pushes -1. So at that point we end up with exactly the same stack as in the binary search algorithm. Why is that? How is it that this works when the iterative algorithm doesn’t?

The answer is that the iterative algorithm lacks one thing: the return address. Remember that the computer doesn’t only push function arguments to the stack. It also pushes the return address, so when it pops the stack, it instantly knows what to do next: process the current value (if returning from the left subtree) or exit (if returning from the right one).

So it looks like I could add some imaginary return address to the stack in order to fully emulate the recursive algorithm. But I could do even better than that. When I pop the stack, I can compare the left and right references of the popped node to the current node. If the left reference equals the current node, that means I am returning from the left subtree; otherwise I am returning from the right one.

Note that I still need to perform two traversals: in-order and reverse in-order to locate the closest elements. So even if I have just one stack during the binary search, I need to make a copy of it to proceed further. The resulting solution was this:
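The key step, sketched (the predecessor version is its mirror image, and each direction runs on its own copy of the binary search stack):

```java
// advance to the in-order successor using a full-path stack: on the way up,
// compare child links to tell which subtree we are returning from
private TreeNode next(Deque<TreeNode> path, TreeNode node) {
    if (node.right != null) {
        path.push(node); // descend: one step right, then left all the way down
        for (node = node.right; node.left != null; node = node.left)
            path.push(node);
        return node;
    }
    while (!path.isEmpty()) {
        TreeNode parent = path.pop();
        if (parent.left == node) return parent; // back from the left: visit the parent
        node = parent;                          // back from the right: keep unwinding
    }
    return null; // no successor: we were at the rightmost node
}
```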

It isn’t as beautiful as the “right” one, but it has the same time complexity and is a fun one. It runs in the same time (6 ms). I have no idea why most solutions run in only 5 ms, but then again, it’s probably within the margin of error.

Either way, now I’m more than familiar with both recursive and iterative traversal algorithms and can even emulate one through the other, so that’s one step further towards becoming a better programmer.