Defense in Depth or Hack?

There’s an odd piece of code you sometimes see in applications (in pseudo-code):

    if (GetFocus() == this)
        ShowCaret();
    else
        HideCaret();

This piece of code checks whether this UI control has the focus and shows or hides the blinking caret to match.

The reason the code is odd is that most UI systems, from Windows through iOS through the browser, will send events to tell a control when it gets the focus or when it loses the focus. So the control simply needs to process these events and do the right thing in response. It seems silly to have code that, independent of these events, verifies that the focus state is actually what the control thinks it is.

At the same time, you have probably seen some text box blinking its cursor and yet not responding to typed input (I run into this with the Windows logon screen frighteningly often). Clearly the control has gotten confused about whether it has the focus or not and doesn’t have this kind of “defense in depth”. Or you can say it doesn’t have this hack.

There really is a general pattern here and a general philosophical argument. You have some object or component that wants to maintain some internal state. It maintains this state by receiving a series of events that it uses to construct and transition its own internal state. That state might be quite simple (a single bit in this case of tracking focus) or it might be quite complex.

Alternatively, the state can be constructed by polling or querying some other component in the system that is the “source of truth”. At some interval or in response to some generic “get-synced” notification, it goes through a negotiation to ensure that its internal state correctly matches the source of truth.
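
To make the two approaches concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: `Source` plays the source of truth, `Mirror` is the component holding a local copy, and the `notify=False` flag simulates an event that gets lost somewhere along the way.

```python
class Source:
    """Hypothetical source of truth: a set of item ids."""

    def __init__(self):
        self.items = set()
        self.listeners = []

    def add(self, item, notify=True):
        self.items.add(item)
        if notify:  # notify=False simulates a lost event
            for listener in self.listeners:
                listener.on_added(item)

    def snapshot(self):
        return set(self.items)


class Mirror:
    """Keeps a local copy: updated by events, repairable by resync."""

    def __init__(self, source):
        self.source = source
        self.items = set()
        source.listeners.append(self)

    def on_added(self, item):
        # Event-based path: apply one granular change.
        self.items.add(item)

    def resync(self):
        """Query-based path: adopt the source's state.

        Returns True if the local copy had drifted.
        """
        truth = self.source.snapshot()
        drifted = truth != self.items
        self.items = truth
        return drifted


source = Source()
mirror = Mirror(source)
source.add("a")                # event delivered
source.add("b", notify=False)  # event lost
stale = set(mirror.items)      # the mirror only knows about "a"
repaired = mirror.resync()     # the query path notices and repairs the drift
```

The event path is cheap but only as correct as the delivery of every single event. The resync path costs a full comparison but does not care how the drift happened.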

In the focus example, I am essentially doing both. That seems wasteful! In this case, from a performance perspective, it clearly seems cheaper to just get the event-based approach right. The central focus manager simply needs to send two events when the focus changes, one to the control losing the focus and the other to the control gaining focus. That has to be way more efficient than having an entire tree of controls all peppering the system with queries about whether they own the focus.

In practice, though, the code above is normally invoked at some “heavy-weight” time (e.g. in response to the entire application getting activated or in response to an explicit mouse click). So the performance cost isn’t really relevant. And if you consider the large consequences of getting this wrong (serious user confusion), the cost seems well worth it.
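
Here is a rough sketch of the focus case, again in Python with made-up names (`FocusManager`, `TextBox`, `on_app_activated`, and the `drop_events` flag that stands in for whatever bug loses the notifications). It is not modeled on any real UI toolkit. The manager sends at most two events per focus change, and the control repeats the check from the snippet above only when the application is activated.

```python
class FocusManager:
    """Hypothetical central focus manager."""

    def __init__(self):
        self.focused = None
        self.events_sent = 0

    def get_focus(self):
        return self.focused

    def set_focus(self, control, drop_events=False):
        previous, self.focused = self.focused, control
        if drop_events or previous is control:
            return  # drop_events simulates the bug that loses notifications
        if previous is not None:
            previous.on_focus_lost()
            self.events_sent += 1
        if control is not None:
            control.on_focus_gained()
            self.events_sent += 1


class TextBox:
    def __init__(self, manager):
        self.manager = manager
        self.caret_visible = False

    def on_focus_gained(self):
        self.caret_visible = True

    def on_focus_lost(self):
        self.caret_visible = False

    def on_app_activated(self):
        # Same check as the snippet: ask the source of truth directly.
        self.caret_visible = self.manager.get_focus() is self


manager = FocusManager()
a, b = TextBox(manager), TextBox(manager)
manager.set_focus(a)                    # one event: a gains focus
manager.set_focus(b)                    # two events: a loses, b gains
manager.set_focus(a, drop_events=True)  # focus moves but both events are lost
confused = (a.caret_visible, b.caret_visible)  # b still blinks, a does not
for box in (a, b):
    box.on_app_activated()              # the heavy-weight moment repairs both
```

The queries happen once per activation rather than continuously, so the extra cost is a handful of comparisons at a moment when the application is already doing a lot of work.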

So you get into this discussion of whether this is a hack that just covers up a root cause bug (one that you should really find and fix) or a reasonable defense in depth strategy.

I personally am a much bigger fan of having components in a system always have a robust way of validating that their model of other state is correct and consistent rather than only depending on change notifications. If you have read any of my other posts on model-view synchronization you will see a pattern here. Depending on granular notifications tends to result in systems that are much more tightly coupled, both from a performance perspective and from a logical correctness point of view. Finding ways to more loosely couple components is almost always a better strategy.

OK, now I come to the real point of this post.

Apple CarPlay in my Audi A3 is driving me crazy. About a third of the time, when I get into the car, the CarPlay screen will automatically be displayed on the car’s internal screen. Another third of the time, Audi’s main menu will be displayed but with the CarPlay menu choice selected. I then need to explicitly choose CarPlay to get it to display. And then another third of the time, CarPlay won’t be displayed or even shown as one of the top-level menu choices in the Audi menu. The behavior doesn’t seem to correlate with whether the phone is plugged in before or after the car is turned on. At that point sometimes plugging and unplugging the phone works and sometimes it doesn’t.

If I had to guess, I would think that some event is being signaled, but a race condition as the Audi car electronics initialize results in the event being lost. In this case, there is a very high user cost to getting these states out of sync (phone plugged in, CarPlay not displayed) with no easy way of synchronizing them. I will admit to being in motion at times while also fiddling with the car phone jack in order to get it to recognize the phone. Not too smart and not too safe. Adding some mechanism that just verifies consistency every second or so would, at minimal performance cost, be vastly more reliable than whatever mechanism they are currently using.
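
To be clear about what I mean by a periodic consistency check, here is a small Python sketch. It is purely illustrative. I have no idea how Audi's software or CarPlay are actually structured, so `HeadUnit`, `reconcile`, and the probe function are all invented, and the timer is replaced by three direct calls.

```python
class HeadUnit:
    """Hypothetical stand-in for the car side; not any real Audi or CarPlay API."""

    def __init__(self, phone_connected_probe):
        # Callable that reports whether a phone is actually connected right now.
        self.probe = phone_connected_probe
        self.carplay_shown = False

    def on_phone_connected(self):
        # Event path: works only if the event actually arrives.
        self.carplay_shown = True

    def on_phone_disconnected(self):
        self.carplay_shown = False

    def reconcile(self):
        """Meant to run from a timer every second or so.

        Returns True if the displayed state had to be repaired.
        """
        actual = self.probe()
        if actual == self.carplay_shown:
            return False
        self.carplay_shown = actual
        return True


phone_plugged_in = True  # phone connected during startup; the event was lost
unit = HeadUnit(lambda: phone_plugged_in)
before = unit.carplay_shown                     # out of sync: nothing shown
repairs = [unit.reconcile() for _ in range(3)]  # three simulated timer ticks
```

Only the first tick does any real work. Every later tick is a single comparison, which is the sense in which the performance cost is minimal.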

This is also yet another interesting example of the end-to-end argument at work. There are undoubtedly multiple layers involved from the electronics of the jack up through Audi’s car operating system and then interacting with CarPlay on the phone. One of those components is behaving unreliably. Critically, there is no end-to-end validation that an important end-to-end property is maintained (CarPlay available when phone is plugged in). Treating it as an end-to-end property that must be capable of being validated rather than as a series of fragile event notification behaviors that you need to debug through a complex set of layers is a way better strategy for building this system.

This also puts you on the moral high ground, invoking grand end-to-end principles rather than getting into petty arguments over what is or is not a hack.