Small things leak the worst

Bad things tend to creep in unnoticed. Big problems are easy to spot, and developers tend to avoid them, but not enough attention is paid to the details. Let's consider a quite trivial time series case. After all, time series are quite fashionable today, what with the Internet of Things and all.

Let's pretend that there are sensors of some kind that measure data across the globe and report back to centralized servers. If a sensor measures values at a constant interval, the measurements could be represented like this:

measurement-class.png
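As a rough sketch in Java, such a class might look like the following. The class name and constructor are assumptions; the field names are taken from the loop further down, and `java.time.Instant` stands in for the `DateTime` type used in this article's snippets so that the sketch needs no external library.

```java
import java.time.Instant;

// Hypothetical data holder; field names follow the loop in this article.
class Measurements {
    Instant startTime; // moment of the first measurement
    long interval;     // constant gap between measurements, in milliseconds
    int[] values;      // one value per measurement, in time order

    Measurements(Instant startTime, long interval, int[] values) {
        this.startTime = startTime;
        this.interval = interval;
        this.values = values;
    }
}
```

Three values stored this way cost one timestamp, one interval and three integers, instead of three timestamp and value pairs.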

This saves space because the time is not stored with every data point. Nice and easy. The data could be processed using the following kind of loop.

for (int i = 0; i < measurements.values.length; i++) {
  int v = measurements.values[i];
  // Measurement i was taken i intervals after the start time.
  DateTime t = measurements.startTime.plusMillis(i * measurements.interval);
  // Do something useful with value v at time t.
}

The code has subtle knowledge of how the data is encoded. It might not feel like a big thing; the code works. However, this small problem prevents us from introducing irregular measurements that are not taken at a constant interval. The code does not encapsulate the data; the internals are there for all to see. The data could be hidden like this.

measurement-class-private1.png

But just hiding the fields is not enough. The loop would barely change: the code still exposes how the data is handled, because the client needs to compute the actual moment of each measurement from the interval and the start time. We can go further and provide a method that computes the moment, hiding some more information.

measurement-class-private2.png
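A minimal sketch of such a class, assuming Java, could look like this. The method names `getCount`, `getValue` and `getMoment` come from the loop in this article; the class name, the constructor and the use of `java.time.Instant` in place of `DateTime` are assumptions made to keep the example self-contained.

```java
import java.time.Instant;

// Hypothetical encapsulated version: the fields are private and the class
// computes the moment of each measurement itself.
class IndexedMeasurements {
    private final Instant startTime;
    private final long intervalMillis;
    private final int[] values;

    IndexedMeasurements(Instant startTime, long intervalMillis, int[] values) {
        this.startTime = startTime;
        this.intervalMillis = intervalMillis;
        this.values = values.clone(); // defensive copy, callers cannot alter the data later
    }

    int getCount() {
        return values.length;
    }

    int getValue(int i) {
        return values[i];
    }

    // Measurement i was taken i intervals after the start time.
    Instant getMoment(int i) {
        return startTime.plusMillis(i * intervalMillis);
    }
}
```

The caller no longer sees the start time or the interval, but it still addresses measurements by index, so the idea of an array-like store remains visible.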

Now the loop looks like this.

int max = m.getCount();
for (int i=0; i<max; i++) {
  int v = m.getValue(i);
  DateTime t = m.getMoment(i);
  // Do something useful with value v at time t.
}

But what if the number of measurements changes during the iteration? What if we are required to follow some measurements in near real time? Of course, we could implement another class and use separate code for this "special" situation, or we could split the real-time stream into small (array) parts and handle them one by one. This would introduce some latency, but it would work. There are lots of ways to wiggle around the problem by adding code.

I argue that the problem is that we are trying to use the same interface for two separate concerns: reading and writing. When we read the data, we care about the actual time and the value. How the data is stored is not the caller's concern. Is there an array or not? It should not matter. On the other hand, when data is written, we might want to encode it differently depending on the sensor. If we could separate these things, maybe the problem diminishes. Fortunately, with some object-oriented wizardry, the following kind of interface can be conjured.

measurement-class2.png

With this kind of abstraction the reading code does not need to know anything about how the data is encoded.

MeasurementsReader r = measurements.newReader();
while (r.next()) {
  int v = r.value();
  DateTime t = r.moment();
  // Do something useful with value v at time t.  
}
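To show why this helps, here is a sketch with two different encodings behind the same reader. Only `MeasurementsReader`, `newReader`, `next`, `value` and `moment` appear in the loop above; the `Measurements` interface, both implementing classes and the use of `java.time.Instant` instead of `DateTime` are assumptions made for illustration.

```java
import java.time.Instant;

// Read side: callers see only moments and values.
interface MeasurementsReader {
    boolean next();   // advance; false when no measurement is left
    int value();      // value of the current measurement
    Instant moment(); // moment of the current measurement
}

// Hypothetical common type: any encoding can hand out a reader.
interface Measurements {
    MeasurementsReader newReader();
}

// Encoding 1: constant interval, moments are computed, not stored.
class FixedIntervalMeasurements implements Measurements {
    private final Instant startTime;
    private final long intervalMillis;
    private final int[] values;

    FixedIntervalMeasurements(Instant startTime, long intervalMillis, int[] values) {
        this.startTime = startTime;
        this.intervalMillis = intervalMillis;
        this.values = values.clone();
    }

    @Override
    public MeasurementsReader newReader() {
        return new MeasurementsReader() {
            private int i = -1; // positioned before the first measurement

            @Override public boolean next() {
                if (i + 1 >= values.length) return false;
                i++;
                return true;
            }
            @Override public int value() { return values[i]; }
            @Override public Instant moment() { return startTime.plusMillis(i * intervalMillis); }
        };
    }
}

// Encoding 2: irregular measurements, every value carries its own timestamp.
class TimestampedMeasurements implements Measurements {
    private final long[] epochMillis;
    private final int[] values;

    TimestampedMeasurements(long[] epochMillis, int[] values) {
        if (epochMillis.length != values.length) {
            throw new IllegalArgumentException("one timestamp per value is required");
        }
        this.epochMillis = epochMillis.clone();
        this.values = values.clone();
    }

    @Override
    public MeasurementsReader newReader() {
        return new MeasurementsReader() {
            private int i = -1; // positioned before the first measurement

            @Override public boolean next() {
                if (i + 1 >= values.length) return false;
                i++;
                return true;
            }
            @Override public int value() { return values[i]; }
            @Override public Instant moment() { return Instant.ofEpochMilli(epochMillis[i]); }
        };
    }
}
```

The reading loop stays exactly the same for both classes. A reader backed by a live stream could presumably implement `next()` by waiting for the next measurement, which is one way to approach the near-real-time case without touching the callers.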

But maybe having to implement the loop at all is too much of a burden for the caller.

measurements.read( (t,v) -> {   /* Do something useful with value v at time t. */   });
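One way the `read` method could be implemented is sketched below. The article shows only the call with the `(t, v)` lambda, so the `MeasurementHandler` interface, the class names and the fixed-interval storage are all assumptions, as is the use of `java.time.Instant` for the time.

```java
import java.time.Instant;

// Hypothetical callback type matching the (t, v) lambda in the article.
@FunctionalInterface
interface MeasurementHandler {
    void handle(Instant t, int v);
}

class CallbackMeasurements {
    private final Instant startTime;
    private final long intervalMillis;
    private final int[] values;

    CallbackMeasurements(Instant startTime, long intervalMillis, int[] values) {
        this.startTime = startTime;
        this.intervalMillis = intervalMillis;
        this.values = values.clone();
    }

    // The class owns the loop; the caller only says what to do per measurement.
    void read(MeasurementHandler handler) {
        for (int i = 0; i < values.length; i++) {
            handler.handle(startTime.plusMillis(i * intervalMillis), values[i]);
        }
    }
}

class CallbackDemo {
    public static void main(String[] args) {
        CallbackMeasurements measurements = new CallbackMeasurements(
                Instant.parse("2024-01-01T00:00:00Z"), 500, new int[] {20, 21, 19});
        // Prints one line per measurement: the moment followed by the value.
        measurements.read((t, v) -> System.out.println(t + " -> " + v));
    }
}
```

A dedicated functional interface is used here instead of `java.util.function.BiConsumer` so that the value stays a primitive `int`; either choice would work with the same lambda.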

So, there are many possible solutions for reading the data. Which one is the best?

Unfortunately, it depends. The point here is not about hiding the index variable, the "clever" iterator-based solution, or callback lambdas, but to encourage people to think and then think some more. This time, separating the reading and writing code seems to lead somewhere. In other cases, that might be the wrong way to go. In the long run, however, small things add up if attention is not paid to the details. If there is a small lapse there and a minor inconvenience here, the code starts to turn into a mess, and nobody notices until it is almost too late.

Oh, and of course, the idea of splitting the write and read concerns means that we can have multiple interfaces for reading the same data if that is required.

Panu Wetterstrand