How can something with no moving parts be "out of service" so often?

So this post is kind of pointless really... I'm going to ask a question that thousands of Melburnians ask every day, and that we all know the answer to.  But I still feel the need to ask it anyway.

As the title reads:  How can something with no moving parts be out of service so often?

Of course I'm talking about the standalone myki readers at stations. Forget barriers and vending machines for a moment, though they both have their own problems. 

When myki first started in Melbourne I was quick to defend it.  It mostly worked well for me and it was more convenient than Metcard.

But, at least in my observations, it actually seems to be getting worse.  Standalone readers at stations are quite often unresponsive or out of service.  In some cases, every reader on a platform, or even at an entire station, is non-operational.  Or readers appear operational but touching on causes them to crash and go out of service.  Or readers are very slow to respond... they still seem to need to "wake up" for their first touch of the day (a fault that has never been fixed).

It never seemed this bad in the beginning, so the one conclusion to draw is that with the sudden surge in myki use (last number I heard was 70% of PT users are now using myki, and this number may well be higher now), readers simply aren't coping with all the extra "touches".

Case in point:  The reader on the right hand side as you enter Hawthorn Station Platform 1.  Here are three different examples:

Reader at Hawthorn out of service last night, 14/05/2012.  It was still out of service this morning.

The same reader at Hawthorn, unresponsive to touches from any card, 26/04/2012.

The same reader again, crashing in response to an intermittently faulty card, 17/02/2012.

I think I've got other examples as well.  I've reported this fault to @mykimate on twitter who advises "our technicians have been notified".

But my rather obvious question is:  Why should they need to be?  How is this even possible?

These readers are solid state devices.  There are no moving parts.  They are (allegedly) designed to operate even when there is no network connectivity.  As long as there is power, they should work (though without network connectivity, your travel history may get delayed in appearing on the myki website, and online top-ups might not be available at that reader).

So how and why do they crash so often?  Why do they sometimes not respond to your card?

There are but two possible conclusions to draw:  Physical failure of the device, or shoddy and buggy programming of the reader software.

I can accept that, out of the thousands of devices out there in myki land, there might be a dry solder joint or two, a loose ribbon cable or a dodgy network cable.  Perhaps somehow the rain got inside.  Vandalism.  These problems are unavoidable.

But there is simply no excuse for the obviously buggy and faulty reader software.  Why would a reader go out of service?  What could the reason be?  The only reason I can come up with is that, during the processing of card touches, the program encounters a fatal error.  One would assume that they at the very least have some kind of error handling routine, which shuts the reader down to the infamous "out of service" message.

And the only reason for this is that the reader software isn't set up to deal with every single situation it might encounter in "real world" processing of card touches and transactions.
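To show what I mean, here is a rough Python sketch.  I have no idea how the real reader software is written, so everything in it (the Reader class, the handler function, the messages) is made up purely for illustration.  It compares a reader that shuts itself down on any unexpected error with one that rejects only the touch that went wrong.

```python
class Reader:
    """Hypothetical reader; none of these names come from the real myki software."""

    def __init__(self, handler):
        self.handler = handler      # function that processes one card touch
        self.in_service = True

    def touch_fragile(self, card):
        # Guessed behaviour: any unexpected error takes the whole reader down.
        if not self.in_service:
            return "OUT OF SERVICE"
        try:
            return self.handler(card)
        except Exception:
            self.in_service = False
            return "OUT OF SERVICE"

    def touch_robust(self, card):
        # Alternative: reject the one bad touch and stay available.
        if not self.in_service:
            return "OUT OF SERVICE"
        try:
            return self.handler(card)
        except Exception:
            return "CARD ERROR - TRY AGAIN"


def handler(card):
    # Deliberately naive: a card record with no balance raises KeyError.
    return "OK, balance ${:.2f}".format(card["balance"])


fragile = Reader(handler)
print(fragile.touch_fragile({}))                 # faulty card
print(fragile.touch_fragile({"balance": 5.0}))   # good card, reader already down

robust = Reader(handler)
print(robust.touch_robust({}))                   # faulty card
print(robust.touch_robust({"balance": 5.0}))     # good card still works
```

In the fragile version one faulty card locks out everyone who comes after it, which looks a lot like what I've seen at Hawthorn.  Whether the real software behaves this way is only my guess.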

The myki reader has to do a lot of things when you touch your card to the reader.  Amongst some I can think of:
  • Check if your card is on the list of blocked cards.  If it is, and it hasn't already been blocked, block it.
  • Check if your card has a topup waiting in the list of topups.  If it has, apply it.
  • Check if your card has an auto topup instruction on board.  If it does, and if your balance is low enough, apply the auto topup.
  • Check if your card is touched on or off.  Depending on the answer, and on whether that last touch was within the last two hours or yesterday, it might have to perform a default fare touch off and then touch you on again, or charge you a default fare.  Or if you are touching off, it has to charge you the correct fare for the journey you've just taken.
  • Write details of your journey and topup to the card memory.
These are but a few of the things that probably happen when you touch on or off, or even top up at a vending machine.
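The steps in the list above could be sketched roughly like this in Python.  This is only my guess at the general shape of the logic, not how myki actually does it: the Card fields, the process_touch function, the fare amounts and the low balance threshold are all invented for the example.

```python
from dataclasses import dataclass, field

LOW_BALANCE = 10.00    # assumed auto top-up threshold, in dollars
DEFAULT_FARE = 6.00    # made-up default fare
NORMAL_FARE = 3.50     # made-up fare for a completed journey


@dataclass
class Card:
    card_id: str
    balance: float = 0.0
    blocked: bool = False
    touched_on: bool = False
    auto_topup: float = 0.0        # 0 means no auto top-up instruction
    history: list = field(default_factory=list)


def process_touch(card, blocked_ids, pending_topups, stale_touch_on=False):
    """Walk through the listed steps for a single touch (hypothetical)."""
    # 1. Blocked-card list: block the card if it is listed and not yet blocked.
    if card.card_id in blocked_ids:
        card.blocked = True
    if card.blocked:
        return "CARD BLOCKED"

    # 2. Apply a waiting top-up, if there is one for this card.
    amount = pending_topups.pop(card.card_id, 0.0)
    if amount:
        card.balance += amount
        card.history.append(("topup", amount))

    # 3. Auto top-up when the balance is low enough.
    if card.auto_topup and card.balance < LOW_BALANCE:
        card.balance += card.auto_topup
        card.history.append(("auto topup", card.auto_topup))

    # 4. A forgotten touch off: charge the default fare, then touch on again.
    if card.touched_on and stale_touch_on:
        card.balance -= DEFAULT_FARE
        card.history.append(("default fare", DEFAULT_FARE))
        card.touched_on = False

    # 5. Normal touch off or touch on, recorded in the card's history.
    if card.touched_on:
        card.balance -= NORMAL_FARE
        card.history.append(("fare", NORMAL_FARE))
        card.touched_on = False
        return "TOUCHED OFF"
    card.touched_on = True
    card.history.append(("touch on", 0.0))
    return "TOUCHED ON"
```

Even this toy version has several branches, and the real thing has zones, fare caps and passes on top.  Every one of those branches is a place where an unexpected card state could trip the software up.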

So, it's obvious to me at least, that it's not set up to deal with every situation.  If it were, it would never or rarely crash, and reader reliability would be rock solid.

In a system that is new, these issues are understandable.  But they've had years to get this right.  I don't have any idea what they are doing in there but, as gets pointed out time and again by many people, there is no excuse for any of this.  And as we all also know, these aren't the only problems myki has.

For all the money we have paid, the people of Melbourne and Victoria deserve better.

PS.  I will say that I've found Tram readers, at least, very reliable and fast lately.  Which is ironic considering you will never be able to top-up aboard a tram.  Bus readers I don't have much personal experience with, but anyone who reads twitter for five minutes will discover numerous "myki readers out of service on the bus again" tweets.  How the tram and bus experiences can be so different, I don't fully understand, when they use the same equipment.




Comments

  1. Twitter seems to think the tram ones are just as faulty. Matches my experience too.

    See: https://twitter.com/#!/search/realtime/myki%20tram%20%40mykimate

  2. I reckon the reason tram readers are more 'available' is that, now that city saver is gone, and we have the Z1/2 overlap, they can simply 'default' to zone 1/2. Not to mention there are two consoles to talk to rather than just one on a bus.

  3. Mykiuser, the problem is highly unlikely to be a software issue related to account management or pricing. It is possible, but in general software that fails to correctly calculate these factors will return the wrong amount (as Myki periodically does) rather than die.

    More likely reasons:
    - The machine forgets where it is and can't provide useful information. Much more likely on a bus/tram, but since software/database updates need to be rolled out across the network, it is possible they periodically reset location. Any generic "out of service" message is likely this or something similar.

    - Generic operating system failure that you'll be familiar with in other contexts: an out of memory error, or the smart-card hardware interface crashing for instance. A simple reboot would fix it, but someone probably needs to come and do that.

    - Network errors causing hangs/no responses. This is sort-of a software issue, in that they ought to be handled, and the whole process reset; but since the smart-card isn't self-powered and because drop-outs are probably common, knowing what to do if it dies at the very end of the process (post-write, pre-confirmation) is difficult for all sorts of reasons.

    - Network errors to the central server. I'm not sure if these should occur, because in theory they shouldn't communicate except overnight, but if they do in real-time, and if that is part of the touch-on process, then it introduces various issues with network timeouts for lag or central server load when scaling for peak users.

    The weak-point is definitely the smart-card chip. It needs to stay in the magnetic field, otherwise the process will start, stall, restart, ad infinitum. And it can fail for all sorts of reasons: scratches, bends, moisture (atmospheric conditions or pocket sweat), proximity, magnetic interference or a patchy wireless connection.

