This. Always this. Since the dawn of programming and still a major problem.
I do keep meaning to start a thread (very niche though I think) on scientific/programming concepts that translate well to everyday life. I was taught atomicity as a 'single indivisible action', and it is extremely useful when turning on the gas, starting to run a bath, going out and locking the door, and the like.
So, having gone down a few rabbitholes, I've learned some things about Horizon.
A few thoughts.
- It was written for NT 4.0 and, up until 2015 at least, was still running on the same platform.
- The programming language was Visual Basic (remember, this would be late-90s VB).
- The message passing system they used to mediate communication between branches and the central system was one that had no notion of atomicity of transactions.
I was in the full pomp of my IT career in the late 90s - maturing into a database designer/developer/admin on fairly large Unix systems (typically Sun or AIX). I come from a pretty solid Unix background, and - yes - I shared the general attitudes of my peers to things like NT. In Unix, pretty much from the start, reliability and security were baked in, in a way that meant that, particularly in its early days, NT couldn't hold a candle to Unix on reliability OR security. NT's history was littered with comparatively trivial exploits, because it was comparatively easy for malware to gain access to the privileged part of the kernel; on reliability, NT types considered maintaining an uptime of days or weeks an unusual achievement - NT shops I knew of would try to ensure that servers were rebooted daily, to avoid system crashes and unpredictable behaviours.
NT also didn't scale very well - again, Unix has the idea of machine-agnostic operations baked in, while NT did get some distributed features over time, but they were always a bit of a bolt-on. And an overloaded NT system tends to fail unpredictably (to be fair, overload failure cases for Unix systems aren't exactly beautiful either, but they tend to be somewhat more graceful than NT's, whose usual failure mode was a segmentation fault and the blue screen of death).
Visual Basic is not a language that ranked (or ranks) very highly as a programming environment for large systems (and let's be clear here - Horizon was a LARGE system). It lacks all kinds of important features. In all the use cases for VB that I encountered in my career (and this is VB Classic, not the rather more competent .NET version around nowadays, which I probably still wouldn't touch with a bargepole), the primary consideration was "ease of development", because it came with an integrated development environment, and "well, everybody knows BASIC, right?".
Things that VB Classic doesn't have or do: consistent interfaces with external libraries (the language isn't really extensible in the way that most of the more solid offerings are); a decent exception-handling model (typically, in VB, you're going to be testing error states explicitly rather than integrating your error handling into some kind of exception framework, which makes code complex to review and debug, and doesn't do a very good job of differentiating, for example, input errors from aspects of system failure - communication link failures, synchronisation errors, inability to access subcomponents such as databases).
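To illustrate (a minimal sketch, not anything from Horizon - the routine and its parameters are invented): the best VB Classic offers is an On Error GoTo label and a single global Err object per routine, so every kind of failure arrives through the same door.

Public Function UpdateStockLevel(itemId As Long, delta As Long) As Boolean
    On Error GoTo ErrHandler    ' the nearest thing VB Classic has to try/catch

    ' ... perform the database update here ...

    UpdateStockLevel = True
    Exit Function

ErrHandler:
    ' Err.Number is just an integer: input errors, comms failures and
    ' database errors are only distinguishable by comparing magic numbers.
    UpdateStockLevel = False
End Function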
Atomicity is a critical part of any data processing system. There are certain kinds of database update which, while they might consist of several steps, need to operate on an all-or-nothing basis, i.e. if a part of that collection of updates fails, the database should be rolled back to its initial state, and an error reported, rather than leaving the database with the transaction half-done (as a roving DBMS troubleshooter, I encountered dozens of cases, in all kinds of programming languages, where this basic Databases 101 principle was not followed).

The way you do it with any halfway decent DBMS is to group the steps into a "transaction". You tell the DBMS you're starting a transaction ("BEGIN TRANSACTION" in SQL), then do the steps, which are cached in the DBMS until you send the "COMMIT", at which point they are applied all together to the database. If you want to manually roll back, you can issue a command to do that, and most DBMSs will automatically roll back an uncommitted transaction if there is some sort of failure (e.g. a program crash) before the COMMIT is sent. Another reason for checking those error states very carefully.
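As a minimal sketch of that pattern (assuming ADO, one of the data-access libraries a late-90s VB shop might have used; the table and DSN names are invented):

Public Sub TransferFunds()
    Dim cn As ADODB.Connection
    Set cn = New ADODB.Connection
    cn.Open "DSN=BranchAccounts"    ' hypothetical data source name

    On Error GoTo Failed
    cn.BeginTrans                   ' BEGIN TRANSACTION
    cn.Execute "UPDATE Accounts SET Balance = Balance - 100 WHERE Id = 1"
    cn.Execute "UPDATE Accounts SET Balance = Balance + 100 WHERE Id = 2"
    cn.CommitTrans                  ' COMMIT - both updates land together...
    Exit Sub

Failed:
    cn.RollbackTrans                ' ...or neither does: back to the initial state
End Sub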
Another big fat gotcha I encountered a lot was people just assuming that their database update had worked, and not doing any error checking around it (probably on an "I'll get to that later" basis - particularly with something like VB, adding that error checking adds a lot of cruft and clutter to the code; and as we now know, many of Horizon's developers were barely competent*).
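And the check itself is cheap, which makes skipping it all the more galling - ADO, for instance, hands you the affected row count straight back from Execute (again, invented names):

Public Sub DecrementStamps()
    Dim cn As ADODB.Connection
    Dim rowsHit As Long
    Set cn = New ADODB.Connection
    cn.Open "DSN=BranchAccounts"    ' hypothetical, as in the sketch above

    cn.Execute "UPDATE Stamps SET Qty = Qty - 1 WHERE BranchId = 42", rowsHit
    If rowsHit <> 1 Then
        ' The update silently touched nothing (or too much) - this is
        ' exactly the check that kept getting deferred to "later".
    End If
End Sub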
I haven't been able to find out which DBMS Horizon was using, but I'm willing to guess, on the strength of the choice of NT and Visual Basic, that they were using some version of SQL Server, Microsoft's in-house DBMS. I never liked administering it much, but the general view is that it was a solid enough platform, so they probably got that right, at least.
But - particularly given the question of atomicity - even if the DBMS supported transactions, which SQL Server does, we're talking about a distributed system here. The database ran on Horizon's servers; the workstations at the post offices just generated "transactions" (not DBMS transactions) which were then piped over the ISDN line to the servers, using a message passing library. In principle, with some decent design (and proper error checking, FFS), this could be made to work. But the problems you have when you are generating information on one system, and then updating a central system with that information, make doing local database updates look like a walk in the park.
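To make that concrete, here's the shape the receiving end needs to have (the textbook pattern, not Horizon's actual code - all names invented): each bundle arriving from a branch maps onto exactly one DBMS transaction, and the branch is told the outcome either way.

Public Function ApplyBranchBundle(cn As ADODB.Connection, bundle As Collection) As Boolean
    Dim stmt As Variant

    On Error GoTo Failed
    cn.BeginTrans
    For Each stmt In bundle             ' every step in the branch's bundle...
        cn.Execute CStr(stmt)
    Next stmt
    cn.CommitTrans                      ' ...lands atomically
    ApplyBranchBundle = True            ' caller ACKs back to the branch
    Exit Function

Failed:
    cn.RollbackTrans                    ' a half-done bundle never hits the DB
    ApplyBranchBundle = False           ' caller NAKs; the branch can retry safely
End Function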
While it is unlikely that there would be conflicts in data between the post offices (although there WERE errors that arose as a result of the system confusing updates from multiple workstations within a single post office), there are many more factors that come into play.
First, you need to be absolutely sure that your transaction went through, and handle it appropriately if it doesn't. So why wouldn't a transaction go through? Typically, this would be the result of a communications failure - a temporary outage on the ISDN line, perhaps even some local networking issue. But another factor with Horizon is that every post office in the country did its reconciliation at the same time on a Wednesday afternoon, so there would be heavy peak traffic just as everyone was updating their accounts for the week. Remember I said that NT doesn't scale all that well? Here we are. There's a front-end server somewhere, running NT, suddenly receiving thousands of updates as each workstation sends it a bundle of data. Most inter-system communications protocols will have some kind of timeout, so that a transfer that isn't reported as completed after a certain time (30s, 60s, something like that) returns an error.

Now, here is where it gets messy. You've got a developer sitting in their office writing code to do this kind of update. Their "local" workstation is sitting on the same network as they are, and that will be reliable enough for the dev to forget that there is even potential for an error. As for scaling - even with the best intentions, organising some kind of "stress test" where you put the server under high load is a complete PITA, and uses up huge amounts of time and resources; being able to do that kind of thing was, in my experience, a bit of a luxury. And if you're inexperienced in working on distributed systems, you might not even perceive that there could be a problem.

I think this is borne out by some of the statements at the inquiry, where subpostmasters would describe the system "freezing" without explanation, and then recount how their multiple attempts to get the transaction submitted were recorded as multiple separate transactions. So it seems pretty likely that Horizon was handling communication failures badly, if at all.
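For what it's worth, the standard defence against exactly that failure mode is a retry loop keyed on a single message ID, so the server can recognise a resend and acknowledge it without booking it twice. A sketch (SendToCentral and MakeGuid are hypothetical helpers, not anything we know Horizon had):

Public Function SubmitBundle(bundle As String) As Boolean
    Dim msgId As String
    Dim attempt As Integer

    msgId = MakeGuid()    ' hypothetical helper: ONE id for every attempt at this bundle

    For attempt = 1 To 3
        ' hypothetical helper; 30s timeout per attempt - a resend carries the
        ' same msgId, so the server can spot and discard the duplicate
        If SendToCentral(msgId, bundle, 30) Then
            SubmitBundle = True
            Exit Function
        End If
    Next attempt

    SubmitBundle = False    ' surface the failure - never assume it worked
End Function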
There's loads more I could get into, but this is a long enough post already. Anyway, my curiosity about how the system worked is partly assuaged...
* barely competent - I recall encountering a fair few VB programmers in my career, and the bulk of those tended not to be what we might think of as "serious" developers, whatever they considered themselves to be. They typically had zero experience of remote, multi-user, multi-server environments, let alone working with databases (actually, that's not quite true - I saw a lot of CVs claiming "database experience" which turned out to mean solely using the execrable Access). Plenty of my colleagues knew VB, and used it for simple, kludgy things, but wouldn't do most of their work in such a language - a pretty good tell for a dubious programmer was a CV with no programming experience outside VB.
Just to add to what existentialist is saying, and to preempt the potential counter-argument that lots of scalable and reliable complex enterprise systems are built on Windows services these days: I have had a few interactions with Fujitsu, and it is no surprise at all that a software dev culture sustained mainly to exploit the continued gravy train of servicing legacy green screen HMG megasystems would struggle with anything new and shiny.
Also, we have to remember that the world of software development HAS changed over the last 25 years. With the emphasis on cloud services, handling communications effectively is much more of a given than back when Horizon was being developed, when techie types would still be at the stage of creaming their drawers when they'd managed to get one system to display "hello world" on another system's display. Or maybe that was just me
I love it when you talk dirty
I'll read it to the lover next time she's here. Let's hope she sees it the same way
Do it in a Serge Gainsbourg voice
To be fair to it, I don't think modern VB.NET has very much to do with the command line interpreted version of our respective youthhoods. It evolved out of it, sure, but it's a very, very different beastie.

I don't know VB.NET in any kind of depth, but the general view does seem to be that VB Classic (what Horizon would have been written in) is a very different animal from VB.NET.
Public Function ReverseSign(d)
    ' Net effect: returns d with its sign reversed, i.e. -d
    If d < 0 Then
        d = Abs(d)
    Else
        d = d - (d * 2)
    End If
    ReverseSign = d
End Function
A = -B
In an effort to try and bolster existentialist's excellent post - I'm struggling to find the link now, but I read a couple of weeks back that the backend DB for Horizon - such as it merits the term "DB", when it was clearly barely anything of the sort - was actually a proprietary thing cooked up by Fujitsu themselves, rather than a Known Good platform for distributed transactions. Missing features like atomic transactions, and zero auditing of the manual edits to the transaction logs (posts passim from me and others), had already been solved problems in the UNIX world for decades, and I'm sure there were Windows systems to achieve the same sort of thing since at least the early 90s (even if Windows did lag sorely behind the "big iron" UNIX of the era in a number of other aspects).
Chances are that even with the deficiencies of VB (likely picked because it made knocking up a winforms-style GUI pretty easy) and NT (much cheaper than Sun/AIX/HPUX), there could have been a reliable system here, but... likely due to cost, it looks like an unproven DB technology was used and, coupled with less than perfect development, that made transaction integrity almost impossible to achieve in the real world.
Incidentally, in hunting again for what DB Horizon actually uses under the hood I re-discovered this technical-but-not-too-technical video/article which details some of the steps commonly taken in financial DBs and some of the horror-show bugs witnessed occurring under Horizon:
What went wrong with Horizon: learning from the Post Office Trial
This Post Office trial has revealed what is likely the largest miscarriage of justice in UK legal history. Hundreds of individuals who operated Post Office branches (subpostmasters) were convicted on fraud and theft charges on the basis of missing funds identified by the Horizon accounting system. T… - www.benthamsgaze.org
Thank you so much for this! I've become quite curious about more and more of what went on with Horizon, particularly as it parallels the latter years of my own career in the industry. I almost wonder whether it might be worth splitting off a "Horizon tech" thread, if there were enough interest in the idea...?

Have got a bit of a way through that - it's really damning. The Post Office is criminally responsible because they lied to the courts relentlessly when they learned what had actually happened. Fujitsu, though, are criminally responsible too, because they lied to the courts relentlessly as well - and how embarrassed must they be at seeing their beloved system taken apart and shown to be fucking useless.
This reminds me of my own deep dive into the software used by Atos for those wretched assessments. I found out a lot of horrifying things about the process, like that it was based on a series of drop-down menus and ignored anything reported in free text boxes - where all the pertinent info was being included by patients in answer to mandatory questions...
I can't read all this tbh, wtf

Edit: aita?

No, it's all gone a bit technical. I grew up listening to my dad talking about this stuff as he worked in databases, so I can follow a lot of it.
And how long until they get proper compensation?

It's a big can, and a bloody long road. I don't think it is going to be either quick, or anywhere near as much as it should be.
Done.
Maybe a software deep dives thread, a look behind the curtain sort of thing?
I think I'm very jaded and cynical, but I think the government will wait until more claimants have died. You can bet someone's done an analysis on it somewhere...
The phrase rhymes with Clucking Bell

Quite the shower of cunts
Post Office said last month it stands by most Horizon convictions
Letter shows Post Office told ministers it would oppose attempts to overturn 369 prosecutions - www.theguardian.com