
Massive worldwide IT outage, hitting banks, airlines, supermarkets, broadcasters, etc. [19th July 2024]

Because us Linux fanboys know that the Falcon monitor runs in kernel mode in Linux as well, and something similar is just as likely. It's the software, not the OS. Security software needs to run at kernel level to be useful, and that makes it dangerous to fuck about with.
That kind of thinking does usually stop Apple-cultists :D
 
Because us Linux fanboys know that the Falcon monitor runs in kernel mode in Linux as well, and something similar is just as likely. It's the software, not the OS. Security software needs to run at kernel level to be useful, and that makes it dangerous to fuck about with.

I'm not familiar with CrowdStrike products (else I'd likely be fixing a thousand Windows machines right now TBH), but Linux provides the eBPF capability, which is designed to stop bad kernel-mode code from completely carking the running kernel.

It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules. Safety is provided through an in-kernel verifier, which performs static code analysis and rejects programs that would crash, hang or otherwise interfere with the kernel negatively.

So, Linux already has some form of defence against this sort of thing. IIRC Microsoft were working on something similar for Windows, but I don't think it's mainstream yet.
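
To make that concrete, here's a minimal sketch of loading a kernel-level probe via eBPF using the BCC Python bindings (assumes the bcc package is installed and you're running as root; it's an illustration of the verifier's role, not how the Falcon sensor is built). The key point is that BPF(text=...) hands the compiled program to the in-kernel verifier, which rejects anything it can't prove safe - unbounded loops, out-of-range memory accesses and so on - so a broken program fails to load instead of taking the kernel down.

# Minimal, illustrative eBPF example via the BCC Python bindings.
# Run as root with the bcc package installed.
from bcc import BPF

# A tiny kernel-side program: log a line every time execve() is called.
prog = r"""
int trace_execve(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""

# Compiling/loading submits the program to the in-kernel verifier.
# If it could crash, hang or touch memory it shouldn't, the load fails
# with an error here rather than panicking the running kernel.
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")

print("Tracing execve... Ctrl-C to exit")
b.trace_print()  # stream the kernel trace pipe to stdout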


Perhaps his methods are unsound.
 
I hadn't been aware of this until I saw this thread just now.

CBA reading 16 pages so can someone just say if stuff is fixed now / it's probably the start of the end of the world / something in between?
 
I hadn't been aware of this until I saw this thread just now.

CBA reading 16 pages so can someone just say if stuff is fixed now / it's probably the start of the end of the world / something in between?
It will be short-term chaos that will be sorted over the next few days. Once it has been, there will be lots of talk of 'lessons learned' and how it can be prevented from happening again. There will be focus groups and management heart-searching, but fairly quickly they will start realising just how much an effective solution will cost. So it will gradually get put off and forgotten about, since it was a one-off that won't happen again... until it happens again.
 
It will be short-term chaos that will be sorted over the next few days. Once it has been, there will be lots of talk of 'lessons learned' and how it can be prevented from happening again. There will be focus groups and management heart-searching, but fairly quickly they will start realising just how much an effective solution will cost. So it will gradually get put off and forgotten about, since it was a one-off that won't happen again... until it happens again.
They'll make choices that actually make it more likely.
 
It will be short-term chaos that will be sorted over the next few days. Once it has been, there will be lots of talk of 'lessons learned' and how it can be prevented from happening again. There will be focus groups and management heart-searching, but fairly quickly they will start realising just how much an effective solution will cost. So it will gradually get put off and forgotten about, since it was a one-off that won't happen again... until it happens again.
They might tot up how much it's cost them and decide that properly testing each update is more expensive.
 
It's interesting really, to look back over the life of the Internet and see how, quite regularly, some big corporation or another would want to proprietorise the whole thing. MS were at it in the 1990s, with the famous Hallowe'en Email and talk of "decommoditising protocols". Apart from leaving us all even more in hock to Microsoft, it also completely went against the decentralised, distributed ethos of the original Internet designers, who were after all trying to design something that was resilient and not vulnerable to single points of failure.

And, in a funny way, even though it isn't Microsoft, this event shows that we HAVE managed, largely through centralisation and corporatisation of core services, to create an Internet that is not as robust or resilient as we've come to expect.
 
It will be short-term chaos that will be sorted over the next few days. Once it has been, there will be lots of talk of 'lessons learned' and how it can be prevented from happening again. There will be focus groups and management heart-searching, but fairly quickly they will start realising just how much an effective solution will cost. So it will gradually get put off and forgotten about, since it was a one-off that won't happen again... until it happens again.

Centralised functions - stuff the IT admin staff can directly access - are reasonably fast to fix in this scenario; you could probably do around a box every two to five minutes if you're prepared to multitask like a hyperactive crack squirrel and have blanket access to the admin credentials. However, since the task is almost entirely manual (booting into safe mode/recovery environment, firing up a shell, deleting the offending file) there's little scope for automation. Thus it's likely to take a few days before all core systems are up and running.[1]
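
For what it's worth, the widely circulated workaround was exactly that: boot to safe mode or the recovery environment and delete the offending channel file (the files matching C-00000291*.sys under the CrowdStrike drivers directory). In practice that's a one-liner in the recovery console, but here's a rough Python sketch of the deletion step for anyone scripting it against an accessible volume - the path and pattern are as reported at the time, so verify against CrowdStrike's own guidance before using anything like this.

# Rough sketch of the "delete the offending channel file" step, assuming the
# affected Windows volume is accessible (e.g. booted to safe mode, or the
# disk mounted from another OS). Path and pattern per the widely reported
# workaround - check vendor guidance before relying on them.
from pathlib import Path

crowdstrike_dir = Path(r"C:\Windows\System32\drivers\CrowdStrike")  # adjust if mounted elsewhere

for channel_file in crowdstrike_dir.glob("C-00000291*.sys"):
    print(f"removing {channel_file}")
    channel_file.unlink()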

Things get massively more complicated for endpoints (desktops, laptops, thin clients, etc.). As well as not usually having any method of remote access that works when the OS doesn't boot, and thus requiring a physical presence, these days they're almost certainly protected by various forms of drive encryption. So as well as needing to be physically there, there's a lot more work to do to actually get booted into a recovery environment where you can delete the files.

Endpoints in someone's work-from-home environment mean you either need to spell out these passwords over the phone, or have them bring the device in to be fixed. That's likely to take weeks. Most firms won't have the gold stock to instantly give everyone a new laptop whilst these repairs are carried out.
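
On the encryption point: the extra step for a BitLocker-protected endpoint is unlocking the volume from the recovery environment with the 48-digit recovery password before you can touch the file system. Normally you'd just type manage-bde straight into the WinRE command prompt, but as a sketch (drive letter and key source are placeholders):

# Hypothetical sketch: unlock a BitLocker-protected volume before deleting
# the offending file. The recovery password would come from wherever the
# organisation escrows its keys (AD, Entra ID, a key management system).
import subprocess

drive = "C:"  # the protected volume
recovery_password = input("48-digit BitLocker recovery password: ").strip()

# manage-bde ships with Windows; -unlock with -RecoveryPassword accepts the
# numerical recovery key shown on the BitLocker recovery screen.
subprocess.run(
    ["manage-bde", "-unlock", drive, "-RecoveryPassword", recovery_password],
    check=True,
)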

Details on the nature of the BSOD aren't forthcoming yet, I don't think, but the "lessons learned" will finally be that patching needs to be tested properly (both by the manufacturer and the customer), which is something that practically nobody does any more. I don't have time to go into details about it at the moment, but software quality has, by and large, been on a downward trajectory for quite some time now. Not to mention IT purse strings being progressively tightened because "the cloud" or "product from Manufacturer X" will somehow magically fix everything. I'm relatively lucky that I work for a company that still takes security and resilience fairly seriously, but even here there are procedural idiocies in how the whole stack is handled.

[1] Incidentally, I had a fun task last year of deleting a file from a whole bunch of OS images sitting on virtual hard drives (the files used by virtual machines to emulate a hard drive). That involved stepping through about 1200 images, mounting each image over loopback, finding and deleting the file, unmounting the drive, and handing the drive back over to the VM so it could remount and boot. This is a strategy that could feasibly work in most virtual server environments - as long as drive encryption isn't used - and I imagine there's many stdP's doing exactly this sort of thing today.
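
For the curious, the loop described in [1] looks roughly like the sketch below, applied here to today's channel file as an example. It assumes raw images that can be loop-mounted directly and no drive encryption; all paths and patterns are placeholders. VHD/qcow2 formats would need qemu-nbd or guestmount instead of a plain loop mount, and handing the disk back to the VM is whatever the hypervisor's tooling makes of it.

# Rough sketch of the batch "mount, delete, unmount" loop from footnote [1].
# Assumes raw images that can be loop-mounted and no drive encryption;
# all paths and the file pattern below are placeholders.
import subprocess
from pathlib import Path

IMAGE_DIR = Path("/var/lib/images")                  # hypothetical image store
MOUNT_POINT = Path("/mnt/fix")
OFFENDING_DIR = "Windows/System32/drivers/CrowdStrike"  # directory to clean inside each image

MOUNT_POINT.mkdir(parents=True, exist_ok=True)

for image in sorted(IMAGE_DIR.glob("*.img")):
    # An offset option may be needed if the image carries a partition table.
    subprocess.run(["mount", "-o", "loop", str(image), str(MOUNT_POINT)], check=True)
    try:
        for f in (MOUNT_POINT / OFFENDING_DIR).glob("C-00000291*.sys"):
            print(f"{image.name}: removing {f.name}")
            f.unlink()
    finally:
        subprocess.run(["umount", str(MOUNT_POINT)], check=True)
        # Handing the drive back to the VM so it can boot is hypervisor-specific.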
 
I think Field Support in a great many companies aren't going to see their families for a bit. Giving out BitLocker passwords etc. over the phone isn't going to happen - you might as well post your corporate secrets on WikiLeaks. I imagine that a lot of companies will throw every warm body at it. It may be primarily a Windows Field Support Team job, but plenty of experienced Linux sysadmins and programmers could be given a crash course and sent forth.
 
I think Field Support in a great many companies aren't going to see their families for a bit. Giving out BitLocker passwords etc. over the phone isn't going to happen - you might as well post your corporate secrets on WikiLeaks. I imagine that a lot of companies will throw every warm body at it. It may be primarily a Windows Field Support Team job, but plenty of experienced Linux sysadmins and programmers could be given a crash course and sent forth.

A friend of a colleague works at a firm that got hit; literally the entire department are working all weekend on this and the entire facilities staff are being trained up for a huge influx of laptop repairs in the office on Monday.

Regarding BitLocker and local admin passwords - it's "safe" (within a large enough time margin) to give these out over the phone as long as the company's set them up to be cycled on a regular basis. For instance, local admin accounts on our Windows machines are there purely as breakglass accounts for this sort of eventuality; they get reset every 7 days, or 1 hour after they're actually used (at which point they force a logoff of the session), and they don't give you any access to anything other than the local machine, so they're of very limited practical use.
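
As a rough illustration of the breakglass idea (not our actual tooling - Windows LAPS does this properly, with the password escrowed to AD/Entra), the rotation step is basically "generate a random password, set it on the local account, squirrel the new value away somewhere the service desk can read it". Account name and rotation policy below are made up.

# Hypothetical sketch of cycling a breakglass local admin password.
# In real deployments Windows LAPS handles rotation and escrow; this just
# shows the shape of the rotation step. Run with admin rights on the box.
import secrets
import string
import subprocess

ACCOUNT = "breakglass"  # made-up local admin account name

def rotate_password() -> str:
    alphabet = string.ascii_letters + string.digits
    new_password = "".join(secrets.choice(alphabet) for _ in range(24))
    # Set the new password on the local account.
    subprocess.run(["net", "user", ACCOUNT, new_password], check=True)
    # A real implementation would now escrow new_password to AD/Entra/KMS
    # and schedule the next rotation (e.g. 7 days, or 1 hour after use).
    return new_password

if __name__ == "__main__":
    rotate_password()
    print(f"Rotated password for {ACCOUNT}")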
 

A widespread Blue Screen of Death (BSOD) issue on Windows PCs disrupted operations across various sectors, notably impacting airlines, banks, and healthcare providers. The issue was caused by a problematic channel file delivered via an update from the popular cybersecurity service provider, CrowdStrike. CrowdStrike confirmed that this crash did not impact Mac or Linux PCs.

Although many may view this as an isolated incident, it turns out that similar problems have been occurring for months without much awareness. Users of Debian and Rocky Linux also experienced significant disruptions as a result of CrowdStrike updates, raising serious concerns about the company's software update and testing procedures. These occurrences highlight potential risks for customers who rely on its products daily.

In April, a CrowdStrike update caused all Debian Linux servers in a civic tech lab to crash simultaneously and refuse to boot. The update proved incompatible with the latest stable version of Debian, despite the specific Linux configuration being supposedly supported. The lab's IT team discovered that removing CrowdStrike allowed the machines to boot and reported the incident.

A team member involved in the incident expressed dissatisfaction with CrowdStrike's delayed response: the company acknowledged the issue a day later but took weeks to provide a root cause analysis. The analysis revealed that the Debian Linux configuration was not included in their test matrix.

"Crowdstrike's model seems to be 'we push software to your machines any time we want, whether or not it's urgent, without testing it'," lamented the team member.

This was not an isolated incident. CrowdStrike users also reported similar issues after upgrading to Rocky Linux 9.4, with their servers crashing due to a kernel bug. CrowdStrike support acknowledged the issue, highlighting a pattern of inadequate testing and insufficient attention to compatibility issues across different operating systems.

To avoid such issues in the future, CrowdStrike should prioritize rigorous testing across all supported configurations. Additionally, organizations should approach CrowdStrike updates with caution and have contingency plans in place to mitigate potential disruptions.
 
According to the latest BBC report, it hit about 8.5m Windows machines across the world. Another 2.2k flights were cancelled today, taking the total to over 9k now, with more expected as airlines try to recover; many GPs and pharmacies are still trying to get their systems up and running again, etc., etc.

But, at least it didn't inconvenience teuchter, so all is good.
 