IT End of World - STW going strong

Posts: 7021
Free Member
 

Buy quill pens, parchment and make your own ink using soot and water.

 
Posted : 19/07/2024 7:59 pm
Posts: 13554
Free Member
 

Would love to like a lot of these comments but alas I’m unable to do so.

 
Posted : 19/07/2024 8:22 pm
Posts: 3319
 

Love how this thread is a mix of IT helpdesk and comedy. Also love that I landed in the USA 12 hours or so before air travel went to shit. Phew. Also, is #humblesmug a thing?

 
Posted : 19/07/2024 8:35 pm
Posts: 76786
Free Member
 

I'm seeing a lot of predictable "Microsoft sucks" posts on places like Facebook.

For the record, this is nothing to do with Microsoft. An automatic update to a third-party application (CrowdStrike Falcon) pushed out malformed, unsigned code, and Windows - absolutely correctly - slammed on the brakes rather than allowing unverified and potentially malicious code to execute.

From the point of view of Windows this is intended, desired behaviour in response to something that shouldn't happen. Make no mistake, this is Bad. Falcon is, in layman's terms, a highly advanced antivirus product; it is supposed to be buried deep in the system and difficult to remove/bypass so that malware can't knobble it, which makes fixing it tricky.  In many cases it's going to be a manual task on individual machines and I expect it is going to take weeks for some organisations to fully recover, but a potential alternative could have been far worse.

CrowdStrike claims to have discovered a defect in their update system and rectified it; unsubstantiated rumours suggest that corruption may have happened "in flight" via their Content Delivery Network.  Whether this is actually the case, I don't know.

In any case, I suspect there are going to be a lot of questions and introspection once the dust settles.  Vendors like CrowdStrike operate with little to no regulation, "marking their own homework" if you will.  I bet that's going to change.

 
Posted : 19/07/2024 9:50 pm
Posts: 13060
Full Member
 

Love how this thread is a mix of IT helpdesk and comedy. Also love that I landed in the USA 12 hours or so before air travel went to shit. Phew.

You should be ok until you need to make a card payment or get cash out of a machine! Good luck.

 
Posted : 19/07/2024 10:09 pm
Posts: 13554
Free Member
 

All is well. Amazon Prime Video is working so I can watch The Boys

 
Posted : 19/07/2024 10:35 pm
Posts: 7021
Free Member
 

Do any of the IT bods have a 'plain english' translation thing we can use to understand what you're wittering on about.

Also, using acronyms is only ever a way to present an 'aura of mystique' and exclude those who don't believe in communicating by using acronym soup. It's unnecessary bollocks.

Be clear, concise and, most importantly, intelligible - please.

 
Posted : 19/07/2024 10:37 pm
 DT78
Posts: 10061
Free Member
 

someone made a boo boo

 
Posted : 19/07/2024 10:45 pm
 StuF
Posts: 2059
Free Member
 

An IT security company (CrowdStrike) pushed out an update (not in the right format) to part of its software. Windows tried to use this updated file, didn't like it and then refused to start, meaning the computer is now effectively dead until someone comes along and manually removes the broken file.

The problem is that lots of big companies use Windows computers and this CrowdStrike software, so lots of computers all stopped working at the same time.

 
Posted : 19/07/2024 10:48 pm
Posts: 1760
Free Member
 

A large part of my job is telling people that yes, this COULD go wrong and if it does it will cost you a lot of money, so mitigate it.

You work in the local off license...?

 
Posted : 19/07/2024 10:52 pm
Posts: 2965
Full Member
 

Do any of the IT bods have a ‘plain english’ translation thing we can use to understand what you’re wittering on about.

You remember the bloke with the submarine that was kind of winging it and then it imploded?

It's the same as that, but with computers and trillions of dollars and broken transport and broken healthcare systems instead

Agree with cougar - more regulation is the likely outcome.

I work in IT. For an IT vendor. Risks are seen and tolerated chasing £. Corners get cut.

 
Posted : 19/07/2024 10:59 pm
Posts: 4413
Full Member
 

frankconway

Do any of the IT bods have a ‘plain english’ translation thing we can use to understand what you’re wittering on about.

To be fair I think @Cougar did that an hour ago?

 
Posted : 19/07/2024 11:09 pm
 zomg
Posts: 847
Free Member
 

“Well actually BSOD is Windows working exactly as intended.” Absolutely ****ing glorious! Chapeau.

 
Posted : 19/07/2024 11:26 pm
Posts: 7536
Full Member
 

I’m seeing a lot of predictable “Microsoft sucks” posts on places like Facebook.

I have seen a few posts flagging up that similar stuff happened a couple of months back with a couple of Linux distros. It didn't play well with versions slightly behind and went pear-shaped in a not dissimilar fashion.

Vendors like CrowdStrike operate with little to no regulation, “marking their own homework” if you will.

It's a tricky one because they need to be able to push stuff quickly to shut down zero days, and who is going to regulate and mark their homework?

I would say getting sued might do the trick, but SolarWinds have managed to mostly defang an SEC lawsuit over their incompetent security practices.

 
Posted : 19/07/2024 11:27 pm
Posts: 7039
Free Member
 

The patch file is all zeros....

https://twitter.com/jeremyphoward/status/1814364640127922499

 
Posted : 19/07/2024 11:36 pm
Posts: 13554
Free Member
 

All I’m taking away from this is stop updating stuff. Windows updates are bad enough. After each one the menus, icons and general feel get closer to a Fisher-Price My First Computer vibe.

 
Posted : 20/07/2024 12:33 am
Posts: 11605
Free Member
 

Does anyone know if TicketMaster is affected?

Your gig aside, that's one company I'd love to see take a nosedive right down the shitter.

 
Posted : 20/07/2024 12:42 am
Posts: 76786
Free Member
 

To be fair I think @Cougar did that an hour ago?

I attempted to.

“Well actually BSOD is Windows working exactly as intended.” Absolutely ****ing glorious! Chapeau.

Rewind far enough and you could BSOD a Windows box by unplugging the keyboard. Remember the days when you could have a power cut and you'd spend two days trying to recover it because it'd shat itself?  Times have changed.

Today Windows architecture is far more tightly secured; you cannot just slot any old shit into it.  The issue in this particular case is that Falcon is essentially a rootkit: it operates at a very low level because it has to. A compromise at that level could be wildly catastrophic, so in the event of this kind of failure - i.e. fundamental code not being what it claims to be - Windows just stops rather than allowing who-knows-what to run amok with gay abandon. This is literally what a modern BSOD is: damage limitation because code is not what it claims to be.  If your car caught fire would you rather it stopped to let you get out, or just carried on burning in the middle lane of the M6?

As I said, the alternative is far worse.

It's a tricky one because they need to be able to push stuff quickly to shut down zero days, and who is going to regulate and mark their homework?

That my friend is a very interesting question indeed.

 
Posted : 20/07/2024 12:43 am
Posts: 76786
Free Member
 

All I’m taking away from this is stop updating stuff.

Don't, that is akin to an anti-vax argument.  Once more with feeling, "the alternative is far worse."

This incident is, in the panorama of incidents over decades, extraordinary.  Patching prevents many such incidents daily, only a) they don't hit the news, any more than we see headlines going "still no Polio", and b) would you rather have an outage from a mistake or from deliberate malicious intent?

With the semi-recent outbreak which took out half of the NHS, a patch for the vulnerability had been available for months but it was never applied.  This scenario is FAR more common than what we've seen today.  In fact, I'd probably go so far as to say that today has been unique.

Patch your shit.

 
Posted : 20/07/2024 12:56 am
Posts: 3139
Full Member
 

My brother is the Sky News newsreader that launches all their weekday live broadcasts at 6 am. Certainly was an interesting morning, when you become the news but don’t yet know the news as to why your systems are so broken. They drank a lot of coffee until they managed to fully get back on air at 9 am.

 
Posted : 20/07/2024 8:29 am
Posts: 655
Free Member
 

Interesting, I was absolutely oblivious to any of this happening until I saw this thread.

What's been missing I haven't missed

What got turned off I haven't turned on

What couldn't run well I always walk or cycle

I really do wish whatever button got pressed had stuck like a 13 year old reverb in a well used Cotic Soul and stayed off.

Even if that meant obviously losing here

 
Posted : 20/07/2024 9:12 am
Posts: 28406
Free Member
 

The problem is that lots of big companies use Windows computers and this CrowdStrike software, so lots of computers all stopped working at the same time.

Genius move to push a potentially bricking update to every single client machine in one go!

Have to admit, if an enemy government wanted the ability to screw with western critical infrastructure worldwide, all they have to do is start up a cybersecurity firm in the US and wait a bit.

Didn't the founder of Kaspersky have some 'interesting' links with the KGB/FSB?

 
Posted : 20/07/2024 9:30 am
Posts: 1872
Full Member
 

Interesting, I was absolutely oblivious to any of this happening

I realised how big an issue this is when I went to the doctor's yesterday morning. Staff working from slips of paper with patients' names and DOB on, no access to medical history, couldn't prescribe or raise a referral. Basically, all they could do was have a chat and a physical examination with no follow-up.

The link to medical records is still down 24 hours later. Guessing they have a massive data validation job on their hands now, it's not as simple as getting a few BSOD Windows boxes going again.

 
Posted : 20/07/2024 9:40 am
Posts: 1668
Full Member
 

@Big-Bud most folk need this stuff at some point or another:

For our health

For family

For friends

For our jobs

Despite media reports, having a connected digital world parallel to the real one has improved health, communications, creativity and productivity. Going back would be a regression.

 
Posted : 20/07/2024 10:01 am
Posts: 2295
Full Member
 

According to the CrowdStrike blog it wasn't a code update, signed or otherwise, just a config file update that caused a 'logic error', making their Falcon engine bomb. These config files get released several times a day so they can quickly respond to new threats, but this one also promptly took down any machine that was online during a one hour window the other night. That explains the breadth and swiftness of the issue, and why booting into safe mode and deleting the config file fixes it.

 
Posted : 20/07/2024 10:06 am
Posts: 76786
Free Member
 

Didn’t the founder of Kaspersky have some ‘interesting’ links with the KGB/FSB?

There are/were rumours.  However you slice it though, they're a Russian company.  Make what you will of having anything to do with Russia as your security provider.

 
Posted : 20/07/2024 3:41 pm
Posts: 3205
Full Member
 

it wasn’t a code update signed or otherwise, just a config file update

I find it funny that suppliers spew and customers accept this distinction as somehow more excusable

 
Posted : 20/07/2024 4:13 pm
 PJay
Posts: 4693
Free Member
 

According to the CrowdStrike blog it wasn’t a code update, signed or otherwise, just a config file update that caused a ‘logic error’, making their Falcon engine bomb.

I'm no coder, but aren't logic errors avoidable? Generally one checks, for example, that a value isn't zero before trying to divide by it

 
Posted : 20/07/2024 4:20 pm
Posts: 28406
Free Member
 

Sounds a bit like the logic bomb from Portal 2.

 
Posted : 20/07/2024 4:24 pm
Posts: 3137
Free Member
 

Should have put a try...catch block around the bit of code that tries to read the config file to handle the exception cleanly lol!
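For the non-coders, that idea looks roughly like the sketch below. It is purely illustrative: the real Falcon sensor is a Windows kernel driver, where error handling works very differently, and the file layout, the `MAGIC` header and both function names are invented for the example rather than taken from CrowdStrike.

```python
import struct

MAGIC = b"CHNL"  # hypothetical 4-byte header, NOT the real channel-file format


def parse_channel_file(data: bytes) -> list[int]:
    """Parse a made-up layout: MAGIC, a uint32 count, then `count` uint32 values."""
    if len(data) < 8 or data[:4] != MAGIC:
        raise ValueError("bad header")
    (count,) = struct.unpack_from("<I", data, 4)
    if len(data) != 8 + 4 * count:
        raise ValueError("length does not match declared entry count")
    return list(struct.unpack_from(f"<{count}I", data, 8))


def load_rules(data: bytes, fallback: list[int]) -> list[int]:
    """Return the parsed rules, or the fallback if the content is malformed."""
    try:
        return parse_channel_file(data)
    except (ValueError, struct.error):
        # Malformed content is rejected instead of crashing the caller.
        return fallback
```

The point is only that bad input gets rejected and the program carries on with what it already had; whether that is practical inside a boot-start driver is a separate question.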

 
Posted : 20/07/2024 5:20 pm
Posts: 13554
Free Member
 

Don’t, that is akin to an anti-vax argument.

Where’s the laughing face emoji when you need it. It was meant in jest.

Technology is great but we have become hugely over-reliant on it. Worse, we’ve left all the people with no social skills in charge of it! Every IT department should have a protocol droid like C3PO to translate things from coder/IT bod into language the rest of us can understand.  (Insert winking emoji here). Now go, roll your hundred sided dice and hope for a +7 to reversing ****ups!

 
Posted : 20/07/2024 5:28 pm
Posts: 76786
Free Member
 

it wasn’t a code update signed or otherwise, just a config file update

.

I find it funny that suppliers spew and customers accept this distinction as somehow more excusable

In CrowdStrike parlance it was a "channel file."  But to report that would be meaningless to most people.  I was wrong earlier about it being a driver update - I'm piecing this together on holiday when I can - but the rest of the post stands: the component which uses the channel file is a system level driver.  Windows stopped proceedings because it saw something it didn't like and couldn't terminate gracefully because of the level it operates at.

 
Posted : 20/07/2024 6:48 pm
Posts: 33017
Full Member
 

Flights grounded,
Trains halted,
Stock exchange not trading,
Sky news off air.
Paxman and his underwear

You’ll be telling us you started a fire next…

Very well done, sir! Highly commended.

At least I’ve still got my source of emojis that works… *smug Pikachu face*

Does anyone know if TicketMaster is affected? Trying to login and it says Email address not recognised despite it working yesterday.

Got a gig at weekend so need to access the tickets

Ticketmaster are bloody awful, but I always try to save my tickets onto my phone, just in case. If you’ve got your ticket order reference, then, depending on the venue, if you get there early a member of the venue staff should be able to print a copy of the ticket - I had a problem with mine at a Roundhouse gig, and I had paper tickets printed at the desk beforehand.

 
Posted : 20/07/2024 7:10 pm
Posts: 76786
Free Member
 

Where’s the laughing face emoji when you need it. It was meant in jest.

Doh. 😀

 
Posted : 20/07/2024 7:23 pm
Posts: 33017
Full Member
 

I can’t find the article now, but I read earlier that one of the main individuals behind Crowdstrike sold a bunch of shares worth over a million dollars a day or so before this whole mess happened, which has both raised a few eyebrows, and instigated a call for an in-depth investigation. Unsurprisingly. Apparently, the sale was set up some months ago, with a deliberate delay to avoid charges of insider trading, the date it was set to go live unfortunately fell just before this all kicked off.
Still looks iffy, though.

 
Posted : 20/07/2024 8:01 pm
Posts: 14611
Free Member
 

Where’s the laughing face emoji when you need it. It was meant in jest.

Due to technical difficulties, we now have to do manual emojis ¯\_(ツ)_/¯

😉

 
Posted : 20/07/2024 8:18 pm
Posts: 13554
Free Member
 

:’(

 
Posted : 20/07/2024 8:57 pm
Posts: 4132
Full Member
 

Our Tesla thinks it's a 30mph speed limit everywhere today until the cameras pick up an actual sign. It's normally eerily accurate.

End of days I tell thee.

 
Posted : 20/07/2024 9:33 pm
Posts: 22849
Free Member
 

It affects a different version of Windows.

have you tried closing the curtains then opening them again?

 
Posted : 20/07/2024 10:56 pm
Posts: 7039
Free Member
 

Genius move to push a potentially bricking update to every single client machine in one go!

My employer has many millions of embedded (not Windows) devices with updates of one sort or another going out pretty regularly. All of those updates go through some kind of "Canary" phase - deploy to internal alpha/beta, then to a small population, and then rollout to the entire population while monitoring various metrics. It's not rocket surgery.
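As a rough sketch of what that kind of phased rollout looks like in code (not my employer's actual tooling, and nothing to do with how CrowdStrike deploys): the stage fractions, the failure threshold and the `deploy` callback below are all made-up assumptions for illustration.

```python
from typing import Callable, Sequence


def staged_rollout(
    devices: Sequence[str],
    deploy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.10, 1.0),  # hypothetical canary fractions
    max_failure_rate: float = 0.02,               # hypothetical halt threshold
) -> tuple[bool, int]:
    """Deploy in widening stages; halt if a stage's failure rate is too high.

    Returns (completed, number_of_devices_touched).
    """
    done = 0
    for fraction in stages:
        # Always advance by at least one device, never past the end.
        target = min(len(devices), max(done + 1, int(len(devices) * fraction)))
        batch = devices[done:target]
        if not batch:
            continue
        failures = sum(1 for device in batch if not deploy(device))
        done = target
        if failures / len(batch) > max_failure_rate:
            return False, done  # stop before the wider population is hit
    return True, done
```

With 1,000 devices and an update that breaks every one of them, this stops after the first 10 rather than taking out the whole fleet.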

Anything that ends up affecting code like bootloaders - where bricking a device is a real possibility - gets huge amounts of care taken over it - everyone's nightmare is waking up to a slack message from someone you've never met before asking you to join an urgent 2am call.

On the one hand, I do feel a lot of sympathy for whoever it was made whatever change it was that did this, and I'm sure it won't be much fun being that person, or writing the RCA.

On the other hand, they've got a huge market cap, and insane valuation so they must have huge amounts of cash sloshing around so surely they could afford to do a better job than they did, and foresee this kind of thing and defend against it?

As a wise old engineer once said to me when I was a young whippersnapper, "If it hasn't been tested, it doesn't work".

 
Posted : 20/07/2024 11:21 pm
Posts: 14611
Free Member
 

As a wise old engineer once said to me when I was a young whippersnapper, “If it hasn’t been tested, it doesn’t work”.

As someone else upthread said though...

...do you trust your security firm for a zero day fix, or do you run a multi-million pound business unpatched for 48hrs to allow for testing and hope you don't get hacked? Either way it's a risk.

 
Posted : 20/07/2024 11:25 pm
 zomg
Posts: 847
Free Member
 

48 hours? You’re doing it wrong.

edit: Ah, you’re talking about staging in the customer environment? That’s probably fair, though a smoke test could hopefully be automated and be done much quicker. Perhaps there’s now a product niche there, courtesy of engineering management at Crowdstrike who presided over a pipeline that didn’t test what they were publishing.

 
Posted : 20/07/2024 11:38 pm
Posts: 11021
Full Member
 

Our Tesla thinks it’s a 30mph speed limit everywhere today until the cameras pick up an actual sign. It’s normally eerily accurate.

My mg hs trophy thinks every single road is 40mph and constantly flashes red in the display, it’s quite distracting and no fix for it according to the dealer. I fixed it myself with a bit of duct tape over the flashing icon.

 
Posted : 21/07/2024 12:18 am
Posts: 76786
Free Member
 

…do you trust your security firm for a zero day fix, or do you run a multi-million pound business unpatched for 48hrs to allow for testing and hope you don’t get hacked? Either way it’s a risk.

This, really.

The IoT example above is all well and good, but it's apples and oranges.  EDR/XDR is not like "normal" software.  Falcon's very raison d'etre is to respond to threats fast. How often does your lightbulb get an update?(*)  Falcon Sensor receives multiple updates every day.

If the building's on fire, do you say "well, the hosepipe is still in Alpha so we'll get to you in a couple of weeks?"  I'm increasingly of the mind that this wasn't a testing issue, it was a QA issue.

Quite what the solution is, I do not know.  But as I said at the outset, I expect there will be some robust exchanges of view when it's mostly all over.  Vendors like CrowdStrike essentially mark their own homework, that surely has to change.  If this incident had been malicious rather than a big whoopsie we would be in a VERY bad place right now.

(* - probably answer: "not often enough")

 
Posted : 21/07/2024 10:39 am
Posts: 1872
Full Member
 

A config file change that blue-screens the device and puts it in a boot-loop obviously would never get through CrowdStrike's testing, so IMO something has gone drastically wrong with the deployment process. Either what was distributed was not the intended update, file got corrupted somehow, human error etc.

 
Posted : 21/07/2024 10:58 am
Posts: 76786
Free Member
 

I've seen all of those posited and more.  I too find it hard to believe from CrowdStrike, but here we are.

I just tripped over this blog post, which seems to be as comprehensive and accurate a technical overview as any I've found.

 
Posted : 21/07/2024 11:33 am
Posts: 7039
Free Member
 

That medium article is interesting. Sounds like they rolled out some new and broken code without testing it.

Very poor. And nothing to do with urgently needing to fix threats as soon as possible (not that that is an excuse anyway).

If the building’s on fire, do you say “well, the hosepipe is still in Alpha so we’ll get to you in a couple of weeks?

The hosepipe had not been tested on your fire. Hard to believe there even was a fire.

 
Posted : 21/07/2024 1:34 pm
Posts: 76786
Free Member
 

Hard to believe there even was a fire.

You don't have a fire brigade because there's a fire.  You have a fire brigade in case there is.

 
Posted : 21/07/2024 3:15 pm
Posts: 14611
Free Member
 

Vendors like CrowdStrike essentially mark their own homework, that surely has to change.

True, but... cost? Given the frequency of updates of this nature, I imagine clients would have to have a permanent 'Security test and release' team whose only job is to test and release security patches/AV definition files etc. It sounds like a full time job; even if it's just 2 or 3 people it could easily cost £100k a year or more.

The bean counters would not like that... I have a hard enough time trying to convince clients to slightly up-spec their VMs at sensible cost, despite... oh look, their SQL server has bombed again as it's out of RAM... again 😀

"but our environment didn't go down, so why should we pay?"

"because we had to manually fail over environment A to environment B when we were getting critical resource alerts, AGAIN!"

Maybe it could be automated as a halfway house, if it's just a simple 'smoke test': is the definition file in the expected/correct format, simple stuff like that. But then you'd think that would happen at CrowdStrike anyway as part of the automated deployments...

 
Posted : 21/07/2024 3:25 pm
 zomg
Posts: 847
Free Member
 

Crowdstrike could be publishing their homework along with their product. Testing isn’t a sideband activity. It is the product too.

 
Posted : 21/07/2024 3:34 pm
Posts: 14611
Free Member
 

Virgin Radio calling it a 'Microsoft Windows outage' just now...

That's like me crashing my car into a crowded bus stop and calling it a Ford issue, FFS, lol

 
Posted : 21/07/2024 6:33 pm
Posts: 2295
Full Member
 

We know what went wrong but there’s still a question over how and why it happened. It’s almost unthinkable that some level of testing didn’t take place before making the update, so why was it inadequate? I think the clue is in CrowdStrike’s own blog, that these channel files are updated several times a day. This is the Falcon USP, that they are responding to threats as they emerge, so the normal develop / test / release cycle is highly compressed and probably highly automated.

Instead of a phased rollout, every machine online got updated at the same time. A little over an hour after release CrowdStrike realised there was a problem and pulled the channel file, but by this time 8.5 million machines had already been taken down. CrowdStrike themselves seemed surprised that an issue could even occur, as they state there’s not been an issue with Falcon before, so I think a combination of trying to be the fastest to respond and their own hubris created the perfect storm.

FWIW there could have been a simple failsafe – if Falcon fails after channel update, roll back that channel update, reboot, and you are back. The fact that a simple mechanism like this wasn’t considered leads me to think they didn’t believe a channel file could take down Falcon, which may have fed into a minimal testing strategy.

 
Posted : 21/07/2024 8:24 pm
Posts: 1872
Full Member
 

 if Falcon fails after channel update, roll back that channel update, reboot, and you are back

Once you've caused the memory exception and blue-screened, don't think you can then have a script do something else.

 
Posted : 21/07/2024 8:50 pm
Posts: 76786
Free Member
 

Right there with you until the last paragraph.

There is no "simple mechanism" to roll back because of how early in the boot process Falcon is called.  It's not loaded by the OS, it's loaded by the boot manager.  The boot logic is basically "check for malware, if no then start Windows Kernel, if yes then Halt."  It's not an oversight.  Rather, it's not possible.

As I Understand It.

 
Posted : 21/07/2024 9:33 pm
Posts: 14611
Free Member
 

Yeah, windows machines quite rightly cacked themselves due to 'unexpected item in bagging area'.

There's no automatic rollback for such a low-level security update on an endpoint/desktop PC.

If it were a server, then any 'org' could just take that server offline and fail over to an unpatched mirror/backup whilst the issue was figured out...

 
Posted : 21/07/2024 10:08 pm
Posts: 3137
Free Member
 

It’s not loaded by the OS, it’s loaded by the boot manager.

I would suggest the OS would instantiate the Falcon drivers at a very early stage.  Falcon will undoubtedly reference a whole raft of Windows dll's for things like low-level IO access and the like.

But agreed, if this part fails to work then there is no easy way to "roll back" hence Windows halts - and correctly so.

 
Posted : 21/07/2024 10:37 pm
 MSP
Posts: 15334
Free Member
 

I am guessing that the solution would be to have some sort of integrity check on the update files. From my understanding of the problem (which isn't great) even a digital signature in the file would have highlighted in this case that the content wasn't sound, a checksum would have highlighted if the file was corrupted in the distribution network.
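A minimal sketch of the checksum half of that, assuming the vendor publishes a SHA-256 digest alongside each update file (I have no idea whether CrowdStrike does; the function names are mine). Note that this only catches corruption in transit: a file that was already wrong when the digest was generated would sail straight through.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_hex: str) -> bool:
    """True if the downloaded bytes match the published digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())
```

A digital signature goes one step further by also proving who produced the file, but it needs key management that is beyond a forum-post example.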

 
Posted : 22/07/2024 7:56 am
Posts: 8552
Full Member
 

I'm sure they can and will add some better error checking into the driver code. It's not that driver that's being updated frequently, it's the channel files the driver calls which contain the updated content for the detection code that runs in the kernel layer. It appears there isn't much validation done of those channel files by the driver as it just assumes they are correctly formatted etc. as they come from Crowdstrike. That will need to change (although it's unlikely to be able to detect every anomaly) and a rollback process might be an option (as in if an anomaly in the latest channel file is detected it reverts to using the previous update, rather than disable itself).
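The "revert to the previous update" idea could be sketched as below. This is an assumption-laden illustration, not how the Falcon driver works: `pick_usable` and `not_all_zero` are hypothetical names, and the all-zeros check is just a stand-in for whatever real validation the driver would need.

```python
from typing import Callable, Optional, Sequence


def pick_usable(
    versions_newest_first: Sequence[bytes],
    is_valid: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the newest version that passes validation, or None if none do."""
    for content in versions_newest_first:
        if is_valid(content):
            return content
    return None


def not_all_zero(content: bytes) -> bool:
    """Stand-in validator: reject empty files and files made only of null bytes."""
    return len(content) > 0 and any(content)
```

So a broken latest file means running on yesterday's detection content rather than not running at all, which is a trade-off in itself for a security product.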

I still don't understand how it was missed by Crowdstrike in their testing, it made more sense when it was speculated the updated channel file 291 had null bytes in it (which might have been caused by corruption whilst copying it to their public staging locations post validation - although even that process should have file hash checks) but Crowdstrike has said that wasn't the case and imply it was just the new detection logic in the channel file that triggered a logic issue in the driver when it processed it (and if a kernel mode driver crashes it will intentionally crash the OS).

 
Posted : 22/07/2024 8:44 am
 dlr
Posts: 696
Full Member
 

Yes full of zeros from posts I saw. Was a busy Friday. ~25 Servers, ~100 desktops half of which are installed in random areas in a manufacturing plant, great fun.......one Hyper-V Host in my cluster got itself messed up and would no longer live migrate, fixed now along with a couple of remaining desktops which I CBA to deal with on Friday and weren't important.

 
Posted : 22/07/2024 11:39 am
Posts: 4130
Free Member
 

@ahsat

Please can you ask your bro about Sky News' choice of content around 7am on Friday.

(see my post on page 2 ?!?)

Ta

 
Posted : 22/07/2024 11:46 am
Posts: 792
Full Member
 

Good explanation of the technicals by Dave who used to work at MS (*):

TLDR the Crowdstrike driver is a kernel driver that marks itself as required to boot ('a bootstart driver').

The driver is tested and certified by MS....the definition files that the driver loads, which are almost certainly code, are not. The definition file made an invalid memory access causing a SEGV. Kernel quite reasonably gives up at this point, reasonable given its architecture and CrowdStrike's use of it anyway.

Still of course how Crowdstrike allowed something so large scope to happen is anyone's guess.

(*) and by the looks of things was in early enough to make an absolute boatload!

 
Posted : 22/07/2024 1:25 pm
Posts: 7039
Free Member
 

Initial root cause analysis:

https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

On July 19, 2024, two additional IPC Template Instances were deployed. Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data.

It still doesn't answer the question of why they were not doing staged rollouts of these new named-pipe templates.

The first template for spotting named pipe usage went out in February, and the named pipe monitoring itself is just another way to possibly spot malicious programs - it wasn't actually handling some kind of 0-day attack - i.e. they could have done a staged rollout without impacting their ability to protect customer systems.

It also seems like a poor design choice to put so much complex code into the kernel - is it really not possible to do the complicated stuff in userspace? I don't know anything about Windows, but in Linux all of this could have been in userspace (auditd, apparmor, etc). Maybe there's some reason I don't understand.

 
Posted : 25/07/2024 9:40 am
Posts: 7536
Full Member
 

It still doesn’t answer the question of why they were not doing staged rollouts of these new named-pipe templates.

It's worse than that.  Whilst initially they did test their "template type" properly, once it is bedded in apparently they just switch to using a "content validator" and so were just throwing these into prod without real testing.

On the plus side they have handed out some gift vouchers to their partners for the inconvenience caused.

On the downside, at $10 it is probably one of those times they shouldn't have bothered at all.

 
Posted : 25/07/2024 10:02 am
Posts: 8552
Full Member
 

I've also heard that a lot of companies configure staged deployments of Crowdstrike updates to their end points (not involved with managing it myself though) but the way they pushed this update (I guess the Rapid Response option) ignores all that and pushes out to all the end points at once - which is probably why it took down services in companies like Microsoft where you'd expect them to have staged roll-outs configured. If I were MS I'd certainly be suing Crowdstrike

 
Posted : 25/07/2024 12:12 pm
Posts: 76786
Free Member
 

Initial root cause analysis:

The executive overview is worth a read:

Adopt a staggered deployment strategy, starting with a canary deployment to a small subset of systems before a further staged rollout.

- called by oldnpastit at the top of this page

Conduct multiple independent third-party security code reviews.

- Called by me a couple of pages back.

Strengthen error handling mechanisms in the Falcon sensor to ensure errors from problematic content are managed gracefully.

- Called by multiple people on here.

Make no mistake, even fully understanding how this happened, it really shouldn't have.  But credit where it's due, I cannot fault CrowdStrike's subsequent handling of it.  They'd resolved the initial issue within 90 minutes - which I'd expect from a rapid response company - and have been transparent as to what went wrong and what they're doing about it.

 
Posted : 25/07/2024 12:36 pm