
That one big fuckup: part two

Jon Black 5 min read
Photo by David Pupaza / Unsplash

Tentative title: What the fuck is two plus two?


Leaving off with the last story where I talked about the unfortunate events of an on-prem hybrid 365 migration setup thing gone wrong, I think it's fitting that on-prem issues as a theme should lead into my next story.

It's 2017, earlier in my career and honestly this story tells of my first real IT role outside of sales or being a fool with Linux at a medical software vendor; I'm at a blue-collar transport and logistics company... Sound familiar? Well, it isn't, it's a different place.

I worked as an IT support tech at a transport company west of Melbourne. 150-200 staff spread across a good 10 locations around the nation, with all IT support taking place from this one sole location at their HQ.

Quick question, how many people do you think handled IT in the org? You'd assume with at least 150 staff you'd want to spread that across about three people, right? BZZZZZT, wrong. This should have been my first red flag. Pulling from the LinkedIn description of my predecessor's specific entry for this same role:

Solely provided level 1 & 2 remote and local support to a nation wide company for all staff at all levels (approximately 150 staff).

"Solely". Uno, eins, ett, en... one. All levels? Poor me who was a level two at best had to cross over into level three territory?!

Handover? Zero. Documentation? Zero. Decent pay? Nope, but it was close to home, the 15 minute drive each direction was easy.

So quick rundown of this place. Everyone had a nice Core i5 desktop computer, two monitors, and some of the higher ups had great laptops. But everyone worked out of Remote Desktop Services, and I mean everyone, even the user in the middle of nowhere-town Northern Territory. Why? A proprietary in-house built ERP system that had been in active development for the last 20 years, and Microsoft Access databases about 10GB in size.

RDP was split across 6 virtual machines/silos I guess you could say, with a load balancer that would route you to a random machine each time.

In summary:

  • Six RDP virt machines.
  • All on a single hypervisor.
  • Hosted in a Melbourne data centre.
  • Pre-NBN Fibre to the Node rollout, so ADSL2+ at most locations.
  • A copy of the Microsoft Access database on each RDP virt machine.
  • No user folder replication.
  • 200GB of storage on each virt machine.
  • Two domain controllers and load balancer hosted on the same hypervisor.
  • The biggest location, west of Melbourne, had about 60 users, all sharing a 4MB/s line. Split evenly, that's roughly 67KB/s for each person.
  • No updating any software. Ever.
  • Disciplinary action for those who used Office 365 on their local machine outside RDP.
  • Backups were hosted on the same hypervisor.

Oh my god, a nightmare to work with. This setup was doomed to fail at some point.

To put it in perspective, my day to day was made up of troubleshooting issues with user accounts, and nothing but. MS Access DB is out of date? Robocopy it over from another VM. Missing files? Robocopy from another VM. Printing not working? Bear with the negative feedback until you can reboot one of the domain controllers. I'd actually managed to script and automate a lot of this: running a PowerShell command from any of the virt servers with the appropriate username could solve all of these and more.

This particular story though, was different.

It's a blistering hot afternoon, and users are reporting that explorer.exe keeps crashing on a VM. Then another. Then another. Until there is only one left working. sfc /scannow didn't work, nor did DISM.

An explorer.exe crash would remove your taskbar, file explorer, icons, etc. It was pretty problematic.

With very little time to lose (because there was no way what I was about to do would corrupt any user data), and no way of really knowing whether this had happened before or what the fix had been, I wrote a script. The script would check the hash of explorer.exe, and if it had changed, would pull the explorer.exe file from another VM (which matched a known working hash) and overwrite the faulty one. It made sense, didn't it? These machines were never to be updated, so why not pull a file from a machine we knew worked?
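For the curious, the idea looks roughly like the sketch below. This is not the original script (that one is long gone); it's a minimal Python illustration of the same check-hash-then-overwrite logic. The function names, the choice of SHA-256, and the paths you'd pass in are all my assumptions for the example, and it ignores the file locks and permissions you'd hit on a live Windows system file.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_if_changed(target: Path, known_good: Path, expected_hash: str) -> bool:
    """Overwrite target with known_good when target's hash differs.

    Returns True if the file was replaced, False if it already matched.
    Both function names are hypothetical, made up for this sketch.
    """
    # Refuse to copy if the "good" source itself does not match the expected hash.
    if sha256_of(known_good) != expected_hash:
        raise ValueError("known-good copy does not match the expected hash")

    # Nothing to do when the target already matches.
    if target.exists() and sha256_of(target) == expected_hash:
        return False

    # Replace the faulty file with the known-good copy (metadata included).
    shutil.copy2(known_good, target)
    return True
```

You'd call it with the faulty file, the copy from the healthy VM, and the hash you trust, then run it on a schedule. Note that it happily overwrites a file without asking why the hash changed in the first place, which is exactly the weakness of the approach.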

In layman's terms, imagine your car engine was totally fucked, and the best solution was finding another engine then creating a mould of it and copying it into your chassis to replace the old one.

Everything was well, problem resolved for many aside from some users who were still experiencing the issue.

Huge success, the day is mine. Time to go home.


The next day.


I get into work. No one can log in. Fuck, and I mean seriously fuck this stupid VM bullshit. It's been a thorn in my side for 6 months.

On top of this, unbeknownst to me, there's a piece of software running on one of these VMs for a fuel tank system out in the middle of nowhere, which has coincidentally fallen over and is unable to start. The thing was made back in the 90s, and had been out of support for about 20 years.

My boss, a Vietnamese guy in his 60s with a language barrier that made things near impossible to interpret, was now calling me. He told me he had just remoted in and found the script that was running every so often. He had concluded that stopping the script would fix all issues. When I asked him what gave him that idea, he began to shout over the phone (giving me the tentative title for this post):

"You an idiot, you idiot, you idiot. Tell me what two plus two is"

"What"

"What is two plus two"

"Are you being serious right now?"

"WHAT IS TWO... PLUS... TWO. FUCK YOU"

I hung up. No way was I going to take shit from someone who resorted to talking down to his co-workers. After that exchange, I was ready to burn this place; I had mentally checked out.

I went to each VM, disabled the script, watched as the error kept occurring.

New plan. Kick everyone off, then run the updates that had been pre-installed and waiting for a reboot for months, maybe years. Yes, do the thing I was told not to do. What's the worst that could happen? I hear another aggressive grade one math problem over the phone? Someone talks crap about my hair again? They fire me? I wanted to leave anyway. The confidence welled up inside and I felt untouchable.

So I waited. 10 minutes passed, another 10, another 10. The servers come spinning to life. Send a message out on Skype. "Servers are up, feel free to log in". Silence. No problems. Viet man comes in, heated. I refuse to talk to him, draft up my resignation, send a copy to every printer in the org, wipe my PC, and hand a physical copy of the letter in to my boss.

To whom it may concern,
Consider this letter formal notice of the resignation of Jon Black, effective (today's date). I will not be returning.
Effective immediately, you have no IT support, you can get fucked.

A pro-tip for workers here in Aus. A probationary period (in this case, 6 months) not only protects the company, but you too. Contrary to what a company will try to make you believe, you have the option of leaving at any point during that period without giving notice.


What did I learn?

  • No amount of money is worth the stress of a crap work environment.
  • Never let someone at work talk down to you. If you give them an inch you'll eventually lose a mile.
  • Four, you stupid idiot.

There's always a better job out there. I ended up moving into contracting work next, and honestly the experience I gained from seeing and operating within a large scale corporate environment set me up for the years to come. Law firms, universities, the odd production studio, it was all perfect. Meticulous detail put into each project and recognition for the work kept me in IT; I never knew this was a thing I could be afforded.


Jon.Black

Late twenties, systems administrator for a financial body in Australia, trying to learn pro wrestling in my free time, and honestly just love to tell stories.
