
Tell HN: I salute everyone on call/working support through the holidays

 1 year ago
source link: https://news.ycombinator.com/item?id=38725015

653 points by waynesoftware 9 hours ago | 120 comments
Thank you for keeping systems available and safe. I've been there many times in the past, including having to fly at the last minute to a non-internet-connected data center in NJ to babysit an emergency production bug fix that took the entire holiday to create, install, verify, and monitor.
Always be kind, and say it’s your fault.

If you don’t do it for the sake of the person you are asking for help, do it because it works better. That’s the most practical advice [0] ever given by Hans Rosling [1], the Fact master himself:

> In fact, I have the secret to how to get the best help immediately from any customer service, like the phone company or the bank or anything. I have the best line, it always works. You want to know what it is? When I call, I say, “Hello. I am Hans Rosling and I have made a mistake.” People immediately want to help you when you put it this way. You get much more when you don’t offend people.

[0]: Unless you are in charge of a developing country’s budget and have to decide between education and healthcare.

[1]: https://blog.ted.com/qa_with_hans_ro_1/

> Always be kind, and say it’s your fault.

I do this with internal teams at work. I've found approaching other teams with issues with their library/framework in a "this could be our mistake" manner really helps in keeping them from getting defensive and stonewalling.

Saying, "I'm sorry; I've made a mistake" is the killer disarming technique for even the most emotional conflict. Not sure if it's our pride or fear of liability but western culture is very hesitant to say "sorry" - other than the fake one "I'm sorry if anyone interpreted my actions|remarks|words as ..." - that doesn't count.
Absolutely. The same is true for bug reports: if you always approach any bug report with the possibility that it may be your mistake, 1) you avoid annoying someone if it is your mistake, and might actually get helpful advice, and 2) you're more likely to get a cordial reception for real bugs.

You don't have to be excessively self-effacing about it; just avoid presenting things as though the only possible conclusion is that the project you're reporting it to is at fault.

I do something similar. Hey, I'm pretty sure I'm doing something wrong. Can you help me figure it out?

Then be grateful for the help, because it truly isn't a given that people will drop everything and figure things out for you, even if you work together, and even if the mistake was actually theirs. Gratitude is huge.

I'm going to try that, but I'll need more information about Hans Rosling to get through the identity verification...
I like to append ", unless I'm missing something?" in a similar manner, might be useful in situations where 'made a mistake' doesn't actually make sense :)
I think this works great sometimes, but I've been in plenty of situations where "I messed up" leads to being railroaded into the "boilerplate tech support" discussion tree. "Did you try doing X", "Did you confirm that Y is Z", "Make sure your VPN is working, try installing a new config", and finally "That's strange, double check all your parameters, there may be a typo in there somewhere"
Blame doesn't really help solve or prevent problems. Root-cause, blameless analysis with awareness, new tests, and mitigation does. Also, career-wise, you won't become an IC9 by admitting to making a lot of mistakes. It's best to just solve them as fast as possible.
Hey thank you so much for linking the Hans Rosling interview, he seemed like a really great guy -- it led me to buying his book just now!!
You can ask customer service for help politely and constructively, without disingenuously (or passive-aggressively) stating that the problem is your fault.

If you want, you can acknowledge how you tried to fix it and failed (if that's accurate). But don't say that the problem is your fault unless it is.

(There are situations in which taking blame for a situation not necessarily yours might be a convention, but mistakes of vendors when talking with the vendor aren't one of them, IMHO. For example, you might take a little heat for colleagues, when appropriate, and all the CEO you're talking with needs to hear right then is, "Sorry, I don't have that for you yet; let me get that to you later today." Not "I've been pestering Bob since Monday for the dependency." Then you can go tell Bob that you two really need to solve this in the next couple hours. And if there's a larger problem, like Bob has been overextended by a family problem, or tasking has been unclear since a tentative pivot, then work it with management in the appropriate vertices of the org chart.)

I mean, that sounds very holistic in theory, but in practice just doesn't work out.

A few days ago my French press suddenly shattered and almost blasted hot coffee over my upper body.

How am I supposed to start that call with "Hi, I'm jorvi and I made a mistake"..?

It's not like that is a unique situation either. And you can guarantee that if you tell customer service "I made a mistake", and it is clear they delivered a broken service / product (but often want to duck responsibilities), there is no way in hell they will not take the freebie you just gave them by admitting fault.

"Hi, I'm jorvi and seems I made a mistake or something, my french press just exploded in front of me! Can you help me figure out what went wrong?" is one way of putting it.
You can't, but it turns out acting reasonably under uncertainty is a better goal than being sure of things.
Yeah, we discourage production changes starting in the first or second week of December, start freezing changes in the third week, and are frozen solid from the fourth week of December until the second week of January.
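As a rough illustration, a schedule like that can be expressed as a deploy-gate check. This is a minimal Python sketch, not the commenter's actual tooling: it assumes "weeks" are seven-day blocks counted from the first of the month (so the 22nd onward is the fourth week), that the third week is reserved for late fixes, and `ChangePolicy` and `change_policy` are hypothetical names.

```python
from datetime import date
from enum import Enum


class ChangePolicy(Enum):
    OPEN = "open"                # business as usual
    DISCOURAGED = "discouraged"  # allowed, but push back on anything non-essential
    FREEZING = "freezing"        # only fixes for issues noticed late (the "third week")
    FROZEN = "frozen"            # no production changes


def week_of_month(day: date) -> int:
    """Days 1-7 are week 1, 8-14 week 2, 15-21 week 3, 22 onward week 4 or later."""
    return (day.day - 1) // 7 + 1


def change_policy(day: date) -> ChangePolicy:
    """Map a date to the holiday change policy, under the assumed week boundaries."""
    if day.month == 12:
        week = week_of_month(day)
        if week <= 2:
            return ChangePolicy.DISCOURAGED
        if week == 3:
            return ChangePolicy.FREEZING
        return ChangePolicy.FROZEN
    if day.month == 1 and week_of_month(day) == 1:
        return ChangePolicy.FROZEN  # frozen until the second week of January
    return ChangePolicy.OPEN


print(change_policy(date(2023, 12, 20)).value)  # freezing
```

A CI pipeline could call `change_policy(date.today())` and refuse to deploy when it returns `FROZEN`.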

December tends to be hell for our customers, so stability should be a priority there.

And honestly, no one wants to work on holidays. So let's just wrap everything up starting in December, maybe use the third week for some unnoticed issues, and then just lay down the tools. Use that time for documentation, or shorter days, quite frankly.

That way we minimize the on-call situations occurring. Let's hope it goes well for the engineer this year as well. We have a streak to keep.

We don't all do dev ops here 8)

My little firm have just lifted and shifted a customer's hardware from someone else's computer room (data centre is too grand) and plopped it down in ours. Downtime was roughly six hours which includes two hours driving, unracking, loading, unloading and racking.

Then there was a flurry of network knitting ... oh, they've tagged the bloody VLAN instead of untagging it on what are effectively access ports that don't need to be trunks or hybrid. lol, lost 20 mins. I wasn't allowed to look at the "source" switch's config and might (emoji: looking up and whistling) have assumed a few things ...

We did spend quite a long time trying to work out what the customer might have failed to tell us because we hadn't asked the right questions.

... so I plug my laptop into the NIC in question on the Hyper-V box and run up Wireshark ... fuck (dot 1Q tag) ... run back upstairs to my PC and reconfigure the port to hybrid with tagged VLAN 100 instead of access on VLAN 100. A better solution would be a trunk with PVID on the naughty VLAN and tagged v100. I chose the former to make it stand out.

The naughty VLAN thing is similar to a discard VLAN, but the traffic is not discarded; instead it gets logged. We should never see traffic on the naughty VLAN. If we do, it's a misconfiguration or something nasty.
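For anyone who hasn't stared at one in Wireshark: an 802.1Q ("dot1Q") tag is four extra bytes right after the source MAC, starting with EtherType 0x8100, and the low 12 bits of the following field are the VLAN ID. The Python sketch below parses that tag and mimics the "better solution" mentioned above (a trunk carrying VLAN 100 tagged, with the PVID on the naughty VLAN). The naughty VLAN ID 999 and the function names are hypothetical; real switches do this in hardware and configuration, not in Python.

```python
import logging
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q ("dot1Q") tag
TAGGED_VLAN = 100    # the VLAN the server's traffic should arrive tagged with
NAUGHTY_VLAN = 999   # hypothetical PVID: untagged traffic lands here and gets logged

log = logging.getLogger("naughty-vlan")


def vlan_of(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if it is untagged."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)  # bytes 12-13, after dst + src MAC
    if ethertype != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)  # tag control information
    return tci & 0x0FFF  # low 12 bits are the VLAN ID; the top 4 are priority + DEI


def classify(frame: bytes) -> str:
    """Mimic a trunk port with its PVID on the naughty VLAN and VLAN 100 allowed tagged."""
    vid = vlan_of(frame)
    if vid == TAGGED_VLAN:
        return "forward"
    if vid is None:
        # Untagged: the port assigns the PVID, and that traffic is logged rather than discarded.
        log.warning("untagged frame assigned to naughty VLAN %d", NAUGHTY_VLAN)
        return "log"
    return "drop"  # tagged for a VLAN this port doesn't carry
```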

As well as that, we have customers for whom Chrimbo is anything up to 50% of annual turnover. Their systems tend to be treated in the same way as yours.

Holiday oncalls are a fun tradeoff. On one hand, no one should be making any changes (and if they do, they'll have some explaining to do), so it's more likely to be calm. On the other, traffic patterns are weird, and it's time off where you'd rather not be tethered to your phone. What's universally bad is being oncall when the code freeze ends or the week leading up to the freeze.
Major Major Lesson learned in the trenches.

Business offices cut their chilled water supply back to minimums (or nothing) over holiday weekends & breaks.

If you're running a server closet, even if you have a dedicated Liebert HVAC, when the chilled water cuts.. you will overheat.

I learned this over the course of three consecutive Thanksgivings.

We do the same, I work in logistics software and we usually freeze early November up until Christmas.
E-commerce. Code freeze two months before Christmas. A very chill time indeed.
How do you avoid a sort of thundering herd problem when the freeze is over?
Technically we don't... But at least everyone's had a nice vacation so we're ready to deal with it after we're back.
The place I work for pushed v2 of their software, a full rewrite (nothing from the old system, not even databases) by a new team, into production this week for several customers. Mostly they did it so they could say they met their made up 2023 KPIs for the v2 rewrite. There was no good reason to push it out now other than that, and there were several reasons not to, such as it wasn’t well tested and it’s fucking December 20th. Anyways, I’m not really on call so I can’t complain much, but my poor coworkers have to support this over the holidays now.
Ugh. Several years ago I spent an entire Christmas vacation, including all day Christmas Day, putting out fires because a team couldn't be bothered to do five minutes of cursory load testing. As a consequence, multiple production systems went down under load.

Later, after we regrouped after a month of this brutality, they wandered around the office bragging like they'd hung the fucking moon after they fixed the crippling, obvious design issue they'd released. I confronted the dev lead with the fact that they would have seen this after 30s of load testing and he just laughed, I think he literally said "LOL". A giant middle finger, that's what Ops got from Dev for Christmas that year.

Here's to the people who KTLO. My people.

I've been in a similar situation before. They wanted to release right on Christmas. But luckily, instead of releasing a version full of bugs, the managers came up with an excuse: the release was postponed for one month due to a new vulnerability in a third-party library the project used.

What a brilliant move! Christmas was saved, and everyone eligible received their bonuses.

> And honestly, no one wants to work on holidays.

Actually I bet some people like it (I know I do). It's not that crazy to want to dodge the whole mad rush and take lots of time off later in the year when it's actually nice outside. Summer vacation beats winter vacation, so if you have to take days off in the winter there's pressure to try and get somewhere warm where the days are longer. Besides, the "office" is quiet, even if you're a telecommuter, so it's easy to get things done. If you're not touching production, that's fine; there's usually all kinds of fun or quality-of-life projects around tech debt, tooling, whatever. Lots of important work is actually easier to do during a change freeze or other downtime.

I think that’s a great policy as it’s clearly intended to help people when they need it, and get people to unplug when it’s valued by their loved ones.

_However_ (that part is probably best bookmarked until Jan 2nd), it also betrays that your system is brittle and can be broken by a bad commit. Don’t do it because you want people to grind until Dec 24th at 6 pm. Do it because it’s great the rest of the year, too. I’d recommend you look into (or ask me about) feature flags, alerting, and automated roll-backs.

The short version is: there's a meta-system on top of your release process that can tell (if you are using roll-backs, not feature flags):
- commits up to xyzsdf are fine;
- roll-outs starting from commit abcdef have a 2% error rate, 80% on Android;
- revert to xyzsdf, and send a message (low-priority, email) to the DevOps on call and the author of abcdef that it happened;
- for all commits after abcdef: if there are no conflicts with xyzsdf, re-try rolling them out;
- if there is a conflict because they were on top of abcdef, send a message (low-priority email) to the authors that there is a conflict.
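A minimal Python sketch of that revert-and-retry logic, under stated assumptions: commits are listed in rollout order and the first one is a known-good baseline; a "conflict" is modeled as a commit depending on (being built on top of) the bad one or on something already held back; the 1% error threshold, `Commit`, and `plan_rollback` are hypothetical and not any particular CI/CD product's API. Notifications are returned as (recipient, message) pairs rather than actually sent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

ERROR_RATE_THRESHOLD = 0.01  # assumed: roll back when more than 1% of requests fail


@dataclass
class Commit:
    sha: str
    author: str
    # Hypothetical: shas this commit was built on top of and cannot ship without.
    depends_on: Set[str] = field(default_factory=set)


def plan_rollback(
    commits: List[Commit], error_rates: Dict[str, float], oncall: str
) -> Tuple[str, List[str], List[Tuple[str, str]]]:
    """Return (revert_target, shas_to_retry, notifications) for commits in rollout order.

    commits[0] must be a known-good baseline; error_rates maps sha -> observed error rate.
    """
    bad_index = next(
        (i for i, c in enumerate(commits) if error_rates.get(c.sha, 0.0) > ERROR_RATE_THRESHOLD),
        None,
    )
    if bad_index is None:
        return commits[-1].sha, [], []  # everything is healthy; nothing to revert
    if bad_index == 0:
        raise ValueError("no known-good commit to revert to")

    bad = commits[bad_index]
    last_good = commits[bad_index - 1].sha
    # Low-priority notices: the DevOps on call plus the author of the bad commit.
    notifications = [
        (oncall, f"reverted to {last_good}: {bad.sha} exceeded the error threshold"),
        (bad.author, f"{bad.sha} was rolled back automatically"),
    ]

    tainted = {bad.sha}  # the bad commit and anything built on top of it
    retry: List[str] = []
    for c in commits[bad_index + 1:]:
        if c.depends_on & tainted:
            tainted.add(c.sha)
            notifications.append((c.author, f"{c.sha} conflicts with the revert of {bad.sha}"))
        else:
            retry.append(c.sha)  # independent of the bad commit: re-try the roll-out
    return last_good, retry, notifications


history = [Commit("xyzsdf", "ann"), Commit("abcdef", "bob"),
           Commit("c1", "cat"), Commit("c2", "dan", {"abcdef"})]
target, retry, notes = plan_rollback(history, {"abcdef": 0.02}, oncall="devops-oncall")
print(target, retry)  # xyzsdf ['c1']
```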

There are more sophisticated versions that can do things like, if you use feature flags, flagging Android users to use the previous version. Another way to do this is to scale who has access to abcdef gradually: say 1% every hour, and revert if you detect issues.
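The gradual variant can be sketched the same way. This Python sketch assumes a hypothetical `is_healthy(percent)` hook that checks error metrics at a given exposure, and it buckets users by hashing their ID so that raising the percentage only ever adds users (a common approach, assumed here rather than taken from the comment). The hour-long wait between steps is left as a comment.

```python
import hashlib
from typing import Callable


def bucket(user_id: str) -> int:
    """Stable bucket in [0, 100) so that raising the percentage only ever adds users."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, percent: int) -> bool:
    """True if this user should get the new version (e.g. abcdef) at the current exposure."""
    return bucket(user_id) < percent


def run_ramp(is_healthy: Callable[[int], bool], step: int = 1, cap: int = 100) -> int:
    """Raise exposure by `step` percent per interval; return the final percent, or 0 if reverted.

    is_healthy(percent) is a hypothetical hook that checks error metrics at that exposure.
    """
    percent = 0
    while percent < cap:
        percent = min(percent + step, cap)
        # A real system would wait one interval here (an hour, in the comment) before checking.
        if not is_healthy(percent):
            return 0  # revert: everyone goes back to the previous version
    return percent
```

With `step=1` and an hourly interval, a healthy release reaches everyone after about 100 hours; a larger `step` trades safety for speed.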

All those seem daunting to teams that haven't worked like this before, but in my experience, they come to love it very fast.

We use these systems liberally on other times of the year and no one notices, usually. If they do, downtime and interruption budgets handle this.

/However/, let me counter with the point: just one of our customers has 8000 FTEs working with our system. During hell-time (aka December and Christmas shopping and shipping), each of those dudes spends their shift taking customer calls lasting 2-4 minutes, which in turn require a few requests into our systems.

Due to the stress on their customers^2 (because it's Christmas and holidays and such), if one of our customer's agents is unable to access our systems, they cannot handle the customer^2's use case, and that will piss off the customer of the customer.

So if we push a bad change during this time, we're going to piss off hundreds of customers^2 per minute for that one customer alone. Even with a fast automatic rollback, that's a long time during hell-time. And they don't like that, and they have people who know how to yell at vendors in nasty ways.
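To put rough numbers on that: if every one of the 8000 agents were on back-to-back calls of 2-4 minutes, that is an upper bound of 2000-4000 calls per minute for that one customer, each needing a few requests. Both that idealization and the 5-minute outage in the example below are assumptions for illustration, not figures from the comment.

```python
def calls_per_minute(agents: int, minutes_per_call: float) -> float:
    """Upper bound on call throughput: every agent on back-to-back calls (an idealizing assumption)."""
    return agents / minutes_per_call


def calls_hit_by_outage(agents: int, minutes_per_call: float, outage_minutes: float) -> float:
    """Rough upper bound on customer calls that run into an outage lasting `outage_minutes`."""
    return calls_per_minute(agents, minutes_per_call) * outage_minutes


print(calls_per_minute(8000, 4))        # 2000.0
print(calls_per_minute(8000, 2))        # 4000.0
# hypothetical 5-minute bad deploy before an automatic rollback completes
print(calls_hit_by_outage(8000, 4, 5))  # 10000.0
```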

I enjoy moving software fast and enabling moving software quickly, but customer focus and customer orientation means to understand when to move slow as well.

And hey, if that means more quiet holidays for the hard working operators on my team, who's gonna complain?

As the person before mentioned, partial rollouts with separate monitoring would help with that and might be an improvement for the other 11 months.

But we are doing the same thing: for two weeks around Christmas there is a "please take holidays if you can" period where we don't merge anything that isn't a priority-one ticket... which hasn't happened yet.

You are a lot further ahead than most companies.

I’ve worked for too many places where the Christmas break was because of a lack of tooling. I’m glad you are two steps ahead.

How do you detect errors like this?

What is an error? Is a business logic bug going to be picked up by this process automatically, or are some manual steps involved?

E.g., a point-of-sale app releases an update that automatically halves the amount to charge, but displays the full amount to the merchant in the UI. Unit tests pass (because an engineer made a human mistake). Backend calls are used correctly and no errors are thrown; simply the wrong amount is used.

How would this be automatically detected and reverted?

Would anyone writing point of sale software want to risk this over one of the biggest trading periods of the year?

As you point out, it really depends on what is an error. Most of the companies I know of have a Holiday freeze are video games, casual ones, even. Changes are minor fixes and optimization—glitches that a player likely won’t notice, but you want to detect them early to avoid losing your ability to detect more.

Back-end tools are different, and I definitely see reasons other than bugs to not change business logic this month.

Yeah, that model may work for many public facing apps, but probably less so for enterprise systems that are heavy in business logic.
> it also betrays that your system is brittle and can be broken by a bad commit.

Correct. So's yours. So's everyone. You might not know what the bad commit is, you might've fixed a bunch of the other bad commits, but even Google gets taken down by bad commits. Your system is brittle and can be broken by a bad commit.

Yes, absolutely, thanks to all who keep our world running when no one is looking: to keep the yule log on YouTube, to keep our Christmas tree lights on, to keep a fresh glass of water coming from the tap, warm natural gas to keep out the freezing cold, etc. Thank you for keeping society ticking away :)
I’ll give a shout out too to everyone in the military monitoring warning systems and maintaining stance to protect us from being killed while we’re with our families.
Hundreds of thousands are fighting in the trenches right now to protect their homes, to protect their families, to protect democracy while politicians went on holiday break without approval of military aid for allied army.
Let's not confuse on-call firefighters or a water facility staff with the on-call admins that maintain money-making machines monetizing attention of billions. The latter is a net negative on society.
That firefighter is probably using YouTube or scrolling through Instagram to unwind while they’re stuck at the station waiting for a call. Just because someone works in entertainment or ads doesn’t mean that the economic puzzle piece they represent isn’t valuable to society.
Imagine instead they were making a meal for the whole fire hall, working out, or talking to each other …
Yeah, how dare Netflix provide entertainment on-demand and for cheaper than the other entertainment companies?
I am currently viewing this on ethically sourced rfc1149 (birds gave consent via a scientifically proven “brain electrode interface”), manually decoding packets using an abacus made out of various animal droppings foraged on the forest floor. If I can’t view your content this way, it should not be on the internet.
Oh, so you're just going to deprive the trees of the nutrients those animal droppings would've provided? Ethics, indeed.
On call sucks so badly. At this point in my life, I firmly believe that no amount of money can compensate for the mental suffering it implies. Even more so if the company you work for has a "deal with it" mentality without making improvements, which was my case in the last stretch I did on call and what broke the camel's back for me. Nowadays I simply refuse it. For those who are still in the trenches: stay strong, never resign yourself to just "deal with it", and thank you.
To me, the worst part of being on call is the stress _after_ my shift ends. I understand that it's a necessary part of the job to fix issues that occur during my shift, so I don't really mind it, but it gives me long term issues. I feel anxious whenever I don't have my phone on me, or when I'm far enough into the wilderness to lose my cell signal. Late night when I don't expect to be getting messages from anyone, a random notification can sometimes give me an immediate stomach-drop panic response.

Unfortunately I feel like I lucked into this role and if I left I wouldn't be able to find anything anywhere near as good.

And I'm not even sure whether you're talking about daytime-only on-call, 24-hour on-call in stretches of one to two full weeks, or a simple 12-hour on-call. In India, Indian managers (and American managers are just fine with it) have created an environment where this barbaric practice of 24x7 on-call is handled by just one person.

In fact, even when there are US/Western counterparts, they make sure Indian engineers are on call even during American daytime. This has been happening at my workplace. They employ all tactics, from fear and intimidation to trying to sweet-talk engineers into it with shit like, "Oh, we own it, right? So it's our responsibility to support it even when it's night".

In that environment it becomes extremely difficult and pressurised for someone like me, who simply refuses to even sign up for something like PagerDuty and makes it clear that my phone stays silenced and out of my bedroom between 10pm and 7am, and it really does.

I agree with you: there is no amount of money that can compensate for on-call, definitely not night-shift on-call.

With the last (and only) job that required me to be on call I quit the day before I was scheduled. I've always refused to do it. Devs have no business doing it.
Meanwhile a huge number of us (non-religious? introverted kernel compiling cave dwellers?) treat this period no differently than any other week in the year. I'll be here keepin the servers runnin :horns:

It's actually my favorite time of the year. Everyone is gone, it is quiet, and I can get shit done.

> non-religious

Or a member of one of the religions that don't celebrate Christmas.

It's me. But we still have a holiday period at the end of year - normally financial targets are hit and it's a 4 day leave to get 10 days off.
When I did shift work I always volunteered for the holidays because I don't care about holidays.
Holidays are special because they’re special, both the winter solstice festival (rebranded for christianity) and the spring equinox one (same deal) can be treated differently for cultural variety by the non-observant.

I’m a militant proselytizing atheist raised by a jew and I still have a tree with pretty lights, give presents, and drink and eat some things I only drink/eat once per year (never make homemade eggnog if you ever want to enjoy it guilt free again, you’re basically drinking a megacalorie of heavy cream, yum). It’s fun to celebrate the generic concept of “holiday” - a time that is different from other times.

You’re allowed to feel nice about peppermint candy (and/or chocolate gelt, I go for both) at the end of December without bringing the supernatural into the equation. :)

> never make homemade eggnog if you ever want to enjoy it guilt free again, you’re basically drinking a megacalorie of heavy cream, yum

Solid advice. I literally put on 5 lbs. while refining my non-alcoholic eggnog recipe.

One thing I noticed is that ice cream, crème brûlée, and non-alcoholic eggnog are all just variations on the same recipe. A glass of egg nog is pretty similar to a glass of melted ice cream.

Oh ya same I love the smell of evergreen wreaths and trees and enjoy partaking in festive activities. A 4K cracklin’ Yule log goes a long way too.
Just like San Francisco during Burning Man!

Build A 300-Mile Wall Around SF During Burning Man:

https://web.archive.org/web/20190213021206/https://megagogo....

>A community effort to construct a 300-mile wall in one week and prevent Burning Man attendees from returning to the Bay Area.

>About This Project

>We want to help Burning Man attendees continue their favorite week of the year, and allow them to keep experiencing the genuine community and deep connections they can only feel while at Burning Man. To do this, we will build a 300-mile wall around the entire Bay Area during Burning Man.

>For the rest of us, what’s normally our favorite week of the year… lasts forever!

Certainly doesn't have to be religious. I think you probably have all your family living near each other or don't like them. To me, this is the only time of year outside of major events like weddings, funerals, and graduations that I can be reasonably assured my parents, all my sisters, all of my nieces and nephews, and at least a few aunts, uncles, and cousins will all be in the same place at the same time. It's both nice and convenient to be able to travel to one place and see all of them together. It's the kind of thing that can only realistically happen at a coordinated national level, and if it took a religious holiday to give the country an excuse to give us all a holiday at the same time, I'm fine with that even if I don't practice the religion.
> I think you probably have all your family living near each other or don't like them

Bingo!

My D-I-L is a nurse and will be working the coming weekend. That's her "price" for getting Thanksgiving (in the US) off. We'll schedule as much as possible around her work hours and make sure there's food left for her when she gets home.

Many thanks to all of the health care workers who take care of us over the holidays. (Along with all of the others, of course.)

My mum always worked on most of the public holidays because we never really cared about them, but it did mean everyone owed her favors (she was a physician and would be on call, at the hospital, etc.), so she could get someone to cover for her whenever she wanted.

Growing up ignoring holidays is mostly great (fly on xmas and everybody feels sorry for you, even though they are the ones working on xmas). But it causes relationship problems bc even when you genuinely try to participate you’re “doing it wrong”.

Usually I choose to work during normal holiday season too, and then go on vacation in mid January when crowds are smaller.

Having a family that accepts rescheduling Holidays helps. We've celebrated Thanksgiving, New Year and Christmas on different days before.

And for all those of you trying to hold the world on your shoulders: don't be a hero. If you don't let things fail (that aren't your responsibility), nobody will notice it's at risk of failing, and thus will keep letting you hold it up by yourself.
I get nervous every New Year’s Eve due to date/time issues. I work on emergency 911 software. In our system, each time a 911 call is created, we create an incident number in the format YYYY-NNNNNNNNNNNN, where N is an incrementing number. I was oncall a few years ago when a date/time bug was introduced that resulted in numbers being created prematurely by a few hours. As each hour passed, more customers in a different time zone called in to report the issue. I was the only person working and was getting hammered with cases.
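The comment doesn't say exactly what the bug was, so the following is only a sketch of one plausible failure mode: taking the year from UTC (or the server's clock) instead of from the caller's local time, which rolls incident numbers over to the new year hours early for agencies west of UTC. This Python sketch derives the year from the agency's local time. `incident_number` is a hypothetical helper, the 12-digit zero padding is an assumption about the N run, and fixed UTC offsets stand in for real IANA zones (production code would use `zoneinfo` so daylight-saving changes are handled).

```python
from datetime import datetime, timedelta, timezone, tzinfo


def incident_number(created_at: datetime, agency_tz: tzinfo, seq: int, width: int = 12) -> str:
    """Format YYYY-NNN... with the year taken from the agency's local clock, not UTC."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    local_year = created_at.astimezone(agency_tz).year
    return f"{local_year:04d}-{seq:0{width}d}"


# 02:00 UTC on Jan 1 is still New Year's Eve (20:00) at a fixed UTC-6 offset.
created = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(incident_number(created, timezone(timedelta(hours=-6)), 42))  # 2023-000000000042
print(incident_number(created, timezone.utc, 42))  # 2024-000000000042, the premature rollover
```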

It sounds like an easy issue to correct, but downstream systems that consume those numbers had already processed them and associated reports and other records with the incidents. I spent the next few months sorting out that mess and working with partners to clear out the data.

My uncle used to work for the City. Every holiday season, if it snowed, he would be called away to plow the roads.

Here's to those out there plowing the roads so we can get there safe!!

I worked four years in the military, and three of those I had evening and night shifts during the holidays.

Absolutely nothing happened, no activity whatsoever - just babysitting systems deep inside a bunker. Closest I got to new year's eve was watching the fireworks through CCTV.

The last year I was on call, which was miles better, but those years definitely cemented my will to get a job with normal work hours.

Holidays are excellent times for hackers to take advantage. It’s not just Christmas or other Western holidays, either. Extend this principle to any holiday/world conflict/anniversary of conflict made into holiday/calendar new year and then adjust your time of attack.

protip: US companies with offshore groups are usually underfunded, understaffed, and underskilled. Time to see if that disaster recovery environment works!

Happy holidays to those who encounter system stress tests. Can’t spell salary without some elements of slavery…

In a similar vein, I'm grateful for the people who maintain the foundational pieces of our digital world that often go unnoticed like date & time systems.
Thank you indeed! While I have fond memories of working holiday pager support early in my career, especially before marriage and kids to cover for those with families, I’m very grateful for those able to cover for all of us now! Cheers to you all
FYI Israelis are not on holiday - our holidays are on whole different dates. Hire Israelis and experience no down time while working with Silicon Valley level talent
Just hope stuff doesn’t break on sabbath, they ain’t touching no computer that day :)
Most Israelis in tech are not religious and take on call duties on Saturday (Sabbath) just like Christians take on call duties on Sundays.
As I wrote in another comment, most Israelis in tech are not religious and take on call duties during Saturdays and religious holidays
It tends to be a tradition for me to be oncall over at least one of the winter holidays, this year I (again) get to preside over a vendor vs vendor showdown where the only loser is our hardware!
Agreed, there are some gigs that just really require support to exist - I know this first-hand from working at a Zoo (very large exotic animal rescue basically). Animals do not take holidays. They need to eat and do animal things in spite of our costumes that day.

On the flip side, having worked Cinema on Christmas Day two years I think, there is no amount of Grace and Patience I can give that is enough to those earning their living. Still have a hat and polo. Why? I had to buy them!

Thank you, but I wish you were not doing it during your night time. Let someone in a time zone where it's daytime do it. Insist on it; refuse to be exploited. Take this opportunity to tell your company to hire in time zones where it's day during your night. If it's not important enough for them to do that, then they don't need you to stay awake at night anyway. Don't harm yourself. It's not just lost sleep: that leads to harmful effects in your body that keep showing up for a long time, and some of the damage might not even be reversible.
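Follow-the-sun coverage, which is what this comment is asking for, boils down to paging whoever is currently inside working hours. A minimal Python sketch, assuming a hypothetical roster of engineers with fixed UTC offsets and 09:00-17:00 local working hours. An empty result means nobody is awake for that hour, which is exactly the staffing gap the comment says the company should hire for.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

WORK_START, WORK_END = 9, 17  # assumed local working hours, [09:00, 17:00)


def daytime_responders(now_utc: datetime, utc_offsets: Dict[str, float]) -> List[str]:
    """Return engineers whose local time at `now_utc` falls inside working hours.

    utc_offsets maps a (hypothetical) engineer name to a fixed UTC offset in hours.
    """
    responders = []
    for name, offset in utc_offsets.items():
        local = now_utc.astimezone(timezone(timedelta(hours=offset)))
        if WORK_START <= local.hour < WORK_END:
            responders.append(name)
    return sorted(responders)


team = {"asha": 5.5, "jonas": 1, "maria": -5, "liam": 11}
print(daytime_responders(datetime(2023, 12, 21, 4, 0, tzinfo=timezone.utc), team))   # ['asha', 'liam']
print(daytime_responders(datetime(2023, 12, 21, 15, 0, tzinfo=timezone.utc), team))  # ['jonas', 'maria']
```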

It's not bravery, it's not being a hero as many are putting in comments - it's just plain exploitation if you are awake at night doing this.

Your health and wellbeing is not worth any money, and definitely not worth someone's shopping cart and checkout page working smoothly over the holidays when your Western colleagues just took off en masse. Not even close. I know a lot of you would be from a third world country like me (yes, that's a thing!). Stay strong and work for this exploitation to end.

Just barely started my current job too recently to be in the on-call rotation yet. Lucked out! Props to those keeping the wheels turning.
Is it a US thing to push updates right before holidays and force people to be on call?

The European companies I've worked for plan releases until the end of November, first week of December at most. All project plans are made as if December has only one week.

Meanwhile, a US company tried to force all its contractors to work every day and some weekends too. Are you Indian? You don't celebrate Christmas, so you'll be on call. Are you Eastern Slavic? You celebrate Orthodox Christmas in January, so you'll be on call.

Not all breakage happens because of new updates. Sometimes the servers go down, sometimes a bug that already existed gets triggered by something, sometimes there's extra volume over the holidays and the load is too much. Full agreement that you shouldn't push new code in late December, but that won't stop the need for some folks to be on call.
We discourage changes starting early/mid november and completely freeze deploys a week before thanksgiving and christmas, but there's still so much that can go wrong. The other commenter summarized it well, but in addition there's also systems that require continuous automatic updates as part of their functionality, and sometimes those automatic updates can start to go haywire. For some systems, the holidays are the most important part of the year, and it's extra crucial for people to be monitoring and ready to respond at a moment's notice.
There are also systems that aren’t entirely stable due to resource leaks, and when you stop deploying you might find you need to restart them occasionally.
> Is it a US thing to push updates right before holidays and force people to be on call?

Yes. And usually they get it done via their offshore employees in third world countries like India, working during their nighttime.

> Are you Indian? You don't celebrate Christmas, you'll be on call.

Not celebrating Christmas isn't the point. The point usually is that you will probably be without a job if you don't agree to be on call 24x7.

Firefighting sucks. I salute everyone who makes sure on-call runbooks are thoroughly documented, who doesn't ship code on Fridays, and who practices robust, simplicity-based engineering that reduces problems during and especially after hours.
I'm not sure if there is something in the water this year, but this week, Dec 18th to Dec 21st (only a partial week), has already been our busiest week of all time.

Sweating over here trying to make it through the week and praying that it slows at least for the first half of next week.

> non-internet-connected data center

Whoa. That sounds wild. If you are allowed to share: what industry / company? I would like to hear more about that setup!
Managers, if you're reading this, and you have engineers/developers "on call" but not contractually (off book), make it so, because it slightly sucks when you're having Christmas drinks but can't enjoy yourself because you might need to drive somewhere and climb up a ladder to tend to a product.
If somebody asked me to be on call, during a holiday, without overtime, there's no way I'm driving anywhere. If it's worth it to the business, pay me for the time I'm no longer getting to enjoy with my family, or shove it.
Yes, would be nice to not need to mentally prepare to make that call.
On call till the 31st, so please don't hit refresh too much these days ;-)
Me too, but they pay a few bucks an hour to carry the phone so at least that adds up.

The ultimate trick is to have a diverse team: someone who doesn't care about Christmas but absolutely needs some random day off in March (cool with us!), someone who celebrates New Year some other time.

Speaking of diversity, if you don’t do Christmas dinner I strongly recommend ordering takeout from a Chinese place on the 25th! There’s lots of happy photos of Chinese chefs and Jewish customers doing a Christmas fist-bump.
Our parent company got hit by a cyberattack yesterday. Salute to all IT and InfoSec colleagues working around the clock while everyone else is taking the day off.
Big up the on call heroes! Hope you're getting paid well, hope you get no red lights on the bug hotlines.
My wakeup alarm this morning was 9am when OpsGenie let me know I'm on-call today. Praying for peace.
Just remember this time of year is often peak vulnerability time: attackers exploit the fact that teams are at reduced strength and off guard, with slower response times to investigate and fix issues.
I salute those in the startup world - the ones in a team of 5 and they're the only Ops person who always gets paged.

Been there, done that.

Thanks for all the great work; I hope no one has an outage this holiday and everyone has time to enjoy with family and alone.

Keep up the good work, folks

Thanks for the salute, but we also accept cash :)
As if this week had attempted to take a measure of blood from my body, I'll be on-call next week. Looking forward to all things quiet on the HEP network front.
For some of us, we look forward to the peace and quiet
I'm on call 12 hours a day and hoping things are very quiet next week. Best wishes to everyone else too!
Two live video productions, including one on the evening of 31st. I managed to push back on last minute infra/workflow changes (:
If you’re atheist or of a religion that isn’t relevant then why do you care? There’s no spiritual meaning. It’s just a day off. Get off the fence and own your convictions.
I wouldn't mind honestly. Seems like a good excuse to skip the social obligations.
We've been on freeze for weeks now in preparation for the holiday season.
I announced a downtime for a smallish GPU cluster starting from Christmas Eve just a few hours ago. It is just the perfect time to schedule a day or two of downtime for a system like that. And if IPMI doesn't fail me, I can get a lot of things done without leaving the comfort of my home. I scheduled this without pressure from my boss. It was a totally voluntary decision... While I was raised as a Christian, this time of the year is for me more about the solstice than about the Christian celebration. A time to enjoy the comfort of a heated home. A time to celebrate that the days are going to be longer from now on again. A time to reflect on the past year. And all of this is easily done while having a few terminals open and waiting for remote stuff to complete...
Thanks. Honestly, all I do is check emails once or twice a day and maybe respond to a DM if there's an emergency, but it's nice to be appreciated.
Thanks, been there many seasons, and same to you all.
> Thank you for keeping systems available and safe.

There's that word "safe" again. What systems are dangerous otherwise? Do you mean like traffic lights or something? The API serving ads to your mobile game isn't dangerous.

'Safe' in the context of systems can mean safe from hacking attempts, data leaks, and other emergencies that may arise for the system. It can also refer to things that are dangerous to the system itself.
