But did you know that a statistically significant proportion of drivers change their behaviour around riders wearing helmets? (That's not a reason to not wear helmets; everyone should ATGATT. It's a reason to change driver behaviour through other incentives.)

All tests passed.

To be honest, I never understood the point of companies publishing post-mortems after outages.

I really don't understand what was under a quota, or what it means that the quota had a grace period.

And I've got the scars to prove it.

"I hated that app (vscode) on my last laptop, it was very slow and bloated, and I've been considering switching to something else. This…

Just because you can do something doesn't mean you even remotely should.

A Monday rollout sounds horrible. But complex systems.

The lesson learned might be different if it weren't global ("prevent fast changes to the quota system for the auth service"), but the conclusion would be substantially similar: there is usually no good reason, and plenty of danger, for routine adjustments to large infrastructure to take place in a brusque manner.

That's not even a knock on them, but it is a bit disheartening to see people suggesting massive MTA behaviour/RFC alterations for no reason.

> A configuration change during this migration shifted the formatting behavior of a service option so that it incorrectly provided an invalid domain name, instead of the intended "gmail.com" domain name, to the Google SMTP inbound service.

> both of those migrations failed with symptoms suggesting that whoever was performing them did not have a deep understanding of systems architecture or safety practices, and there was no one to stop them from failing.

Configuration issues can be, and often are, temporary.

The difference is that Google more often builds with some form of guaranteed read-after-write consistency, while AWS is more often "fail open".

At a nominal fee, of course.

It's the corporate equivalent of taking personal responsibility for your own safety.

It's almost as disappointing as the fact that their status page doesn't redirect from HTTP to HTTPS.

In general, AWS more often shifts the harder parts of global distributed systems onto their customers, rather than solving them for their customers the way GCP does.

SMTP 550 was the next day. That'll work out well.

On a 5xx-series response, you bounce.

When is the message requeued to send?

Quotas are one way to isolate this impact to that service in particular.

You can find this here: https://news.ycombinator.com/item?id=25438169. The topmost commenter also replied to you that there was a previous discussion here: https://news.ycombinator.com/item?id=25473468.

It's a bit counterintuitive, but if you have a few incidents during a year, the probability of having two incidents in the same week is a lot higher than you would expect.

It took several hours and some back and forth with support to realize that the burst IOPS quota of the provisioned underlying EBS disks on the EC2 instances forming the ECS cluster had been depleted, so disk performance completely tanked to the point that the Docker agent couldn't be reached for 4 minutes.
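For the ECS/EBS story above: depleted gp2 burst credits show up as the BurstBalance CloudWatch metric dropping toward zero, so watching that metric would have cut down the hours of back-and-forth with support. A minimal sketch, assuming boto3 with credentials already configured; the volume ID is a placeholder:

    # Read the BurstBalance metric for one EBS volume so a depleted
    # burst-credit pool is visible before disk performance tanks.
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    def burst_balance(volume_id: str, hours: int = 3):
        """Return (timestamp, average %) samples of BurstBalance for one volume."""
        now = datetime.now(timezone.utc)
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/EBS",
            MetricName="BurstBalance",   # percent of burst credits remaining
            Dimensions=[{"Name": "VolumeId", "Value": volume_id}],
            StartTime=now - timedelta(hours=hours),
            EndTime=now,
            Period=300,                  # 5-minute buckets
            Statistics=["Average"],
        )
        return sorted((dp["Timestamp"], dp["Average"]) for dp in resp["Datapoints"])

    for ts, pct in burst_balance("vol-0123456789abcdef0"):  # hypothetical volume
        flag = "  <-- depleted" if pct < 10 else ""
        print(f"{ts:%Y-%m-%d %H:%M} {pct:5.1f}%{flag}")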
That's the ticket to a nice robust revenue stream!

Original sender sees the message was not sent successfully.

AWS finally updated their quota management to per-full-prefix, though.

Even if you called support, there was not a lot they could do, because the rate limiter is a low-level thing built into an internal load balancer somewhere, and it is difficult to override it for a single account.

Nope, halfway through the same issue occurs.

Others upthread were advocating that MTAs not bounce on an initial 5xx failure, regardless of my assertion that the client/end user should receive an immediate 5xx permanent-failure message.

Turns out that EFS rations out IOPS per GB of stored data, and this particular application stored only a few MB at the time, because it was brand new and hadn't accumulated anything yet.

Some screens make so many calls that you'll hit the rate limit in a matter of minutes if you leave your browser on them.

(Presumably they wanted to make the status page depend on as few services as possible, to prevent a scenario where an outage also affects the status page itself, but whatever script they are using to publish updates to the page could also check that the HTTPS version of the site is accessible and, if not, remove the redirect.)

if there's an urgent security bugfix. This "unclear" reason can hide a bigger issue like a security bug fix, or just an important migration that has to go out.

I think you're on to something.

Ended up just creating a 100GB file, or something, just to get the IOPS I needed until we could migrate off EFS.

Instead, these "solutions" trade a moderately difficult storage compression problem at the service-provider end for a physically impossible time-travel problem on the consumer end.

The login page prompts for email again and again, and never makes it to the password.

> Mail is decades old, built upon millions of hours of work crafting software, RFC standards, and works the way it does, including bounces, to ensure stability.

5 seconds later?

The fact is, they did.

Some people do understand the dependency stack.

Slowing down actuation of prod changes to be over hours vs. seconds is a far cry from the large-org / small-org problem.
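On the bounce-versus-requeue argument running through the comments above (a 4xx reply is temporary, a 5xx reply such as Gmail's 550 is permanent), here is a minimal sketch of that decision using Python's smtplib. The host and addresses are placeholders, and this is an illustration of the classification, not any particular MTA's implementation:

    import smtplib

    def deliver(host: str, mail_from: str, rcpt_to: str, message: str) -> str:
        """Attempt one delivery and classify the outcome."""
        try:
            with smtplib.SMTP(host, timeout=30) as smtp:
                smtp.sendmail(mail_from, [rcpt_to], message)
            return "delivered"
        except smtplib.SMTPRecipientsRefused as exc:
            code, _ = exc.recipients[rcpt_to]   # e.g. (550, b"No such user")
        except smtplib.SMTPResponseException as exc:
            code = exc.smtp_code

        if 400 <= code < 500:
            return "requeue"   # 4xx: transient failure, retry later with backoff
        return "bounce"        # 5xx: permanent failure, notify the original sender

    # e.g. deliver("mx.example.test", "alice@example.org", "bob@example.com",
    #              "Subject: hi\r\n\r\nhello")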
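And for the EFS comment above: in EFS bursting mode, baseline throughput scales with the amount of data stored, which is why padding a nearly empty filesystem with a ballast file is the stopgap the commenter describes. A sketch of that hack, with an assumed mount path and size; it writes real bytes rather than a sparse file so the data actually counts toward the metered size:

    import os

    def write_ballast(path: str, size_gib: int, chunk_mib: int = 8) -> None:
        """Pad an EFS mount with zeros to raise its bursting-mode baseline."""
        chunk = b"\0" * (chunk_mib * 1024 * 1024)
        remaining = size_gib * 1024 * 1024 * 1024
        with open(path, "wb") as f:
            while remaining > 0:
                n = min(len(chunk), remaining)
                f.write(chunk[:n])
                remaining -= n

    # e.g. write_ballast("/mnt/efs/.ballast", 100)   # ~100 GiB of zeros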