[TYPO3-typo3org] Post Mortem gerrit outage from 2014-01-13 late evening through 2014-01-14
aleichsenring at ab-softlab.de
Sun Jan 18 20:26:06 CET 2015
Hello Steffen and Peter and Lieuwe and all the others involved,
thank you for dealing with the situation, recovering all the data,
sacrificing sleep and time and nerves.
Your engagement is highly appreciated.
On 14.01.2015 18:10, Peter Niederlag wrote:
> Dear TYPO3 Contributors,
> As reported earlier we had a severe crash on our server due to a power
> outage in the data center on Monday. As it seems we still see some bad
> Late on Tuesday 2015-01-13 we noticed fatal errors in one of gerrit's
> central database tables. For this reason we had to shut down gerrit
> for disaster recovery.
> At first we spent an estimated ~10 hours trying to fix the problem.
> Around noon on 2015-01-14 we decided to just restore one specific table
> from a 24h old backup. At this time we were pretty sure we would find a way to
> recreate the missing data later on in a programmatic way. After we
> restored the table gerrit was put back into production around 3 PM GMT+1.
> Lieuwe Hummel, one of our community members, noticed our problem on
> Twitter and sent us a bash script he once used to fix things in a
> similar situation. We adapted the script to our setting and use case and
> were able to restore all patch requests that had been submitted on
> 2015-01-13 within another four hours of work.
> Thx Lieuwe!
> Lessons learned? InnoDB can be very tricky in case of severe failures.
> Many thanks also to Steffen, who spent half the night trying to bring
> back the data with the Percona disaster toolset.
> Peter Niederlag
> For those interested: CHECK TABLE reported 'InnoDB: The B-tree of index
> "PRIMARY" is corrupted'.