{ height: 1%; } - Ruby on Rails and User Interface Design

CSS, UI Design, Ruby on Rails and cheese ... lots of cheese

Data Loss :(

Posted by Richard White Fri, 29 Jun 2007 09:28:00 GMT

We passed two major milestones yesterday: our millionth SlimTimer hour was logged and we had our first ever widespread incident of data loss. Ugh!

What happened:
  • Due to what I’ll call ‘operator error’, we lost all data for any tags or tasks created after June 28th 04:00 GMT
  • We stopped the server to repair the database, using nightly backups to fill in the gaps as best we could
  • We wrote a script that creates ‘recovery’ (dummy) tasks as placeholders for the tasks we lost, giving us something to attach your time entries to.
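
The placeholder step can be sketched roughly as follows. This is a minimal illustration, not the script that was actually run: the `Task` and `TimeEntry` classes, the `create_recovery_tasks` function, and the sequential numbering of the placeholder names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: int
    name: str


@dataclass
class TimeEntry:
    id: int
    task_id: int


def create_recovery_tasks(entries, tasks):
    """Build a placeholder task for every task id that a time entry
    still points to but that no longer exists in `tasks`."""
    known_ids = {task.id for task in tasks}
    missing_ids = sorted({entry.task_id for entry in entries} - known_ids)
    # Hypothetical naming scheme: a simple running counter per lost task.
    return [
        Task(id=task_id, name=f"[Recovery {n}]")
        for n, task_id in enumerate(missing_ids, start=1)
    ]


# Entries 2, 3 and 4 reference tasks that are missing after the restore.
entries = [TimeEntry(1, 10), TimeEntry(2, 99), TimeEntry(3, 99), TimeEntry(4, 42)]
tasks = [Task(10, "Write report")]
placeholders = create_recovery_tasks(entries, tasks)
```

Each lost task gets exactly one placeholder even when several time entries reference it, so the entries stay grouped the way they were before the loss.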
Who’s affected:
  • 360 people had tasks that were lost and now have placeholder tasks, with names like ‘[Recovery 190]’, in their place
  • More people may have lost tasks if they were created yesterday but not linked to any time entries
  • All tags added yesterday to both tasks and time entries were lost.
What you need to do:
  • If you have any placeholder tasks, rename them and add back any tags, coworkers or reporters that are missing
  • Check over your entries from yesterday and re-add any tags that may have been lost.

If you’re one of those 360 people you’ll be getting an email from me shortly, but anyone should feel free to contact me with any concerns they have.

We’re working on making sure this doesn’t happen again, and we’re going to back up the database more often than once a night.

A million apologies.

Rich

Comments

  1. Mark said about 8 hours later:

    You guys rock. Total transparency…. builds trust

    Does lead one to think about a slightly more automated way to do backups. It’s one thing to do CSV dumps on a monthly basis by hand, but maybe a cron job that emails the users the CSV file each week or something.

    I’d pay for that for sure!

  2. DK said about 8 hours later:

    You’ve wasted at least 15 mins of my time, but since you’ve saved me at least 40+ hours over the past year, I guess I’ll have to let it slide! :)

    Thanks for the info on what happened.

  3. Kristopher said about 8 hours later:

    That really sucks to lose that data, but props for the openness and getting it taken care of promptly. You guys rock.

  4. Nelson said about 8 hours later:

    Ouch! But you have an awesome site and we, your users, love you. Thanks for the hard work of trying to recover!

  5. Jason Coleman said about 8 hours later:

    Thanks for the update. I feel for you. I know how stressful these situations can be.

    Luckily I didn’t use slimtimer too much yesterday. Thanks for being honest with us.

  6. Arron said about 9 hours later:

    I’ve got to tell you, I just signed up yesterday to use the service and having it crash / lose data the same day I started using it…

    But! The transparency is great—without it I just would have found another timing solution. I’ll be sticking with you a little while longer to see how things work out.

  7. Alex said about 14 hours later:

    Thank you for letting us know so quickly! I have been using the service for months now, and never had any problems. These kinds of things happen, we understand. Keep it up, we love what you are doing!

  8. Matt Platte said about 16 hours later:

    I’ve used that MySQL binary log/automatic replicator mechanism to save me from myself.

  9. Richard White said about 20 hours later:

    @Matt: Yeah, Lance has used that as well and we’re actively looking into using that in the future (we’re on PostgreSQL currently).

  10. mike@kkl.com.au said 2 days later:

    Bummer ‘bout the loss, but like the others have said, being straight up counts for a lot.

    Let me know if you want any help / advice on backup for postgres – we’ve been using it for years & one of our guys is a committer.

  11. Chris Matthais said 4 days later:

    Well, really 360 apologies, right? I’m glad you sent me that email but I figured it out when I saw the recovered time in my reports. No harm done. Luckily for me, I was only working on one project that day and probably one task, so I wasn’t hit hard… but thanks for the post and the email to let me know.

  12. raand said 135 days later:

    Thanks for the timely notice and for being completely honest. Good work on some of the data recovery as well.
