This is a story with a happy ending, eventually. It’s the tale of how Google and some script-fu saved my bacon. There’s a moral, too: don’t be stupid, because you’re stupider than you think you are. That was my case, anyway, when a bear mauled my database.
I was running four websites on FreeBSD 9.0, and after two years of incremental updates, it was time to upgrade the system to FreeBSD 10.0. That was an upgrade with some risks, and in my case, it went poorly. The system had been getting quirky anyway, so I figured it was time to just wipe it clean and install FreeBSD 10 fresh. I had downloaded a full set of backups to my local machine, so I was ready to go. As the new machine came up, disaster struck.
I’d settled on a set of bash scripts to back up the databases via mysqldump, one script per database. For the wildspaces database, I had simply copied another script and changed the destination file. Unfortunately, I didn’t change the database name itself, so the script was producing a wildspace.sql.tz file containing the contents of another database. My backup was worthless: disaster. I looked around to see what else I could work with. I found an extremely old database backup from three years previous, when I was running Joomla instead of Serendipity. It was a different database, but it at least held the content of the first ten years of my site. I also had a folder of text files I’d kept, each one containing a single unformatted entry, devoid of hyperlinks and some later corrections. But for the last year I hadn’t saved any text files, and I feared those entries were totally lost. Still, there were at least some means of restoring my fourteen years’ worth of content, although it was going to be laborious.
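For what it’s worth, the copy-and-edit mistake is avoidable if the backup script derives everything from the database name. Here’s a sketch of that idea, with hypothetical names, paths, and credentials (none of these are my originals); it prints the commands as a dry run rather than executing them:

```shell
#!/usr/local/bin/bash
# Sketch: the output filename is derived from the database name, so a
# copied script can never silently dump one database into another
# database's backup file. Names, paths, and password are hypothetical.
backup_cmd() {
    local db="$1"
    printf 'mysqldump -u root -pPASSWORD %s | gzip > /backups/%s.sql.gz' "$db" "$db"
}

# Dry run: print the command for each site instead of executing it.
for db in wildspaces site2 site3 site4; do
    backup_cmd "$db"
    echo
done
```

One script, one loop: the dump command and its destination file always agree, which is exactly the invariant my copied scripts broke.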
First, a fresh install of Serendipity and some comparison showed where the content was coming from, and where it would have to go. The table fields matched up like this:
| Joomla (table: jos_content) | Serendipity (table: serendipity_entries) |
|---|---|
| title | title |
| created | timestamp |
| introtext | body |
| fulltext | extended |
It would’ve been easier to use a ‘select/insert into’ sequence to dump the Joomla content straight into the wildspaces (Serendipity) database, but unfortunately I needed the recreated system to stay true to the original URLs. It was essential, then, to map the Joomla content to the now-disappeared Serendipity URLs. Google came to my rescue.
I had to act fast. My system had eaten shit only a day or two earlier, and Google had cached my whole site. Working from a list of articles I’d written, I could search Google for each one, and Google would show me its cached entry. Time was passing, though, and the next time Google’s robots came through, they would replace the cache with the now bear-mauled entries. I did 270 Google searches (so many, in fact, that Google made me fill out a captcha after about 100 to see whether I was a human or a script: that’s something I never even knew happened).
But in order to get the old content into the right slots in the database, I thought it was easiest to create 270 ‘stub’ entries, and then update them by inserting the proper content. I created one legitimate entry with the text "This website was mauled by a bear. Please be patient as we rebuild." Then from the mysql command prompt I copied it 270 times so I’d have a database full of empty entries, into which I could insert the proper text from the old (Joomla) table.
The trick to duplicating an existing row in a table is to copy it into a separate (temporary) table, drop the ID column from that temporary table, and then re-insert the result into the original table. Because the temporary copy has no ID column, the original table assigns each incoming row a fresh auto-incremented ID as it receives the data. To run this 270 times, I saved the SQL as a text file, and then sourced it from the mysql command prompt.
```
mysql> source MYFILE.sql
```

Then I pressed the up arrow to retrieve the previous command from mysql’s command history (i.e., to rerun this command). Repeat 270 times. Here’s the SQL:
```sql
create temporary table tempo select * from serendipity_entries where id='1';
alter table tempo drop id;
insert into serendipity_entries select null, tempo.* from tempo;
drop table tempo;
```

Now that I had 270 blank entries and a map showing the Joomla article ID and the corresponding Serendipity ID, I could run a loop in a bash script to query myself and transfer each article’s text and date, one by one. The only nuance in the bash script and SQL below is a unix_timestamp() clause that converts the date from one format to the other during the transfer.
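In hindsight, even the up-arrow routine could have been a loop. Here’s a sketch in the same spirit; the mysql invocation is left commented out because it needs a live server, so this dry run only counts the iterations it would have made:

```shell
#!/usr/local/bin/bash
# Sketch: source MYFILE.sql 270 times in a loop instead of replaying it
# by hand from mysql's command history. The real call is the commented
# line; this dry run just tallies how many times it would fire.
runs=0
i=1
while [ "$i" -le 270 ]; do
    # mysql -u root -pPASSWORD wildspaces < MYFILE.sql
    runs=$((runs + 1))
    i=$((i + 1))
done
echo "$runs iterations"
```

Same result as 270 up-arrows, minus the wear on the keyboard.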
```bash
#!/usr/local/bin/bash
while :
do
    echo "Enter ID number from Joomla we will move."
    read JOOMLA
    echo "Enter ID number in S9Y to which we'll move it."
    read SEREN
    echo "Article title, date:"
    mysql -u root -pPASSWORD -e "select id, title, created from jos_content where id=$JOOMLA;" joomla
    echo "From: $JOOMLA  To: $SEREN"
    echo "Hit 0 to commit or anything else to bail out with shit pants"
    read ANYFUCKINGKEY
    if [ "$ANYFUCKINGKEY" = "0" ]; then
        mysql -u root -pPASSWORD -e "
            update wildspaces.serendipity_entries, joomla.jos_content
            set wildspaces.serendipity_entries.title = joomla.jos_content.title,
                wildspaces.serendipity_entries.timestamp = unix_timestamp(joomla.jos_content.created),
                wildspaces.serendipity_entries.body = joomla.jos_content.introtext,
                wildspaces.serendipity_entries.extended = joomla.jos_content.fulltext
            where (wildspaces.serendipity_entries.id='$SEREN'
                   and joomla.jos_content.id='$JOOMLA');" wildspaces
    else
        break
    fi
done
```
There it was: at the end of the day, I had a 270-row Serendipity database into which I’d transferred everything left in that ancient backup. I used the collection of text files to populate another three years of writing. That left the final year, in which I’d neither saved text files nor been able to restore the data from the old Joomla backup. Again I turned to Google.
By searching again, I was able to quickly retrieve cached versions of the remaining pages, save one. That last search turned up not my original article but the new stub ("This website was mauled by a bear") entry. So I was glad I’d acted quickly: Google’s robots were already updating their cache, and my old article was off to the bit bucket in the sky. It was easy enough to recreate from memory, as it wasn’t a difficult article.
I’ve still got a lot of work to do: the next step is reassigning each article’s pictures, updating the hit counter where possible, and doing a lot of formatting. That will give me several weeks penance in which to reflect upon what a bonehead I was – and to think about the morals of this story:
- You are not as clever as you think you are. Or I’m not, anyway. Never assume you have mastered a Unix system well enough to avoid disaster.
- Don’t trust your backups. You’ve got to check them, especially before relying on them!
- Copying one backup script in order to produce a new one is a risky move. You’ve got to really check you’ve correctly modified everything that needs modification.
- Saving each posted article as a text or HTML file seemed like extra, even redundant, work at first, but that discipline turned out to be useful and valuable. I’ll keep doing it (starting with this one!)
- I do a lot of bitching about Google’s avaricious collecting and caching of data, but they sure saved my bacon this time. So thanks, Google, really! (But can we talk about Android permissions some day?)
Now pardon me as I get back to work cleaning up this site. You see, a bear mauled it.