Helping Common_schema Help Me
I’m a big fan of common_schema. It’s a really powerful and flexible tool, and I’m always looking for new ways to use it.
Last week I had to update millions of rows across many databases to tokenize some persisted URL values, and I remembered reading Baron Schwartz’s recent blog post about using the common_schema split feature. Baron’s use case was deleting data, but I figured this could work well to break my large updates into chunks too. I had already written the update statements I wanted to execute, and after five minutes reading the common_schema documentation I was ready to try it out on a dev database.
One of the queries I tried didn’t work in common_schema, but with a creative workaround I was able to trick common_schema into accepting it. Read on for the details.
The first query I wanted to run involved two tables, so I needed to specify which table to split on. In this case I’m updating the parameter table, so I specify that as my split table first, then specify the full update statement, and after that I select $split_total_rowcount so I can see the progress of the update after each chunk is done.
Here’s the code:
1 2 3 4 5 6 7 8 9 10 | |
It worked on dev and QA so I tried it out on my staging db and it worked perfectly, updating over a million rows in about an hour. Here’s the end of the output:
1 2 3 4 5 6 7 8 9 | |
The second query was even simpler, just a single table update. But when I ran this one with common_schema no rows were updated:
1 2 3 4 5 6 7 8 9 10 11 | |
I guess common_schema failed trying to split the table, so I tried a no op call to verify it’s splitting approach, and that returned nothing, which validated my hypothesis:
1 2 3 4 5 6 7 8 9 | |
The registration table has over 50 columns, and in addition to having an auto-increment primary key it also has other unique indexes, so of all my tables I’m not too surprised this one had a problem.
So now what do I do? I could submit a bug report for common_schema, or open up the code and try to fix it myself (and I’ll eventually do at least one of those two things), but I really wanted to make this work, so I decided to trick common_schema into doing it anyways. Since common_schema supports multi-table updates, I just rewrote my single table update as a gratuitous multi-table update so I could split on the second table.
First I tried joining to a parent table:
1 2 3 4 5 6 7 8 | |
But when I ran that it didn’t return anything for a few minutes, which was not a good sign. Rather than troubleshoot that query, I decided to try another join table.
Next I tried joining to a child table, using a where clause that I knew would give me a 1-1 relationship with the table I was updating:
1 2 3 4 5 6 7 8 9 10 | |
This time it worked! Here’s the end of the output:
1 2 3 4 5 6 7 8 9 | |
Now I can go back and look at the code and/or submit a bug report, but I was pleased that with a little help common_schema solved my use case. This was a reminder to me of the power of a good tool, and the value of being creative in the way you use that tool.
