Chain Copying to Multiple Hosts
This week I was given the task of repopulating our entire primary database cluster, due to an ALTER that had to be run on our largest table. It was easiest to run the ALTER on one host and then populate the dataset everywhere from that host.
I recalled reading a blog post from Tumblr a while back about how to chain a copy to multiple hosts using a combination of nc, tar, and pigz. I used this, along with a few other things, to greatly speed up our repopulation process. Since I was repopulating production servers, I used a combination of raw data copies and xtrabackup streams, depending on each server's position in our replication setup.
For a normal straight copy, here’s what I did:
On the last host, configure netcat to listen, then pipe the output through pigz to decompress and tar to unpack. This needs to be run in the destination directory:
nc -l 1337 | pigz -d | tar xvf -
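The receive pipeline can be exercised locally without any network at all, which is a handy sanity check before wiring up real hosts. This sketch uses gzip as a stand-in for pigz (pigz reads and writes the same format, so `pigz -d` decompresses either), and a plain pipe where `nc -l 1337` would sit; the directory names are made up for the demo:

```shell
#!/bin/sh
# Local round-trip sketch of the receive pipeline (no network involved).
# gzip stands in for pigz; the formats are interchangeable.
set -eu

src=$(mktemp -d)   # hypothetical source directory
dst=$(mktemp -d)   # hypothetical destination directory
echo "hello" > "$src/file.txt"

# Same shape as the destination-host command: stream | decompress | untar.
# On the real hosts the stream arrives via `nc -l 1337` instead of a pipe.
(cd "$src" && tar -cf - file.txt) | gzip | gzip -d | (cd "$dst" && tar -xf -)

cat "$dst/file.txt"   # -> hello
```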
On any host in the middle of the chain, you do the same thing with one extra step: a fifo (created with mkfifo) redirects the stream to the next host. Create the fifo, start the forwarding netcat in the background, then start the listener:
mkfifo copy_redirect
nc next_host_in_chain 1337 <copy_redirect &
nc -l 1337 | tee copy_redirect | pigz -d | tar xvf -
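The tee-plus-fifo fan-out is the heart of the chain, and it can also be demonstrated locally. In this sketch, files stand in for the two netcat processes, and `relay` plays the role of copy_redirect; everything else is the same shape as the middle-host commands:

```shell
#!/bin/sh
# Sketch of the tee-plus-fifo fan-out, with files standing in for netcat.
set -eu

work=$(mktemp -d)
cd "$work"
mkfifo relay                  # the fifo must exist before tee writes to it

# Downstream reader (real middle host: nc next_host_in_chain 1337 <relay &)
cat relay > downstream.out &

# Incoming stream (real middle host: nc -l 1337 | tee relay | pigz -d | ...)
printf 'payload' | tee relay > local.out
wait

cmp downstream.out local.out && echo "both copies match"
```

Each byte that arrives is consumed locally and forwarded downstream at the same time, which is why adding hosts to the chain costs so little.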
And on the source host you actually create the stream. This is where I differed most from what Tumblr had written: I added a progress bar using pv.
tar cf - /data/mysql/ | pv --size $( du -sh /data/mysql/ | cut -f1 ) | pigz | nc first_host_in_chain 1337
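One caveat with the size argument: `du -sh` prints a rounded, human-readable figure (e.g. "1.4T"), which pv accepts but which limits the bar's accuracy. `du -sb` (GNU du) gives an exact apparent byte count instead. A small sketch with a made-up data directory:

```shell
#!/bin/sh
# Sketch: computing the --size argument for pv. `du -sh` yields a rounded,
# human-readable figure; `du -sb` (GNU du) yields exact apparent bytes.
set -eu

data=$(mktemp -d)                             # stand-in for /data/mysql/
head -c 1048576 /dev/zero > "$data/ibdata1"   # 1 MiB of dummy data

human=$(du -sh "$data" | cut -f1)   # e.g. "1.1M" -- what the post uses
bytes=$(du -sb "$data" | cut -f1)   # exact byte count, better for pv

echo "human: $human  bytes: $bytes"
# The real invocation would then be:
#   tar cf - /data/mysql/ | pv --size "$bytes" | pigz | nc first_host_in_chain 1337
```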
To do this with an xtrabackup stream, the commands are similar. On each receiving host, tar needs the “i” flag (becoming “tar xvfi -”), since the backup stream is effectively several tar archives concatenated together. The progress bar here became slightly less accurate, but was still a good rough estimate of the progress. On the source host, the command became:
innobackupex --stream=tar /tmp/ --slave-info | pv --size $( du -sh /data/mysql/ | cut -f1 ) | pigz | nc first_host_in_chain 1337
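The “i” flag is tar's --ignore-zeros. Without it, tar stops at the first end-of-archive marker, which is why a concatenated stream like xtrabackup's would otherwise be truncated. A minimal local demonstration of the difference, using two throwaway archives:

```shell
#!/bin/sh
# Why the "i" (--ignore-zeros) flag matters: concatenated tar archives.
set -eu

work=$(mktemp -d)
cd "$work"
echo a > a.txt; echo b > b.txt
tar -cf one.tar a.txt
tar -cf two.tar b.txt
cat one.tar two.tar > both.tar   # two archives back to back

mkdir plain ignore
tar -xf both.tar -C plain        # stops at the first end-of-archive marker
tar -xif both.tar -C ignore      # -i reads past it

ls plain    # a.txt only
ls ignore   # a.txt and b.txt
```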
I found that using this method, for a raw copy, I was able to sustain between 300 and 350 MB/s copying large tables; smaller tables averaged lower speeds. I didn't do enough testing to find the bottleneck, but I can say it was not network, CPU, or I/O: the servers involved have 10 Gbit network and FusionIO drives. Increasing the compression level may have added some throughput here as well. Copying a 1.4 TB dataset to 3 destination servers took under 2 hours.
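Those numbers pass a back-of-the-envelope check (assuming decimal units and taking "under 2 hours" as a full 7200 seconds, so this is a lower bound). Because the chain replicates the stream at each hop, every destination receives the full dataset at this rate simultaneously:

```shell
#!/bin/sh
# Rough sanity check: 1.4 TB in 2 hours implies ~194 MB/s sustained.
set -eu

tb_mb=1400000    # 1.4 TB expressed in MB (decimal)
seconds=7200     # 2 hours
echo "$(( tb_mb / seconds )) MB/s"   # -> 194 MB/s average, a lower bound
```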
This is definitely a tool that I will be adding to my arsenal to use on a regular basis.