When I was young, I downloaded a lot, and burned a lot of CDRs and DVD-Rs. Nothing would ever fit onto them, so I’d have to split movies up, or put some episodes of an anime series here and another there, and that disc had a few megabytes free, so there went a few more episodes… this was a major pain in the ass.
I have also since amassed a 600GB music+photos collection, which I’d hate to lose to silent data corruption, drive failure, ransomware, theft or whatever. No matter how many layers Bluray had, it just wasn’t enough – imagine figuring out how to split all this between 12 BD-R DLs.
The answer: a Quantum LTO4 drive (~200EUR). It’s much longer than a Bluray drive, much noisier, much hotter, and even requires an additional SAS HBA (so all in all, very exotic and sexy). Each tape holds 800GB and costs 10-20EUR, and unlike DVD-Rs, which go bad, tape is much more reliable (a lot of digital movie footage, especially digitized film, is stored on tape these days).
How it works
The drive writes data on one small part of the width of the tape, winds to the end, shifts the head down ever so slightly, and winds backwards. It’s pretty smart actually – it means if you finish writing, you don’t need to rewind; and if you need something in the middle, you don’t have to wind through so much tape.
An LTO4 drive makes 896 such tracks on the tape.
How it really works
Of course it’s not as easy as that. For starters, a read/write head can only work so fast. So you have to write several tracks at once. And coordinating the writing of multiple tracks such that they still end up in a back-forth pattern?
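To put rough numbers on the back-and-forth pattern, here is a small sketch. The 896 tracks come from the text above; the 16 tracks written per pass is my assumption (it is the figure usually quoted for LTO4, not something stated in this post), and the function name is made up for illustration.

```python
TOTAL_TRACKS = 896     # from the text above
TRACKS_PER_PASS = 16   # assumption: commonly quoted for LTO4, not from this post

# Number of end-to-end trips the head makes to fill the whole tape.
passes = TOTAL_TRACKS // TRACKS_PER_PASS


def direction(pass_index):
    """Even passes run start -> end, odd passes run end -> start."""
    return "forward" if pass_index % 2 == 0 else "backward"


print(passes, "end-to-end passes")
for i in range(4):
    print("pass", i, "runs", direction(i))
```

Under that assumption the number of passes is even, so a completely full tape would finish back near its starting end.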
Also, you don’t need that much surface area to read the magnetic recording, but the writing head cannot actually write that small. So when writing the next track, you have to overlap the previous one slightly to save space. This is called shingling, and some hard drives do it too.
Verifying the data after it is written is important too. Simply place another read/write head behind the current read/write head, and it can read the tape right after the head in front writes it.
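The read-after-write idea can be sketched as a toy loop. This is not how the drive firmware does it; the tape is just a Python list, the "flaky head" is a hypothetical stand-in for a bad write, and I am assuming a failed block is simply written again further along.

```python
import zlib


def write_verified(tape, block, write_head, max_attempts=3):
    """Append block to tape, read it straight back, retry on mismatch."""
    expected = zlib.crc32(block)
    for attempt in range(1, max_attempts + 1):
        tape.append(write_head(block))   # the head in front writes
        read_back = tape[-1]             # the head behind reads the same spot
        if zlib.crc32(read_back) == expected:
            return attempt
        # The bad copy stays where it is; the next attempt lands after it.
    raise IOError("block could not be written cleanly")


def make_flaky_head():
    """Hypothetical write head that corrupts only its first write."""
    calls = {"n": 0}

    def head(block):
        calls["n"] += 1
        if calls["n"] == 1:
            return b"\x00" + block[1:]   # flip the first byte
        return block

    return head


tape = []
attempts = write_verified(tape, b"hello tape", make_flaky_head())
print("needed", attempts, "attempts;", len(tape), "blocks on tape")
```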
The gory details are found in the IBM research paper LTO – A better format for mid-range tape.
Blocks and Filesystems
To the OS, though, this is all irrelevant. The drive pretends the tape is just one long row of bytes, from the first to the last.
This is nothing new, though. CDs, DVDs and Blurays store data in a spiral track, starting from the inside of the disc to the outside. Records start from the outside and go inside. Hard drives store data in concentric circular tracks instead.
But who wants to deal with so many bytes individually? Plus, it’s difficult for computers to keep track of arbitrarily large numbers. Better to group bytes into blocks. Let’s say, blocks of 128KB.
800,000,000,000 / 128000 = 6,250,000
There, that’s a much more manageable amount. We can say pornmovie.avi occupies blocks 0-1500, and smallfile.txt occupies block 66704. Of course, if smallfile.txt is only 1KB, we just wasted 127KB of space, but it’s a good enough tradeoff.
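The block arithmetic is easy to check in a few lines. This is only a sketch: it takes 128KB to mean 128,000 bytes, as the division above does, and the helper names are made up.

```python
TAPE_BYTES = 800_000_000_000
BLOCK_BYTES = 128_000   # "128KB" as used in the division above


def blocks_needed(file_bytes):
    """Whole blocks a file occupies (rounded up)."""
    return -(-file_bytes // BLOCK_BYTES)


def wasted_bytes(file_bytes):
    """Unused space in the file's last block."""
    return blocks_needed(file_bytes) * BLOCK_BYTES - file_bytes


total_blocks = TAPE_BYTES // BLOCK_BYTES
print("blocks on the tape:", total_blocks)
print("a 1KB file wastes", wasted_bytes(1_000), "bytes")
```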
And of course, at this point, there is only data at 0-1500, and 66704, but so far only we, the humans, know that there is something at 0-1500, and it’s supposed to be pornmovie.avi. This fact still needs to be stored somewhere!
Let’s reserve blocks 6,000,000 – 6,250,000 for this kind of information. This is called a filesystem, like FAT32/NTFS/HPFS/APFS/exFAT/ext2-4/ZFS/btrfs/ReiserFS/XFS. There are lots of them out there.
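Stripped to the bone, that bookkeeping is a table from file names to block ranges. The sketch below is a toy, not how any of the filesystems named above work; the class and method names are invented, and the only numbers taken from the text are the example files and the reserved area starting at block 6,000,000.

```python
RESERVED_START = 6_000_000   # first block set aside for the index, per the text


class FileTable:
    """Toy index: file name -> (first_block, last_block), both inclusive."""

    def __init__(self):
        self.entries = {}

    def add(self, name, first, last):
        if first > last or first < 0 or last >= RESERVED_START:
            raise ValueError("bad block range for " + name)
        for other, (a, b) in self.entries.items():
            if first <= b and a <= last:   # the two ranges share a block
                raise ValueError(name + " would overwrite " + other)
        self.entries[name] = (first, last)

    def owner(self, block):
        """Name of the file stored in this block, or None if it is free."""
        for name, (a, b) in self.entries.items():
            if a <= block <= b:
                return name
        return None


table = FileTable()
table.add("pornmovie.avi", 0, 1500)
table.add("smallfile.txt", 66704, 66704)
print(table.owner(1000), table.owner(66704), table.owner(2000))
```

The overlap check in add() is the part that matters most on tape: it refuses a new file whose blocks would land on top of an existing one.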
Unfortunately there is only one filesystem for tape (LTFS), and it is only for LTO5 and above, so I had to keep track of all this by hand. Let me tell you I was not happy having to learn all this – but I am grateful.
After dealing with all this shit I went on eBay and bought an LTO5 drive for LTFS, so I wouldn’t have to deal with blocks anymore, or accidentally overwrite the last bit of my previous file with the next file.
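/dev/nst0 is the tape drive; the n stands for non-rewinding, so the tape stays wherever the last command left it.
/dev/st0 is the same thing, except after the command, st0 will rewind to the beginning of the tape. Use it if you’re only going to store one backup/file and need to automate the whole process.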
Always do this after putting the tape in the drive. The block size is up to you.
mt -f /dev/nst0 stsetoptions scsi2logical
mt -f /dev/nst0 setblk 128k
This line used to work, but now dd errors out with Invalid argument. This really pisses me off. Somehow the block size set with mt -f setblk and the one dd writes with must be multiples of each other:
tar cf - /data/music -b 256 | mbuffer -m 4G -P 100% | dd of=/dev/nst0 bs=256k
This works just fine. You don’t really need a RAM buffer.
tar cf /dev/nst0 /data/music -b 256
tar has a ‘blocking factor’. If it doesn’t match the block size, you will get this error at the end:
tar: /dev/nst0: Cannot write: Invalid argument
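For the record, tar counts its blocking factor in units of 512 bytes, so -b 256 means 128KiB records, and GNU tar pads the end of the archive with zeros up to a whole record. The sketch below only does that arithmetic. It assumes the k in setblk 128k means 1024 bytes and that the drive wants every write to be a multiple of its block size; the function names are made up.

```python
TAR_UNIT = 512             # tar's blocking factor counts 512-byte units
DRIVE_BLOCK = 128 * 1024   # setblk 128k, assuming k means 1024 bytes


def record_bytes(blocking_factor):
    """Size of one tar record for a given -b value."""
    return blocking_factor * TAR_UNIT


def padded_size(payload_bytes, blocking_factor):
    """Archive size once tar has padded the last record with zeros."""
    rec = record_bytes(blocking_factor)
    return -(-payload_bytes // rec) * rec


def fits_drive(blocking_factor, drive_block_bytes=DRIVE_BLOCK):
    """True if every tar record is a whole number of drive blocks."""
    return record_bytes(blocking_factor) % drive_block_bytes == 0


print(record_bytes(256), fits_drive(256))   # -b 256
print(record_bytes(20), fits_drive(20))     # GNU tar's default, -b 20
```

With these assumptions -b 256 lines up with a 128k drive block and tar's default of 20 does not, which would explain the error above.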
[shinichi@cell 860evo]$ tar cf - data/ | md5sum
0c3889da7b395803ed3837767876165e  -
[shinichi@cell 860evo]$ tar cf data.tar data/ && md5sum data.tar
0c3889da7b395803ed3837767876165e  data.tar
[shinichi@cell 860evo]$ tar cf - data/ -b 256 |md5sum
33ea43c0fceafbf47d6846a34466ff9e  -
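Neither tar -b 256 nor dd bs=128k affects the actual data! They only affect read/write speed. (The tar checksum changes with -b 256 only because tar pads the end of the archive with zeros to fill the last record; the files inside are the same.)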
[shinichi@cell 860evo]$ dd if=data.tar bs=256k | md5sum
449+0 records in
449+0 records out
117702656 bytes (118 MB, 112 MiB) copied, 0,199188 s, 591 MB/s
33ea43c0fceafbf47d6846a34466ff9e  -
[shinichi@cell 860evo]$ dd if=data.tar bs=128k | md5sum
898+0 records in
898+0 records out
33ea43c0fceafbf47d6846a34466ff9e  -
117702656 bytes (118 MB, 112 MiB) copied, 0,217331 s, 542 MB/s
[shinichi@cell 860evo]$ dd if=data.tar | md5sum
229888+0 records in
229888+0 records out
117702656 bytes (118 MB, 112 MiB) copied, 0,220926 s, 533 MB/s
33ea43c0fceafbf47d6846a34466ff9e  -
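The same point can be checked without a tape drive or dd at all. A minimal sketch: hash the same bytes while reading them in different chunk sizes, and the checksum should not care. The random test data and the function name are mine; the chunk sizes mirror dd's default of 512 bytes plus the two bs values used above.

```python
import hashlib
import io
import os


def md5_in_chunks(data, chunk_bytes):
    """MD5 of data, fed to the hash chunk_bytes at a time (like dd bs=...)."""
    digest = hashlib.md5()
    stream = io.BytesIO(data)
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


data = os.urandom(1_000_000)   # stand-in for data.tar
sums = {md5_in_chunks(data, n) for n in (512, 128 * 1024, 256 * 1024)}
print("distinct checksums:", len(sums))
```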
Figuring this out cost me days, and it would’ve cost me more if I hadn’t plugged in an SSD. I can’t wait for my LTO5 tape drive.
But I have unlimited, reliable storage
In the end, though, I managed to write the entire filmography of the Shaw Brothers studio and Studio Ghibli to tape, freeing up 300GB. I don’t have to worry about dropping the tape. And if I need 800GB more storage, it’s just 10-20EURs away.
I’m still longing for the day when I can just drag and drop the files onto a 1.5TB LTO5 tape.