Get to Know tar, Linux’s Original Packaging Format
Back in the day, regular file backups were recorded on magnetic tape. It made sense to gather all the files requiring backup into a single file and then dump it out to the tape drive, all at once. Tape drives didn’t like to start and stop, so doing everything in one shot cut down on drive wear and tear and the time needed for getting the data onto the tape. Although tape drives have largely disappeared from everyday desktop use, backups are still important and consolidating an archive into a single large file is still effective.
The chief tape archiver in the Linux/Unix world was tar, which originally stood for Tape Archiver. It is still pretty much universally used as a packaging mechanism, where multiple files are gathered into a single file. After creating an archive file, you can copy it to your backup drive or a USB thumb drive, or send it to a network storage device.
Lots of Linux distributions and applications roll their file sets up into a single file for easy downloading. The tar command has a million options. We’ll look at a few of the basics today.
Creating a tar File
Suppose I’m sitting on my Linux notebook and I want to transfer a collection of files to another Linux machine on the network. Those files include the .deb files needed to build the LibreOffice application, a graphic file and an old conference tech talk slide stack. I could transfer them using rcp, but moving individual files that way is certainly tedious.
First, all the files went in a directory called “examples”. In my situation, the .deb files actually reside in a subdirectory, as does the .readme file.
On the command line, move down into the examples directory.
robnotebook% cd examples
A directory listing there shows the individual files along with the DEBS and readmes directories. tar can handle multiple levels of directories.
Next, use the tar command with the create (-c) option to build a .tar file. Using a “*” grabs all the files and directories in the current directory (except hidden ones whose names start with a dot) and puts them in the resulting tar file, in this case rob.tar. The file (-f) option tells tar what you want to call your archive. tar will silently do its work and return you to the command line when finished. On large archives, it might take a while to come back, so be patient. If need be, you can break out of tar with Ctrl-C.
robnotebook% tar -cf rob.tar *
You can also specify what you archive with a list.
robnotebook% tar -cf rob.tar DEBS readmes steampunk06192016-01.png talk-oscon2014.odp
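If you want to try both forms without touching real data, here is a minimal sketch that builds a throwaway copy of the layout described above and archives it both ways. The scratch directory, the placeholder contents and the names sample.deb and notes.readme are assumptions for illustration; only the directory names and the two top-level file names come from my example.

```shell
# Build a throwaway "examples" directory (placeholder contents, not the real files).
work=$(mktemp -d)
mkdir -p "$work/examples/DEBS" "$work/examples/readmes"
echo "placeholder package" > "$work/examples/DEBS/sample.deb"       # hypothetical name
echo "placeholder notes"   > "$work/examples/readmes/notes.readme"  # hypothetical name
echo "placeholder image"   > "$work/examples/steampunk06192016-01.png"
echo "placeholder slides"  > "$work/examples/talk-oscon2014.odp"

cd "$work/examples"

# Wildcard form: the shell expands * into the list of names before tar runs.
tar -cf "$work/wild.tar" *

# Explicit list form.
tar -cf "$work/list.tar" DEBS readmes steampunk06192016-01.png talk-oscon2014.odp

# Both archives should hold the same members.
tar -tf "$work/wild.tar" | sort > "$work/wild.txt"
tar -tf "$work/list.tar" | sort > "$work/list.txt"
diff "$work/wild.txt" "$work/list.txt" && echo "same contents"
```

The sketch writes the archives outside the directory being archived, so a later wildcard run can’t pick up an older archive by accident.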
In addition to bundling up a collection of files and directories into a single file, we can also compress the output. Several compression schemes are available, such as gzip (-z option) and bzip2 (-j option). Some are fast with little compression and some are slow with more compression.
robnotebook% tar -czf rob.tar.gz *
This example produces a tar.gz file with the gzip compression algorithm. In this case using gzip only reduced the output file size by 3 MB, going from 224 to 221 MB. That’s largely because .deb packages, PNG images and .odp files are already compressed internally, so gzip has little left to squeeze. Compression varies according to the types of files and other factors.
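To see how much the type of data matters, here is a rough sketch that archives two scratch files of the same size, one made of repetitive text and one of random bytes, with and without gzip. The file names and the 1 MiB size are assumptions picked for the demonstration; they are not part of my example set.

```shell
# Scratch data: one highly repetitive text file, one file of random bytes.
work=$(mktemp -d)
cd "$work"
yes "the quick brown fox" | head -c 1048576 > text.dat
head -c 1048576 /dev/urandom > random.dat

# Archive each file with and without gzip.
tar -cf  text.tar      text.dat
tar -czf text.tar.gz   text.dat
tar -cf  random.tar    random.dat
tar -czf random.tar.gz random.dat

# Compare sizes in bytes.
for f in text.tar text.tar.gz random.tar random.tar.gz; do
    printf '%-14s %s bytes\n' "$f" "$(wc -c < "$f")"
done
```

The repetitive file should shrink dramatically, while the random one barely changes. Already-compressed files behave much like the random one.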
Occasionally, we might want to exclude a file or files from a tar archive. Use the “--exclude=” option.
robnotebook% tar -cf robex.tar --exclude='*.tar' *
In this case, the robex.tar excludes any previously created tar files.
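A quick way to convince yourself the exclusion works is to try it in a scratch directory and list the result. This is only a sketch: notes.txt and old.tar are made-up stand-ins, with old.tar playing the part of a previously created archive.

```shell
work=$(mktemp -d)
cd "$work"
echo "notes" > notes.txt
echo "stale" > old.tar          # stands in for a previously created archive

# Quote the pattern so tar, not the shell, interprets the wildcard.
tar -cf robex.tar --exclude='*.tar' *

# List the result; old.tar should be absent.
tar -tf robex.tar
```

Putting the --exclude option ahead of the file list, as here, keeps it in effect for every name that follows.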
If you use the verbose (-v) option when creating a tar file, you can get a rough sense of the command’s progress by watching the file names go by as they are written to the tar file. Other archiving programs give you a progress meter. tar doesn’t really have that function.
Extracting a tar File
We’ve looked at creating tar files. Creating is only half of the equation. We also need to know how to turn the tar file back into directories and files.
In the simplest form, you extract the files from a tar file into the original file/directory structure, with the following Linux command line.
robnotebook% tar -xvf [tarfile.tar]
The “x” option stands for extract, while the “v” option gives verbose output. The “f” designates the file name. The normal output is just a list of the files extracted that scroll up the screen.
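To check that what comes out matches what went in, a round trip in a scratch area is a handy test. The sketch below is one way to do it under a few assumptions: the directory and file names are invented, and it uses the -C option (not covered above) to tell tar which directory to extract into.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/readmes" "$work/restore"
echo "hello"  > "$work/src/readmes/notes.readme"   # hypothetical file
echo "slides" > "$work/src/talk.odp"               # hypothetical file

# Create the archive from inside the source directory.
cd "$work/src"
tar -cf "$work/rob.tar" *

# Extract into a separate directory; -C changes directory before extracting.
tar -xvf "$work/rob.tar" -C "$work/restore"

# diff -r prints nothing and succeeds when the two trees match.
diff -r "$work/src" "$work/restore" && echo "restore matches source"
```

Extracting somewhere other than the original location also avoids overwriting newer copies of the same files.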
Likewise, you can have tar run the archive through gzip (-z option) or bzip2 (-j option) for decompression while extracting. Lately a simple tar -xvf [filename] seems to work without them, at least on a fairly current version of Xubuntu, because recent versions of GNU tar detect the compression type on their own when reading an archive. You may still have to use the -z, -j and other decompression options on older revisions of tar.
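Whether your particular tar picks the decompressor on its own is easy to check. Here is a small sketch that extracts the same gzip-compressed archive twice, once with the explicit -z option and once without it, and reports what happened. The file and directory names are placeholders, and -C (extract into the named directory) is used only to keep the two results apart.

```shell
work=$(mktemp -d)
cd "$work"
echo "sample" > sample.txt
tar -czf sample.tar.gz sample.txt
mkdir explicit auto

# Explicit: tell tar the archive is gzip-compressed.
tar -xzf sample.tar.gz -C explicit

# Implicit: let tar work out the compression, if this tar supports it.
if tar -xf sample.tar.gz -C auto 2>/dev/null; then
    echo "this tar detected gzip on its own"
else
    echo "this tar needs an explicit -z"
fi
```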
You may want to see what’s in a tar file before extraction. Use the -t option, together with -f and the archive name, to get a file listing of everything in the archive. Add the -v option to get permissions, owners, file sizes, modification dates and so on.
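As a short illustration, the sketch below builds a tiny archive and lists it both ways. The file names are placeholders, not the real files from my examples directory.

```shell
work=$(mktemp -d)
cd "$work"
mkdir readmes
echo "notes"  > readmes/notes.readme   # hypothetical file
echo "slides" > talk.odp               # hypothetical file
tar -cf rob.tar readmes talk.odp

# Names only.
tar -tf rob.tar

# Long listing: permissions, owner, size, timestamp and name.
tar -tvf rob.tar
```

Nothing is written to disk by either listing, so it’s a safe first step with an archive you didn’t create yourself.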
Mind the File Sizes
You should be mindful of disk usage any time you work with archive files. They can get huge. For example, if you plan to back up 1 GB of data made up of various files and directories, you will need roughly another 1 GB of free space on your disk to hold the newly created tar file. Bottom line, don’t fill up your 100 GB disk and then expect to be able to create an all-inclusive tar file on that same disk. It might not be a concern if you have three other empty 100+ GB partitions or external drives. That’s one reason I usually break up my drives into multiple partitions.
In my example above, the amount of space used by the directories and files is as follows.
DEBS – 207 MB
readmes – 24 KB
steampunk06192016-01.png – 476 KB
talk-oscon2014.odp – 16 MB
Adding them up comes out to about 223 MB of disk space. Using tar without any compression creates an archive file of about that same size. Using compression it might be smaller.
It’s always a good idea to move your archive files off to external storage anyway, perhaps even to multiple locations, just to be safe in case of disaster.
Use the df and du commands to keep track of your disk space and estimate if you have enough room on your drive for a large tar file. You might want to regularly clear out any old unneeded tar files in your “Downloads” directory or after you’ve copied a tar file to another machine. Having an unchecked list of 1-2GB tar files can eat up a disk pretty quickly.
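One way to put du and df to work is a quick pre-flight check before creating a big archive. The following is a rough sketch, not a polished tool: the scratch directory stands in for whatever you plan to archive, and the variable names are my own invention. It compares the space the files occupy with the free space on the filesystem that would hold the tar file.

```shell
# Scratch directory standing in for the data you plan to archive.
src=$(mktemp -d)
head -c 1048576 /dev/urandom > "$src/sample.dat"

# Space the files occupy, in kilobytes (roughly the size of an uncompressed tar).
needed_kb=$(du -sk "$src" | cut -f1)

# Free space, in kilobytes, on the filesystem that will hold the archive
# (here, the one containing the current directory).
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')

echo "need about ${needed_kb} KB, have ${avail_kb} KB free"
if [ "$needed_kb" -lt "$avail_kb" ]; then
    echo "enough room for an uncompressed archive"
else
    echo "not enough room; write the archive to another drive"
fi
```

The estimate is deliberately conservative, since a compressed archive may come out smaller than the du figure.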
tar is a great program for rolling up a bunch of files into a single unit you can copy over to another location or machine. It’s commonly used for transferring disk images and application builds.
We’ve only scratched the surface with tar. Do a little poking around on the web and you’ll find all kinds of tips and tricks, since tar has been around since forever. We’ll explore some of the other options in a future tutorial.
Contact Rob “drtorq” Reilly for consultation, speaking engagements and commissioned projects at firstname.lastname@example.org or 407-718-3274.
Feature image by siala from Pixabay.