
Written and presented by Bill Freeman to MonadLUG as "Man Page of the Month" 10-May-2007:

dd copies data with extra control and optional conversions.

dd is from the time of raw devices that couldn't correctly support reads and/or writes of arbitrary sizes. Disk drives, for example, must write entire blocks, so the actual writes going to the hardware must be a multiple of the block size. Even disk reads really always happen over whole blocks; otherwise the block-based ECCs would be useless. Tape drives were fussy in a different way: most supported arbitrary block lengths, but performed poorly if the blocks were too small (which is why tar, the TAPE archiver, lets you specify a blocking factor). I believe that most devices on a Linux system today rely on kernel buffering, so the hardware still sees whole blocks whatever size you ask for, but someone can always write a driver for something weird that doesn't.

Thus dd accepts the bs argument, letting you specify the size of IO operations. Specifying bs is essentially the same as setting both ibs, the size used for reads, and obs, the size used for writes, to the same value. (GNU dd has one subtlety here. With bs, and no data-transforming conversion, each block is written out as soon as it is read, even if the read came up short.) You can also set ibs and obs individually, so if you have to, you can copy from a device using one block size to a device using a block size that is even mutually prime with the first; dd provides any necessary buffering. You will usually want to set these if the data set is large. The default, at least in the current GNU implementation, is 512-byte reads and writes, which means lots of system calls and poorer performance than a large bs would give. You need not worry that the total size isn't a multiple of bs, because dd will successfully copy the runt at the end (if the files in question support it).
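To see the buffering between mismatched sizes, here is a small sketch. It assumes GNU coreutils dd, and the file names are made up. It copies 16 bytes using 3-byte reads and 5-byte writes, so neither size divides the total, and then shows the record counts dd reports.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; in.bin/out.bin are made-up demo names.
LC_ALL=C; export LC_ALL          # keep dd's messages in English for grep
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# 16 bytes of input: not a multiple of either block size.
printf '0123456789abcdef' > in.bin

# Reads are 3 bytes and writes are 5 bytes; dd buffers in between.
dd if=in.bin of=out.bin ibs=3 obs=5 2> stats.txt
cat stats.txt
# Expect "5+1 records in" (five full 3-byte reads plus a 1-byte runt)
# and "3+1 records out" (three full 5-byte writes plus a 1-byte runt).
cmp in.bin out.bin && echo "copy is identical"
```

The "+1" on each side is the runt mentioned above. dd still copies it, which is why the output compares equal to the input.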

Something to note about dd is that, unlike most commands, options that take arguments do not have a leading "-". (In fact, any dd options with a leading "-" at all are recent additions, at least compared to my time with *nix.) Thus you say "bs=2048", NOT "-bs 2048" or "--bs=2048".

The arguments to bs, et al. (and indeed all numeric arguments) accept a suffix letter which acts as a multiplier. Thus "bs=2048" can also be written "bs=4b" (where b is for standard 512-byte blocks), or "bs=2k". "M" is also available. Modern GNU versions also accept "K" and "KiB" as synonyms for "k", "MiB" as a synonym for "M", and "G" and "GiB" for 2**30. "kB", "MB" and "GB" are available for the power-of-10 versions (as opposed to the powers of 2 of the xiB versions). And there are the corresponding "T" variations for 10**12 or 2**40, and successively, for additional factors of 1000 or 1024, "P", "E", "Z", and "Y". From the old days we still carry the pretty useless "c" (times 1, characters) and "w" (times 2, words), should you run into them in a script.
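The sketch below checks the suffix arithmetic. It assumes GNU coreutils dd, and the file names are only illustrative. It writes the same 2048 bytes three different ways, and a "kB" size to show the power-of-10 variant.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; file names are illustrative.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

dd if=/dev/zero of=blocks.bin bs=4b   count=1 2>/dev/null  # 4 * 512  = 2048
dd if=/dev/zero of=kib.bin    bs=2k   count=1 2>/dev/null  # 2 * 1024 = 2048
dd if=/dev/zero of=plain.bin  bs=2048 count=1 2>/dev/null  # no suffix
dd if=/dev/zero of=kb.bin     bs=1kB  count=1 2>/dev/null  # 1 * 1000 = 1000

for f in blocks.bin kib.bin plain.bin kb.bin; do
    printf '%s: %s bytes\n' "$f" "$(wc -c < "$f")"
done
```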

Even if the file is small or performance really doesn't matter, ibs is useful because it sets the unit for the "count" and "skip" options, and obs is useful because it sets the unit for the "seek" option. These are handy for making files of given sizes, dropping the header of a file, and patching a portion of a file. Yes, you can do that with a decent editor, but that is harder to automate, and a lot of editors have trouble with binary and/or very large files (especially if you don't have the memory available). Sadly, there isn't a seek option for the input file, so you may be stuck waiting for "skip" to read past the unwanted data. That is time consuming if it's a long way, and sad if the file being read has unreadable portions before the desired data. (There may be some implementations with "iseek", but the GNU one doesn't have it. In fairness, GNU dd's "skip" tries lseek first, so it only has to read its way forward on input that can't seek, such as a pipe or a tape.)
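Here is a sketch of the three uses just listed. It assumes GNU coreutils dd, and the file names and the 4-byte "header" are invented for the demo. The patch step relies on "conv=notrunc", described further down. Without it, dd would truncate the file when it opens it.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; file names and header size are made up.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# 1. A file of a given size: 3 blocks of 1 KiB of zeros (count is in ibs units).
dd if=/dev/zero of=zeros.bin bs=1k count=3 2>/dev/null

# 2. Drop a 4-byte header: skip one 4-byte input block.
printf 'HDR:payload' > with_header.bin
dd if=with_header.bin of=payload.bin ibs=4 skip=1 2>/dev/null

# 3. Patch bytes 2 and 3 (counting from 0) in place: seek past two 1-byte
#    output blocks, and keep dd from truncating the rest of the file.
printf 'abcdef' > patch_me.bin
printf 'XY' | dd of=patch_me.bin bs=1 seek=2 conv=notrunc 2>/dev/null

cat payload.bin; echo      # payload
cat patch_me.bin; echo     # abXYef
```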

"count", if you haven't guessed, allows you to stop before the end of the input file, particularly useful with /dev/zero, don't you think? Or, make a backup of your boot record before playing with an old Windows install disk:

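sudo sh -c 'dd bs=1b count=1 < /dev/hda > /mnt/floppy/bootsave.blk'

(The sh -c matters. With a plain "sudo dd ... < /dev/hda", your own unprivileged shell, not the root dd, would be opening /dev/hda and the output file.)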

While dd uses stdin and stdout by default, and redirection with "<", ">" and ">>" does work, you can also specify the input file with "if" and the output file with "of", for example "if=/dev/zero" or "of=foo.bar". This makes it easier to exec dd from another program, and reduces the impact of using a weird shell that doesn't get redirection right, but most of us won't suffer that very often. In some folks' eyes it also improves the readability of a script, since the files can be specified next to any related options rather than after the last option, where redirections conventionally go.

A more significant difference is that, since dd opens the file itself, the "iflag" or "oflag" argument can give additional control over the open mode. The argument to these is a comma-separated list (no white space allowed) drawn from the following symbols: append, direct, dsync (synchronized I/O for data, but not metadata), sync (synchronized I/O for data and metadata), nonblock, nofollow (don't follow symlinks), or noctty (don't acquire a controlling terminal, at least as a result of this file's opening). See the open system call man page, and possibly the man pages for specific device files, to see exactly what the more obscure of these mean. Letting dd handle the opens also makes some of the conversion options below possible.
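Here is a sketch of one open flag in action, assuming GNU coreutils dd; log.txt is a made-up name. It appends to an existing file. GNU dd still truncates the output file on open unless told otherwise, so "oflag=append" is paired here with "conv=notrunc".

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; log.txt is a made-up file name.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'first\n' > log.txt
# conv=notrunc keeps dd from truncating log.txt when it opens it;
# oflag=append (O_APPEND) then makes every write land at the end of the file.
printf 'second\n' | dd of=log.txt oflag=append conv=notrunc 2>/dev/null
cat log.txt
```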

But wait! There's more! (It's not sold in any store!)

dd will also perform conversions on the data as it goes by, through the use of the "conv" option. Multiple conversions can be specified, when they make sense, in the same way as the flags: comma separated, with no white space.

Some obvious conversions include "lcase" and "ucase". "ascii" assumes the input is EBCDIC and produces the corresponding ASCII. "ebcdic" goes the other way. "ibm" is like "ebcdic", but converts to "alternate EBCDIC". Slightly less obvious: "nocreat" requires the output file to already exist, "excl" requires the output file to not exist, and "notrunc" prevents truncation of the output file. (That is not redundant with "oflag=append". Without "notrunc", dd truncates the output file when it opens it, before any appending happens.) "swab" exchanges the order of each pair of input bytes. "noerror" continues past read errors.
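Here are a few of these conversions tried on tiny inputs. The sketch assumes GNU coreutils dd, and exists.bin is a made-up name. It shows a case change, a byte swap, and "excl" refusing to touch a file that is already there.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; exists.bin is a made-up file name.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

upper=$(printf 'hello, world' | dd conv=ucase 2>/dev/null)
swapped=$(printf 'abcd' | dd conv=swab 2>/dev/null)
echo "$upper"      # HELLO, WORLD
echo "$swapped"    # badc

# conv=excl makes dd fail rather than open an output file that already exists.
touch exists.bin
excl_status=0
dd if=/dev/null of=exists.bin conv=excl 2>/dev/null || excl_status=$?
echo "conv=excl exit status: $excl_status"   # non-zero, since exists.bin exists
```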

A few of the conversions depend on the separate setting of the conversion block size, using the "cbs" option, which is independent of "ibs", "obs" or "bs". There's a lot of bs in this utility.

The "block" conversion takes newline as delimiting records, discards the newline, and pads with spaces to make the record written be cbs bytes long. "unblock" carves the input file into records every cbs bytes, discards any trailing spaces, and adds a newline, reversing the process. Now you can work with a punched card reader: "conv=unblock,ascii".
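Here is a round trip through fixed-length records with 4-byte "cards", as a sketch assuming GNU coreutils dd; the file names are invented. "block" turns two short lines into two space-padded 4-byte records with no newlines, and "unblock" turns them back.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; file names are made up for the demo.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'ab\ncd\n' > lines.txt

# Newline-terminated lines -> fixed 4-byte records padded with spaces.
dd if=lines.txt of=cards.bin cbs=4 conv=block 2>/dev/null     # "ab  cd  "

# Fixed 4-byte records -> lines with trailing spaces stripped.
dd if=cards.bin of=back.txt cbs=4 conv=unblock 2>/dev/null    # "ab\ncd\n"

od -c cards.bin
cmp lines.txt back.txt && echo "round trip OK"
```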

"sync" (despite the name, unrelated to the "sync" and "dsync" settings for "iflag" and "oflag") pads each input block with NULs to make it actually be ibs long. Note that ibs otherwise just specifies how big a read to ask the kernel for. The read system call can return fewer bytes (especially if non-blocking I/O is specified). If used with the "block" or "unblock" conversion, "sync" pads with spaces instead.
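Here is what the padding looks like, as a sketch assuming GNU coreutils dd; short.txt and padded.bin are made-up names. A 3-byte input read with ibs=8 comes out as a full 8-byte block with five trailing NULs.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; file names are made up.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'abc' > short.txt
# The single short read (3 of 8 bytes) is padded with NULs to a full ibs block.
dd if=short.txt of=padded.bin ibs=8 conv=sync 2>/dev/null
od -c padded.bin     # a b c followed by five \0 bytes
```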

"fdatasync" and "fsync" suffer, in my copy of the man page on Feisty Fawn, from formatting problems that make them look like part of the description of "sync". The info page reads OK. These make dd call the fdatasync or fsync system call on the output file before it finishes, forcing data buffered in the kernel out to the device. The difference between them is that "fsync" also requests that the file metadata be written, while "fdatasync" only requests that the file data be written.
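Here is a sketch of both options, assuming GNU coreutils dd; the file names are made up. dd does not return until the kernel reports the data flushed, which is handy before pulling a USB stick. Whether the drive's own cache honours the flush is a separate question.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; file names are made up.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# conv=fsync: flush file data and metadata before dd exits.
dd if=/dev/zero of=synced.bin bs=64k count=4 conv=fsync 2>/dev/null
# conv=fdatasync: flush the file data only.
dd if=/dev/zero of=datasynced.bin bs=64k count=4 conv=fdatasync 2>/dev/null
ls -l synced.bin datasynced.bin
```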

When all is said and done, dd prints a summary of IO. It tells how many full ibs-sized records it read (i), and how many partial (short) reads there were (j), like so: "i+j records in". For a regular file j is 0 or 1, the runt at the end, but input from a pipe or terminal can come up short many times. Similarly, for output, in terms of obs-sized records: "o+p records out". In older dds that's all. In GNU dd on Feisty, another line gives the total bytes copied, the time it took, and the calculated rate. Specifying the "status" option with the "noxfer" value ("status=noxfer") eliminates that extra GNU line. In any event, whatever is printed goes to stderr, so it won't be part of the output file even if you used redirection rather than "of".
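Here is what the summary looks like for a 3-byte file copied with bs=2. The sketch assumes GNU coreutils dd and the C locale, and the file names are made up. The first run prints the extra "bytes copied" line, while the second, with status=noxfer, prints only the record counts.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils dd; file names are made up.
LC_ALL=C; export LC_ALL          # keep dd's messages in English for grep
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf 'abc' > three.bin
dd if=three.bin of=copy1.bin bs=2 2> full.txt
dd if=three.bin of=copy2.bin bs=2 status=noxfer 2> quiet.txt

cat full.txt    # 1+1 records in, 1+1 records out, then the bytes/time/rate line
cat quiet.txt   # only the two "records" lines
```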

You can get a long running dd to print summary information for how far it has gotten by sending it the USR1 signal.
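Here is a sketch of the USR1 trick, assuming Linux with GNU coreutils dd; BSD-derived dd uses SIGINFO instead. It starts a dd that would never finish on its own, asks it for an interim report, and then stops it. The one-second sleeps are a rough allowance for timing, not a guarantee.

```shell
#!/bin/sh
# Sketch, assuming Linux with GNU coreutils dd.
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A dd that runs until killed: zeros into the bit bucket.
dd if=/dev/zero of=/dev/null bs=1M 2> "$tmp/progress.txt" &
pid=$!
sleep 1                        # give dd time to start and install its handler
kill -USR1 "$pid"              # ask for a progress summary on stderr
sleep 1
kill "$pid"                    # then stop it
wait "$pid" 2>/dev/null || true
cat "$tmp/progress.txt"        # records in/out and bytes so far
```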

GNU dd supports the --help and --version options, the only options introduced with "-" that it accepts.

There wasn't much in this man page that was both new to me and likely to be useful. I might use the block and unblock conversions, or iflag and oflag, but by the time I needed nonblocking I/O and sync, I'd probably be writing in Python rather than a shell script. But you never know.