Written and presented by Bill Freeman to
MonadLUG as "Man Page of the Month" 10-May-2007:
dd copies data with extra control and optional conversions.
dd is from the time of raw devices that couldn't correctly support
reads and/or writes of arbitrary sizes. Disk drives, for example,
must write the entire block, so the actual writes going to the
hardware must be a multiple of the block size. Even disk reads really
always happen over whole blocks; otherwise the block-based ECCs would be
useless. Tape drives were fussy in a different way: most supported
arbitrary block lengths, but performed poorly if the blocks were too
small (which is the reason that tar - TAPE archiver - lets you specify
a blocking factor). I believe that most devices on a Linux system
today take advantage of kernel buffering to achieve a suitable
operation block, but someone can always write a driver for something
weird that doesn't.
Thus dd accepts the bs argument, letting you specify the size of IO
operations. Specifying bs is nearly equivalent to setting both ibs,
the size used for reads, and obs, the size used for writes, to the
same value. (In gnu dd there is one subtle difference: with bs, and no
data-transforming conversion, each read is written out as soon as it
arrives, even if it's short, rather than being reblocked to obs.) But
you can also set them individually, so if you have to, you can copy
from a device using one block size to a device using even a mutually
prime block size; dd provides any necessary buffering. You will
usually want to specify these if the data set is large, since the
default, at least in the current gnu implementation, is 512-byte reads
and writes, which means lots of system calls and poorer performance
than if you used a large bs. You need not worry that the total size
isn't a multiple of bs, because dd will successfully copy the runt at
the end (if the files in question support it).
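To see the ibs/obs split in action, here is a minimal sketch that copies a scratch file with mutually prime read and write sizes (512 and 243). It assumes gnu dd on a Linux-like system with head -c and mktemp; the file names are hypothetical. The file size is deliberately not a multiple of either block size, so dd also has to copy a runt at the end.

```shell
set -e
cd "$(mktemp -d)"                      # scratch directory, nothing real touched
head -c 100000 /dev/urandom > in.bin   # 100000 bytes of test data (hypothetical name)

# Read 512 bytes at a time, write 243 bytes at a time; dd reblocks in between.
dd if=in.bin of=out.bin ibs=512 obs=243

# The copy is byte-for-byte identical, runt included.
cmp in.bin out.bin && echo "identical"
```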
Something to note about dd is that, unlike most commands, options that
take arguments do not have a leading "-". (In fact, any options at
all that do take a leading "-" are recent additions, at least compared to
my time with *nix.) Thus you say "bs=2048", NOT "-bs 2048" or
"--bs=2048".
The arguments to bs, et al. (and indeed all numeric arguments) accept
a suffix letter which acts as a multiplier. Thus "bs=2048" can also
be written "bs=4b" (where b is for standard 512-byte blocks), or
"bs=2k". "M" (1024*1024) is also available. Modern gnu versions also
support "K" and "KiB" as synonyms for "k", and "MiB" for "M"; "G" and
"GiB" likewise both mean 1024**3. "kB", "MB" and "GB" are available
for the power of 10 versions (as opposed to powers of 2, as in the xiB
versions). And there are the corresponding "T" variations for 10**12
or 2**40, and successively for additional factors of 1000 or 1024,
"P", "E", "Z", and "Y". From the old days we still carry the pretty
useless "c" (times 1, characters) and "w" (times 2, words), should you
run into them in a script.
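A quick way to convince yourself of the suffix arithmetic is to write one block of each size from /dev/zero and count the bytes. This is a sketch assuming gnu dd (the "kB" power-of-10 suffix is a gnu feature); the output file names are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
# Three spellings of 2048 bytes: 4 * 512, 2 * 1024, and plain bytes.
dd if=/dev/zero of=a.bin bs=4b count=1
dd if=/dev/zero of=b.bin bs=2k count=1
dd if=/dev/zero of=c.bin bs=2048 count=1
# Power-of-10 suffix (gnu): 2 * 1000 bytes.
dd if=/dev/zero of=d.bin bs=2kB count=1
wc -c a.bin b.bin c.bin d.bin
```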
Even if the file is small or performance really doesn't matter, ibs is
useful in that it sets the unit for the "count" and "skip" options,
and obs is useful because it sets the unit for the "seek" option.
These are useful for making files of given sizes, dropping the header
of a file, and patching a portion of a file. Yes, you can do that
with a decent editor, but that gets harder to automate, and a lot of
editors have trouble with binary and/or very large files (especially
if you don't have the available memory). There isn't a separately
named seek option for the input file; "skip" is it. On seekable
input, such as a regular file or a disk device, gnu dd skips with
lseek, so that is quick. On input that can't seek, such as a pipe or
a tape drive, dd has to read its way past the skipped data, which is
time consuming if that's a long way, and sad if the skipped region has
unreadable portions. (Some implementations, such as the BSD ones, also
accept "iseek" as a synonym for "skip".)
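The three uses above (a file of a given size, dropping a header, patching in place) can be sketched as follows. It assumes gnu dd; the 16-byte header and all file names are made up for illustration. Note that skip counts in ibs units, seek counts in obs units, and conv=notrunc keeps the rest of the patched file.

```shell
set -e
cd "$(mktemp -d)"

# A file of a given size: 1024 blocks of 1 KiB from /dev/zero = 1 MiB.
dd if=/dev/zero of=zeros.bin bs=1k count=1024

# Drop a (hypothetical) 16-byte header: skip=1 with ibs=16.
printf 'HEADER-16-BYTES!payload' > with_header.bin
dd if=with_header.bin of=payload.bin ibs=16 skip=1

# Patch 4 bytes at offset 8 (seek=2 with obs=4), without truncating.
printf 'ABCDEFGHIJKLMNOP' > target.bin
printf 'WXYZ' | dd of=target.bin obs=4 seek=2 conv=notrunc
cat target.bin; echo     # ABCDEFGHWXYZMNOP
```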
"count", if you haven't guessed, allows you to stop before the end of
the input file, particularly useful with /dev/zero, don't you think?
Or, make a backup of your boot record before playing with an old
Windows install disk:
sudo dd bs=1b count=1 < /dev/hda > /mnt/floppy/bootsave.blk
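If you want to rehearse the boot record backup without touching a real disk, the same command shape works on an image file. This sketch assumes gnu tools (head -c, mktemp); disk.img is a hypothetical stand-in for /dev/hda, so no sudo is needed.

```shell
set -e
cd "$(mktemp -d)"
# Stand-in for a disk: a 1 MiB image file (hypothetical disk.img).
head -c 1048576 /dev/urandom > disk.img
# One 512-byte block from the start, exactly as in the command above.
dd bs=1b count=1 < disk.img > bootsave.blk
# Check that the saved block matches the first 512 bytes of the image.
head -c 512 disk.img | cmp - bootsave.blk && echo "boot record saved"
```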
While dd uses stdin and stdout by default, and redirection with "<",
">" and ">>" do work, you can also specify the input file with "if"
and the output file with "of". For example "if=/dev/zero" or
"of=foo.bar". This makes it easier to exec dd from another program,
and reduces the impact of using a weird shell that doesn't get
redirection right, but most of us won't suffer that very often. It
might improve readability of a script in some folks' eyes, in that the
files can be specified near any related options, rather than having to
wait until after the last option, as must be done with redirection.
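The two styles produce the same result; only who opens the files differs. A small sketch, assuming a scratch directory and hypothetical file names:

```shell
set -e
cd "$(mktemp -d)"
# Operand style: the files are named next to the other options, dd opens them.
dd if=/dev/zero of=foo.bar bs=1k count=4
# Redirection style: the shell opens the files, dd uses stdin/stdout.
dd bs=1k count=4 < /dev/zero > foo.redirected
cmp foo.bar foo.redirected && echo "same result"
```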
A more significant difference is that, since dd opens the file, the
"iflag" or "oflag" argument can give additional control over the open
mode. The argument to these is a comma separated list of a subset of
the following symbols (no white space is allowed): append, direct,
dsync (synchronized I/O for data, but not necessarily metadata), sync
(synchronized I/O for both data and metadata), nonblock, nofollow
(don't follow symlinks), or noctty (don't acquire a controlling
terminal as a result of this file's opening). See the open system call
man page and possibly man pages for specific device files to see
exactly what the more obscure of these mean. Letting dd handle the
opens also allows some of the conversion options, below.
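Here is a sketch of appending through dd's own open, assuming gnu dd and a hypothetical log.txt. The conv=notrunc is there because gnu dd otherwise truncates an of= file before appending to it.

```shell
set -e
cd "$(mktemp -d)"
printf 'first\n' > log.txt
# oflag=append opens log.txt with O_APPEND; conv=notrunc stops gnu dd
# from truncating it first.
printf 'second\n' | dd of=log.txt oflag=append conv=notrunc
cat log.txt
```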
But wait! There's more! (It's not sold in any store!)
dd will also perform conversions on the data as it goes by, through
the use of the "conv" option. Multiple conversions can be specified,
when they make sense, as with flags, comma separated with no white
space.
Some obvious conversions include "lcase" and "ucase". "ascii" assumes
the input is EBCDIC and produces the corresponding ASCII. "ebcdic"
goes the other way. "ibm" is like "ebcdic", but converts to
"alternate EBCDIC". Slightly less obvious: "nocreat" requires the
output file to already exist, "excl" requires the output file to not
exist, and "notrunc" prevents truncation of the output file. (That is
not redundant with "oflag=append": gnu dd still truncates an "of" file
opened for append unless "notrunc" is also given.) "swab" exchanges
the order of each pair of input bytes. "noerror" continues past read
errors.
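A few of these conversions are easy to try from the shell. This sketch assumes gnu dd; the dd summary on stderr is discarded to keep the output readable, and the file names are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
# Case conversion.
printf 'Hello, dd\n' | dd conv=ucase 2>/dev/null   # HELLO, DD
printf 'Hello, dd\n' | dd conv=lcase 2>/dev/null   # hello, dd
# swab swaps each pair of bytes.
printf 'abcd' | dd conv=swab 2>/dev/null; echo     # badc
# excl refuses to overwrite an existing file; nocreat refuses to create one.
touch exists.txt
dd if=/dev/null of=exists.txt conv=excl 2>/dev/null || echo "excl: refused"
dd if=/dev/null of=missing.txt conv=nocreat 2>/dev/null || echo "nocreat: refused"
```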
A few of the conversions depend on the separate setting of the
conversion block size, using the "cbs" option, which is independent of
"ibs", "obs" or "bs". There's a lot of bs in this utility.
The "block" conversion takes newline as delimiting records, discards
the newline, and pads with spaces to make the record written be cbs
bytes long. "unblock" carves the input file into records every cbs
bytes, discards any trailing spaces, and adds a newline, reversing the
process. Now you can work with a punched card reader:
"conv=unblock,ascii".
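A round trip through block and unblock with an 8-byte cbs shows both directions. This is a sketch assuming gnu dd; cards.dat is a hypothetical name for the fixed-length "card image" file.

```shell
set -e
cd "$(mktemp -d)"
# Newline-delimited lines become fixed 8-byte records padded with spaces.
printf 'one\ntwo\nthree\n' | dd conv=block cbs=8 2>/dev/null > cards.dat
od -c cards.dat            # 24 bytes, no newlines
# And back: split every 8 bytes, strip trailing spaces, add newlines.
dd conv=unblock cbs=8 < cards.dat 2>/dev/null
```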
"sync" (despite the name, unrelated to the "sync" and "dsync" flags
for "iflag" and "oflag") pads with NULs to make the input blocks
actually be ibs long. Note that ibs otherwise just specifies how big a
read to ask the kernel for. The read system call can return fewer
bytes (especially if non-blocking I/O is specified). If used with the
"block" or "unblock" conversion, sync pads with spaces instead.
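A short read padded out to a full input block shows what conv=sync does. This sketch assumes gnu dd and a hypothetical padded.bin.

```shell
set -e
cd "$(mktemp -d)"
# A 3-byte read is padded out to a full 8-byte input block with NULs.
printf 'abc' | dd bs=8 conv=sync 2>/dev/null > padded.bin
od -c padded.bin       # shows a, b, c followed by five NUL bytes
```

The usual reason to care is the pairing "conv=noerror,sync" when reading a failing disk: an unreadable block is replaced by zeros instead of being dropped, so everything after it stays at the right offset.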
"fdatasync" and "fsync" suffer, in my copy of the man page on Feisty
Fawn, from formatting problems that make them look like part of the
description of "sync". The info page reads OK. These call the
fdatasync or fsync system call on the output file at the end, forcing
data buffered in the kernel out to the device before dd exits. The
difference between them is that "fsync" also requests that the file
metadata be written, while "fdatasync" only requests that the file
data be written.
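Both are used like any other conversion. A sketch assuming gnu dd and hypothetical output names:

```shell
set -e
cd "$(mktemp -d)"
# Write 1 MiB and have dd call fsync() on the output before it exits.
dd if=/dev/zero of=flushed.bin bs=64k count=16 conv=fsync
# fdatasync variant: the data is flushed, metadata not necessarily.
dd if=/dev/zero of=flushed2.bin bs=64k count=16 conv=fdatasync
```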
When all is said and done, dd prints a summary of IO. It tells how
many full ibs-sized records it read (i), and how many partial records
it read (j), like so: "i+j records in". For a regular file j is 0 or
1 (the runt at the end), but a pipe or terminal can return many short
reads, each of which counts as a partial record. Similarly, for
output, in terms of obs-sized records: "o+p records out". In older
dds that's all. In gnu dd on feisty, another line describes total
bytes copied, the time it took, and the calculated rate. Specifying
the "status" option with the "noxfer" value eliminates that extra gnu
summary line. In any event, whatever is printed goes to stderr, so it
won't be part of the output file even if you used redirection rather
than "of".
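With a regular file of known size, the record counts can be predicted. This sketch assumes gnu dd; 10000 bytes is 9*1024 + 784 on input and 10*999 + 10 on output. The file names are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
head -c 10000 /dev/urandom > in.bin
# Expect "9+1 records in" and "10+1 records out", plus the gnu bytes/rate line.
dd if=in.bin of=out.bin ibs=1024 obs=999 2> summary.txt
cat summary.txt
# status=noxfer drops the bytes/time/rate line, leaving just two lines.
dd if=in.bin of=out.bin ibs=1024 obs=999 status=noxfer 2> short.txt
cat short.txt
```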
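You can get a long-running dd to print summary information for how far
it has gotten by sending it the USR1 signal. (Only send it once dd is
actually running; a USR1 that arrives before dd has set up its handler
simply kills it.)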
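A sketch of the USR1 trick, assuming gnu dd on Linux. The copy from /dev/zero to /dev/null is just a long-running stand-in, and progress.txt is a hypothetical name. The first sleep gives dd time to install its handler, since the default action for USR1 is to terminate the process.

```shell
cd "$(mktemp -d)"
# A deliberately long copy, run in the background with stderr captured.
dd if=/dev/zero of=/dev/null bs=1M count=1000000 2> progress.txt &
pid=$!
sleep 1                  # let dd start and install its USR1 handler
kill -USR1 "$pid"        # gnu dd prints its running totals to stderr
sleep 1
kill "$pid"              # stop the demo copy
wait "$pid" 2>/dev/null || true
cat progress.txt
```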
gnu dd supports --help and --version options, the only "-" introduced
options that it supports.
There wasn't much in this man page that I didn't already know and that
sounds likely to be useful. I might use the block and unblock
conversions or iflags and oflags, but by the time I needed nonblocking
I/O and sync, I'd probably be writing in python rather than trying to
shell script. But you never know.