On Thu, 22 Jan 2009, Shawn Fertch wrote:

> A little clarification is in order here...
>
> While rsync does a comparison, it only copies files that have changed.
> Tar/rcp/scp/find-cpio/etc will typically copy the entire contents
> depending upon the parameters specified.
>
> If rsync is possible, I would highly recommend using that instead. It
> will preserve file permissions, ownership, date/time, etc. scp/sftp
> will not unless you tar the directory's contents up and move the
> tarball over.
>
> While it's true that rsync does chew up time doing the comparison, it's
> been my experience that rsync (even with the comparison) is most times
> faster than other methods**. Given the fact that it keeps permissions
> so that I don't have to reset anything, even faster. Also, if this is
> going to be an ongoing transfer of files within the directory, much
> faster in that it only does the files/directories which have changed.

If every file has to be moved, the comparison is wasted time, but if the files are large and most of them do not have to be moved, the comparison can save a great deal of time, especially if the network is slow. (A sample rsync invocation is at the end of this message.)

It happens that I started to write the info below a couple of months ago to share with this list and never finished it, so I'm finishing it now. My problem was to copy many files from one machine to another, where none of the files existed yet on the target machine. I really just wanted to make a gzipped tar file (.tgz) and send it to the other machine, but I didn't have much free disk space on the source machine, so I had to do a little work to figure out the tricks. Read on:

I want to move files from one GNU/Linux box to another. The disks are nearly full on the box the files are currently on, so I can't write a .tgz on the source machine and then send that file over. The data are about 13GB uncompressed and about 3.7GB in .tgz format. This is how I got the latter number:

tar zpcf - directory | wc -c

That sends the gzipped tar to stdout, where the bytes are counted by wc. I have about 210,000 files and directories.

There are some good suggestions on how to proceed here:

http://happygiraffe.net/copy-net

I wanted to have the .tgz file on the other side instead of having tar unpack it automatically, and I found out I could do this on the old machine to send the files to the new machine:

tar zpcf - directory | ssh user@target.machine "cat > backup.tgz"

That packs "directory" from the old machine into the backup.tgz file on the new machine. Nice. One small problem: I didn't have a way to be sure that there were no errors in transmission.

First, something that did not work:

tar zpcf - directory | md5sum

Testing that on a small directory gave me, to my surprise, different results every time. What was changing? I didn't get it. I could tell that it was probably caused by gzip, because:

$ echo "x" | gzip - > test1.gz
$ echo "x" | gzip - > test2.gz
$ md5sum test?.gz
358cc3d6fe5d929cacd00ae4c2912bf2  test1.gz
601a8e99e56741d5d8bf42250efa7d26  test2.gz

So gzip must have a random seed in it, or it is incorporating the timestamp into the file somehow -- something is changing. (It is almost certainly the timestamp gzip stores in its header; see the note at the end of this message.)

Then I realized I just had to check the md5sums this way instead.

On the source machine:

tar pcf - directory | md5sum

Then do this to transfer the data:

tar zpcf - directory | ssh user@target.machine "cat > backup.tgz"

After transferring, do this on the target machine:

gunzip -c backup.tgz | md5sum

The two md5sums are computed without making any new files on either side, and they will match if there were no errors.
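A note on those changing gzip checksums: if I read the gzip documentation correctly, gzip records a modification timestamp in the header of every file it writes, and when it compresses a pipe that timestamp is just the current time, so two runs a moment apart produce different bytes. Passing -n (--no-name) tells gzip not to store the name or timestamp, which should make the output repeatable:

$ echo "x" | gzip -n - > test1.gz
$ echo "x" | gzip -n - > test2.gz
$ md5sum test?.gz

This time the two sums should be identical. It doesn't change the method above, since checksumming the uncompressed tar stream sidesteps the problem entirely, but it explains the mystery.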
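One refinement I have not actually tried on a large transfer: bash's process substitution would let you compute the checksum and do the transfer in a single pass over the source tree, instead of reading 13GB twice (the tar.md5 name here is just an example):

tar pcf - directory | tee >(md5sum > tar.md5) | gzip -c | ssh user@target.machine "cat > backup.tgz"

tee copies the uncompressed tar stream to md5sum, whose result lands in tar.md5 on the source machine, while passing the same stream on to gzip and ssh, so the sum in tar.md5 should match what "gunzip -c backup.tgz | md5sum" prints on the target machine.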
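And since rsync started this thread, here is roughly what the same copy might look like with it; the destination path is just a placeholder, and I haven't timed it against the tar pipe for a one-shot copy:

rsync -az directory/ user@target.machine:/path/to/destination/

-a is archive mode (recursive; preserves permissions, symlinks, and times, plus ownership if the receiving side runs as root), -z compresses the data in transit, and recent rsync versions run over ssh by default. For a one-time copy of files that don't exist on the target it mostly buys convenience; for repeated syncs of the same directory it only sends what has changed.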
I moved about 30GB of compressed data with the tar-over-ssh method, in three large .tgz files, and found no errors -- the md5sums always matched.

Mike