<div dir="ltr"><div>Why it violates the unix philosophy, in my mind, is that apparent size has nothing to do with the primary function of du, which is to display disk usage. And the unix philosophy is to do one thing and do it well. <br>
<br></div><div>The apparent-size flag for du tries to make du do things that other utilities already do. <br><br><br></div><div><br><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Apr 5, 2014 at 12:50 PM, Mike Miller <span dir="ltr"><<a href="mailto:mbmiller+l@gmail.com" target="_blank">mbmiller+l@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks, David. I thought that was the issue: apparent size would not include overhead, so I couldn't understand why I was getting an apparent size smaller than the on-disk size. After they moved my data to a different array, that difference reversed direction. This was explained to me last night:<br>
<br>
"on the old project spaces, zfs did some compression on the data so the apparent-size was larger than the ondisk size."<br>
<br>
So, compression is also an issue, and I wouldn't have thought of that.<br>
<br>
Now that there is no compression, I see that on-disk usage is 20GB more than the apparent size:<br>
<br>
$ \du -sB GB --apparent-size miller<br>
146GB miller<br>
<br>
$ \du -sB GB miller<br>
166GB miller<br>
<br>
$ find miller | wc -l<br>
9908<br>
<br>
So there are about 2 million bytes of overhead per file, which seems like a lot to me. I would think that implies disk blocks of multiple megabytes, which seems unlikely. There must be more that I don't understand.<br>
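A rough check of that per-file arithmetic, as a sketch (the function name overhead_per_entry is hypothetical): it divides the gap between the two du totals by the find count. Since du -B GB rounds each total up, the 20GB gap is only known to within about 1GB either way, and find also counts directories, so the result is an estimate rather than a measurement.<br>

```shell
#!/bin/sh
# Rough per-entry overhead in bytes: (on-disk GB - apparent GB) / entries.
# 1 GB is taken as 10^9 bytes, matching du -B GB.
overhead_per_entry() {
  awk -v d="$1" -v a="$2" -v n="$3" \
    'BEGIN { printf "%.0f\n", (d - a) * 1e9 / n }'
}

overhead_per_entry 166 146 9908   # prints 2018571 (figures from the runs above)
```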
<br>
Regarding your idea (David)...<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">
As an aside, imho, the 'apparent size' option is really a terrible option to include in 'du' and is a violation of the unix philosophy because it has explicitly NOTHING to do with disk management. But that's neither here nor there.<br>
<br></div><div class="">
A better way to get the byte count of a file is<br>
<br>
stat --format=%s FILE<br>
</div></blockquote>
<br>
...I guess you mean that we should do something like this to get the totals for a directory and contents:<br>
<br>
$ find miller -print0 | xargs -0 stat --format=%s | awk '{sum+=$1}END{print sum}'<br>
145159848954<br>
<br>
OK, that does work, but how horrible is it that I can get exactly the same answer like so:<br>
<br>
$ du -sb miller<br>
145159848954 miller<br>
<br>
Of course, the pipeline gets even clumsier if you want totals for several directories at once.<br>
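A minimal sketch of the multi-directory case, assuming GNU find, xargs and stat (the function name apparent_bytes is hypothetical). It runs the same find | stat | awk pipeline once per directory. One caveat: du counts a hard-linked file once, while this pipeline counts it once per link, so the totals can differ when hard links are present.<br>

```shell
#!/bin/sh
# Sum apparent sizes (st_size, the "ls -l" size) of everything under each
# directory given, one output line per directory: "<bytes><TAB><dir>".
apparent_bytes() {
  for dir in "$@"; do
    find "$dir" -print0 \
      | xargs -0 -r stat --format=%s \
      | awk -v d="$dir" '{ sum += $1 } END { printf "%.0f\t%s\n", sum, d }'
  done
}
```

Usage (hypothetical directory names): apparent_bytes miller other_project prints one total per directory, much as du -sb miller other_project does in a single command.<br>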
<br>
That's a violation of the unix philosophy? It isn't true that it has nothing to do with disk management. For example, when moving files between systems, it might help a lot to know the actual size. What if I want to make a .tar file from a directory? How large will that file be? How much space will the files take up on tape? If I'm using tar for tape backup, I think the size will be given by --apparent-size, not by on-disk size.<span class="HOEnZb"><font color="#888888"><br>
<br>
Mike</font></span><div class="im HOEnZb"><br>
<br></div><div class="HOEnZb"><div class="h5">
On Fri, 4 Apr 2014, David Wagle wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
"apparent size" is the "ls -l" size of the file.<br>
<br>
Which size is the "right" one for you to use depends on what you're trying<br>
to do.<br>
<br>
Apparent size is nearly useless for managing disks -- which is usually what<br>
you use du for.<br>
<br>
Say my disk has blocks that are 1KB. If I have a file with nothing but<br>
the letter 'A' in it, that will have an apparent size of 1 byte. But<br>
because the smallest block size on my disk is 1KB, that 1-byte file will<br>
USE 1 KB of disk space no matter what, because the physical data has to be<br>
recorded in a block and that block will then be marked 'used.'<br>
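David's 1KB example as a tiny sketch, assuming simple fixed-size block allocation with no compression, holes or tail packing (the function name allocated_bytes is hypothetical): round the byte count up to the next multiple of the block size.<br>

```shell
#!/bin/sh
# Bytes occupied when space is allocated in whole blocks:
# round the size up to the next multiple of the block size.
allocated_bytes() {
  echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

allocated_bytes 1 1024      # 1-byte file on 1KB blocks -> 1024
allocated_bytes 1025 1024   # just over one block -> 2048
```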
<br>
As an aside, imho, the 'apparent size' option is really a terrible option<br>
to include in 'du' and is a violation of the unix philosophy because it has<br>
explicitly NOTHING to do with disk management. But that's neither here nor<br>
there.<br>
<br>
<br>
On Fri, Apr 4, 2014 at 2:29 PM, Mike Miller <<a href="mailto:mbmiller%2Bl@gmail.com" target="_blank">mbmiller+l@gmail.com</a>> wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Tue, 1 Apr 2014, Mike Miller wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Tue, 1 Apr 2014, Ben wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
-h will always be different from the actual disk usage; you might also<br>
want to play around with the -B option too.<br>
<br>
</blockquote>
<br>
I've done that. Using --si -sB GB gives the same result as --si -sh. Did<br>
you think that they would be different?<br>
<br>
</blockquote>
<br>
Thanks for the suggestions. Now I have answers (below).<br>
<br>
I was misusing the --si option there. It should be used *instead* of -h,<br>
not in conjunction with it. These two commands should do the same thing<br>
when the volume in "dir" is in the multi-gigabyte range...<br>
<br>
du -s --si dir<br>
du -sB GB dir<br>
<br>
...and so should these two commands:<br>
<br>
du -sh dir<br>
du -sB G dir<br>
<br>
The first pair reports sizes in units of 1000*1000*1000 bytes (GB) and the second<br>
in units of 1024*1024*1024 bytes (G).<br>
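A small sketch of the two unit conventions, assuming du's round-up behaviour with -B (the function name ceil_units is hypothetical). Using the du -sb figure for "dir" quoted later in this thread, it gives 143 in GB units, consistent with the 143G that du -s --si --apparent-size dir reports there, and 133 in 1024^3-byte units.<br>

```shell
#!/bin/sh
# Whole units, rounded up as du does with -B.
# GB = 1000^3 bytes (--si, -B GB); G = 1024^3 bytes (-h, -B G).
ceil_units() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

bytes=142038799951                  # du -sb output for "dir" in this thread
ceil_units "$bytes" 1000000000      # -> 143 (GB)
ceil_units "$bytes" 1073741824      # -> 133 (G, i.e. 1024^3 bytes)
```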
<br>
<br>
<br>
What happens when you use the --apparent-size option?<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
--apparent-size<br>
print apparent sizes, rather than disk usage; although the<br>
apparent size is usually smaller, it may be larger due to holes<br>
in ('sparse') files, internal fragmentation, indirect blocks,<br>
and the like<br>
<br>
</blockquote>
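The "holes" case from the man page is easy to see with a sparse file. A sketch assuming GNU coreutils truncate and du and a filesystem that supports holes (most Linux filesystems do); the function name sparse_demo is hypothetical, and it leaves its results in the shell variables apparent and ondisk.<br>

```shell
#!/bin/sh
# Create a 1 MiB file that is entirely a hole (no data is written), then
# compare its apparent size with its allocated size, both in KiB.
sparse_demo() {
  tmp=$(mktemp -d) || return 1
  truncate -s 1M "$tmp/sparse.img"
  apparent=$(du -k --apparent-size "$tmp/sparse.img" | cut -f1)
  ondisk=$(du -k "$tmp/sparse.img" | cut -f1)
  rm -rf "$tmp"
  echo "apparent=${apparent}K on-disk=${ondisk}K"
}

sparse_demo   # apparent is 1024K; on-disk is typically 0K
```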
<br>
I want to try that, but I'm having this problem right now:<br>
<br>
$ ls /project/guanwh<br>
ls: cannot access /project/guanwh: Stale file handle<br>
<br>
</blockquote>
<br>
Yep, you nailed it. That was the issue. If I use --apparent-size, the<br>
results are consistent. According to supercomputing staff:<br>
<br>
"it is not a bug: -b implies --apparent-size, so to compare its output<br>
to -sm/-sh you have to include --apparent-size with -sm/-sh as well.<br>
<br>
"when the apparent size is different from the reported size it is not a<br>
bug in du but rather a feature of the filesystem :)"<br>
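The staff's point about -b can be checked directly. A sketch assuming GNU du, where -b is documented as equivalent to --apparent-size --block-size=1 (the function name check_b_flag is hypothetical): the two forms should print the same total, while plain du -s --block-size=1 reports allocated space and usually differs.<br>

```shell
#!/bin/sh
# Compare du -sb with its long-form equivalent for one directory,
# and show the allocated (non-apparent) total alongside.
check_b_flag() {
  short=$(du -sb "$1" | cut -f1)
  long=$(du -s --apparent-size --block-size=1 "$1" | cut -f1)
  alloc=$(du -s --block-size=1 "$1" | cut -f1)
  if [ "$short" = "$long" ]; then
    printf '%s\n' "-b matches --apparent-size: $short bytes (allocated: $alloc bytes)"
  else
    printf '%s\n' "mismatch: $short vs $long"
  fi
}

# Usage (hypothetical path): check_b_flag dir
```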
<br>
Now I just have to figure out which is the right size for me: apparent<br>
or reported. I guess apparent sizes are the real file sizes. In this<br>
example, "dir" has about 10,000 files in it, about half of them roughly 5 KB<br>
and the other half roughly 29 MB:<br>
<br>
$ du -s --si dir<br>
162G dir<br>
<br>
$ du -s --si --apparent-size dir<br>
143G dir<br>
<br>
$ du -sb dir<br>
142038799951 dir<br>
<br>
$ wc -c dir/* | tail -1<br>
142037349967 total<br>
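To see where the gap between apparent and reported size comes from, a per-file listing can help. A sketch assuming GNU find, xargs and stat, where %s is the apparent size, %b the number of allocated blocks and %B the size in bytes of each of those blocks; the function name compare_sizes is hypothetical, and file names containing newlines are not handled.<br>

```shell
#!/bin/sh
# For each regular file, print: apparent bytes, allocated bytes,
# (allocated - apparent), and the file name, tab-separated.
# Positive "extra" means block rounding or metadata; negative means
# holes or compression.
compare_sizes() {
  find "$@" -type f -print0 \
    | xargs -0 -r stat --format='%s %b %B %n' \
    | awk '{
        apparent = $1; allocated = $2 * $3
        name = $0; sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", name)   # keep spaces in names
        printf "%.0f\t%.0f\t%.0f\t%s\n", apparent, allocated, allocated - apparent, name
      }'
}
```

Usage: compare_sizes dir | sort -k3,3nr | head lists the files with the most per-file overhead first ("dir" is a placeholder).<br>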
<br>
<br>
One thing to note: it seems that du always rounds up. So with a fixed unit<br>
such as -B GB, if 1.1 GB are used, du will report 2 GB.<br>
<br>
<br>
Mike<br>
______________________________<u></u>_________________<br>
TCLUG Mailing List - Minneapolis/St. Paul, Minnesota<br>
<a href="mailto:tclug-list@mn-linux.org" target="_blank">tclug-list@mn-linux.org</a><br>
<a href="http://mailman.mn-linux.org/mailman/listinfo/tclug-list" target="_blank">http://mailman.mn-linux.org/<u></u>mailman/listinfo/tclug-list</a><br>
<br>
</blockquote>
<br>
</blockquote>
</div></div></blockquote></div><br></div>