On Mon, 9 Apr 2012, Gerry wrote:

> TL;DR mawk is fast. gawk is not.
>
> Then you might be interested in this:
> http://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/
>
> (apologies if you've already seen it)

I hadn't seen it and I'm looking forward to studying it. I do a lot of
work that's exactly like what that author does -- parsing multi-gigabyte
files and having to use awk, or some such. Speed does become a huge
issue.

Related to this speed issue -- it reminds me of a cool trick I learned
during the past year. I actually learned it on this list.

Suppose you have a giant file with 10 million lines in which the word
STRING probably appears on about 10 lines, and you want to find those
lines. You could do this:

  grep -w STRING file

But that is slow. This is fast, but it doesn't match only whole words:

  grep -F STRING file

That might include stuff like "fooSTRINGbaz", which we don't want. But
suppose the grep -F returned only 1000 or 10000 lines -- that's a big
step in the right direction, because all the lines I want are included.
So, in most cases, this is the fast way to do the job:

  grep -F STRING file | grep -w STRING

First do the fast fixed-string grep to reduce the number of lines piped
into the slower but more precise word grep. The result will always be
the same as if grep -w alone had been used (provided STRING is a plain
word with no regex metacharacters, since -F matches it literally while
-w interprets it as a regex).

I often have a file with a list of words that I want to grep out of
another file. Suppose the list of words is in a file called words.txt.
Then this will work, but slowly...

  grep -wf words.txt file

...and this will give the same result fast:

  grep -Ff words.txt file | grep -wf words.txt

Of course, it depends on your situation. Sometimes the fast grep alone
will do what you need. Sometimes it won't help (e.g., if every line
matches!). But for me it has made a huge difference.

Mike
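
P.S. Here's a minimal sketch of both pipelines on a toy file, so you can
see that the pre-filtered version returns exactly the same lines as the
plain -w version. The filenames (demo.txt, words.txt) are just invented
for the demo:

```shell
#!/bin/sh
# Build a tiny sample file: two real word matches, one substring-only
# match ("fooSTRINGbaz"), and a filler line.
cat > demo.txt <<'EOF'
the quick STRING fox
fooSTRINGbaz should not match
nothing to see here
STRING at start of line
EOF

# Single-string trick: slow-but-precise vs. fast pre-filter + precise pass.
grep -w STRING demo.txt > slow.out
grep -F STRING demo.txt | grep -w STRING > fast.out
cmp -s slow.out fast.out && echo "single-string outputs match"

# Word-list variant: words.txt holds one word per line.
cat > words.txt <<'EOF'
STRING
fox
EOF

grep -wf words.txt demo.txt > slow2.out
grep -Ff words.txt demo.txt | grep -wf words.txt > fast2.out
cmp -s slow2.out fast2.out && echo "word-list outputs match"
```

On the sample file above, both comparisons succeed: the -F stage passes
"fooSTRINGbaz" through, but the second, word-boundary grep drops it, so
each pipeline ends up with the same two lines as the plain -w command.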