On Tue, Nov 16, 2010 at 1:50 AM, Mike Miller <mbmiller+l@gmail.com> wrote:
> I thought some of you would be interested in this. See the last two
> paragraphs for the take-home messages.
>
> So the finding here that might be useful in many situations is that when
> searching for a regexp in a big file, you might do much better to filter
> lines of the big file with a simpler, more inclusive grep, then do the
> regexp search on the stdout from the simple grep.
>
> Mike

I did enjoy reading that. I think the real takeaway is that the more
complicated your regular expression, the longer it takes. And by
"complicated" I mean the more wildcards and operators you have that induce
backtracking in the regular expression engine. Big logfiles make those
performance hits obvious.
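
To make Mike's trick concrete, here's a minimal sketch. The file name
(big.log) and the patterns are invented for illustration; the only
requirement is that the cheap first grep matches a superset of whatever
the expensive pattern matches:

  # Slow: the complex, backtracking-prone pattern has to run on
  # every line of the file.
  grep -E '(timeout|refused) .*retry=[0-9]+$' big.log

  # Usually much faster: a cheap fixed-string filter throws away most
  # lines first, so the expensive regexp only sees the survivors. Any
  # line the real pattern can match must contain the literal 'retry='.
  grep -F 'retry=' big.log | grep -E '(timeout|refused) .*retry=[0-9]+$'

grep -F is a plain substring scan, so it stays fast no matter how ugly
the second pattern is.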
Another trick I've learned with big logfiles is to load them into an SQL
database; then I can write searches as SQL queries. Depending on the DB
you're using, you may have a nice little GUI that makes writing queries and
manipulating results very easy. I can't take credit for that one - the first
time I saw someone do it I thought "holy cow... why didn't I think of that?"
Of course, it helps if your log files are in CSV format or something similar,
so you can slam everything into the right column easily.
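
As a minimal sketch of that workflow with sqlite3, assuming a CSV logfile
that has a header row (the database, file, table, and column names here
are all invented for illustration):

  # Load the CSV into a table; in csv mode the .import command takes
  # the header row as column names and creates the table if it does
  # not exist yet.
  printf '.mode csv\n.import access.csv access\n' | sqlite3 logs.db

  # Searches then become plain SQL instead of regexps. .import loads
  # every column as TEXT, hence the quoted '500'.
  sqlite3 logs.db "SELECT ts, url, status FROM access WHERE status = '500' LIMIT 20;"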
-Rob