<br><br><div class="gmail_quote">On Sat, Mar 5, 2011 at 11:46 PM, Mike Miller <span dir="ltr">&lt;<a href="mailto:mbmiller%2Bl@gmail.com">mbmiller+l@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
On Sat, 5 Mar 2011, Adam Morris wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Try \x{8a0} instead. I think that \x normally accepts only two following characters, so you have to use \x{} for long hexadecimal numbers.<br>
</blockquote>
<br>
You top posted, so I have to ignore you.<br>
<br>
Just kidding. I did try that and that didn't work either. Then I did this...<br>
<br>
perl -pe 's/[[:ascii:]]//g ; s/(.)/$1\n/g' file.txt | sort | uniq -c >| bad_chars.txt<br>
<br>
...and when I looked at the resulting bad_chars.txt file in emacs again, the characters looked different. Before they were appearing as purple rectangles, but now they appeared as a pair of characters that looked like this: \302\240<br>
<br>
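I could represent them exactly that way in perl and delete them. I don't really get what was happening there.<br></blockquote><div><br>I'm guessing you were looking at multibyte (variable-length) UTF-8 encoded Unicode characters, and your Perl filter, working on bytes rather than characters, split each one into its individual octets. That would explain the \302\240 pair: those are the two bytes of a single character shown as octal escapes. <br>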
<br>-Rob<br><br></div></div>
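<div>A minimal sketch of the guess above, in Python rather than Perl. It assumes the offending character was U+00A0 (NO-BREAK SPACE), which the thread never confirms, and it uses a made-up sample string and a hypothetical helper name, <code>non_ascii_bytes</code>, to imitate a byte-oriented filter. It shows that the UTF-8 encoding of that one character is exactly the byte pair an editor would display as \302\240.</div>

```python
# Assumption: the "purple rectangle" was U+00A0 (NO-BREAK SPACE).
nbsp = "\u00a0"

# One character becomes two bytes when encoded as UTF-8.
utf8_bytes = nbsp.encode("utf-8")

# Render each byte as a three-digit octal escape, the way Emacs shows raw bytes.
octal_escapes = "".join(f"\\{b:03o}" for b in utf8_bytes)
print(octal_escapes)


def non_ascii_bytes(data):
    """Hypothetical stand-in for a byte-wise filter: drop ASCII bytes,
    then return every remaining byte as its own one-byte item."""
    return [bytes([b]) for b in data if b >= 0x80]


# Made-up sample input: two ASCII words joined by a no-break space.
sample = "foo\u00a0bar".encode("utf-8")
pieces = non_ascii_bytes(sample)
print(pieces)
```

<div>Treating the input as bytes splits the single character into two separate octets, neither of which is valid UTF-8 on its own, so an editor falls back to showing them as escapes. Decoding the input as UTF-8 first would keep the character whole.</div>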