Robert P. Goldman said:
> >>>>> "DS" == Dave Sherohman <esper at sherohman.org> writes:
>
> DS> At worst, he might need to walk through the (surviving) tags
> DS> with a set of flags for whether, e.g., <I> is turned on and
> DS> append a </I> to the document if the submitter forgot to close
> DS> it.
>
> But notice that this is enough to make my point! Detecting balanced
> delimiters is the paradigm case of context-free versus regular
> expression parsing: to match parentheses, you need to have a stack to
> push the openers onto and pop off of when you find the match. That's
> a pushdown automaton, not a finite state machine.

Except you missed my implication that it would probably be two separate
steps: first use a one-shot regex to filter out all 'unacceptable' tags,
then scan for balance. If done in Perl, the scan for balance could be
done with a second regex similar to the first one, but using the
continuation flag (/gc, with a \G anchor) rather than a plain global
match. So it would still be regex-based; it would just run the regex
more than once.

Also, since there would be only a small set of acceptable tags, I don't
think a stack would be needed, just a set of variables (or an array, or
a Perl hash, or...) to keep track of either how many levels of each tag
are open or just whether each tag was last seen as an opening or a
closing tag. (Which one is appropriate depends on whether <I><I></I>
should leave italics on or off.)

Technically, <I><B></I></B> isn't the Right Way to write your HTML, but
it happens, and I've never noticed any browser having problems with it.
A stack would be good for enforcing that tags be properly nested, but it
would not do very well in this case without some extra logic for
popping values that aren't on top.

-- 
"Two words: Windows survives." - Craig Mundie, Microsoft senior strategist
"So does syphilis. Good thing we have penicillin." - Matthew Alton
Geek Code 3.1: GCS d- s+: a- C++ UL++$ P+>+++ L+++>++++ E- W--(++) N+ o+
!K w---$ O M- V?
PS+ PE Y+ PGP t 5++ X+ R++ tv b+ DI++++ D G e* h+ r++ y+
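Here is a rough sketch of the two-step approach argued for above. It is written in Python rather than Perl because the post names Perl but gives no code, so treat it as an illustration under assumptions, not the poster's actual method. The whitelist ALLOWED and the names filter_tags, balance_tags and sanitize are hypothetical. Step 1 is a single regex substitution that drops every non-whitelisted tag. Step 2 rescans the survivors with a second regex (re.finditer standing in for a Perl /gc loop) and keeps a per-tag open count instead of a stack.

```python
import re

# Hypothetical whitelist: the post only says "a small set of acceptable tags".
ALLOWED = {"i", "b", "u"}

# Matches an opening or closing tag; group 1 is "/" for closers, group 2 the name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def filter_tags(html):
    """Step 1: one regex pass that drops every tag not in ALLOWED.

    Surviving tags are lowercased and stripped of attributes.
    """
    def keep_or_drop(m):
        name = m.group(2).lower()
        if name in ALLOWED:
            return f"<{m.group(1)}{name}>"
        return ""
    return TAG_RE.sub(keep_or_drop, html)


def balance_tags(html):
    """Step 2: rescan the surviving tags with a per-tag open count (no stack)
    and append closers for anything still open at the end."""
    depth = {name: 0 for name in ALLOWED}
    out = []
    pos = 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])  # copy text between tags unchanged
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name not in depth:
            continue  # not whitelisted: drop it (step 1 should already have)
        if closing:
            if depth[name] == 0:
                continue  # stray closer with no opener: drop it (our choice)
            depth[name] -= 1
        else:
            depth[name] += 1
        out.append(f"<{m.group(1)}{name}>")
    out.append(html[pos:])
    # Append closers in a fixed (alphabetical) order, not reverse-open order.
    for name in sorted(ALLOWED):
        out.append(f"</{name}>" * depth[name])
    return "".join(out)


def sanitize(html):
    return balance_tags(filter_tags(html))


if __name__ == "__main__":
    print(sanitize('<I>hi <script>x</script><B class="x">there'))
```

Using counters, <i><i></i> leaves one level open, so a closing </i> is appended (italics off at the end). Overlapping markup such as <i><b></i></b> passes through unchanged, which a strict stack-based checker would reject; likewise the appended closers may overlap rather than nest. Dropping stray closers is an added design choice, not something the post specifies.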