I don't expect that there will be an answer, 
but just in case there is someone
on the list....


I'm looking for advice on CJK-to-Unicode handling, which I need for a
search engine that relies on regex matching.

What I'm working with can be seen at:
http://staff-test.lib.umn.edu/cdm/eal/ealfinder.phtml
(...at least when the server is up.)
Presently it's a demo site.
To view it, you'll need either 1) NJStar, or 2) one of the more
recent browsers that can recognize and display HTML-ized Unicode.

I've developed the form and results page (as far as it goes)
for the East Asian Library here on campus using PHP/MySQL.

The challenge I'm facing now has to do with
1) recognizing which of the 3 major encodings for Chinese
characters a user has entered in the text box, and then
2) converting that input to Unicode so that I can regex the string
against the database of citations.
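
To make step 2 concrete, here is a rough sketch of what the match side might
need once the input has been decoded. It assumes the citations are stored as
decimal HTML numeric character references (for example &#20013;), which is
only one reading of "HTML-ized Unicode". The class and method names are
hypothetical.

```java
class NcrSketch {
    // Assumption: the database stores non-ASCII characters as decimal
    // numeric character references such as &#20013;.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                // Plain ASCII passes through unchanged.
                out.append((char) codePoint);
            } else {
                // Everything else becomes &#<decimal code point>;
                out.append("&#").append(codePoint).append(';');
            }
            // Step by one or two chars so surrogate pairs stay together.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

The resulting string could then be quoted and used as the pattern for the
regex lookup.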

Though I've developed this using PHP/MySQL, I'm at the limits of my skill.
If I can establish that there is a Java method for accomplishing these
two tasks, I'm going to hand the project over to one of our people who is
more familiar with Java than I am, on the good faith that they will be able
to explore the options and complete the project more easily than I can.

SOOOOO my question is: are there well-known Java classes (i.e., part of the
SDK) which will handle these two tasks, that is, encoding detection and
encoding conversion of Chinese and Japanese multibyte characters?
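
To show the kind of thing I have in mind, here is a minimal sketch under
some assumptions. Conversion uses java.nio.charset, which is part of the
SDK. As far as I know the SDK has no ready-made detector, so detection here
is only a trial-decoding heuristic: decode strictly with each candidate and
keep the ones that do not fail. The candidate names (Big5, GB2312, EUC-JP,
Shift_JIS, UTF-8) are my own guesses and may not all be installed in a given
runtime. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

class CjkDecodeSketch {
    // Decode raw bytes to a Java (Unicode) String, or return null if the
    // charset is not installed or the bytes are not valid in that charset.
    static String decodeStrict(byte[] raw, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Heuristic only: returns every candidate that decodes without error.
    // Several CJK encodings can accept the same bytes, so more than one
    // name may come back and something else has to break the tie.
    static List<String> plausibleCharsets(byte[] raw, String[] candidates) {
        List<String> matches = new ArrayList<>();
        for (String name : candidates) {
            if (decodeStrict(raw, name) != null) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed candidate list; adjust to what the runtime supports.
        String[] candidates = {"UTF-8", "Big5", "GB2312", "EUC-JP", "Shift_JIS"};
        byte[] sample = "\u4e2d\u6587".getBytes(Charset.forName("UTF-8"));
        System.out.println(plausibleCharsets(sample, candidates));
    }
}
```

Short queries will often pass as valid in more than one encoding, so a real
detector would also need something like character-frequency statistics.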

BACKGROUND:
PHP does have some experimental "multibyte string functions",
but these currently handle only Japanese.
On the Chinese side of things, I've found a detection script
(written in Perl) and something written in Java,
but these won't handle Japanese.

If there is one solution for both, I'd
sure like to know about it.

gs

******************************************
George Swan
Collection Development Support Unit	VOICE:	(612) 624-5860
Room 170B, Wilson Library			FAX:	(612) 626-9353
University of Minnesota Libraries		g-swan at tc.umn.edu
309 19th Avenue South			cdm-web at tc.umn.edu
Minneapolis, MN 55455			colldev at tc.umn.edu
USA						http://staff.lib.umn.edu/cdm/