Japanese char set (OpenInsight) [Revelation On-Line Wiki]

At 17 OCT 2001 09:01:28PM Jim Vaughan wrote:

Does the new 32-bit stuff have any support for the Japanese character set?

If so would it support this character set in the menus, forms and data?

At 18 OCT 2001 07:28AM Mike Ruane wrote:

Jim-

We're looking into it, as well as Chinese. One of the problems is that we don't speak Chinese or Japanese, and we expect trouble installing those versions of Windows.

Mike

At 18 OCT 2001 01:32PM j Vaughan wrote:

What kind of time frame are we looking at?

At 19 OCT 2001 10:47PM Jim Vaughan wrote:

I know it's hard to guess how long something like this might take, but … I need to know. We have a customer in Japan that would like to buy but needs the Japanese char set.

Give me a best case worst case. If you think it can be done it will take from…. to….

Thanks.

At 22 OCT 2001 07:18AM Mike Ruane wrote:

Jim-

I have a new machine I can test it on, and someone who can help me get it installed. I should have some more details by next week.

Mike

At 22 OCT 2001 04:22PM j Vaughan wrote:

You guys are great.

I look forward to hearing how it goes.

At 29 OCT 2001 01:05PM Jim Vaughan wrote:

I just heard from my customer; they are meeting next week.

Would it be possible to know if this is going to be available by then?

At 29 OCT 2001 02:26PM Mike Ruane wrote:

Jim-

We're formatting the machine today.

Mike

At 29 OCT 2001 03:29PM Jim Vaughan wrote:

Great, keep me updated.

At 29 OCT 2001 04:25PM Steve Epstein wrote:

Dear Jim and Mike,

I have asked the same question.

I actually have a Japanese Windows 2000 machine from our clients in Japan. I would be glad to do any testing that would help. I have the fonts, etc.

Steve

At 29 OCT 2001 05:24PM Mike Ruane wrote:

Guys-

Thanks-

At first blush the answer seems to be no: we would need Unicode, which would destroy our data, since we make heavy use of ASCII characters 251 to 255 as our system delimiters.

Mike

At 30 OCT 2001 10:27AM Jim Vaughan wrote:

So what does that mean? Do you have any other avenues to pursue?

At 30 OCT 2001 06:25PM Oystein Reigem wrote:

That must be the next big project. After the 32-bit version. To rid OI of those troublesome delimiters.

Just trying to make myself popular.

- Oystein -

At 04 NOV 2001 03:57PM j Vaughan wrote:

So this is no, for now? Or no forever?

If it's no for now, when in the future might it be available?

I just need to give my customer an answer, even if it's one they don't like.

At 05 NOV 2001 06:42AM Oystein Reigem wrote:

Mike,

It would be nice if Unicode could be implemented in OpenInsight and kill dead the international-characters-versus-delimiters problem. But there are many questions along the way. I assume you've looked at some of them already.

There are many different Unicode encoding formats. Some of them are fixed-length (1, 2, 3 or 4 bytes per character), some variable (characters with a mix of different lengths).
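To make the fixed-length versus variable-length distinction concrete, here is a minimal Python sketch. The sample strings and the helper name `byte_lengths` are illustrative assumptions and have nothing to do with OpenInsight itself; the codecs are Python's standard ones.

```python
def byte_lengths(text):
    """Return the encoded size in bytes of `text` under three Unicode encodings."""
    codecs = ("utf-8", "utf-16-le", "utf-32-le")
    return {codec: len(text.encode(codec)) for codec in codecs}


# One ASCII letter: 1 byte in UTF-8, 2 in UTF-16, 4 in UTF-32.
print(byte_lengths("A"))       # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}

# Three kanji: 3 bytes each in UTF-8, 2 each in UTF-16, 4 each in UTF-32.
print(byte_lengths("日本語"))  # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
```

UTF-32 is the only one of the three that is strictly fixed-length. UTF-8 varies from 1 to 4 bytes per character, and UTF-16 uses 2 bytes for most characters but 4 for those outside the Basic Multilingual Plane.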

I believe there are two basic alternatives if one wants to implement a multi-byte character encoding system in a database system like OpenInsight, where special characters or byte values are used to delimit various units of data during storage and computing.

One is to use a fixed-length character encoding format and let the delimiters be multi-byte too. This means among other things that the file system must be rewritten to handle multi-byte characters instead of single-byte characters. I don't expect that can be done overnight.

The other is to handle multi-byte encoded text, as far as possible, like any other byte sequence, and keep the old single-byte delimiters. But then one must choose an encoding format that avoids collisions with the delimiters. E.g., with a 2-byte encoding format, neither of the 2 bytes may ever be in the range 250-255.
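The collision risk with a 2-byte format can be shown with a short Python sketch. It assumes the delimiter range 250-255 mentioned in this post, uses UTF-16 (big-endian) as the 2-byte format, and picks an arbitrary Japanese word; the helper name `delimiter_collisions` is hypothetical.

```python
# Assumed delimiter byte values, taken from the range quoted in this post.
DELIMITER_BYTES = range(250, 256)


def delimiter_collisions(data):
    """Return (offset, byte value) for every byte that falls in the delimiter range."""
    return [(offset, value) for offset, value in enumerate(data)
            if value in DELIMITER_BYTES]


# "コーヒー" (coffee) contains U+30FC twice; its low byte is 0xFC = 252.
encoded = "コーヒー".encode("utf-16-be")
print(encoded.hex(" "))               # 30 b3 30 fc 30 d2 30 fc
print(delimiter_collisions(encoded))  # [(3, 252), (7, 252)]
```

A storage layer that splits on byte 252 would cut this word in two places, which is the kind of data corruption described earlier in the thread.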

But is the latter possible? Is there a Unicode encoding format (e.g one that can be used for Japanese) where no byte is in the range 250-255? I believe no.

But there are formats where certain other byte values never occur. E.g., the UTF-8 2-, 3- and 4-byte encodings always have byte values with the highest bit set to 1 (to distinguish them from the single-byte UTF-8 encoding, which is plain old 7-bit ASCII). So perhaps by using that old trick with the bi-directional CHARMAP it's possible after all? E.g., shunt 250-255 down by 128.
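Whether any byte in 250-255 actually turns up in UTF-8 can be checked by brute force. Under the current definition of UTF-8 (RFC 3629, limited to code points up to U+10FFFF), the largest byte value that can occur is 0xF4 (244), so the sketch below should find no overlap with the delimiter range. The helper name `utf8_byte_values` is hypothetical, and the check says nothing about how OpenInsight would handle such data; it only inspects byte values.

```python
def utf8_byte_values():
    """Collect every byte value that appears in the UTF-8 form of any code point."""
    seen = set()
    for code_point in range(0x110000):
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogate code points cannot be encoded in UTF-8
        seen.update(chr(code_point).encode("utf-8"))
    return seen


values = utf8_byte_values()
print(max(values))                            # 244
print(sorted(set(range(250, 256)) & values))  # []
```

The loop covers about 1.1 million code points, so it takes a moment to run.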

Next question is how comparisons and sorting can be done on multi-byte data.
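As one data point on sorting, a plain byte-by-byte comparison of UTF-8 data gives the same order as comparing Unicode code points, while the same comparison on little-endian UTF-16 does not. The sketch below illustrates this in Python; the word list and the helper name `sort_by_encoded_bytes` are assumptions made for the example. Code point order is not the same as a linguistically correct Japanese collation, which this sketch does not attempt.

```python
def sort_by_encoded_bytes(words, codec):
    """Sort words by the raw bytes of their encoding, as a byte-oriented engine would."""
    return sorted(words, key=lambda word: word.encode(codec))


words = ["あ", "a", "東京", "アニメ"]

# Python compares str values by code point, so this checks that UTF-8 byte order
# matches code point order for the sample.
print(sort_by_encoded_bytes(words, "utf-8") == sorted(words))  # True

# In UTF-16-LE, "あ" is 42 30 and "a" is 61 00, so byte order puts "あ" first.
print(sort_by_encoded_bytes(["a", "あ"], "utf-16-le"))          # ['あ', 'a']
```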

- Oystein -

PS. I don't know that much about Unicode.

But I have colleagues who know a bit more.

And there's the Unicode website.
