At 20 AUG 2004 04:57:56PM Jim Vaughan wrote:

I am trying to read in some data that is in Unicode UTF-16 format; it needs to be read one character at a time.

I will not know in advance that the file is non-ASCII.

The first two bytes in a UTF-16 file are either FE FF or FF FE.

What I wanted to do was OSBREAD the first two characters and check whether they are either of the above. If they are, I would then read two bytes at a time, converting them with the unicode_str function to create UTF-8 characters that OI can handle.

As you might guess I have hit a problem.

If I do this:

OSBREAD CHARACTERS FROM FILENAME AT 0 LENGTH 2
TEST=SEQ(CHARACTERS[1,1])

TEST is equal to -1 and not 255 (FF).

If I turn off UTF8 mode in application properties then SEQ works as expected and TEST is equal to 255.

CHAR() has a similar problem to SEQ, so I cannot simply use that.

Any ideas? Is this a bug or is it meant to work like this?


At 20 AUG 2004 05:12PM Pat McNerthney wrote:

Jim,

I assume you are in OI 7.0.

As you found out, in UTF8 mode SEQ and CHAR operate on whole UTF8 characters, which can occupy from 1 to 4 bytes of the string. Also, in UTF8 mode CHAR(255) now represents the UTF8 character that has the value 255 (which uses a 2-byte sequence) and is not equivalent to a Record Mark (which still uses the \FF\ byte value).

There are now a series of intrinsic functions for manipulating binary data in a UTF8 mode independent manner. However, for your problem, I think the simplest thing to do would be to just compare the first two bytes to the hex strings \FEFF\ and \FFFE\.

Pat
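
For readers following along outside OpenInsight, the byte-level check Pat describes can be sketched in Python. This is only an illustration of the logic, not BASIC+ code: the function names detect_utf16_bom and to_utf8 are hypothetical, and passing data through unchanged when no byte order mark is found is an assumption made for this example.

```python
def detect_utf16_bom(first_two: bytes):
    """Return the UTF-16 byte order implied by the first two raw bytes, or None."""
    if first_two == b"\xfe\xff":   # FE FF: big-endian byte order mark
        return "utf-16-be"
    if first_two == b"\xff\xfe":   # FF FE: little-endian byte order mark
        return "utf-16-le"
    return None


def to_utf8(raw: bytes) -> bytes:
    """Convert UTF-16 data (with a byte order mark) to UTF-8 bytes."""
    encoding = detect_utf16_bom(raw[:2])
    if encoding is None:
        # Assumption for this sketch: no mark means the data is left as-is.
        return raw
    # Skip the two mark bytes, decode the rest, and re-encode as UTF-8.
    return raw[2:].decode(encoding).encode("utf-8")


# The character with value 255 needs two bytes in UTF-8,
# which is why it differs from the single raw byte FF.
two_byte_form = chr(255).encode("utf-8")
```

The comparison is made on raw bytes rather than on decoded characters, which is the same idea as comparing against the hex strings \FEFF\ and \FFFE\ instead of using SEQ.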


At 20 AUG 2004 05:26PM Jim Vaughan wrote:

Perfect, an explanation and a solution.

I didn't know (or had forgotten) about the hex string format; many thanks, Pat.
