OpenInsight Demo (OpenInsight 32-Bit)
At 03 OCT 2002 01:42:47PM dsig@sigafoos.org wrote:
Michael,
Not to give you more work but you might want to put a big notice on the download page IN RED that running on win98 requires the ansi switch.
We had someone here download it .. yada yada ..
fuy
dsig@sigafoos.org
David Tod Sigafoos ~ SigSolutions
Phone: 971-570-2005
OS: Win2k sp2 (5.00.2195)
OI: 4.1
At 03 OCT 2002 01:46PM Revelation Software EMEA Support (http://www.revsoft.co.uk) wrote:
Just happened to us too. 4.1.2 will be out very shortly with /ANSI as the default setting.
At 03 OCT 2002 01:47PM Donald Bakke wrote:
Not to give you more work but you might want to put a big notice on the download page IN RED that running on win98 requires the ansi switch.
Really? I've tried to keep abreast of all the Unicode/ANSI threads, but I thought that running OI 4.1x on Win98 platforms automatically used ANSI. Am I wrong?
dbakke@srpcs.com
At 03 OCT 2002 01:58PM dsig@sigafoos.org wrote:
Cool ..
I'll tell the others who want to download.
At 03 OCT 2002 02:53PM Pat McNerthney wrote:
Don,
Originally Win98 defaulted to ANSI mode, because it could not deal with UTF8 data at all. When we realized that this was not acceptable in a mixed WinNT and Win98 environment, I enhanced the UTF8 data handling routines to convert UTF8 data to ANSI data (in addition to Unicode data) and then call the Win98 ANSI-based APIs. So now Win98 defaults to UTF8 data mode, but operates against the Win32 ANSI APIs.
I did, however, have a bug in this UTF8-data-to-ANSI-API code in the first release of this capability, so I suspect that this is what David has bumped into. Using /ANSI would avoid this bug. It has been fixed, so UTF8 mode should run just fine on Win98 in the next release.
Thanks to all of you for your input and patience in this area. Hopefully with ANSI mode as the default, things won't be quite so confusing when running against existing applications.
Pat
At 03 OCT 2002 03:40PM dsig@sigafoos.org wrote:
Is it possible for the app to "know" that it is running under Win98 and then go into ANSI mode, and otherwise be in Unicode mode?
Is this what you are saying?
And yes .. the ansi switch fixed it just okiedokie
(a whisper from the fog)
At 03 OCT 2002 04:01PM Pat McNerthney wrote:
David,
OpenInsight currently detects automatically which OS it is running under.
If OpenInsight is running in UTF8 mode on Win98, it converts the UTF8 data to ANSI data and then calls the Win32 ANSI APIs. If it is running in UTF8 mode on WinNT, it converts the UTF8 data to Unicode data and then calls the Win32 Unicode APIs.
If OpenInsight is running in ANSI mode on either Win98 or WinNT, it just calls the Win32 ANSI APIs directly with the ANSI data.
Keep in mind that what we have here is a 2x2 matrix of possible configurations: ANSI data versus UTF8 data, and ANSI APIs versus Unicode APIs. Combine the two and you have 4 possibilities.
I believe your problem is just a bug in the implementation of UTF8 data using the ANSI APIs. The demo would work fine in UTF8 data mode on Win98 if the bug weren't there.
Pat
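The 2x2 matrix Pat describes can be modelled roughly as below. This is a hypothetical Python sketch, not OpenInsight code: the names `DataMode`, `Platform` and `api_call_for` are invented for illustration, it assumes the Windows code page is cp1252 (Western European), and it ignores the special handling of system delimiters entirely.

```python
from enum import Enum


class DataMode(Enum):
    ANSI = "ANSI"
    UTF8 = "UTF8"


class Platform(Enum):
    WIN98 = "Win98"  # only the Win32 ANSI APIs are usable
    WINNT = "WinNT"  # the Win32 Unicode APIs are available


def api_call_for(data: bytes, mode: DataMode, platform: Platform,
                 codepage: str = "cp1252") -> tuple:
    """Return (API family, payload) for one string in this hypothetical model.

    Delimiter bytes are not handled; cp1252 is an assumed code page.
    """
    if mode is DataMode.ANSI:
        # ANSI data goes straight to the ANSI APIs on either OS.
        return ("ANSI", data)
    text = data.decode("utf-8")
    if platform is Platform.WIN98:
        # UTF8 data on Win98: convert to the ANSI code page; characters
        # the code page cannot represent become '?'.
        return ("ANSI", text.encode(codepage, errors="replace"))
    # UTF8 data on WinNT: convert to UTF-16LE for the Unicode APIs.
    return ("Unicode", text.encode("utf-16-le"))


if __name__ == "__main__":
    oe = "ø".encode("utf-8")  # b'\xc3\xb8'
    for mode, platform in [(DataMode.UTF8, Platform.WIN98),
                           (DataMode.UTF8, Platform.WINNT),
                           (DataMode.ANSI, Platform.WINNT)]:
        print(mode.value, platform.value, api_call_for(oe, mode, platform))
```

The sketch only exercises three of the four cells; the thread does not describe OpenInsight sending ANSI data to the Unicode APIs.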
At 03 OCT 2002 04:38PM dsig@sigafoos.org wrote:
Thanks for the clarification .. that is what I *thought* you were saying; I just wanted to be sure.
Like I mentioned, the switch worked great, and I am sure the next release will take care of the 'perceived' problems ..
They must not be working hard if there is nothing in their outliner
At 04 OCT 2002 03:58AM Oystein Reigem wrote:
Pat,
One of my worries is getting existing ANSI apps with "foreign" characters to UTF8. Do the various conversions presently work correctly on all characters?
At one time there was a bug in the ANSI to UTF8 conversion function with many accented letters not being properly handled. A week ago I was told the bug was fixed, and downloaded a new 4.1.1 (download time: early on 2002-09-27). In that version conversion results were different from earlier but not better. I found @FM was converted to "?", and the Norwegian "ø" (char(248)) to ASCII NUL, a result so bizarre I suspected a slip-up of my own.
We were in e-mail contact about this stuff - through Mike - but I got sort of impatient not having heard anything for a week. So how's the current situation?
- Oystein -
At 04 OCT 2002 01:14PM Pat McNerthney wrote:
One of my worries is getting existing ANSI apps with "foreign" characters to UTF8. Do the various conversions presently work correctly on all characters?
Well, I think it does, but then I thought the original version did.
At one time there was a bug in the ANSI to UTF8 conversion function with many accented letters not being properly handled. A week ago I was told the bug was fixed, and downloaded a new 4.1.1 (download time: early on 2002-09-27). In that version conversion results were different from earlier but not better. I found @FM was converted to "?", and the Norwegian "ø" (char(248)) to ASCII NUL, a result so bizarre I suspected a slip-up of my own.
@FM is not converted to a '?'; @FM stays as the single character char(254). Try converting the single character @FM and then do a len() and a seq() of the result, and this should be confirmed. It is when you try to display the @FM character in a Unicode-enabled window that you "see" a '?'. This is because during the UTF8 to Unicode conversion, the @FM gets converted to Unicode character 0xFFFE. So the window is trying to display Unicode character 0xFFFE, which doesn't exist in the current font, so a '?' is displayed.
The Norwegian "ø" (char(248)) is also not getting converted to NULL. Again, try converting that single character and then do a len() and a seq() of each byte to see what it is really converted to. Displaying it in a window or the debugger is not an accurate way to determine what something has been converted to.
Regardless, the conversion routines currently assume that the high 16 ANSI characters are all "system delimiters". This means that char(240) through char(255) are kept as single bytes. It also means that when they are displayed in a Unicode-enabled window, this range (0xF0-0xFF) is converted to Unicode 0xFFF0 through 0xFFFF.
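A rough Python model of the display step Pat describes: the stored data is decoded as UTF-8, except that each byte in the delimiter range 0xF0-0xFF stays a single byte and is shown as U+FFF0 through U+FFFF. The function name is hypothetical and this is not the OpenInsight routine itself; it only illustrates why an @FM (0xFE) reaches the window as U+FFFE, which the font cannot display, while len() of the stored data still reports one byte.

```python
DELIM_LOW = 0xF0  # char(240): lowest byte treated as a system delimiter


def oi_utf8_to_display(data: bytes) -> str:
    """Decode OI-style UTF8 data to a Python str for display.

    Runs of ordinary bytes are decoded as UTF-8; each delimiter byte
    0xF0-0xFF is mapped to U+FFF0-U+FFFF (0xFF00 + byte value).
    """
    parts = []
    start = 0
    for i, b in enumerate(data):
        if b >= DELIM_LOW:
            parts.append(data[start:i].decode("utf-8"))
            parts.append(chr(0xFF00 + b))
            start = i + 1
    parts.append(data[start:].decode("utf-8"))
    return "".join(parts)


if __name__ == "__main__":
    fm = bytes([254])                        # @FM as stored: one byte
    print(len(fm), fm[0])                    # 1 254
    print(hex(ord(oi_utf8_to_display(fm))))  # 0xfffe
```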
Maybe we need to add a feature to the conversion routines so that callers can specify the lowest system delimiter that should be recognized.
In the meantime, what you can do, after converting a string through ANSI_UTF8, is to SWAP() each character between char(240) and char(255) that you want converted for the UTF8 sequence that represents the Unicode character you really want.
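The SWAP() workaround can be sketched in Python as a plain byte replacement on the ANSI_UTF8 output. This is hypothetical code, assuming code page cp1252 and that the caller knows which of char(240)-char(255) are really letters in its data (for Norwegian, "ø" is char(248)). Replacing single bytes is safe here because a 1-3 byte UTF-8 sequence never contains a byte in the 0xF0-0xFF range, so no part of a multi-byte character can be hit.

```python
def swap_high_letters(oi_utf8: bytes, letters: bytes,
                      codepage: str = "cp1252") -> bytes:
    """Replace chosen high-range bytes with their real UTF-8 sequences.

    `letters` lists the ANSI bytes (0xF0-0xFF) that should be treated as
    characters rather than as system delimiters.
    """
    for b in letters:
        if b < 0xF0:
            raise ValueError(f"byte {b:#x} is not in the delimiter range")
        single = bytes([b])
        # Analogous to a SWAP() of char(b) for its UTF-8 sequence.
        oi_utf8 = oi_utf8.replace(single, single.decode(codepage).encode("utf-8"))
    return oi_utf8


if __name__ == "__main__":
    # "Sør" followed by @FM, with 'ø' left as the single byte 0xF8
    data = b"S\xf8r\xfe"
    print(swap_high_letters(data, b"\xf8"))  # b'S\xc3\xb8r\xfe'
```

Only the bytes listed in `letters` are touched, so @FM and the other real delimiters keep their single-byte values.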
Pat
At 07 OCT 2002 03:50AM Oystein Reigem wrote:
Pat,
Regardless, the conversion routines currently assume that the high 16 ANSI characters are all "system delimiters". This means that char(240) through char(255) are maintained as that single byte sequence. It also means that when they are displayed in a Unicode enabled windows, that this range (0xF0-0xFF) is converted to Unicode 0XFFF0 through 0xFFFF.
16? Who came up with that number? Why not 128 while you're at it?
I thought 6 was the official number of delimiters, or the closest to an official number as one can get in this murky area.
For all you western "foreigners" out there: here are the last 16 characters in the WinLatin-1 set, so you can see the difference between 6 and 16: "ð", "ñ", "ò", "ó", "ô", "õ", "ö", "÷", "ø", "ù", "ú"=@STM, "û"=@TM, "ü"=@SVM, "ý"=@VM, "þ"=@FM, "ÿ"=@RM.
At 07 OCT 2002 04:56AM Oystein Reigem wrote:
Pat,
The Norwegian "ø" (char(248) is also not getting converted to NULL. Again, try converting that single character and then do a len() and a seq() of each byte to understand what it is really getting converted to. Trying to display it in a window or the debugger is not an accurate way to determine what something has been converted to.
I repeated my test. Here are the steps I followed. Please tell me where I went wrong:
- I first prepared an ANSI row with some lines of text and a lot of high-bit letters. This I did with the System Editor in OI 4.0.3, which is pre-Unicode and so should produce ANSI data rows. In my test row the high-bit letters start somewhere in line 2, so the row contains first some ordinary characters, then a @FM, then some ordinary letters, then a mixture of high-bit and other characters.
- Then I closed 4.0.3, started 4.1.1, and attached the table containing the ANSI test row.
- I ran System Editor and opened the row. System Editor showed me the beginning of the row. The first line was OK. So was the beginning of the second line. But at the first problematic letter there was a burst of unreadable characters (rectangles), then nothing more. This was as expected, the explanation presumably being that System Editor interprets an ANSI string as a UTF8 string.
- Then I converted the row from ANSI to UTF8 and looked at the result.
I did my conversion with a simple function of my own that calls the ANSI_UTF8 function. My function does the following steps:
- Opens the table
- Reads the ANSI row into a variable RecAnsi
- Converts the contents of RecAnsi with the ANSI_UTF8 function, with the result in a RecUtf8 variable
- Writes RecUtf8 to a different row in the same table
- Also returns the content of RecUtf8 to the Results Viewer through the function's return statement.
I've listed my function below.
After converting I stayed in System Editor to inspect the result. I looked at the text in the Results Viewer, and I opened the newly written row.
The two results were the same. The line break between the first two lines (i.e., a @FM) is gone and replaced by a question mark. At the first high-bit character there is a sudden stop: the last character shown is the one just before that first high-bit character.
Then I changed my test function to return the len() and the seq() of each character in the result. I could clearly see that the result extended beyond the displayed characters. All the len()s were 1. The @FM had become a character whose seq() was 63, i.e., "?". The high-bit letters had become characters whose seq() was 0, i.e., NUL.
- Oystein -
Here's my test function, before I changed it to use len() and seq():
function Test_ANSI_UTF8( Dummy )
* Reads an ANSI test row, converts it to UTF8, writes the result to a new row and returns it
declare function ANSI_UTF8
open "TESTUTF8" to TableVar else
   call msg( @Window, "Open failure" )
   return -1
end
read RecAnsi from TableVar, "COLLSEQLETTERS" else
   call msg( @Window, "Read failure" )
   return -2
end
* Convert the whole ANSI row to UTF8 in one call
RecUtf8 = ANSI_UTF8( RecAnsi )
write RecUtf8 on TableVar, "COLLSEQLETTERS_UTF8" else
   call msg( @Window, "Write failure" )
   return -3
end
return RecUtf8
At 07 OCT 2002 01:47PM Pat McNerthney wrote:
Oystein,
I really don't have anything concrete I can duplicate from your description. Here is a test program I have been running:
subroutine UTF8_TEST(void)
declare function UTF8_ANSI, ANSI_UTF8
ansi_all = ''
utf8_all = ''
* Convert each high-bit ANSI character on its own and check that it round-trips
for ch = 128 to 255
   ansi = char(ch)
   utf8 = ANSI_UTF8(ansi)
   ansi_from_utf8 = UTF8_ANSI(utf8)
   if ansi_from_utf8 <> ansi then debug
   * Step through in the debugger to inspect each byte of the UTF8 result
   utf8_len = len(utf8)
   for i = 1 to utf8_len
      byte_val = seq(utf8[i,1])
   next i
   ansi_all := ansi
   utf8_all := utf8
next ch
* Then convert all 128 characters as one string in each direction
utf8_from_ansi = ANSI_UTF8(ansi_all)
if utf8_from_ansi <> utf8_all then debug
ansi_from_utf8 = UTF8_ANSI(utf8_all)
if ansi_from_utf8 <> ansi_all then debug
open 'SYSPROCS' to sysprocs then
   write ansi_all to sysprocs,'UTF8_TEST_ANSI' else null
   write utf8_all to sysprocs,'UTF8_TEST_UTF8' else null
end
return
The main "for" loop of this program is what I mean by processing a single character at a time. If you can provide a similar program that generates its own data to be converted, then I might be able to chew on it.
Pat
At 08 OCT 2002 07:24AM Oystein Reigem wrote:
Pat,
I now understand a bit more how Unicode is implemented in OI 4.1.x. Stored procedures that work with strings don't see Unicode characters but bytes. A Unicode character with a two-byte UTF-8 representation is a string of two bytes. So the implementation is on a very primitive level. I would think other programming tools that support Unicode allow the program to work directly with characters, and hide the byte-wise representation from the program and the programmer.
I'm very curious about what tools or features there are in OI 4.1.x to work on the character level, or which tools or features there will be in the future. E.g.:
- How does one determine the length of a string in Unicode characters?
- How does one determine the "alphabetic" order of two Unicode strings?
- In the same vein: How does one sort Unicode strings?
Ordering and sorting is very important in a database application. Data in indexes, popups, reports, etc., must have a consistent sort order that agrees with the local alphabet. E.g., the last three letters ÆØÅ in the Norwegian alphabet ABC…XYZÆØÅ must be sorted according to their place and order in that alphabet. Having the ÆØÅ letters out of their natural order would be similar to having the letters BJV out of order. The sort order must also include letters foreign to the local alphabet, in places that fit with the local alphabet. E.g., in Norway we need the Swedish and German Ö sorted in with the Ø. In Germany they need it with the O.
So we need a collation sequence, depending on which language is used.
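A collation sequence of that kind could be sketched as a sort key built from an explicit alphabet. This Python fragment is hypothetical and minimal: the alphabet string and the folding table (only Ö sorted in with Ø, as in the example above) are illustrative, other accented letters are not handled, and any character outside the alphabet simply sorts after it by code point.

```python
NORWEGIAN_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"

# Foreign letters sorted in with a native one (illustrative, not complete).
FOLD = {"Ö": "Ø"}


def norwegian_key(word: str) -> tuple:
    """Sort key placing letters in Norwegian alphabetical order."""
    key = []
    for ch in word.upper():
        ch = FOLD.get(ch, ch)
        pos = NORWEGIAN_ALPHABET.find(ch)
        # Unknown characters go after the alphabet, ordered by code point.
        key.append(pos if pos >= 0 else len(NORWEGIAN_ALPHABET) + ord(ch))
    return tuple(key)


if __name__ == "__main__":
    words = ["Ås", "Øl", "Ære", "Öl", "Zebra"]
    print(sorted(words))                     # ['Zebra', 'Ås', 'Ære', 'Öl', 'Øl']
    print(sorted(words, key=norwegian_key))  # ['Zebra', 'Ære', 'Øl', 'Öl', 'Ås']
```

Because Python's sort is stable, "Øl" and "Öl" (which get equal keys) keep their input order; a real collation would need a tie-breaking rule.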
I tried to do some sorting with V119. As expected it sorted on bytes and not on characters.
And before I reset my collation sequence from international (Norwegian) to LND_DEFAULT I got some weird results from V119. In the current situation I think a collation sequence does more harm than good.
Comments, please.
- Oystein -
At 08 OCT 2002 09:50AM Oystein Reigem wrote:
Pat,
I can't get that test program of yours to run. If ansi is any high-bit character, utf8 = ANSI_UTF8(ansi) returns NUL. This is consistent with my earlier results.
I have now tried running OI 4.1.1 both with and without the /ANSI switch, in case that mattered.
What exactly is it ANSI_UTF8 does? Is there anything it relies on? Any environment settings? Like the ENV or LND settings? Or any Windows language or locale settings?
- Oystein -
At 08 OCT 2002 02:41PM Pat McNerthney wrote:
Oystein,
I now understand a bit more how Unicode is implemented in OI 4.1.x. Stored procedures that work with strings don't see Unicode characters but bytes. A Unicode character with a two byte UTF-8 representation is a string of two bytes. So the implementation is on a very primitive level. I would think other programming tools that support Unicode allow the program to work directly with characters, and hide the byte-wise representation from the program and the programmer.
That is one way to implement Unicode, depending on what one needs to support.
In our implementation, we had the extremely critical criterion of supporting the Pick-style system delimiters in a 100% backwards-compatible manner. This was considered such an important criterion that if it could not be met, there was no point in even attempting Unicode support. After investigating all the different ways system delimiters are used throughout the OpenInsight system code and by end developers, there was no getting around the fact that system delimiters are the specific, single-byte values Dick Pick declared them to be. There are just too many places where these system delimiter values are hard-coded as literals and then used against data in many different ways.
Another criterion was not to create an implementation specific to Unicode as implemented by Windows NT. The 16-bit-wide Unicode character format has already been superseded by a 32-bit-wide character format in the latest Unicode specifications.
The beauty of our UTF8 implementation is that we have met these criteria. Both the system code and your code can continue to operate on all of your data exactly as they always have, as strings of single-byte values, with the Pick system delimiters continuing to provide the organization of the data. In addition, we are in a position to support the Unicode implementations of other operating systems.
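The backward-compatibility argument rests on a property of UTF-8 that is easy to check: when only 1-3 byte sequences are used (the Basic Multilingual Plane, as in the conversions described in this thread), no encoded byte falls in 0xF0-0xFF, so a byte such as @FM (char(254)) can never be mistaken for part of a character. The Python check below, with hypothetical function names, tests that over every BMP code point and then splits a UTF-8 record on @FM the way existing delimiter-based code would.

```python
FM = bytes([254])  # @FM


def bmp_utf8_avoids_delimiters() -> bool:
    """True if no BMP character's UTF-8 encoding uses a byte >= 0xF0."""
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not encodable characters
        if any(b >= 0xF0 for b in chr(cp).encode("utf-8")):
            return False
    return True


def split_fields(record: bytes) -> list:
    """Split a record on @FM, exactly as byte-oriented code would."""
    return record.split(FM)


if __name__ == "__main__":
    print(bmp_utf8_avoids_delimiters())  # True
    record = FM.join(s.encode("utf-8") for s in ["Ærlig", "Øst", "Åpen"])
    print([f.decode("utf-8") for f in split_fields(record)])
```

The property does not hold for 4-byte sequences (code points above U+FFFF), whose lead bytes are 0xF0-0xF4 and would collide with the delimiter range.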
I'm very curious about what tools or features there are in OI 4.1.x to work on the character level, or which tools or features there will be in the future. E.g: - How does one determine the length of a string in Unicode characters?
At the moment, the easiest way is to convert it to a Unicode string (there is a UTF8_UNICODE function) and then divide its length by 2.
If this is a useful item, I can make a routine to do just this to avoid the overhead of doing the actual conversion.
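A routine of the kind Pat offers could count characters without doing the conversion: in UTF-8 every character starts with exactly one byte that is not a continuation byte (0x80-0xBF), so counting the non-continuation bytes gives the character count. This is a hypothetical Python sketch; under the scheme described in this thread each delimiter byte (0xF0-0xFF) also counts as one character, since it is not a continuation byte. The "Unicode length divided by 2" approach gives the same answer for BMP text, but counts a character above U+FFFF twice because UTF-16 stores it as a surrogate pair.

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters in UTF-8 data without decoding it.

    Continuation bytes (0x80-0xBF) never start a character, so every
    other byte, including OI delimiter bytes, starts exactly one.
    """
    return sum(1 for b in data if not 0x80 <= b <= 0xBF)


if __name__ == "__main__":
    word = "Blåbærsyltetøy"
    data = word.encode("utf-8")
    print(len(data))                           # 17 bytes
    print(utf8_char_count(data))               # 14 characters
    print(len(word.encode("utf-16-le")) // 2)  # 14, via the UTF-16 route
```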
- How does one determine the "alphabetic" order of two Unicode strings? - In the same vein: How does one sort Unicode strings?
Both of these are handled automatically by the encoding scheme of the UTF8 format. Here is a good link that explains how the bits are twiddled between Unicode and UTF8: UTF-8 and Unicode FAQ
Essentially, doing straight byte compares of two UTF8 strings is the same as doing compares against the equivalent Unicode character strings.
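Pat's point can be checked directly: UTF-8 was designed so that comparing encoded bytes gives the same order as comparing code points. The Python sketch below (the helper name is hypothetical) confirms that for a few words, and also shows what Oystein is getting at: code point order puts Å (U+00C5) before Æ (U+00C6) and Ø (U+00D8), which is not the Norwegian alphabetical order ÆØÅ.

```python
from itertools import combinations


def byte_order_matches_code_points(words: list) -> bool:
    """True if UTF-8 byte comparison agrees with code point comparison."""
    return all(
        (a < b) == (a.encode("utf-8") < b.encode("utf-8"))
        for a, b in combinations(words, 2)
    )


if __name__ == "__main__":
    words = ["Øl", "Ås", "Ære", "Zebra", "øl", "\u20ac"]
    print(byte_order_matches_code_points(words))  # True
    print(sorted(["Øl", "Ås", "Ære"], key=lambda w: w.encode("utf-8")))
    # ['Ås', 'Ære', 'Øl'], not the Norwegian order Ære, Øl, Ås
```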
Ordering and sorting is very important in a database application. Data in indexes, popups, reports, etc, must have a consistent sort order that agrees with the local alphabet. E.g, the last 3 letters ÆØÅ in the Norwegian alphabet ABC…XYZÆØÅ must be sorted according to their place and order in that alphabet. Having the ÆØÅ letters out of their natural order would be similar to having the letters BJV out of order. The sort order must also include letters foreign to the local alphabet, in places that fit with the local alphabet. E.g, in Norway we need the Swedish and German Ö sorted in with the Ø. In Germany they need it with the O. So we need a collation sequence, depending on which language is used. I tried to do some sorting with V119. As expected it sorted on bytes and not on characters. And before I reset my collation sequence from international (Norwegian) to LND_DEFAULT I got some weird results from V119. In the current situation I think a collation sequence does more harm than good. Comments, please.
This is an area that is lacking in the current implementation and needs to be addressed. It does make sense that having a collation sequence in place would cause more harm than good.
Pat
At 08 OCT 2002 02:58PM Pat McNerthney wrote:
Oystein,
I can't get that test program of yours to run. If ansi is any high-bit character utf8=ANSI_UTF8(ansi) becomes NUL. This is consistent with my earlier results. I now tried to run OI 4.1.1 both with and without the /ANSI switch in case that mattered.
Beats me… except that I am running against the very latest version. We are about to do a 4.1.2 beta cycle and you are at the top of the list of beta testers. Let's wait until you have that version to untangle what is going on.
What exactly is it ANSI_UTF8 does? Is there anything it relies on? Any environment settings? Like the ENV or LND settings? Or any Windows language or locale settings?
First, it checks whether the ANSI value is a system delimiter; if it is, it stays that value. If it is not a system delimiter, it takes the 8-bit ANSI value and converts it to a 16-bit Unicode value using the current code page in effect on that machine. It then converts that 16-bit Unicode value to the appropriate UTF8 byte sequence. The UTF8 character sequence will be between 1 and 3 bytes long.
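The three steps Pat lists can be modelled in Python as follows, together with the inverse direction and a round-trip check over char(128)-char(255) in the spirit of his UTF8_TEST program. This is a hypothetical sketch, not Revelation's code: it assumes the machine's code page is cp1252, treats char(240)-char(255) as delimiters as described earlier in the thread, and assumes (not confirmed in the thread) that code page bytes with no assigned character (0x81, 0x8D, 0x8F, 0x90, 0x9D in cp1252) map to the Unicode control character with the same number. Under this model "ø" (char(248)) stays the single byte 248, matching Pat's description rather than the NUL Oystein observed.

```python
DELIM_LOW = 0xF0  # char(240)..char(255) are kept as system delimiters


def _byte_to_char(b: int, codepage: str) -> str:
    try:
        return bytes([b]).decode(codepage)
    except UnicodeDecodeError:
        # Assumption: unassigned code page bytes map to U+0080..U+00FF.
        return chr(b)


def _char_to_byte(ch: str, codepage: str) -> bytes:
    try:
        return ch.encode(codepage)
    except UnicodeEncodeError:
        # Inverse of the fallback above; anything else becomes '?'.
        return bytes([ord(ch)]) if ord(ch) < 0x100 else b"?"


def ansi_utf8(ansi: bytes, codepage: str = "cp1252") -> bytes:
    """Model of ANSI_UTF8 as described: delimiters stay, others go via Unicode."""
    out = bytearray()
    for b in ansi:
        if b >= DELIM_LOW:
            out.append(b)                    # 1. delimiter: keep the single byte
        else:
            ch = _byte_to_char(b, codepage)  # 2. 8-bit ANSI -> Unicode
            out += ch.encode("utf-8")        # 3. Unicode -> 1-3 UTF-8 bytes
    return bytes(out)


def utf8_ansi(utf8: bytes, codepage: str = "cp1252") -> bytes:
    """Inverse model: decode UTF-8 between delimiters, re-encode to ANSI."""
    out = bytearray()
    start = 0
    for i in range(len(utf8) + 1):
        if i == len(utf8) or utf8[i] >= DELIM_LOW:
            for ch in utf8[start:i].decode("utf-8"):
                out += _char_to_byte(ch, codepage)
            if i < len(utf8):
                out.append(utf8[i])  # copy the delimiter byte unchanged
            start = i + 1
    return bytes(out)


if __name__ == "__main__":
    for ch in range(128, 256):
        ansi = bytes([ch])
        utf8 = ansi_utf8(ansi)
        assert 1 <= len(utf8) <= 3
        assert utf8_ansi(utf8) == ansi, hex(ch)
    print(ansi_utf8(b"\xe6\xf8"))  # b'\xc3\xa6\xf8': 'æ' converted, 'ø' kept
```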
Pat