UTF8 and Binary Manipulation
Published 03 MAR 2010 at 09:45:00AM by Captain C
Updated on 05 MAR 2010 at 09:45:00AM
One of the most important points to bear in mind when using the Basic+ string handling functions is that all normal string operations are character-based, not byte-based. This has major implications if you wish to manipulate your data in a byte-oriented fashion in UTF8 mode. UTF8 is a multi-byte encoding scheme, so a single character may occupy anywhere from one to four bytes. In ANSI mode, by contrast, one byte always represents one character.
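To make the difference concrete, the following C sketch finds the byte offset at which a given character starts in a UTF-8 string. It uses C rather than Basic+ only because C exposes the raw bytes. The function name `utf8_char_to_byte_offset` is hypothetical, and the code assumes the input is valid UTF-8. In the string "aéb", the third character starts at byte 4, because "é" is encoded as the two bytes 0xC3 0xA9. Character positions and byte positions stop lining up as soon as a multi-byte character appears.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the 1-based byte offset at which the 1-based
   character `charPos` starts in the NUL-terminated UTF-8 string `s`, or 0 if
   the string has fewer than `charPos` characters. Continuation bytes
   (bit pattern 10xxxxxx) never start a character, so they are not counted.
   Assumes `s` is valid UTF-8. */
static size_t utf8_char_to_byte_offset(const char *s, size_t charPos)
{
    assert(s != NULL);
    size_t chars = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Count ASCII bytes and multi-byte lead bytes only. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (++chars == charPos)
                return i + 1; /* convert 0-based index to 1-based offset */
        }
    }
    return 0;
}
```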
To overcome this issue, Revelation introduced several new Basic+ functions way back in OpenInsight 7.0 that explicitly allow binary manipulation regardless of the string-handling mode you are currently in. (Note that these functions are intrinsic to the Basic+ language and do not need to be declared before use.)
These functions are:
- GetByteSize
- GetBinaryValue
- PutBinaryValue
- CreateBinaryData
The intention of this blog post is to document these functions and to make you aware of them so that you can develop your applications correctly should you wish to work in UTF8 mode.
(Also note that most of these functions expect you to specify a variable type when using them. This type should be chosen from one of the standard "C" types understood by the Basic+ compiler, which are listed at the end of this post.)
GetByteSize
Returns the number of bytes occupied by the specified variable. This is in contrast to the Len() function which returns the number of characters.
sizeInBytes = GetByteSize( varData )
Argument | Description |
---|---|
varData | Variable to query. |
E.g.
rec     = Xlate( "SYSOBJ", "$WRITE_ROW", "", "X" )
recSize = GetByteSize( rec )
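In C terms, the distinction between Len() and GetByteSize() is roughly the distinction between counting UTF-8 characters and calling strlen(). The sketch below is an illustration only: `utf8_len` and `byte_size` are hypothetical helpers, not part of Basic+, and they assume a NUL-terminated, valid UTF-8 string. For "aéb", `utf8_len` returns 3 while `byte_size` returns 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of characters in a NUL-terminated UTF-8
   string. Every byte that is not a continuation byte (10xxxxxx) starts a new
   character. This corresponds to what Len() counts. Assumes valid UTF-8. */
static size_t utf8_len(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: number of bytes, the rough analogue of GetByteSize().
   strlen() counts bytes (excluding the terminating NUL), not characters. */
static size_t byte_size(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```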
GetBinaryValue
This function extracts a binary value from a variable at a specified offset. You must specify the type of data to extract, and if you are extracting a type with a variable length, such as a string of bytes, you must also pass the number of bytes you wish to copy.
binVal = GetBinaryValue( varData, byteOffset, varType [, noOfBytes] )
Argument | Description |
---|---|
varData | Variable to extract the binary value from. |
byteOffset | 1-based offset into varData to extract the binary value from. |
varType | Type of data to extract. This must be one of the Basic+ "C" types as listed below. |
noOfBytes | Number of bytes to extract. This argument is only required if varType is CHAR or BINARY. |
E.g.
rec = Xlate( "SYSOBJ", "$WRITE_ROW", "", "X" )

// Get the first byte of the record as a number
firstByte = GetBinaryValue( rec, 1, BYTE )

// Get the next 10 bytes as a binary string
someBytes = GetBinaryValue( rec, 2, BINARY, 10 )
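Conceptually, GetBinaryValue is a bounds-checked copy of bytes starting at a 1-based offset. The C sketch below models that idea. `get_binary_value` and its signature are hypothetical, and unlike the Basic+ function it only copies raw bytes. It does none of the type conversion that OpenEngine performs for you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of GetBinaryValue: copy `n` bytes starting at the
   1-based `byteOffset` of `data` (which is `dataLen` bytes long) into `out`.
   For a fixed-size type such as BYTE, n is simply the size of that type; for
   a variable-length type (CHAR/BINARY) n plays the role of noOfBytes.
   Returns 0 on success, or -1 if the requested range lies outside `data`. */
static int get_binary_value(const unsigned char *data, size_t dataLen,
                            size_t byteOffset, void *out, size_t n)
{
    assert(data != NULL && out != NULL);
    /* Equivalent to byteOffset - 1 + n <= dataLen, written to avoid overflow. */
    if (byteOffset < 1 || n > dataLen || byteOffset - 1 > dataLen - n)
        return -1;
    memcpy(out, data + (byteOffset - 1), n);
    return 0;
}
```

A fixed-size read passes the size of the type, for example 1 for a single byte. A variable-length read passes an explicit byte count, just as the Basic+ call needs noOfBytes only for CHAR and BINARY.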
PutBinaryValue
This subroutine modifies a variable by replacing binary data at a specified byte offset. You must specify the type of data you wish to insert as well as the data itself.
PutBinaryValue( binData, byteOffset, varType, varData )
Argument | Description |
---|---|
binData | Variable containing binary data to modify. |
byteOffset | 1-based byte position to begin the modification from. |
varType | Type of data to copy into binData. This must be one of the Basic+ "C" types as listed below. |
varData | Data to copy into binData. OpenEngine converts this to the binary format specified by the varType argument before copying. |
E.g.
* // Example showing how to access and update
* // a Windows API structure using
* // the binary manipulation functions.
* //
* // A RECT structure consists of four LONG types
* // (32-bit signed integer, each 4 bytes long)
* //
* // typedef struct tagRECT {
* //    LONG left;
* //    LONG top;
* //    LONG right;
* //    LONG bottom;
* // } RECT;
* //
* // left occupies bytes 1-4, so top starts at byte 5.

* // We're going to use the GetWindowRect API function
* // to get some RECT coordinates. GetWindowRect fills
* // in the RECT structure that is passed to it.
hwnd = Get_Property( @window, "HANDLE" )
rect = blank_Struct( "RECT" )
call GetWindowRect( hwnd, rect )

* // Increment the top member by 10
top = GetBinaryValue( rect, 5, LONG )
top += 10
PutBinaryValue( rect, 5, LONG, top )
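The same RECT manipulation written in C may make it clearer exactly which bytes are touched. This is a sketch under stated assumptions. `get_long`, `put_long` and `rect_shift_top` are hypothetical helpers. Values are read and written in host byte order, which is little-endian on Windows x86/x64. The RECT is treated as a plain 16-byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A RECT is four consecutive 32-bit signed LONGs: left, top, right, bottom.
   top therefore starts at 1-based byte offset 5. */
enum { RECT_SIZE = 16, RECT_TOP_OFFSET = 5 };

/* Hypothetical helper: read a 32-bit LONG at a 1-based byte offset,
   in host byte order. memcpy avoids unaligned-access problems. */
static int32_t get_long(const unsigned char *buf, size_t byteOffset)
{
    int32_t v;
    memcpy(&v, buf + (byteOffset - 1), sizeof v);
    return v;
}

/* Hypothetical helper: overwrite the 4 bytes at a 1-based byte offset. */
static void put_long(unsigned char *buf, size_t byteOffset, int32_t v)
{
    memcpy(buf + (byteOffset - 1), &v, sizeof v);
}

/* Mirrors the Basic+ snippet: read top, add delta, write top back.
   The other three members are left untouched. */
static void rect_shift_top(unsigned char rect[RECT_SIZE], int32_t delta)
{
    assert(rect != NULL);
    int32_t top = get_long(rect, RECT_TOP_OFFSET);
    top += delta;
    put_long(rect, RECT_TOP_OFFSET, top);
}
```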
CreateBinaryData
This function creates and returns a "blank" binary variable of the specified type.
binVal = CreateBinaryData( varType, varData )
Argument | Description |
---|---|
varType | Type of variable to create. This must be one of the Basic+ "C" types as listed below. |
varData | Initial value of the new variable. |
E.g.
* // Create a binary integer with an initial value of
* // 100
a    = "100"
intA = CreateBinaryData( INT, a )
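One way to picture what CreateBinaryData( INT, a ) produces is the C sketch below. The three-character string "100" is parsed into a native int, and that int's raw bytes are stored in a buffer of sizeof(int) bytes, which is four on Windows. `create_binary_int` is a hypothetical helper that covers only the INT case. It assumes the Basic+ INT type corresponds to a C int.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of CreateBinaryData( INT, a ): convert the decimal
   text in `text` to a native int and store its raw bytes in `out`, which
   must hold sizeof(int) bytes. Non-numeric text yields 0, as strtol does. */
static void create_binary_int(const char *text, unsigned char out[sizeof(int)])
{
    assert(text != NULL && out != NULL);
    int value = (int)strtol(text, NULL, 10);
    memcpy(out, &value, sizeof value);
}
```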
Basic+ "C" types
The following is a list of variable types that may be used with the Basic+ binary manipulation functions described above.
- CHAR
- BYTE
- UBYTE
- SHORT
- USHORT
- LONG
- ULONG
- FLOAT
- LPVOID
- LPCHAR
- LPBYTE
- LPUBYTE
- LPSHORT
- LPUSHORT
- LPLONG
- LPULONG
- LPFLOAT
- LPDOUBLE
- DOUBLE
- HANDLE
- INT
- UINT
- LPINT
- LPUINT
- LPHANDLE
- ACHAR
- WCHAR
- LPACHAR
- LPWCHAR
- LPSTR
- LPASTR
- LPWSTR
- BINARY
- LPBINARY
[EDIT: 05 March 2010]
Due to a recently discovered compiler bug (since fixed) the following "C" types will NOT work with the binary manipulation functions prior to OpenInsight 9.2.0:
- ACHAR
- WCHAR
- LPACHAR
- LPWCHAR
- LPSTR
- LPASTR
- LPWSTR
- BINARY
- LPBINARY
Probably the biggest impact of this is on processing BINARY types, but you can work around it by using the CHAR type instead, as both perform exactly the same operation.