Stripping HTML from a string (OpenInsight 32-bit Specific)
At 17 NOV 2005 05:08:39PM Adam Fox wrote:
Can anyone tell me an easy way of stripping html tags from a string? Are there any equivalent tools to regular expressions etc that I might use to accomplish this? I am working on a legacy application (maintenance) and I need to reliably strip out HTML tags from a string. The existing code is trying to do this but some tags slip through causing the application to crash out.
I know this can be done quite easily in Perl, PHP and JavaScript, for example, and was wondering if any of the gurus on here might know a reliable way to do it in Basic+.
Thanks in advance,
Adam
At 17 NOV 2005 05:43PM psimonsen@srpcs.com's Paul Simonsen wrote:
Adam,
One method would be to get the length of the string and loop through it character by character, copying each character from the original HTML string to another variable. If you come across a "<" character, stop copying until you have passed the matching ">" character. That way your new variable will not have any HTML tags.
However, if you need to trap data that is within those tags, you'll have to code around that. For example, many times you'll have data inside a tag attribute, as in value="data".
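A minimal Basic+ sketch of the character-by-character approach: it copies every character except those that fall between a "<" and the next ">". The function name Strip_Tags_Sketch and its variable names are hypothetical, and the sketch assumes every "<" in the input starts a tag, so a literal "<" in ordinary text would swallow whatever follows it up to the next ">".

```basic
Function Strip_Tags_Sketch(InputText)
* Hypothetical helper, not part of OpenInsight.
* Copies InputText to Result, skipping everything from "<" through ">".

Result  = ""
InTag   = 0
TextLen = Len(InputText)

For Pos = 1 To TextLen
   Ch = InputText[Pos, 1]
   Begin Case
      Case Ch = "<"
         * Start of a tag: stop copying
         InTag = 1
      Case Ch = ">" And InTag
         * End of the tag: resume copying with the next character
         InTag = 0
      Case InTag
         * Inside a tag: discard the character
         Null
      Case 1
         * Ordinary text: keep it
         Result := Ch
   End Case
Next Pos

Return Result
```

It could be called as PlainText = Strip_Tags_Sketch(HtmlText) after a matching Declare Function statement. Attribute values such as value="data" are discarded along with the rest of the tag, so they would need separate handling.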
Hope this helps,
psimonsen@srpcs.com
At 17 NOV 2005 07:50PM dsig _at_ sigafoos.org wrote:
Adam ..
Now I am just blowing this out .. but
you might be able to find a regex script which works with the Microsoft scripting engine. I believe that Bob Carten mentioned that OI can use the scripting engine ..
Just a (partially formed) thought
dsig
At 18 NOV 2005 02:55AM Adam Fox wrote:
Hmmm, now that sounds interesting. I was wondering if there would be any way to use regular expressions. Thanks for the suggestion re parsing through the string, but the original solution does this and it has proven to let stuff slip through the gaps; so yes, there is a fair amount of coding around that. Trouble is, being new to OI, I'm not aware of anything that could make the job easier.
Nothing like pattern matching or regular expressions native to OI then?
Adam
At 18 NOV 2005 02:53PM Bob Carten wrote:
No regular expressions, however there are very powerful string operations.
With the square bracket operators and the IndexC and Swap functions you can do well.
A nice way to use VBScript or JavaScript from OI 7.2 is to use a Windows Script Component. There is a nice article here
You can create a component as a plain text file, say myComponent.wsc,
use regsvr32 to register it, then in OI you can use something like
* MyString, MyPattern and Match are members the component itself must define
MyObj=OleCreateInstance("MyComponent.wsc")
OlePutProperty(MyObj, 'MyString', mystring)
OlePutProperty(MyObj, 'MyPattern', myPattern)
match=OleCallMethod(MyObj, 'Match')
In earlier versions you can put the WSC object in an OLE control on a window and use Set_Property, Send_Message and Get_Property to drive it.
HTH
Bob
At 23 NOV 2005 03:16PM Mark Glicksman wrote:
I've written a function to strip HTML - seems to work pretty well. You might be able to adapt it:
Function Strip_HTML_Tags(InputText)
*Revised 1/31/05
*
*Function strips the html tags, scripts, etc. out of InputText, and returns just plain text.
*
Declare Function Count_MultiVal, UCase
*
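ReturnText="" ;*initialize return value
Hold=InputText
*The first operand of each of the next six swaps was rendered as markup in the
*archived post and is lost; the literals below are assumptions, not the originals.
swap "&nbsp;" with " " in Hold
swap "&NBSP;" with " " in Hold
swap "<br>" with " " in Hold
swap "<BR>" with " " in Hold
swap "<p>" with " " in Hold
swap "<P>" with " " in Hold
Swap @tm with "" in Hold
Hold2=Hold
*Split the text at each tag opening. The loop header and the angle-bracket
*subscripts are reconstructed from context (assumed).
Swap "<" with @vm in Hold
For I=1 to Count_MultiVal(Hold)
   TextPiece=Hold<1,I>
   *Field 1 becomes the tag itself, field 2 the text that follows it
   Swap ">" with @fm:" " in TextPiece
   *Ignore scripts and style statements
   Begin Case
      Case UCase(TextPiece)[1,5]="TITLE"
         ReturnText<1,I>=""
      Case UCase(TextPiece)[1,5]="STYLE"
         ReturnText<1,I>=""
      Case UCase(TextPiece)[1,6]="SCRIPT"
         ReturnText<1,I>=""
      Case 1
         ReturnText<1,I>=TextPiece<2>
   End Case
Next I
Begin Case
   Case Count_MultiVal(Hold)=1
      *No tags were found, so return the text unchanged
      ReturnText=Hold2
   Case 1
      Swap @vm with "" in ReturnText
End Case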
*
Gosub Swap_Special_Chars
*
*
EndSub:
Return Trim(ReturnText)
*
*
*
Swap_Special_Chars:
*Swap the codes for special characters with the actual characters
Swap "&nbsp " with char(160):" " in ReturnText
Swap "&iexcl " with char(161):" " in ReturnText
Swap "&cent " with char(162):" " in ReturnText
Swap "&pound " with char(163):" " in ReturnText
Swap "&curren " with char(164):" " in ReturnText
Swap "&yen " with char(165):" " in ReturnText
Swap "&brvbar " with char(166):" " in ReturnText
Swap "&sect " with char(167):" " in ReturnText
Swap "&uml " with char(168):" " in ReturnText
Swap "&copy " with char(169):" " in ReturnText
Swap "&ordf " with char(170):" " in ReturnText
Swap "&laquo " with char(171):" " in ReturnText
Swap "&not " with char(172):" " in ReturnText
Swap "&shy " with char(173):" " in ReturnText
Swap "&reg " with char(174):" " in ReturnText
Swap "&macr " with char(175):" " in ReturnText
Swap "&deg " with char(176):" " in ReturnText
Swap "&plusmn " with char(177):" " in ReturnText
Swap "&sup2 " with char(178):" " in ReturnText
Swap "&sup3 " with char(179):" " in ReturnText
Swap "&acute " with char(180):" " in ReturnText
Swap "&micro " with char(181):" " in ReturnText
Swap "&para " with char(182):" " in ReturnText
Swap "&middot " with char(183):" " in ReturnText
Swap "&cedil " with char(184):" " in ReturnText
Swap "&sup1 " with char(185):" " in ReturnText
Swap "&ordm " with char(186):" " in ReturnText
Swap "&raquo " with char(187):" " in ReturnText
Swap "&frac14 " with char(188):" " in ReturnText
Swap "&frac12 " with char(189):" " in ReturnText
Swap "&frac34 " with char(190):" " in ReturnText
Swap "&iquest " with char(191):" " in ReturnText
Swap "&Agrave " with char(192):" " in ReturnText
Swap "&Aacute " with char(193):" " in ReturnText
Swap "&Acirc " with char(194):" " in ReturnText
Swap "&Atilde " with char(195):" " in ReturnText
Swap "&Auml " with char(196):" " in ReturnText
Swap "&Aring " with char(197):" " in ReturnText
Swap "&AElig " with char(198):" " in ReturnText
Swap "&Ccedil " with char(199):" " in ReturnText
Swap "&Egrave " with char(200):" " in ReturnText
Swap "&Eacute " with char(201):" " in ReturnText
Swap "&Ecirc " with char(202):" " in ReturnText
Swap "&Euml " with char(203):" " in ReturnText
Swap "&Igrave " with char(204):" " in ReturnText
Swap "&Iacute " with char(205):" " in ReturnText
Swap "&Icirc " with char(206):" " in ReturnText
Swap "&Iuml " with char(207):" " in ReturnText
Swap "&ETH " with char(208):" " in ReturnText
Swap "&Ntilde " with char(209):" " in ReturnText
Swap "&Ograve " with char(210):" " in ReturnText
Swap "&Oacute " with char(211):" " in ReturnText
Swap "&Ocirc " with char(212):" " in ReturnText
Swap "&Otilde " with char(213):" " in ReturnText
Swap "&Ouml " with char(214):" " in ReturnText
Swap "&times " with char(215):" " in ReturnText
Swap "&Oslash " with char(216):" " in ReturnText
Swap "&Ugrave " with char(217):" " in ReturnText
Swap "&Uacute " with char(218):" " in ReturnText
Swap "&Ucirc " with char(219):" " in ReturnText
Swap "&Uuml " with char(220):" " in ReturnText
Swap "&Yacute " with char(221):" " in ReturnText
Swap "&THORN " with char(222):" " in ReturnText
Swap "&szlig " with char(223):" " in ReturnText
Swap "&agrave " with char(224):" " in ReturnText
Swap "&aacute " with char(225):" " in ReturnText
Swap "&acirc " with char(226):" " in ReturnText
Swap "&atilde " with char(227):" " in ReturnText
Swap "&auml " with char(228):" " in ReturnText
Swap "&aring " with char(229):" " in ReturnText
Swap "&aelig " with char(230):" " in ReturnText
Swap "&ccedil " with char(231):" " in ReturnText
Swap "&egrave " with char(232):" " in ReturnText
Swap "&eacute " with char(233):" " in ReturnText
Swap "&ecirc " with char(234):" " in ReturnText
Swap "&euml " with char(235):" " in ReturnText
Swap "&igrave " with char(236):" " in ReturnText
Swap "&iacute " with char(237):" " in ReturnText
Swap "&icirc " with char(238):" " in ReturnText
Swap "&iuml " with char(239):" " in ReturnText
Swap "&eth " with char(240):" " in ReturnText
Swap "&ntilde " with char(241):" " in ReturnText
Swap "&ograve " with char(242):" " in ReturnText
Swap "&oacute " with char(243):" " in ReturnText
Swap "&ocirc " with char(244):" " in ReturnText
Swap "&otilde " with char(245):" " in ReturnText
Swap "&ouml " with char(246):" " in ReturnText
Swap "&divide " with char(247):" " in ReturnText
Swap "&oslash " with char(248):" " in ReturnText
Swap "&ugrave " with char(249):" " in ReturnText
Swap "&uacute " with char(250):" " in ReturnText
Swap "&ucirc " with char(251):" " in ReturnText
Swap "&uuml " with char(252):" " in ReturnText
Swap "&yacute " with char(253):" " in ReturnText
Swap "&thorn " with char(254):" " in ReturnText
Swap "&yuml " with char(255):" " in ReturnText
*
Swap "&nbsp;" with char(160) in ReturnText
Swap "&iexcl;" with char(161) in ReturnText
Swap "&cent;" with char(162) in ReturnText
Swap "&pound;" with char(163) in ReturnText
Swap "&curren;" with char(164) in ReturnText
Swap "&yen;" with char(165) in ReturnText
Swap "&brvbar;" with char(166) in ReturnText
Swap "&sect;" with char(167) in ReturnText
Swap "&uml;" with char(168) in ReturnText
Swap "&copy;" with char(169) in ReturnText
Swap "&ordf;" with char(170) in ReturnText
Swap "&laquo;" with char(171) in ReturnText
Swap "&not;" with char(172) in ReturnText
Swap "&shy;" with char(173) in ReturnText
Swap "&reg;" with char(174) in ReturnText
Swap "&macr;" with char(175) in ReturnText
Swap "&deg;" with char(176) in ReturnText
Swap "&plusmn;" with char(177) in ReturnText
Swap "&sup2;" with char(178) in ReturnText
Swap "&sup3;" with char(179) in ReturnText
Swap "&acute;" with char(180) in ReturnText
Swap "&micro;" with char(181) in ReturnText
Swap "&para;" with char(182) in ReturnText
Swap "&middot;" with char(183) in ReturnText
Swap "&cedil;" with char(184) in ReturnText
Swap "&sup1;" with char(185) in ReturnText
Swap "&ordm;" with char(186) in ReturnText
Swap "&raquo;" with char(187) in ReturnText
Swap "&frac14;" with char(188) in ReturnText
Swap "&frac12;" with char(189) in ReturnText
Swap "&frac34;" with char(190) in ReturnText
Swap "&iquest;" with char(191) in ReturnText
Swap "&Agrave;" with char(192) in ReturnText
Swap "&Aacute;" with char(193) in ReturnText
Swap "&Acirc;" with char(194) in ReturnText
Swap "&Atilde;" with char(195) in ReturnText
Swap "&Auml;" with char(196) in ReturnText
Swap "&Aring;" with char(197) in ReturnText
Swap "&AElig;" with char(198) in ReturnText
Swap "&Ccedil;" with char(199) in ReturnText
Swap "&Egrave;" with char(200) in ReturnText
Swap "&Eacute;" with char(201) in ReturnText
Swap "&Ecirc;" with char(202) in ReturnText
Swap "&Euml;" with char(203) in ReturnText
Swap "&Igrave;" with char(204) in ReturnText
Swap "&Iacute;" with char(205) in ReturnText
Swap "&Icirc;" with char(206) in ReturnText
Swap "&Iuml;" with char(207) in ReturnText
Swap "&ETH;" with char(208) in ReturnText
Swap "&Ntilde;" with char(209) in ReturnText
Swap "&Ograve;" with char(210) in ReturnText
Swap "&Oacute;" with char(211) in ReturnText
Swap "&Ocirc;" with char(212) in ReturnText
Swap "&Otilde;" with char(213) in ReturnText
Swap "&Ouml;" with char(214) in ReturnText
Swap "&times;" with char(215) in ReturnText
Swap "&Oslash;" with char(216) in ReturnText
Swap "&Ugrave;" with char(217) in ReturnText
Swap "&Uacute;" with char(218) in ReturnText
Swap "&Ucirc;" with char(219) in ReturnText
Swap "&Uuml;" with char(220) in ReturnText
Swap "&Yacute;" with char(221) in ReturnText
Swap "&THORN;" with char(222) in ReturnText
Swap "&szlig;" with char(223) in ReturnText
Swap "&agrave;" with char(224) in ReturnText
Swap "&aacute;" with char(225) in ReturnText
Swap "&acirc;" with char(226) in ReturnText
Swap "&atilde;" with char(227) in ReturnText
Swap "&auml;" with char(228) in ReturnText
Swap "&aring;" with char(229) in ReturnText
Swap "&aelig;" with char(230) in ReturnText
Swap "&ccedil;" with char(231) in ReturnText
Swap "&egrave;" with char(232) in ReturnText
Swap "&eacute;" with char(233) in ReturnText
Swap "&ecirc;" with char(234) in ReturnText
Swap "&euml;" with char(235) in ReturnText
Swap "&igrave;" with char(236) in ReturnText
Swap "&iacute;" with char(237) in ReturnText
Swap "&icirc;" with char(238) in ReturnText
Swap "&iuml;" with char(239) in ReturnText
Swap "&eth;" with char(240) in ReturnText
Swap "&ntilde;" with char(241) in ReturnText
Swap "&ograve;" with char(242) in ReturnText
Swap "&oacute;" with char(243) in ReturnText
Swap "&ocirc;" with char(244) in ReturnText
Swap "&otilde;" with char(245) in ReturnText
Swap "&ouml;" with char(246) in ReturnText
Swap "&divide;" with char(247) in ReturnText
Swap "&oslash;" with char(248) in ReturnText
Swap "&ugrave;" with char(249) in ReturnText
Swap "&uacute;" with char(250) in ReturnText
Swap "&ucirc;" with char(251) in ReturnText
Swap "&uuml;" with char(252) in ReturnText
Swap "&yacute;" with char(253) in ReturnText
Swap "&thorn;" with char(254) in ReturnText
Swap "&yuml;" with char(255) in ReturnText
*
For I=160 to 255
   Swap "&#" : I : " " with Char(I) : " " in ReturnText
   Swap "&#" : I : ";" with Char(I) in ReturnText
Next I
Return