{{tag>category:"AREV Specific"}}
[[https://www.revelation.com/|Sign up on the Revelation Software website to have access to the most current content, and to be able to ask questions and get answers from the Revelation community]]
==== SOUNDEX (AREV Specific) ====
=== At 18 MAR 2004 03:58:39PM ps wing wrote: ===
Hello,
Im sure Ive seen the system function SOUNDEX in Arev somewhere, but can not find it now. Is there one or should I create one using the source code provided in OI7 help file?
Phil
----
=== At 18 MAR 2004 06:11PM Ralph Johler wrote: ===
As I recall it was in one of the Developer Utilities (these can be downloaded from the site) and they were called "Metaphone" not Soundex.
----
=== At 18 MAR 2004 06:38PM ps wing wrote: ===
Thanks,
I knew it was around somewhere.
Looks a bit more complicated than the one Ive just written, less standard in fact. But will compare.
Phil
----
=== At 22 MAR 2004 12:46PM Warren wrote: ===
There was a SOUNDEX oconv from Cogent Systems that was either published in the old Revelations magazine, available for download from the BBS/Compuserve. It may have included in one of the utility disks at sometime.
This is what I have. I don't recall of the commented outlines were my doing or not.
SUBROUTINE SOUNDEX(TYPE,DATA.TO.CONVERT,SUBR.LABEL,RETURNED.DATA)
*-------------------------------------------------------------------------*
* Title : Soundex Conversion
* Author : Cogent Information Systems
* Date : 12/87
*-------------------------------------------------------------------------*
* Purpose : This program converts a word to a soundex code. It is used as a
* conversion subroutine to be used with the R/Basic ICONV function.
*-------------------------------------------------------------------------*
*
BEGIN CASE
CASE TYPE EQ 'OCONV'
CONVERT @LOWER.CASE TO @UPPER.CASE IN DATA.TO.CONVERT
FIRST.LETTER=DATA.TO.CONVERT1,1
* CODE=DATA.TO.CONVERT2,LEN(DATA.TO.CONVERT)-1
CODE=DATA.TO.CONVERT
CONVERT "1234567890-=;'`\,./AIEOUWHY!@#$%^&*()_+{}:~|?":'"' TO '' IN CODE
CONVERT "BPFVCKQJGSXZDTLMNR" TO "112233344555667889" IN CODE
* CODE=FMT(CODE1,3,'R(0)#3')
* RETURNED.DATA=FIRST.LETTER:CODE
RETURNED.DATA=FMT(CODE1,4,'R(0)#4')
CASE TYPE=ICONV'
* ICONV is not used for this item
END CASE
RETURN
----
=== At 22 MAR 2004 01:23PM Richard Hunt wrote: ===
You might want to consider additional items...
1) Remove trailing "S"s.
2) Remove pairs of letters.
3) Remove prefixing "0"s (zeros).
----
=== At 22 MAR 2004 02:20PM The Sprezzatura Group wrote: ===
[size=2]Or this from DISK1 of REVMEDIA (available FOC
from [url=http://www.sprezzatura.com/downloads.htm]
http://www.sprezzatura.com/downloads.htm[/url]). I'm sure if I wrote it again I'd
structure it a bit differently but this was 15 years ago ;-)[/size]
[size=2]SUBROUTINE
SOUNDEX(TYPE,PASSED.VALUE,BRANCH,RETURNED.VALUE)
*
* Author AMcA
* Purpose To provide a user defined conversion for use with symbolic dict
* items. This routine will return a SOUNDEX code for a passed
* word. It is suggested that this routine only be used with single
* word fields like surname. Assuming that surname were in field
* 1 of @RECORD and that this subroutine were catalogued as SOUNDEX
* the symbolic dict item would have SOUNDEX in the oconv field
* and @ANS=ICONV(@RECORD<1>,"SOUNDEX") in the formula. This
* field could be BTREEd for maxiumn efficiency.
*
* Notes To produce a Soundex code for a passed word according to the
* Oxford Name Compression Algorithm (ONCA). The steps in this
* Soundexing algorithm are a modified form of those suggested by
* the Unit of Clinical Epidemiology at Oxford. The preprocessing
* stage is based on a modified form of the NYSIIS (New York State
* Information and Intelligence System) Code used for record
* linkage studies by Statistics Canada.
*
* A soundex routine when passed a word returns a code for that
* word. This code then can be compared to the code produced by
* another word to see if they are identical. If they are, the
* words "sound alike". This has obvious applications with large
* mailing databases and the like.
*
* There are several variants of the soundexing algorithm, some
* more efficient than others. The "Standard" algorithm, the
* "Russell Soundex Code" misses common variations such as THOMSON
* and THOMPSON. The Soundex presented here is very accurate eg
*
* Name Code
* SMITH S530
* SMITS S530
* SCHMITZ S530
* SCHMIDT S530
*
* Obviously, fine tuning of the preprocessing module can be used
* to take account of foreign pronunciations et al
*
* Copyright 1989 Sprezzatura Ltd. Permission is given for
* commercial and non-commercial use of this subroutine by
* legitimate Revmedia subscribers. No warranty express or
* implied, no responsibility accepted. [/size]
[size=2] *
* Set up equivalence groups for later use
*
EQU GROUP$ TO "111122222222334556"
EQU POSS.CHAR$ TO "BFPVCGJKQSXZDTLMNR"
EQU FALSE$ TO 0
EQU TRUE$ TO 1
CODE="" [/size]
[size=2] BEGIN CASE
CASE TYPE="ICONV"
WORD=PASSED.VALUE
GOSUB INITIALISE ; * Set up swap arrays and such
GOSUB PREPROCESS ; * Preprocess word to swap "Sounds-like" strings for
* other more common ones
GOSUB SOUNDEX ; * Main Soundexing filter
RETURNED.VALUE=WORD
CASE 1
*
* Should be OCONV, as soundexing is one way do not attempt to restore
* to original value just display Soundex code
*
RETURNED.VALUE=PASSED.VALUE
END CASE
RETURN [/size]
[size=2]INITIALISE:
WORD=TRIM(WORD)
CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD
*
* Set up swap array for later processing
*
SECONDARY.COUNTER=11
DIM SS(SECONDARY.COUNTER)
*
* First field is value to swap, second is value to swap it with
*
SS(1)="EV" : @FM : "AF"
SS(2)="MP" : @FM : "N"
SS(3)="KN" : @FM : "N"
SS(4)="LM" : @FM : "N"
SS(5)="SH" : @FM : "S"
SS(6)="EIGH" : @FM : "A"
SS(7)="PH" : @FM : "F"
SS(8)="SCH" : @FM : "S"
SS(9)="GHT" : @FM : "T"
SS(10)="ATCH" : @FM : "ACH"
SS(11)="OARE" : @FM : "AWE"
GOSUB SET.UP.VARS[/size]
[size=2]RETURN [/size]
[size=2]PREPROCESS:
*
GOSUB BEGINNING ; * Firstly transform character strings at beginning of name
GOSUB ENDING ; * Now strings at end of name
GOSUB REMAINING ; * Now transform all remaining letters starting with 2nd
*
* Now remove last letter if a or s
*
IF INDEX("AS",WORD-1,1,1) THEN WORD-1,1=""
*[/size]
[size=2]RETURN [/size]
[size=2]SET.UP.VARS:
*
FIRST.3=WORD1,3
FIRST.2=WORD1,2
FIRST.1=WORD1,1
LAST.2=WORD-2,2
RETURN [/size]
[size=2]SOUNDEX:
FIRST.LETTER=WORD1,1
WORD=WORD2,999
CONVERT "AWHY " TO " " IN WORD ; * Leave spaces to indicate that a vowel
* used to be here. Used later when the
* system deletes multiple adjacent
* equivalence class values when original
* consonants not separated by vowel
* EG dad --> 33 but dd --> 3
*
* Remove adjacent identical letters
*
WORD.LEN=LEN(WORD)
NEW.WORD=""
LAST.LETTER=""
FOR X=1 TO WORD.LEN
IF WORD1,1 # LAST.LETTER THEN
LAST.LETTER=WORD1,1
NEW.WORD := LAST.LETTER
END
WORD1,1=""
NEXT
WORD=NEW.WORD
*
* Now do equivalence group transformation
*
CONVERT POSS.CHAR$ TO GROUP$ IN WORD
*
* Now delete multiple adjacent equivalence groups as described above
*
WORD.LEN=LEN(WORD)
NEW.WORD=""
LAST.LETTER=""
FOR X=1 TO WORD.LEN
UNTIL LEN(NEW.WORD)=3
IF WORD1,1 # LAST.LETTER THEN
LAST.LETTER=WORD1,1
IF LAST.LETTER # " " THEN NEW.WORD := LAST.LETTER
END
WORD1,1=""
NEXT
WORD=FIRST.LETTER : NEW.WORD "L(0)#3" ; * ensure that code of 4 chars
RETURN [/size]
[size=2]BEGINNING:
BEGIN CASE
CASE FIRST.3="MAC" OR FIRST.3="MAK"
WORD1,3="MC"
CASE FIRST.3="SCH"
WORD1,3="S"
CASE FIRST.2="WR"
WORD1,2="R"
CASE FIRST.2="KN"
WORD1,2="N"
CASE FIRST.2="RH"
WORD1,2="R"
CASE FIRST.2="PH"
WORD1,2="F"
CASE FIRST.2="DG"
WORD1,2="G"
CASE FIRST.2="XH"
WORD1,2="K" ; * Xhosa etc
CASE FIRST.1="K"
WORD1,1="C"
CASE FIRST.1="X"
WORD1,1="Z"
END CASE
RETURN [/size]
[size=2]ENDING:
BEGIN CASE
CASE LAST.2="EE" OR LAST.2="IE" OR LAST.2="YE"
WORD-2,2="Y"
CASE LAST.2="DT" OR LAST.2="RT" OR LAST.2="RD"
WORD-2,2="D"
CASE LAST.2="NT" OR LAST.2="ND"
WORD-2,2="N"
CASE LAST.2="IX"
WORD-2,3="ICK"
CASE LAST.2="EX"
WORD-2,3="ECK"
END CASE
IF INDEX("SZ",WORD-1,1,1) THEN WORD-1,1=""
RETURN [/size]
[size=2]REMAINING:
FIRST.LETTER=WORD1,1
WORD=WORD2,99999
GOSUB SWAP.SWAPS
CONVERT "QMZK" TO "GNSC" IN WORD
CONVERT "1234567890" TO "" IN WORD
CONVERT "AEIOU" TO "AAAAA" IN WORD
*
* Now remove all occurences of h in *h* where h is not a vowel
*
SWAP "HA" WITH "!" IN WORD
SWAP "AH" WITH "#" IN WORD
REST=WORD1,LEN(WORD)-1
CONVERT "H" TO "" IN REST
WORD=REST : WORD-1,1
SWAP "!" WITH "HA" IN WORD
SWAP "#" WITH "AH" IN WORD
WORD=FIRST.LETTER : WORD
RETURN [/size]
[size=2]SWAP.SWAPS:
EXIT.SET=FALSE$
LOOP
UNTIL EXIT.SET
*
* Swap all potentially confusing strings with easier equivalents
*
OLD.WORD=WORD
FOR X=1 TO SECONDARY.COUNTER
IF INDEX(WORD,SS(X)<1>,1) THEN
SWAP SS(X)<1> WITH SS(X)<2> IN WORD
END
NEXT
IF WORD=OLD.WORD THEN EXIT.SET=TRUE$
REPEAT
RETURN
[/size]
[url=http://www.sprezzatura.com]The Sprezzatura Group[/url]
[i]World Leaders in all Things RevSoft[/i]
[img]http://www.sprezzatura.com/zz.gif[/img]
----
=== At 24 MAR 2004 11:43PM ps wing wrote: ===
I found the SOUNDEX on the util disk, but wrote my own in the end:
FUNCTION SOUNDSLIKE(TEXT1,TEXT2)
EQUATE SOUNDCODES TO "0123012_02245501262301_202" ;* ignores W & H
TEXT=TEXT1
GOSUB SOUNDEX
CODE1=SOUNDEXCODE
IF ASSIGNED(TEXT2) THEN
TEXT=TEXT2
GOSUB SOUNDEX
CODE2=SOUNDEXCODE
RETURNDATA=(CODE1=CODE2)
END ELSE RETURNDATA=CODE1
RETURN RETURNDATA
SOUNDEX:
WORD=TRIM(TEXT)1," "
CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD
TEMP=WORD
CONVERT @UPPER.CASE TO "" IN TEMP
CONVERT TEMP TO "" IN WORD ;* remove the crap
* Extended logic follows - may add more...
IF WORD1,2=PS" THEN WORD1,2=S"
IF WORD1,2=PF" THEN WORD1,2=F"
SWAP "DG" WITH "G" IN WORD
SWAP "GH" WITH "H" IN WORD
SWAP "KN" WITH "N" IN WORD
SWAP "GN" WITH "N" IN WORD
SWAP "MB" WITH "M" IN WORD
SWAP "PH" WITH "F" IN WORD
SWAP "TCH" WITH "CH" IN WORD
SWAP "MPS" WITH "MS" IN WORD
SWAP "MPT" WITH "MT" IN WORD
SWAP "MPZ" WITH "MZ" IN WORD
************************
SOUNDEX=WORD
CONVERT @UPPER.CASE TO SOUNDCODES IN SOUNDEX
LASTCODE=SOUNDEX1,1
SOUNDEXCODE=WORD1,1
FOR POS=2 TO LEN(SOUNDEX) WHILE LEN(SOUNDEXCODE) LT 4
CODE=SOUNDEXPOS,1
IF CODE NE "_" THEN
IF CODE NE "0" AND CODE NE LASTCODE THEN SOUNDEXCODE:=CODE
LASTCODE=CODE
END
NEXT POS
SOUNDEXCODE=SOUNDEXCODE "L(0)#4"
RETURN
----
=== At 04 NOV 2009 11:10AM Dale Jessop wrote: ===
SUBROUTINE tools_soundex(ARG,S.RES)
!--------------------------------------------------------------------
! A Simple SOUNDEX Algorithm
!--------------------------------------------------------------------
!
! Format: SOUNDEX(string,result)
!
! Purpose: To Convert a string into is SOUNDEX equivalent.
! An Alphabetic text string is converted into a phonetic
! equivalent by using the published "SOUNDEX" algorithm.
!
! Remarks: A soundex equivalent is a four character code in the
! form of 'Annn' where:
!
! A=the first letter of the input string.
! n=a number from the following table:
!
! 1 ) THEN
S.TBL=BFPV" ; S.TBL=CGJKQSXZ"
S.TBL=DT" ; S.TBL=L"
S.TBL=MN" ; S.TBL=R"
S.RES=ARG1,1
FOR S.PNTR=2 TO LEN(ARG)
FOR S.LVL=1 TO 6
IF INDEX(S.TBL,ARGS.PNTR,1,1) ] 0 AND ARGS.PNTR-1,1 # ARGS.PNTR,1 AND LEN(S.RES) =S.RES:S.LVL
END
NEXT S.LVL
NEXT S.PNTR
* Pad With 0's if ID ) LT 4 THEN
S.PAD=4-LEN(S.RES)
S.RES=S.RES:STR("0",S.PAD)
END
END ELSE
S.RES=ARG
END
NEXT X
RETURN
----
=== At 09 NOV 2009 08:14AM Bob Carten wrote: ===
[[https://www.revelation.com/revweb/oecgi4p.php/O4W_HANDOFF?DESTN=O4W_RUN_FORM&INQID=NONWORKS_READ&SUMMARY=1&KEY=55D823E026CB776685256E5B006DBDB1|View this thread on the forum...]]