Hello,
Im sure Ive seen the system function SOUNDEX in Arev somewhere, but can not find it now. Is there one or should I create one using the source code provided in OI7 help file?
Phil
As I recall it was in one of the Developer Utilities (these can be downloaded from the site) and they were called "Metaphone" not Soundex.
Thanks,
I knew it was around somewhere.
Looks a bit more complicated than the one Ive just written, less standard in fact. But will compare.
Phil
There was a SOUNDEX oconv from Cogent Systems that was either published in the old Revelations magazine, available for download from the BBS/Compuserve. It may have included in one of the utility disks at sometime.
This is what I have. I don't recall of the commented outlines were my doing or not.
SUBROUTINE SOUNDEX(TYPE,DATA.TO.CONVERT,SUBR.LABEL,RETURNED.DATA) *-------------------------------------------------------------------------* * Title : Soundex Conversion * Author : Cogent Information Systems * Date : 12/87 *-------------------------------------------------------------------------* * Purpose : This program converts a word to a soundex code. It is used as a * conversion subroutine to be used with the R/Basic ICONV function. *-------------------------------------------------------------------------* * BEGIN CASE CASE TYPE EQ 'OCONV' CONVERT @LOWER.CASE TO @UPPER.CASE IN DATA.TO.CONVERT FIRST.LETTER=DATA.TO.CONVERT1,1 * CODE=DATA.TO.CONVERT2,LEN(DATA.TO.CONVERT)-1 CODE=DATA.TO.CONVERT CONVERT "1234567890-=;'`\,./AIEOUWHY!@#$%^&*()_+{}:~|?":'"' TO '' IN CODE CONVERT "BPFVCKQJGSXZDTLMNR" TO "112233344555667889" IN CODE * CODE=FMT(CODE1,3,'R(0)#3') * RETURNED.DATA=FIRST.LETTER:CODE RETURNED.DATA=FMT(CODE1,4,'R(0)#4') CASE TYPE=ICONV' * ICONV is not used for this item END CASE RETURN
You might want to consider additional items…
1) Remove trailing "S"s.
2) Remove pairs of letters.
3) Remove prefixing "0"s (zeros).
Or this from DISK1 of REVMEDIA (available FOC from http://www.sprezzatura.com/downloads.htm). I'm sure if I wrote it again I'd structure it a bit differently but this was 15 years ago
![]()
SUBROUTINE SOUNDEX(TYPE,PASSED.VALUE,BRANCH,RETURNED.VALUE) * * Author AMcA * Purpose To provide a user defined conversion for use with symbolic dict * items. This routine will return a SOUNDEX code for a passed * word. It is suggested that this routine only be used with single * word fields like surname. Assuming that surname were in field * 1 of @RECORD and that this subroutine were catalogued as SOUNDEX * the symbolic dict item would have SOUNDEX in the oconv field * and @ANS=ICONV(@RECORD<1>,"SOUNDEX") in the formula. This * field could be BTREEd for maxiumn efficiency. * * Notes To produce a Soundex code for a passed word according to the * Oxford Name Compression Algorithm (ONCA). The steps in this * Soundexing algorithm are a modified form of those suggested by * the Unit of Clinical Epidemiology at Oxford. The preprocessing * stage is based on a modified form of the NYSIIS (New York State * Information and Intelligence System) Code used for record * linkage studies by Statistics Canada. * * A soundex routine when passed a word returns a code for that * word. This code then can be compared to the code produced by * another word to see if they are identical. If they are, the * words "sound alike". This has obvious applications with large * mailing databases and the like. * * There are several variants of the soundexing algorithm, some * more efficient than others. The "Standard" algorithm, the * "Russell Soundex Code" misses common variations such as THOMSON * and THOMPSON. The Soundex presented here is very accurate eg * * Name Code * SMITH S530 * SMITS S530 * SCHMITZ S530 * SCHMIDT S530 * * Obviously, fine tuning of the preprocessing module can be used * to take account of foreign pronunciations et al * * Copyright 1989 Sprezzatura Ltd. Permission is given for * commercial and non-commercial use of this subroutine by * legitimate Revmedia subscribers. No warranty express or * implied, no responsibility accepted.
* * Set up equivalence groups for later use * EQU GROUP$ TO "111122222222334556" EQU POSS.CHAR$ TO "BFPVCGJKQSXZDTLMNR" EQU FALSE$ TO 0 EQU TRUE$ TO 1 CODE=""
BEGIN CASE CASE TYPE="ICONV" WORD=PASSED.VALUE GOSUB INITIALISE ; * Set up swap arrays and such GOSUB PREPROCESS ; * Preprocess word to swap "Sounds-like" strings for * other more common ones GOSUB SOUNDEX ; * Main Soundexing filter RETURNED.VALUE=WORD CASE 1 * * Should be OCONV, as soundexing is one way do not attempt to restore * to original value just display Soundex code * RETURNED.VALUE=PASSED.VALUE END CASE RETURN
INITIALISE: WORD=TRIM(WORD) CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD * * Set up swap array for later processing * SECONDARY.COUNTER=11 DIM SS(SECONDARY.COUNTER) * * First field is value to swap, second is value to swap it with * SS(1)="EV" : @FM : "AF" SS(2)="MP" : @FM : "N" SS(3)="KN" : @FM : "N" SS(4)="LM" : @FM : "N" SS(5)="SH" : @FM : "S" SS(6)="EIGH" : @FM : "A" SS(7)="PH" : @FM : "F" SS(8)="SCH" : @FM : "S" SS(9)="GHT" : @FM : "T" SS(10)="ATCH" : @FM : "ACH" SS(11)="OARE" : @FM : "AWE" GOSUB SET.UP.VARS
RETURN
PREPROCESS: * GOSUB BEGINNING ; * Firstly transform character strings at beginning of name GOSUB ENDING ; * Now strings at end of name GOSUB REMAINING ; * Now transform all remaining letters starting with 2nd * * Now remove last letter if a or s * IF INDEX("AS",WORD-1,1,1) THEN WORD-1,1="" *
RETURN
SET.UP.VARS: * FIRST.3=WORD1,3 FIRST.2=WORD1,2 FIRST.1=WORD1,1 LAST.2=WORD-2,2 RETURN
SOUNDEX: FIRST.LETTER=WORD1,1 WORD=WORD2,999 CONVERT "AWHY " TO " " IN WORD ; * Leave spaces to indicate that a vowel * used to be here. Used later when the * system deletes multiple adjacent * equivalence class values when original * consonants not separated by vowel * EG dad –> 33 but dd –> 3 * * Remove adjacent identical letters * WORD.LEN=LEN(WORD) NEW.WORD="" LAST.LETTER="" FOR X=1 TO WORD.LEN IF WORD1,1 # LAST.LETTER THEN LAST.LETTER=WORD1,1 NEW.WORD := LAST.LETTER END WORD1,1="" NEXT WORD=NEW.WORD * * Now do equivalence group transformation * CONVERT POSS.CHAR$ TO GROUP$ IN WORD * * Now delete multiple adjacent equivalence groups as described above * WORD.LEN=LEN(WORD) NEW.WORD="" LAST.LETTER="" FOR X=1 TO WORD.LEN UNTIL LEN(NEW.WORD)=3 IF WORD1,1 # LAST.LETTER THEN LAST.LETTER=WORD1,1 IF LAST.LETTER # " " THEN NEW.WORD := LAST.LETTER END WORD1,1="" NEXT WORD=FIRST.LETTER : NEW.WORD "L(0)#3" ; * ensure that code of 4 chars RETURN
BEGINNING: BEGIN CASE CASE FIRST.3="MAC" OR FIRST.3="MAK" WORD1,3="MC" CASE FIRST.3="SCH" WORD1,3="S" CASE FIRST.2="WR" WORD1,2="R" CASE FIRST.2="KN" WORD1,2="N" CASE FIRST.2="RH" WORD1,2="R" CASE FIRST.2="PH" WORD1,2="F" CASE FIRST.2="DG" WORD1,2="G" CASE FIRST.2="XH" WORD1,2="K" ; * Xhosa etc CASE FIRST.1="K" WORD1,1="C" CASE FIRST.1="X" WORD1,1="Z" END CASE RETURN
ENDING: BEGIN CASE CASE LAST.2="EE" OR LAST.2="IE" OR LAST.2="YE" WORD-2,2="Y" CASE LAST.2="DT" OR LAST.2="RT" OR LAST.2="RD" WORD-2,2="D" CASE LAST.2="NT" OR LAST.2="ND" WORD-2,2="N" CASE LAST.2="IX" WORD-2,3="ICK" CASE LAST.2="EX" WORD-2,3="ECK" END CASE IF INDEX("SZ",WORD-1,1,1) THEN WORD-1,1="" RETURN
REMAINING: FIRST.LETTER=WORD1,1 WORD=WORD2,99999 GOSUB SWAP.SWAPS CONVERT "QMZK" TO "GNSC" IN WORD CONVERT "1234567890" TO "" IN WORD CONVERT "AEIOU" TO "AAAAA" IN WORD * * Now remove all occurences of h in *h* where h is not a vowel * SWAP "HA" WITH "!" IN WORD SWAP "AH" WITH "#" IN WORD REST=WORD1,LEN(WORD)-1 CONVERT "H" TO "" IN REST WORD=REST : WORD-1,1 SWAP "!" WITH "HA" IN WORD SWAP "#" WITH "AH" IN WORD WORD=FIRST.LETTER : WORD RETURN
SWAP.SWAPS: EXIT.SET=FALSE$ LOOP UNTIL EXIT.SET * * Swap all potentially confusing strings with easier equivalents * OLD.WORD=WORD FOR X=1 TO SECONDARY.COUNTER IF INDEX(WORD,SS(X)<1>,1) THEN SWAP SS(X)<1> WITH SS(X)<2> IN WORD END NEXT IF WORD=OLD.WORD THEN EXIT.SET=TRUE$ REPEAT RETURN
World Leaders in all Things RevSoft
I found the SOUNDEX on the util disk, but wrote my own in the end:
FUNCTION SOUNDSLIKE(TEXT1,TEXT2)
EQUATE SOUNDCODES TO "0123012_02245501262301_202" ;* ignores W & H
TEXT=TEXT1
GOSUB SOUNDEX
CODE1=SOUNDEXCODE
IF ASSIGNED(TEXT2) THEN
TEXT=TEXT2
GOSUB SOUNDEX
CODE2=SOUNDEXCODE
RETURNDATA=(CODE1=CODE2)
END ELSE RETURNDATA=CODE1
RETURN RETURNDATA
SOUNDEX:
WORD=TRIM(TEXT)1," "
CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD
TEMP=WORD
CONVERT @UPPER.CASE TO "" IN TEMP
CONVERT TEMP TO "" IN WORD ;* remove the crap
* Extended logic follows - may add more…
IF WORD1,2=PS" THEN WORD1,2=S"
IF WORD1,2=PF" THEN WORD1,2=F"
SWAP "DG" WITH "G" IN WORD
SWAP "GH" WITH "H" IN WORD
SWAP "KN" WITH "N" IN WORD
SWAP "GN" WITH "N" IN WORD
SWAP "MB" WITH "M" IN WORD
SWAP "PH" WITH "F" IN WORD
SWAP "TCH" WITH "CH" IN WORD
SWAP "MPS" WITH "MS" IN WORD
SWAP "MPT" WITH "MT" IN WORD
SWAP "MPZ" WITH "MZ" IN WORD
SOUNDEX=WORD
CONVERT @UPPER.CASE TO SOUNDCODES IN SOUNDEX
LASTCODE=SOUNDEX1,1
SOUNDEXCODE=WORD1,1
FOR POS=2 TO LEN(SOUNDEX) WHILE LEN(SOUNDEXCODE) LT 4
CODE=SOUNDEXPOS,1
IF CODE NE "_" THEN
IF CODE NE "0" AND CODE NE LASTCODE THEN SOUNDEXCODE:=CODELASTCODE=CODEEND
NEXT POS
SOUNDEXCODE=SOUNDEXCODE "L(0)#4"
RETURN
SUBROUTINE tools_soundex(ARG,S.RES)
!——————————————————————–
! A Simple SOUNDEX Algorithm
!——————————————————————–
!
! Format: SOUNDEX(string,result)
!
! Purpose: To Convert a string into is SOUNDEX equivalent.
! An Alphabetic text string is converted into a phonetic
! equivalent by using the published "SOUNDEX" algorithm.
!
! Remarks: A soundex equivalent is a four character code in the
! form of 'Annn' where:
!
! A=the first letter of the input string.
! n=a number from the following table:
!
! 1 ) THEN
S.TBL=BFPV" ; S.TBL=CGJKQSXZ"S.TBL=DT" ; S.TBL=L"S.TBL=MN" ; S.TBL=R"S.RES=ARG1,1FOR S.PNTR=2 TO LEN(ARG)FOR S.LVL=1 TO 6IF INDEX(S.TBL,ARGS.PNTR,1,1) ] 0 AND ARGS.PNTR-1,1 # ARGS.PNTR,1 AND LEN(S.RES) =S.RES:S.LVLENDNEXT S.LVLNEXT S.PNTR
Pad With 0's if ID ) LT 4 THENS.PAD=4-LEN(S.RES)S.RES=S.RES:STR("0",S.PAD)ENDEND ELSES.RES=ARGENDNEXT X
RETURN