{{tag>category:"AREV Specific"}} [[https://www.revelation.com/|Sign up on the Revelation Software website to have access to the most current content, and to be able to ask questions and get answers from the Revelation community]] ==== SOUNDEX (AREV Specific) ==== === At 18 MAR 2004 03:58:39PM ps wing wrote: === Hello, Im sure Ive seen the system function SOUNDEX in Arev somewhere, but can not find it now. Is there one or should I create one using the source code provided in OI7 help file? Phil ---- === At 18 MAR 2004 06:11PM Ralph Johler wrote: === As I recall it was in one of the Developer Utilities (these can be downloaded from the site) and they were called "Metaphone" not Soundex. ---- === At 18 MAR 2004 06:38PM ps wing wrote: === Thanks, I knew it was around somewhere. Looks a bit more complicated than the one Ive just written, less standard in fact. But will compare. Phil ---- === At 22 MAR 2004 12:46PM Warren wrote: === There was a SOUNDEX oconv from Cogent Systems that was either published in the old Revelations magazine, available for download from the BBS/Compuserve. It may have included in one of the utility disks at sometime. This is what I have. I don't recall of the commented outlines were my doing or not. SUBROUTINE SOUNDEX(TYPE,DATA.TO.CONVERT,SUBR.LABEL,RETURNED.DATA) *-------------------------------------------------------------------------* * Title : Soundex Conversion * Author : Cogent Information Systems * Date : 12/87 *-------------------------------------------------------------------------* * Purpose : This program converts a word to a soundex code. It is used as a * conversion subroutine to be used with the R/Basic ICONV function. *-------------------------------------------------------------------------* * BEGIN CASE CASE TYPE EQ 'OCONV' CONVERT @LOWER.CASE TO @UPPER.CASE IN DATA.TO.CONVERT FIRST.LETTER=DATA.TO.CONVERT1,1 * CODE=DATA.TO.CONVERT2,LEN(DATA.TO.CONVERT)-1 CODE=DATA.TO.CONVERT CONVERT "1234567890-=;'`\,./AIEOUWHY!@#$%^&*()_+{}:~|?":'"' TO '' IN CODE CONVERT "BPFVCKQJGSXZDTLMNR" TO "112233344555667889" IN CODE * CODE=FMT(CODE1,3,'R(0)#3') * RETURNED.DATA=FIRST.LETTER:CODE RETURNED.DATA=FMT(CODE1,4,'R(0)#4') CASE TYPE=ICONV' * ICONV is not used for this item END CASE RETURN ---- === At 22 MAR 2004 01:23PM Richard Hunt wrote: === You might want to consider additional items... 1) Remove trailing "S"s. 2) Remove pairs of letters. 3) Remove prefixing "0"s (zeros). ---- === At 22 MAR 2004 02:20PM The Sprezzatura Group wrote: === [size=2]Or this from DISK1 of REVMEDIA (available FOC from [url=http://www.sprezzatura.com/downloads.htm] http://www.sprezzatura.com/downloads.htm[/url]). I'm sure if I wrote it again I'd structure it a bit differently but this was 15 years ago ;-)[/size] [size=2]SUBROUTINE SOUNDEX(TYPE,PASSED.VALUE,BRANCH,RETURNED.VALUE) * * Author AMcA * Purpose To provide a user defined conversion for use with symbolic dict * items. This routine will return a SOUNDEX code for a passed * word. It is suggested that this routine only be used with single * word fields like surname. Assuming that surname were in field * 1 of @RECORD and that this subroutine were catalogued as SOUNDEX * the symbolic dict item would have SOUNDEX in the oconv field * and @ANS=ICONV(@RECORD<1>,"SOUNDEX") in the formula. This * field could be BTREEd for maxiumn efficiency. * * Notes To produce a Soundex code for a passed word according to the * Oxford Name Compression Algorithm (ONCA). The steps in this * Soundexing algorithm are a modified form of those suggested by * the Unit of Clinical Epidemiology at Oxford. The preprocessing * stage is based on a modified form of the NYSIIS (New York State * Information and Intelligence System) Code used for record * linkage studies by Statistics Canada. * * A soundex routine when passed a word returns a code for that * word. This code then can be compared to the code produced by * another word to see if they are identical. If they are, the * words "sound alike". This has obvious applications with large * mailing databases and the like. * * There are several variants of the soundexing algorithm, some * more efficient than others. The "Standard" algorithm, the * "Russell Soundex Code" misses common variations such as THOMSON * and THOMPSON. The Soundex presented here is very accurate eg * * Name Code * SMITH S530 * SMITS S530 * SCHMITZ S530 * SCHMIDT S530 * * Obviously, fine tuning of the preprocessing module can be used * to take account of foreign pronunciations et al * * Copyright 1989 Sprezzatura Ltd. Permission is given for * commercial and non-commercial use of this subroutine by * legitimate Revmedia subscribers. No warranty express or * implied, no responsibility accepted. [/size] [size=2]    *     * Set up equivalence groups for later use     *     EQU GROUP$ TO "111122222222334556"     EQU POSS.CHAR$ TO "BFPVCGJKQSXZDTLMNR"     EQU FALSE$ TO 0     EQU TRUE$ TO 1     CODE="" [/size] [size=2]    BEGIN CASE         CASE TYPE="ICONV"         WORD=PASSED.VALUE         GOSUB INITIALISE ; * Set up swap arrays and such         GOSUB PREPROCESS ; * Preprocess word to swap "Sounds-like" strings for         * other more common ones         GOSUB SOUNDEX ; * Main Soundexing filter         RETURNED.VALUE=WORD     CASE 1         *         * Should be OCONV, as soundexing is one way do not attempt to restore         * to original value just display Soundex code         *         RETURNED.VALUE=PASSED.VALUE     END CASE RETURN [/size] [size=2]INITIALISE:     WORD=TRIM(WORD)     CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD     *     * Set up swap array for later processing     *     SECONDARY.COUNTER=11     DIM SS(SECONDARY.COUNTER)     *     * First field is value to swap, second is value to swap it with     *     SS(1)="EV" : @FM : "AF"     SS(2)="MP" : @FM : "N"     SS(3)="KN" : @FM : "N"     SS(4)="LM" : @FM : "N"     SS(5)="SH" : @FM : "S"     SS(6)="EIGH" : @FM : "A"     SS(7)="PH" : @FM : "F"     SS(8)="SCH" : @FM : "S"     SS(9)="GHT" : @FM : "T"     SS(10)="ATCH" : @FM : "ACH"     SS(11)="OARE" : @FM : "AWE"     GOSUB SET.UP.VARS[/size] [size=2]RETURN [/size] [size=2]PREPROCESS:     *     GOSUB BEGINNING ; * Firstly transform character strings at beginning of name     GOSUB ENDING ; * Now strings at end of name     GOSUB REMAINING ; * Now transform all remaining letters starting with 2nd     *     * Now remove last letter if a or s     *     IF INDEX("AS",WORD-1,1,1) THEN WORD-1,1=""     *[/size] [size=2]RETURN [/size] [size=2]SET.UP.VARS:     *     FIRST.3=WORD1,3     FIRST.2=WORD1,2     FIRST.1=WORD1,1     LAST.2=WORD-2,2 RETURN [/size] [size=2]SOUNDEX:     FIRST.LETTER=WORD1,1     WORD=WORD2,999     CONVERT "AWHY " TO " " IN WORD ; * Leave spaces to indicate that a vowel     * used to be here. Used later when the     * system deletes multiple adjacent     * equivalence class values when original     * consonants not separated by vowel     * EG dad --> 33 but dd --> 3     *     * Remove adjacent identical letters     *     WORD.LEN=LEN(WORD)     NEW.WORD=""     LAST.LETTER=""     FOR X=1 TO WORD.LEN         IF WORD1,1 # LAST.LETTER THEN             LAST.LETTER=WORD1,1             NEW.WORD := LAST.LETTER         END         WORD1,1=""     NEXT     WORD=NEW.WORD     *     * Now do equivalence group transformation     *     CONVERT POSS.CHAR$ TO GROUP$ IN WORD     *     * Now delete multiple adjacent equivalence groups as described above     *     WORD.LEN=LEN(WORD)     NEW.WORD=""     LAST.LETTER=""     FOR X=1 TO WORD.LEN     UNTIL LEN(NEW.WORD)=3         IF WORD1,1 # LAST.LETTER THEN             LAST.LETTER=WORD1,1             IF LAST.LETTER # " " THEN NEW.WORD := LAST.LETTER         END         WORD1,1=""     NEXT     WORD=FIRST.LETTER : NEW.WORD "L(0)#3" ; * ensure that code of 4 chars RETURN [/size] [size=2]BEGINNING:     BEGIN CASE         CASE FIRST.3="MAC" OR FIRST.3="MAK"             WORD1,3="MC"         CASE FIRST.3="SCH"             WORD1,3="S"         CASE FIRST.2="WR"             WORD1,2="R"         CASE FIRST.2="KN"             WORD1,2="N"         CASE FIRST.2="RH"             WORD1,2="R"         CASE FIRST.2="PH"             WORD1,2="F"         CASE FIRST.2="DG"             WORD1,2="G"         CASE FIRST.2="XH"             WORD1,2="K" ; * Xhosa etc         CASE FIRST.1="K"             WORD1,1="C"         CASE FIRST.1="X"             WORD1,1="Z"     END CASE RETURN [/size] [size=2]ENDING:     BEGIN CASE         CASE LAST.2="EE" OR LAST.2="IE" OR LAST.2="YE"             WORD-2,2="Y"         CASE LAST.2="DT" OR LAST.2="RT" OR LAST.2="RD"             WORD-2,2="D"         CASE LAST.2="NT" OR LAST.2="ND"             WORD-2,2="N"         CASE LAST.2="IX"             WORD-2,3="ICK"         CASE LAST.2="EX"             WORD-2,3="ECK"     END CASE     IF INDEX("SZ",WORD-1,1,1) THEN WORD-1,1="" RETURN [/size] [size=2]REMAINING:     FIRST.LETTER=WORD1,1     WORD=WORD2,99999     GOSUB SWAP.SWAPS     CONVERT "QMZK" TO "GNSC" IN WORD     CONVERT "1234567890" TO "" IN WORD     CONVERT "AEIOU" TO "AAAAA" IN WORD     *     * Now remove all occurences of h in *h* where h is not a vowel     *     SWAP "HA" WITH "!" IN WORD     SWAP "AH" WITH "#" IN WORD     REST=WORD1,LEN(WORD)-1     CONVERT "H" TO "" IN REST     WORD=REST : WORD-1,1     SWAP "!" WITH "HA" IN WORD     SWAP "#" WITH "AH" IN WORD     WORD=FIRST.LETTER : WORD RETURN [/size] [size=2]SWAP.SWAPS:     EXIT.SET=FALSE$     LOOP     UNTIL EXIT.SET         *         * Swap all potentially confusing strings with easier equivalents         *         OLD.WORD=WORD         FOR X=1 TO SECONDARY.COUNTER             IF INDEX(WORD,SS(X)<1>,1) THEN                 SWAP SS(X)<1> WITH SS(X)<2> IN WORD             END         NEXT         IF WORD=OLD.WORD THEN EXIT.SET=TRUE$     REPEAT RETURN  [/size] [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] [i]World Leaders in all Things RevSoft[/i] [img]http://www.sprezzatura.com/zz.gif[/img]   ---- === At 24 MAR 2004 11:43PM ps wing wrote: === I found the SOUNDEX on the util disk, but wrote my own in the end: FUNCTION SOUNDSLIKE(TEXT1,TEXT2) EQUATE SOUNDCODES TO "0123012_02245501262301_202" ;* ignores W & H TEXT=TEXT1 GOSUB SOUNDEX CODE1=SOUNDEXCODE IF ASSIGNED(TEXT2) THEN TEXT=TEXT2 GOSUB SOUNDEX CODE2=SOUNDEXCODE RETURNDATA=(CODE1=CODE2) END ELSE RETURNDATA=CODE1 RETURN RETURNDATA SOUNDEX: WORD=TRIM(TEXT)1," " CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD TEMP=WORD CONVERT @UPPER.CASE TO "" IN TEMP CONVERT TEMP TO "" IN WORD ;* remove the crap * Extended logic follows - may add more... IF WORD1,2=PS" THEN WORD1,2=S" IF WORD1,2=PF" THEN WORD1,2=F" SWAP "DG" WITH "G" IN WORD SWAP "GH" WITH "H" IN WORD SWAP "KN" WITH "N" IN WORD SWAP "GN" WITH "N" IN WORD SWAP "MB" WITH "M" IN WORD SWAP "PH" WITH "F" IN WORD SWAP "TCH" WITH "CH" IN WORD SWAP "MPS" WITH "MS" IN WORD SWAP "MPT" WITH "MT" IN WORD SWAP "MPZ" WITH "MZ" IN WORD ************************ SOUNDEX=WORD CONVERT @UPPER.CASE TO SOUNDCODES IN SOUNDEX LASTCODE=SOUNDEX1,1 SOUNDEXCODE=WORD1,1 FOR POS=2 TO LEN(SOUNDEX) WHILE LEN(SOUNDEXCODE) LT 4 CODE=SOUNDEXPOS,1 IF CODE NE "_" THEN IF CODE NE "0" AND CODE NE LASTCODE THEN SOUNDEXCODE:=CODE LASTCODE=CODE END NEXT POS SOUNDEXCODE=SOUNDEXCODE "L(0)#4" RETURN ---- === At 04 NOV 2009 11:10AM Dale Jessop wrote: === SUBROUTINE tools_soundex(ARG,S.RES) !-------------------------------------------------------------------- ! A Simple SOUNDEX Algorithm !-------------------------------------------------------------------- ! ! Format: SOUNDEX(string,result) ! ! Purpose: To Convert a string into is SOUNDEX equivalent. ! An Alphabetic text string is converted into a phonetic ! equivalent by using the published "SOUNDEX" algorithm. ! ! Remarks: A soundex equivalent is a four character code in the ! form of 'Annn' where: ! ! A=the first letter of the input string. ! n=a number from the following table: ! ! 1 ) THEN S.TBL=BFPV" ; S.TBL=CGJKQSXZ" S.TBL=DT" ; S.TBL=L" S.TBL=MN" ; S.TBL=R" S.RES=ARG1,1 FOR S.PNTR=2 TO LEN(ARG) FOR S.LVL=1 TO 6 IF INDEX(S.TBL,ARGS.PNTR,1,1) ] 0 AND ARGS.PNTR-1,1 # ARGS.PNTR,1 AND LEN(S.RES) =S.RES:S.LVL END NEXT S.LVL NEXT S.PNTR * Pad With 0's if ID ) LT 4 THEN S.PAD=4-LEN(S.RES) S.RES=S.RES:STR("0",S.PAD) END END ELSE S.RES=ARG END NEXT X RETURN ---- === At 09 NOV 2009 08:14AM Bob Carten wrote: === [[https://www.revelation.com/revweb/oecgi4p.php/O4W_HANDOFF?DESTN=O4W_RUN_FORM&INQID=NONWORKS_READ&SUMMARY=1&KEY=55D823E026CB776685256E5B006DBDB1|View this thread on the forum...]]