Sign up on the Revelation Software website to have access to the most current content, and to be able to ask questions and get answers from the Revelation community

SOUNDEX (AREV Specific)

At 18 MAR 2004 03:58:39PM ps wing wrote:

Hello,

Im sure Ive seen the system function SOUNDEX in Arev somewhere, but can not find it now. Is there one or should I create one using the source code provided in OI7 help file?

Phil


At 18 MAR 2004 06:11PM Ralph Johler wrote:

As I recall it was in one of the Developer Utilities (these can be downloaded from the site) and they were called "Metaphone" not Soundex.


At 18 MAR 2004 06:38PM ps wing wrote:

Thanks,

I knew it was around somewhere.

Looks a bit more complicated than the one Ive just written, less standard in fact. But will compare.

Phil


At 22 MAR 2004 12:46PM Warren wrote:

There was a SOUNDEX oconv from Cogent Systems that was either published in the old Revelations magazine, available for download from the BBS/Compuserve. It may have included in one of the utility disks at sometime.

This is what I have. I don't recall of the commented outlines were my doing or not.

      SUBROUTINE SOUNDEX(TYPE,DATA.TO.CONVERT,SUBR.LABEL,RETURNED.DATA)
*-------------------------------------------------------------------------*
* Title   : Soundex Conversion
* Author  : Cogent Information Systems
* Date    : 12/87
*-------------------------------------------------------------------------*
* Purpose : This program converts a word to a soundex code. It is used as a
*           conversion subroutine to be used with the R/Basic ICONV function.
*-------------------------------------------------------------------------*
*
      BEGIN CASE
         CASE TYPE EQ 'OCONV'
            CONVERT @LOWER.CASE TO @UPPER.CASE IN DATA.TO.CONVERT
            FIRST.LETTER=DATA.TO.CONVERT1,1
*   CODE=DATA.TO.CONVERT2,LEN(DATA.TO.CONVERT)-1
            CODE=DATA.TO.CONVERT
            CONVERT "1234567890-=;'`\,./AIEOUWHY!@#$%^&*()_+{}:~|?":'"' TO '' IN CODE
            CONVERT "BPFVCKQJGSXZDTLMNR" TO "112233344555667889" IN CODE
*   CODE=FMT(CODE1,3,'R(0)#3')
*   RETURNED.DATA=FIRST.LETTER:CODE
            RETURNED.DATA=FMT(CODE1,4,'R(0)#4')
         CASE TYPE=ICONV'
* ICONV is not used for this item
      END CASE
      RETURN

At 22 MAR 2004 01:23PM Richard Hunt wrote:

You might want to consider additional items…

1) Remove trailing "S"s.

2) Remove pairs of letters.

3) Remove prefixing "0"s (zeros).


At 22 MAR 2004 02:20PM The Sprezzatura Group wrote:

Or this from DISK1 of REVMEDIA (available FOC from http://www.sprezzatura.com/downloads.htm). I'm sure if I wrote it again I'd structure it a bit differently but this was 15 years ago ;-)

SUBROUTINE SOUNDEX(TYPE,PASSED.VALUE,BRANCH,RETURNED.VALUE) * * Author AMcA * Purpose To provide a user defined conversion for use with symbolic dict * items. This routine will return a SOUNDEX code for a passed * word. It is suggested that this routine only be used with single * word fields like surname. Assuming that surname were in field * 1 of @RECORD and that this subroutine were catalogued as SOUNDEX * the symbolic dict item would have SOUNDEX in the oconv field * and @ANS=ICONV(@RECORD<1>,"SOUNDEX") in the formula. This * field could be BTREEd for maxiumn efficiency. * * Notes To produce a Soundex code for a passed word according to the * Oxford Name Compression Algorithm (ONCA). The steps in this * Soundexing algorithm are a modified form of those suggested by * the Unit of Clinical Epidemiology at Oxford. The preprocessing * stage is based on a modified form of the NYSIIS (New York State * Information and Intelligence System) Code used for record * linkage studies by Statistics Canada. * * A soundex routine when passed a word returns a code for that * word. This code then can be compared to the code produced by * another word to see if they are identical. If they are, the * words "sound alike". This has obvious applications with large * mailing databases and the like. * * There are several variants of the soundexing algorithm, some * more efficient than others. The "Standard" algorithm, the * "Russell Soundex Code" misses common variations such as THOMSON * and THOMPSON. The Soundex presented here is very accurate eg * * Name Code * SMITH S530 * SMITS S530 * SCHMITZ S530 * SCHMIDT S530 * * Obviously, fine tuning of the preprocessing module can be used * to take account of foreign pronunciations et al * * Copyright 1989 Sprezzatura Ltd. Permission is given for * commercial and non-commercial use of this subroutine by * legitimate Revmedia subscribers. No warranty express or * implied, no responsibility accepted.

    *     * Set up equivalence groups for later use     *     EQU GROUP$ TO "111122222222334556"     EQU POSS.CHAR$ TO "BFPVCGJKQSXZDTLMNR"     EQU FALSE$ TO 0     EQU TRUE$ TO 1     CODE=""

    BEGIN CASE         CASE TYPE="ICONV"         WORD=PASSED.VALUE         GOSUB INITIALISE ; * Set up swap arrays and such         GOSUB PREPROCESS ; * Preprocess word to swap "Sounds-like" strings for         * other more common ones         GOSUB SOUNDEX ; * Main Soundexing filter         RETURNED.VALUE=WORD     CASE 1         *         * Should be OCONV, as soundexing is one way do not attempt to restore         * to original value just display Soundex code         *         RETURNED.VALUE=PASSED.VALUE     END CASE RETURN

INITIALISE:     WORD=TRIM(WORD)     CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD     *     * Set up swap array for later processing     *     SECONDARY.COUNTER=11     DIM SS(SECONDARY.COUNTER)     *     * First field is value to swap, second is value to swap it with     *     SS(1)="EV" : @FM : "AF"     SS(2)="MP" : @FM : "N"     SS(3)="KN" : @FM : "N"     SS(4)="LM" : @FM : "N"     SS(5)="SH" : @FM : "S"     SS(6)="EIGH" : @FM : "A"     SS(7)="PH" : @FM : "F"     SS(8)="SCH" : @FM : "S"     SS(9)="GHT" : @FM : "T"     SS(10)="ATCH" : @FM : "ACH"     SS(11)="OARE" : @FM : "AWE"     GOSUB SET.UP.VARS

RETURN

PREPROCESS:     *     GOSUB BEGINNING ; * Firstly transform character strings at beginning of name     GOSUB ENDING ; * Now strings at end of name     GOSUB REMAINING ; * Now transform all remaining letters starting with 2nd     *     * Now remove last letter if a or s     *     IF INDEX("AS",WORD-1,1,1) THEN WORD-1,1=""     *

RETURN

SET.UP.VARS:     *     FIRST.3=WORD1,3     FIRST.2=WORD1,2     FIRST.1=WORD1,1     LAST.2=WORD-2,2 RETURN

SOUNDEX:     FIRST.LETTER=WORD1,1     WORD=WORD2,999     CONVERT "AWHY " TO " " IN WORD ; * Leave spaces to indicate that a vowel     * used to be here. Used later when the     * system deletes multiple adjacent     * equivalence class values when original     * consonants not separated by vowel     * EG dad –> 33 but dd –> 3     *     * Remove adjacent identical letters     *     WORD.LEN=LEN(WORD)     NEW.WORD=""     LAST.LETTER=""     FOR X=1 TO WORD.LEN         IF WORD1,1 # LAST.LETTER THEN             LAST.LETTER=WORD1,1             NEW.WORD := LAST.LETTER         END         WORD1,1=""     NEXT     WORD=NEW.WORD     *     * Now do equivalence group transformation     *     CONVERT POSS.CHAR$ TO GROUP$ IN WORD     *     * Now delete multiple adjacent equivalence groups as described above     *     WORD.LEN=LEN(WORD)     NEW.WORD=""     LAST.LETTER=""     FOR X=1 TO WORD.LEN     UNTIL LEN(NEW.WORD)=3         IF WORD1,1 # LAST.LETTER THEN             LAST.LETTER=WORD1,1             IF LAST.LETTER # " " THEN NEW.WORD := LAST.LETTER         END         WORD1,1=""     NEXT     WORD=FIRST.LETTER : NEW.WORD "L(0)#3" ; * ensure that code of 4 chars RETURN

BEGINNING:     BEGIN CASE         CASE FIRST.3="MAC" OR FIRST.3="MAK"             WORD1,3="MC"         CASE FIRST.3="SCH"             WORD1,3="S"         CASE FIRST.2="WR"             WORD1,2="R"         CASE FIRST.2="KN"             WORD1,2="N"         CASE FIRST.2="RH"             WORD1,2="R"         CASE FIRST.2="PH"             WORD1,2="F"         CASE FIRST.2="DG"             WORD1,2="G"         CASE FIRST.2="XH"             WORD1,2="K" ; * Xhosa etc         CASE FIRST.1="K"             WORD1,1="C"         CASE FIRST.1="X"             WORD1,1="Z"     END CASE RETURN

ENDING:     BEGIN CASE         CASE LAST.2="EE" OR LAST.2="IE" OR LAST.2="YE"             WORD-2,2="Y"         CASE LAST.2="DT" OR LAST.2="RT" OR LAST.2="RD"             WORD-2,2="D"         CASE LAST.2="NT" OR LAST.2="ND"             WORD-2,2="N"         CASE LAST.2="IX"             WORD-2,3="ICK"         CASE LAST.2="EX"             WORD-2,3="ECK"     END CASE     IF INDEX("SZ",WORD-1,1,1) THEN WORD-1,1="" RETURN

REMAINING:     FIRST.LETTER=WORD1,1     WORD=WORD2,99999     GOSUB SWAP.SWAPS     CONVERT "QMZK" TO "GNSC" IN WORD     CONVERT "1234567890" TO "" IN WORD     CONVERT "AEIOU" TO "AAAAA" IN WORD     *     * Now remove all occurences of h in *h* where h is not a vowel     *     SWAP "HA" WITH "!" IN WORD     SWAP "AH" WITH "#" IN WORD     REST=WORD1,LEN(WORD)-1     CONVERT "H" TO "" IN REST     WORD=REST : WORD-1,1     SWAP "!" WITH "HA" IN WORD     SWAP "#" WITH "AH" IN WORD     WORD=FIRST.LETTER : WORD RETURN

SWAP.SWAPS:     EXIT.SET=FALSE$     LOOP     UNTIL EXIT.SET         *         * Swap all potentially confusing strings with easier equivalents         *         OLD.WORD=WORD         FOR X=1 TO SECONDARY.COUNTER             IF INDEX(WORD,SS(X)<1>,1) THEN                 SWAP SS(X)<1> WITH SS(X)<2> IN WORD             END         NEXT         IF WORD=OLD.WORD THEN EXIT.SET=TRUE$     REPEAT RETURN  

The Sprezzatura Group

World Leaders in all Things RevSoft

 


At 24 MAR 2004 11:43PM ps wing wrote:

I found the SOUNDEX on the util disk, but wrote my own in the end:

FUNCTION SOUNDSLIKE(TEXT1,TEXT2)

EQUATE SOUNDCODES TO "0123012_02245501262301_202" ;* ignores W & H

TEXT=TEXT1

GOSUB SOUNDEX

CODE1=SOUNDEXCODE

IF ASSIGNED(TEXT2) THEN

TEXT=TEXT2

GOSUB SOUNDEX

CODE2=SOUNDEXCODE

RETURNDATA=(CODE1=CODE2)

END ELSE RETURNDATA=CODE1

RETURN RETURNDATA

SOUNDEX:

WORD=TRIM(TEXT)1," "

CONVERT @LOWER.CASE TO @UPPER.CASE IN WORD

TEMP=WORD

CONVERT @UPPER.CASE TO "" IN TEMP

CONVERT TEMP TO "" IN WORD ;* remove the crap

* Extended logic follows - may add more…

IF WORD1,2=PS" THEN WORD1,2=S"

IF WORD1,2=PF" THEN WORD1,2=F"

SWAP "DG" WITH "G" IN WORD

SWAP "GH" WITH "H" IN WORD

SWAP "KN" WITH "N" IN WORD

SWAP "GN" WITH "N" IN WORD

SWAP "MB" WITH "M" IN WORD

SWAP "PH" WITH "F" IN WORD

SWAP "TCH" WITH "CH" IN WORD

SWAP "MPS" WITH "MS" IN WORD

SWAP "MPT" WITH "MT" IN WORD

SWAP "MPZ" WITH "MZ" IN WORD

SOUNDEX=WORD

CONVERT @UPPER.CASE TO SOUNDCODES IN SOUNDEX

LASTCODE=SOUNDEX1,1

SOUNDEXCODE=WORD1,1

FOR POS=2 TO LEN(SOUNDEX) WHILE LEN(SOUNDEXCODE) LT 4

CODE=SOUNDEXPOS,1

IF CODE NE "_" THEN

IF CODE NE "0" AND CODE NE LASTCODE THEN SOUNDEXCODE:=CODE
LASTCODE=CODE

END

NEXT POS

SOUNDEXCODE=SOUNDEXCODE "L(0)#4"

RETURN


At 04 NOV 2009 11:10AM Dale Jessop wrote:

SUBROUTINE tools_soundex(ARG,S.RES)

!——————————————————————–

! A Simple SOUNDEX Algorithm

!——————————————————————–

!

! Format: SOUNDEX(string,result)

!

! Purpose: To Convert a string into is SOUNDEX equivalent.

! An Alphabetic text string is converted into a phonetic

! equivalent by using the published "SOUNDEX" algorithm.

!

! Remarks: A soundex equivalent is a four character code in the

! form of 'Annn' where:

!

! A=the first letter of the input string.

! n=a number from the following table:

!

! 1 ) THEN

   S.TBL=BFPV" ; S.TBL=CGJKQSXZ"
   S.TBL=DT" ; S.TBL=L"
   S.TBL=MN" ; S.TBL=R"
   S.RES=ARG1,1
   FOR S.PNTR=2 TO LEN(ARG)
     FOR S.LVL=1 TO 6
       IF INDEX(S.TBL,ARGS.PNTR,1,1) ] 0 AND ARGS.PNTR-1,1 # ARGS.PNTR,1 AND LEN(S.RES) =S.RES:S.LVL
       END
     NEXT S.LVL
   NEXT S.PNTR
  • Pad With 0's if ID ) LT 4 THEN
     S.PAD=4-LEN(S.RES)
     S.RES=S.RES:STR("0",S.PAD)
   END
 END ELSE
   S.RES=ARG
 END

NEXT X

RETURN


At 09 NOV 2009 08:14AM Bob Carten wrote:

View this thread on the forum...