'=========================================================================== ' Subject: SOUNDEX VERSION 2 Date: 04-11-00 (16:35) ' Author: Don Schullian Code: PBCC ' Origin: d83@DASoftVSS.com Packet: PBCC.ABC '=========================================================================== #IF 0 ---------------------------- PowerBASIC/cc v2.0 ---| DASoft |------------------------------------------ ---------------------------- Code DATE: 2000-04-10 | FILE NAME SoundEX2.bas | by ---------------------------- Don Schullian, Jr. This code is released into the Public Domain ---------------------------------------------------------- No guarantee as to the viability, accuracy, or safety of use of this code is implied, warranted, or guaranteed. ---------------------------------------------------------- Use at your own risk! ---------------------------------------------------------- CONTACT AUTHOR AT d83@DASoftVSS.com ------------------------------------------------------------------------- Hi, There are 3 functions in this file along with a bit of test code. In any one program you not use all of them as fSoundEXuscb is a stand alone item and fSoundEX2str would not, particularly be required unless you wished to display the results to the user. Here's what I've learned (and think I know) about SoundEX: 1) only letters from A->Z are accepted a) I saw one augmentation that accepted the Spanish 'n' with the tone mark (as the third letter in Nina would be) that held the same value as the english "N" 2) the official version created for and used by the US Census Bureau consists of 1 letter A-Z and 3 numbers 0->6 where the zero would only be used to pad a string to 4 characters. eg: Schullian creates a soundex of "S45" but as 4 characters are required a trailing zero is added to create "S450" 3) The 4 character SoundEX can be extended up to 10 positions by packing the data into 5 bits for the leading letter and 3 bits for the following 9 numbers. This system follows exactly the same rules as the 'official' version but packs the values into bits and works on more characters. The values returned by this system use all 32 bits and would require a DWORD to keep everything in strict alphabetical order but: a) if the user never sees the code and your search/sort routines will work with negative values then a LONG INTEGER will do just fine b) you could request a 9 position SoundEX, store it in a LONG and not have to worry about negative values and/or force the use of DWORDs. (LONGs will process much faster than DWORDs) Ok, down below are all three functions for your review. Which you use and how you use any of them would depend, of course, on what your program requires. fSoundEX&& allows you to select the number of bits your SoundEX code will require. The rule is ((Characters * 3) + 2)bits so you can store 4 characters in a 2byte INTEGER [ 4+3 = 12 + 2 = 14] and never exceed 32k! By adding just one more character you'll surpass the 16bit limit of a WORD and be required to either jump into a 3byte string (yuk!) or move into a LONG. C'ya, d83 -------------------------------------------------------------------------------- fSoundEX&& (TheWord$,Letters&) PURPOSE: Create a SoundEX representation of a given STRING PARMAS: TheWord$ - the STRING to be processed Letters& - The maximum number of soundex letters to process 10 (max) will fill a 32bit integer 4 (min) will fill an 16bit integer RETURNS: ( (Letters * 3) + 2 ) significant bits NOTE: IF Letters& =< 4 then the value will never exceed 32k IF Letters& = 10 then a DWORD is required to keep all values positive fSoundEX2str$ (SndEX&&,Letters&) PURPOSE: Convert a SoundEX value into a humanized string PARAMS: SndEX&& - The soundex value Letters& - The number of letters/positions in the returning string RETURNS: A string where the first character is a letter A -> Z followed by Letters-1 numbers from 0 -> 6 EG: "S450" or "W363" NOTE: If the SndEX value was not created using the same number of Letters then the returning value may be incorrect or could even cause an error. fSoundEXuscb$ (TheName$) PURPOSE: To create a 4 character SoundEX string that is used by the US Census Beurough PARAMS: The name to be converted/computed RETURNS: A 4 character SoundEX string that has a leading letter from A -> Z followed by 3 numbers 0 -> 6 eg: S450 W363 ------------------------------------------------------------------------- #ENDIF FUNCTION fSoundEX ( BYVAL TheWord AS STRING, _ BYVAL Letters AS LONG ) EXPORT AS DWORD REGISTER SndX AS LONG ' REGISTER Char AS LONG ' character value ' DIM Last AS LOCAL LONG ' last character processed DIM Vals AS LOCAL STRING * 27 ' SndX values DIM V_ptr AS LOCAL BYTE PTR ' pointer to values DIM W_ptr AS LOCAL BYTE PTR ' pointer to incoming data DIM Widx AS LOCAL LONG ' index to incoming data pointer ' Vals = CHR$(0,0,1,2,3,0,1,2,0,0,2,2,4,5) & _ ' SndX values from _A -> M CHR$(5,0,1,2,6,2,3,0,1,0,2,0,2) ' N -> Z V_ptr = VARPTR(Vals) ' pointer to above W_ptr = STRPTR(TheWord) ' set ptr to incoming string Letters = MIN(9,Letters-1) ' 1 letter & n numbers ' FOR Widx = 0 TO LEN(TheWord)-1 ' once for each letter Char = @W_ptr[Widx] ' draw off the letter IF Char < 65 THEN ITERATE ' can't be a valid letter Char = ( Char AND &b11111 ) ' ucase & subtract 64 IF Char > 26 THEN ITERATE ' oops! this ain't a letter either IF SndX <> 0 THEN ' Char = @V_ptr[Char] ' IF Char = Last THEN ITERATE ' same as last char Last = Char ' IF Char = 0 THEN ITERATE ' SndX = (SndX OR Char) ' place value into buffer IF Letters = 0 THEN EXIT FOR ' check if buffer is full ELSE ' SndX = Char ' A = 1 Z = 26 Last = @V_ptr[Char] ' END IF ' SHIFT LEFT SndX, 3 ' DECR Letters ' decr buffer counter NEXT ' ' IF Letters > 0 THEN ' add trailing zeros SHIFT LEFT SndX, ( Letters * 3 ) ' END IF ' ' FUNCTION = SndX 'RETURN SOUNDEX VALUE END FUNCTION ' '------------------------------------------------------------------------------- ' FUNCTION fSoundEX2str( BYVAL SndEx AS DWORD, _ BYVAL Letters AS LONG ) AS STRING DIM Temp AS LOCAL STRING * 10 ' DIM T_ptr AS LOCAL BYTE PTR ' ' Temp = STRING$(10,32) ' pad working string T_ptr = VARPTR(Temp) ' set pointer to above ' DO ' start processing SndEX DECR Letters ' back up one posn for offset IF Letters > 0 THEN ' if not 1st letter @T_ptr[Letters] = ( SndEx AND &b111 ) + 48 ' set 0 -> 6 SHIFT RIGHT SndEx, 3 ' shift off last 3 bits ELSE ' 1st letter @T_ptr = ( SndEx + 64 ) ' set A -> Z EXIT LOOP ' END IF ' LOOP ' ' FUNCTION = RTRIM$(Temp) ' RETURN only used portion of $ END FUNCTION ' '------------------------------------------------------------------------------- ' FUNCTION fSoundEXuscb ( BYVAL TheWord AS STRING ) EXPORT AS STRING DIM Char AS LOCAL LONG ' character values DIM Last AS LOCAL LONG ' last char processed DIM Letters AS LOCAL LONG ' letter counter DIM Temp AS LOCAL STRING * 4 ' working string DIM T_ptr AS LOCAL BYTE PTR ' pointer to above DIM Vals AS LOCAL STRING * 27 ' SoundEX values DIM V_ptr AS LOCAL BYTE PTR ' pointer to values DIM W_ptr AS LOCAL BYTE PTR ' pointer to incoming data DIM Widx AS LOCAL LONG ' index to incoming data pointer ' ' ABCDEFGHIJKLMNOPQRSTUVWXYZ" ' corresponding letters Vals = "001230120022455012623010202" ' alignment byte + A -> Z V_ptr = VARPTR(Vals) ' pointer to above W_ptr = STRPTR(TheWord) ' set ptr to incoming string Temp = "0000" ' T_ptr = VARPTR(Temp) ' ' FOR Widx = 0 TO LEN(TheWord)-1 ' once for each letter Char = @W_ptr[Widx] ' draw off the letter IF Char < 65 THEN ITERATE ' can't be a valid letter Char = ( Char AND &b11111 ) ' ucase & subtract 64 IF Char > 90 THEN ITERATE ' oops! this ain't a letter either IF Letters > 0 THEN ' Char = @V_ptr[Char] ' IF Char = Last THEN ITERATE ' same as last char Last = Char ' IF Char = 48 THEN ITERATE ' @T_ptr[Letters] = Char ' place value into buffer IF Letters = 3 THEN EXIT FOR ' check if buffer is full ELSE ' @T_ptr[0] = Char + 64 ' A = 1 Z = 26 Last = @V_ptr[Char] ' END IF ' INCR Letters ' decr buffer counter NEXT ' ' FUNCTION = Temp 'RETURN SOUNDEX VALUE END FUNCTION ' '------------------------------------------------------------------------------- ' FUNCTION PBmain () DIM TheName AS LOCAL STRING DIM SoundEX AS LOCAL STRING * 4 DIM SndEX AS LOCAL DWORD SndEX = fSoundEX("schullian",4) PRINT FORMAT$(SndEX,",#") PRINT BIN$(SndEX,32) PRINT fSoundEXstr(SndEX,4) PRINT fSoundEXuscb("WOODYARD") PRINT fSoundEXuscb("schullian") WAITKEY$ END FUNCTION