# Going the (Levenshtein) Distance in RPG Free

August 18, 2019

By  NickLitten

Over on the RPG Cafe website, a nice chap called Smit Dixit published a sample of his RPGLE code calculating the Levenshtein distance between two words.

## What is the Levenshtein Distance?

The Levenshtein distance is a string metric for measuring difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.Levenshtein distance may also be referred to as edit distance, although it may also denote a larger family of distance metrics. It is closely related to pairwise string alignments.

The CueLogic Blog

The exercise here, is to find a way to find the minimum number of changes that are needed to discover the difference between two words. You can change, delete or move letters.

Since I’m investigating a search technique to allow a flexible database search this code might ( just might ) come in handy… So, after a little spring clean with my RPG modernising brush it ended up looking like this:

### How do we call this *SRVPGM?

In RPG we could call it like this:

````   dcl-pr LEVENSHTN likeds(ds1retn);`
`     *n char(100) const; // p@Source`
`     *n char(100) const; // p@Target`
`    end-pr;`

`    dcl-ds ds1Retn inz;`
`     d1LevDist zoned(9);`
`    end-ds; ````

## SQLRPGLE *SRVPGM (Service Program) called LEVENSRV

``````**FREE

// Prdgram: LEVENSRV SQLRPGLE *SRVPGM
//          The Levenshtein algorithm calculates the least number of edit operations that
//          are necessary to modify one string to obtain another string.
//          This *SRVPGM does the calc and return a 9char alpha value of either:
//           * The value
//           * *BLANKS... if the inputs were (duh) blank
//           * *SAME..... if the inputs were (yep you guessed it) the same
//
// Compile      Compile as a module and bind in *SRVPGM(LEVENSRV)
//              1. CRTSQLRPGI ??OBJ()
//                 ?*SRCFILE(NICK/QRPGLESRC)
//                 ?*SRCMBR(LEVENSRV)
//                 ?*OBJTYPE(*MODULE)
//              2. CRTSRVPGM SRVPGM(NICK/LEVENSRV) MODULE(*SRVPGM) EXPORT(*ALL)
// Credit       Based on source from RPG Cafe website, a smart chap called Smit Dixit
// History      August 2019 - nick@nicklitten.com - Created

ctl-opt
debug
option(*nodebugio:*srcstmt)
datfmt(*iso-) timfmt(*iso.)
indent('| ') truncnbr(*yes) expropts(*resdecpos)
copyright('| LEVENSRV 2019.08.18 V000 Return the Levenshtein distance between two strings')
nomain;

dcl-proc rtn_levenshtein_distance export;
dcl-pi rtn_levenshtein_distance char(9);
p_source char(100);
p_target char(100);
end-pi;

dcl-ds ds1Retn inz;
d1LevDist zoned(9);
end-ds;

dcl-s w@Curdt7 packed(7) inz(0);
dcl-s w@IsoDate date(*iso);
dcl-s w@TodayUSA date(*usa);
dcl-s SqlStm char(1024) inz;
dcl-c tck const(x'7D');
dcl-s w@Index packed(4) inz;
dcl-c validchars const('ABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890');
dcl-ds matrix dim(100) qualified;
idx packed(4) dim(100);
end-ds;
dcl-s i packed(4) inz;
dcl-s j packed(4) inz;
dcl-s s_i char(1) inz;
dcl-s t_j char(1) inz;
dcl-s n packed(4) inz;
dcl-s m packed(4) inz;
dcl-s w@Source char(100) inz;
dcl-s w@Target char(100) inz;
dcl-s w@Cost packed(1) inz;
dcl-s w@Above1 packed(4) inz;
dcl-s w@Left1 packed(4) inz;
dcl-s w@DiagCost packed(4) inz;
dcl-s w@MinVal packed(4) inz;
dcl-s w@SplPos packed(4) inz;
dcl-s w@SplChar char(1) inz;

clear ds1Retn;

//Validate Parameters
if p_Source = *Blanks or p_Target = *Blanks;
return '*BLANK';
else;

w@Source = p_Source;
w@Target = p_Target;
n = %len(%trim(w@Source));
m = %len(%trim(w@Target));

if n = 0 or m = 0;
d1LevDist = 999999999;  //Indicates at least one string is blank
return '*BLANK';
endif;

if n = m and %trim(w@Source) = %trim(w@Target); //Strings are same
d1LevDist = n;
return '*SAME';
endif;

//Set both source and target strings to upper case
EXEC SQL SET :w@Source = UPPER(:w@Source);
EXEC SQL SET :w@target = UPPER(:w@Target);

//Remove special characters from both source and target strings
clear w@SplPos;
clear w@SplChar;
dou %check(validchars:%trim(w@Source)) = 0;
w@SplPos = %check(validchars:%trim(w@Source));
if w@SplPos > 0;
w@Source = %subst(w@Source:1:(w@SplPos-1)) + %subst(w@Source:(w@SplPos+1));
else;
leave;
endif;
enddo;

dou %check(validchars:%trim(w@Target)) = 0;
w@SplPos = %check(validchars:%trim(w@Target));
if w@SplPos > 0;
w@Target = %subst(w@Target:1:(w@SplPos-1)) + %subst(w@Target:(w@SplPos+1));
else;
leave;
endif;
enddo;

//Recalculate lengths of cleaned up strings
n = %len(%trim(w@Source));
m = %len(%trim(w@Target));

//Calculate Levenshtein Distance based on cleaned up strings
//Step 1 - Initialize 1st row for source string
for i = 1 by 1 to (n + 1);
matrix(1).idx(i) = i - 1;
endfor;

//Step 2 - Initialize 1st column for target string
for j = 1 by 1 to (m + 1);
matrix(j).idx(1) = j - 1;
endfor;

//Step 3 - Start calculating distances
for i = 2 by 1 to (n + 1);

s_i = %subst(%trim(w@Source):(i-1):1);

for j = 2 by 1 to (m + 1);

t_j = %subst(%trim(w@Target):(j-1):1);

if (s_i = t_j);
w@Cost = 0;
else;
w@Cost = 1;
endif;

clear w@Above1;
clear w@Left1;
clear w@DiagCost;
w@Above1    = matrix(j-1).idx(i) + 1;
w@Left1     = matrix(j).idx(i-1) + 1;
w@DiagCost  = matrix(j-1).idx(i-1) + w@Cost;
w@MinVal = w@Above1;

if (w@Left1 < w@MinVal);
w@MinVal = w@Left1;
endif;

if (w@DiagCost < w@MinVal);
w@MinVal = w@DiagCost;
endif;

matrix(j).idx(i) = w@MinVal;

endfor;

endfor;

d1LevDist = matrix(m + 1).idx(n + 1);
endif;

return %char(ds1Retn);

end-proc;                      ``````

We compile the Service Program:

Now lets add that service program to a binding directory so we can reference it in other programs. Obviously, if you already have a binding directory to use, skip this step:

`CRTBNDDIR BNDDIR(NICK/LEVENBIND) TEXT('This is the binding directory for LEVENSRV')`

Add the new *SRVPGM to your binding directory:

## Example CLLE calling the *SRVPGM LEVENSRV

I like IBM i Control Language as much as I like Marmite. And I love Marmite:

```PGM

DCL        VAR(&INPUT1) TYPE(CHAR) LEN(100)
DCL        VAR(&INPUT2) TYPE(CHAR) LEN(100)
DCL        VAR(&RTN) TYPE(*CHAR) LEN(9)

DCLPRCOPT  BNDDIR(LEVENBIND)

CHGVAR     VAR(&INPUT1) VALUE('GILY')
CHGVAR     VAR(&INPUT2) VALUE('GEELY')

CALLPRC    PRC(RTN_LEVENSHTEIN_DISTANCE) PARM((&INPUT1) +
(&INPUT2)) RTNVAL(&RTN)

SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
MSGDTA('Levenshtein distance between' +
*BCAT &INPUT1 *TCAT ' and' *BCAT &INPUT2 +
*TCAT ' is' BCAT &RTN) TOPGMQ(PRV) +
MSGTYPE(*COMP)

CHGVAR     VAR(&INPUT1) VALUE('HONDA')
CHGVAR     VAR(&INPUT2) VALUE('HYUNDAI')

CALLPRC    PRC(RTN_LEVENSHTEIN_DISTANCE) PARM((&INPUT1) +
(&INPUT2)) RTNVAL(&RTN)

SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
MSGDTA('Levenshtein distance between' +
*BCAT &INPUT1 *TCAT ' and' *BCAT &INPUT2 +
*TCAT ' is' BCAT &RTN) TOPGMQ(PRV) +
MSGTYPE(*COMP)
ENDPGM
```

Compile it.

Call It.

#feeltheCLgroove

And those are the exact results I was hoping for ðŸ™‚

Now lets look at a quick example of calling it from RPGLE:

## Example RPGLE calling the *SRVPGM LEVENSRV

A little piece of RPG code to call the *SRVPGM and see what values we get back:

```**FREE
// Program: LEVENTEST2
// History: August 2019 - nick@nicklitten.com - Created
ctl-opt
debug
option(nodebugio:srcstmt)
datfmt(iso-) timfmt(iso.)
bnddir('LEVENBIND')
indent('| ') truncnbr(yes) expropts(resdecpos)
copyright('| LEVENTEST2 2019.08.18 V000 Test example using the Levenshtein *SRVPGM');

dcl-s source char(100);
dcl-s target char(100);
dcl-s rtnval char(9);

// This is the procedure definition for our *SRVPGM. Obviously you should put this in a COPYBOOK but
// since its a test I'm being uber lazy, until I get fired up by this second cup of coffee
dcl-pr rtn_levenshtein_distance char(9);
*n char(100);
*n char(100);
end-pr;

// now call the rtn_levenshtein_distance procedure and see what it gives us:
source = 'GILY';
target = 'GEELY';
rtnval = rtn_levenshtein_distance(source:target);
dsply ('1. Value GILY/GEELY is ' + rtnval);

source = 'HONDA';
target = 'HYUNDAI';
rtnval = rtn_levenshtein_distance(source:target);
dsply ('2. Value HONDA/HYUNDAI is ' + rtnval);

*inlr = *on;                                                                                                                 ```

We can compile this and then just call it with the two words you want to test:

## Levenshtein Distance Test Results

Levenshtein distance between GILY and GEELY is 2

Levenshtein distance between HONDA and HYUNDAI is 3

### NickLitten

IBM i Software Developer, Digital Dad, AS400 Anarchist, RPG Modernizer, Shameless Trekkie, Belligerent Nerd, Englishman Abroad and Passionate Eater of Cheese and Biscuits. Nick Litten Dot Com is a mixture of blog posts that can be sometimes serious, frequently playful and probably down-right pointless all in the space of a day. Enjoy your stay, feel free to comment and remember: If at first you don't succeed then skydiving probably isn't a hobby you should look into.

Nick Litten

related posts:

## Swap Homes and Travel the World â€“ Join Home Exchange

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

### Get In Touch

Iâ€™m always looking for awesome input, feedback and critique!

## 20% Discount

I have partnered with SNUG CBD givING you Organic CBD
20% discount code "NL20"