TADS Bug Database

ID: 0000088
Project: TADS 3
Category: Interpreter
View Status: public
Date Submitted: 2010-10-20 05:36
Last Update: 2010-10-20 16:42
Assigned To: Michael Roberts
Platform: frob
OS: linux
OS Version:
Summary: 0000088: Regular expression implementation problems
Description: When I have the following string and regexp:

verbPhrase = 'vyhrabat/hrabes/vyhrabala (v cem) (cim)';
rexSearch('<lparen>(.*?)<space>*<alpha>+<rparen>', verbPhrase);

then rexGroup(1)[3] is "v", which is correct. Next, consider the following two searches:

rexSearch('<lparen>(.*?)(<space>*<alpha>+)<rparen>', verbPhrase);
rexSearch('<lparen>(.*?)<space>*(<alpha>+)<rparen>', verbPhrase);

Here rexGroup(1)[3] is again "v", but rexGroup(2) is not set. I expect rexGroup(2)[3] to be " cem" and "cem" respectively. This comes from TIAction::announceDefaultObject, where I'm trying to extend the search so that it captures not only the preposition(s) but also the last word inside the parentheses in one go.
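As a cross-check of the expectation above, here is a minimal sketch that runs the three patterns through Python's re module. The translation of the TADS character classes is an assumption made for this test string only: <lparen> and <rparen> become \( and \), <space> becomes \s, and <alpha> becomes [A-Za-z]. It says nothing about how the TADS matcher works internally.

```python
import re

verb_phrase = "vyhrabat/hrabes/vyhrabala (v cem) (cim)"

# Assumed translation of the TADS classes (good enough for this ASCII input):
# <lparen> -> \(   <rparen> -> \)   <space> -> \s   <alpha> -> [A-Za-z]
patterns = {
    "no second group": r"\((.*?)\s*[A-Za-z]+\)",
    "space inside group 2": r"\((.*?)(\s*[A-Za-z]+)\)",
    "space outside group 2": r"\((.*?)\s*([A-Za-z]+)\)",
}

# Collect the captured groups of the first match for each pattern.
results = {
    name: re.search(pattern, verb_phrase).groups()
    for name, pattern in patterns.items()
}

for name, groups in results.items():
    print(f"{name}: {groups}")
```

In this engine the second group comes back as " cem" for the second pattern and "cem" for the third, which is the behaviour the report expects from rexGroup(2)[3].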

This is a clean test, but while I was preparing this bug report and testing the behavior within the whole library and the whole game, the second rexSearch sometimes worked, returning " cem" as rexGroup(2)[3], whereas the third rexSearch never worked and always left rexGroup(2) unset, so reproducibility is a little strange.

A different approach to the same task would be:

rexSearch('<lparen>(|(.*?)<space>)(<alpha>+)<rparen>', verbPhrase);

But this doesn't match at all; it apparently doesn't like the empty alternative to the left of the vertical bar, "(|". Running the same pattern in a mainstream regexp implementation such as sed on Linux produces the expected results:

bash> echo "(a xyz)" | sed -r 's/\((|(.*) )([[:alpha:]]+)\)/1="\1" 2="\2" 3="\3"/'
1="a " 2="a" 3="xyz"
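The same check can be repeated in Python's re module, which also accepts an empty alternative. This is only a sketch for comparison: the [A-Za-z] and \s classes are assumed stand-ins for <alpha> and <space>, and the "(xyz)" input is an extra illustrative case that is not part of the original report.

```python
import re

# Same pattern as the sed command: empty alternative to the left of "|".
pattern = re.compile(r"\((|(.*) )([A-Za-z]+)\)")

# Input from the sed example; the non-empty alternative is needed here.
sed_case = pattern.search("(a xyz)").groups()
sed_style = '1="{}" 2="{}" 3="{}"'.format(*sed_case)
print(sed_style)

# Illustrative extra input: only the empty alternative can match, so
# group 1 is the empty string and group 2 does not participate (None).
empty_case = pattern.search("(xyz)").groups()
print(empty_case)

# Lazy variant from the report, applied to the original verb phrase.
lazy = re.compile(r"\((|(.*?)\s)([A-Za-z]+)\)")
phrase_case = lazy.search("vyhrabat/hrabes/vyhrabala (v cem) (cim)").groups()
print(phrase_case)
```

With the verb phrase, the three groups come out as "v ", "v" and "cem", so an engine that handles the empty alternative should find a match here rather than fail.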
Steps To Reproduce:
//verbPhrase = '(a xyz)';
verbPhrase = 'vyhrabat/hrabes/vyhrabala (v cem) (cim)';
//verbPhrase = 'vzit/beres/vzala (co) (z ceho)';

rexSearch('<lparen>(.*?)<space>*(<alpha>+)<rparen>', verbPhrase);
//rexSearch('<rparen>.*<lparen>(.*?)(<space>*<alpha>+)<rparen>', verbPhrase);
//rexSearch('<lparen>(|(.*)<space>)(<alpha>+)<rparen>.*<lparen>', verbPhrase);

if(rexGroup(1)) say('1="' + rexGroup(1)[3] + '" ');
else say('1 is unset ');
if(rexGroup(2)) say('2="' + rexGroup(2)[3] + '" ');
else say('2 is unset ');
if(rexGroup(3)) say('3="' + rexGroup(3)[3] + '" ');
else say('3 is unset ');
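For comparison, the reproduction steps can be mirrored in Python to see what a mainstream engine prints for the active pattern and for the two commented-out variants. This is a hedged sketch: report() is a hypothetical helper written for this note, and the character classes are translated under the assumption that <space> is \s and <alpha> is [A-Za-z].

```python
import re

def report(pattern, text):
    """Mimic the say() calls from the reproduction steps (hypothetical helper)."""
    m = re.search(pattern, text)
    parts = []
    for n in (1, 2, 3):
        # A group counts as unset if nothing matched, the pattern has fewer
        # than n groups, or group n did not participate in the match.
        value = m.group(n) if m and n <= m.re.groups else None
        parts.append(f'{n}="{value}"' if value is not None else f"{n} is unset")
    return " ".join(parts)

verb_phrase = "vyhrabat/hrabes/vyhrabala (v cem) (cim)"

# Active pattern, then the two commented-out variants, in order.
active = report(r"\((.*?)\s*([A-Za-z]+)\)", verb_phrase)
second_paren = report(r"\).*\((.*?)(\s*[A-Za-z]+)\)", verb_phrase)
empty_alt = report(r"\((|(.*)\s)([A-Za-z]+)\).*\(", verb_phrase)

for line in (active, second_paren, empty_alt):
    print(line)
```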
Tags: No tags attached.
Fixed In Version: 3.0.19

Notes
Michael Roberts (administrator)
2010-10-20 16:42
edited on: 2010-10-20 16:43

Confirmed, and fixed for the next update. This was actually two separate problems: the failure to match with the empty | alternative, and the capturing group returning the wrong string. The empty | is something I caught separately and fixed recently. The capture group problem was new to me, and I think this is a pretty minimal test case for it. The key is that there has to be backtracking on a branch that partially matches the group but fails after the group, so I think the test case is basically two closures before the group and a literal after it, which is precisely what we have here. Anyway, this is now fixed (and now part of the regex test suite :).
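To make the shape described in the note concrete, here is a small backtracking matcher in Python. It is a generic illustration and not the TADS implementation: the node names ("lit", "repeat", "group", "close") and the functions match_seq and search are all invented for this sketch. Captures are passed along as a fresh dictionary on each branch, which is one common way to make sure a capture recorded on a failed branch is discarded when the matcher backtracks.

```python
def match_seq(items, text, pos, caps):
    """Match items at pos; return (end, caps) for the first path that succeeds."""
    if not items:
        return pos, caps
    head, rest = items[0], items[1:]
    kind = head[0]
    if kind == "lit":
        _, s = head
        if text.startswith(s, pos):
            return match_seq(rest, text, pos + len(s), caps)
        return None
    if kind == "repeat":
        _, pred, least, greedy = head
        end = pos
        while end < len(text) and pred(text[end]):
            end += 1
        low = pos + least
        # Greedy closures try the longest run first, lazy ones the shortest.
        stops = range(end, low - 1, -1) if greedy else range(low, end + 1)
        for stop in stops:
            found = match_seq(rest, text, stop, caps)
            if found is not None:
                return found
        return None  # every repetition count failed: backtrack further
    if kind == "group":
        _, index, inner = head
        # Inline the group body, followed by a marker that records the capture.
        return match_seq(inner + [("close", index, pos)] + rest, text, pos, caps)
    if kind == "close":
        _, index, start = head
        # Copy instead of mutating, so failed branches leave no trace.
        return match_seq(rest, text, pos, {**caps, index: text[start:pos]})
    raise ValueError(kind)

def search(items, text):
    """Try every start position; return (matched_text, captures) or None."""
    for start in range(len(text) + 1):
        found = match_seq(items, text, start, {})
        if found is not None:
            return text[start:found[0]], found[1]
    return None

any_char = lambda c: True
# Rough equivalent of '<lparen>(.*?)<space>*(<alpha>+)<rparen>'.
pattern = [
    ("lit", "("),
    ("group", 1, [("repeat", any_char, 0, False)]),
    ("repeat", str.isspace, 0, True),
    ("group", 2, [("repeat", str.isalpha, 1, True)]),
    ("lit", ")"),
]
matched, groups = search(pattern, "vyhrabat/hrabes/vyhrabala (v cem) (cim)")
print(matched, groups)  # (v cem) {1: 'v', 2: 'cem'}
```

On the first attempt the lazy closure matches nothing, group 2 captures "v", and the literal ")" fails against the space. Only after backtracking does group 1 become "v" and group 2 become "cem". A matcher that kept the first branch's capture state around would report a wrong or missing group 2 at this point.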

Issue History
Date Modified     Username         Field                Change
2010-10-20 05:36  tomasb           New Issue
2010-10-20 16:42  Michael Roberts  Fixed In Version     => 3.0.19
2010-10-20 16:42  Michael Roberts  Note Added: 0000163
2010-10-20 16:42  Michael Roberts  Assigned To          => Michael Roberts
2010-10-20 16:42  Michael Roberts  Status               new => resolved
2010-10-20 16:42  Michael Roberts  Resolution           open => fixed
2010-10-20 16:43  Michael Roberts  Note Edited: 0000163
