I am having difficulty creating a regular expression that looks for a string similar to F:\work\object\src. I've created the following demonstration of things I've tried. Please note that $match comes in from a database field, which is why it's defined as a string that is then made into a regex by the qr operator.
#!/opt/perl/bin/perl
use Try::Tiny;
use Data::Dumper::Concise;
my $str = 'F:\work\object\src';
my @matches = (
    "\b F\\:\\work\\object\\src \b",
    '\b F\:\work\object\src \b',
    q{\b F\\:\\work\\object\\src \b},
    q{\b\QF:\work\object\src\E \b},
    q{\b F\\:\\work\\\\object\\src \b},
    qq{\b F\\:\\work\\\\object\\src \b},
);
my $i = 0;
foreach my $match (@matches) {
    print "attempt ".$i++."\n";
    try {
        my $re = qr{($match)}xims;
        print "Built successfully.\n";
        if ($str =~ /$re/) {
            print "Match\n";
        }
        else {
            print "But did not match!\n";
            print Dumper($re);
        }
    }
    catch {
        print "$match failed to build re\n";
        print "$_\n";
    };
}
The output of this test program is as follows:
attempt 0
F\:\work\object\src failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/ F\:\work\o <-- HERE bject\sr)/ at ./reparse.pl line 20.
attempt 1
\b F\:\work\object\src \b failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/(\b F\:\work\o <-- HERE bject\src \b)/ at ./reparse.pl line 20.
attempt 2
\b F\:\work\object\src \b failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/(\b F\:\work\o <-- HERE bject\src \b)/ at ./reparse.pl line 20.
attempt 3
\b\QF:\work\object\src\E \b failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/(\b\QF:\work\o <-- HERE bject\src\E \b)/ at ./reparse.pl line 20.
attempt 4
Built successfully.
But did not match!
qr/(\b F\:\work\\object\src \b)/msix
attempt 5
Built successfully.
But did not match!
qr/ F\:\work\\object\src)/msi
Attempts 4 and 5 seem to escape the \o but fail in matching the string. Would appreciate help crafting the string that will work.
Asked Nov 18, 2024 at 20:58 by Todd
2 Answers
If those matching strings aren't set in stone, I'd change it like this:
Define your matching string(s) as single-quoted and without those escapes, like
q(F:\work\object\src)
and build patterns using
qr{\Q$match\E}
If this is for some reason unfeasible, the \Q..\E (quotemeta) can be used directly in the regex. Add word boundaries or whatever else is needed in your regex, of course.†
A one-liner example (in Linux)
perl -wE'$s=q(F:\o); $m=q(F:\o); $p=qr{\Q$m\E}; say $1 if $s =~ m{\b($p)\b}'
Prints F:\o.
Also, I'd match them using a character other than / as a delimiter, like =~ m{$pattern}, so that it's "portable" to paths that use /. For your string this isn't needed.
If the matching string(s) need be in a database I'd consider it even more important to have the exact paths there, without escapes or any such. Then protect your patterns.
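As a rough sketch of that idea, assuming the database hands back plain paths with no regex escapes (the @raw_paths list and the sample $str below are made up for illustration, not taken from the question):

```perl
use strict;
use warnings;

# Hypothetical rows as they might come back from the database:
# plain paths, stored exactly as they appear, with no regex escapes.
my @raw_paths = ( 'F:\work\object\src', 'C:\temp\out' );

# Hypothetical text to search.
my $str = 'copied from F:\work\object\src yesterday';

my @hits;
for my $raw (@raw_paths) {
    # \Q..\E (quotemeta) escapes every regex metacharacter in $raw,
    # so \w, \o and \s are matched as a literal backslash plus a letter.
    # The \b assertions stay outside \Q..\E so they keep their meaning.
    my $re = qr{\b\Q$raw\E\b};
    push @hits, $raw if $str =~ $re;
}

print "matched: $_\n" for @hits;
```

Only the first path should be reported here, since the second one does not occur in the sample text.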
† Patterns with metacharacters, like \b, can't be used inside the \Q..\E sequence, as they get escaped and so lose their special meaning. (quotemeta escapes all ASCII characters that aren't a word character, [a-zA-Z0-9_].)
Either string up a pattern with it for later use, like
my $patt = '\b' . qr{\Q$match\E} . '\b'
which can then go under qr if needed ($patt = qr{$patt}), or add it directly in the regex
... =~ m{\b\Q$match\E\b}
[ Edit: Along with adding this footnote I added that word boundary to the one-liner example above ]
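To make the footnote concrete, here is a small sketch of what quotemeta does to the path and to a \b, and of keeping \b outside the quoted part. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $match = 'F:\work\object\src';

# quotemeta backslash-escapes every ASCII character that is not [A-Za-z0-9_],
# so both the colon and each backslash get a backslash in front.
my $quoted = quotemeta($match);
print "$quoted\n";

# A \b that goes through quotemeta is escaped as well: the result matches
# a literal backslash followed by "b", not a word boundary.
my $quoted_b = quotemeta('\b');
print "$quoted_b\n";

# Keeping \b outside the quoted part leaves it a word-boundary assertion.
my $patt = '\b' . $quoted . '\b';
my $re   = qr{$patt};

my $found = 'F:\work\object\src' =~ $re ? 1 : 0;
print "found: $found\n";
```

Building the string first and compiling it with qr afterwards keeps the quoting step and the boundary assertions visibly separate.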
As far as I can see there are a couple of syntax problems. First, you are going back and forth between double- and single-quoted strings; then you put those strings inside a qr, which is causing weird behavior.
If you are using regular expressions that are coming from a database, basically @matches, you should use qr() instead of both '' or q() and "" or qq(). Going back and forth between the three is going to cause weird, difficult-to-predict behavior. If you are using regular expressions contained in a string, I would highly suggest using qr().
My suggestion would be to build @matches something like this. I go into greater detail in the complete code at the end.
my @matches = (
qr"\b F\\:\\work\\object\\src \b",
qr"\b F\:\work\object\src \b",
qr"\b F\\:\\work\\object\\src \b",
qr"\b\QF:\work\object\src\E \b",
qr"\b F\\:\\work\\\\object\\src \b",
qr"\b F\\:\\work\\\\object\\src \b",
qr"F:\\work\\object\\src"
);
In a double-quoted string, a backslash escapes the next character. In a single-quoted string, a backslash is literal unless it is followed by the delimiter or another backslash, so \\ still collapses to a single backslash there. It is pretty good practice to use double quotes and escape the characters correctly. The characters that are special inside a regex are listed in perldoc -f quotemeta as \ | ( ) [ { ^ $ * + ? ., and therefore the backslashes in the directory path must be escaped, while escaping the colon is not mandatory. So that is your first problem: you will have to pick a style and be consistent with it. Going back and forth between multiple styles will cause a lot of mistakes and confusion. Check perldoc for the exact single quote, double quote, and qr behavior:
$ perldoc -f q
"q/*STRING*/"
A single-quoted, literal string. A backslash represents a
backslash unless followed by the delimiter or another backslash,
in which case the delimiter or backslash is interpolated
$ perldoc -f qq
"qq/*STRING*/"
A double-quoted, interpolated string.
$ perldoc -f qr
"qr/*STRING*/msixpodualn"
This operator quotes (and possibly compiles) its *STRING* as a
regular expression. *STRING* is interpolated the same way as
*PATTERN* in "m/*PATTERN*/". If "'" is used as the delimiter, no
variable interpolation is done.
Perlmonks has a nice article on the subject as well. https://www.perlmonks./?node_id=401006
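The quoting rules quoted above can be checked with a few lines. This is only a sketch with illustrative variable names. The last step assumes a Perl recent enough to treat \o as the start of an octal escape, as the one in the question does.

```perl
use strict;
use warnings;

# Single quotes: \w stays as backslash + w, but \\ collapses to one backslash.
my $single = 'F:\work\\object';

# Double quotes: every literal backslash has to be written as \\.
my $double = "F:\\work\\object";

my $same = $single eq $double ? 1 : 0;
print "length: ", length($single), "\n";
print "same string: $same\n";

# Interpolating the string into qr hands the backslashes to the regex
# compiler, which reads \w and \o as regex escapes rather than path parts.
my $re = eval { qr{$single} };
print defined $re ? "pattern built\n" : "pattern failed to build: $@";
```

Both literals produce the same 14-character string, which is why the choice of quotes matters only for how the backslashes are typed, not for what ends up in the variable.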
Ok, the next problem is the syntax of the regular expressions themselves. There are many problems with most of them. A lot of them fall under double or single quoting problems. Also, written as qr"..." without the /x flag, none of them will match, because the space next to each \b is then a literal space that must appear in the string.
perldoc perlre describes a word boundary as
\b Match a \w\W or \W\w boundary
$str itself does have word boundaries, at its start and at its end, since F and c are word characters, but it contains no spaces. If you want to match $str, the correct matching regular expression is as follows. I go into greater detail in the complete code at the end.
$willMatch = qr"F:\\work\\object\\src";
There should be no "\b " or " \b" with a literal space, because there are no spaces in $str. The regex engine will look for a word character followed by a space, not find one, and return false every time. You will never get a match that way. I tried to figure out what was wrong with each of the regular expressions in @matches, and this is what I could come up with:
"\b F\\:\\work\\object\\src \b"
RE1 whiffs because it is improperly escaped: a \ will escape the next character, so if you use qr this matches 'words F\:\work\object\src words'. Because of the incorrectly placed backslash in front of the colon, and the missing leading and trailing words and spaces, this RE will never match the string.
'\b F\:\work\object\src \b'
RE2 will not build. Improperly escaped. The backslash sequences in the path are taken as the regex escapes \w, \o and \s: a word character, an octal escape, and a whitespace character respectively. An octal escape must be followed by braces, like \o{000}, and that is why this one will not build. The regex engine is looking for an octal number. From perldoc perlre
\o{}, \000 character whose ordinal is the given octal number
q{\b F\\:\\work\\object\\src \b}
RE3 whiffs for the same reason as RE1, but if you use qr it WILL match the following string 'words F\:\work\object\src words'
q{\b\QF:\work\object\src\E \b}
RE4 will not build, similar to RE2: improperly escaped, and also looking for an octal because of the \o. When the pattern arrives in a variable, as it does here, \Q and \E are not processed (they only take effect when written literally in the pattern), so they do not protect the \o.
q{\b F\\:\\work\\\\object\\src \b}
RE5 will miss because you are using single quotes instead of double, but you tried to escape the backslashes anyway. If you use qr this will match 'words F\:\work\\object\src words'
qq{\b F\\:\\work\\\\object\\src \b}
RE6 Misses because of backslash before the colon, and there are two backslashes between work and object. This will match the same string as RE5 if you use qr
"F:\\work\\object\\src"
This regular expression is my addition and will match what you are looking for
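A small check of how \b, literal spaces, and the /x flag treat this particular string, written as a sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $str = 'F:\work\object\src';

# \b matches at the very start and end of the string,
# because F and c are word characters.
my $with_boundaries = $str =~ m{\bF:\\work\\object\\src\b} ? 1 : 0;

# Without /x, the spaces in the pattern are literal and must be in the target.
my $literal_spaces = $str =~ m{\b F:\\work\\object\\src \b} ? 1 : 0;

# With /x, unescaped whitespace in the pattern is ignored.
my $extended = $str =~ m{\b F:\\work\\object\\src \b}x ? 1 : 0;

print "with \\b: $with_boundaries, literal spaces: $literal_spaces, /x: $extended\n";
```

Note that flags given on an outer match do not change a pattern already compiled with qr, so /x has to be present on the qr itself for the spaces to be ignored.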
#!/usr/bin/perl -w
use Try::Tiny;
use Data::Dumper::Concise;
#Ok first it matters if the string is enclosed in single or double quotes.
#Single quotes ('' or q() strings) are not interpolated while double
#quotes ("" or qq() ) are. perldoc says the following about double and
#single quoted strings
## "q/*STRING*/"
## A single-quoted, literal string. A backslash represents a
## backslash unless followed by the delimiter or another backslash,
## in which case the delimiter or backslash is interpolated
##
##
## "qq/*STRING*/"
## A double-quoted, interpolated string.
##
##The characters that are special inside a regex are listed in
##perldoc -f quotemeta as \ | ( ) [ { ^ $ * + ? .
##
##So the following two strings are equivalent, you have to be sure to use
##the correct convention depending on which set of quotes you use
my $str = 'F:\work\object\src'; #same as q(F:\work\object\src)
my $equivalentDoubleQuotedString = "F:\\work\\object\\src";# same as qq(F:\\work\\object\\src)
my $regexStringDirectlyFromDatabaseDoubleQuoted = $equivalentDoubleQuotedString;
my $regexStringDirectlyFromDatabaseSingleQuoted = $str;
my $quoteMetaDoubleQuotedString = quotemeta($equivalentDoubleQuotedString);
my $quoteMetaSingleQuotedString = quotemeta($str);
my @matches = (
qr"\b F\\:\\work\\object\\src \b", #will not match searching for (F\:\work) and looking for word boundaries
#qr"\b F\:\work\object\src \b", #will not build, improperly escaped, no word boundaries, looking for octals
qr"\b F\\:\\work\\object\\src \b", #will not match searching for (F\:\work)
#qr"\b\QF:\work\object\src\E \b", #will not build, improperly escaped, no word boundaries, looking for octals
qr"\b F\\:\\work\\\\object\\src \b", #will not work searching for (F\:\work) also (work\\object), also no word boundaries
qr"\b F\\:\\work\\\\object\\src \b", #same as above
qr"F:\\work\\object\\src", #should work
qr"\Q$regexStringDirectlyFromDatabaseDoubleQuoted\E", #will work as long as you put it inside \Q \E
qr"\Q$regexStringDirectlyFromDatabaseSingleQuoted\E", #works like double quoted just use \Q \E
qr"$quoteMetaDoubleQuotedString", #functional equivalent of above dbl quoted
qr"$quoteMetaSingleQuotedString" #functional equivalent of above single quoted
);
my $i = 1;
foreach my $match (@matches) {
    print "Regular Expression ".$i++."\t";
    try {
        #Now they will all build successfully
        if ($str =~ /$match/xims) {
            print "Matched\t\t" . Dumper($match);
        }
        else {
            print "Did not match\t" . Dumper($match);
        }
    }
    catch {
        print "$match failed to build re\n";
        print "$_\n";
    };
}
The output looks like this:
$ perl missing.regex.2.pl
Regular Expression 1 Did not match qr/\b F\\:\\work\\object\\src \b/
Regular Expression 2 Did not match qr/\b F\\:\\work\\object\\src \b/
Regular Expression 3 Did not match qr/\b F\\:\\work\\\\object\\src \b/
Regular Expression 4 Did not match qr/\b F\\:\\work\\\\object\\src \b/
Regular Expression 5 Matched qr/F:\\work\\object\\src/
Regular Expression 6 Matched qr/F\:\\work\\object\\src/
Regular Expression 7 Matched qr/F\:\\work\\object\\src/
Regular Expression 8 Matched qr/F\:\\work\\object\\src/
Regular Expression 9 Matched qr/F\:\\work\\object\\src/
That explanation got a little long, but that should be exactly what you are looking for.
which should not be there. There is something mysterious going on, which I cannot identify. I even got the regex engine complaining about "unrecognized escape \Q". – TLP Commented Nov 19, 2024 at 1:31