Assertions
An assertion is a test on the characters following
or preceding the current matching point that does not actually
consume any characters. The simple assertions coded as \b, \B, \A,
\Z, \z, ^ and $ are described in the section on escape sequences.
More complicated assertions are coded as subpatterns. There are two
kinds: those that look ahead of the
current position in the subject string, and those that look behind it.
An assertion subpattern is matched in the normal
way, except that it does not cause the current matching position to
be changed. Lookahead assertions start
with (?= for positive assertions and (?! for negative assertions.
For example, \w+(?=;) matches a word followed by a
semicolon, but does not include the semicolon in the match, and
foo(?!bar) matches any occurrence of “foo” that is not
followed by “bar”. Note that the apparently similar pattern
(?!foo)bar does not find an occurrence of “bar” that is
preceded by something other than “foo”; it finds any occurrence of
“bar” whatsoever, because the assertion (?!foo) is always TRUE
when the next three characters are “bar”. A lookbehind assertion is
needed to achieve this effect.
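The lookahead examples above can be tried out in a short sketch. It uses Python's `re` module as a stand-in, which is an assumption on our part: it is not the PCRE engine this section documents, but it should treat these three patterns the same way. The subject strings are invented for illustration.

```python
import re

# \w+(?=;) matches a word followed by a semicolon; the semicolon
# is tested by the assertion but is not part of the match.
word = re.search(r"\w+(?=;)", "int x;")
word_text = word.group()

# foo(?!bar) skips the "foo" in "foobar" and finds the one in "foobaz".
foo_starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar still matches the "bar" in "foobar": at the point where the
# assertion is tested, the next three characters are "bar", not "foo".
bar = re.search(r"(?!foo)bar", "foobar")
bar_span = bar.span()

print(word_text, foo_starts, bar_span)
```

The last case is the one that tends to surprise: a lookahead placed before "bar" inspects "bar" itself, not what came before it.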
Lookbehind assertions
start with (?<= for positive assertions and (?<! for negative
assertions. For example, (?<!foo)bar does find an
occurrence of “bar” that is not preceded by “foo”. The contents of
a lookbehind assertion are restricted such that all the strings it
matches must have a fixed length. However, if there are several
alternatives, they do not all have to have the same fixed length.
Thus (?<=bullock|donkey) is permitted, but
(?<!dogs?|cats?) causes an error at compile time.
Branches that match different length strings are permitted only at
the top level of a lookbehind assertion. This is an extension
compared with Perl 5.005, which requires all branches to match the
same length of string. An assertion such as
(?<=ab(c|de)) is not permitted, because its single
top-level branch can match two different lengths, but it is
acceptable if rewritten to use two top-level branches:
(?<=abc|abde). The implementation of lookbehind
assertions is, for each alternative, to temporarily move the
current position back by the fixed width and then try to match. If
there are insufficient characters before the current position, the
match is deemed to fail. Lookbehinds in conjunction with once-only
subpatterns can be particularly useful for matching at the ends of
strings; an example is given at the end of the section on once-only
subpatterns.
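A minimal sketch of the lookbehind behaviour described above, again assuming Python's `re` module as a stand-in rather than PCRE itself. Note that Python is stricter than the rules given here: it requires the whole lookbehind to have a single fixed width, so a pattern such as (?<=bullock|donkey) is not accepted there and is left out of the sketch. The subject strings and variable names are illustrative only.

```python
import re

# (?<!foo)bar finds "bar" only when it is not preceded by "foo".
after_foo = re.search(r"(?<!foo)bar", "foobar")   # no match
after_baz = re.search(r"(?<!foo)bar", "bazbar")   # matches at offset 3

# Too few characters before the current position: the positive
# lookbehind cannot step back three characters, so the match fails.
too_short = re.search(r"(?<=abc)d", "bcd")

# A lookbehind whose branches can match more than one length
# is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!dogs?|cats?)x")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

print(after_foo, after_baz.start(), too_short, variable_width_rejected)
```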
Several assertions (of any sort) may occur in
succession. For example, (?<=\d{3})(?<!999)foo
matches “foo” preceded by three digits that are not “999”. Notice
that each of the assertions is applied independently at the same
point in the subject string. First there is a check that the
previous three characters are all digits, then there is a check
that the same three characters are not “999”. This pattern does not
match “foo” preceded by six characters, the first three of which are
digits and the last three of which are not “999”. For example, it
doesn’t match “123abcfoo”. A pattern to do that is
(?<=\d{3}...)(?<!999)foo
This time the first assertion looks at the
preceding six characters, checking that the first three are digits,
and then the second assertion checks that the preceding three
characters are not “999”.
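The difference between the two patterns can be checked with a small sketch. It assumes Python's `re` module as a stand-in engine, and it writes the second pattern with three literal dots (each matching any one character) after \d{3}. The subject strings other than “123abcfoo” are made up for illustration.

```python
import re

# Both assertions are applied at the same point, just before "foo".
three_back = re.compile(r"(?<=\d{3})(?<!999)foo")
# Here the first assertion looks back six characters instead of three.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")

subjects = ["123foo", "999foo", "123abcfoo", "123999foo", "999123foo"]

# Map each subject to (matched by three_back, matched by six_back).
results = {
    s: (three_back.search(s) is not None, six_back.search(s) is not None)
    for s in subjects
}

for subject, outcome in results.items():
    print(subject, outcome)
```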
Assertions can be nested in any combination. For
example, (?<=(?<!foo)bar)baz matches an occurrence
of “baz” that is preceded by “bar” which in turn is not preceded by
“foo”, while (?<=\d{3}...(?<!999))foo is another
pattern which matches “foo” preceded by three digits and any three
characters that are not “999”.
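As a hedged illustration of nesting, the sketch below runs both patterns through Python's `re` module, which is assumed here as a stand-in for PCRE. The second pattern is written with three literal dots after \d{3}, and the subject strings are invented.

```python
import re

# "baz" preceded by "bar", where that "bar" is not itself preceded by "foo".
nested = re.compile(r"(?<=(?<!foo)bar)baz")
plain = nested.search("xxxbarbaz")      # "bar" follows "xxx": match
blocked = nested.search("foobarbaz")    # "bar" follows "foo": no match

# The negative lookbehind sits inside the positive one and is tested
# after the six characters have been matched.
inner = re.compile(r"(?<=\d{3}...(?<!999))foo")
inner_ok = inner.search("123abcfoo")
inner_blocked = inner.search("123999foo")

print(plain.start(), blocked, inner_ok.start(), inner_blocked)
```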
Assertion subpatterns are not capturing
subpatterns, and may not be repeated, because it makes no sense to
assert the same thing several times. If any kind of assertion
contains capturing subpatterns within it, these are counted for the
purposes of numbering the capturing subpatterns in the whole
pattern. However, substring capturing is carried out only for
positive assertions, because it does not make sense for negative
assertions.
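The numbering and capturing rules can be observed in a short sketch, assuming Python's `re` module as a stand-in engine (its handling of these two patterns should agree with the description above). The patterns and subject strings are hypothetical examples, not taken from the text.

```python
import re

# Positive lookahead: the group inside the assertion captures "abc",
# while the overall match consumes only the first character.
pos = re.match(r"(?=(\w+);)\w", "abc;")
pos_groups = (pos.group(0), pos.group(1))

# Negative lookahead: its group still counts as group 1, so (\w) is
# group 2, but group 1 never holds a captured substring.
neg_pattern = re.compile(r"(?!(x))(\w)")
neg = neg_pattern.match("a")
neg_groups = (neg_pattern.groups, neg.group(1), neg.group(2))

print(pos_groups, neg_groups)
```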
Assertions count towards the maximum of 200
parenthesized subpatterns.