    Assertions

    An assertion is a test on the characters following
    or preceding the current matching point that does not actually
    consume any characters. The simple assertions coded as \b, \B, \A,
    \Z, \z, ^ and $ are described in the section on escape sequences.
    More complicated assertions are coded as subpatterns. There are two
    kinds: those that look ahead of the
    current position in the subject string, and those that look behind it.

    An assertion subpattern is matched in the normal
    way, except that it does not cause the current matching position to
    be changed. Lookahead assertions start
    with (?= for positive assertions and (?! for negative assertions.
    For example, \w+(?=;) matches a word followed by a
    semicolon, but does not include the semicolon in the match, and
    foo(?!bar) matches any occurrence of “foo” that is not
    followed by “bar”. Note that the apparently similar pattern
    (?!foo)bar does not find an occurrence of “bar” that is
    preceded by something other than “foo”; it finds any occurrence of
    “bar” whatsoever, because the assertion (?!foo) is always
    TRUE when the next three characters
    are “bar”. A lookbehind assertion is needed to achieve this
    effect.
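
    The three lookahead patterns above can be tried directly with
    preg_match() and preg_match_all(). This is a minimal sketch; the
    subject strings and variable names are made up for illustration.

```php
<?php
// \w+(?=;) matches a word followed by a semicolon, but the semicolon
// itself is not part of the match.
preg_match('/\w+(?=;)/', 'return value;', $word);

// foo(?!bar) skips the "foo" in "foobar" and matches the one in "foobaz".
preg_match('/foo(?!bar)/', 'foobar foobaz', $foo, PREG_OFFSET_CAPTURE);

// (?!foo)bar still matches the "bar" in "foobar": at the point where
// "bar" starts, the next three characters are "bar", not "foo", so the
// negative lookahead succeeds.
$barCount = preg_match_all('/(?!foo)bar/', 'foobar', $bars);

var_dump($word[0], $foo[0][1], $barCount);
```

    Here $word[0] holds "value", $foo[0][1] is the byte offset 7 of the
    second "foo", and $barCount is 1.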

    Lookbehind assertions
    start with (?<= for positive assertions and (?<! for negative
    assertions. For example, (?<!foo)bar does find an
    occurrence of “bar” that is not preceded by “foo”. The contents of
    a lookbehind assertion are restricted such that all the strings it
    matches must have a fixed length. However, if there are several
    alternatives, they do not all have to have the same fixed length.
    Thus (?<=bullock|donkey) is permitted, but
    (?<!dogs?|cats?) causes an error at compile time.
    Branches that match different length strings are permitted only at
    the top level of a lookbehind assertion. This is an extension
    compared with Perl 5.005, which requires all branches to match the
    same length of string. An assertion such as
    (?<=ab(c|de)) is not permitted, because its single
    top-level branch can match two different lengths, but it is
    acceptable if rewritten to use two top-level branches:
    (?<=abc|abde). The implementation of lookbehind
    assertions is, for each alternative, to temporarily move the
    current position back by the fixed width and then try to match. If
    there are insufficient characters before the current position, the
    match is deemed to fail. Lookbehinds in conjunction with once-only
    subpatterns can be particularly useful for matching at the ends of
    strings; an example is given at the end of the section on once-only
    subpatterns.
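
    A short sketch of the lookbehind rules described above. The subject
    strings are illustrative assumptions. The last check does not assert
    a result, because whether a variable-length branch is rejected depends
    on the PCRE version PHP is linked against.

```php
<?php
// (?<!foo)bar matches only a "bar" that is not preceded by "foo".
preg_match_all('/(?<!foo)bar/', 'foobar bazbar', $bar, PREG_OFFSET_CAPTURE);
$offsets = array_column($bar[0], 1); // byte offsets of each match

// Top-level alternatives may have different fixed lengths:
// "bullock" is 7 characters long, "donkey" is 6.
$found = preg_match('/(?<=bullock|donkey) cart/', 'a donkey cart', $cart);

// (?<!dogs?|cats?) contains branches of variable length. When pattern
// compilation fails, preg_match() returns false (and raises a warning,
// suppressed here with @).
$compiled = @preg_match('/(?<!dogs?|cats?)x/', 'x') !== false;

var_dump($offsets, $found, $cart[0], $compiled);
```

    With the subjects shown, $offsets contains only 10 (the "bar" inside
    "bazbar"), and $cart[0] is " cart".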

    Several assertions (of any sort) may occur in
    succession. For example, (?<=\d{3})(?<!999)foo
    matches “foo” preceded by three digits that are not “999”. Notice
    that each of the assertions is applied independently at the same
    point in the subject string. First there is a check that the
    previous three characters are all digits, then there is a check
    that the same three characters are not “999”. This pattern does not
    match “foo” preceded by six characters, the first three of which are
    digits and the last three of which are not “999”. For example, it
    doesn’t match “123abcfoo”. A pattern to do that is
    (?<=\d{3}...)(?<!999)foo

    This time the first assertion looks at the
    preceding six characters, checking that the first three are digits,
    and then the second assertion checks that the preceding three
    characters are not “999”.
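
    The difference between the two patterns can be checked with a few
    subjects. This is an illustrative sketch: the variable names are
    assumptions, and the "any three characters" part is written as three
    literal dots.

```php
<?php
$three = '/(?<=\d{3})(?<!999)foo/';    // both assertions test the same 3 characters
$six   = '/(?<=\d{3}...)(?<!999)foo/'; // first assertion looks back 6 characters

$results = [
    // Preceding three characters are digits and are not "999".
    'three_123foo'    => preg_match($three, '123foo'),
    // Preceding three characters are "999".
    'three_999foo'    => preg_match($three, '999foo'),
    // Preceding three characters are "abc", which are not digits.
    'three_123abcfoo' => preg_match($three, '123abcfoo'),
    // Six back: "123" then any three characters; last three are not "999".
    'six_123abcfoo'   => preg_match($six, '123abcfoo'),
    // Six back: "123" then "999"; the second assertion rejects it.
    'six_123999foo'   => preg_match($six, '123999foo'),
];

var_dump($results);
```

    Only 'three_123foo' and 'six_123abcfoo' come back as 1; the other
    three are 0.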

    Assertions can be nested in any combination. For
    example, (?<=(?<!foo)bar)baz matches an occurrence
    of “baz” that is preceded by “bar” which in turn is not preceded by
    “foo”, while (?<=\d{3}...(?<!999))foo is another
    pattern which matches “foo” preceded by three digits and any three
    characters that are not “999”.
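
    A small sketch of the nested forms, using made-up subject strings.
    The second pattern is written with three literal dots for the "any
    three characters" part.

```php
<?php
// "baz" must be preceded by "bar", and that "bar" must not itself be
// preceded by "foo".
$nested = '/(?<=(?<!foo)bar)baz/';
$plainBar = preg_match($nested, 'xbarbaz');   // "bar" preceded by "x"
$fooBar   = preg_match($nested, 'foobarbaz'); // "bar" preceded by "foo"

// "foo" preceded by three digits, then three characters that are not "999".
$digits = '/(?<=\d{3}...(?<!999))foo/';
$abc   = preg_match($digits, '123abcfoo');
$nines = preg_match($digits, '123999foo');

var_dump($plainBar, $fooBar, $abc, $nines);
```

    $plainBar and $abc are 1; $fooBar and $nines are 0.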

    Assertion subpatterns are not capturing
    subpatterns, and may not be repeated, because it makes no sense to
    assert the same thing several times. If any kind of assertion
    contains capturing subpatterns within it, these are counted for the
    purposes of numbering the capturing subpatterns in the whole
    pattern. However, substring capturing is carried out only for
    positive assertions, because it does not make sense for negative
    assertions.
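
    The numbering and capturing behaviour can be observed as follows.
    This is a sketch with illustrative patterns; PREG_UNMATCHED_AS_NULL
    (available since PHP 7.2) is used so that a group that never captured
    shows up as null rather than as an empty string.

```php
<?php
// A capturing group inside a positive lookahead is numbered like any
// other group and does capture, even though the assertion consumes nothing.
preg_match('/(\w+)(?=(;))/', 'value;', $pos);

// A capturing group inside a negative lookahead still takes a number
// (group 2 here), so the following group is number 3, but it captures
// nothing.
preg_match('/(a)(?!(b))(c)/', 'ac', $neg, PREG_UNMATCHED_AS_NULL);

var_dump($pos, $neg);
```

    $pos is ["value", "value", ";"]: the overall match excludes the
    semicolon, yet group 2 holds it. In $neg, group 2 is null and group 3
    holds "c".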

    Assertions count towards the maximum of 200
    parenthesized subpatterns.