    Back references

    Outside a character class, a backslash followed by
    a digit greater than 0 (and possibly further digits) is a back
    reference to a capturing subpattern earlier (i.e. to its left) in
    the pattern, provided there have been that many previous capturing
    left parentheses.

    However, if the decimal number following the
    backslash is less than 10, it is always taken as a back reference,
    and causes an error only if there are not that many capturing left
    parentheses in the entire pattern. In other words, the parentheses
    that are referenced need not be to the left of the reference for
    numbers less than 10. A “forward back reference” can make sense
    when a repetition is involved and the subpattern to the right has
    participated in an earlier iteration. See the section entitled
    “Backslash” above for further details of the handling of digits
    following a backslash.
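
    As a rough illustration of a forward back reference, the sketch
    below uses a pattern in which \2 appears to the left of the group
    it refers to. The pattern and the variable names are chosen for
    this example only; they are assumptions, not part of the original
    text.

```php
<?php
// \2 is written before group 2 opens, so it is a forward reference.
// First iteration: group 2 is still unset, so the \2two branch fails
// and (one) matches, setting group 2 to "one".
// Second iteration: \2two can now match "onetwo".
$pattern = '/^(\2two|(one))+$/';

$ok  = preg_match($pattern, 'oneonetwo'); // 1
$bad = preg_match($pattern, 'onetwo');    // 0: "one" matches, then "two" cannot
```

    The reference only succeeds because the repetition gives group 2 a
    value before the branch containing \2 is tried again.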

    A back reference matches whatever actually matched
    the capturing subpattern in the current subject string, rather than
    anything matching the subpattern itself. So the pattern
    (sens|respons)e and \1ibility matches “sense and
    sensibility” and “response and responsibility”, but not “sense and
    responsibility”. If case-sensitive (caseful) matching is in force
    at the time of the back reference, then the case of letters is
    relevant. For example, ((?i)rah)\s+\1 matches “rah rah”
    and “RAH RAH”, but not “RAH rah”, even though the original
    capturing subpattern is matched case-insensitively.
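
    The two patterns above can be tried with preg_match() as in the
    following minimal sketch. The ^ and $ anchors on the second pattern
    and all variable names are additions made for this example.

```php
<?php
// \1 repeats the text that group 1 actually captured,
// not the group's pattern.
$pattern = '/(sens|respons)e and \1ibility/';

$sense    = preg_match($pattern, 'sense and sensibility');       // 1
$response = preg_match($pattern, 'response and responsibility'); // 1
$mixed    = preg_match($pattern, 'sense and responsibility');    // 0

// (?i) applies only inside group 1; the back reference is caseful.
$rah = '/^((?i)rah)\s+\1$/';

$lower     = preg_match($rah, 'rah rah'); // 1
$upper     = preg_match($rah, 'RAH RAH'); // 1
$mixedCase = preg_match($rah, 'RAH rah'); // 0
```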

    There may be more than one back reference to the
    same subpattern. If a subpattern has not actually been used in a
    particular match, then any back references to it always fail. For
    example, the pattern (a|(bc))\2 always fails if it starts
    to match “a” rather than “bc”. Because there may be up to 99 back
    references, all digits following the backslash are taken as part of
    a potential back reference number. If the pattern continues with a
    digit character, then some delimiter must be used to terminate the
    back reference. If the PCRE_EXTENDED option is set, this can be whitespace.
    Otherwise an empty comment can be used.
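
    A short sketch of both points, assuming preg_match() and subject
    strings invented for this example: a back reference to an unused
    subpattern fails, and a back reference followed by a literal digit
    needs a delimiter.

```php
<?php
// Group 2 is set only when the "bc" branch is taken.
$unused = preg_match('/(a|(bc))\2/', 'aa');   // 0: group 2 never set
$used   = preg_match('/(a|(bc))\2/', 'bcbc'); // 1

// Back reference \1 followed by the literal digit "1".
// An empty comment (?#) terminates the reference number.
$viaComment = preg_match('/^(a)\1(?#)1$/', 'aa1'); // 1

// With the x (PCRE_EXTENDED) modifier, whitespace does the same job.
$viaSpace = preg_match('/^(a)\1 1$/x', 'aa1');     // 1
```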

    A back reference that occurs inside the parentheses
    to which it refers fails when the subpattern is first used, so, for
    example, (a\1) never matches. However, such references can be
    useful inside repeated subpatterns. For example, the pattern
    (a|b\1)+ matches any number of “a”s and also “aba”,
    “ababba” etc. At each iteration of the subpattern, the back
    reference matches the character string corresponding to the
    previous iteration. In order for this to work, the pattern must be
    such that the first iteration does not need to match the back
    reference. This can be done using alternation, as in the example
    above, or by a quantifier with a minimum of zero.
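
    The behaviour described above can be checked with a small sketch.
    The anchors and the subject strings other than "aba" and "ababba"
    are assumptions added for illustration.

```php
<?php
// \1 inside its own group fails on first use, so this never matches.
$never = preg_match('/(a\1)/', 'aa'); // 0

// The "a" alternative lets the first iteration succeed without \1;
// later iterations may use b followed by the previous iteration's text.
$pattern = '/^(a|b\1)+$/';

$onlyAs = preg_match($pattern, 'aaa');    // 1
$aba    = preg_match($pattern, 'aba');    // 1: "a", then "b" + "a"
$ababba = preg_match($pattern, 'ababba'); // 1: "a", "ba", "bba"
$ab     = preg_match($pattern, 'ab');     // 0: "b" needs "a" after it
```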

    As of PHP 5.2.2, the \g escape sequence
    can be used for absolute and relative referencing of subpatterns.
    This escape sequence must be followed by an unsigned number or a
    negative number, optionally enclosed in braces. The sequences
    \1, \g1 and \g{1} are synonymous with
    one another. Using this sequence with an unsigned number can
    help remove the ambiguity inherent when using digits following a
    backslash. The sequence helps to distinguish back references from
    octal characters and also makes it easier to have a back reference
    followed by a literal number, e.g. \g{2}1.
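
    As a minimal sketch, the three equivalent spellings and the
    \g{2}1 case might be exercised like this. The subject strings and
    variable names are illustrative assumptions.

```php
<?php
// Three spellings of a reference to group 1.
$plain  = preg_match('/^(a)\1$/', 'aa');    // 1
$gBare  = preg_match('/^(a)\g1$/', 'aa');   // 1
$gBrace = preg_match('/^(a)\g{1}$/', 'aa'); // 1

// The braces end the reference number, so the trailing 1 is a literal.
$literalDigit = preg_match('/^(a)(b)\g{2}1$/', 'abb1'); // 1
```

    Single-quoted PHP strings are used so that the backslashes reach
    the regex engine unchanged.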

    The use of the \g sequence with a negative
    number signifies a relative reference. For example,
    (foo)(bar)\g{-1} matches the sequence “foobarbar” and
    (foo)(bar)\g{-2} matches “foobarfoo”. This can be useful
    in long patterns as an alternative to keeping track of the number
    of subpatterns in order to reference a specific previous
    subpattern.
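
    The relative references above can be sketched as follows. The
    anchors and variable names are assumptions added for this example.

```php
<?php
// \g{-1} refers to the most recently opened group before it: (bar).
$last = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarbar'); // 1

// \g{-2} refers to the group before that: (foo).
$previous = preg_match('/^(foo)(bar)\g{-2}$/', 'foobarfoo'); // 1

// \g{-1} does not match the text captured by (foo).
$wrong = preg_match('/^(foo)(bar)\g{-1}$/', 'foobarfoo'); // 0
```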

    Back references to the named subpatterns can be
    achieved by (?P=name) or, since PHP 5.2.2, also by
    \k<name> or \k'name'. Additionally PHP
    5.2.4 added support for \k{name} and \g{name},
    and PHP 5.2.7 for \g<name> and