Pacific Beach Drive

Mike's Drive.

Follow me on GitHub

Pattern Matching - Repetition Qualifiers

  • affects the count to match from the preceding character or group
  • Can have a variable or specific range

Repetition Qualifier Characters

  • match 0 or more occurrences ? match 0 or 1 occurrence of the previous entity
  • match 1 or more occurrences

Specific Repetition Qualifiers

{n}
# match exactly n occurrences
{n,}
# match at least n occurrences
{n,m}
# match between n and m occurrences
{,m}
# match between 0 and m occurrences

Repetition Example 1

  • \d* match 0 or more consecutive digits
Pattern String result
\d* (empty string) match
\d* 8 match
\d* 1234 match
\d* 12345 match

Repetition Example 2

  • \d? match 0 or one consecutive digit
Pattern String result
\d? (empty string) match
\d? 8 match
\d? 1234 match
\d? 12345 match

Repetition Example 3

  • \d+ match 1 or more consecutive digit
Pattern String result
\d+ (empty string) no match
\d+ 8 match
\d+ 1234 match
\d+ 12345 match

Repetition Example 4

  • \d{5} match any 5 consecutive digit
Pattern String result
\d{5} (empty string) no match
\d{5} 8 no match
\d{5} 1234 no match
\d{5} 12345 match

Repetition Example 5

  • \d{1,4} match 1 to 4 consecutive digit
Pattern String result
\d{1,4} (empty string) match
\d{1,4} 8 match
\d{1,4} 1234 match
\d{1,4} 12345 match

Greedy Qualifiers

  • Match as much text as possible (Greedy)
  • Include all repetition qualifiers (*,+,?)
  • may lead to unexpected results
  • avoid using . together with greedy qualifiers (it matches pretyt much everything)

Greedy Example

import re

s = '<tag>data</tag>'
pattern = '<.*>'
match = re.search(pattern,s)
print(match.group(0))
# Output <tag>data</tag>

Handling Greedy Matching

  • use alternaties to repetition qualifiers (to avoid greedy )
  • to turn off greedy matching add the ? right after the qualifier (makes it non-greedy)
  • Use specific or custom character classes instead of the dot (.)

Non-Greedy Matching

import re

s = '<tag>data</tag>'
pattern = '<.*?>'
match = re.search(pattern,s)
print(match.group(0))
# Output <tag>