1. Introduction to the `re` Module
The `re` module lets you work with regular expressions in Python. A regular expression is a sequence of characters that defines a search pattern. Regular expressions are used to match, search, and manipulate strings.
2. Basic Pattern Matching
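The simplest use of regular expressions is to check whether a string contains a literal pattern. `re.search()` scans the text and returns a match object for the first match, or `None` if the pattern does not occur.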
```python
import re

pattern = "hello"
text = "hello world"

# Search for the pattern anywhere in the text
match = re.search(pattern, text)
if match:
    print("Pattern found!")
else:
    print("Pattern not found.")
```

Output:

```
Pattern found!
```
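The match object returned by `re.search()` records what matched and where. A minimal sketch using the standard `group()` and `span()` methods, with made-up search terms:

```python
import re

text = "hello world"

# search() returns a Match object for the first occurrence
match = re.search(r"world", text)
if match:
    # group() returns the matched text; span() returns (start, end) indices
    print(match.group())  # world
    print(match.span())   # (6, 11)

# When the pattern does not occur anywhere, search() returns None
missing = re.search(r"goodbye", text)
print(missing)  # None
```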
3. Using Special Characters
Regular expressions use special characters for pattern matching. Here are some common ones:
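- `.`: Matches any character except a newline.
- `^`: Matches the start of the string.
- `$`: Matches the end of the string (or just before a newline at the end of the string).
- `*`: Matches 0 or more repetitions of the preceding element.
- `+`: Matches 1 or more repetitions of the preceding element.
- `?`: Matches 0 or 1 repetition of the preceding element.
- `\d`: Matches any digit (equivalent to `[0-9]` for ASCII text; with `str` patterns it also matches other Unicode decimal digits).
- `\D`: Matches any non-digit.
- `\w`: Matches any word character, meaning letters, digits, and the underscore (equivalent to `[a-zA-Z0-9_]` for ASCII text).
- `\W`: Matches any non-word character.
- `\s`: Matches any whitespace character (space, tab, newline, and so on).
- `\S`: Matches any non-whitespace character.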
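The characters listed above can be tried out one at a time. This is a small sketch; the log line and the word samples are invented purely for illustration:

```python
import re

# Made-up sample line used only for illustration
log_line = "ERROR 404: page_not_found at /index"

# ^ anchors at the start of the string, $ at the end
starts_with_error = re.search(r"^ERROR", log_line) is not None
ends_with_index = re.search(r"index$", log_line) is not None
print(starts_with_error, ends_with_index)  # True True

# \w+ finds runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", log_line)
print(words)  # ['ERROR', '404', 'page_not_found', 'at', 'index']

# \s+ matches runs of whitespace; here it is used as a separator
parts = re.split(r"\s+", log_line)
print(parts)  # ['ERROR', '404:', 'page_not_found', 'at', '/index']

# * allows zero repetitions of 'b', + requires at least one
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")
print(star, plus)  # ['a', 'ab', 'abbb'] ['ab', 'abbb']

# ? makes the preceding 'u' optional
spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']
```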
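```python
text = "There are 123.5 apples and 45 oranges."
pattern = r"\d+"
matches = re.findall(pattern, text)
print(matches)
```

Output:

```
['123', '5', '45']
```

Because `.` is not a digit, `123.5` comes back as two separate matches, `'123'` and `'5'`.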
```python
pattern = r"."
matches = re.findall(pattern, text)
print(matches)
```

Output:

```
['T', 'h', 'e', 'r', 'e', ' ', 'a', 'r', 'e', ' ', '1', '2', '3', '.', '5', ' ', 'a', 'p', 'p', 'l', 'e', 's', ' ', 'a', 'n', 'd', ' ', '4', '5', ' ', 'o', 'r', 'a', 'n', 'g', 'e', 's', '.']
```
```python
pattern = r".+"
matches = re.findall(pattern, text)
print(matches)
```

Output:

```
['There are 123.5 apples and 45 oranges.']
```
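The `\d+` pattern earlier split `123.5` into `'123'` and `'5'`. Combining `+`, `?`, and `*` lets one pattern cover the decimal part too. This is a simple sketch, not a complete number parser; for example, it would also accept a trailing dot as in `7.`:

```python
import re

text = "There are 123.5 apples and 45 oranges."

# \d+  one or more digits (the integer part)
# \.?  an optional literal dot (the backslash escapes '.')
# \d*  zero or more digits after the dot
pattern = r"\d+\.?\d*"
numbers = re.findall(pattern, text)
print(numbers)  # ['123.5', '45']
```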
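4. Grouping

Parentheses in a pattern create capturing groups. `match.group(0)` returns the whole match, and `match.group(n)` returns the text captured by the n-th group.

```python
# US-style phone number: 3-digit area code, 3-digit exchange code, 4-digit subscriber number
pattern = r"(\d{3})-(\d{3})-(\d{4})"
text = "My phone number is 123-456-7890."
match = re.search(pattern, text)
if match:
    print("Full match:", match.group(0))
    print("Area code:", match.group(1))
    print("Exchange code:", match.group(2))
    print("Subscriber number:", match.group(3))
```

Output:

```
Full match: 123-456-7890
Area code: 123
Exchange code: 456
Subscriber number: 7890
```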
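Groups also change what other functions return. A short sketch with made-up phone numbers: with capturing groups, `re.findall()` returns one tuple of groups per match, and `match.groups()` returns all groups of a single match:

```python
import re

pattern = r"(\d{3})-(\d{3})-(\d{4})"
# Made-up numbers for illustration
text = "Call 555-123-4567 or 555-987-6543."

# With capturing groups, findall() returns a tuple of groups per match
all_numbers = re.findall(pattern, text)
print(all_numbers)  # [('555', '123', '4567'), ('555', '987', '6543')]

# groups() returns every captured group of one match as a tuple
first = re.search(pattern, text)
print(first.groups())  # ('555', '123', '4567')
```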
5. Substituting Text
You can use re.sub() to replace parts of the text that match a pattern.
Example:
```python
pattern = r"\d+"
text = "There are 123 apples and 45 oranges."
# Replace each run of digits with a single '#'
result = re.sub(pattern, "#", text)
print(result)  # Output: "There are # apples and # oranges."
```

Output:

```
There are # apples and # oranges.
```
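`re.sub()` has a few more options worth knowing. This sketch shows the `count` argument, a function as the replacement, and a group backreference; `double` is a hypothetical helper name chosen for this example:

```python
import re

text = "There are 123 apples and 45 oranges."

# count=1 stops after the first replacement
first_only = re.sub(r"\d+", "#", text, count=1)
print(first_only)  # There are # apples and 45 oranges.

# A function as the replacement is called with each Match object
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)
print(doubled)  # There are 246 apples and 90 oranges.

# \1 in the replacement string refers to the text of group 1
swapped = re.sub(r"(\d+) apples", r"\1 pears", text)
print(swapped)  # There are 123 pears and 45 oranges.
```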
6. Compiling Patterns
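You can compile a regular expression into a pattern object with `re.compile()` and reuse it. This is convenient when the same pattern is used many times and saves a little work per call, although the `re` module already caches recently used patterns, so the speed difference is usually small.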
Example:
```python
pattern = re.compile(r"\d+")
text = "There are 123 apples and 45 oranges."
matches = pattern.findall(text)
print(matches)  # Output: ['123', '45']
```

Output:

```
['123', '45']
```
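A compiled pattern exposes the same operations as the module-level functions. A minimal sketch that reuses one compiled pattern across several made-up lines:

```python
import re

# Compile once, then call methods on the Pattern object
number = re.compile(r"\d+")

# Made-up sample lines for illustration
lines = [
    "There are 123 apples and 45 oranges.",
    "No numbers here.",
    "7 pears",
]

first_numbers = []
for line in lines:
    # search() returns the first match in the line, or None
    found = number.search(line)
    first_numbers.append(found.group() if found else None)
print(first_numbers)  # ['123', None, '7']

# The same object also offers sub(), findall(), split(), and so on
replaced = number.sub("#", lines[0])
print(replaced)        # There are # apples and # oranges.
print(number.pattern)  # \d+
```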
7. Flags and Options
The re module provides various flags to modify the behavior of regular expressions.
Common flags:
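- `re.IGNORECASE` or `re.I`: Ignore case.
- `re.MULTILINE` or `re.M`: Multi-line mode; `^` and `$` also match at the start and end of each line, not only of the whole string.
- `re.DOTALL` or `re.S`: Make `.` match any character, including a newline.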
```python
pattern = r"hello"
text = "Hello World"
# Search with the case-insensitive flag
match = re.search(pattern, text, re.IGNORECASE)
if match:
    print("Pattern found!")
```

Output:

```
Pattern found!
```
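The effect of `re.MULTILINE` and `re.DOTALL` shows up only on text that contains newlines. A small sketch on an invented three-line string, also showing that flags can be combined with `|`:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ matches only at the very start of the string
default_starts = re.findall(r"^\w+", text)
# With MULTILINE, ^ also matches right after each newline
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts, line_starts)  # ['first'] ['first', 'second', 'third']

# Without DOTALL, . does not match \n, so the match stays on the first line
one_line = re.search(r"first.+line", text).group()
# With DOTALL, . also matches \n and the greedy .+ runs to the last "line"
all_lines = re.search(r"first.+line", text, re.DOTALL).group()
print(repr(one_line))   # 'first line'
print(repr(all_lines))  # 'first line\nsecond line\nthird line'

# Flags are combined with the | operator
combined = re.findall(r"^SECOND", text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['second']
```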