Python Regular Expressions - Cheat Sheet

Valeria Aynbinder
Coding
16 Oct, 2024

Many code examples + useful tips. Bonus added to the end of the post.

Regular expressions are an extremely useful tool, and like many developers, I use them a lot when working with text. Since I always forget the regular expression syntax, I thought a simple cheat sheet might help me, and maybe others as well :)

The most useful methods of Python’s re module and the differences between them

Reminder: the re module implements regular expression functionality in Python. Import it with the statement import re before using any of the functions below.

re.match matches a regex pattern at the beginning of a string (returns: a re.Match object, or None if there is no match)

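A minimal sketch of re.match; the sample string is only illustrative:

```python
import re

text = "Python 3.12 is here"

# re.match only tries the pattern at position 0 of the string
m = re.match(r"Python", text)
print(m)          # <re.Match object; span=(0, 6), match='Python'>
print(m.group())  # Python

# "3.12" is in the string, but not at the beginning, so there is no match
print(re.match(r"\d+\.\d+", text))  # None
```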

re.fullmatch matches a regex pattern against the whole string (returns: a re.Match object, or None if there is no match)

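A small comparison of re.fullmatch and re.match on the same input (the phone-number pattern is just an example):

```python
import re

phone = r"\d{3}-\d{4}"

# fullmatch succeeds only if the pattern covers the entire string
print(re.fullmatch(phone, "555-1234"))      # <re.Match object; span=(0, 8), match='555-1234'>
print(re.fullmatch(phone, "555-1234 ext"))  # None: the trailing " ext" is not covered

# re.match accepts the same input, because it only anchors at the start
print(re.match(phone, "555-1234 ext"))      # <re.Match object; span=(0, 8), match='555-1234'>
```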

re.search scans a string and returns the first location where a regex pattern matches, anywhere in the string (returns: a re.Match object, or None if there is no match)

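A minimal sketch showing that re.search finds the first match anywhere, where re.match would fail (sample text is illustrative):

```python
import re

text = "Order #123 shipped, order #456 pending"

# search scans the whole string and returns only the FIRST match
m = re.search(r"#\d+", text)
print(m.group())  # #123
print(m.span())   # (6, 10)

# re.match fails here, because the string does not start with "#"
print(re.match(r"#\d+", text))  # None
```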

re.sub replaces every occurrence of a regex pattern in a string; like re.search, it looks for the pattern anywhere in the string, not only at the beginning (returns: a new string)

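A few typical re.sub calls; the strings are made up for illustration:

```python
import re

text = "call 555-1234 or 555-9876"

# every non-overlapping match is replaced by default
print(re.sub(r"\d", "X", text))  # call XXX-XXXX or XXX-XXXX

# count limits how many replacements are made
print(re.sub(r"\d{3}-\d{4}", "<phone>", text, count=1))  # call <phone> or 555-9876

# collapse runs of whitespace into a single space
print(re.sub(r"\s+", " ", "too    many   spaces"))  # too many spaces
```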

re.findall returns all non-overlapping occurrences of a regex pattern in a string, looking for it anywhere like re.search does (returns: a list of strings; if the pattern contains groups, the groups are returned instead of the whole match)

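A short sketch of re.findall, including how groups change the shape of the result (sample strings are illustrative):

```python
import re

text = "Prices: $5, $12 and $100"

# all non-overlapping matches, as a list of strings
print(re.findall(r"\d+", text))   # ['5', '12', '100']

# no match gives an empty list, not None
print(re.findall(r"#\d+", text))  # []

# with several groups, each item is a tuple of the groups' texts
pairs = "a=1, b=22, c=333"
print(re.findall(r"(\w+)=(\d+)", pairs))  # [('a', '1'), ('b', '22'), ('c', '333')]
```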

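re.split splits a string by the occurrences of a regex pattern (returns: list of strings). ❗️Note the empty strings in the result: they appear when the pattern matches at the very start or end of the string, or when two matches are adjacent❗️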

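A minimal sketch of where those empty strings come from, plus one common way to drop them (the inputs are illustrative):

```python
import re

# split on a comma or semicolon followed by optional spaces
print(re.split(r"[,;]\s*", "a, b;c,  d"))  # ['a', 'b', 'c', 'd']

# a separator at the start or end produces empty strings
print(re.split(r",", ",a,b,"))  # ['', 'a', 'b', '']

# two separators in a row produce an empty string between them
print(re.split(r",", "a,,b"))   # ['a', '', 'b']

# filter out the empty strings if you don't need them
print([part for part in re.split(r",", ",a,,b,") if part])  # ['a', 'b']
```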

Special meta-characters

[Table: special meta-characters and their meanings, followed by re.match() examples]
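A few of the most common special meta-characters in action. This selection is illustrative and may not match the table row for row:

```python
import re

# .  matches any single character except a newline
print(bool(re.match(r"c.t", "cat")))    # True
print(bool(re.match(r"c.t", "c\nt")))   # False

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r"^cat$", "cat")))    # True
print(bool(re.search(r"^cat$", "a cat")))  # False

# [...] matches one character from the set; [^...] one character NOT in it
print(re.findall(r"[aeiou]", "regex"))   # ['e', 'e']
print(re.findall(r"[^aeiou]", "regex"))  # ['r', 'g', 'x']

# | means "either the left or the right alternative"
print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# \ escapes a meta-character so it is matched literally
print(re.findall(r"\.", "a.b.c"))  # ['.', '.']
```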

Repetition meta-characters

The table below summarizes how many repetitions of the preceding regular expression each (greedy) repetition meta-character matches. Greedy means that the re module matches as many repetitions as possible.

[Table: repetition meta-characters (quantifiers) and how many repetitions each one matches, followed by re.match() examples]
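A sketch of Python's standard quantifiers, checked with re.fullmatch. The last two lines go slightly beyond the table: they show what "greedy" means by contrasting it with the non-greedy form (a ? added after the quantifier):

```python
import re

# *     : zero or more repetitions
print(re.fullmatch(r"ab*", "a") is not None)          # True  (zero b's is fine)
# +     : one or more
print(re.fullmatch(r"ab+", "a") is not None)          # False (needs at least one b)
# ?     : zero or one
print(re.fullmatch(r"colou?r", "color") is not None)  # True
# {m}   : exactly m;  {m,n}: from m to n
print(re.fullmatch(r"\d{4}", "2024") is not None)     # True
print(re.fullmatch(r"\d{2,3}", "2024") is not None)   # False (4 digits > 3)

# greedy .* takes as much as possible; non-greedy .*? as little as possible
html = "<b>bold</b>"
print(re.match(r"<.*>", html).group())   # <b>bold</b>
print(re.match(r"<.*?>", html).group())  # <b>
```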

Useful character classes

[Table: character classes and their equivalent character sets: decimal digits, non-digits, whitespace, non-whitespace, alphanumeric, and non-alphanumeric characters]
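The shorthand classes \d, \D, \s, \S, \w and \W applied to one sample string (the string is illustrative). The last two lines point out that for str patterns these classes are Unicode-aware unless the re.ASCII flag is set:

```python
import re

text = "Room 42\tfloor_3!"

print(re.findall(r"\d", text))   # ['4', '2', '3']             digits
print(re.findall(r"\D+", text))  # ['Room ', '\tfloor_', '!']  runs of non-digits
print(re.findall(r"\s", text))   # [' ', '\t']                 whitespace
print(re.findall(r"\S+", text))  # ['Room', '42', 'floor_3!']  runs of non-whitespace
print(re.findall(r"\w+", text))  # ['Room', '42', 'floor_3']   letters, digits and _
print(re.findall(r"\W", text))   # [' ', '\t', '!']            everything except \w

# \d also matches non-ASCII decimal digits unless re.ASCII restricts it to [0-9]
print(re.findall(r"\d", "٣"))                  # ['٣']  (Arabic-Indic digit three)
print(re.findall(r"\d", "٣", flags=re.ASCII))  # []
```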

Extremely useful tip: grouping and group extraction

Grouping allows you not only to match text sequences inside strings but also to extract sub-sequences according to groups you define in a pattern.

You can define groups in a pattern with round brackets, (), and you can extract the groups from a match by calling the group() method on the re.Match object.

For example, let’s say we want to match an email address in a text, but we also want to easily extract the username, the domain, and the extension from it. So given the text “abc@gmail.com”, we want (1) to detect that this is an email address (according to a pattern) and (2) to detect that its username is “abc”, its domain name is “gmail”, and its extension is “com”. First, let’s define a simple pattern that detects this email. Note that I will use a simplified pattern that assumes each component of the email consists solely of alphanumeric characters. This is not true in real life, but it works for our grouping example.

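One possible version of such a simplified pattern, as a sketch. It assumes \w for each component, which means letters, digits and underscore (slightly broader than strictly alphanumeric), and uses re.fullmatch so that the whole string must be an email:

```python
import re

# Simplified email pattern (assumption: each part is made of word characters only;
# real email addresses allow many more characters than this)
pattern = r"\w+@\w+\.\w+"

print(bool(re.fullmatch(pattern, "abc@gmail.com")))  # True
print(bool(re.fullmatch(pattern, "abc@gmail")))      # False: no extension
print(bool(re.fullmatch(pattern, "a.b@gmail.com")))  # False: '.' is not a word character
```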

Now, I will add groups to the pattern. Technically, this just means putting parentheses around the different parts of the pattern: username, domain, and extension:


We are now ready to match our pattern:

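A sketch of the grouped pattern and the match, assuming the same simplified \w-based components. Adding parentheses does not change which text matches; it only marks the parts we want to extract later:

```python
import re

# The simplified pattern with parentheses around username, domain and extension
pattern = r"(\w+)@(\w+)\.(\w+)"

match = re.match(pattern, "abc@gmail.com")
print(match)          # <re.Match object; span=(0, 13), match='abc@gmail.com'>
print(match.group())  # abc@gmail.com
```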

And here comes the magic of groups! We can get not only the matched text but also each group separately, by the index it has in the pattern (groups are numbered from 1, from left to right), just like this:

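A sketch of the extraction step with the same assumed pattern; it also shows group(0) and the groups() method, which returns all numbered groups at once:

```python
import re

pattern = r"(\w+)@(\w+)\.(\w+)"
match = re.match(pattern, "abc@gmail.com")

# groups are numbered from 1, in the order of their opening parentheses
username = match.group(1)
domain = match.group(2)
extension = match.group(3)
print(username, domain, extension)  # abc gmail com

# group(0) is the whole match; groups() returns every numbered group as a tuple
print(match.group(0))     # abc@gmail.com
print(match.groups())     # ('abc', 'gmail', 'com')

# passing several indices returns a tuple of those groups
print(match.group(1, 3))  # ('abc', 'com')
```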

Nice, huh?

By the way, group(0) returns the string of the whole match. In our case: “abc@gmail.com”

Bonus

Sometimes you just want to experiment with these methods quickly, so we organized them in a Google Colab notebook that is extremely simple to play with. Just don't forget to run "import re" at the beginning of the notebook first.

That’s all for now, folks. Happy coding!