What is Regex?
A regular expression (often shortened to "regex" or "regexp") is a sequence of characters that defines a search pattern. Think of it as a mini programming language dedicated entirely to finding, matching, and manipulating text. Regular expressions are supported in virtually every modern programming language, text editor, and command-line tool.
Regex is used for a wide range of tasks: validating user input (such as email addresses and phone numbers), searching through log files, extracting data from structured text, and performing find-and-replace operations that would be tedious or impractical with plain string matching. Once you understand the fundamentals, regex becomes one of the most powerful tools in your developer toolkit.
The learning curve can feel steep at first because regex patterns look cryptic. A pattern like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ might seem intimidating, but by the end of this guide, you will be able to read and write patterns like this with confidence.
Basic Syntax
At the most basic level, a regex pattern is just a sequence of literal characters. The pattern hello matches the exact string "hello" wherever it appears. However, regex becomes powerful when you introduce metacharacters — special symbols that have meaning beyond their literal value.
- . (dot): Matches any single character except a newline. The pattern c.t matches "cat", "cot", "cut", and even "c9t".
- * (asterisk): Matches the preceding element zero or more times. The pattern ab*c matches "ac", "abc", "abbc", "abbbc", and so on.
- + (plus): Matches the preceding element one or more times. Unlike *, the element must appear at least once. The pattern ab+c matches "abc" and "abbc" but not "ac".
- ? (question mark): Matches the preceding element zero or one time, making it optional. The pattern colou?r matches both "color" and "colour".
- {n,m} (quantifier): Matches the preceding element between n and m times. For example, a{2,4} matches "aa", "aaa", or "aaaa". You can also use {n} for exactly n times, or {n,} for n or more times.
To match a metacharacter literally, escape it with a backslash. For instance, \. matches an actual period, and \* matches an actual asterisk.
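The metacharacters above can be tried out directly. The following is a minimal sketch using Python's built-in re module; the choice of Python, the helper name matches, and the sample strings are assumptions made for illustration. It relies on re.fullmatch, which succeeds only when the whole string fits the pattern.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Dot: any single character except a newline.
print(matches(r"c.t", "cat"), matches(r"c.t", "c9t"), matches(r"c.t", "ct"))
# True True False

# Star: zero or more of the preceding element.
print(matches(r"ab*c", "ac"), matches(r"ab*c", "abbbc"))  # True True

# Plus: one or more, so "ac" no longer qualifies.
print(matches(r"ab+c", "abc"), matches(r"ab+c", "ac"))  # True False

# Question mark: the "u" is optional.
print(matches(r"colou?r", "color"), matches(r"colou?r", "colour"))  # True True

# {n,m}: between two and four repetitions.
print(matches(r"a{2,4}", "a"), matches(r"a{2,4}", "aaa"), matches(r"a{2,4}", "aaaaa"))
# False True False

# An escaped dot matches only a literal period.
print(matches(r"3\.14", "3.14"), matches(r"3\.14", "3x14"))  # True False
```

The patterns are written as raw strings (r"...") so that Python passes backslashes through to the regex engine unchanged.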
Character Classes
Character classes let you match one character from a defined set. They are enclosed in square brackets and are one of the most frequently used regex features.
- [a-z]: Matches any lowercase letter from a to z.
- [A-Z]: Matches any uppercase letter from A to Z.
- [0-9]: Matches any digit from 0 to 9.
- [aeiou]: Matches any vowel (you can list specific characters).
- [^a-z]: The caret inside brackets negates the class, matching any character that is not a lowercase letter.
Regex also provides convenient shorthand character classes that save you from writing out full bracket expressions:
- \d: Matches any digit. Equivalent to [0-9] for ASCII text; some Unicode-aware engines (Python 3, for example) also match digits from other scripts.
- \w: Matches any "word" character (letters, digits, and underscore). Equivalent to [a-zA-Z0-9_] for ASCII text.
- \s: Matches any whitespace character (spaces, tabs, newlines).
- \D, \W, \S: The uppercase versions negate the class. \D matches any non-digit, \W matches any non-word character, and \S matches any non-whitespace character.
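As a rough illustration of bracket expressions and shorthand classes, the sketch below runs a few of them over one made-up sentence with Python's re.findall, which returns every non-overlapping match. The sample text and variable names are assumptions, not part of any real dataset.

```python
import re

# Made-up sample text for illustration only.
text = "Order #A-42 shipped on 2024-03-15 to Zoe!"

# Bracket expression: every uppercase ASCII letter.
uppercase = re.findall(r"[A-Z]", text)
print(uppercase)  # ['O', 'A', 'Z']

# Shorthand \d with + picks out runs of digits.
digit_runs = re.findall(r"\d+", text)
print(digit_runs)  # ['42', '2024', '03', '15']

# Shorthand \w with + picks out runs of word characters.
words = re.findall(r"\w+", text)
print(words)
# ['Order', 'A', '42', 'shipped', 'on', '2024', '03', '15', 'to', 'Zoe']

# Negated class: anything that is neither a word character nor whitespace.
punctuation = re.findall(r"[^\w\s]", text)
print(punctuation)  # ['#', '-', '-', '-', '!']
```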
Anchors
Anchors do not match characters — they match positions within a string. The two most important anchors are:
- ^ (caret): Matches the start of a string. When the multiline flag is enabled, it matches the start of each line.
- $ (dollar): Matches the end of a string. With the multiline flag, it matches the end of each line.
Anchors are critical for validation. Without them, a pattern might match a substring rather than the entire input. For example, the pattern \d+ would match "42" inside "abc42xyz". But ^\d+$ only matches if the entire string consists of digits, which is what you want when validating numeric input.
Another useful anchor is \b, the word boundary. It matches the position between a word character and a non-word character. The pattern \bcat\b matches the word "cat" but not "category" or "concatenate".
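The anchor behaviour described above can be checked with a short Python sketch. The sample strings are made up for illustration. One Python-specific detail is included as well: in Python, $ also matches just before a trailing newline, so re.fullmatch is the stricter option when validating input.

```python
import re

# Unanchored: finds digits anywhere in the input.
unanchored = re.search(r"\d+", "abc42xyz")
print(unanchored.group())  # 42

# Anchored: the whole string must consist of digits.
print(re.search(r"^\d+$", "abc42xyz"))  # None
print(re.search(r"^\d+$", "42") is not None)  # True

# Python detail: $ tolerates one trailing newline, re.fullmatch does not.
print(re.search(r"^\d+$", "42\n") is not None)  # True
print(re.fullmatch(r"\d+", "42\n") is not None)  # False

# Word boundaries: only the standalone word is found.
sentence = "The cat sat; category and concatenate do not count."
cat_words = re.findall(r"\bcat\b", sentence)
print(cat_words)  # ['cat']

# Multiline flag: ^ matches at the start of every line.
lines = "alpha\nbeta\ngamma"
first_only = re.findall(r"^\w+", lines)
each_line = re.findall(r"^\w+", lines, re.MULTILINE)
print(first_only)  # ['alpha']
print(each_line)  # ['alpha', 'beta', 'gamma']
```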
Groups and Capture Groups
Parentheses () serve two purposes in regex: they group parts of a pattern together, and they capture the matched text so you can reference it later.
Basic capture group: The pattern (\d{3})-(\d{4}) matches a string like "555-1234" and captures "555" as group 1 and "1234" as group 2. You can reference these groups in replacements or in your code.
Non-capturing group: If you need grouping for structure but do not need to capture the result, use (?:...). For example, (?:https?|ftp):// groups the protocol options without creating a capture group, which is slightly more efficient.
Named capture groups: For readability, you can assign names to groups using the syntax (?<name>...); Python spells this (?P<name>...). A date pattern like (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) lets you reference matches by name (e.g., match.groups.year in JavaScript) instead of numeric indices.
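Here is a small Python sketch of the three kinds of groups. Note that Python's re module writes named groups as (?P<name>...), so the date pattern is adapted accordingly; the sample inputs, including the example.org address, are placeholders chosen for illustration.

```python
import re

# Numbered capture groups.
phone = re.fullmatch(r"(\d{3})-(\d{4})", "555-1234")
print(phone.group(1), phone.group(2))  # 555 1234

# Groups can be referenced in a replacement string as \1, \2, ...
swapped = re.sub(r"(\d{3})-(\d{4})", r"\2-\1", "555-1234")
print(swapped)  # 1234-555

# Non-capturing group: the protocol alternatives are grouped but not stored,
# so the only captured group is the part after "://".
link = re.match(r"(?:https?|ftp)://(\S+)", "ftp://example.org/file")
print(link.groups())  # ('example.org/file',)

# Named groups, using Python's (?P<name>...) spelling.
parsed = re.fullmatch(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2024-03-15"
)
print(parsed.group("year"))  # 2024
print(parsed.groupdict())  # {'year': '2024', 'month': '03', 'day': '15'}
```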
Flags
Flags (also called modifiers) change how the regex engine interprets your pattern. They are typically appended after the closing delimiter of the pattern. The most commonly used flags are:
- g (global): Find all matches in the string, not just the first one. Without this flag, the engine stops after the first match.
- i (case-insensitive): Makes the pattern match regardless of letter case. The pattern /hello/i matches "Hello", "HELLO", and "hElLo".
- m (multiline): Changes the behavior of ^ and $ so that they match the start and end of each line, rather than the start and end of the entire string.
- s (dotAll): Makes the . metacharacter match any character including newlines. By default, . does not match newline characters.
In JavaScript, you write flags like this: /pattern/gi. In Python, you pass them as arguments: re.findall(pattern, string, re.IGNORECASE).
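To see the flags in action, here is a minimal Python sketch with a made-up three-line string. Python has no g flag: the choice of function plays that role, since re.search returns the first match and re.findall returns all of them. The other three flags map to re.IGNORECASE, re.MULTILINE, and re.DOTALL.

```python
import re

# Made-up sample text spanning three lines.
text = "Hello world\nhello again\nHELLO once more"

# No flags: only the exact lowercase spelling matches.
case_sensitive = re.findall(r"hello", text)
print(case_sensitive)  # ['hello']

# Case-insensitive: all three spellings match.
any_case = re.findall(r"hello", text, re.IGNORECASE)
print(any_case)  # ['Hello', 'hello', 'HELLO']

# Without MULTILINE, ^ matches only at the very start of the string.
string_start = re.findall(r"^hello", text, re.IGNORECASE)
print(string_start)  # ['Hello']

# Flags combine with |. With MULTILINE, ^ matches at the start of each line.
line_starts = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)
print(line_starts)  # ['Hello', 'hello', 'HELLO']

# By default the dot stops at newlines; DOTALL lets it cross them.
cross_default = re.search(r"world.hello", text)
cross_dotall = re.search(r"world.hello", text, re.DOTALL)
print(cross_default)  # None
print(repr(cross_dotall.group()))  # 'world\nhello'
```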
Practical Examples
Let us put everything together with real-world validation patterns that you will encounter frequently.
Email validation (simplified):
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This pattern ensures the string starts with one or more valid characters, followed by an @ symbol, a domain name with at least one dot, and a top-level domain of two or more letters.
URL validation:
^https?:\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(\/\S*)?$
This matches URLs starting with http:// or https://, followed by a domain name and an optional path. The s? makes the "s" in "https" optional.
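The email and URL patterns above could be wrapped in small validation helpers along these lines. This is a sketch, not a complete validator: the function names and the example.com addresses are assumptions, and the patterns are copied unchanged from the text (the \/ escapes are harmless in Python, where / needs no escaping).

```python
import re

# Patterns copied from the text above.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
URL_RE = re.compile(r"^https?:\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(\/\S*)?$")


def is_email(text: str) -> bool:
    """Hypothetical helper: True if text fits the simplified email pattern."""
    return EMAIL_RE.fullmatch(text) is not None


def is_url(text: str) -> bool:
    """Hypothetical helper: True if text fits the http/https URL pattern."""
    return URL_RE.fullmatch(text) is not None


print(is_email("user.name+tag@example.co.uk"))  # True
print(is_email("user@localhost"))  # False: no dot in the domain
print(is_url("https://example.com/path?q=1"))  # True
print(is_url("ftp://example.com"))  # False: only http and https are allowed
print(is_url("https://example.com/has space"))  # False: \S* stops at whitespace

# "Simplified" has a cost: consecutive dots in the domain slip through.
print(is_email("user@example..com"))  # True
```

The last line shows why the email pattern is labeled simplified: it checks the general shape of an address, not whether the domain is well formed.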
Phone number (US format):
^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
This pattern matches numbers in formats like "5551234567", "555-123-4567", "(555) 123-4567", and "555.123.4567". The parentheses and separators are all made optional with ?.
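Because the phone pattern captures the three digit blocks, it can also be used to normalize input, not just accept or reject it. The sketch below assumes a hypothetical helper, normalize_phone, and an output format of XXX-XXX-XXXX chosen purely for illustration.

```python
import re
from typing import Optional

# Pattern copied from the text above.
PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")


def normalize_phone(raw: str) -> Optional[str]:
    """Return the number as XXX-XXX-XXXX, or None if it does not match."""
    match = PHONE_RE.fullmatch(raw)
    if match is None:
        return None
    area, exchange, line = match.groups()
    return f"{area}-{exchange}-{line}"


print(normalize_phone("5551234567"))  # 555-123-4567
print(normalize_phone("(555) 123-4567"))  # 555-123-4567
print(normalize_phone("555.123.4567"))  # 555-123-4567
print(normalize_phone("555-1234"))  # None

# Each parenthesis is optional on its own, so unbalanced input also passes.
print(normalize_phone("(555-123-4567"))  # 555-123-4567
```

If unbalanced parentheses matter for your use case, one option is to check for them in code after the match, in keeping with the advice on combining regex with ordinary logic.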
Date validation (YYYY-MM-DD):
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
This validates dates in ISO 8601 format. The month portion (0[1-9]|1[0-2]) accepts 01 through 12, and the day portion accepts 01 through 31. Note that this does not check whether the day is actually valid for the given month, so a string like 2023-02-31 still passes. Calendar rules such as month lengths and leap years are better checked in code than in the pattern itself.
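One way to cover the gap left by the date pattern is to let the regex check the shape and let ordinary code check the calendar. The sketch below does this with Python's datetime.date, which raises ValueError for impossible dates; the helper name is_valid_date is an assumption made for illustration.

```python
import re
from datetime import date

# Pattern copied from the text above.
DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_date(text: str) -> bool:
    """Check the YYYY-MM-DD shape with regex, then the calendar with datetime."""
    if DATE_RE.fullmatch(text) is None:
        return False
    year, month, day = (int(part) for part in text.split("-"))
    try:
        date(year, month, day)  # raises ValueError for dates that do not exist
    except ValueError:
        return False
    return True


print(DATE_RE.fullmatch("2023-02-31") is not None)  # True: the shape is fine
print(is_valid_date("2023-02-31"))  # False: February has no 31st
print(is_valid_date("2024-02-29"))  # True: 2024 is a leap year
print(is_valid_date("2023-02-29"))  # False: 2023 is not
print(is_valid_date("2024-13-01"))  # False: rejected by the regex itself
```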
Common Mistakes Beginners Make
Regex is powerful, but there are several pitfalls that trip up newcomers. Being aware of these will save you hours of debugging.
- Forgetting anchors: Without ^ and $, your pattern matches substrings. A pattern like \d{3} will match the first three digits inside any longer string, which is rarely what you intend for validation.
- Greedy vs lazy matching: By default, quantifiers like * and + are greedy: they match as much text as possible. This can cause unexpected behavior, especially when matching content between delimiters. Add ? after a quantifier to make it lazy: .*? matches as little as possible.
- Not escaping special characters: Characters like ., *, +, ?, (, ), [, ], {, }, ^, $, |, and \ all have special meaning. If you want to match them literally, you must escape them with a backslash.
- Overcomplicating patterns: It is tempting to write one massive regex that handles every edge case. In practice, breaking complex validation into multiple simpler patterns or combining regex with code logic is more readable and maintainable.
- Ignoring performance: Poorly written patterns with nested quantifiers (e.g., (a+)+) can cause catastrophic backtracking, where the engine takes exponentially longer as input size grows. Always test your regex with both valid and adversarial inputs.
- Platform differences: Regex syntax varies slightly between languages. Features like lookbehinds, named groups, and Unicode property escapes are not universally supported. Always check your target language's documentation.
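Two of these pitfalls, greedy matching and unescaped special characters, are easy to reproduce. The following Python sketch uses made-up sample strings; re.escape is the standard-library helper that backslash-escapes special characters in a literal string.

```python
import re

# Greedy vs lazy: matching content between angle brackets.
html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Escaping: searching for a literal string that contains metacharacters.
receipt = "Total: $4.99 (approx.)"
literal = "$4.99"

# Unescaped, $ is an end-of-string anchor, so the search finds nothing.
unescaped = re.search(literal, receipt)
print(unescaped)  # None

# re.escape backslash-escapes the special characters for us.
escaped = re.search(re.escape(literal), receipt)
print(escaped.group())  # $4.99
```

Using re.escape is especially worthwhile when the literal text comes from user input, since you cannot know in advance which special characters it contains.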
The best way to improve at regex is through practice. Start with simple patterns, test them in a live tool, and gradually build up to more complex expressions as your confidence grows.