Regex Cheatsheet

Anchors

Syntax	Description	Example
^	Start of string / line	^Hello
$	End of string / line	world$
\b	Word boundary	\bword\b
\B	Not a word boundary	\Bword\B
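\A	Start of string (PCRE, Python, Java, Ruby; not JavaScript)	\AStart
\Z	End of string, or before a final newline (PCRE, Java, Ruby; in Python \Z is the absolute end; not JavaScript)	end\Z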

Character Classes

Syntax	Description	Example
.	Any character except newline	a.c
\d	Digit [0-9]	\d{3}
\D	Not a digit	\D+
\w	Word character [A-Za-z0-9_]	\w+
\W	Not a word character	\W+
\s	Whitespace	a\sb
\S	Not whitespace	\S+
[abc]	Any character in set	[aeiou]
[^abc]	Any character not in set	[^0-9]
[a-z]	Any character in range	[a-zA-Z]
\n	Newline	line\n
\t	Tab	col\tcol
\r	Carriage return	text\r

Quantifiers

Syntax	Description	Example
*	0 or more (greedy)	a*
+	1 or more (greedy)	a+
?	0 or 1 (optional)	colou?r
{n}	Exactly n	\d{4}
{n,}	n or more	\d{2,}
{n,m}	Between n and m	\d{2,4}
*?	0 or more (lazy)	<.*?>
+?	1 or more (lazy)	a+?
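A minimal JavaScript sketch of the greedy/lazy difference. The HTML-like input string is only an illustration.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs to the end of the string, then backtracks to the LAST ">".
const greedy = html.match(/<.*>/)[0];   // "<b>bold</b> and <i>italic</i>"

// Lazy: .*? grows one character at a time and stops at the FIRST ">".
const lazy = html.match(/<.*?>/)[0];    // "<b>"

// With the g flag, the lazy pattern returns every tag separately.
const allTags = html.match(/<.*?>/g);   // ["<b>", "</b>", "<i>", "</i>"]
```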

Groups & Lookaround

Syntax	Description	Example
(abc)	Capturing group	(\d+)
(?:abc)	Non-capturing group	(?:http|https)
(?=abc)	Positive lookahead	\d(?=\.)
(?!abc)	Negative lookahead	\w(?!\d)
(?<=abc)	Positive lookbehind	(?<=\$)\d+
(?<!abc)	Negative lookbehind	(?<!@)\w+
|	Alternation (OR)	cat|dog
\1	Backreference to group 1	(\w)\1
(?<name>abc)	Named group	(?<year>\d{4})
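A short JavaScript sketch that exercises several rows of the table on made-up sample strings. Named groups and lookbehind need an ES2018-capable engine, such as a current Node.js.

```javascript
const text = "Order 2024-01-15: total $42, tax $3";

// Named groups: read results by name instead of by index.
const date = text.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
const { year, month, day } = date.groups;     // "2024", "01", "15"

// Lookbehind: digits preceded by "$", without "$" in the match itself.
const amounts = text.match(/(?<=\$)\d+/g);     // ["42", "3"]

// Backreference: \1 must repeat whatever group 1 captured.
const doubled = "bookkeeper".match(/(\w)\1/g); // ["oo", "kk", "ee"]

// Alternation inside a non-capturing group: only the host is captured (group 1).
const url = "https://example.com".match(/^(?:http|https):\/\/([^\/]+)/);
const host = url[1];                           // "example.com"
```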

Flags

Syntax	Description	Example
g	Global match (all matches)	/pattern/g
i	Case insensitive	/abc/i
m	Multiline (^ and $ also match at line breaks)	/^line/m
s	Dotall (. matches newline)	/a.b/s
u	Unicode support	/\p{L}/u
y	Sticky (match only at lastIndex)	/abc/y
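A small JavaScript sketch of how the m, s, g and y flags change matching. The strings are illustrative.

```javascript
const log = "first line\nsecond line\nthird";

// Without m, ^ only matches at the very start of the string.
const noM = log.match(/^\w+/g);          // ["first"]
// With m, ^ also matches right after every newline.
const withM = log.match(/^\w+/gm);       // ["first", "second", "third"]

// Without s, "." does not cross a newline; with s it does.
const noS = /line.second/.test(log);     // false
const withS = /line.second/s.test(log);  // true

// g vs y: exec() starts at lastIndex (0 here). g scans ahead for a match;
// y requires the match to start exactly at lastIndex.
const gHit = /\d/g.exec("a1");           // match "1" at index 1
const yHit = /\d/y.exec("a1");           // null: position 0 is "a"
```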

Common Patterns

Syntax	Description	Example
\d{4}-\d{2}-\d{2}	Date YYYY-MM-DD	2024-01-15
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}	Email address	user@example.com
https?://[^\s]+	URL	https://example.com
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b	IPv4 address (shape only; does not check the 0-255 range)	192.168.1.1
[A-Fa-f0-9]{64}	SHA-256 hash	a3f2b1...
\b(?:[A-Za-z0-9._%+-]+@)?[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b	Domain or email	example.com
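When you use these patterns for validation, add anchors. Without them, a pattern also matches a fragment inside a longer string. The JavaScript sketch below uses a hypothetical helper, fullMatch, that wraps a pattern in ^(?:...)$. The non-capturing group keeps any | alternation inside the pattern from slipping outside the anchors.

```javascript
// Hypothetical helper: the pattern must match the WHOLE input.
// The (?:...) wrapper keeps any | inside the pattern under both anchors.
function fullMatch(pattern, input) {
  return new RegExp(`^(?:${pattern})$`).test(input);
}

const DATE = String.raw`\d{4}-\d{2}-\d{2}`;
const IPV4 = String.raw`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`;

const foundInside = new RegExp(DATE).test("xx2024-01-15yy"); // true: substring hit
const wholeJunk = fullMatch(DATE, "xx2024-01-15yy");         // false
const wholeDate = fullMatch(DATE, "2024-01-15");             // true

// Shape checks only: neither pattern validates value ranges.
const badDate = fullMatch(DATE, "2024-13-45");               // true
const badIp = fullMatch(IPV4, "999.999.999.999");            // true
```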

Unicode Properties

Syntax	Description	Example
\p{L}	Any letter	\p{L}+
\p{N}	Any number	\p{N}+
\p{P}	Punctuation	\p{P}
\p{Script=Latin}	Latin script	\p{Script=Latin}+
\p{Emoji}	Emoji character	\p{Emoji}
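A JavaScript sketch contrasting the ASCII-only \w with \p{L} and \p{N}. The sample words are illustrative and use \u escapes so the accented letters are unambiguous.

```javascript
const sample = "caf\u00e9 na\u00efve 123"; // "café naïve 123"

// \w is ASCII-only in JavaScript, so accented letters split words apart.
const asciiWords = sample.match(/\w+/g);   // ["caf", "na", "ve", "123"]

// \p{L} matches any Unicode letter and \p{N} any number; both need the u flag.
const letters = sample.match(/\p{L}+/gu);  // ["café", "naïve"]
const numbers = sample.match(/\p{N}+/gu);  // ["123"]
```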

Escaped Characters

Syntax	Description	Example
\.	Literal dot	example\.com
\\	Literal backslash	C:\\path
\*	Literal asterisk	\*required
\+	Literal plus	\+1
\?	Literal question mark	who\?
\( \)	Literal parentheses	\(text\)
\[ \]	Literal brackets	\[tag\]
\{ \}	Literal braces	\{json\}

Technical Details


Features

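The Regex Cheatsheet is a categorized reference to regular expression syntax, with a short example for each entry. It covers character classes, quantifiers (*, +, ?, {n}), anchors (^, $), capturing groups, lookahead and lookbehind, and flags (g, i, m, s, u, y). It also covers Unicode property escapes (useful, for example, for matching Chinese characters) and common practical patterns such as email addresses, URLs and IP addresses.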

Common Developer Use Cases

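Regular expressions are a core tool for text processing and form validation:
- Front-end form validation (email format, phone number format, password complexity) relies on them.
- In back-end log analysis, regexes extract IP addresses, timestamps, URLs and status codes from Apache/Nginx logs.
- Data-processing scripts use them to pull information out of unstructured text that is not in CSV or JSON.
- Find-and-replace in code editors and IDEs becomes far more powerful with regex support.
- Path matching in routing frameworks such as Express and Koa is built on regular expressions.
- Web security tooling uses regex pattern matching to detect XSS and SQL injection attempts.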
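For the log-analysis use case, here is a sketch that parses one access-log line with named groups. It assumes the default Nginx/Apache "combined" log layout. LOG_LINE and parseLogLine are hypothetical names. Logs with custom formats need a different pattern.

```javascript
// Assumed layout: the default "combined" access-log format used by Nginx and Apache:
//   ip ident user [time] "request" status bytes "referer" "agent"
const LOG_LINE =
  /^(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>[A-Z]+) (?<url>\S+) [^"]*" (?<status>\d{3}) (?<bytes>\d+|-)/;

// Hypothetical helper: returns the parsed fields, or null when the line
// does not follow the assumed layout.
function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  if (!m) return null;
  const { ip, time, method, url, status, bytes } = m.groups;
  return {
    ip,
    time,
    method,
    url,
    status: Number(status),
    bytes: bytes === "-" ? 0 : Number(bytes), // Apache logs "-" for zero bytes
  };
}

const sample =
  '203.0.113.7 - - [15/Jan/2024:10:12:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0"';
console.log(parseLogLine(sample));
```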

Use the Regex Tester to try your expressions in real time. The Regular Expression Tester adds features such as backtracking and performance analysis.

How Regex Engines Work

Understanding how regex engines work helps you write patterns that are both efficient and safe:

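DFA (deterministic finite automaton): scans the input left to right in linear O(n) time. It supports neither backtracking nor capture groups. Tools such as grep and awk use this approach. It is fast but offers fewer features.
NFA (nondeterministic finite automaton), as implemented by backtracking engines: backtracking enables the more expressive features (capture groups, lookahead/lookbehind, backreferences). JavaScript, Python and Java use this approach. The cost is exposure to catastrophic backtracking.
Backtracking trap: when a pattern like (a+)+b fails to match, matching time can grow exponentially with the input length. Attackers exploit this in ReDoS (regular expression denial of service) attacks. Never nest an unbounded repetition inside another repetition.
Atomic groups and possessive quantifiers: (?>...) and possessive quantifiers such as ++ and *+ forbid backtracking into text they have already matched. This improves performance and helps prevent ReDoS. Support varies by engine: PCRE and Java have both, Python's re module has had them since 3.11, and JavaScript has neither.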
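The JavaScript sketch below first times the nested pattern (a+)+b against an equivalent flat pattern on an input that cannot match. Timings depend on the engine and machine, so none are quoted. The input is kept short so that the vulnerable case still finishes.

It then shows a common workaround for JavaScript's missing atomic groups. Lookaheads in JavaScript are atomic: the engine never backtracks into them. So capturing inside a lookahead and then consuming the capture with a backreference, (?=(X))\1, behaves like (?>X).

```javascript
// 1. The trap: a run of n "a"s can be split among the (a+) iterations in
//    about 2^n ways, and a failing match tries every split before giving up.
const vulnerable = /^(a+)+b$/;
// Accepts exactly the same strings without nesting, so failure is cheap.
const safe = /^a+b$/;

function timeMatch(re, input) {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - start };
}

// Kept short on purpose: each extra "a" roughly doubles the nested pattern's work.
const failing = "a".repeat(20) + "c";
const slow = timeMatch(vulnerable, failing);
const fast = timeMatch(safe, failing);
console.log("nested:", slow, "flat:", fast);

// 2. Emulating an atomic group: (?=(a+))\1 acts like (?>a+).
const normal = /^a+a$/;          // a+ may give back one "a" to the final a
const atomic = /^(?=(a+))\1a$/;  // the captured run is never given back
const normalHit = normal.test("aaa"); // true
const atomicHit = atomic.test("aaa"); // false

// Applied to the trap: each iteration swallows a whole run of "a"s, so there
// is nothing left to re-split when the overall match fails.
const hardened = /^(?:(?=(a+))\1)+b$/;
const hardenedOk = hardened.test("aaab");                 // true
const hardenedFail = hardened.test("a".repeat(50) + "c"); // false, and fast
```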

Common Pitfalls and Caveats

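Don't parse HTML with regex: HTML is not a regular language, so regex-based HTML/XML parsing fails on nested tags. Use a DOM parser instead, such as DOMParser or cheerio.
ReDoS attacks: running user-supplied regular expressions on the server is extremely risky, because a malicious pattern can drive the CPU to 100% within milliseconds.
Unicode support: ES2015 introduced the u flag for full Unicode matching, and ES2018 added property escapes such as \p{Script=Han}, which matches Chinese characters. Older browsers support neither. Because \w covers only ASCII, "standard" email and URL regexes need explicit Unicode handling if they must accept non-ASCII input.
Escaping: when building a regex dynamically, escape every special character in the inserted text. Python (re.escape) and Java (Pattern.quote) have built-in helpers. JavaScript only gained RegExp.escape in ES2025, so code that targets older runtimes must implement one by hand.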
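The JavaScript sketch below illustrates three of these pitfalls:
- a lazy regex mishandling nested tags;
- what the u flag changes;
- a hand-written escape helper for runtimes that lack a built-in one, using the character class that MDN documents.

escapeRegExp and countLiteral are hypothetical names, and the sample strings are made up.

```javascript
// 1. Nesting defeats regex: a lazy match stops at the FIRST closing tag,
//    which here belongs to the inner div, so the outer element is cut short.
const nested = "<div>outer <div>inner</div> tail</div>";
const htmlMatch = nested.match(/<div>.*?<\/div>/)[0]; // "<div>outer <div>inner</div>"

// 2. Unicode: \p{...} needs the u flag (ES2018+ engines).
const cjk = "Hello 世界, 你好".match(/\p{Script=Han}+/gu); // ["世界", "你好"]
// Without u, \p is not a property escape: /\p{L}/ means the literal text "p{L}".
const letterNoU = /\p{L}/.test("\u00e9"); // false
const letterU = /\p{L}/u.test("\u00e9");  // true
// Without u, "." matches one UTF-16 code unit, so an emoji (two units) fails.
const dotNoU = /^.$/.test("\u{1F600}");   // false
const dotU = /^.$/u.test("\u{1F600}");    // true

// 3. Escaping: backslash-escape every character that is special in a JS
//    pattern. "-" is left alone because it is only special inside [...],
//    so don't splice the result into a character class.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& = the matched character
}

// Hypothetical use: count exact occurrences of a user-supplied search term.
function countLiteral(haystack, needle) {
  const re = new RegExp(escapeRegExp(needle), "g");
  return (haystack.match(re) || []).length;
}
const plusCount = countLiteral("1+1=2, 1+1=2", "1+1"); // 2
const dotCount = countLiteral("a.b axb", "a.b");       // 1 (unescaped, "axb" would match too)
```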

When to Use This Tool Instead of Code

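Use this cheatsheet when writing form validation or log-parsing scripts, or while learning regex syntax. For actual matching and testing, use the text/regex and testers/regex-tester tools for interactive debugging.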