
One Tip a Day: Make Your Regular Expressions a Hundred Times More Readable

source link: https://www.kingname.info/2022/06/20/readable-re/

2022-06-20 | Python

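Regular expressions are powerful, but once written they look like a string of emoticons. Come back a month later to an expression you wrote yourself, and even you can no longer remember what it means. Take this one, for example: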

pattern = r"((?:\(\s*)?[A-Z]*H\d+[a-z]*(?:\s*\+\s*[A-Z]*H\d+[a-z]*)*(?:\s*[\):+])?)(.*?)(?=(?:\(\s*)?[A-Z]*H\d+[a-z]*(?:\s*\+\s*[A-Z]*H\d+[a-z]*)*(?:\s*[\):+])?(?![^\w\s])|$)"

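Is there a way to make regular expressions more readable? One well-known way to make code more readable is to write comments, so can a regular expression carry comments too?

Take the following sentence, for example:

msg = 'My name is Qingnan, and my password is:123kingname456, please keep it secret.'

To extract the password 123kingname456 from it, my regular expression might look like this: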

pattern = ':(.*?),'
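
As a minimal sketch of how this pattern might be applied (the use of re.search and the variable name password are illustrative choices, not taken from the original article):

```python
import re

# Sample sentence with the same shape as the one in the article.
msg = 'My name is Qingnan, and my password is:123kingname456, please keep it secret.'
pattern = ':(.*?),'

# re.search returns the first match, or None if nothing matches.
match = re.search(pattern, msg)
password = match.group(1) if match else None
print(password)  # 123kingname456
```

The lazy quantifier `.*?` stops at the first comma after the colon, which is why the earlier comma in the sentence does not interfere.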

Could I write it like this instead?

pattern = '''
: # start marker
(.*?) # any characters, starting right after the start marker
, # stop at the first ASCII comma
'''

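Written this way it is much clearer: the role of every part is spelled out.

But used as-is, it obviously extracts nothing, because without any flag the newlines, spaces, and # comments are all treated as literal characters to match. The screenshot below shows the result:

[Image: 20220610105224.png]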
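
The failure can be reproduced with a short sketch like the one below. It assumes re.findall is used for the extraction; the sample sentence is an English rendering with the same structure as the one in the article.

```python
import re

msg = 'My name is Qingnan, and my password is:123kingname456, please keep it secret.'

pattern = '''
: # start marker
(.*?) # any characters, starting right after the start marker
, # stop at the first ASCII comma
'''

# Without a flag, the leading newline, the spaces and the "# ..." text are
# literal characters that msg would have to contain, so nothing matches.
result = re.findall(pattern, msg)
print(result)  # []
```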

But while browsing the Python regular expression documentation today, I found something useful, the re.VERBOSE flag:

[Image: 20220610105723.png]

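With it, your regular expression can carry comments: whitespace in the pattern is ignored (except inside a character class or when escaped), and everything from # to the end of the line is treated as a comment. The screenshot below shows it working:

[Image: 20220610105851.png]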
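
A minimal sketch of the same call with the flag added, again assuming re.findall and an English sample sentence of the same shape:

```python
import re

msg = 'My name is Qingnan, and my password is:123kingname456, please keep it secret.'

pattern = '''
: # start marker
(.*?) # any characters, starting right after the start marker
, # stop at the first ASCII comma
'''

# With re.VERBOSE, whitespace and "# ..." comments in the pattern are
# ignored, so the pattern behaves like the compact ':(.*?),'.
result = re.findall(pattern, msg, re.VERBOSE)
print(result)  # ['123kingname456']
```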

re.VERBOSE can also be abbreviated as re.X, as shown below:

[Image: 20220610105935.png]
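
The sketch below checks the alias and also illustrates one thing to watch for in verbose mode: a literal space has to be written explicitly, for example as `[ ]`, because bare whitespace is ignored. The longer pattern and the name password_re are hypothetical and only serve the illustration.

```python
import re

# re.X is the short spelling of re.VERBOSE.
print(re.X == re.VERBOSE)  # True

msg = 'My name is Qingnan, and my password is:123kingname456, please keep it secret.'

# Hypothetical pattern that also matches the words before the colon.
password_re = re.compile(r'''
    password [ ] is   # literal text; in verbose mode a space is written as [ ]
    :                 # start marker
    (.*?)             # the password (lazy)
    ,                 # stop at the first ASCII comma
''', re.X)

result = password_re.findall(msg)
print(result)  # ['123kingname456']
```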

With comments added, the complicated regular expression from the beginning of this article becomes much more readable:

pattern = r"""
(  # code (capture)
    # BEGIN multicode

    (?: \( \s* )?  # maybe open paren and maybe space

    # code
    [A-Z]*H  # prefix
    \d+      # digits
    [a-z]*   # suffix

    (?:  # maybe followed by other codes,
        \s* \+ \s*  # ... plus-separated

        # code
        [A-Z]*H  # prefix
        \d+      # digits
        [a-z]*   # suffix
    )*

    (?: \s* [\):+] )?  # maybe space and maybe close paren or colon or plus

    # END multicode
)

( .*? )  # message (capture): everything ...

(?=  # ... up to (but excluding) ...
    # ... the next code

    # BEGIN multicode

    (?: \( \s* )?  # maybe open paren and maybe space

    # code
    [A-Z]*H  # prefix
    \d+      # digits
    [a-z]*   # suffix

    (?:  # maybe followed by other codes,
        \s* \+ \s*  # ... plus-separated

        # code
        [A-Z]*H  # prefix
        \d+      # digits
        [a-z]*   # suffix
    )*

    (?: \s* [\):+] )?  # maybe space and maybe close paren or colon or plus

    # END multicode

    # (but not when followed by punctuation)
    (?! [^\w\s] )

    # ... or the end
    | $
)
"""
谢乾坤 | Kingname
To get new articles as soon as they are published, subscribe to my WeChat official account: 未闻Code
