regex¶
vim¶
lua¶
shell¶
regex¶
本文由 简悦 SimpRead 转码, 原文地址 quickref.me
A quick reference for regular expressions (regex), including symbols, ranges, grouping, assertions an......
#re.search()¶
>>> sentence = 'This is a sample string'
>>> bool(re.search(r'this', sentence, flags=re.I))
True
>>> bool(re.search(r'xyz', sentence))
False
#re.findall()¶
>>> re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')
['par', 'spar', 'spare', 'pare']
>>> re.findall(r'\b0*[1-9]\d{2,}\b', '0501 035 154 12 26 98234')
['0501', '154', '98234']
#re.finditer()¶
>>> m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
>>> [m[0] for m in m_iter if int(m[0]) < 350]
['45', '349', '4', '204']
#re.split()¶
>>> re.split(r'\d+', 'Sample123string42with777numbers')
['Sample', 'string', 'with', 'numbers']
#re.sub()¶
>>> ip_lines = "catapults\nconcatenate\ncat"
>>> print(re.sub(r'^', r'* ', ip_lines, flags=re.M))
* catapults
* concatenate
* cat
#re.compile()¶
>>> pet = re.compile(r'dog')
>>> type(pet)
<class '_sre.SRE_Pattern'>
>>> bool(pet.search('They bought a dog'))
True
>>> bool(pet.search('A cat crossed their path'))
False
本文由 简悦 SimpRead 转码, 原文地址 www.hackingnote.com
HackingNote
Syntax¶
|
: or()
: group
Characters¶
.
: any character (dot matches everything except newlines)\w
: alphanumeric character plus_
, equivalent to[A-Za-z0-9_]
\W
: non-alphanumeric character excluding_
, equivalent to[^A-Za-z0-9_]
\s
: whitespace\S
: anything BUT whitespace\d
: digit, equivalent to[0-9]
\D
: non-digit, equivalent to[^0-9]
[...]
: one of the characters[^...]
: anything but the characters listed
Anchors¶
^
: beginning of a line or string$
: end of a line or string\b
: zero-width word-boundary (like the caret and the dollar sign)\A
: Matches the beginning of a string (but not an internal line).\z
: Matches the end of a string (but not an internal line).
Repetition Operators¶
?
: match 0 or 1 times+
: match at least once*
: match 0 or multiple times{M,N}
: minimum M matches and maximum N matches{M,}
: match at least M times{0,N}
: match at most N times
Greedy vs Lazy¶
.*
: match as long as possible.*?
: match as short as possible
BRE vs ERE vs PCRE¶
The only difference between basic and extended regular expressions is in the behavior of a few characters: ?
, +
, parentheses (()
), and braces ({}
).
- basic regular expressions (BRE): should be escaped to behave as special characters
- extended regular expressions (ERE) : should be escaped to match a literal character.
- Perl Compatible Regular Expressions (PCRE): much more powerful and flexible than BRE and ERE.
Multiple flavors may be supported by the tools:
- sed
sed
: basicsed -E
: extended
- grep
grep
: basicegrep
orgrep -E
JavaScript¶
str.search
str.match
str.matchAll
str.replace
Example: split country name and country code in strings like "China (CN)"
> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'
Match all:
const regex = /.*/g;
const matches = content.matchAll(regex);
for (let match of matches) {
}
Literal vs. Constructor¶
- Literal:
re = /.../g
- Constructor:
re = new RegExp("...")
- can use string concat:
re = new RegExp("..." + some_variable + "...")
- can use string concat:
Local vs. Global¶
re = /.../
:re.match(str)
will return a list of captures of the FIRST match.re = /.../g
:re.match(str)
will return a list of matches but NOT captures.
match vs. exec¶
str.match()
: as stated above.regex.exec()
: return captures, more detailed info; exec multiple times.
Example
var match;
while ((match = re.exec(str)) !== null) {}
Python¶
match
, search
and findall
:
re.match()
: only match at the beginning of the string, returns amatch
object.re.search()
: locate a match anywhere in string, returns amatch
object.re.findall()
: find all occurrences, returns a list of strings.
>>> type(re.search("foo", "foobarfoo"))
<class '_sre.SRE_Match'>
>>> type(re.match("foo", "foobarfoo"))
<class '_sre.SRE_Match'>
re.match()/re.search()¶
re.match()
and re.search()
return a match
object:
>>> match = re.search("f(.*?),", "foo,faa,fuu,bar")
>>> match.groups()
('oo',)
match.group(0)
returns the string snippet that matches the pattern:
>>> match.group(0)
'foo,'
other group
captures the ones in ()
:
>>> match.group(1)
'oo'
re.findall()¶
re.findall()
returns a list, extract value using []
:
>>> match = re.findall("f(.*?),", "foo,faa,fuu,bar")
>>> match
['oo', 'aa', 'uu']
>>> match[0]
'oo'
Compiled Patterns¶
pattern = re.compile(pattern_string)
result = pattern.match(string)
is equivalent to
result = re.match(pattern_string, string)
re.compile()
returns a SRE_Pattern
object:
>>> type(re.compile("pattern"))
<class '_sre.SRE_Pattern'>