4.81. kink/regex/REGEX
This mod provides a regular expression engine backed by the java.util.regex package.
The syntax of regex patterns is the same as that of Java. See:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
There are three basic types.
• regex: A regular expression pattern. A regex val provides features such as matching, searching, replacing and splitting.
• match: The result of a successful match. A `match` contains groups. See kink/regex/MATCH for details.
• group: A slice of the matched text, which can be either the entire slice of the `match` or the slice of a named capturing group. See kink/regex/GROUP for details.
The `match` type is a subtype of `group`: a `match` val itself represents the entire slice of the matched area.
Example:
:REGEX.require_from('kink/regex/')
:Num_regex <- REGEX.compile('0x(?<Hex>[0-9a-f]+)|(?<Dec>[0-9]+)')
Num_regex.search_all('0xa5a5 42 0xcafe').each{(:Match)
  if(Match.have_group?('Hex')
    { stdout.print_line('Hex: {}'.format(Match.group('Hex').slice)) }
    { stdout.print_line('Dec: {}'.format(Match.group('Dec').slice)) }
  )
}
# Output:
# Hex: a5a5
# Dec: 42
# Hex: cafe
Regexes handle indices in units of runes (code points), in contrast to java.util.regex, which handles indices in units of UTF-16 code units.
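The difference between the two index units can be seen in plain Java. The following is a minimal sketch, not the mod's implementation: the class and method names are hypothetical, and it only shows how a UTF-16 index reported by java.util.regex can be converted to a code point index with String.codePointCount.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class RuneIndexDemo {
    // Returns {utf16Start, codePointStart} of the first match of regex in text,
    // or null if there is no match.
    static int[] firstMatchStart(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        int utf16Start = m.start();
        // Count the code points that precede the match.
        int codePointStart = text.codePointCount(0, utf16Start);
        return new int[] { utf16Start, codePointStart };
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it occupies two UTF-16 units.
        String text = "\uD83D\uDE00x";
        int[] starts = firstMatchStart("x", text);
        System.out.println("UTF-16 index: " + starts[0]);     // 2
        System.out.println("code point index: " + starts[1]); // 1
    }
}
```

For text containing only characters in the Basic Multilingual Plane, the two indices coincide.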
4.81.1. type regex
`regex` is a type of immutable regular expression pattern.
`regex` provides features such as matching, searching, replacing and splitting.
4.81.1.1. R.pattern
`pattern` returns the pattern str from which the regex is made.
Example:
:REGEX.require_from('kink/regex/')
:Regex <- REGEX.compile('.*')
stdout.print_line(Regex.pattern.repr) # => ".*"
4.81.1.2. R.accept?(Text)
`accept?` returns whether the entire `Text` str matches the pattern of the regex.
Precondition:
• `Text` must be a str.
Example:
:REGEX.require_from('kink/regex/')
:Verb_regex <- REGEX.compile('[a-z_][a-z0-9_?]*')
stdout.print_line(Verb_regex.accept?('white_light').repr) # => true
stdout.print_line(Verb_regex.accept?('<black_heat>').repr) # => false
4.81.1.3. R.match(Text)
`match` tries to match the regex to the entire `Text` str.
Precondition:
• `Text` must be a str.
Result:
• If the match succeeds, `match` returns a single-element vec [Match], where `Match` is a `match` val.
• If the match fails, `match` returns an empty vec [].
Example:
:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x(?<Digits>[0-9a-f]+)')
:handle <- {(:Text)
  Hex_regex.match(Text).with_just_or(
    {(:Match)
      :Entire = Match.slice
      :Digits = Match.group('Digits').slice
      stdout.print_line('hex={} digits={}'.format(Entire.repr Digits.repr))
    }
    { stdout.print_line('unmatched') }
  )
}
handle('0xa5a5') # => hex="0xa5a5" digits="a5a5"
handle('<0xff>') # => unmatched
4.81.1.4. R.search(Text Start_pos)
`search` searches for the first slice of `Text` which the regex matches. The search starts from `Start_pos`.
Preconditions:
• `Text` must be a str.
• `Start_pos` must be an int num in the range [0, Text.size].
Result:
• If the search succeeds, `search` returns a single-element vec [Match], where `Match` is a `match`.
• If the search fails, `search` returns an empty vec [].
Example:
:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x[0-9a-f]+')
:handle <- {(:Text :Start_pos)
  Hex_regex.search(Text Start_pos).with_just_or(
    {(:Match)
      stdout.print_line('from={} to={}'.format(Match.from Match.to))
    }
    { stdout.print_line('not found') }
  )
}
:Program <- '0xca 0xfe'
handle(Program 2) # => from=5 to=9
handle(Program 6) # => not found
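For readers familiar with java.util.regex, the distinction between whole-text matching (`accept?`, `match`) and searching (`search`) resembles the one between Matcher.matches and Matcher.find(int). The sketch below is plain Java with hypothetical class and method names. It assumes, without confirmation from the source, that the mod delegates to these calls. The sample text is ASCII, so UTF-16 indices and rune indices coincide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class MatchVsSearchDemo {
    static final Pattern HEX = Pattern.compile("0x[0-9a-f]+");

    // Whole-text matching: true only if the entire text matches.
    static boolean acceptsWhole(String text) {
        return HEX.matcher(text).matches();
    }

    // Searching: returns {from, to} of the first match found at or after
    // startPos, or null if nothing is found.
    static int[] searchFrom(String text, int startPos) {
        Matcher m = HEX.matcher(text);
        return m.find(startPos) ? new int[] { m.start(), m.end() } : null;
    }

    public static void main(String[] args) {
        System.out.println(acceptsWhole("0xa5a5")); // true
        System.out.println(acceptsWhole("<0xff>")); // false
        int[] found = searchFrom("0xca 0xfe", 2);
        System.out.println("from=" + found[0] + " to=" + found[1]); // from=5 to=9
        System.out.println(searchFrom("0xca 0xfe", 6) == null);     // true
    }
}
```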
4.81.1.5. R.search_all(Text)
`search_all` searches for all the slices of the `Text` which the regex matches, and returns an iter of `match` vals for the slices.
Precondition:
• `Text` must be a str.
The first search is attempted from the beginning of `Text`.
If a search from an index Ind results in a `match` for an empty slice, the next search is attempted from (Ind + 1). Otherwise, the next search is attempted from the Match.to index of the current `match`.
Example:
:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x[0-9a-f]+')
:Program <- '0xca 0xfe'
Hex_regex.search_all(Program)
  .map{(:M) M.slice }
  .each{(:Hex)
    stdout.print_line(Hex.repr)
  }
# Output:
# "0xca"
# "0xfe"
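The rule for advancing past empty matches can be observed in java.util.regex itself, where repeated calls to Matcher.find step one position forward after an empty match. The following sketch is plain Java with hypothetical names. It illustrates the analogous Java behavior and is not taken from the mod's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SearchAllDemo {
    // Collects every match as a "from-to" span by calling find() repeatedly.
    static List<String> allSpans(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // "a*" matches the empty slice wherever no 'a' is present.
        System.out.println(allSpans("a*", "baab")); // [0-0, 1-3, 3-3, 4-4]
    }
}
```

After the empty match at 0 the next search starts at 1. After the non-empty match 1-3 the next search starts at 3, and the empty match at the end of the text terminates the iteration.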
4.81.1.6. R.replace_all(Text $match_to_str)
`replace_all` replaces all the slices of the `Text` which the regex matches.
Precondition:
• `Text` must be a str.
Searching is done in the same way as `search_all`.
Each matched slice is replaced by the str returned from $match_to_str. $match_to_str must take a `match` and return a str.
Example: convert hex literals to decimal
:REGEX.require_from('kink/regex/')
:NUM.require_from('kink/')
:Hex_regex <- REGEX.compile('0x(?<Digits>[0-9a-f]+)')
:Program <- '0xca 0xfe'
:Decimal_program <- Hex_regex.replace_all(Program){(:M)
  :Digits = M.group('Digits').slice
  [:N] = NUM.parse_int(Digits 16)
  N.show
}
stdout.print_line(Decimal_program.repr)
# => "202 254"
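For comparison, the same hex-to-decimal conversion can be written against java.util.regex using Matcher.replaceAll with a function argument (available since Java 9). This is a sketch with hypothetical names, not the mod's implementation. The group is read by number because MatchResult lacks named-group access in older Java versions.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class HexToDecimalDemo {
    static final Pattern HEX = Pattern.compile("0x(?<Digits>[0-9a-f]+)");

    // Replaces every hex literal with its decimal representation.
    static String toDecimal(String program) {
        return HEX.matcher(program).replaceAll(
            // Group 1 is the named group Digits.
            mr -> Integer.toString(Integer.parseInt(mr.group(1), 16)));
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("0xca 0xfe")); // 202 254
    }
}
```

Note that in Java the str returned by the function is still scanned for `$` and `\` references. The replacement here contains only digits, so that does not matter in this example.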
4.81.1.7. R.split(Text ...[Max_field_count])
`split` splits `Text` into a vec of str vals, using the regex as the pattern of delimiters.
Preconditions:
• `Text` must be a str.
• `Max_field_count`, if specified, must be an int num greater than or equal to 1.
If `Max_field_count` is specified, splitting is performed at most `Max_field_count - 1` times, so that the number of fields is limited to `Max_field_count`.
If `Max_field_count` is not specified, splitting is performed on all the matches.
Example:
:REGEX.require_from('kink/regex/')
:Sep_regex <- REGEX.compile(' *, *')
stdout.print_line(Sep_regex.split('foo, bar, baz').repr) # => ["foo" "bar" "baz"]
stdout.print_line(Sep_regex.split('foo, bar').repr) # => ["foo" "bar"]
stdout.print_line(Sep_regex.split('foo').repr) # => ["foo"]
stdout.print_line(Sep_regex.split('').repr) # => [""]
stdout.print_line(Sep_regex.split('foo, bar, baz' 2).repr) # => ["foo" "bar, baz"]
stdout.print_line(Sep_regex.split('foo, bar' 2).repr) # => ["foo" "bar"]
stdout.print_line(Sep_regex.split('foo' 2).repr) # => ["foo"]
stdout.print_line(Sep_regex.split('' 2).repr) # => [""]
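The effect of `Max_field_count` resembles a positive `limit` argument of Pattern.split in java.util.regex, which also applies the delimiter pattern at most `limit - 1` times. The sketch below is plain Java with hypothetical names. Whether the mod delegates to Pattern.split is an assumption, and Java's handling of trailing empty fields when no limit is given may differ from the mod's.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SplitDemo {
    static final Pattern SEP = Pattern.compile(" *, *");

    // Splits text into at most maxFieldCount fields (maxFieldCount >= 1).
    static List<String> fields(String text, int maxFieldCount) {
        return Arrays.asList(SEP.split(text, maxFieldCount));
    }

    public static void main(String[] args) {
        System.out.println(fields("foo, bar, baz", 2)); // [foo, bar, baz]
        System.out.println(fields("foo, bar, baz", 3)); // [foo, bar, baz]
        System.out.println(fields("foo", 2));           // [foo]
    }
}
```

The first call prints a list of two elements, "foo" and "bar, baz". The comma inside the second element is part of the field, not a list separator.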
4.81.2. REGEX.compile(Pattern ...[$config])
`compile` makes a regex from `Pattern`.
Preconditions:
• Pattern must be a str.
• $config must be a fun which takes a conf val.
The conf val provides the following methods:
• C.on_success($success_cont): specifies $success_cont as the success cont. If `on_success` is not called, VAL.identity is used as the default success cont.
• C.on_error($error_cont): specifies $error_cont as the error cont. If `on_error` is not called, a fun which raises an exception is used as the default error cont.
Result:
• If `Pattern` can be compiled as a regex, `compile` tail-calls the success cont with the created regex val.
• If compilation hits a syntax error, `compile` tail-calls the error cont with (Error_msg, Ind), where `Error_msg` is the error message str, and `Ind` is an int num giving the index of the error location in `Pattern`.
Example with the default conts:
:REGEX.require_from('kink/regex/')
stdout.print_line(REGEX.compile('[a-z]*').repr)
# => (regex "[a-z]*")
REGEX.compile('(**)')
# Output:
# ...
# {builtin:kink-mods/kink/regex/REGEX.kn L444 C7 error_cont} -->error_cont(Msg Ind)
# {builtin:kink-mods/kink/regex/REGEX.kn L476 C5 raise} -->raise('REGEX.compile(Pattern ...[$config]): syntax error: {}: {}'.format(Msg Place_desc))
# REGEX.compile(Pattern ...[$config]): syntax error: Dangling meta character '*': (-->**)
Specifying conts:
:REGEX.require_from('kink/regex/')
:try_compile <- {(:Pattern)
  REGEX.compile(Pattern){(:C)
    C.on_success{(:Regex)
      stdout.print_line('compiled: {}'.format(Regex.repr))
    }
    C.on_error{(:Msg :Ind)
      stdout.print_line('error at index={}: {}'.format(Ind Msg))
    }
  }
}
try_compile('[a-z]*')
# => compiled: (regex "[a-z]*")
try_compile('(**)')
# => error at index=1: Dangling meta character '*'
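The message and index passed to the error cont look like the description and index carried by java.util.regex.PatternSyntaxException. The following sketch shows how those two values are obtained in plain Java. The names are hypothetical, and the mapping from the Java exception to (Error_msg, Ind) is an assumption rather than something the source states. Java reports the index in UTF-16 units.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, for illustration only.
class CompileErrorDemo {
    // Returns "compiled" on success, or a description of the syntax error.
    static String describe(String pattern) {
        try {
            Pattern.compile(pattern);
            return "compiled";
        } catch (PatternSyntaxException e) {
            // getDescription() is the bare message; getIndex() is the error position.
            return "error at index=" + e.getIndex() + ": " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("[a-z]*"));
        System.out.println(describe("(**)"));
    }
}
```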
4.81.3. REGEX.is?(Val)
`is?` returns whether `Val` is a regex val.
4.81.4. REGEX.escape(Str)
`escape` escapes regex special characters in `Str`. The result can be used as an arg of REGEX.compile, or can be embedded in a regex pattern.
Precondition:
• `Str` must be a str.
Example:
:REGEX.require_from('kink/regex/')
:Escaped <- REGEX.escape('int main() { puts("hello\n"); }')
stdout.print_line(Escaped.repr)
# => "\\Qint main() { puts(\"hello\\n\"); }\\E"
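The `\Q...\E` form in the output above is the same quoting produced by Pattern.quote in java.util.regex. The sketch below, in plain Java with hypothetical names, shows why quoting matters when a literal text is embedded in a larger pattern. That the mod calls Pattern.quote is an assumption based on the shape of the output.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class EscapeDemo {
    // Builds a pattern that matches the literal text followed by optional spaces.
    static Pattern literalThenSpaces(String literal) {
        return Pattern.compile(Pattern.quote(literal) + " *");
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("1+1=2")); // \Q1+1=2\E
        // Quoted: '+' is taken literally.
        System.out.println(literalThenSpaces("1+1=2").matcher("1+1=2  ").matches()); // true
        // Unquoted: '+' is a quantifier, so the literal text does not match.
        System.out.println(Pattern.matches("1+1=2", "1+1=2")); // false
    }
}
```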