3.75. kink/regex/REGEX

This mod provides a regular expression engine backed by the java.util.regex package.

The syntax of regex patterns is the same as that of Java. See:

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html

There are three basic types.

• regex: A regular expression pattern. A regex val provides features such as matching, searching, replacing and splitting.

• match: The result of a successful match. A `match` val contains groups. See kink/regex/MATCH for details.

• group: A slice of the matched text, which can be either the entire slice of the `match`, or the slice of a named capturing group. See kink/regex/GROUP for details.

The `match` type is a subtype of `group`: a `match` val itself represents the entire slice of the matched area.

Example:

:REGEX.require_from('kink/regex/')
:Num_regex <- REGEX.compile('0x(?<Hex>[0-9a-f]+)|(?<Dec>[0-9]+)')
Num_regex.search_all('0xa5a5 42 0xcafe').each{(:Match)
  if(Match.have_group?('Hex')
    { stdout.print_line('Hex: {}'.format(Match.group('Hex').slice)) }
    { stdout.print_line('Dec: {}'.format(Match.group('Dec').slice)) }
  )
}
# Output:
#   Hex: a5a5
#   Dec: 42
#   Hex: cafe

Regexes handle indices in runes (code points), in contrast to java.util.regex, which handles indices in UTF-16 code units.
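
Example: rune indices

The following is a minimal sketch of the index rule, assuming a str literal may contain a rune outside the Basic Multilingual Plane. U+1D11E (𝄞) counts as one rune, although it occupies two UTF-16 units in java.util.regex.

:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x[0-9a-f]+')
# The leading rune 𝄞 occupies index 0 only, so the match starts at 1, not 2
Hex_regex.search_all('𝄞0xff').each{(:M)
  stdout.print_line('from={} to={}'.format(M.from M.to))
}
# => from=1 to=5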

3.75.1. type regex

`regex` is the type of immutable regular expression patterns.

`regex` provides features such as matching, searching, replacing and splitting.

R.pattern

`pattern` returns the pattern str from which the regex is made.

Example:

:REGEX.require_from('kink/regex/')
:Regex <- REGEX.compile('.*')
stdout.print_line(Regex.pattern.repr) # => ".*"

R.accept?(Text)

`accept?` returns whether the entire `Text` str matches the pattern of the regex.

Precondition:

• `Text` must be a str.

Example:

:REGEX.require_from('kink/regex/')
:Verb_regex <- REGEX.compile('[a-z_][a-z0-9_?]*')
stdout.print_line(Verb_regex.accept?('white_light').repr)  # => true
stdout.print_line(Verb_regex.accept?('<black_heat>').repr) # => false

R.match(Text)

`match` tries to match the regex to the entire `Text` str.

Precondition:

• `Text` must be a str.

Result:

• If the match succeeds, `match` returns a single-element vec [Match], where `Match` is a `match` val.

• If the match fails, `match` returns an empty vec [].

Example:

:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x(?<Digits>[0-9a-f]+)')
:handle <- {(:Text)
  Hex_regex.match(Text).for_maybe(
    {(:Match)
      :Entire = Match.slice
      :Digits = Match.group('Digits').slice
      stdout.print_line('hex={} digits={}'.format(Entire.repr Digits.repr))
    }
    { stdout.print_line('unmatched') }
  )
}
handle('0xa5a5') # => hex="0xa5a5" digits="a5a5"
handle('<0xff>') # => unmatched

R.search(Text Start_pos)

`search` searches for the first slice of the `Text` which the regex matches. The search starts from `Start_pos`.

Preconditions:

• `Text` must be a str.

• `Start_pos` must be an int num in the range [0, Text.size].

Result:

• If the search succeeds, `search` returns a single-element vec [Match], where `Match` is a `match`.

• If the search fails, `search` returns an empty vec [].

Example:

:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x[0-9a-f]+')
:handle <- {(:Text :Start_pos)
  Hex_regex.search(Text Start_pos).for_maybe(
    {(:Match)
      stdout.print_line('from={} to={}'.format(Match.from Match.to))
    }
    { stdout.print_line('not found') }
  )
}
:Program <- '0xca 0xfe'
handle(Program 2) # => from=5 to=9
handle(Program 6) # => not found

R.search_all(Text)

`search_all` searches for all the slices of the `Text` which the regex matches, and returns an iter of `match` vals for the slices.

Precondition:

• `Text` must be a str.

The first search is attempted from the beginning of the Text.

If a search from an Ind results in a `match` for an empty slice, the next search is attempted from (Ind + 1). Otherwise, the next search is attempted from the Match.to ind of the current `match`.

Example:

:REGEX.require_from('kink/regex/')
:Hex_regex <- REGEX.compile('0x[0-9a-f]+')
:Program <- '0xca 0xfe'
Hex_regex.search_all(Program)
.map{(:M) M.slice }
.each{(:Hex)
  stdout.print_line(Hex.repr)
}
# Output:
#   "0xca"
#   "0xfe"
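
Example: matches for empty slices

The rule for empty slices can be illustrated with a pattern which has a zero-width alternative. This is a sketch: the pattern and the text are chosen only for illustration, and the lookahead `(?=b)` is standard Java regex syntax which matches an empty slice in front of each 'b'.

:REGEX.require_from('kink/regex/')
:Regex <- REGEX.compile('a+|(?=b)')
# 'aa' is a non-empty slice, so the next search starts from its `to` ind 2.
# The matches at 2 and 3 are empty, so each next search starts one ind later.
Regex.search_all('aabb').each{(:M)
  stdout.print_line('from={} to={}'.format(M.from M.to))
}
# Output:
#   from=0 to=2
#   from=2 to=2
#   from=3 to=3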

R.replace_all(Text $match_to_str)

`replace_all` replaces all the slices of the `Text` which the regex matches.

Precondition:

• `Text` must be a str.

Searching is done in the same way as in `search_all`.

Each matched slice is replaced by the str returned from $match_to_str. $match_to_str must take a `match` and return a str.

Example: convert hex literals to decimal

:REGEX.require_from('kink/regex/')
:NUM.require_from('kink/')
:Hex_regex <- REGEX.compile('0x(?<Digits>[0-9a-f]+)')

:Program <- '0xca 0xfe'
:Decimal_program <- Hex_regex.replace_all(Program){(:M)
  :Digits = M.group('Digits').slice
  [:N] = NUM.parse_int(Digits 16)
  N.show
}
stdout.print_line(Decimal_program.repr)
# => "202 254"

R.split(Text ...[Max_field_count])

`split` splits `Text` into a vec of str vals, using the regex as the pattern of delimiters.

Preconditions:

• `Text` must be a str.

• `Max_field_count` must be an int num greater than or equal to 1.

If `Max_field_count` is specified, splitting is performed at most `Max_field_count - 1` times, so that the number of fields is limited to `Max_field_count`.

If `Max_field_count` is not specified, splitting is performed on all the matches.

Example:

:REGEX.require_from('kink/regex/')
:Sep_regex <- REGEX.compile(' *, *')

stdout.print_line(Sep_regex.split('foo, bar, baz').repr) # => ["foo" "bar" "baz"]
stdout.print_line(Sep_regex.split('foo, bar').repr) # => ["foo" "bar"]
stdout.print_line(Sep_regex.split('foo').repr) # => ["foo"]
stdout.print_line(Sep_regex.split('').repr) # => [""]

stdout.print_line(Sep_regex.split('foo, bar, baz' 2).repr) # => ["foo" "bar, baz"]
stdout.print_line(Sep_regex.split('foo, bar' 2).repr) # => ["foo" "bar"]
stdout.print_line(Sep_regex.split('foo' 2).repr) # => ["foo"]
stdout.print_line(Sep_regex.split('' 2).repr) # => [""]

3.75.2. REGEX.compile(Pattern ...[$config])

`compile` makes a regex from `Pattern`.

Preconditions:

• Pattern must be a str.

• $config must be a fun which takes a conf val.

The conf val provides the following methods:

• C.on_success($success_cont): specifies $success_cont as the success cont. If `on_success` is not called, CORE.identity is used as the default success cont.

• C.on_error($error_cont): specifies $error_cont as the error cont. If `on_error` is not called, a fun which raises an exception is used as the default error cont.

Result:

• If `Pattern` can be compiled as a regex, `compile` tail-calls the success cont with the created regex val.

• If compilation hits a syntax error, `compile` tail-calls the error cont with (Error_msg, Ind), where `Error_msg` is a str of the error message, and `Ind` is an int num of the index of the error place in `Pattern`.

Example with the default conts:

:REGEX.require_from('kink/regex/')

stdout.print_line(REGEX.compile('[a-z]*').repr)
# => (regex "[a-z]*")

REGEX.compile('(**)')
# Output:
#   ...
#   [builtin:kink-mods/kink/regex/REGEX.kn L472 C5 raise] -->raise('REGEX.compile(Pattern ...[$config]): syntax error: {}: {}'.format(Msg Place_desc))
#   exception message: REGEX.compile(Pattern ...[$config]): syntax error: Dangling meta character '*': (-->**)

Specifying conts:

:REGEX.require_from('kink/regex/')

:try_compile <- {(:Pattern)
  REGEX.compile(Pattern){(:C)
    C.on_success{(:Regex)
      stdout.print_line('compiled: {}'.format(Regex.repr))
    }
    C.on_error{(:Msg :Ind)
      stdout.print_line('error at index={}: {}'.format(Ind Msg))
    }
  }
}

try_compile('[a-z]*')
# => compiled: (regex "[a-z]*")

try_compile('(**)')
# => error at index=1: Dangling meta character '*'

3.75.3. REGEX.is?(Val)

`is?` returns whether `Val` is a regex val.
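
Example:

A short sketch of the check, using only `compile` and `repr` as they appear elsewhere in this section. A pattern str is not a regex val until it is compiled.

:REGEX.require_from('kink/regex/')

stdout.print_line(REGEX.is?(REGEX.compile('.*')).repr) # => true
stdout.print_line(REGEX.is?('.*').repr)                # => false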

3.75.4. REGEX.escape(Str)

`escape` escapes regex special characters in `Str`. The result can be used as an arg of REGEX.compile, or can be embedded in a regex pattern.

Precondition:

• `Str` must be a str.

Example:

:REGEX.require_from('kink/regex/')

:Escaped <- REGEX.escape('int main() { puts("hello\n"); }')
stdout.print_line(Escaped.repr)
# => "\\Qint main() { puts(\"hello\\n\"); }\\E"
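
Example: embedding an escaped str in a larger pattern

This sketch shows one way to embed the result of `escape`, assuming the escaped form is the \Q...\E quotation shown above. The non-capturing group `(?:...)` is an illustrative choice: in Java regex syntax, a quantifier written directly after \Q...\E applies only to the last quoted character, so the group is what makes `+` repeat the whole literal.

:REGEX.require_from('kink/regex/')

:Literal <- REGEX.escape('a.b')
# The dot in 'a.b' is quoted, so it matches only a literal '.'
:Regex <- REGEX.compile('(?:{})+'.format(Literal))
stdout.print_line(Regex.accept?('a.ba.b').repr) # => true
stdout.print_line(Regex.accept?('axb').repr)    # => false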