3.40. kink/host_lang/java/REGEX

The mod provides a regular expression engine backed by the java.util.regex package.

The syntax of regex patterns is the same as that of Java. See:

https://download.java.net/java/early_access/jdk11/docs/api/java.base/java/util/regex/Pattern.html

There are three basic types.

• regex: A regular expression pattern. A regex val provides features such as matching, searching, replacing and splitting.

• regex_match: The result of a successful match or search. A regex_match contains groups.

• regex_group: A slice in the matched text, which can be the entire slice of the regex_match, or a slice of a named capturing group.

A regex_match val is also a regex_group, which represents the entire slice of the matched area.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Num_regex <- REGEX.new('0x(?<Hex>[0-9a-f]+)|(?<Dec>[0-9]+)')
Num_regex.search_all('0xa5a5 42 0xcafe').each{(:Match)
  Match.have_group?('Hex').if_else(
    { stdout.print_line('Hex: {}'.format(Match.group('Hex').slice)) }
    { stdout.print_line('Dec: {}'.format(Match.group('Dec').slice)) }
  )
}
# Output:
#   Hex: a5a5
#   Dec: 42
#   Hex: cafe

This mod is placed under kink/host_lang/java/ even though regular expressions are not inherently tied to integration with Java. The reason is that porting the engine to another host environment is currently impractical, so the mod should be regarded as dependent on Java.

This mod handles indices of runes or code points, in contrast to java.util.regex which handles indices of UTF-16 units.
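The difference becomes visible as soon as the text contains a rune outside the Basic Multilingual Plane. The following is a minimal sketch, not taken from the mod itself: the sample text and pattern are made up for illustration, and the expected values are derived from the rune-index rule stated above. The rune U+1F600 occupies two UTF-16 units in Java, but counts as a single rune here.

```kink
:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('0x[0-9a-f]+')
# The text starts with U+1F600: one rune, but two UTF-16 units in Java.
[:M] <- Regex.search('😀 0xff' 0)
stdout.print_line(M.from.repr) # => 2
stdout.print_line(M.to.repr)   # => 6
```

With UTF-16 unit indices, as used by java.util.regex, the same match would span 3 to 7.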

3.40.1. type regex

A regex is an immutable regular expression pattern.

A regex provides features such as matching, searching, replacing and splitting.

Regex.pat

Regex.pat returns the pat str from which the regex is made.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('.*')
stdout.print_line(Regex.pat.repr) # => ".*"

Regex.accept?(Text)

Regex.accept? returns whether the entire Text str matches the pattern of the Regex.

Text must be a str.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Verb_regex <- REGEX.new('[a-z_][a-z0-9_?]*')
stdout.print_line(Verb_regex.accept?('white_light').repr)  # => true
stdout.print_line(Verb_regex.accept?('<black_heat>').repr) # => false

Regex.match(Text)

Regex.match tries to match the Regex to the entire Text str.

Text must be a str.

If the match succeeds, Regex.match returns a single-element vec [Regex_match], where Regex_match is a regex_match.

If the match fails, Regex.match returns an empty vec [].

Example:

:REGEX.require_from('kink/host_lang/java/')
:Hex_regex <- REGEX.new('0x(?<Digits>[0-9a-f]+)')
:handle <- {(:Text)
  Hex_regex.match(Text).for_just_or(
    {(:Match)
      :Entire = Match.slice
      :Digits = Match.group('Digits').slice
      stdout.print_line('hex={} digits={}'.format(Entire.repr Digits.repr))
    }
    { stdout.print_line('unmatched') }
  )
}
handle('0xa5a5') # => hex="0xa5a5" digits="a5a5"
handle('<0xff>') # => unmatched

Regex.search(Text Start_pos)

Regex.search searches for the first slice of the Text which the Regex matches, starting from Start_pos.

Preconditions:

• Text must be a str

• Start_pos must be an int num in the range [0, Text.size]

If the search succeeds, Regex.search returns a single-element vec [Regex_match], where Regex_match is a regex_match.

If the search fails, Regex.search returns an empty vec [].

Example:

:REGEX.require_from('kink/host_lang/java/')
:Hex_regex <- REGEX.new('0x[0-9a-f]+')
:handle <- {(:Text :Start_pos)
  Hex_regex.search(Text Start_pos).for_just_or(
    {(:Match)
      stdout.print_line('from={} to={}'.format(Match.from Match.to))
    }
    { stdout.print_line('not found') }
  )
}
:Program <- '0xca 0xfe'
handle(Program 2) # => from=5 to=9
handle(Program 6) # => not found

Regex.search_all(Text)

Regex.search_all searches for all the slices of the Text which the Regex matches, and returns an iter of regex_match vals for the slices.

Text must be a str.

The first search is attempted from the beginning of the Text.

If a search from an Ind results in a regex_match for an empty slice, the next search is attempted from (Ind + 1). Otherwise, the next search is attempted from the “Match.to” ind of the current regex_match.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Hex_regex <- REGEX.new('0x[0-9a-f]+')
:Program <- '0xca 0xfe'
Hex_regex.search_all(Program)
.map{(:M) M.slice }
.each{(:Hex)
  stdout.print_line(Hex.repr)
}
# Output:
#   "0xca"
#   "0xfe"
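The rule for empty slices can be traced with a pattern that is able to match nothing. The following sketch is illustrative only: the pattern and text are made up, and the listed output is what the rule above yields when followed by hand, assuming that iteration stops once the next start pos would exceed Text.size.

```kink
:REGEX.require_from('kink/host_lang/java/')
:A_regex <- REGEX.new('a*')
A_regex.search_all('baab').each{(:M)
  stdout.print_line('from={} to={} slice={}'.format(M.from M.to M.slice.repr))
}
# Expected when tracing the rule by hand:
#   from=0 to=0 slice=""
#   from=1 to=3 slice="aa"
#   from=3 to=3 slice=""
#   from=4 to=4 slice=""
```

The search from 0 yields an empty slice, so the next search starts from 1. The search from 1 yields the non-empty slice "aa", so the next search starts from its Match.to, which is 3.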

Regex.replace_all(Text $match_to_str)

Regex.replace_all replaces all the slices of the Text which the Regex matches.

Text must be a str.

Searching is done in the same way as Regex.search_all.

Each matched slice is replaced by the str which $match_to_str returns for it. $match_to_str must take a regex_match, and return a str.

Example: convert hex literals to decimal

:REGEX.require_from('kink/host_lang/java/')
:NUM.require_from('kink/')
:Hex_regex <- REGEX.new('0x(?<Digits>[0-9a-f]+)')

:Program <- '0xca 0xfe'
:Decimal_program <- Hex_regex.replace_all(Program){(:M)
  :Digits = M.group('Digits').slice
  [:N] = NUM.parse_int(Digits 16)
  N.show
}
stdout.print_line(Decimal_program.repr)
# => "202 254"

Regex.split(Text ...[Max_field_count])

Regex.split splits the Text into a vec of str vals, using the Regex as the pattern of delimiters.

Preconditions:

• Text must be a str

• Max_field_count, if specified, must be an int num greater than or equal to 1

If Max_field_count is specified, splitting is performed at most (Max_field_count - 1) times, so that the number of fields is limited to Max_field_count.

If Max_field_count is not specified, splitting is performed on all the matches.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Sep_regex <- REGEX.new(' *, *')

stdout.print_line(Sep_regex.split('foo, bar, baz').repr) # => ["foo" "bar" "baz"]
stdout.print_line(Sep_regex.split('foo, bar').repr) # => ["foo" "bar"]
stdout.print_line(Sep_regex.split('foo').repr) # => ["foo"]
stdout.print_line(Sep_regex.split('').repr) # => [""]

stdout.print_line(Sep_regex.split('foo, bar, baz' 2).repr) # => ["foo" "bar, baz"]
stdout.print_line(Sep_regex.split('foo, bar' 2).repr) # => ["foo" "bar"]
stdout.print_line(Sep_regex.split('foo' 2).repr) # => ["foo"]
stdout.print_line(Sep_regex.split('' 2).repr) # => [""]

3.40.2. REGEX.new(Pat)

REGEX.new makes a regex val from the Pat str.

If the Pat has a syntax error, the fun raises an exception.

3.40.3. REGEX.regex?(Val)

REGEX.regex? returns whether the Val is a regex.
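A short illustrative sketch, using only functions described in this section; the sample pattern is arbitrary. It contrasts a regex val with the pat str it was made from:

```kink
:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('.*')
stdout.print_line(REGEX.regex?(Regex).repr)     # => true
stdout.print_line(REGEX.regex?(Regex.pat).repr) # => false
```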

3.40.4. type regex_match

A regex_match is the result of a successful regex match or search.

Regex_match.slice

Regex_match.slice returns the sliced str of the text where the regex matches.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('0x[0-9a-f]+')
[:M] <- Regex.search('var x = 0xcafebabe;' 0)
stdout.print_line(M.slice.repr) # => "0xcafebabe"

Regex_match.from

Regex_match.from returns the pos from which the regex matches in the text.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('0x[0-9a-f]+')
[:M] <- Regex.search('var x = 0xcafebabe;' 0)
stdout.print_line(M.from.repr) # => 8

Regex_match.to

Regex_match.to returns the pos to which the regex matches in the text. The pos is exclusive: it is the pos just after the last rune of the matched slice.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('0x[0-9a-f]+')
[:M] <- Regex.search('var x = 0xcafebabe;' 0)
stdout.print_line(M.to.repr) # => 18

Regex_match.have_group?(Name)

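Regex_match.have_group? returns whether the match has a group named Name. As the example below shows, the result is false both when the pattern defines the group but the group did not take part in the match, and when the pattern defines no group of that name.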

Precondition:

• Name must be a str

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('(?<Hex>0x[0-9a-f]+)|(?<Binary>0b[01]+)')
[:M] <- Regex.search('var x = 0xcafebabe;' 0)
stdout.print_line(M.have_group?('Hex').repr)      # => true
stdout.print_line(M.have_group?('Binary').repr)   # => false
stdout.print_line(M.have_group?('Decimal').repr)  # => false

Regex_match.group(Name)

Regex_match.group returns a regex_group of the named group.

Precondition:

• Name must be a str

• The match must have the group named as Name.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('(?<Var>[a-z]+) *= *(?<Num>[0-9]+)')
[:M] <- Regex.match('foo = 42')
stdout.print_line(M.group('Var').slice.repr)  # => "foo"
stdout.print_line(M.group('Num').slice.repr)  # => "42"

3.40.5. REGEX.regex_match?(Val)

REGEX.regex_match? returns whether the Val is a regex_match.
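A short illustrative sketch, using only functions described in this section; the sample pattern and text are made up. It contrasts a regex_match with the regex that produced it:

```kink
:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('[0-9]+')
[:M] <- Regex.search('id 42' 0)
stdout.print_line(REGEX.regex_match?(M).repr)     # => true
stdout.print_line(REGEX.regex_match?(Regex).repr) # => false
```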

3.40.6. type regex_group

A regex_group is a slice in the matched text.

A regex_group is either the entire region of the match result, or a region which a named capturing group matches.

Example:

:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('(?<Lhs>[a-z]+) *= *(?<Rhs>[a-z]+)')
[:M] <- Regex.match('foo = bar')
:Rhs <- M.group('Rhs')
stdout.print_line(Rhs.slice.repr) # => "bar"
stdout.print_line(Rhs.from.repr)  # => 6
stdout.print_line(Rhs.to.repr)    # => 9

Regex_group.slice

Regex_group.slice returns the sliced str of the text where the group matches.

Regex_group.from

Regex_group.from returns the pos from which the group matches in the text.

Regex_group.to

Regex_group.to returns the pos to which the group matches in the text.

3.40.7. REGEX.regex_group?(Val)

REGEX.regex_group? returns whether the Val is a regex_group.
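Since a regex_match val is also a regex_group for the entire matched slice, the predicate should hold for both kinds of val. The following sketch is illustrative, with a made-up pattern and text; the expected results follow from the descriptions in this section.

```kink
:REGEX.require_from('kink/host_lang/java/')
:Regex <- REGEX.new('(?<Var>[a-z]+) *= *(?<Num>[0-9]+)')
[:M] <- Regex.match('foo = 42')
stdout.print_line(REGEX.regex_group?(M.group('Var')).repr) # => true
stdout.print_line(REGEX.regex_group?(M).repr)              # => true
stdout.print_line(REGEX.regex_group?('foo').repr)          # => false
# The match itself acts as the group for the entire slice.
stdout.print_line(M.slice.repr) # => "foo = 42"
```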