2.3. Semantics

This chapter specifies the semantics of Kink programs as transformation into KSM procedures. The transformation is comprised of the two steps "desugaring" and "translation."

On the desugaring step, the program AST is transformed to another program AST, The word desugaring means that this step expands syntactic sugars into fundamental building blocks.

On the translation step, the desugared form of the program is transformed to a KSM procedure.

The following is an example program.

:foo <- {
  :Vec = [10 20 30]
  Vec.each{(:E) bar(E) }
}

The program above is desugared as follows.

\binding:foo.op_store({
  { [\binding:Vec].op_store(\binding._Args)
    \binding.Vec.each({
      [\binding:E].op_store(\binding._Args)
      \binding.bar[()](\binding.E)
    })
  }!!([10 20 30])
})

Then translated to a KSM procedure as follows.

{ binding
  varref foo
  dup
  deref op_store
  emptyvec
  binding
  fun {
    binding       # (binding)
    fun {
      emptyvec        # ([])
      binding         # ([] binding)
      varref Vec      # ([] :Vec)
      add             # ([:Vec])
      dup             # ([:Vec] [:Vec])
      deref op_store  # ([:Vec] $op_store)
      flip            # ($op_store [:Vec])
      emptyvec        # ($op_store [:Vec] [])
      binding         # ($op_store [:Vec] [] \binding)
      deref _Args     # ($op_store [:Vec] [] _Args)
      add             # ($op_store [:Vec] [_Args])
      call            # (result-of-call)
      remove          # ()
      binding         # (binding)
      deref Vec       # (Vec)
      dup             # (Vec Vec)
      deref each      # (Vec $each)
      flip            # ($each Vec)
      emptyvec        # ($each Vec [])
      binding         # ($each Vec [] \binding)
      fun {
        emptyvec        # ([])
        binding         # ([] \binding)
        varref E        # ([] :E)
        add             # ([:E])
        dup             # ([:E] [:E])
        deref op_store  # ([:E] $op_store)
        flip            # ($op_store [:E])
        emptyvec        # ($op_store [:E] [])
        binding         # ($op_store [:E] [] \binding)
        deref _Args     # ($op_store [:E] [] _Args)
        add             # ($op_store [:E] [_Args])
        call            # (result-of-call)
        binding         # (\binding)
        deref bar       # ($bar)
        nada            # ($bar nada)
        emptyvec        # ($bar nada [])
        binding         # ($bar nada [] \binding)
        deref E         # ($bar nada [] E)
        add             # ($bar nada [E])
        call
      }               # ($each Vec [] fun)
      add             # ($each Vec [fun])
      call
    }         # (fun)
    nada      # (fun nada)
    emptyvec  # (fun nada [])
    emptyvec  # (fun nada [] [])
    num 10    # (fun nada [] [] 10)
    add       # (fun nada [] [10])
    num 20    # (fun nada [] [10] 20)
    add       # (fun nada [] [10 20])
    num 30    # (fun nada [] [10 20] 30)
    add       # (fun nada [] [10 20 30])
    add       # (fun nada [[10 20 30]])
    call
  }
  add
  call
}

2.3.1. Desugaring

This section defines desugaring in somewhat informal style.

Programs

A Kink program is composed of zero or more toplevel expressions.

For example, the program is composed of expressions E1, E2, ,,, En, the program is desugared to a parentheses expression (E1 E2 ,,, En).

Let clauses in sequences

A let clause like expression = expression substantial_seq in a sequence (seq) stores the result of the second expression evaluated in the current binding, to the result of the first expression evaluated in a new binding, then evaluates the subsequent sequence in the new binding. For example, the function foo in the following program firstly stores the function {(:X :Y) X + Y } to the variable add, secondly stores the result of add to the variable Sum, then prints out Sum.

:fun <- {
  :add = {(:X :Y) X + Y }
  :Sum = add(10 20)
  stdout.print_line(Sum.show)  # => 30
}

Grammatically, let clause is a syntactic sugar for a combination of a function expression and a direct call. A let clause in the form E1 = E2 S, where E1 and E2 are expressions and S is a substantial sequence, is desugared to {(E1) S }!!(E2).

For example, the program above is desugared to the following program.

:fun <- {
  {(:add)
    {(:Sum)
      stdout.print_line(Sum.show)
    }!!(add(10 20))
  }!!({(:X :Y) X + Y })
}

Operator expressions

Operator expressions are syntactic sugar for attributional call expressions (attr_call).

Here is the list of operators.

Priority Token type Operators Associativity
Lowest store_op <- Non-associative
  logor_op || Right-associative
  logand_op && Right-associative
  relation_op ==, !=, <, >, <=, >= Non-associative
  add_op +, -, |, ^ Left-associative
  multiply_op *, /, //, %, &, <<, >> Left-associative
Highest unary_op -, !, ~ Unary

Each type of operator expressions is desugared as follows.

Operator expression Desugared form
X <- Y (X).op_store(Y)
X || Y op_logor(X { Y })
X && Y op_logand(X { Y })
X == Y (X).op_eq(Y)
X != Y op_lognot((X).op_eq(Y))
X < Y (X).op_lt(Y)
X > Y (Y).op_lt(X)
X <= Y op_lognot((Y).op_lt(X))
X >= Y op_lognot((X).op_lt(Y))
X + Y (X).op_add(Y)
X - Y (X).op_sub(Y)
X | Y (X).op_or(Y)
X ^ Y (X).op_xor(Y)
X * Y (X).op_mul(Y)
X / Y (X).op_div(Y)
X // Y (X).op_intdiv(Y)
X % Y (X).op_rem(Y)
X & Y (X).op_and(Y)
X << Y (X).op_shl(Y)
X >> Y (X).op_shr(Y)
- U (U).op_minus()
! U op_lognot(U)
~ U (U).op_not()

Function bodies

Formal receivers and formal arguments in function bodies (fun_body) are syntactic sugar for attributional call expressions (attr_call).

Function bodies are categorized in four groups: with/without formal receiver, and with/without formal arguments. The following table describes how each group of function bodies are desugared. In the table, C denotes a sequence (seq), E denotes an expression, and VB denotes a vector body (vec_body),

Formal receiver Formal arguments Original form Desugared form
Without Without C C
Without With (VB) C [VB].op_store(\binding._Args) (C)
With Without [E] C (E).op_store(\binding._Recv) (C)
With With [E](VB) C (E).op_store(\binding._Recv) [VB].op_store(\binding._Args) (C)

Therefore, fun1 and fun2 in the following example are equivalent.

:fun1 <- {[:Self](:X :Y :Z)
  'result'
}

:fun2 <- {
  :Self <- _Recv
  [:X :Y :Z] <- _Args
  'result'
}

Actual arguments vectors

Actual arguments vectors (args) are categorized in two groups: those with parentheses (paren_args) and without parentheses.

The following table describes how each group of actual arguments vectors are desugared. In the table, E1, E2, ... En denotes zero or more elements producers (elements_producer). FB1, FB2, ... FBm denotes function bodies (fun_body) of zero or more function arguments (fun_arg) in the original form. or zero or more functions (fun) in the desugared form.

Original form Desugared form
(E1 E2 ‹...snip...› En){ FB1 }{ FB2 }‹...snip...›{ FBm } (E1 E2 ‹...snip...› En { FB1 } { FB2 } ‹...snip...› { FBm })
{ FB1 }{ FB2 }‹...snip...›{ FBm } ({ FB1 } { FB2 } ‹...snip...› { FBm })

Therefore, in the following example, each pair of invocations of show, if and fold are mutually equivalent.

Num.show
Num.show()

if(Str.empty?){ stdout.print_line('empty') }
if(Str.empty? { stdout.print_line('empty') })

Vec.fold(0){(:X :Y) X + Y}
Vec.fold(0 {(:X :Y) X + Y})

Local dereference expressions

Local dereference expressions (local_deref) are syntactic sugar for attributional dereference expressions (attr_deref) of the local binding. The local dereference expression in the form Noun is desugared to \binding.Noun. The local dereference expression in the form $verb is desugared to \binding$verb.

In the following example, fun1 and fun2 are equivalent, thus the invocations of both functions output 42 to the standard output.

:fun1 <- {(:Val :output)
  :output_copy = $output
  output_copy(Val)
}
fun1(42 {(:X) stdout.print_line(X.show) })  # => 42

:fun2 <- {(:Val :output)
  :output_copy = \binding$output
  output_copy(\binding.Val)
}
fun2(42 {(:X) stdout.print_line(\binding.X.show) })  # => 42

Local reference expressions

Local reference expressions (local_ref) are syntactic sugar for attributional reference expressions (attr_ref) of the local binding. The local reference expression in the form :Noun is desugared to \binding:Noun. The local reference expression in the form :verb is desugared to \binding:verb.

In the following example, fun1 and fun2 are equivalent, thus the invocations of both functions output 42 to the standard output.

:fun1 <- {
  :Val = 42
  :output = {(:X) stdout.print_line(X.show) }
  output(Val)
}
fun1  # => 42

\binding:fun2 <- {
  \binding:Val = 42
  \binding:output = {(\binding:X) stdout.print_line(X.show) }
  output(Val)
}
fun2  # => 42

Local call expressions

Local call expressions (local_call) are syntactic sugar for attributional call expressions (attr_call) of functions owned by the local binding.

Desugaring of local call expressions must occur after desugaring of actual argument vectors.

The local call expression in the form fun(VB) is desugared to \binding.fun[()](VB), where VB is a vector body.

The local call expression in the form fun[R](VB) is desugared to \binding.fun[R](VB), where R is an expression and VB is a vector body.

In the following example, fun1 and fun2 are equivalent, thus the invocations of both functions output hello world to the standard output.

:print_arg <- {(:Arg)
  stdout.print_line(Arg.show)
}

:print_recv <- {[:R]
  stdout.print_line(R.show)
}

:fun1 <- {
  print_arg('hello ')
  print_recv['world']
}
fun1  # => hello world

:fun2 <- {
  \binding.print_arg[()]('hello ')
  \binding.print_recv['world']()
}
fun2  # => hello world

2.3.2. Translation

This section defines translation as transformation from a desugared syntax tree to a KSM procedure.

Programs

In the desugared form, a program is composed of a single parentheses expression..

Translation of a program P to a KSM procedure tprogram(P) is defined as follows.

Composition of P Where tprogram(P)
T T : paren { «tparen(T)» }

Sequences without let clauses

After desugaring, thus without let clauses, a sequence (seq) is a sequence of zero ore more expressions. If the sequence consists of no expressions, it produces the value () (called “nada”). If the sequence consists of one or more expressions, it evaluates the expressions in the order they appears, and produces the result of the last expression.

Grammatically, a sequence is composed of either empty or a substantial sequence (substantial_seq). A substantial sequence is composed of a single expression, or a pair of an expression and a substantial sequence.

Translation of a sequence S to an instruction sequence tseq(S) is defined as follows.

Composition of S Where tseq(S)
: empty nada
SS SS : substantial_seq «tsubstantial_seq(SS)»

Translation of a substantial sequence SS to an instruction sequence tsubstantial_seq(SS) is defined as follows.

Composition of SS Where tsubstantial_seq(SS)
E E : expression «texpression(E)»
E SSS «texpression(E)»; remove; «tsubstantial_seq(SSS)»

In the following example, fun1 outputs the result of an empty sequence, which is nada. fun2 outputs the result of a sequence comprising three expressions 'foo', 'bar' and 'baz'; the result is 'baz'.

:fun1 <- {
  :Result = ()
  stdout.print_line(Result.repr)  # => nada
}
fun1

:fun2 <- {
  :Result = ('foo' 'bar' 'baz')
  stdout.print_line(Result.repr)  # => "baz"
}
fun2

Expressions

In the normalized form, an expression can be eventually reduced to a single primary expression (primary).

Therefore, translation of an expression E to an instruction sequence texpression(E) is defined as follows.

Composition of E Where texpression(E)
P P : primary «tprimary(P)»

Primary expressions

Translation of a primary expression P to an instruction sequence tprimary(P) is defined as follows.

Composition of P Where tprimary(P)
N N : num «tnum(N)»
S S : str «tstr(S)»
CE CE : context_binding «tcontext_binding(CE)»
P P : paren «tparen(P)»
V V : vec «tvec(V)»
F F : fun «tfun(F)»
D D : attr_deref «tattr_deref(D)»
R R : attr_ref «tattr_ref(R)»
AC AC : attr_call «tattr_call(AC)»
DC DC : direct_call «tdirect_call(DC)»

Number expressions

A number expression (num) produces a number value.

Translation of a number expression N to an instruction tnum(N) is defined as follows.

Composition of N Where tnum(N)
NN NN : NUM num «NN»

In the following example, fun1 outputs the result of the integer number expression 42. fun2 outputs the result of the decimal number expression 3.14.

:fun1 <- {
  :Result = 42
  stdout.print_line(Result.show)  # => 42
}
fun1

:fun2 <- {
  :Result = 3.14
  stdout.print_line(Result.show)  # => 3.14
}
fun2

String expressions

A string expression (str) produces a string value.

Translation of a string expression S to an instruction tstr(S) is defined as follows.

Composition of S Where tstr(S)
SS SS : STRING str «SS»

The following example outputs the result of the string expression 'foo'.

:Result <- 'foo'
stdout.print_line(Result)  # => foo

Context binding expressions

A context binding expression produces the current context binding.

Translation of a current context binding expression CE to an instruction tcontext_binding(CE) is defined as follows.

Composition of CE tcontext_binding(CE)
'\binding' binding

The following example outputs the value of the variable Val of the current context binding, namely, the local variable Val.

:Val <- 42
stdout.print_line(\binding.Val.show)  # => 42

Parentheses expressions

A pair of parentheses is used to enclose a sequence as a primary expression.

Translation of a parentheses expression P to an instruction sequence tparen(P) is defined as follows.

Composition of P Where tparen(P)
OPENPAREN S ')' S : seq «tseq(S)»
WS_OPENPAREN S ')' S : seq «tseq(S)»

In the following example, the addition 2 + 4 is calculated before the mulliplication.

:Val <- 7 * (2 + 4)
stdout.print_line(Val.show)  # => 42

Vector expressions

A vector expression is used to make a vector.

Translation of a vector expression V to an instruction sequence tvec(V) is defined as follows.

Composition of V Where tvec(V)
OPENBRACKET VB ']' VB : vec_body emptyvec; «tvec_body(VB)»
WS_OPENBRACKET VB ']' VB : vec_body emptyvec; «tvec_body(VB)»

Vector bodies

A vector body (vec_body) is used in vector expressions and actual arguments in function calls (and formal arguments in not normalized forms) to produce elements of vectors.

Translation of a vector body VB to an instruction sequence tvec_body(VB) is defined as follows.

Composition of VB Where tvec_body(VB)
: empty /* nop */
EP VVB «telements_producer(EP)»; «tvec_body(VVB)»

In the following example, the vector Spreaded is expanded in the second vector literal.

:Spreaded <- [42 3.14]
:Vec <- ['foo' ...Spreaded 'bar']
stdout.print_line(Vec.repr)  # => ["foo" 42 3.14 "bar"]

In the following example, the vector Spreaded is spreaded in the arguments vector for the function call of fun.

:fun <- {
  stdout.print_line(_Args.repr)
}
:Spreaded <- [42 3.14]
fun('foo' ...Spreaded 'bar')  # => ["foo" 42 3.14 "bar"]

Elements producers

An elements producer (elements_producer) is a content of vector bodies.

Translation of an element producer EP to an instruction sequence telements_producer(EP) is defined as follows.

Composition of EP Where telements_producer(EP)
E E : expression «texpression(E)»; add
'...' E E : expression «texpression(E)»; concat

Function expressions

A function expression (fun) produces a function value.

Translation of a function expression F to an instruction sequence tfun(F) is defined as follows.

Composition of F Where tfun(F)
OPENBRACE S '}' S : seq binding; fun { «tseq(S)» }
WS_OPENBRACE S '}' S : seq binding; fun { «tseq(S)» }

Attributional dereference expressions

An attributional dereference expression (attr_deref) dereferences a variable of the owner.

Translation of an attributional dereference expression D to an instruction sequence tattr_deref(D) is defined as follows.

Composition of D Where tattr_deref(D)
P '.' N
N : NOUN
«tprimary(P)»; deref «N»
P DOLLAR V
V : VERB
«tprimary(E)»; deref «V»

Attributional reference expressions

An attributional reference expression (attr_ref) produces a variable reference of the owner.

Translation of an attributional reference expression R to an instruction sequence tattr_ref(R) is defined as follows.

Composition of R Where tattr_ref(R)
P COLON N
N : NOUN
«tprimary(P)»; varref «N»
P COLON V
V : VERB
«tprimary(P)»; varref «V»

Attributional call expressions

An attributional call expression (attr_call) invokes a function.

Translation of an attributional call expression C to an instruction sequence tattr_call(C) is defined as follows.

Composition of C Where tattr_call(C)
P '.' V OPENBRACKET R ']' OPENPAREN VB ')'
V : VERB
«tprimary(P)»; deref «V»; «texpression(R)»; emptyvec; «tvec_body(VB)»; call
P '.' V OPENPAREN VB ')'
V : VERB
«tprimary(P)»; dup; deref «V»; flip; emptyvec; «tvec_body(VB)»; call

Direct call expressions

A direct call expression (direct_call) invokes a function placed on the left side of !!.

Translation of a direct call expression C to an instruction sequence tdirect_call(C) is defined as follows.

Composition of C Where tdirect_call(C)
P '!!' OPENBRACKET R ']' OPENPAREN VB ')'
«tprimary(P)»; «texpression(R)»; emptyvec; «tvec_body(VB)»; call
P '!!' OPENPAREN VB ')'
«tprimary(P)»; nada; emptyvec; «tvec_body(VB)»; call