Kap for APL’ers

Introduction

Kap is very similar to APL in several aspects. So much so that an APL’er may be convinced that it is a full APL. If they do, they will be surprised when certain things does not behave the way they expect. The purpose of this document is to explain how Kap works from the perspective of a developer who already knows APL.

The different features are listed here in no particular order.

Design considerations

The original intent behind the design of Kap was twofold:

Create a practical implementation using lazy evaluation in order to be able to avoid excessive array copying
Redesign some aspects of the language without having to worry about backwards compatibility

Design consideration: Lazy evaluation

When performing a computation in Kap, the result is not computed immediately. Instead, an object is returned that represents the result of te computation. This will happen whenever it is possible to determine the dimensions of the result without computing content of the individual values. The values will only be computed once a value is needed, or when the result is assigned to a variable. More information can be found in the section on Lazy evaluation.

Redesign of language features

Some features of APL can be argued to be less than optimal, but has remained as-is in most versions in order to stay backwards compatible. Since Kap does not aim for full APL compatibility, these features have been changed or replaced with alternative features. These include:

Removal of magic numbers. This includes removal of the circle functions, replacing them with explicit functions.
Tradfns with goto is not implemented, instead regular structured programming is supported, with syntax similar to Java.
The language can be parsed prior to evaluation. The parser needs to be a simple left-to-right parser. This has caused to changes to be made in how operators are parsed, for example.
Tacit expressions are made more simple. In particular, the fork is now using explicit syntax which frees up trains to be evaluated in a simple right-left-manner. This makes them much easier to read.

Syntax

Multiline input

If the final character of a line is the backquote character, then the newline is ignored and parsing continues with the next one:

a ← 1 2 3 `
    4 5 6

Note that lines cannot be broken in the middle of a symbol. In other words, the newline is interpreted as a space.

Parsing operators

Dyadic operators that can take either a function or a value as a right argument causes problems during parsing. Kap supports two of these operators: ⍤ and ⍣. Different APL versions have different ways of dealing with this depending on the type of parser used. Kap uses a single token lookahead left-to-right parser, which makes this form very difficult to parse.

Due to this parsing issue, Kap requires the programmer to enclose the operator arguments in parentheses regardless of whether the arguments are functions or values.

a (fn⍣2) b
a (fn0⍣fn1) b

The parentheses are not required for operators that only support function arguments.

Namespaces

Every symbol in Kap belongs to a namespace. Unless changed, the default namespace after starting the Kap REPL is default. The default namespace can be changed using the use keyword. The default namespace is the namespace where new symbols are interned if they cannot be found in any of the other namespaces in the search path. After starting, the only namespace in the search path is kap which is where all the standard symbols are located. Other namespaces can also be included using the import statement.

Symbols in other namespaces can be specified by prefixing them with the name of the namespace followed by a colon, such as unicode:toCodepoints.

Symbols with an empty namespace, :abc, could also be specified as keyword:foo. As keywords are the most common use of symbols, this namespace is assumed by default if no namespace is given.

Datatypes

Numbers

In contrast to APL, Kap makes the difference between rational numbers and floating point numbers explicit. If a number is written with a decimal point (i.e. 123.4), it is a floating point number (64-bit double).

If the number does not contain a decimal point (such as 123456), it is parsed as an integer (stored as a bigint if it doesn’t fit in a 64-bit register).

Mathematical operations on rationals gives a rational result if possible, otherwise it’s converted to floating point before the computation takes place. A floating point number is never converted to a rational number unless explicitly requested using monadic ⌊ or ⌈.

The real and imaginary components of complex numbers are always floating point.

Please refer to the reference documentation for more information on how to enter numbers in Kap source.

Characters

All characters are unicode codepoints. Strings are sequences of codepoints, which means that strings are using UTF-32 encoding. The functions unicode:toCodepoints and unicode:fromCodepoints can be used to convert numbers to and from character values respectively.

Note that in Unicode, a single codepoint does not necessarily mean a single character. Unicode uses the term “grapheme cluster” to describe a single unit (what most people would refer to as a character). Kap provides the function unicode:toGraphemes to convert a string into an array of graphemes.

Strings are specified using double quotes rather than single quotes in APL. In Kap, a string containing a single character is simply "a". This is different compared to APL where a single-character string has a special case where it represents just the character. In APL, to type a single-character string, one has to type ,'a'.

To specify a single character in Kap, use the @ symbol, followed by the character. Thus, the following two lines both specify the same string:

"abc"
@a @b @c

If the @ symbol is followed by a \, and at least one more character, the following rules apply:

@\n — newline (U+000A)
@\r — carriage return (U+000D)
@\e — escape (U+001B)
@\\ — backslash \
@\0 — the NUL character (unicode value 0)
@\uNNNN — explicit Unicode codepoint in hex. For example: @\u905 for अ
@\UNICODE_CHARACTER — Unicode name. For example: @\LATIN_SMALL_LETTER_A_WITH_DIAERESIS for ä. Note that this syntax is not available in the Javascript backend as this would require the use of extra external files.

Symbols

Symbols are first-class objects in Kap. They work similar to Lisp in that they are unique objects identified by their name. Symbols in the keyword namespace are special in that they always evaluate to themselves and as such are useful for things like hash keys. To obtain the symbol itself instead of its value, they are prefixed by a quote:

a ← 'foo     ⍝ a now contains the symbol foo itself rather than the value of the variable
b ← :abc     ⍝ b contains the keyword abc, no need to use ' here

Maps

Maps are a separate datatypes in Kap. They are immutable, and updating a map returns a new instance with the requested change applied.

A map is created using the function map:with. It accepts either a 2-element array with the key and value of a single element, or a 2-dimensional array with 2 columns containing key/value pairs:

⍝ Create a map with a single element mapping foo to bar
a ← map:with "foo" "bar"

⍝ Create a map with three elements:
b ← map:with 3 2 ⍴ `
    "foo" "value is a string" `
    "test" 1 `
    "test2" 2

The keys in a map can be anything, not just strings. This includes arrays, numbers and symbols. In fact, the most useful type of key is likely the keyword:

c ← map:with :foo "bar"

The benefit of using keywords as keys is that member checks are done by identity and does not require iterating over each element in a string. This makes map lookups much faster.

Elements from an array are accessed using syntax similar to array dereferencing, or the map:get function:

    b["foo"]
"value is a string"
    b map:get "foo"
"value is a string"

Maps can be manipulated using the functions map:with and map:remove:

    b ← b map:with "a" "b"
    b["a"]
"a"
    b ← b map:remove ⊂"a"
    b["a"]
⍬

List

The list is a scalar datatype that wraps a fixed set of values. It can be seen as a generic n-tuple. The syntax for lists are a number of values separated by ;. The most common use of lists are as arguments to array lookup as well as supporting multiple arguments to functions. Note that ; binds looser than regular function calls, so in most cases the list needs to be enclosed in parentheses in order to be used as a single object.

The functions toList and fromList can be used to convert between lists and vectors.

    a ← (1 ; 2 ; 3)
list
    fromList a
┌→────┐
│1 2 3│
└─────┘

Array indexing

Negative indexes

In most places where array positions are specified using an integer index, a negative value can be used to index from the end of the given axis, where ¯1 refers to the last element, and -≢A refers to the first element.

Dereference operation

Since the outer product uses the symbol ∙, this frees up the period (.) for the composite object dereference operation. The reference documentation describes this in detail, but in short, one can use it to look up values in hashmaps, arrays and n-tuples:

a ← map:with 'foo 1 'bar (10 20 30)
a.bar.(1)

The above expression returns the value 20.

Tacit programming

Kap supports a modified version of tacit programming as included in Dyalog. The most significant difference between Kap and APL in this regard is that Kap does not implement forks in the form of a 3-train, and instead uses dedicated symbols for this purpose.

The following tacit programming structures exist:

2-chain

A sequence of two functions next to each other are executed in the same manner as a train in APL:

x (AB) y is evaluated as A x B y
(AB) y is evaluated as A B y

Since Kap does not implement APL-style forks, this expands to any number of functions in a train. In other words:

x (ABCD) y is evaluated as A B C x D y

Fork

The fork is specified using « and ». It has the following form:

x A«B»C y is evaluated as (x A y) B (x C y)
A«B»C y is evaluated as (A y) B (C y)

Compositions

The compose operator works the same as APL when called dyadically, but its monadic version is different.

x A∘B y is evaluated as x A (B y)
A∘B y is evaluated as y A (B y)

The inverse of compose is also available:

x A⍛B y is evaluated as (A x) B y
A⍛B y is evaluated as (A y) B y

Over

The over operator derives a function which, when called dyadically, calls the right function on both arguments individually and then calls the left function on the results. In other words, this operator can be thought of processing the arguments using A before acting on it using B.

x A⍥B y is evaluated as (B x) A (B y)
A⍥B y is evaluated as A B y

Left-bound functions

A left-bound function derives a monadic function from a dyadic function by assigning a constant to the left argument. For example, 2+ is a derived function that adds 2 to its argument. This functionality is particularly useful in trains. The following is a function that divides the argument by 2 and then adds 1: 1+2÷⍨. Example:

    A ⇐ 1+2÷⍨
    A 10
6

In APL, the same code would use the ∘ operator to bind the left argument to the function. This syntax is not possible in Kap since operators in Kap requires the left argument to be a function.

Functional inverses

Some dialects of APL supports functional inverses. These dialects implement this using a right argument of ¯1 to ⍣. Kap uses the symbol ˝ for this instead. In other words, the APL expression a (+⍣¯1) b is written in Kap like so: a +˝ b.

Differences in standard functions

Enclose and disclose: `⊂`, `⊃`

In Kap, the ⊂ and ⊃ functions follow the APL2 style, based on an assumption that it is more consistent than the style used by for example Dyalog. The function ⊂ encloses the value in a scalar wrapper, and ⊃ undoes this operation, returning the contained value.

    ⊂ "foo"
┌─────┐
│"Foo"│
└─────┘
    ⊃ ⊂ "foo"
"foo"

If ⊃ is called on an array, it performs the “mix” operation:

    ⊃ (1 2 3 4) (5 6 7 8)
┌→──────┐
↓1 2 3 4│
│5 6 7 8│
└───────┘

Take and drop: `↑`, `↓`

The ↑ and ↓ operations are consistently representing the take and drop functions. ↑ always takes some number of values from the beginning or end of the array, while ↓ removes the same values:

    ↑ 1 2 3 4
1
    3 ↑ ⍳10
┌→────┐
│0 1 2│
└─────┘
    ↓ 1 2 3 4
┌→────┐
│2 3 4│
└─────┘
    7 ↓ ⍳10
┌→────┐
│7 8 9│
└─────┘

Convert to string: `⍕`

The format function is currently much less capable compared to APL. It’s currently only used to format a value to a string:

    ⍕2
"2"

Parse string as number: `⍎`

Kap currently does not support eval. The eval symbol is instead used to parse a string as a number:

    ⍎"432"
432

Outer product

Outer product uses the monadic operator ⌻ instead of ∘.. For example, the APL expression ∘.+ is expressed as +⌻ in Kap.

Inner product

The inner product uses the symbol ∙ rather than a period.

Key

In Dyalog APL, the operator ⌸ (Key) is available. This operator is also present in Kap, albeit with slightly different behaviour. Dyalog calls the underlying function dyadically with the key as its left argument and an array of corresponding values as the right argument. The return values collected in an array and the disclose function is implicitly called.

In Kap, the underlying function is called monadically, with only an array of the values as its argument. The results are collected into a 2-column array with the keys in the first column and the return value of the function as the second column.

Dyalog behaviour can be emulated using the following custom operator:

∇ a (f dyalogKey) b {
    (⍞f/⍤1) a ⊢⌸ b
}

Maths functions

In APL, a lot of maths functions are provided via the ○ function. The left argument is a number specifying the operation and the right argument is the value on which the function should work. The ○ function is not available in Kap, and instead these functions are given regular names and placed in the math namespace. The currently implemented functions include:

For a list of available maths functions, see the reference documentation.

Function declarations

Both APL and Kap has two ways of declaring functions, either tradfns or using dfns.

In Kap, functions that are defined using the tradfn style are global functions, while dfns are local to the current lexical context.

Tradfn

In APL, the original method uses ∇ and declares a function that allows you to use flow control using →. The following is an example of an APL tradfn:

∇ R←A foo B
  ⎕←'This function returns 10 plus the sum of A and B'
  R←1+A+B
∇

Kap provides a similar form. The corresponding version looks like this:

∇ A foo B {
  io:println "This function returns 10 plus the sum of A and B"
  1+A+B
}

The main differences here are:

The code is enclosed between { and }. This is to make code blocks consistent across all uses.
In Kap, the function returns the last value that was evaluated. In APL the return value is assigned to a special variable.
Kap does not support the use of goto for flow control (please see the separate section on flow control for alternative solutions).

Functions defined using this style are global, and after declaration they can be accessible from any part of a program.

Dfns style

Defining a dfn in Kap is similar to APL. The only visible difference is the use of ⇐ instead of ←. The reason for this difference is because ⇐ is processed at parse time, while ← represents a runtime assignment to a variable. As these are vastly different types of operations, different symbols are used to represent these operations.

foo ⇐ { ⍵+1 }

Multiple arguments

Multiple arguments are passed to Kap functions as lists. The tradfn syntax allows for declaring a function as accepting multiple arguments which are then automatically destructured when the function is called.

∇ foo (a;b) {
  io:println "Argument 1: ",a
  io:println "Argument 2: ",b
}

The function can then be called as:

foo (1;2)

Parse-time vs. evaluation-time

In APL, a function declared using ← takes effect immediately. Thus, the following expression is valid in APL:

a ← { b ⍵+10 }
b ← { ⍵+1 }
a 1  ⍝ This will print 12

The corresponding code in Kap will not work, because at the time where the definition of a happens, b is not yet declared and the following error will be displayed when a is called on the last line: Variable not assigned: default:b. This error may seem confusing until one notes that when the first line was parsed, b was assumed to be a variable, and this variable indeed does not have a value.

This difference is important when coming from APL. During parsing, Kap needs to know whether a symbol represents a function, an operator or a value. Any undefined symbols are assumed to be values.

Flow control

Kap provides flow control structures that are similar to traditional programming languages. These are described in more detail in the tutorial, and are therefore only listed here briefly:

Returning values from functions

The function → is used to return values from functions. It is a regular function which causes the innermost function, with the return value being the argument.

When called dyadically, the return will happen only if the left argument is true.

if statements

The following adds 1 to either c or d depending on a:

a ← 1 + if (b) { c } else { d }

when statement

The when statement is used as an alternative to series of if and else. The following sets a to be the value of some variable, or returns a message if all conditions failed.

a ← when {
  (b=1) { c }
  (b=2) { d }
  (b=3) { e }
  (1)   { "All comparisons were false" }
}

while loop

i ← 0
while (i < 5) {
  io:println "Number: ",⍕i
  i ← i+1
}

Lambda functions

Kap provides support for first-class functions. A first-class function is a function that can be processed like a value. They can be placed in arrays, and returned from functions. To convert a function into a value, the symbol λ is used:

q ← λ{⍵+1}
w ← λ+

To call a function from a value, use the symbol ⍞, called the “apply” operation. Note that while it may look like a function, it’s actually special syntax which processes only the next element (either a symbol or an expression inside parens) after the apply symbol itself.

    ⍞q 10
11

Lambda functions capture the local environment where they were applied:

∇ makeCounter start {
    currentValue ← start
    λ{currentValue ← currentValue+1}
}

This function can be used as shown below. The argument 1 to the function is a no-op which is needed as there is no way to call a function with no parameters. A more general way to handle this will be introduced at a later time, once the best way to do this has been decided on.

    a ← makeCounter 0
function
    ⍞a 1
1
    ⍞a 1
2

Lazy evaluation

The idea behind lazy evaluation is the observation that expressions such as ⌊0.5+foo on a large array results in the construction of an intermediate array (the sum of 0.5 and the array) and then a second array containing the result of the rounding.

Various approaches have been taken by implementations in the past to overcome this, including compilers which are capable of optimising such expressions or interpreters that detect specific sequences of tokens and replaces them with faster versions.

Lazy evaluation is another way to solve this problem. When Kap executes the above expression, 0.5+foo does not actually perform the computation, but instead returns an object that represents the future computation of this expression. Then, ⌊ returns another lazy result, representing the entire computation. The actual results will not be computed until they are needed.

This means that if not all results are needed, they don’t have to be computed, making many straightforward APL expressions that used to be simple but slow, actually fast. Also, when the values are computed, no temporary arrays are created, which improves performance.

Note that this behaviour is usually resulting in better performance, however there are cases where it slows things down because some results are computed multiple times. If that happens, the function comp can be used for force the evaluation of a lazy value.

The ¨ operator is the one that has the capability of creating the most surprises as it will defer the evaluation of the function until a possibly much later time. An example follows:

    ↑ {(1+⍵) ⊣ io:println "⍵ = ",⍕⍵}¨ 1 2 3
⍵ = 1
2

Since only the first value of the result was taken, the function was only evaluated once with the first element in the list as argument.

Here, the comp function is used to collapse the result before taking the first value.

    ↑ comp {(1+⍵) ⊣ io:println "⍵ = ",⍕⍵}¨ 1 2 3
⍵ = 1
⍵ = 2
⍵ = 3
2

When assigning a value to a variable, a call to comp is always implicitly performed. In other words, lazy values cannot be stored in variables.

Additionally, if a standalone expression is evaluated and the result is not used, all results are collapsed before being discarded. This is so that expressions that are solely evaluated for their side effects work as expected.