**LSON: Lucid Serialized Object Notation**
v0.2.2 / 2020-06-26

Overview
====================================================================================================
LSON is a data representation that aims for the simplicity and expressiveness of JSON, but differs
in the following ways:

  + It's intended to be both concise and readable by humans as well as computers. It supports
    comments. Items are optionally terminated by whitespace, end delimiters, commas, or semi-colons.

  + It does not aim to mirror JavaScript, and thus is not a JavaScript subset. At the same time,
    LSON is a superset of JSON: any legal JSON file is legal LSON.

  + LSON is focused on data _representation_, not data _usage_. With the single exception of string
    values, there is no intrinsic support for numbers, booleans, or any other primitive type.

  + LSON supports arbitrary _elements_: domain-specific data values with declared or unknown type.
    Elements provide support for arbitrary domain-specific values, such as `true`, `null`,
    `infinity`, `2018-07-02`, `#6b17ec`, `0x1138`, `(x,a,b) => { a <= x && x <= b }` and so forth.
    LSON encoders and decoders handle both known and unknown types in a consistent and predictable
    manner.

  + LSON supports four intrinsic data structures:
      - array
      - dictionary (a set of name-value pairs)
      - table
      - graph


LSON Example
====================================================================================================
The following is an LSON snippet to illustrate various aspects of the notation, before we dive
deeper:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
// Comments are C-style: double slash to end of line, or enclosed with `/*` and `*/`.
/* This is an example using slash-star delimiters. */

// Items may be terminated with whitespace, commas, semi-colons, or object/array terminators.

{
    index: {
        'Gloss Div': {    // There are six legal string-delimiter pairs.
            title: "S"    // No need to quote strings that lack whitespace.
            "Gloss List": {
                `Gloss Entry`: {
                    ID: x112-223
                    SortAs: SGML
                    Acronym: SGML
                    «Gloss Term»: "Standard Generalized Markup Language"
                    Abbrev: (ISO: 8879:1986)    // Element of some type "ISO", value "8879:1986"
                    ‘Gloss Def’: {
                        para: "A meta-markup language, used to create markup languages "
                            + "such as DocBook."
                        “Gloss SeeAlso”: [ 'GML', 'XML', 'HTML' ]
                        'Gloss See': "markup"
                    }
                }
            }
        }
        EntryCount: (count32:1123)    // Element of some type "count32", value "1123"

        // Table
        Content: [#
            [ Term                ; Pages           ; 'See Also'       ]
            //--------------------;-----------------;------------------;
            "ABC Dry-Clean Pad"   ; (30)            ; null             ;
            "Abstract grids"      ; (46-58, 92-104) ; "Gridded Layout" ;
            "Abstract ideas"      ; (37-38, 74-77)  ; null             ;
            "Acceleration"        ; (408-409)       ; Velocity         ;
            // ...
        #]
    }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


File Structure
====================================================================================================

Whitespace
----------------------------------------------------------------------------------------------------
LSON whitespace includes all standard [Unicode whitespace][] characters.

| Unicode | Escape     | Description
|:-------:|:----------:|:-----------
| U+0009  | `\t`       | Tab
| U+000a  | `\n`       | Newline, or line feed
| U+000b  | `\u{0b}`   | Vertical tab
| U+000c  | `\f`       | Form feed
| U+000d  | `\r`       | Carriage return
| U+0020  | `\u{20}`   | Standard space character
| U+0085  | `\u{85}`   | Next line
| U+00a0  | `\u{a0}`   | No-break space
| U+1680  | `\u{1680}` | Ogham space mark
| U+2000  | `\u{2000}` | En quad
| U+2001  | `\u{2001}` | Em quad (mutton quad)
| U+2002  | `\u{2002}` | En space (nut)
| U+2003  | `\u{2003}` | Em space (mutton)
| U+2004  | `\u{2004}` | Three-per-em-space (thick space)
| U+2005  | `\u{2005}` | Four-per-em-space (mid space)
| U+2006  | `\u{2006}` | Six-per-em-space
| U+2007  | `\u{2007}` | Figure space
| U+2008  | `\u{2008}` | Punctuation space
| U+2009  | `\u{2009}` | Thin space
| U+200a  | `\u{200a}` | Hair space
| U+2028  | `\u{2028}` | Line separator
| U+2029  | `\u{2029}` | Paragraph separator
| U+202f  | `\u{202f}` | Narrow no-break space
| U+205f  | `\u{205f}` | Medium mathematical space
| U+3000  | `\u{3000}` | Ideographic space


Terminators
----------------------------------------------------------------------------------------------------
Each item in a sequence must be terminated with whitespace, a closing delimiter of some kind, or
with a comma (`,`) or semi-colon (`;`). Though not technically whitespace, commas and semi-colons
serve the same role in separating values, and are treated equivalently.

Since commas and semi-colons are parsed as whitespace, they are not interpreted in any syntactically
meaningful way. For example, the sequence `a,,,,b,,c` is interpreted as three items `a`, `b`, and
`c`.

Terminating values with commas or semi-colons is completely optional, and is supported only as an
aid to readability, at the discretion of the author.
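
The terminator rule above can be illustrated with a minimal Python sketch. It is not a reference implementation: the function name `split_items` is hypothetical, and the sketch assumes the input holds only unquoted items (no quoted strings, comments, elements, or nested structures).

```python
import re

# Assumption: the input holds only unquoted items; quoted strings, comments,
# and nested structures are outside the scope of this sketch.
# Python's \s is only an approximation of the LSON whitespace table: it may
# also match a few control characters that the table does not list.
_TERMINATORS = re.compile(r"[\s,;]+")


def split_items(text):
    """Split a flat run of unquoted items on whitespace, commas, and semi-colons."""
    return [item for item in _TERMINATORS.split(text) if item]


print(split_items("a,,,,b,,c"))      # ['a', 'b', 'c']
print(split_items("a b;c,\td\n"))    # ['a', 'b', 'c', 'd']
```

Because runs of terminators collapse into a single separator, repeated commas never produce empty items.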


Comments
----------------------------------------------------------------------------------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
// Single line comments run from double forward slashes to end of line.

/* Slash-star comments:
   this is probably the best form for block comments. */
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Data Values
====================================================================================================

Strings
----------------------------------------------------------------------------------------------------
Strings are the only natively-supported element type (more on LSON elements later). In addition to
standard double quotes, strings may be quoted with any of five additional quote delimiter pairs.
This provides a clean way to avoid the necessity of escaping delimiters in most complex strings.

| Quotes       | Character Codes | Description                                       |
|:------------:|:---------------:|:--------------------------------------------------|
| `"string"`   | U+0022          | Quotation Mark                                    |
| `'string'`   | U+0027          | Apostrophe                                        |
| `“string”`   | U+201c, U+201d  | {Left,Right} Double Quotation Mark                |
| `‘string’`   | U+2018, U+2019  | {Left,Right} Single Quotation Mark                |
| `«string»`   | U+00ab, U+00bb  | {Left,Right}-Pointing Double Angle Quotation Mark |
|````string````| U+0060          | Backtick – **raw strings only**                   |

The last form, using backticks, expresses a _raw string_. Raw strings are interpreted literally,
with no processing of escape sequences (see the following section on escape sequences). The one
exception is the escape `\````, which may be used to specify a backtick within a raw string, like
so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
`This is a \`test\`\n of the emergency\n broadcast system.
\u26a0`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

which would yield the following string:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
This is a `test`\n of the emergency\n broadcast system. \u26a0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Escape Sequences
In non-raw strings, escape sequences are processed as follows:

| Sequence         | Description                                           |
|:-----------------|:------------------------------------------------------|
| `\b`             | Backspace                                             |
| `\f`             | Form feed                                             |
| `\n`             | New line                                              |
| `\r`             | Carriage return                                       |
| `\t`             | Horizontal tab                                        |
| `\uXXXX`         | Unicode character from four hexadecimal digits        |
| `\uXXany`        | Not a legal Unicode escape; yields `uXXany`           |
| `\u{X...}`       | Unicode character from 1-8 hexadecimal digits         |
| `\u{}`           | Not a legal Unicode escape; yields `u{}`              |
| `\u{1234567890}` | Not a legal Unicode escape; yields `u{1234567890}`    |
| `\X`             | Yields any character verbatim, such as `\'` or `\\`   |

### String Concatenation
In order to support human-readable long strings, the `+` operator may be used to construct
concatenations. For example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{
    strBlock: "Knock knock.\n"
            + "Who's there?\n"
            + "Bug in your state machine.\n"
            + "Who's there?\n"
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Strings may include the source line terminators. However, this will capture exactly the actual line
endings used in the LSON source.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{
    strBlock: "
        Knock knock.
        Who's there?
        Bug in your state machine.
        Who's there?
    "
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The string defined in the above example includes all whitespace and line endings present between the
opening and closing `"` characters.


Elements
----------------------------------------------------------------------------------------------------
LSON really has a single value type: the _element_. Unlike JSON, any string-representable value is
supported and handled consistently, but interpretation is up to the decoding application or context.
Applications that do not handle a particular element type natively will process that value using its
string representation, while preserving its (possibly unknown) type. The element type (which might
be "unknown") is preserved when re-encoding after any transformation or transmission.

### Element Types
Elements have two components: type and value. The element type is optional, and defaults to
"unknown" if not specified. At its most basic, an element has the following syntax:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
( type : value )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Elements with a declared type may take several forms. For example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
(type:value)                // Type "type", value "value"
("thing" : "xyzzy")         // Type "thing", value "xyzzy"
(color:#f863b2)             // Type "color", value "#f863b2"
( float32 : 334.1 )         // Type "float32", value "334.1"
(readyState: armed)         // Type "readyState", value "armed"
( a b c : This is a test )  // Type "a b c", value "This is a test"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that leading and trailing whitespace is ignored, and not considered part of the type or value.
However, both types and values may themselves _contain_ whitespace.
Quoting can be used to preserve leading or trailing whitespace in types or values.

Type IDs are case-insensitive, and followed by a colon. (Thus, type names must either be quoted or
escape any contained colons.) Once the first colon is encountered, any subsequent colon is
interpreted as part of the value.

Types may be omitted. If the type is omitted, the colon itself may be present or omitted. The
following are equivalent examples of an _untyped_ element, both with value `"a:b:c"`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
("a:b:c")
(:a:b:c)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first colon after an element's opening parenthesis is treated as terminating the element's type.
All subsequent colons are considered part of the element value. For example, an element with type
`width:height` and value `150:400`, or an untyped element with value `150:400`, could be expressed
in any of the following ways:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
(width\:height: 150:400)
("width:height": 150:400)
('150:400')   // Omitted type
(150\:400)    // Omitted type
(:150:400)    // Omitted type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In general, either avoid or quote type names with colons. Untyped values with colons are easy enough
to specify using the last form above for unspecified types, with a colon immediately following the
opening parenthesis.

### Element Values
As shown in examples above, the element value may be either unquoted or quoted in its entirety.
Quoted element values obey the conventions outlined in [Strings][], using any of the six string
delimiters.

Because elements may contain values of foreign syntax, **LSON interprets any contained `)` character
as the element terminator**.
Consider the following (**erroneous**) example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
(
    gronkScript:
    konst force ← "gravity(2.3)";
    konst elapsed ← 1.223;
)   // ERROR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As soon as the LSON parser encounters `konst`, it parses the element value as an unquoted string. In
that mode, it will terminate the element value at the `)` that follows `2.3`, _not_ at the final `)`
after `1.223;`. Elements that might contain complex values should therefore either be quoted
entirely, or contained in [Element Value Blocks][] (described below).

### The Null Element
The special element `()` represents an empty, or null element. Null elements may also be typed, as
in `(id:)`, or `(email:)`.

### Elements of Type String
As pointed out earlier, strings are the only element type that LSON recognizes implicitly. Since
strings are natively supported, string quotes are sufficient to recognize the element type
("string") and value (the quoted content). Thus, the following are all equivalent:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
(string:"This is a string.")  // Fully-qualified element of type "string"
(string: This is a string)    // Value quotes optional when inside parentheses
"This is a string"            // Value recognized as type "string"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Element Value Blocks
As noted above, an _element_ is basically an arbitrary foreign syntactic structure. Most uses of
LSON elements will be for simple values in other domains, as we've seen above. Some elements,
however, might have quite a complex representation, both in syntax and in length. For example, it
should be simple to embed a 250-line script inside LSON. In my experience, I've seen JSON inside a
script inside JSON (not because it's good, but because it's necessary).

To this end, elements may employ block delimiters, using a locally-unique identifier.
Block-delimited elements begin with `((id` where `id` is an arbitrary string. As soon as the
character sequence `id))` is encountered, the element is closed. Here's an example of a complex
element with block delimiters:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
frotz: ((xyzzy python :
    import MySQLdb
    db = MySQLdb.connect("localhost","username","password","dbname")
    cursor = db.cursor()
    sql = "select Column1,Column2 from Table1"
    cursor.execute(sql)
    results = cursor.fetchall()
    for row in results:
        print(row[0]+row[1])
    db.close()
xyzzy))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Note that the closing identifier must match the case of the opening identifier exactly. For example,
`GREEN))` matches `((GREEN`, but not `((Green`.

Now consider the following (**erroneous**) LSON fragment:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
frotz: ((Klaatu blargScript :
    gargle("Hey, I have a ((Klaatu)) inside me!")
Klaatu))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The element above terminates at `Klaatu))` inside the string, not at the last line (thus yielding a
syntax error). This strict syntax is actually an advantage, because it leaves the element value free
to use any arbitrary syntax, and LSON will dutifully accumulate the string representation of that
element until it encounters the element block terminator.

As another example of the syntactic freedom, here's a fragment that totally diverges from LSON
syntax:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
frotz: ((barada niktoScript:
    Look! Unterminated string chars: " ' » ) ... wait, there's more ... ] } %> #> Zing!
barada))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Safe and legal.

As a final note, because element values encode a foreign syntax, language constructs such as
linefeeds, whitespace, and comments are all interpreted **literally** as part of the element value.
Thus:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
jimjam: ((block someScript:
        (1.2 / 3 * (25.6)) // I am not an LSON comment.
    block))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is equivalent to:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
jimjam: (someScript: "\n        (1.2 / 3 * (25.6)) // I am not an LSON comment.\n    ")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Decoding Elements
All elements have a string representation of their value. In addition, for elements with declared
type, decoders may use this information to generate a native value of that type. For example, the
element `(boolean:true)` always has the string representation "true", and may have a decoder's
native Boolean value `True`.

Decoders are thus domain-specific, and may handle a mix of elements of both known and unknown types.
This approach to typing allows unknown types to be handled consistently across encode-decode
transitions, and across data queries and transforms. In this manner, a C++ decoder could
meaningfully and consistently operate on LSON intended for a Python endpoint, with values like
`False` or `None`.

### Untyped Elements
Elements may omit type information, as in `(1.23456)` or `(:s/ab/xy/g)`. As for all elements, both
of these cases have their string representation. However, decoders will typically be able to deduce
the type of an element, according to a scheme of their choosing.
For some element types and decoders, this can be fairly trivial:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
(nullptr) → (nullptr:nullptr)   // C++
(true)    → (bool:true)         // C++
(true)    → (Boolean:true)      // JavaScript
(true)    → (Boolean:True)      // Python
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A JSON-style decoder might employ a sequence of three recognizers:

  1. null
  2. boolean
  3. number

Other common types may have associated type recognizers:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
(-1.234e6) → (Number:-1.234e6)       // JavaScript
(-1.234e6) → (double:-1.234e6)       // C++
(1..10)    → (range:"range(1,10)")   // Python
(#a3f4b9)  → (color:0xa3f4b9ff)      // CSS color in C++
(0x3ff0'0000'0000'0000) ...          // In C++, could be recognized as `uint64_t`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One reasonable approach for decoders is to maintain an ordered list of recognizers that are employed
in sequence to attempt to recognize a given untyped value. Enumerations such as `true`, `false`,
`null`, `None` and so forth would be first in the list, with more complicated types (like numbers,
colors, regular expressions) tried later in the sequence.

Note that values need only be recognized if the decoder intends to perform native operations with
those values. Decoders that just perform queries, transforms, or transmission need not care about
underlying implementations.

By convention, elements that are consumed and unmodified should be preserved exactly across
decode-encode transitions. For example, the element `(true)` might be interpreted by a Python
decoder as element of type `Boolean`, value `True`. If it's not modified, however, any subsequent
encoding should emit the original `(true)` form.
That leaves it free for natural consumption by a C++ application, for example, while preserving the
original meaning (an untyped element with value `"true"`).


Bare Values
----------------------------------------------------------------------------------------------------
A _bare value_ is an element that is not surrounded by parentheses. If the bare value is
string-quoted, then it is recognized as an element of type string. If the bare value is not quoted,
it is considered an untyped element. Consider the following bare values:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{
    thing1: "true"         // Element (string:true)
    thing2: 'two words'    // Element (string:two words)
    thing3: true           // Element (:true)
    thing4: 1.37           // Element (:1.37)
    thing5: plaid          // Element (:plaid)
    thing6: red\ blue      // Element (:red blue)
    thing7: (trog)         // Element (:trog)
    thing8: (a + b)        // Element (:a + b)
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Bare values that contain whitespace must either escape that whitespace (by prefixing with the `\`
character) or must use parenthesis delimiters.

Bare values are one of the key features of LSON. Whereas JSON supports a partial set of values
(boolean, null, numbers), it lacks other values that would be equally useful in different contexts:
hexadecimals, CSS colors, native primitive values (`None`, `Any`), and so on. The promotion of bare
value to untyped element provides a succinct way to express arbitrary values in different domains,
while at the same time allowing for consistent treatment and handling of unrecognized types. Thus,
LSON establishes a hard boundary between data _representation_ and data _usage_.

On re-encoding, _an encoder should preserve original bare values as bare values_. For example, if a
transforming program decodes the bare literal `cromulent`, then any subsequent encoding should emit
`cromulent`, not `(cromulent)` or `(:cromulent)`.
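
One way to honor this round-trip convention is to record, for each decoded element, whether it was written bare. The Python sketch below shows the idea under stated assumptions: the `Element` class, its field names, and the `decode_word` and `encode` helpers are all hypothetical, and the sketch ignores quoting, escaping, and values that contain whitespace or `)`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    value: str                  # string representation, always retained
    type: Optional[str] = None  # None models the "unknown" type
    bare: bool = False          # True if the source used no parentheses


def decode_word(word):
    """Promote an unquoted bare word to an untyped element."""
    return Element(value=word, bare=True)


def encode(element):
    """Re-emit an element, keeping bare values bare."""
    if element.type is not None:
        return f"({element.type}:{element.value})"
    if element.bare:
        return element.value
    # A leading colon keeps colons in an untyped value out of the type slot.
    prefix = ":" if ":" in element.value else ""
    return f"({prefix}{element.value})"


print(encode(decode_word("cromulent")))          # cromulent
print(encode(Element("150:400")))                # (:150:400)
print(encode(Element("1123", type="count32")))   # (count32:1123)
```

A decoder that also attaches a native value (for example, a Python `True` for `true`) could store it alongside `value` without affecting what `encode` emits for unmodified elements.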
In situations where it makes sense, recognizable values such as `true` or `0x113f` should also be
encoded as bare values, if these types are expected to be recognizable by a subsequent decoder.

### Word → Element Promotion
A specific application may choose to recognize and handle a select set of element types. In such a
case, it is up to the application to specify the types handled, and the order in which they are
recognized. For example, a standard JSON-type parser would handle the following ordered list of
element types:

  1. null (`null`)
  2. boolean (`true`, `false`)
  3. real numbers (`[-]*[digit]*[.digit*][[eE][+-]?digit+]`, or some such syntax)

Other common parsers might support CSS values, ±infinity, and so forth. See [Element Types][] for a
set of common element types.

### Bare Value Concatenation
The concatenation operator always promotes words to strings, to produce a string-valued result. For
example, the result of `0. + 123 + e10` would yield `(string:0.123e10)`, not `(0.123e10)`, which
might get promoted to `(number:0.123e10)`.


Data Structures
====================================================================================================

Arrays
----------------------------------------------------------------------------------------------------
Arrays encode ordered lists of items. They have the following properties:

1. They begin with a left square bracket (`[`, `U+005b`), followed by zero or more values, and
   terminated with a right square bracket (`]`, `U+005d`).

2. Each array element may be any LSON value (elements, strings, arrays, tables, or whatever).
   Elements may be any combination of types.

3. Each array value is terminated with whitespace, a comma, a semi-colon, or the array closing
   delimiter.

4. Arrays are contiguous. That is, there is no way in LSON to indicate an undefined element, or to
   specify a sparse array.
   If truly sparse arrays are desired for a particular encoding, it is recommended either that
   dictionaries be used with numeric key values, or that a special value be designated to indicate
   an empty element. Interpretation of such an array would depend on a particular encoder/decoder.


Dictionaries
----------------------------------------------------------------------------------------------------
Dictionaries are sets of key-value pairs. Keys are string values, and hence may be either quoted or
unquoted. Dictionaries have the following syntax:

1. They begin with a left curly bracket (`{`, `U+007b`), followed by zero or more key-value pairs,
   followed by a right curly bracket (`}`, `U+007d`).

2. Each dictionary pair is an ID, followed by a colon, followed by the value.

3. Key-value pairs are terminated with whitespace, a comma, a semi-colon, or the dictionary closing
   delimiter.

### Multi-Keys
Multiple keys may be assigned a single value using an array-like syntax:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{
    [ red orange yellow ]: true
    [ green cyan blue violet ]: false
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Key Redefinition
The first definition of a key wins -- all subsequent definitions are ignored. It is recommended that
parsers provide warnings to catch accidental key collisions.


Tables
----------------------------------------------------------------------------------------------------
Tables (like CSV files) are 2D entities with multiple rows (points) of data, where each dimension
has an associated label. Tables have the following properties:

  + Tables are delimited with `[#` and `#]` tokens (no whitespace is allowed between delimiter
    characters).

  + Each row feature may be any legal LSON value. So you could have cells of arrays, objects,
    graphs, or even other tables.

The following is an example LSON table:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ key1   key2   key3]   // Table Schema
    [thing1  false     3]   // Table Rows
    [thing2  false    13]
    [thing3  true     37]
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, the table _schema_ consists of three _features_: `key1`, `key2`, and `key3`. The
third value of the third row (`37`) is considered a _row value_.

The fragment above uses brackets to delimit table rows, which can aid legibility and debugging.
However, brackets are optional, and the same table could be expressed thus:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ key1  ; key2  ; key3 ]
    // -----;-------;------
    thing1  ; false ; 3       // Schema has 3 features, so first 3 values = first row,
    thing2  ; false ; 13      // and next three values = second row,
    thing3  ; true  ; 37      // and so on...
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the absence of row brackets, a row must contain a value for each feature. Brackets are optional
for each row, so one row may use brackets while another may choose to omit them. Even when all
values are supplied for each row, brackets may be useful as they provide a syntax check.
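
The unbracketed row convention can be sketched in Python as a simple chunking step. This is an illustration rather than a prescribed algorithm: the function name `group_rows` is hypothetical, the values are assumed to be already tokenized into strings, and every row is assumed to be unbracketed with no defaults in play.

```python
def group_rows(features, values):
    """Chunk a flat list of cell values into rows of len(features) cells each."""
    width = len(features)
    if width == 0:
        # Assumption of this sketch: a schema names at least one feature.
        raise ValueError("schema must name at least one feature")
    if len(values) % width != 0:
        # Without row brackets, every row must supply a value for each feature.
        raise ValueError(
            f"{len(values)} values do not fill whole rows of {width} features"
        )
    return [
        dict(zip(features, values[start:start + width]))
        for start in range(0, len(values), width)
    ]


features = ["key1", "key2", "key3"]
values = ["thing1", "false", "3",
          "thing2", "false", "13",
          "thing3", "true", "37"]
rows = group_rows(features, values)
print(rows[2])  # {'key1': 'thing3', 'key2': 'true', 'key3': '37'}
```

The length check is where an unbracketed table loses the per-row syntax check that brackets provide: a single missing value shifts every later row and can only be detected at the end of the table.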

As with objects and arrays, optional comma or semi-colon terminators may be used to aid readability,
like so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [key1,key2,key3]
    thing1,false,3;
    thing2,false,13;
    thing3,true,37;
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Potentially ambiguous sequences (brackets and hash characters) can usually be resolved with
whitespace, like so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[ #ff8cee #Nan# ]   // An array with a CSS color and a special value; NOT a table.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Default Table Values
Tables may optionally define default element types or element values (or both) for each feature.
Defaults follow a colon (`:`) after the feature name, like so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [id:0 role:scout speed:10]
    ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using bracketed tables, default values can be used to set missing values when fewer values are
specified than there are columns.
For example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ id    status:idle  ttl:120 ]
    [ a173  running      300 ]      // id = a173, status=running, ttl=300
    [ b2fc  init ]                  // id = b2fc, status=init, ttl=120 (default)
    [ 781d ]                        // id = 781d, status=idle (default), ttl=120 (default)
    [ ]                             // Error: feature 'id' has no default value
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition, the special character `~` can be used in row data to specify the default value for that
feature, allowing for sparse table specifications:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ id:0000  status:idle  ttl:120 ]
    ~     stopped  0      // id = 0000, status=stopped, ttl=0
    b2fc  init     ~      // id = b2fc, status=init, ttl=120
    781d  ~        240    // id = 781d, status=idle, ttl=240
    ~     running  ~      // id = 0000, status=running, ttl=120
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The above examples use bare values as the default. However, defaults can be any element, like so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ id:(count32:0) lat:(real:0.00) lon:(real:0.00) strength:(HCategory:1) ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Feature defaults can be any legal LSON value, so arrays, dictionaries, tables and graphs can also be
used as default values:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ id:(count64:0) xform:[[1 0][1 0]] up:[0 1 0] meta:{label:"none",color:black} ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition, one can use a type element to set the default _element type_ of table features, while
also requiring an explicit value of that type, like so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ id: (count32:)  lat: (real:)  lon: (real:)  strength: (HCat:) ]
    //____  ______  _______  _
    [ 01ca  -12.30  110.41   1 ]   // (count32:01ca), (real:-12.30), (real:110.41), (HCat:1)
    [ 021s  ~       70.58    3 ]   // Error: No default value for 'lat'.
    [ 9afb ]                       // Error: No default value for 'lat', 'lon', 'strength'.
    [ 47b1  [1,4]   11.12    2 ]   // Error: Feature 'lat' has elemental data type.
    ...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A standard CSV-type table might default all values to the null element:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ invoice:() date:() customerid:() amount:() address:() ]
    ...
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Dictionary Table Rows
In addition to unbracketed and bracketed table rows, LSON supports dictionary table rows. In this
case, a row is delimited with curly braces (`{`, `}`), and the dictionary keys are the formal
feature names specified in the table schema. Each row feature may get the default value either by
omitting the key entirely, or by explicitly using the special value `~`.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ a:true, b:1.0, c:"foo", d:none, e:normal, f:100%, g:[[1 0][0 1]] ]
    :
    { b:22.3, d:all }                  // Features a,c,e,f,g defined with default values
    { g:rotate(30), f:50%, c:"bar" }   // Features a,b,d,e defined with default values
    { a:false, e:heavy, b:~, g:~ }     // Features b,c,d,f,g defined with default values
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The above result is a table with three rows, where each row has all values defined, some explicitly and some via default values.

Dictionary table rows can contain several kinds of schema-mismatch errors:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[#
    [ a:(boolean:), b, c:"foo", d:none, e:normal, f:100%, g:[[1 0][0 1]] ]
    { b:3.7 }
    ^ Error: required feature 'a' not defined (it has default type, but not value)
    { a:true, b:4.6, x:red }
    ^ Error: unrecognized feature 'x'
#]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Graphs
----------------------------------------------------------------------------------------------------

LSON supports graph data, where a graph is defined by a set of nodes and a set of edges between node pairs. Graphs have the following properties:

+ Graphs are delimited with `[%` and `%]` tokens (no whitespace is allowed between delimiter characters). Inside are two objects: a node set, and an edge set.
+ Each node may or may not have associated data.
+ Each edge may or may not have associated data.
+ Nodes may be unnamed, in which case they are referenced by index.
+ Nodes may be named, in which case they are referenced by name.
+ Edges may be directed or undirected.
+ An edge may leave and arrive at the same node.
+ There may be many edges between any pair of nodes.
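
One way to picture the properties listed above is as an in-memory model that a decoder might build. The following Python sketch is an assumption made for illustration only; the `Graph` and `Edge` classes and the `add_edge` method are hypothetical names, not part of LSON. Nodes are keyed by index or by name, node and edge data are optional, and the edge list permits self-loops and parallel edges.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, List


@dataclass
class Edge:
    source: Hashable        # node index or node name
    target: Hashable
    directed: bool = True   # False models an undirected edge
    data: Any = None        # optional edge data


@dataclass
class Graph:
    # Maps a node reference (index or name) to its data, or to None when it has no data.
    nodes: Dict[Hashable, Any] = field(default_factory=dict)
    # A list rather than a set, so several edges may join the same node pair.
    edges: List[Edge] = field(default_factory=list)

    def add_edge(self, source, target, directed=True, data=None):
        for ref in (source, target):
            if ref not in self.nodes:
                raise KeyError(f"unknown node {ref!r}")
        edge = Edge(source, target, directed, data)
        self.edges.append(edge)
        return edge


g = Graph(nodes={"a": None, "b": {"weight": 2}})
g.add_edge("a", "b")                    # directed edge without data
g.add_edge("a", "b", data="second")     # a parallel edge between the same pair
g.add_edge("b", "b", directed=False)    # an undirected self-loop
print(len(g.edges))                     # prints 3
```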
### General Graph Structure

Each LSON graph is expressed in the following pattern:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    // Node Data
    // Edge Data
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Graph Nodes

Graph nodes may be expressed in a number of ways, depending on

- whether they are to be referenced by index or by name, and
- whether they have associated data, and
- whether they are implicitly or explicitly defined.

Unnamed nodes use an array-like container, while named nodes use a dictionary-like container.

#### Implicitly Defined Nodes

The simplest node set is specified entirely by the set of all nodes referenced in the edge set. If this is sufficient, the node set consists entirely of the keyword `auto`.

Implicitly-defined nodes have no data. Graphs defined with implicit nodes have no way to specify nodes that are unreferenced in the edge set.

#### Unnamed Graph Nodes Without Data

If nodes are to be referenced by index (from 0 onward), and have no data, then simply specifying the number of nodes is sufficient. The node count is expressed as a non-negative decimal integer (zero is allowed). Unlike implicitly-defined nodes, this format provides a way to include nodes that exist but have no associated edges.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
1000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Unnamed Graph Nodes With Data

Unnamed nodes can be referenced by index in an array of node data. In this example, each node has 2D coordinate data, but the array can contain any type of data per node.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[  [3 2] [2 2] [2 1] [1 3] [1 2] [1 1] ]
// ----- ----- ----- ----- ----- -----
//   0     1     2     3     4     5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if nodes have number-like values, they must still be referenced by their array-like index:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[
    1  // Node 0 has data (1)
    2  // Node 1 has data (2)
    3  // Node 2 has data (3)
    5  // Node 3 has data (5)
    8  // Node 4 has data (8)
]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Named Graph Nodes Without Data

To define a set of nodes with names (keys), use a dictionary. To express a set of named nodes without additional data, use a dictionary with multi-keyed dummy data, like so:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{ [ bargaining testing anger shock acceptance depression denial ]: () }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, the null element is a simple convenience. The actual value could be anything, like `0` or `""`, but this allows nodes to have an associated name or key.

#### Named Graph Nodes With Data

To express a set of named nodes with additional data per node, use a dictionary, but with actual data. Each entry specifies the node data by name.
Here, the seven standard colors of the rainbow are given with their CSS color values:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{
    red:    { css:#ff0000  rgb:'rgb(255,0,0)' }
    orange: { css:#ffa500  rgb:'rgb(255,165,0)' }
    yellow: { css:#ffff00  rgb:'rgb(255,255,0)' }
    green:  { css:#008000  rgb:'rgb(0,128,0)' }
    blue:   { css:#0000ff  rgb:'rgb(0,0,255)' }
    indigo: { css:#4b0082  rgb:'rgb(75,0,130)' }
    violet: { css:#ee82ee  rgb:'rgb(238,130,238)' }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Graph Nodes With Tabular Data

Node data may be expressed in the form of a table. Node IDs must be the first column of a node table (the column name is arbitrary), and all such ID values must be unique (whether referenced or not). Here's an example:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    [#
        [nodeID, color:(color:black), weight:(number:0)]
        a107c5, ~,      1.00;
        8c78e5, blue,   1.00;
        73ba4d, ~,      2.30;
        2b0ebb, indigo, 0.21;
    #]
    [
        a107c5 → a107c5,
        a107c5 → 2b0ebb,
        ...
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Graph Edges

Edges are expressed as a set of node pair relationships. A node relationship consists of a node index or name, followed by a relationship character, followed by a second node index or name. The following are examples of node edges:

| Edge               | Interpretation
|--------------------|------------------------------------------
| `a - b` or `a ↔ b` | An undirected edge between nodes a and b
| `a > b` or `a → b` | A directed edge from node a to node b
| `a < b` or `a ← b` | A directed edge to node a from node b

_The special Unicode characters above are ↔ (U+2194), → (U+2192), and ← (U+2190)._

Because node names may themselves contain relationship characters, ambiguity is possible.
In general, parsing will consider the first encountered relationship character as a part of the edge description, and not part of the node name. If a node name contains any of the above six characters, it is best to quote the names to avoid confusion.

Here are parsing examples for edge specifications:

| Edge Spec   | Result
|:------------|:----------------------------------------------------------------
| `a>b`       | Interpreted as `'a' → 'b'`
| `a > b`     | Interpreted as `'a' → 'b'`
| `'a-b'-c`   | Interpreted as `'a-b' ↔ 'c'`
| `a-b-c`     | Interpreted as `'a' ↔ 'b-c'`
| `a-b>c`     | Interpreted as `'a' ↔ 'b>c'`
| `a - b - c` | Interpreted as edge `a ↔ b`, node name `'-'`, illegal edge `c`

#### Graph Edges Without Data

Edges without data are specified as an array of edges. Here's an example using node indices:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[ 0→500 1→548 2→23 3→897 ... ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another example, this time using node names:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[
    shock → denial
    denial → anger
    anger → bargaining
    bargaining → depression
    depression → testing
    testing → acceptance
]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Graph Edges With Data

Edges with data use a dictionary where each property is a node edge, and the property value is the data associated with the edge. This example references named nodes to specify edges colored with CSS colors.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
{
    'upper-left' ← 'upper-right': #888888
    'mid-left'   - 'upper-left' : #666666
    'mid-left'   → 'mid-right'  : #444444
    'mid-right'  - 'lower-right': #222222
    'lower-left' ← 'lower-right': #000000
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Graph Edges With Tabular Data

Graph edges may be expressed using tabular data. As with node tables, the first column of edge tables is special, and expected to hold the edge expressions. The name of the first column is not significant. Here's an example of edge data using a table:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    ....
    // Row data
    [#
        [ edge  ; status ; frinkiness    ]
        //===== ; ====== ; ==============
        [ 2 > 0 ; hot    ; zoobnificent  ]
        [ 2 > 1 ; tepid  ; cromulipitant ]
        [ 1 > 2 ; molten ; breg          ]
        ...
    #]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Directed Graph Edges Via Adjacency Matrix

Directed graph edges may be expressed via adjacency matrix. In a directed adjacency matrix, each edge is found in a square **N**×**N** matrix, where **N** is the number of nodes in the graph. In this form, nodes are referenced by index (not by name), and the edge from node **A** to node **B** is found in the Ath row, Bth column, with that entry containing the data for that edge.

Graphs using adjacency matrices must have indexable nodes. Named nodes are not supported in conjunction with adjacency matrices.
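
The row/column rule can be sketched in a few lines of Python. This is an illustrative assumption about how a consumer might expand an already-decoded matrix into individual edges; the function name `directed_edges` and the sample values are hypothetical, and the matrix is modeled as a plain list of lists.

```python
def directed_edges(matrix):
    """Yield (source, target, data) for each entry of a square directed adjacency matrix."""
    n = len(matrix)
    for a, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError(f"row {a} has {len(row)} entries, expected {n}")
        for b, data in enumerate(row):
            # Row index is the node the edge leaves; column index is the node it arrives at.
            yield (a, b, data)


# Hypothetical 2x2 matrix: entry [a][b] holds the data for the edge a -> b.
matrix = [
    [0.5, 0.5],
    [0.9, 0.1],
]
edges = list(directed_edges(matrix))
print(edges[1])   # prints (0, 1, 0.5)
print(edges[2])   # prints (1, 0, 0.9)
```

Because every entry of the matrix is visited, the sketch also yields self-edges such as `0 -> 0`; a non-square matrix is rejected.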
Here's an example of an adjacency matrix for a directed graph of three nodes:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    [ scout dig sleep ]
    [
        [ .67 .12 .21 ]  // Probability of scout→scout, scout→dig, scout→sleep
        [ .74 .21 .05 ]  // Probability of dig→scout, dig→dig, dig→sleep
        [ .23 .29 .48 ]  // Probability of sleep→scout, sleep→dig, sleep→sleep
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

#### Undirected Graph Edges Via Adjacency Matrix

Adjacency matrices with undirected edges require only an upper triangular matrix, where each entry describes an edge between one index and an equal or greater index. Thus, the _i_-th row (counting rows from zero) has **N** - _i_ entries.

Graphs using adjacency matrices must have indexable nodes. Thus, named nodes are not supported in conjunction with adjacency matrices.

Here's an example of an undirected adjacency matrix:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    3  // Three nodes without data: 0, 1, 2
    [
        [ 23.1 445. 0.12 ]  // 0-0 0-1 0-2
        [ 1.72 34.7 ]       // 1-1 1-2
        [ 7.56 ]            // 2-2
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Graph Examples

Implicitly-defined nodes:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    auto
    [
        rock → scissors
        scissors → paper
        paper → rock
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Indexed nodes without data, edges without data:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    1000
    [
        0→500;
        1→548;
        2→23;
        3→897;
        ...
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Indexed nodes with 2D coordinate data, plus edges without data:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    [ [3 2] [2 2] [2 1] [1 3] [1 2] [1 1] ]
    [ 0-3, 1-4, 2-5, 3-4, 1-2 ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The seven stages of grief: named nodes without data, edges without data, one node without any edges:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    { [bargaining testing anger shock acceptance depression denial combustion]: () }
    [
        shock → denial
        denial → anger
        anger → bargaining
        bargaining → depression
        depression → testing
        testing → acceptance
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, a railroad (parsing) graph for floating point numbers:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ LSON
[%
    // Named nodes without data
    {
        [
            start wholeDigit fractionalDigit exponentCharacter sign decimalPoint
            exponentSign exponentDigit end
        ]: ()
    }

    // Edges without data
    [
        start → sign
        start → wholeDigit
        start → decimalPoint
        wholeDigit → wholeDigit
        wholeDigit → exponentCharacter
        wholeDigit → decimalPoint
        wholeDigit → end
        decimalPoint → fractionalDigit
        decimalPoint → exponentCharacter
        decimalPoint → end
        fractionalDigit → fractionalDigit
        fractionalDigit → exponentCharacter
        fractionalDigit → end
        exponentCharacter → exponentSign
        exponentCharacter → exponentDigit
        exponentSign → exponentDigit
        exponentDigit → exponentDigit
        exponentDigit → end
    ]
%]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Appendix A: Grammar
====================================================================================================
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lson-file ::=
line-terminator ::= U+000a | U+000b | U+000c | U+000d | U+0085 | U+2028 | U+2029
comment-line-remainder ::= "//" <any character not line-terminator>*
comment-block ::= "/*" <any character sequence not containing "*/"> "*/"
whitespace-item ::= | | | U+0009 | U+0020 | U+00a0 | U+1680 | U+2000 | U+2001 | U+2002 | U+2003 | U+2004 | U+2005 | U+2006 | U+2007 | U+2008 | U+2009 | U+200a | U+2028 | U+2029 | U+202f | U+205f | U+3000
whitespace ::= +
value ::= | | | | | |
string-character ::= | "\b" | "\f" | "\n" | "\r" | "\t" | "\u" {4} | "\u{" + "}" | "\"
hex ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F"
word ::= +
string ::= |
quoted-string ::= not matching begin quote>* *
unquoted-string ::= + *
string-begin-quote ::= "\"" | "'" | "`" | "«" | "‘" | "“"
string-end-quote ::= "\"" | "'" | "`" | "»" | "’" | "”"
concatenated-string ::= "+" ( | )
id ::=
terminator ::= "," | ";" | | null-before-closing-delimiter
element ::= | |
null-element ::= "()"
untyped-element ::= "(" ")" | "((" "))"
typed-element ::= "(" ":" ")" | "((" ":" "))"
typeonly-element ::= "(" ":" ")"
dictionary ::= "{" "}"
dictionary-body ::= *
dictionary-item ::= ":"
dictionary-key ::= | "[" ( )+ "]"
array ::= "[" * "]"
array(n) ::= "[" {n} "]"
array-item ::=
table ::= "[#" "#]"
table-body ::= *
table-schema ::= "[" * "]"
table-feature ::= | ":" | ":"
table-row ::= | |
table-row-bare(numFeatures) ::= ( ){numFeatures}
table-row-bracketed ::= "[" * "]"
table-row-value ::= | "~"
table-row-dictionary ::= "{" "}"
table-row-dictionary-body ::= *
table-row-dictionary-item ::= ":" ( | "~")
graph ::= explicit-graph | adjacency-graph
explicit-graph ::= "[%" "%]"
adjacency-graph ::= "[%" (numNodes) (numNodes) "%]" "[%" "%]"
graph-nodes ::= "auto" | | |
indexed-nodes ::= | (numNodes)
named-nodes ::=
node-table ::= < table >
explicit-graph-edges ::= | |
edge-array ::= "[" * "]"
edge-dictionary ::= "{" ( ":" )* "}"
edge-key ::= | "[" ( )+ "]"
edge-table ::= < table >
adjacency-matrix ::= |
directed-adjacency-matrix ::= "[" (numNodes){numNodes} "]"
undirected-adjacency-matrix ::= "[" (numNodes) (numNodes-1) .. (1) "]"
edge ::=
node-ref ::= |
node-index ::=
edge-type ::= '-' | '↔' | '>' | '→' | '<' | '←'

____

(...)   Ordered group of tokens
?       Denotes zero or one
*       Denotes zero or more
+       Denotes one or more
{n}     Denotes exactly n s
{n+}    Denotes common n s, where n is one or more
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additional Material
====================================================================================================

- [LSON Examples](./examples.md.html)
- [Element Types][]

[Default Table Values]: #datastructures/tables/defaulttablevalues
[Element Types]: ./ElementTypes.md
[Element Value Blocks]: #datavalues/elements/elementvalueblocks
[Strings]: #strings
[Unicode whitespace]: https://en.wikipedia.org/wiki/Whitespace_character#Unicode