Small int and untagged immediate literals #4635

Merged
jra4 merged 16 commits into main from jra.small-ints
Sep 10, 2025
@jra4 jra4 commented Sep 3, 2025

This feature adds literals for the remaining integer types:

  • int8: 42s
  • int16: 42S
  • int8#: #42s
  • int16#: #42S
  • int#: #42m

The literal syntax #42 is not an option because it causes an ambiguity with line number directives. The literal syntax 42m for regular tagged integers was also added to mirror the untagged immediate literals.

The changes are mostly straightforward since we don't need to modify the parser. The one rough edge is converting strings to small ints. Since small ints are represented as regular ints in the compiler, we could just call int_of_string to convert, but this would not properly handle overflow [1]. Ideally, we would call the C primitive parse_intnat to handle most of the overflow logic, but it takes some of its arguments unboxed. A less ideal but feasible option is to put a C stub in utils/ that calls parse_intnat, but this would be awkward since it would be the only C stub used during typing. I chose instead to reimplement the overflow logic in OCaml (in utils/misc.ml). Extensive tests of this overflow logic are in testsuite/tests/typing/small-numbers/test_enabled.ml.
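To make the overflow concern concrete, here is a minimal sketch of a range-checked conversion in plain OCaml. It is not the PR's cvt_small_int: the name parse_small_int and the ~bits parameter are hypothetical, and the rules it applies are an assumption modelled on how int_of_string treats regular ints (decimal literals must fit the signed range, while 0x/0o/0b/0u literals may use the full unsigned range and wrap around).

```ocaml
(* Hypothetical helper, NOT the PR's cvt_small_int.
   Assumed rules, modelled on int_of_string for regular ints:
   - decimal literals must fit in the signed range of a [bits]-bit int;
   - 0x/0o/0b/0u literals may use the whole unsigned range and wrap. *)
let parse_small_int ~bits (s : string) : int option =
  (* Remember the sign, then parse the rest as a magnitude. *)
  let negative = String.length s > 0 && s.[0] = '-' in
  let body = if negative then String.sub s 1 (String.length s - 1) else s in
  let unsigned_form =
    String.length body >= 2
    && body.[0] = '0'
    && (match body.[1] with
        | 'x' | 'X' | 'o' | 'O' | 'b' | 'B' | 'u' | 'U' -> true
        | _ -> false)
  in
  let half = 1 lsl (bits - 1) in   (* e.g. 128 for 8 bits *)
  let modulus = 1 lsl bits in      (* e.g. 256 for 8 bits *)
  (* Keep the low [bits] bits and reinterpret them as a signed value. *)
  let wrap v =
    let low = v land (modulus - 1) in
    if low >= half then low - modulus else low
  in
  match int_of_string_opt body with
  | None -> None
  (* A negative magnitude means the literal already wrapped as a full int. *)
  | Some magnitude when magnitude < 0 -> None
  | Some magnitude ->
    if unsigned_form then
      if magnitude >= modulus then None
      else Some (wrap (if negative then -magnitude else magnitude))
    else if negative then
      if magnitude > half then None else Some (-magnitude)
    else if magnitude >= half then None
    else Some magnitude
```

Under these assumptions, parse_small_int ~bits:8 accepts "127" and "-128", rejects "128" and "-129", and reads "0xFF" as -1. Whether the merged implementation makes the same choices for prefixed literals should be checked against the tests in test_enabled.ml.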

The choice of m for the untagged immediate suffix is a bit arbitrary. The main consideration is that the suffix isn't overloaded (e.g. #42u would be a bad choice since it could be confused for an unsigned literal).

Reviewing

Start by looking at the tests. In particular, make sure every case of overflowing literals is tested and has behavior similar to regular ints.

Most of the changes in the source are copy-pastes of code that handles the other integer types, so reviewing should be easy. Two places require extra care:

  • In utils/misc.ml, make sure cvt_small_int handles overflow correctly (see the long paragraph above).
  • In the printers, make sure that wherever literals are printed with a suffix, untagged immediates are printed with one too (in an earlier version of this PR, untagged immediates were not given a suffix).
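One way to picture the printer check is the sketch below. The kind type and render function are hypothetical and do not correspond to the compiler's actual constant types or printers; only the suffixes come from this PR's description. The match has no wildcard case, so a kind added later without a suffix shows up as a non-exhaustive match warning instead of being printed silently without one.

```ocaml
(* Hypothetical literal kinds; the compiler's real constant types differ. *)
type kind =
  | Int8
  | Int16
  | Untagged_int8
  | Untagged_int16
  | Untagged_immediate

(* Render non-negative digits with the suffix for each kind.
   No wildcard case: a new kind must be given a suffix explicitly. *)
let render (kind : kind) (digits : string) : string =
  match kind with
  | Int8 -> digits ^ "s"
  | Int16 -> digits ^ "S"
  | Untagged_int8 -> "#" ^ digits ^ "s"
  | Untagged_int16 -> "#" ^ digits ^ "S"
  | Untagged_immediate -> "#" ^ digits ^ "m"
```

The sketch deliberately leaves out negative literals, since where the sign goes relative to the # is not stated in this description.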

Footnotes

  1. Handling overflow is more complex than you'd expect: see https://github.com/ocaml/ocaml/issues/4210.

github-actions bot commented Sep 3, 2025

Parser Change Checklist

This PR modifies the parser. Please check that the following tests are updated:

  • parsetree/source_jane_street.ml

This test should have examples of every new bit of syntax you are adding. Feel free to just check the box if your PR does not actually change the syntax (because it is refactoring the parser, say).

@jra4 jra4 changed the title from "Jra.small ints" to "Small int and untagged immediate literals" Sep 4, 2025
@jra4 jra4 marked this pull request as ready for review September 4, 2025 18:42
@jra4 jra4 requested a review from nmatschke September 4, 2025 19:21
@nmatschke nmatschke left a comment

Neat, thanks. Couple small comments.

@jra4 jra4 enabled auto-merge (squash) September 10, 2025 17:18
@jra4 jra4 merged commit 7ed9ac9 into main Sep 10, 2025
23 checks passed
@jra4 jra4 deleted the jra.small-ints branch September 10, 2025 17:39
mshinwell pushed a commit that referenced this pull request Sep 11, 2025
Co-authored-by: James Rayman <[email protected]>