this is an "assembler" that compiles to a printf loop. i'm not sure that "assembler" is even the right word for it. it feels right, but like, idk. call it whatever you want, point is, it generates C code which pretty much looks like this:
/* a bunch of macros and stuff... */
static unsigned char mem[65536];
int main(void)
{
    while (!*EXIT) {
        printf(/*...*/);
    }
}
for those unaware, printf is turing complete (1, 2). the reason for this
is the %n specifier, which stores(!) the number of bytes printed thus far into
a pointer.
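here's a tiny sketch of %n on its own, in case you haven't seen it before. the function name and the use of snprintf (instead of printf) are just for illustration; none of this is taken from the generated code:

```c
#include <stdio.h>

/* formats text followed by "!" into out, and returns what %n reported:
 * the number of bytes produced before the %n was reached */
int bytes_before_n(char *out, size_t cap, const char *text)
{
    int count = 0;

    /* %n prints nothing; it writes the running byte count through &count */
    snprintf(out, cap, "%s%n!", text, &count);
    return count;
}
```

calling bytes_before_n(buf, sizeof buf, "hello") leaves "hello!" in buf and returns 5, since five bytes had been produced by the time %n was processed.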
this assembler makes it "easy" to write code with printf. the result is slow af
and very cpu intensive, but it works! note that the generated code uses POSIX
positional specifiers (e.g. %2$d), so you'll need a POSIX-compatible libc to
run the resulting code. POSIX only mandates support for 9 positional parameters,
but this assembler generates code which usually uses way more, so you'll need a
libc that supports a large number of positional parameters (i.e. a large value
for NL_ARGMAX). glibc works great; musl and freebsd libc unfortunately only
support 9, so they don't work. i haven't tested any other libcs; i have a
feeling most should probably work though? in the future i might add a mode which
doesn't depend on POSIX, we'll see
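for reference, a positional specifier just picks its argument by index instead of taking the next one in line. a minimal sketch (the function name is made up, and this needs a POSIX libc, since %1$s-style specifiers aren't part of ISO C):

```c
#include <stdio.h>

/* writes "<second> <first>" into out by selecting arguments by position;
 * returns whatever snprintf returns (the length of the full result) */
int format_swapped(char *out, size_t cap, const char *first, const char *second)
{
    /* %2$s selects the 2nd argument after the format, %1$s selects the 1st */
    return snprintf(out, cap, "%2$s %1$s", first, second);
}
```

so format_swapped(buf, sizeof buf, "world", "hello") produces "hello world". the generated code does the same kind of thing, just with way more than 9 arguments, which is where the NL_ARGMAX limit comes in.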
the resulting code extensively uses the terminal's alternate screen buffer, so it's not really possible to meaningfully pipe the output. your terminal emulator must support the alternate screen buffer, and ideally also synchronized output.
as a challenge for myself, i imposed the limitation that the entire source code for the assembler must fit in under 1000 lines, while still being readable (so someone who wasn't familiar with the line limit wouldn't suspect anything).
make
./asm < foo.pfs > foo.c
gcc foo.c # or clang or whatever
./a.out
C23 is required to compile the assembler.
.pfs stands for "printf .s", since it's like, an assembly file but for printf. idk
if you wanna try it for yourself, try with mandelbrot.pfs! it computes and
prints the mandelbrot set entirely in printf
to get an idea of how the assembler works, let's walk through an example fizzbuzz program:
alias i, fizz, buzz
->fizz ([i] + 1) % 3 == 0
->buzz ([i] + 1) % 5 == 0
->i [i] + 1
->exit [i] == 100
[i] if !![i] & ![fizz] & ![buzz]
"Fizz" if [fizz]
"Buzz" if [buzz]
"\n" if [i]
it's a "declarative" language of sorts; it essentially describes how the state
should be mutated on each iteration, and what should be printed. memory loads
use the syntax [foo], where foo resolves to an integer for the slot of memory
to load from (in C, this is basically mem[foo]). note that memory loads are
pre-computed before any stores; memory stores use a different syntax to
emphasize this (i'll get to that later). remember that memory loads are just
array subscripting expressions passed as arguments to printf, so they're
evaluated before printf is actually called.
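here's a rough C sketch of why loads see the old value. everything in it is an assumption for illustration (the names, the tiny mem array, snprintf instead of printf, and the use of %hhn to store a single byte); it's not the actual generated code:

```c
#include <stdio.h>

static unsigned char mem[16];

/* formats into out and returns the value mem[0] holds after the call */
int load_then_store(char *out, size_t cap)
{
    mem[0] = 7;

    /* the argument mem[0] is evaluated (as 7) before snprintf runs at all;
     * %hhn then overwrites mem[0] with the byte count so far (3), but the
     * %d still prints the 7 that was passed in.
     * %hhn expects a pointer to signed char, hence the cast. */
    snprintf(out, cap, "abc%hhn%d", (signed char *)&mem[0], mem[0]);
    return mem[0];
}
```

after the call, out contains "abc7" and the function returns 3: the load got the old value, and the store only became visible afterwards.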
the alias keyword declares "constants", kinda. the most intuitive form of its
syntax is this:
alias foo = 1
alias bar = foo + 1
when the "initializer" is omitted, it defaults to the previous alias's value
plus one, or to 0 for the first alias. so alias i, fizz, buzz defines i as
0, fizz as 1, and buzz as 2. these will be used as memory indices; they're
just given names to make the code more readable. they have no effect on the
semantics of the resulting code; you can think of them sorta like macros which
are substituted with their value (note that, unlike C macros, it's not lexical
substitution, so you don't need to worry about parentheses or whatever).
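if it helps, the defaulting rule behaves a lot like C enumerators. this is only an analogy (the names below are made up, and aliases can also be given non-constant initializers, which enumerators can't):

```c
/* omitted initializers: previous value plus one, starting from 0,
 * like `alias i, fizz, buzz` */
enum { slot_i, slot_fizz, slot_buzz };   /* 0, 1, 2 */

/* explicit initializers, like `alias foo = 1` and `alias bar = foo + 1` */
enum { FOO = 1, BAR = FOO + 1 };         /* 1, 2 */
```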
moving on:
->fizz ([i] + 1) % 3 == 0
->buzz ([i] + 1) % 5 == 0
->i [i] + 1
the syntax for memory stores is ->foo bar, where bar is an expression whose
resulting value is stored to slot foo. so for example, ([i] + 1) % 3 == 0 is
evaluated and its result is stored to slot fizz (which resolves to 1). like i
mentioned above, this uses a different syntax from loads to emphasize that any
later use of [fizz] will still evaluate to the previous value; the stores only
"take effect" after the iteration is complete (this is technically not entirely
true, but it's good enough for now. there's one kinda-advanced feature which
breaks this abstraction, but don't worry about it yet).
->exit [i] == 100
->exit is special: it stores to a special reserved "exit" slot, which is
checked at the beginning of each iteration. if its value is non-zero, the
program exits. so it's basically an exit condition.
here's the complete list of allowed forms for memory stores (outrefs):
->0
->foo
->exit
->(foo + 1)
->[foo]
the parenthesized form allows any arbitrary expression. the bracketed form is
identical to ->([foo]), i.e. it stores to the slot whose value is stored in
slot foo.
a couple final notes here:
moving right along:
[i] if !![i] & ![fizz] & ![buzz]
"Fizz" if [fizz]
"Buzz" if [buzz]
"\n" if [i]
the syntax foo if bar prints the value foo if bar is non-zero. the if
clause can be omitted, if you just want to print a value unconditionally.
it's worth noting that && and || are deliberately not supported. the same
goes with ternary conditional expressions ?:. these aren't included because
they affect control flow, and the entire gimmick is that the resulting code has
no explicit control flow besides the loop. because of this, printf assembly code
usually uses the bitwise operators & and |. because the assembler copies C's
operator precedence rules, these bind more loosely than the comparison
operators, just like && and || do (they're not at the exact same precedence
level as && and ||, but they sit below == and friends, so a == b & c == d
parses the way you'd hope). most people agree that these precedence rules are
a mistake which only exists for historical reasons; this is like the one time
where they're actually useful lol. to do boolean arithmetic, stuff like !! is
used (converts any non-zero
value to 1), as well as unary - on boolean operands of & (to convert 1 to
-1, which has every bit set). so like, foo & -!!bar evaluates to foo if
bar is non-zero, otherwise it evaluates to 0.
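since the assembler copies C's operators, the same tricks can be tried out in plain C. a small sketch (the function names are made up for illustration, and the -1 trick assumes two's complement ints):

```c
/* evaluates to foo when bar is non-zero, otherwise 0; no branches involved.
 * !!bar is 0 or 1, and negating 1 gives -1, which has every bit set. */
int select_if(int foo, int bar)
{
    return foo & -!!bar;
}

/* 1 when both a and b are zero, otherwise 0, using & in place of && */
int neither(int a, int b)
{
    return !a & !b;
}
```

so select_if(42, 7) is 42 and select_if(42, 0) is 0, while neither(0, 0) is 1 and neither(3, 0) is 0.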
so let's look at the entire fizzbuzz source code again:
alias i, fizz, buzz
->fizz ([i] + 1) % 3 == 0
->buzz ([i] + 1) % 5 == 0
->i [i] + 1
->exit [i] == 100
[i] if !![i] & ![fizz] & ![buzz]
"Fizz" if [fizz]
"Buzz" if [buzz]
"\n" if [i]
memory is initialized to zero, so we start counting i from 0. we store whether
the next number is divisible by 3 in fizz, and whether it's divisible by 5 in
buzz. we then increment [i] for the next iteration, except the program will
terminate if [i] is 100. we then print [i] if it's neither divisible by 3
nor 5 (and if it's non-zero), otherwise "Fizz" or "Buzz" is printed. finally, a
newline is printed (again checking that [i] is non-zero).
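to make the "loads first, stores after" ordering concrete, here's a plain C model of what that program does on each iteration. this is a hand-written approximation, not what the assembler emits: it uses ordinary variables and if statements for readability (the real thing is a single printf call with no branches), and all the names are made up:

```c
#include <stdio.h>

/* append text to out (capacity cap) at offset len; returns the new length */
static size_t emit(char *out, size_t cap, size_t len, const char *text)
{
    int n = snprintf(len < cap ? out + len : NULL, len < cap ? cap - len : 0,
                     "%s", text);
    return n > 0 ? len + (size_t)n : len;
}

/* hypothetical model of the fizzbuzz program above;
 * returns the number of bytes produced */
size_t fizzbuzz_model(char *out, size_t cap)
{
    int mem_i = 0, mem_fizz = 0, mem_buzz = 0, mem_exit = 0;
    size_t len = 0;
    char num[16];

    while (!mem_exit) {
        /* loads: every [x] sees memory as it was when the iteration began */
        const int i = mem_i, fizz = mem_fizz, buzz = mem_buzz;

        /* the print lines, in program order */
        if (!!i & !fizz & !buzz) {
            snprintf(num, sizeof num, "%d", i);
            len = emit(out, cap, len, num);
        }
        if (fizz) len = emit(out, cap, len, "Fizz");
        if (buzz) len = emit(out, cap, len, "Buzz");
        if (i)    len = emit(out, cap, len, "\n");

        /* stores: computed from the loaded values, visible next iteration */
        mem_fizz = (i + 1) % 3 == 0;
        mem_buzz = (i + 1) % 5 == 0;
        mem_i    = i + 1;
        mem_exit = i == 100;
    }
    return len;
}
```

with a big enough buffer (512 bytes is plenty), out starts with "1\n2\nFizz\n4\nBuzz\n" and the final line is "Buzz" for 100. the first iteration prints nothing, since i is still 0 at that point.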
this actually isn't the simplest fizzbuzz program! this is based off the first one i wrote as i was first writing the assembler, but aliases were only added later, so the program can be simplified to only load/store a single slot of memory:
alias i
->i [i] + 1
->exit [i] == 99
alias Fizz = ([i] + 1) % 3 == 0
alias Buzz = ([i] + 1) % 5 == 0
[i] + 1 if !Fizz & !Buzz
"Fizz" if Fizz
"Buzz" if Buzz
"\n"
i showed the other fizzbuzz first since i felt like it did a better job of introducing the concepts of the language, but this one shows aliases being used to define "temporary" variables, rather than just describing memory indices. the convention (i decided) is to use lower_case for memory indices, CamelCase for temporaries, and SCREAMING_CASE for constants.
but wait, there's more!
if the input keyword is used, then the form of the generated code changes:
/* a bunch of macros and includes... */
static unsigned char mem[65536];
static struct termios termios;
static void cleanup(void)
{
    /* termios restore code... */
}
int main(void)
{
    /* termios initialization code and stuff... */
    for (int input = '\0'; !*EXIT; input = getchar()) {
        printf(/*...*/);
    }
}
basically this enables raw mode on the terminal, and nonblocking mode on stdin,
and attempts to read user input every iteration. input is a builtin "constant"
which is initially set to 0, and on each following iteration is set to the
character which was typed, or to EOF if no input was available (technically
the value of EOF is non-portable, but in practice i'm pretty sure it's always
-1).
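for the curious, here's roughly what that kind of setup can look like with POSIX calls. this is a sketch under assumptions, not the code the assembler generates: the function names are made up, the "raw mode" here only turns off canonical mode and echo, and it polls with read() on a file descriptor instead of getchar() so it can be tried on a pipe:

```c
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* switch a descriptor to non-blocking mode; 0 on success, -1 on error */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* turn off line buffering and echo on a terminal, saving the old settings
 * in *saved so they can be restored later; -1 if fd isn't a terminal */
int enter_raw_mode(int fd, struct termios *saved)
{
    struct termios raw;

    if (tcgetattr(fd, saved) == -1)
        return -1;
    raw = *saved;
    raw.c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    return tcsetattr(fd, TCSANOW, &raw);
}

/* one poll per iteration: the byte that was typed,
 * or EOF if nothing is available right now */
int poll_char(int fd)
{
    unsigned char byte;
    return read(fd, &byte, 1) == 1 ? byte : EOF;
}
```

the saved termios struct is what a cleanup function would hand back to tcsetattr on exit, so the terminal isn't left in a weird state.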
as of right now, there's no way to print input as a character without storing
it to memory and using an inref (described below). i'd like to add a way to do
this in the future; i just don't know what the syntax should be lol
the intent here is that i'm eventually gonna write tetris in printf. it's a work in progress and may not ever be completed. but the groundwork is all laid out at least :3
inrefs are a sorta advanced feature which breaks the abstraction that memory stores only occur at the end of the iteration.
<-foo
<-foo if bar
this prints the string starting at index foo. unlike memory loads, these
do use previously stored memory values.
this makes sense when you think about the resulting code: memory loads are just
array subscripts, evaluated outside of the printf call. inrefs are generated as
%s specifiers with a pointer argument, so they'll read stuff previously stored
with %n.
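a small C sketch of that effect, with the usual caveats: the names, the tiny mem array, snprintf, and the %hhn store are all assumptions for illustration, and it assumes ASCII and a libc that processes the format left to right (which is what this whole project leans on anyway):

```c
#include <stdio.h>

static unsigned char mem[16];   /* zero-initialized, like the real mem */

/* formats into out and returns the value stored in mem[0] by %hhn */
int store_then_read_string(char *out, size_t cap)
{
    /* %65s pads the empty string to 65 bytes, so %hhn stores 65 ('A' in
     * ASCII) into mem[0]. the %s argument is just a pointer to mem, so by
     * the time %s is processed it reads the freshly stored byte, followed
     * by the zero in mem[1] that terminates the string. */
    snprintf(out, cap, "%65s%hhn%s", "",
             (signed char *)&mem[0], (const char *)mem);
    return mem[0];
}
```

with an 80-byte buffer, out ends up as 65 spaces followed by "A", and the function returns 65. compare that with a plain load like mem[0] passed as an argument, which would still have been 0.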
this is also relevant for aliases. if you wondered why they're called "aliases" and not, like, "constants" or whatever, this is why: aliases aren't pre-computed: they're substituted with their initializer, but that initializer isn't evaluated ahead of time. this only matters for inrefs; otherwise the semantics are the same as though they were only evaluated once.
inrefs have limited use, especially since there's no easy way to store a string without storing each byte individually. i considered removing them, but they're so simple to implement that ultimately i decided to keep them, so you can take advantage of more of printf's functionality.
inrefs have identical syntax to outrefs, except the arrow points in the opposite
direction. this means that, technically, <-exit is allowed. the byte after the
exit slot is always zero, so like, you could do this i guess; there's just no
reason you'd ever actually want to lol
- comments start with # and go til the end of the line, like in other
  scripting languages like sh and python.
- ... CHAR_BIT == 8 && sizeof(int) == 4.
- ... INT_MAX (2^31 - 1).
- ... 1 << 31 is UB :/
- >> does an arithmetic right shift (technically that's
  actually implementation-defined, but like, let's be real, you're not gonna
  ever use this on a system which does logical right shift with signed
  operands).
- ... [1-9][0-9]* for decimal, 0(o[0-7])?[0-7]* for octal, 0x[0-9A-Fa-f]+ for
  hex, and 0b[01]+ for binary. digit separators aren't supported. this means
  that e.g. 0foo lexes as two tokens: 0 foo. this detail will literally
  never matter to anyone but i figured i'd document it anyway.

by the way, the entire grammar is more thoroughly documented in grammar.txt :)
a vim plugin for .pfs is included in this repo! it has syntax highlighting and stuff like that.
bye :3