Arcan 0.6.3 – I, pty: the fool

It has been well over a year since the last release, and there are some fairly beefy updates to sum up for our dear old Desktop Engine. Strap in.

In general project news, there are both grim and good ones. Starting with the grim: Mr. T-iago, a great friend and avid supporter of the project since the days of Recon, has left for the mainframe of yore. T forever remains a gentleman; a scholar as Portuguese as Danish Pride can get; a rare fellow reverser knowledgeable in the ways of +Fravia; a BJJ-brawling DC black badge bearer and Big Lebowski sweater wearer. Fuck cancer.

In ‘good news everybody’, the project has received promises of generous funding from both a private entity and from the EU-NGI/NLnet NGI0 Entrust fund.

The private funding will be used to improve client compatibility, starting with the KVM/QEmu UI/audio driver and working from there down the list of relevant targets: Xorg, Qt, Imgui, SDL/GLFW, Chrome-ozone.

The NLnet-sponsored parts aim at improving and documenting our network transparency layer and its associated protocol, A12. The main focus is on the safety and privacy aspects of service/device discovery and the implicit web of trust that emerges from keystore management, but also on server-end state controls and coordination. A deeper dive into parts of this can be found in the recent post ‘A12: Visions of the Fully Networked Desktop‘. Some of its video clips are reused here, either because they were good enough to be worth repeating, or because the release would not be visual enough otherwise.

On the related-projects front, some are popping up that attempt to package and build OSes with Arcan as a base component. One early such attempt is Eltanin.

A note for packagers and others interested in the configuration management of the project: we will move away from GitHub and git posthaste. Microsoft has completely obliterated what little good faith was left in their operation, and my account is no longer active. While peeling off that scab, the plan is to move to self-hosted Fossil with a public mirror or two. This will also be used as an opportunity to swap out CMake for Xmake and solidify our stance on ‘Lua pretty much everywhere’. Thanks to the advancements in our network tooling, the hope is that pseudonymous issue collaboration, forum and ephemeral communication can all be migrated away from GitHub and, well, Discord soon enough, but that is a different post. Since we are practically rolling-release, yet such distributions rarely apply that rule to packages, we will soon start automatically and incrementally tagging updates as .topic.release.patch on a weekly basis, assuming there are any changes since the last.

True to form, the following clip is a collection of the ones in this post, set to a fitting tune and padded with some from our terminal-free CLI shell Lash#cat9 (including something not shown before). Mr. T was a strong instigator behind that specific work, though he never got to see it come to fruition. Thank you, Mr. T – the rains have ceased, old friend.

Discovery and Networking

The networking protocol, A12, consists of several loosely connected components. First there is the protocol implementation itself, src/a12. This is used to build two different interfaces. One does not require the rest of Arcan, only the IPC system libraries (libarcan-shmif, libarcan-shmif-server): this is the arcan-net binary.

As shown in previous release posts, it can be used to set up and manage most practical aspects of the protocol, but it is a developer or system tool more than an end-user thing.

The other is afsrv_net and its corresponding path in the high-level Lua APIs (e.g. net_open, net_discover) for building networked Arcan appls. While it uses nearly the same code for key management and communication as arcan-net itself, a graphical environment is guaranteed, so there are more options for building user-accessible interfaces and integrations.

In this old clip you can see a tool in our reference desktop environment, Durden, used to connect to an arcan-a12 directory server hosting appls (such as Durden itself, or our more experimental dataflow ZUI, Pipeworld), selecting an appl, downloading it, running it, changing some visible configuration and syncing state.

Showing networked configuration persistence and appl download/execution.

In this clip we take things even further.

Live-editing and synching an appl across multiple devices through a shared external server.

Here I repeat the ‘download and run appl’ scenario, but I also open the appl up and modify it, and you can see how other devices using it live-update as well.

The code for this part of the server has also been refactored to strongly contain (stdio + fd-passing + no filesystem access) the processing of each client.

A non-polling form of local network discovery has also been added. Previously the keystore was simply swept, with connection attempts made to the hosts mapped to each petname. Now devices beacon a challenged set of identities that others can trigger from.

In the following clip there is a debug build running on an old 1st gen Surface Go connected over a dodgy WiFi. It has been marked as discoverable (arcan-net discover beacon). On the main desktop I tell Durden to look for devices, and it is set to alert via a status-bar button on new discoveries. I drag a browser window to it and decide to share-active, meaning allowing it to provide input for this window only. I type a few things on the attached keyboard to show that it is indeed doing that. I toggle a post-processing effect on the desktop end, showing that it is possible to mix in accessibility features while sharing.

Lan network discovery with streamed sharing and contained input injection.

Yet another nice step towards the ideals defined in our principles as well as for implementing Arcan as OS Design.

Tan, Tui!

We are finally closing in on the end of the long-standing goal and sub-project of migrating away from terminal emulators as a way of building command-line and text-dominant applications, replacing them with the much more display-server-friendly arcan-tui library.

The more recent details of this can be found in the blog post ‘Day of a new command line interface: shell’, along with its technical counterpart ‘Whipping up a new shell: Lash#Cat9’, and tangentially ‘Writing a console replacement using Arcan’ (for the getty-like replacement), although one article is still missing on the developer side: how we also have the means to get rid of curses.

Many of the deprecated functions, previously around only to support how our terminal emulator used the screen abstraction in TSM, are now blocked out by default, and the screen abstraction itself is gone from the TUI libraries, making updates faster and memory requirements much lower.

Border drawing is now processed as cell attributes rather than actual ‘line character’ glyphs, making borders faster to draw; they consume no precious grid space, do not interfere with text selection and do not confuse screen readers. The readline widget uses them by default, as shown in the completion popup in the next clip.

This clip also shows how cooperation with the outer WM continues to advance. It is possible to dock into a tray or status bar, and a ‘swallowing’ window-management mode is also supported. The swallow trigger can be seen at the end of the clip, prompted by the s! command prefix.

CLI shell with non-grid borders, file system monitoring and WM negotiated client open modes

The gritty details of how embedding and controlling external processes work have also advanced. In the following clip you can see how a TUI client delegates embedded media playback but can still influence scaling, positioning and some input routing. I then tear the media out and send it on its merry way to another computing device, thanks to proper network transparency.

Transition between embedded external media in a tui surface to WM management to network redirection.

Cipharius also added a caching mechanism to the server-side text processing, meaning that text surfaces (including all TUI clients) will now share glyph caches with each other, reducing the cost for windows that share defaults (font, size, hinting and density).

There are still some gains to be had on the server-side text processing end, especially for XR and accessibility, by moving to atlases with a distance-field representation. Then we can re-add some of the features that were temporarily stripped during the move to server-side text, like smooth scrolling and BiDi/shaping. Combine that with map_video_display(…) and we have all the knobs and tools to get fullscreen TUIs with embedded media racing the beam, reliably and tear-free across the stack — getting dangerously close to the perfect balance between latency and power consumption.

Compatibility Work and NSA 0 Day

As stated in the previous release post, we are pretty much ‘done’ whizzbang-feature-wise – anything left to add is rather minor. This is the point when one should think about compatibility in all kinds of directions: backwards, forwards and sideways.

In that spirit, much more work has gone into our Xorg DDX, Xarcan. A number of interesting things have happened, warranting an article of its own, and several different ways of using it have emerged.

One is that it now supports the ‘redirected’ way of translating window hierarchies into windows that integrate with the Arcan side, in the way pioneered by Xwin and Xquartz way back in the early 00s. This means you can push a single X11 application over A12 — network-transparent outbound:

 ARCAN_CONNPATH=a12://my.host Xarcan -redirect -exec chromium

Or serve it up inbound:

arcan-net --soft-auth -l 6680 -- /usr/bin/Xarcan -redirect -exec chromium

This clip shows that in Durden:

Network pushed chrome through single rootless Xarcan.

You can still go the old route through arcan-wayland -exec-x11 chromium, but it will increasingly lag behind in features and performance unless someone steps up, as I am personally absolutely completely done touching anything Wayland code-wise — the post-mortem analysis of how ‘so simple it is nigh useless’ turned into ‘X but magnitudes worse’ is one for the ages.

With Xorg basically up for grabs, there are a lot of fun ventures to pursue here, and it is, contrary to popular belief, quite salvageable – much more so than its proposed replacement.

As is the case with window management, there is a larger story on how to fix the remaining Xorg security and accessibility nuances. A historical oddity uncovered while playing around with this, left as an exercise for the reader to exploit: here is part of the XAce security mechanism and its attempt to define an intended security perimeter. You should be able to come up with at least three ways of circumventing it, without ever chasing overemphasised langsec ‘memory safety’ trails and similarly dull things — in code contributed by none other than the NSA.

More interesting still is the third mode shown in this clip:

Never before have Xeyes looked this confused.

Ignoring some glitches courtesy of my HDMI capture setup, you can see me starting Xarcan, which proceeds to set up Window Maker. This makes Durden automatically assign it a new fullscreen workspace with some special rules applied. I start Xeyes to feel less lonely and, as per usual, it begins to death-stare the mouse cursor. I start Pipeworld through the Durden HUD, and any window that spawns in a workspace owned by Xarcan somehow gets decorated and treated as part of the Xorg space, yet Xeyes loses track. Not depicted but also true: not even the mighty xinput test-xi2 --root knows what is going on. There might be something less neighbourly to this story.

In other news, afsrv_terminal has received support for another terminal emulation state machine, that of suckless. The point was partly to get something slimmer across the hips than TSM, and partly to get a very close 1:1 example of “writing something like this in arcan-tui vs. X11”, server-side text included. It also gives us a better point of comparison for quality aspects like text rendering, and quantifiers for latency and memory consumption.

For those interested in the Zig programming language, the true C-lang successor: Cipharius wrote an arcan-tui-friendly frontend to the Kakoune editor, https://github.com/cipharius/kakoune-arcan.

Not only that, but he also whipped up a reMarkable 2-friendly tool that will be used for various forms of sharing its features and resources over A12, what a guy! Hopefully the same approach can be used to bridge other sheltered ecosystems that thought they were safe from assimilation.

Audio

As a prelude to what is actually a large part of the topic for the 0.7 series of releases, Cipharius also dug into the build system and started splitting out the engine paths previously tied to OpenAL.

While the work is mainly structural, it means being able to build LWA, Lightweight Arcan (its relationship to Arcan being a less architecturally tragic form of what Electron is to Chrome), without having to patch and link a special version of OpenAL, which is one of the heavier parts of the current build environment.

Testing this out, it is now possible to choose a ‘stub’ audio platform which disables all audio processing. Thus, if you never use audio for playback, recording or streaming, you can disable it and let the CPU idle some more.

With that, a segment type for audio-only segments has been added. The use for this is performance-critical situations where the synchronous nature of resizes could cause underrun artefacts, like audible clicks during window resize operations. It will also be used to convey dynamic positional audio sources when mixing for HRTFs in XR, and for general surround sound.

Video

On the video platform side, the EGLStreams code has been evicted and atomic modeset is now the default over legacy – layering violations and robust synchronisation primitives be damned. Now Nvidia users can enjoy things randomly working in a different way from before — yet one that seems to basically use the same internal driver paths, according to my friend Ghidra Binaryblobbington; almost as if one was shoehorned into the other with minimal effort. Curious how that came to be.

More plumbing has gone into HDR processing, and experimenting with the WM/client side of things is next up, as the core bits are usable enough.

In a previous release we added the option of setting 10-bit and 16-bit per-channel modes on corresponding rendertarget storage passes and mapping them onwards.

At the scripting level the WM can use image_metadata(…) to attach custom metadata to a video object (such as the rendertarget output from a colour-corrected offscreen pass). If that object is mapped via map_video_display(…), the metadata will be passed on to the next layer (presumably your display).

For the client side, if one uses extended resize and toggles SHMIF_META_HDR in the list of buffer complements, and the WM has marked that as permissible through target_flags(…), each signalled frame will now sync the associated metadata (though the WM can still override this through the image_metadata function, since HDR content in the wild comes with various degrees of brokenness).
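Put together, a WM-side sketch could look something like the following. Only image_metadata, map_video_display and target_flags are taken from the text; the metadata field names, the flag constant and the variable names are illustrative assumptions rather than the actual API surface:

-- 'rt' is assumed to be the colour-corrected offscreen rendertarget
image_metadata(rt, {
    master_min = 0.01, -- assumed field: mastering minimum luminance (nits)
    master_max = 1000, -- assumed field: mastering maximum luminance (nits)
    eotf = "pq"        -- assumed field: transfer function
})

-- map it onwards; the metadata follows the object to the display
map_video_display(rt, display_id)

-- assumed flag name: permit the client to submit HDR metadata per frame
target_flags(client_vid, TARGET_ALLOW_HDR)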

Next up is wiring this through our main testing tools for the purpose, aloadimage and afsrv_decode – and thereafter making sure that when Arcan is used as a client/toolkit for itself, it too can tag the post-composition metadata.

Compressed Passthrough

The decode frameserver, used to aggregate media capture and parsing into one-off sandboxed processes, has received support for h264 video frames when running against a video device using UVC. With this there is also the option to try to pass the still-compressed frames through to the display server end.

Arcan rejects such frames, as any local client should just offload into afsrv_decode and embed the resulting surface as shown in the TUI example. When redirecting over arcan-net, on the other hand, that is a different story. It will now happily try to pass through and just multiplex/encrypt/stream – saving us a full decompress/compress cycle and reducing degradation in transit.

This will turn out to be useful in quite a few networked video cases, like game streaming, networked surveillance cameras and low-tier media hardware.

Tracing

Cipharius has also been busy elsewhere in the project. The tracing hooks that are sprinkled all over the place can now be built with a version that fully integrates with the awesome Tracy profiler. This will be instrumental for the 0.8 branch when it is time to finally start optimizing for real.


A12: Visions of the Fully Networked Desktop

This is a higher-level update on what has been going on with the current focus topic for Arcan releases, network transparency (2020). It comes as a companion to the upcoming release post, as a way to give it more context.

Backstory

A12 is our network protocol that lets applications written against libarcan-shmif communicate remotely. A12 and SHMIF share most of the same data and event model, but they have very different approaches to transmission, queuing, compression, authentication, confidentiality and so on.

Just as SHMIF consolidates a forest of IPC systems with an untapped common ground into one system around that common ground, A12 consolidates a forest of protocols into one. The history of this part spans many years, starting way before the naughty bits of protocol design and implementation.

The preparations cover everything from decoupling command line and shell from the terminal; due diligence decomposing expectations of features from existing solutions and their respective flaws; working in crash resilience to transition from local to networked operation and back; making sure it fits the security and accessibility story; that it is observable and debuggable; that it meshes with obscure input devices and as many varied windowing schemes as possible — on a foundation presented over a decade ago as a dissertation on the traditional and future role of networked display servers for surveillance and control in critical infrastructure.

This is far from the full ambition and scope of Arcan as a whole, but a substantial building block making the rest attainable and not just a pipe (world) dream. For the rest of this article I will cover some of the scenarios that the design is verified against, and match that to work that is either completed or close to completion.

This has been graciously sponsored by NLnet as part of their Zero Entrust fund.

With the networked desktop we make a few grand assertions:

  • The digital ‘you’ is the combined swarm of your devices — phone, smart watch, desktop, laptop, gaming devices, home server, network glasses, security token devices, note-taking eInk tablet, etc.
  • The digital ‘you’ persists as a story written in data captured and processed by these devices.
  • The digital ‘you’ is fluidly pieced together over an otherwise invisible communication substrate.

It is the ability to coordinate the swarm; control the sampling, accuracy and truthfulness of the data; to route the communication that, when combined, sets the perimeter of your digital agency. If you are unable to redirect-, delay- or deny- the communication; to intercept sampling; to erase storage or to re-purpose the devices — somewhere those parts of your digital story got surrendered, or were never yours to begin with.

The modern ‘cloud’ stitches these together in such convenient and seamless a way that it is easy to miss how it knows more about your digital you than you yourself — that you are confined to a stall rather than roaming the cyber-scented grasslands. This has been the game for a long time, even though that was not always the case.

I have referenced these talks in the past, but [27C3] Jeroen Massar ‘How the Internet Sees You’ and [28C3] Stefan Burschka ‘Datamining for Hackers’ both remain good reminders of just how visible things are from the ground up, and that is from pre-‘Snowden-scare’ days. Very little has changed in this regard, and it looks like the aggressors are getting bolder by the minute.

This is all moody and gloomy, but that is not why we are here, now is it? No! We should try to improve the old and maybe create something new, reach higher heights, or at least see different sights.

The Arcan play thus far has been to define and implement a user-scriptable control plane and figure out a useful set of functions; a window manager, for lack of a better word. That is not a particularly accurate one, so we use ‘appl’: not just an app, not really an application. This set has been repeatedly demonstrated to be both useful and sufficient for replicating anything that has been achieved elsewhere within the ‘desktop computing’ mental model — and then some.

Next up is the network communication substrate part, and why A12 is being forged. The current state is that the desktop ‘remoting’ case is all covered. We have a better reply to anything VNC / SSH / X11 / SPICE / RDP / Synergy / … has to say on the matter. This is a small yet important part of the story. For the rest of this article I will go through one of the means for linking the networked parts together.

Directory Server

The ongoing focus is an optional extension to the core protocol, which is the ‘directory server’.

The following diagram tries to cover how it fits together:

To describe the figure and the respective roles: source, sink, directory, appl, applgroup. The directory server acts as a rendezvous point for discovery and connection negotiation, but also as a state store. It can act in isolation or be linked/unlinked to other directory servers.

The directory hosts 0 to many arcan appls. Assuming a working arcan setup, running:

arcan-net arcan.divergent-desktop.org durden

Would connect to our public/test directory (hosted at openbsd.amsterdam), download, extract and run the ‘durden’ package. The package can be kept on your device and handled offline, or it can retain the connection to the directory server, automatically sync to new versions of the appl, and participate in the applgroup.

The configuration settings (state) of the appl gets its own per-user personal store on the directory server. The same store can also be used to submit debug reports and similar alternate data streams.

The following clip shows me running Durden from the directory server as shown above, on a clean machine from the Linux console. I change the default display background colour and make the mouse cursor very large, just to have something easy to spot in the video. Then I shut it down and re-issue the same command with --block-state added. The desktop starts up with the default ‘calibrate/first-run’ config. I shut down and repeat again, without the --block-state argument, and you can see how the previously directory-stored configuration returns.

Directory server side state persistence
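For reference, the sequence in the clip boils down to alternating the same invocation with and without the state-blocking flag (the flag placement shown here is my assumption; the post only states that --block-state is added to the command):

arcan-net arcan.divergent-desktop.org durden
arcan-net --block-state arcan.divergent-desktop.org durden

The first form syncs the per-user state from the directory; the second ignores it and starts from the defaults.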

Authentication and access control use only the key pair that the client provided as “identity”, meaning that no tracking cookies or similar shenanigans are ever necessary. It is the client that explicitly pushes the state update, and what this entails can be inspected or blocked at will.

Users with permission can also push an update to the appl itself. If that happens, the changes immediately sync to others currently running it.

In the following clip I run the ‘pipeworld’ appl on two machines from the same directory server. On one machine, I open up the downloaded code, modify the default wallpaper and issue arcan-net --push-appl ./pipeworld arcan.divergent-desktop.org.

You can see how both machines trigger on the update, download it and switch to the new version. The update cascade is atomic and optional. The effect is that I can live-develop, deploy, test and collect traces at the push of a button, across a variety of devices.

As with all the clips here, they are recorded with fairly realistic network conditions: remote VPS, laptop tethered to a spotty mobile link and the desktop on a beefy wired connection.

Live appl editing and updating

Sourcing and Sinking

It is also possible to connect a source or another directory to a dynamic index. Others that are still connected get notified and can choose to try to sink it.

The directory will negotiate the connection between the source and sink based on network conditions, and proxy-forward the keymaterial needed for the two parties to establish an end-to-end encrypted connection.

With this, sharing a compatible piece of software can be as easy as:

A12_IDENT=qemu ARCAN_CONNPATH=a12://my.directory.server /usr/bin/qemu -display arcan disk.img

The above would expose a single Arcan client as a source, have it be used by a single sink at a time, and reconnect to the directory if the sink connection is severed.

Similarly, sinking the source is no more difficult:

arcan-net my.directory.server "<qemu"

If network conditions make the source unreachable to the sink, the directory can act as a tunnel; and if the source is currently unavailable, it can wait and get notified when it comes back:

arcan-net --keep-alive --tunnel my.directory.server "<qemu"

From an appl standpoint, there are dedicated APIs for repeating the same process: net_discover and net_open. These are slightly more nuanced, as they also have ways of discovering devices on the local network, or ones that have been tagged in the keystore with information on how to find them elsewhere. The ‘durden’ desktop appl comes with a tool using those functions.
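As a rough sketch of how an appl might chain the two (only net_discover and net_open themselves are from the text; the discovery mode string, the event names and the “@” petname form are assumptions, with the latter merely mirroring the “@stdin” form used later):

-- listen for beaconing devices; the "passive" mode string is an assumption
net_discover("passive",
    function(source, status)
        if status.kind == "discovered" then -- assumed event name
            -- try to open a connection to the discovered petname
            net_open("@" .. status.name,
                function(src, st)
                    if st.kind == "message" then
                        print(st.message)
                    end
                end
            )
        end
    end
)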

This clip shows a test where I start a client locally by browsing to a local video clip and launching it in a media player. I then migrate it to my directory with a drop action on the status-bar icon that was added when I connected to the directory. You can see the matching icon on the laptop reacting with an alert when the change is discovered. I request to sink it and it appears.

Moving a local media player client first to the directory, and then tunnel-sinking it to the laptop

Appl Group

Devices that are connected to a directory and running the same appl also get a local messaging group. The appl can leverage this to unicast and broadcast short messages to other instances of itself, providing an easy building block for collaboration.

Code- wise, a minimal example would look something like this:

-- join the appl group through the directory connection and
-- print any messages that arrive from other instances
local vid = net_open("@stdin",
    function(source, status)
        if status.kind == "message" then
            print(status.message)
        end
    end
)

-- broadcast a short message to everyone else in the group
message_target(vid, "message=hi there")

This example reads as ‘join the appl group and send a message to everyone else’. Assuming we joined with the identity “me”, others would receive “from=me:message=hi there”.

The following clip shows a testing tool for Pipeworld, running on both the desktop and the laptop. The tool adds a row for the directory server appl-group, and stores the message from users as a new cell on that row.

Messaging between two clients in the same appl-group

How this is leveraged is of course up to the appl itself.

Ongoing Work

Ongoing work in this area uses the key material from a previous directory-discovered source-sink pairing to re-bond and establish new connections without third-party involvement in the local-area case.

This means that you can use your cloud hosted VPS as a rendezvous to discover devices, but then have them re-discover mesh networks locally without internet connections or dependencies to the likes of DNS.

More paranoid setups would include having your galvanically isolated attestation and build server inject the keymaterial into the base read-only / flashed firmware OS image, and still allow individual devices to re-discover each other in hostile hotel-WiFi-like settings.

There are two big additions coming to the directory server soon.

Arcan supports user-defined namespaces. Normally these are used to provide a logical name mapping for some part of the file system (e.g. /mount/thumb_drive to “my thumbdrive”) to be used by storage management services, cryptsetup and so on.

The logical extension to this is to have the directory server also provide a shared namespace for others attached to the same appl, as well as a private store so that you have somewhere to store data you create, encrypted and signed by default.

The second part is to let you slot in a server side set of scripts that can interface and govern the appl group, as well as dynamically launch and attach directory server hosted sources.

Other, more experimental bits concern how to query the network structure that emerges from letting others attach their own directory to another, forming either a distributed applgroup for load balancing and network performance, or a dynamic mesh network of different hosted appls shared by friends, possibly with compiled translations from other networked document formats.

By letting the directory-server-attached appl-governing scripts opt in to responding to search queries, we get the mechanism for asking ‘where in my network did I store a picture or document matching this description, sorted and prioritised by my current location’, and so on.


The quest for a secure and accessible desktop

This article is an overview of accessibility and security efforts in Arcan “the desktop engine”: past, present and those just around the corner. It is not the great and detailed one I had planned, merely a tribute, presented to assist ongoing conversations elsewhere and as an overlay to the set of problems described by Casey Reeves here (linux:audio, linux:video).

Because of this, the critique of other models is omitted but not forgotten, as is the fiery speech about why UI toolkits need to “just fucking die already!”, as well as the one about how strapping a walled garden hands-free to your face hardly augments your reality as much as it might “deglove” your brain — but it is all related.

What should be said about the state of things, though, is that I strongly disagree with the entire premise of some default “works for me”, a set of trapdoor features (repeatedly pressing shift is not a cue to throw a modal ‘do you want to enable sticky keys?’ dialog in someone’s face) and a soup of accessibility measures hidden in a settings menu somewhere. I consider it, at best, a poor form of ableism.

Due to the topic at hand this post will be text only. Short-links to the sections and their summaries are as follows:

Philosophy argues that accessibility is a broader perspective on how we attach and detach from computing both new and old.

Security extends the philosophy argument to say that security mechanisms and accessibility tooling need to evolve hand in hand, and not as a sacrifice of one quality in order to improve another.

Thereafter it gets technical:

Desktop Engine is a walk-through of the different layers of Arcan specifically from the angle of building more secure and accessible computing interfaces.

Client-Level covers the specific tools external sources and sinks can leverage to increase their level of cooperation towards accessibility.

Frameservers expands on role-designated clients for configurable degrees of protection, and how they segment the desktop privilege level by the kind of data processing being done.

Shell and TUIs covers the middle ground from command-line shell and system management tools to composed desktop.

Examples: Durden – Pipeworld highlights preparations made for accessibility in a traditional desktop environment, and follows up with how this extends into a more unconventional one.

Trajectory briefly covers how this is likely to evolve and where the best opportunities to join in and contribute lie.

Philosophy

‘Secure and accessible’ is one of the grand narratives governed by the principles for a diverging desktop future, and has been one since before the very first public presentation of Arcan itself. As such it is not the target of any one single principle, but a goal they all combine to deliver.

We view accessibility as a fundamental mission for computing. A rough mission statement would be ‘to increase agency, in spite of our stern and mischievous mother Nature’. It is for all of us or for none of us.

Whatever your preconditions or current state of decay happens to be, computing should still be there to grow the boundaries of your perceivable world. That world can be ever expanded upon, even if you are below 40 years of age and still believe yourself to be more than capable of anything — the mantis shrimp laughs at your crude perception of colours; a computer did not always fit in a pocket.

This reaches into both the future and into the past; gilded memories of youth and retrocomputing alike. Your current desktop and smart phone will eventually become dust in wind and tears in rain. Being suddenly denied access to that past is a substantial loss and not just a mere hindrance — as the companies that to this day retain a win95 machine for their abandoned CNC control software would attest; some critical infrastructures still run on OS/2.

Computing history does not simply go away, it just becomes less accessible. This is why we chose virtualisation (including, but not limited to, virtual machines), not the toolkit, as the default ‘last resort’ model and compartment for compatibility that needs to be made accessible.

Security

A special topic with substantial overlap is that of security. Some have scoffed in the past at the idea of security having anything to do with the matter. Yet suppose your threat model grows more complex, directly through you or indirectly via friends or family: performing sensitive work; managing access to substantial funds; having a tiff with the local ol’ drug peddler; fighting an abusive spouse or spotting the wrong criminal in the act. You will then find out first hand how maintaining proper operational security (“opsec”) amounts to a substantial disability in a digitally traceable society where targeting information is ridiculously cheap to come by.

A trivial symptom of someone failing to understand this property is the sprinkling of ‘we have just given up, here is a deliverable oh great product owner’ kind of dialog boxes: “Do you allow the file manager to access the file system yes/no/potatosalad?”, “The thing you just downloaded to do things with your camera did not pay us to pretend to trust them on your behalf and is trying to access your camera, which we reject by default to inconvenience you further; exit through the app store. To override this action please go to settings/cellar/display department, press flashlight and do beware of the leopard. We reserve the right to remove this option at any point in the future”.

On top of how such measures struggle with accessibility tooling (which, by its very nature and intent, can automate things some lobotomised designer thought should be exclusively interactive), the often ignored ‘Denial of Service‘ attack vector is really important, if not paramount.

This is because you do stupid things when put under wrong kinds of stress and pressure, hence the phrase ‘being under duress’. Stripping someone of agency at choke points is a tried and true tactic to achieve that. With tech advancing, integrating and gate-keeping just about everything it also provides great opportunity to deny someone service.

Revoking features disguised as gavage-fed updates, reducing interaction or presentation options, or dropping compatibility in the name of security is a surefire way of signalling that you do not actually care about it, just pretend to. These are properties that you develop in tandem or not at all.

Desktop Engine

Arcan is presented as a ‘desktop engine’ and not as a ‘display server’ for good reason. In this context it means to consolidate all the computing bits and computer pieces useful when mapping our perception and interaction into an authoritative behaviour and permission model, one entirely controlled by user provided scripts.

Such scripts can be small and short and just enable the bare essentials, as shown in Writing a console replacement using Arcan (2018) as well as reach towards more experimental modes like for VR (Safespaces: An Open Source VR Desktop, 2018). An exclusively aural desktop with spatial sound is fairly simple to achieve in this model, as is one controlled exclusively through eye movement or grunts of arousal, displeasure or both.

This time we are only concerned about how the desktop engine relates to the accessibility topic. To dive deeper, there already is Arcan as Operating System Design (2021) for a broader picture and Arcan versus Xorg – Approaching Feature Parity (2018), Arcan versus Xorg: Feature parity and Beyond (2020) for the display server role.

The core engine has the usual bells and whistles: an OS integration platform; media scene graph traversal; scheduler; scripting runtime; configuration database; supervisory process and interprocess communication library. Most other tools, network translation among them, come as separate processes built on this interprocess communication library.

By itself it does nothing. You load a bunch of scripts that have some restrictions on what things are named and how they can access secondary resources like images. This is referred to as an ‘appl’ – more than an app, less than a full blown application. The scripting API is quite powerful, with primitives comparable to those of a web browser but thankfully layered on Lua instead of Javascript.

The end user, or an aggregate acting on their behalf, selects and tunes the active set of scripts. For the desktop perspective, this becomes your window manager. The set is interchangeable at runtime, and sources and sinks (“clients”) adjust accordingly.

The user can augment the appl they chose to run with hook scripts, intended for agnostically adding specialised modifications of common functions: supporting exotic input devices, custom workarounds and, basically, aggregating your desktop ‘hacks’.

The appl may open named ‘connection points’ (a role, like ‘statusbar’) to which external sources and sinks attach with an expressed coarse grained type (e.g. game, sensor or virtual-machine). Sources and sinks can work with both video and audio. They also have an inbound and an outbound event queue. This ties in with the interprocess communication library mentioned above. The relevant parts of the event model are covered further below in the ‘Client Level’ section.

Connections over these named points can detach, reattach and redirect. By doing so the system can recover from crashes and transition between local and networked operation transparently.

This has been demonstrated to be sufficient even for fairly complex assistive devices. For more on that see the article on Interfacing with a ‘Stream Deck’ Device (2019) and on experimenting with LEDs (2017).

The most successful application to date has not been the use as a desktop, but one similar to the setup in ‘AWK for Multimedia’ (2017) — industrial machine vision verification and validation setups with hundreds of thousands of accumulated hours, multiple concurrent sources at high frame rates (1000+Hz) and real-time processing, which covered substantial portions of the overall project development costs.

Client Level

As mentioned in the engine section we have the concept of Window-Manager (WM) initiated named connection points (role) to which a client connects and announces a primary type.

These connection points are single-use and need to be explicitly re-opened, encouraging the WM to differentiate and rate-limit. This works as a countermeasure to Denial of Service through resource exhaustion. It is also one of the main building blocks for crash recovery (2017) and network migration (2020).
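
The mechanics are easy to model. Below is a toy Python sketch of the idea (not the actual Arcan API; the class and names are invented for illustration): a connection point is consumed on use, and re-opening it is throttled so a misbehaving client cannot flood the WM.

```python
import time

class ConnectionPointRegistry:
    """Toy model of single-use, rate-limited connection points:
    each accepted connection consumes the point, and re-opening
    is throttled by the window manager."""

    def __init__(self, min_reopen_interval=0.5):
        self.min_reopen_interval = min_reopen_interval
        self.open_points = {}   # name -> handler
        self.last_reopen = {}   # name -> timestamp of last open

    def open(self, name, handler, now=None):
        now = time.monotonic() if now is None else now
        last = self.last_reopen.get(name)
        if last is not None and now - last < self.min_reopen_interval:
            return False        # throttled: resource-exhaustion guard
        self.open_points[name] = handler
        self.last_reopen[name] = now
        return True

    def connect(self, name, client):
        handler = self.open_points.pop(name, None)  # single-use: consumed
        if handler is None:
            return False
        handler(client)
        return True
```

The point of the single-use design is that the WM must take an explicit action per accepted client, which is exactly where differentiation and rate-limiting policy can live.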

The WM can substitute a connection point for a named launch target defined in a locked-down database. This does not change the actual client code, but provides a chain-of-trust for sensitive specialised clients such as input drivers for input method engines and special assistive devices like eye trackers.

The WM combines connection point and type to pick a set of ‘preroll’ parameters. These parameters contain everything that is supposedly needed to produce a correct first frame.

This includes:

  • Preferred ‘comfortable visible font size’, font, hinting and possible fallbacks for missing glyphs.
  • Target display density and related properties for precise font rasterisation.
  • Preferred GPU device and access tokens.
  • Fallback connection points in the event of a server crash or connection loss.
  • Input and Output language preferences as per ISO-3166-1, ISO-639-2.
  • Basic colour palette (for dark mode, high contrast, colour vision deficiency).

Any and all such parameters can also be changed dynamically per client.
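
To make the flow concrete, here is a minimal Python sketch of the lookup (the parameter names and values are invented for illustration; the real preroll data travels over SHMIF events): the WM resolves (connection point, client type) to a parameter bundle so the client can produce a correct first frame without trial-and-error negotiation.

```python
# Illustrative defaults and per-(connection point, type) overrides.
DEFAULTS = {
    "font_pt": 12, "font": "monospace", "density_ppcm": 38.0,
    "lang_out": "eng", "palette": "default",
}

OVERRIDES = {
    ("statusbar", "tui"): {"font_pt": 10},
    ("media", "game"):    {"density_ppcm": 57.0},
    # a high-contrast preset a user might tie to a connection point:
    ("reader", "tui"):    {"palette": "high-contrast", "font_pt": 16},
}

def preroll(conn_point, client_type):
    """Resolve the parameter bundle sent before the first frame."""
    params = dict(DEFAULTS)
    params.update(OVERRIDES.get((conn_point, client_type), {}))
    return params
```

Since the same lookup runs again whenever parameters change at runtime, dynamic per-client updates fall out of the same code path.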

The client now has several options to announce further opt-in capabilities on top of a regular event loop:

  • labelhints are used to define a set of input tags that the WM can attach to input events, as a way of both communicating default keybindings and letting the WM map them to both analog and digital input sources. Text-to-speech (T2S) can leverage this and say, for instance, ‘COPY_AT_CURSOR’ instead of CTRL+C.
  • contenthints are used to tell how much of available content is currently visible. This lets the WM decorate or describe boundaries and control seeking, instead of clients embedding notoriously problematic controls like scroll bars, which demand precise eye-hand coordination and good edge detection, and have poor discoverability with the current ‘autohide’ trends.
  • state/bchunkhints are used to indicate that it can provide alternate data streams from a set of extensions (clipboard and drag and drop are just specialised versions of this), and if the exact current state can be snapshotted and serialised for rollback/network migration/persistence.
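
As a rough illustration of how labelhints decouple intent from physical input, consider this toy Python model (the class and method names are made up and are not the real client API): the client announces abstract tags, the WM decides what physical inputs trigger them.

```python
class Client:
    """Toy client announcing abstract input tags ('labelhints')."""
    def __init__(self):
        self.labels = {}           # tag -> callback

    def labelhint(self, tag, description, callback):
        self.labels[tag] = callback
        return (tag, description)  # what would be announced to the WM

class WM:
    """Toy WM binding arbitrary physical inputs to announced tags."""
    def __init__(self):
        self.bindings = {}         # physical input -> (client, tag)

    def bind(self, physical, client, tag):
        self.bindings[physical] = (client, tag)

    def input(self, physical):
        client, tag = self.bindings.get(physical, (None, None))
        if client:
            client.labels[tag]()   # a T2S layer could speak `tag` here
```

Because the WM only ever deals in tags, the same binding table can map a keyboard chord, an eye-tracker gesture or an analog axis to the same client action.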

The WM now has the option to request alternate representations. This was used to implement a sudo-free debugging path in Leveraging the “Display Server” to Improve Debugging (2020) and the article alludes to other purposes, such as accessibility.

While X11 and others have the idea of a client creating a resource (like a window) and mapping/populating it and so on – Arcan also has a ‘push’ one. The WM can ‘send’ a window to a client. If the window gets mapped/used, it acts as an acknowledgement that the client understands what to do with it. These are typed. In the linked article, the ‘debug’ one was used (though with some extra special sauce).

There is another loosely defined one, ‘accessibility’. It signals that the client can provide a representation of the source window reduced to a ‘no-nonsense’ form. Thus you can have a main window with the regular bells and whistles of animated dynamic controls and transitions, while also providing a matching secondary output which carries only the attention-deficit friendly, text-to-speech friendly and information dense part.

Frameservers

While the scripting itself is in a memory safe environment, it is by definition in a privileged role where caution is warranted regardless — memory is never safe, wear a helmet. For this purpose, traditionally dangerous operations such as parsing untrusted data is delegated to specialised clients referred to as frameservers. This also applies to transfers carrying potentially compromising data, e.g. desktop sharing or a regular image encode where someone might be tempted to embed some tracking metadata.

Frameservers come in build-time defined sets, each designated to perform one task and to terminate with predictable consequences and repeatable results. The engine can act as if the processing they provide is guaranteed. They get specialised scripting API functions to be easier to use for longer, complicated processing chains.

Frameservers are chainloaded by a short stub that sets up logging and debugging; system call filtering and file system namespacing; hypervisors for hardware enforced separation all the way to network translation for per-device compartments — pick your own level of performance to protection trade-offs.

The decode frameserver takes an incoming data stream, be it an image file, input pipe or usb camera feed and outputs a decoded audio or video stream. This is also used for voice synthesis – the data stream would be your instructions for synthesis.

The encode frameserver works in the other direction: from a human-native input stream to a compute-friendly representation. Video streaming is an obvious one, but of more interest here is OCR — a region of interest in the image goes in, text comes out.

Combine these properties and we have covered most bases for uncooperative forms of “screen reading”.
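
The underlying pattern, handing untrusted parsing to a short-lived expendable worker, can be sketched in a few lines of Python (a stand-in for the real chainloaded frameserver, which adds syscall filtering, namespacing and so on): a crash or hang costs one worker, never the engine.

```python
import subprocess, sys

# Stand-in 'decoder' run in a separate process; here it just reports
# input length, where a real frameserver would decode media.
PARSER = r"""
import sys
data = sys.stdin.buffer.read()
sys.stdout.write(str(len(data)))
"""

def decode_untrusted(blob, timeout=5):
    """Parse untrusted bytes in an expendable subprocess with a
    hard timeout; any failure maps to a predictable None."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", PARSER],
            input=blob, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None  # predictable consequence: the worker was killed
    if proc.returncode != 0:
        return None
    return proc.stdout.decode()
```

The performance/protection trade-off mentioned above is then just a matter of what wraps the worker: seccomp filters, a jail, or a full hypervisor.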

Shell and TUIs

One of the longest running subprojects in Arcan has been that of arcan-tui (dream, dawn, day:shell, day:lash) – a migration path away from terminal emulators towards better text-dominant user interfaces, following the same synchronisation, data and output transfers as everything else but without the terminal-emulator legacy.

On top of exposing the CLI shell to the features mentioned in Client Level above, and the prospects of getting clean separation between the outputs and inputs of shell tasks, text is actually rendered server-side.

This means that the WM understands the formatting of each line of text, and not merely as a soup of pixels or a sequence of glyphs. It knows where the cursor is and where it has been, and box drawing characters are cell attributes rather than part of the text itself. Any window, including accessibility ones, can decide to provide this ‘TPACK’ format, deferring rendering to the last possible moment. With that, better composition tactics come cheap. A big win is using signed distance field forms of the glyphs to let one zoom and pan at minimum cost with maximum sharpness.
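
A toy model of what ‘cells with attributes instead of pixels’ buys an accessibility consumer (the layout here is illustrative, not the actual TPACK wire format): decorations are attributes, so a screen reader can walk a row and speak only its content.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    """One attributed screen cell in a toy TPACK-like grid."""
    ch: str
    bold: bool = False
    border: bool = False   # box drawing as an attribute, not text

def readable_text(row):
    """Extract only the speakable content of a row of cells."""
    return "".join(c.ch for c in row if not c.border)

# a framed 'ok' where the frame never pollutes the spoken text:
row = [Cell("|", border=True), Cell("o"), Cell("k", bold=True),
       Cell("|", border=True)]
```

Contrast this with scraping a pixel buffer, where the frame and the text are indistinguishable without OCR and heuristics.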

Just as the WM model is defined entirely through scripts, so is the local job control and presentation for the command-line shell. Not being restricted to terminal emulation allows for discrete parallel output streams, prompt suggestions and event notifications. These different types can all be forwarded and rendered or played back separately, with different voices at different amplitude and head-relative position.

Examples: Durden and Pipeworld

Returning to the Appl “WM” level, we have several ready to build and expand on. The longest running one, ‘Durden’ has been the daily driver workhorse for many years by now, and is the closest to reaching a stable 1.0 release, although documentation and dissemination lags behind its actual state.

While first a humble tiling window manager, it has grown to cover nearly all possible management schemes, including some quite unique ones. What is especially important here is that it is organised as a file system.

Every possible action (and there are well over 600 by now) has a file system path with a logical name, such as /global/display/displays/current/zoom/cursor, a user-intended label, a long form description and rules for validating and setting reasonable values. Keybindings, UI buttons, customised themes and presets are all just references to such paths. This makes it friendly to audio-only discovery; nothing in the model gets hidden behind a strictly visual form. Even file browsing attaches itself onto the same structure.
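
The ‘actions as a file system’ idea can be sketched like this (the paths, validators and bindings below are invented for illustration and are not Durden’s actual tree): everything addresses into one validated namespace, and a keybinding is nothing but a stored path plus value.

```python
# One addressable namespace of actions with per-path validation.
ACTIONS = {
    "/global/display/zoom/cursor": {
        "label": "Zoom at cursor",
        "validate": lambda v: isinstance(v, (int, float)) and 1 <= v <= 8,
        "apply": lambda v: f"zoom={v}",
    },
}

def dispatch(path, value):
    """Resolve a path, validate the value, then apply."""
    act = ACTIONS.get(path)
    if act is None or not act["validate"](value):
        return None
    return act["apply"](value)

# a keybinding is just a reference into the same namespace:
BINDINGS = {"M1+Z": ("/global/display/zoom/cursor", 2)}
```

Because discovery is just enumerating paths and labels, an audio-only front end can browse and trigger the exact same set of actions as a visual menu.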

It is the best starting point for quickly testing out new assistive input and mixed vision methods, and has quite a few of them baked in already. Examples on the internal experimentation board that have yet to be cleaned up for upstream are edge detection and eye tracker guided mouse warping and selection; content update aware OCR; and content-adaptive luma stepping to catch ‘flashbangs’.

The upcoming release has text to speech integration added, with the t2s tool supporting dynamic voices, and assigning different roles to different voices.

Even more experimental is the dataflow WM, Pipeworld. Although the introduction post (2021) painted a broad picture with more fancy visual effects and its zooming-UI, it has real potential as an environment for the visually impaired for a few reasons:

  • All actions yield a row:column like address, and the factory expression that sprang them into place is kept and can be modified/repeated.
  • It is spatially 2D: there is a cursor-relative ‘above’, ‘below’, ‘left’ and ‘right’ that can map to positional audio for alerts and notifications, and a distance metric can be used for attenuation.
  • As with Durden, there is a file system like view of all actions, but also a command-line interface for any kind of processing or navigation.
  • Each row and cell has magnification-minification as part of the zooming user interface, with zoom-level guided filtration.

Trajectory

The current boom in machine learning based video/audio analysis and synthesis fits like a glove with the frameserver approach to division of responsibilities. While the current OCR is based on Tesseract and the voice synthesis uses eSpeak, each integration is 100-200 lines of code in a stand-alone binary that can be swapped out and applied without restarting.

Adding access to more refined versions with ‘your voice’, transcription, automated translation or image content description that then apply across the desktop is about as easy as it gets. Such extensions, along with packaging and demographic friendly presets, would be suggested points where the project can be leveraged for your own ends without the time investment it would take to get comfortable with all the other project components.

While the current topic / release branch is focused on the network protocol and its supported tooling, when that is at a sufficiently high standard, we will return to a number of features with impact for accessibility and overlap with virtual reality.

Examples in that direction would be individually adapted HRTFs and better head tracking for improved spatial sound and better handling of dynamic audio device routing when you have many of them.

A more advanced one would be indoor positioning for ‘geographically bound computing’ with information being stored, ordered, secured and searched based on where you are – as a trivial example, my cooking recipes are mainly accessed in the kitchen, my doomsday weapon schematics in my workshop and my tax and banking in my home office.


Whipping up a new Shell – Lash#Cat9

This article introduces the first release of ‘Lash#Cat9’, a different kind of command-line shell.

A big change is that it is communicating with the display server directly, instead of being restricted and filtered by a terminal emulator. The source code repository with instructions for running it yourself can be found here: https://github.com/letoram/cat9. A concatenation of all the clips here can be found in this (youtube-link).

Cat9 serves as the practical complement to the article on ‘The day of a new command-line interface: shell‘. That article also covers the design/architectural considerations on a system level, as well as more generic advancements to displacing the terminal emulator.

The rest of the article will work through the major features and how they came about.

A guiding principle is the role of the textual shell as a frontend instead of a clunky programming environment. The shell presents a user-facing, interactive interface to make other complex tools more approachable or to glue them together into a more advanced weapon. Cat9 is entirely written in Lua, so scripting in it is a given, but also relatively uninteresting as a feature — there are better languages around for systems programming, and better UI paradigms for automating work flows.

Another is that of delegation – textual shells naturally evolved without assuming a graphical one being present. That is rarely the case today, yet the language for sharing between the two is unrefined, crude and fragile. The graphical shell is infinitely more capable of decorating and managing windows, animating transitions, routing inputs and tuning pixels for specific displays. It should naturally be in charge of such actions.

Another is to make experience self documenting – that the emergent patterns on how your use of command line processing gets extracted and remembered in a form where re-use becomes natural. Primitive forms of this are completions from command history and aliases, but there is much more to be done here.

Prestudy

I collected history from a few weeks of regular terminal use along with screen recordings of the desktop window management side. I then proceeded to manually sift through these, looking for signs of poor posture. I found plenty.

This is a humbling experience. The main conclusion drawn is that I am mostly a hapless twit who defaults to repeating the same things hoping for different outcomes. I consistently confuse ‘src’ and ‘dst’ for ‘ln -s’; ‘ls’ gets spelled ‘sl’ much too often; ifconfig remains the preferred choice over ‘ip’ even though its main output typically is ‘file not found’ these days; nearly every tool that expects regular expressions is first fed plaintext strings. When I actually want to use a regular expression I consistently pick the wrong expression language.

The signal to noise ratio in the history is abysmal. About 90% of scrollback contents were leftovers from cd, ls and tab completion sprinkled with repeated runs of the same command through sudo, with minor tweaks to the arguments or to get a redirection for stderr. Redirections that were then left in the file system, with descriptive names like “boogeraids2000”.

The screen recordings were also revealing. Some notable time sinks:

  • Copy paste across line-feeds and resizing windows to deal with incorrect wrapping.
  • Spinning up new terminals to work around man or vim hogging the alt screen.
  • Digging around in ps/proc/… for PIDs.
  • Redirecting to temporary files to transfer job outputs between windows or for later comparison.
  • Switching vim buffers between horizontal/vertical to fight the tiling WM.

All these can be fixed with relatively minor effort.

Improvements

Get the prompt out of the way.

Starting with the prompt – obvious bits are that its contents should be ephemeral and disappear after running a command. It should reflect information about the current context (directory, etc.) and whatever else of immediate short lived value. The point is to clean this up:

Old prompts polluting the history, bad / silent commands not pruned, dated command contents left around, …

Instead we get this:

Video clip of new prompt behaviour in two shell instances running side by side
  • Prompt is updated live regardless of input and can change its layout template dynamically.
  • Prompt format and contents depends on window management state (focus, unfocus).
  • Silent commands are kept away from the history.
  • Completions come up without interaction and do not trample/shuffle actual contents.
  • Commands that only resulted in errors are automatically delay purged.

Compartment

The previous options for compartmentation were a choice between juggling a ‘foreground’ job and ‘background’ jobs. For this to work you needed either a fragile weave of signalling (SIGTSTP, …) and file redirections — or spinning up new terminals, either through a terminal multiplexer (a terminal emulator inside a terminal emulator inside ..) or new windows.

I find those solutions both noisy and distracting. Instead, I now have this:

Video clip of job compartments.
  • Every command-line submitted now becomes its own job.
  • Jobs can reference each other.
  • Job context (environment variables, working directory, …) is saved and tracked.
  • The jobs are presented in order of importance (active ones take priority over passive ones).
  • Spawning new jobs automatically folds old ones into a collapsed form.
  • Individual controls, status and statistics are added to a stateful bar at the top of the job.
  • Job contexts can be reused for new commands.

Remember everything, but right to be forgotten.

In the terminal world, all job outputs either get composed onto one shared buffer with a certain amount of memory (scrollback history), fight for a scratchpad (“altscreen mode”), or are redirected to files or other jobs. This happens regardless of stream source or job state (foreground/background).

With real compartmentation and much larger memory and CPU budgets thanks to server side text rendering, we can do much better:

  • Stdout and Stderr are tracked separately.
  • All job output is kept, tracked and addressed individually.
  • Contents can be forgotten, or selectively processed.
  • Completed jobs can be repeated, appending to the existing output or replacing that of previous runs.
  • Jobs can be repeated with an edited command-line.
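
A toy Python job table in the spirit of the list above (a conceptual sketch, not Cat9’s actual internals): stdout and stderr stay separate, all output stays addressable, and repeats can append or replace.

```python
class Job:
    """One command-line submission with retained, addressable output."""

    def __init__(self, cmdline):
        self.cmdline = cmdline
        self.stdout, self.stderr = [], []

    def run(self, runner, append=True):
        """Re-run via `runner(cmdline) -> (out_lines, err_lines)`,
        appending to or replacing previous output."""
        if not append:
            self.stdout.clear()
            self.stderr.clear()
        out, err = runner(self.cmdline)
        self.stdout += out
        self.stderr += err

    def forget(self):
        """The 'right to be forgotten': drop retained contents."""
        self.stdout.clear()
        self.stderr.clear()
```

With output ownership per job, the ad-hoc temporary files and stderr redirection dances from the prestudy simply stop being necessary.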

Cooperate with the outer windowing system

Now that the shell can talk directly to the window manager without having the conversation dumbed down by a terminal emulator sitting in between, new integration options are possible:

Showing different forms of desktop / window manager cooperation
  • Snapshot the output of a job to a new window.
  • Window creation hints to window manager, like vertical split or tabbed.
  • Open applications and media embedded, with controls for position and size.
  • Detach and reattach embedded media, preserving input routing.
  • Directly route contents to clipboard and other data sharing mechanisms.
  • Trigger GUI file pickers.

Let legacy in

Now with a fairly functional environment, the last part is to account for all the edge cases where we still need access to the old world in various degrees:

Showing how traditional terminal emulation and legacy shells can still be accessed
  • Send data from a job to external processing pipes (#0 | grep hi).
  • Request a new window, attach a terminal emulator to it and run a pty dependent command (!vim).
  • Setup a PTY and attach a VTxxx view to it: (p! ls --color=yes).

Streamline command structure

The foundation of Cat9 is the command-line language itself. All the UI elements that you see, mouse gestures and key bindings map to the same things that you could type in manually:

Showing the connection between command line, input bindings, mouse input and event hooks
  • Hooks and event actions can be added after a command has been setup or is running.
  • Mouse actions, bindings (clicking shown in clip: view #csel $=crow as in ‘cursor job, cursor row’).
  • Aliases and pre-commit expansion.

With these basics sorted out, it is time to build something more interesting.

Special Topic: Views on Life

Now that jobs keep their data around in nicely tracked structures rather than a prematurely composed and broken ‘scrollback buffer’, we can do something more. While we have data in its raw form, we can look at it through various lenses to get different representations of the data. These are baked into the ‘view’ builtin.

Simply put, they parse the data and reformat the contents by adding annotations, structures, formatting and so on. The current builtin ones are all shown in this clip:

Dynamically switching between job visualisers/parsers, “views”

In this one you see ‘wrap’ and ‘filter’ along with some options like line numbers and column wrapping. Filter even goes so far as to have an interactive mode that live-applies the filter as it is being written.

With the original data retained, re-executing previous pipelines is not needed, and the choice between using the formatted output and the original data is available when copying in/out.
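
Since a view is just a pure function over the retained raw lines, the concept fits in a few lines of Python (a conceptual sketch, not Cat9’s Lua implementation): the raw data is never mutated, only re-presented.

```python
def view_wrap(lines, width):
    """Re-wrap lines to a column width without touching the source."""
    out = []
    for ln in lines:
        out += [ln[i:i + width] for i in range(0, max(len(ln), 1), width)]
    return out

def view_filter(lines, needle):
    """Keep only lines containing `needle`; the raw data stays whole."""
    return [ln for ln in lines if needle in ln]

raw = ["hello world", "nothing here", "world peace"]
```

Because the original lines survive every lens, switching from ‘filter’ back to the full output, or copying out the unformatted data, costs nothing.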

This is one of the features that will be expanded heavily in future versions as we try to improve the presentation of the many ad-hoc text formats.

Special Topic: State Actor

This is a good one. Regular windowing systems provide Clipboard as well as Drag and Drop as forms of interactive data sharing. Some go further and also allow sequenced picking/sharing, like the “share” button popular in mobile operating systems. Arcan adds a state store/restore action to the mix.

This means that at any point, the windowing system can request that a state snapshot is created, or request that the application reverts to a provided one.

Examples of what gets stored in such a state blob here are configuration changes; command history; environment variables; aliases and so on. While this offloads the ‘where are my dot files’ responsibility, more interesting is that states can be transferred between instances at runtime.

Combine this with the job system: by marking a job as persistent, the command creating a job will be added to the state store. In the following clip you can see it being used to an interesting effect:

Per job state persistence across sessions

I first start a new cat9 session, run two jobs and mark one as manually persistent and the other as automatic. After shutting down and restarting you can see how the jobs come back, with the automatic one starting immediately. In the next clip I go one step further and copy the state between two live instances.

Dragging a persisted job between independent instances

When combined with remote shells, this becomes a really potent administration and automation tool. Perform a task once; visually confirm that the results matched expectations; Save the state and replay wherever and whenever. Use that for knowledge sharing, or hook it up to an event source for snapshotting and rollback to give anything history/undo.
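
The snapshot/restore round-trip can be modelled like so (the serialisation format below is invented, purely to illustrate that persistent jobs travel with the state and re-launch on restore):

```python
import json

class Shell:
    """Toy state actor: snapshot on request, restore into any instance."""

    def __init__(self):
        self.env = {}
        self.persistent_jobs = []   # (cmdline, autostart)

    def snapshot(self):
        """Serialise the state the WM asked for into a blob."""
        return json.dumps({"env": self.env,
                           "jobs": self.persistent_jobs})

    def restore(self, blob, launch):
        """Revert to a provided state; autostart jobs re-launch."""
        state = json.loads(blob)
        self.env = state["env"]
        self.persistent_jobs = [tuple(j) for j in state["jobs"]]
        for cmdline, autostart in self.persistent_jobs:
            if autostart:
                launch(cmdline)
```

Hooking `snapshot` to an event source and `restore` to an undo binding is what gives ‘anything’ history/rollback for free.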

Special Topic: Frontending

There is little consistency between many popular tools, no matter if they come as “argv hell”, “CLIs within the CLI” or “lots of small binaries”. This is natural, but also undesired from a user perspective. It feels rather futile to have gone through the strides of building a CLI that behaves like you want it to — just to have the work be undone by the tools you launch from it.

I am no stranger to uphill battles, but the odds of getting the likes of wpa_supplicant, git, gdb/lldb or ffmpeg to change their evil ways and follow the one true path are slim to none. The passive aggressive form of dealing with this is what bash_completion and the like do – create helper scripts that at least make polite suggestions while building the command line. This works poorly when the tool is interactive. Other options include defining better programmable interfaces or language server style external oracles, then hoping for the main drivers to convert.

With the extensive scripting, parsing and rendering options available to us now – there is a more actively aggressive way. In Cat9, you can define multiple sets of builtins and views, and switch between them. This means that you can create a set of builtins for a specific logical function, like networking, programming or debugging, then swap between those as needed.
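
Swappable builtin sets are, at heart, a replaceable command-resolution table. A conceptual Python sketch (the set contents are invented, not the actual Cat9 builtins): switching sets swaps the table, while unknown commands fall through and get forwarded raw.

```python
# Illustrative builtin sets: a default one and a 'networking' one.
DEFAULT = {"view": lambda *a: "view " + " ".join(a)}
NETWORKING = {"wifi": lambda *a: "scan" if not a else "join " + a[0]}

class Cli:
    """Toy CLI resolving commands against a swappable builtin set."""

    def __init__(self):
        self.builtins = dict(DEFAULT)

    def use_set(self, extra):
        """Switch logical function set; defaults stay reachable."""
        self.builtins = dict(DEFAULT)
        self.builtins.update(extra)

    def run(self, name, *args):
        cmd = self.builtins.get(name)
        return cmd(*args) if cmd else f"forwarded raw: {name}"
```

One set per logical function (networking, debugging, programming) keeps each frontend small while sharing all the prompt, view and job infrastructure.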

This, along with views, will be the more active area being developed for future releases. The following short clip shows an early ‘in progress’ such set for networking.

Swap sets of builtins commands at runtime to build logically grouped CLIs

In the clip you can see the set of builtins being swapped to ‘networking’, which adds new builtins such as ‘wifi’. You can see live completion of available SSIDs appearing asynchronously as a scan completes. Commands can still be forwarded ‘raw’, with the output packaged into its own job that can be used by the other builtins. It can also attach polling status about signal levels and connection into the prompt, using all the same infrastructure as the previous demonstrations.

In Closing

I hope this conveyed some of the benefits of leaving the shackles of terminal emulators and its more abstract form of ‘virtualisation for compatibility through emulation as default’ behind. There are a whole lot more ideas to squeeze into this setup now that all the grunt work has been dealt with.

Better CLIs as part of better TUIs are key to making professional computing more accessible to budding sprout experts and the cognitively challenged alike. The building blocks are here for your ‘speech-assisted’ command-lines without having to have a screen reader try and make sense of a poorly segmented word soup, or for your red team approved secret “leave no trace” cleanup sauce.

The last article in this series will dip into the programmable surface – how the APIs replacing curses work and integrate with the display server / window manager.


Arcan 0.6.2 – It’s all connected

This release should put us at about halfway through the planned work for the networking focus set of releases (0.6.x), a scope roughly defined by the article on A12: Advancing network transparency on the desktop and the one on Arcan as OS design. Alas, it is also the single most difficult and time-consuming part left on the entire roadmap.

Before dipping into the major additions and changes, I will break form a little and dwell on what is going on and why.

From the set of design principles that we follow, number four “Make State mobile”, five “No State left behind” and six “Privacy fights back” are at the center of attention here.

The idea is to get a protocol which replaces mDNS (local service discovery), SSH (interactive textual shell), X11/VNC/RDP (interactive graphical shell), RTSP (streaming multimedia), HTTP (networked application retrieval and state synchronisation) and a few other lesser knowns, and we are nearly there feature-wise.

This is “less” effort than one might think, as so much code is needlessly repeated again and again by not leveraging the many bits the designs of these protocols have in common, justified only by legacy and history, not by technical or architectural merit.

This is similar to the IPC situation locally that we solve with SHMIF — while others will get to continue to enjoy the cacophony of IPC systems (D-Bus, Wayland, Pipewire, VTxxx, …) where the difficult parts (authentication, discovery, synchronisation, least-privilege separation, zero-copy ownership transfers, queue and resource management, resilience, …) keep on being implemented again and again and again in incompatible ways yet are supposed to work together to solve actual end-user problems — this is the price to pay for being stuck in ‘first order’ forms of reasoning, but simplicity is systemic.

With this protocol as a building block, every single component in Arcan can be de-coupled from one device and re-coupled to running on another – from media parsing and decoding to accelerated rendering and encoding.

To illustrate the point and the self-imposed “grand challenge” — in this photo from one of my labs are the set of user facing devices in the weekly rotation currently capable of running Arcan; each with some quirk or property that makes it interesting to keep in rotation (this is also the least depressing lab, wait until you see the one for displays or the one for input devices).

Together they represent a sort of lighter extreme here:

  1. The devices that are active should be able to share workload and work ‘as one’.
  2. Repurposing any device to a ‘one ephemeral task’ runner should be achievable within minutes, and a queue of prepared runners should make activation near instant.
  3. Installed static state (what it can do) should be known; dynamic state (what you changed) should be extractable.

These tactics should serve to raise the cost for both reliable persistent exploitation and for evading detection considerably. They should also work well for building intuitive and ergonomic compartmentation for harm reduction against both ‘smash and grab’ style attacks, micro-architectural side-channels and against physical theft.

All this across networking infrastructure that is assumed to be unreliable (no global clock), peer-to-peer (no DNS by default) and only accidentally connected to the Internet — what some would call air gapped.

As a refresher, the initial proof of concept – roughly the state in ~2019 – can be seen in this clip:

There will be reason to return to the topic in the near future, but for now, let’s move to the big ticket items for this release. For the both more- and less- detailed list of changes, see the regular Changelog.

Onward to the big ticket items:

Networking

The two main tools for using our network protocol are ‘arcan-net’ (standalone) and ‘afsrv_net’ (that transparently maps to script-reachable functions in our Lua scripting layer).

  • arcan-net now has the first draft implementation of ‘directory’ mode, which will be used for three purposes; as a discovery rendezvous in WANs where other communication might also be needed (proxying or NAT punching), as a trusted third party state store, and as an arcan appl host.

This part of directory mode covers the arcan appl host setup. It lets any arcan installation share the set of appls it has with any other, and act as a state store (configuration persistence).

There are articles in the queue about the implications of this but as an example out of many — it means I can have an offline ‘build box’ that generates device tailored ‘live’ images (e.g. the hacky scripts in arcan-void-mklive for now); injecting authentication keys into the image and whatever device boots from it can load/restore the same persistent desktop from an otherwise ephemeral read-only environment. The image is logged and attested, and can act as source for comparison against the device at a later date.

For the sake of it, arcan.divergent-desktop.org is currently hosting ‘durden’ and ‘pipeworld‘; subject to me breaking things during daily experimentation. It was started like this:

ARCAN_APPLBASE=./shared ARCAN_STATEPATH=./state arcan-net --directory --soft-auth -l 6680

(the --soft-auth makes it about as insecure as a world of self-signed https; it is unwise to run appls from this server on anything sensitive). In this clip you simply see me connecting to it over a fairly slow link:

running arcan-net --soft-auth arcan.divergent-desktop.org pipeworld

The state store bits are less promiscuous and still require an authenticated key exchange. Let’s write a simple arcan appl called ‘demo’:

mkdir demo
echo '
function demo()
  local counter = get_key("counter") or "A"
  local img = render_text("Hi " .. tostring(counter))
  show_image(img, 200)
  tag_image_transform(img, MASK_OPACITY, function() shutdown() end)
  store_key("counter", counter .. string.char(string.byte("A") + #counter))
end' > demo/demo.lua

The following clip is the result from me running this appl a few times:

state accumulation across multiple runs

You need to squint a bit, but for each run another letter is attached, walking from ‘A’ onwards. The point is that this state is managed server-side; had the same public key (== identity) been running on another device, it would have continued where the other last synched: remote, persistent, observable.
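To make the accumulation explicit: each run loads the stored counter and appends the next letter of the alphabet, at an offset equal to the current length. The same logic as the Lua snippet above, restated in Python purely for clarity (the appl itself is Lua):

```python
def step(counter):
    # Mirrors the Lua: counter .. string.char(string.byte("A") + #counter)
    return counter + chr(ord("A") + len(counter))

# One 'run' per loop iteration: load (get_key), display, store (store_key).
state = "A"            # get_key("counter") or "A"
runs = []
for _ in range(4):
    runs.append(state)
    state = step(state)
# runs is now ["A", "AB", "ABC", "ABCD"]
```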

  • arcan-net now caches some binary transfers and ramps up transfers only after confirming something is not already in the cache. This is most useful for synching fonts. It also automatically compresses / decompresses binary file transfers.
  • afsrv_net has gotten a ‘sweep’ discovery mode, which enumerates the ‘petnames’ in the keystore and periodically tracks which ones have started- or stopped- responding.
  • The protocol now covers role negotiation as part of the initial handshake, where each side can be either Source, Sink or Directory. Pairing Source-Source or Sink-Sink will disconnect with an error.
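The cache-then-ramp-up behaviour in the first bullet amounts to content addressing: hash the blob, ask the peer whether that hash is already present, and only stream the bytes on a miss. A toy sketch of the idea follows; this is not the actual A12 wire format, and the hash choice is an assumption:

```python
import hashlib

class BlobCache:
    """Toy content-addressed cache, roughly the idea behind arcan-net's
    binary transfer cache. Illustration only, not the real implementation."""
    def __init__(self):
        self.store = {}

    def key(self, blob):
        return hashlib.blake2b(blob, digest_size=16).hexdigest()

    def transfer(self, blob):
        """Return True if bytes actually had to be sent, False on a cache hit."""
        k = self.key(blob)
        if k in self.store:
            return False          # peer confirms it is cached; skip the transfer
        self.store[k] = blob      # miss: ramp up and stream the full blob
        return True
```

A font synched once is never streamed again to the same peer, which is why fonts are called out as the main beneficiary.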

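The role pairing rule reduces to a small compatibility check during the handshake. The sketch below covers just that check (the real handshake also carries authentication and versioning); the assumption that a directory pairs with anything is mine, as the text only forbids source-source and sink-sink:

```python
# Valid pairings per the rule above: a source must meet a sink or a directory;
# two sources or two sinks have nothing to exchange and disconnect with an error.
SOURCE, SINK, DIRECTORY = "source", "sink", "directory"

def roles_compatible(a, b):
    if DIRECTORY in (a, b):
        return True               # assumed: directories broker for either role
    return a != b                 # source-sink is fine, source-source / sink-sink is not
```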

Lash/TUI/Terminal

Our ncurses replacement, arcan-tui, has received a number of fixes in its basic widgets, most notably in its ‘readline’ implementation. The corresponding Lua bindings have also received a lot of attention, and are now suitable for writing most kinds of CLI/TUI applications.

The ‘terminal’ frameserver (afsrv_terminal), which has long acted as our terminal emulator of choice, now has an additional mode of operation (ARCAN_ARG=cli=lua) that we call ‘Lash’ (LuA SHell). This pulls in a Lua VM along with the TUI API bindings, coupled to a simple chainloader script that pulls in a custom shell ruleset from (default) $HOME/.arcan/lash.

The following clip demonstrates some capabilities of a shell written on top of this – “Cat9”.

The reasoning behind all this is covered in a separate article, ‘The Day of a new Command Line Interface: Shell‘ while Cat9 will be getting a more thorough introduction a little later when I am satisfied with its feature set and implementation quality.

A large target for the API/bindings is to quickly build / wrap system services (network device control, file system mounting, …) as frontends to existing tools (wpa_supplicant et al.) that integrate and compose similarly to how we illustrated with the tray icon handler.

Namespacing

Arcan splits up its resource handling into a number of static namespaces. The rules behind many of these are quite complex, and really only make sense when considered as a design to avoid some of the many mistakes Android made back in its early days (2.x+), and as preparation for sandboxing resource access in the context of loading appls as shown in the networking section.

With this release, the configuration database managed through the arcan_db tool can be used to define dynamic user namespaces. A simple example would be:

arcan_db add_appl_kv arcan ns_myhome "Home:rw:/home/me"

This would allow Arcan instances read/write access to /home/me with the symbolic name “myhome” and the user-presentable label “Home”. It fits in with the ‘wrapping tools around tui’ point from the Lash/TUI section: granting/revoking access to storage as it becomes available.
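The value string packs a user-facing label, an access mode and a path, colon-separated. A toy parser for that layout (illustration only; Arcan's own parsing may differ in details such as escaping):

```python
def parse_ns(key, val):
    """Split an arcan_db user-namespace entry, e.g.
       ns_myhome = "Home:rw:/home/me"
    into symbolic name, label, permissions and path."""
    label, mode, path = val.split(":", 2)  # split at most twice; path keeps any later ':'
    name = key[3:] if key.startswith("ns_") else key
    return {
        "name": name,        # symbolic name, "myhome"
        "label": label,      # user-presentable, "Home"
        "read": "r" in mode,
        "write": "w" in mode,
        "path": path,
    }
```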

Decode

The ‘catch all’ dependency sponge afsrv_decode (external client with a stricter ruleset on behaviour for tighter sandboxing and privilege separation) that absorbs parsers (one much beloved exploitation target in offensive security) has received basic support for PDFs and similar vector formats via MuPDF.

In the clip below, you can see how the preview feature in the Durden browser HUD spins up a bunch of decode-pdf processes and then toggles to navigate one.

clip showing live PDF previews

Encode

On systems that support the ‘v4l2-loopback’ device interface, the encode frameserver (afsrv_encode) now supports exposing its input as a v4l2 device. This means that all the sharing and streaming features can emulate a webcam device for applications that trust such things. The video below shows a window playing back a movie being treated as a webcam in Chrome:

clip showing arbitrary sharing as v4l2 device


The Day of a new Command-Line Interface: Shell

This article continues the long-lost series on how to migrate away from terminal protocols as the main building block for command-line and text-dominant user interfaces. The previous ones (Chasing the dream of a terminal-free CLI (frustration/idea, 2016) and Dawn of a new Command-Line Interface (design, 2017)) might be worth an extra read afterwards, but they are not prerequisites to understanding this one. The practical demonstration of this can be found in Whipping up a new Shell – Lash#Cat9, and the series was concluded with Sunsetting Cursed Terminal Emulation.

The value proposition and motivation is still that such a critical part of computing should not be limited to device restrictions set in place some 50-70 years ago. The resulting machinery is inefficient, complex, unreliable, slow and incapable. For what is arguably a strong raison d’être for current day UNIX derivatives, that is not a strategic foundation to rely or expand upon.

The focus this time is about the practicalities of the user facing ‘shell’ — the cancerous and confused mass that hides behind the seemingly harmless command-line prompt. The final article will be about the developer facing programming interfaces themselves as application building blocks, how all of this is put together, and the design considerations that go into such a thing.


The following clip is a very quick teaser from using one in-progress replacement shell that has been built using the tools that will be covered here. While the shell itself will be presented in greater detail in a future article, it is available for adventurous souls to poke around with and can be found in this GH repository.

Early days of Lash#Cat9, a terminal-liberated shell

Starting with other/related work: many have attempted to deal with the embarrassing legacy of terminals and their inherent limitations.

“NOTTY” focused on replacing the “in-band” signalling and command format. This addresses some of the protocol issues, but has an impedance mismatch with what the rest of your desktop or basic rendering expects, and the emulator-shell split remains.

Some, like “Hyper” and “Upterm“, rewrite the terminal emulator and shell in more and more advanced UI frameworks to get better cooperation with an outer graphical shell — inviting in all the complexities of rancid behemoths like Electron and GTK/Qt while still leaving the protocols and TUI libraries in their currently poor state.

Others, like “Notcurses et al.”, replace the key TUI libraries like Curses, Readline and so on. These fix neither the emulator nor the protocols. Worse still, a few make the protocol situation worse by introducing sidebands, hard-coding escape sequences or introducing new ones.

Then there are a number of attempts like jc and relational-pipes that proxy or modify the exchange format between stdin/stdout in a single pipeline, but that is mostly orthogonal to the problem discussed; solving for the others would provide another pathway for negotiating multiple, concurrent, exchange formats.

What is ‘Shell‘?

First, if you want better and deeper reading into the subject, I would suggest:

There are many more to be had, but piling them on would mainly add to the existing confusion between terms (console, shell, terminal, tui, gui, ..); these terms have contextual and historical interpretations that are slightly incompatible depending on where you are and where you come from, which makes discussing the topic even harder.

Here is a rough breakdown of different components and roles sufficient for the scope of this article:

Model over interaction between graphical shell, textual shell and their building blocks

Here, ‘Shell’ (as part of providing a textual shell as a command-line) is the first in line to consume and work with a terminal emulator, through a preassigned set of file descriptors (0, 1, 2) mapped to a terminal or pseudo-terminal device. Shells and some applications alike test and change their behaviour depending on the state of these descriptors (isatty(3) to tcgetattr(3) to ioctl(2)), sometimes referred to as an ‘interactive’ mode. These descriptors continue (unless explicitly told not to) to be shared and inherited into new jobs over this serial communication line.
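That isatty-style probing is the entire basis for ‘interactive mode’ detection; in Python terms, the test shells perform at startup reduces to roughly:

```python
import os, sys

def interactive_mode():
    """Roughly what shells do on startup: if stdin and stdout are
    (pseudo-)terminal devices, enable prompts, line editing and job control;
    otherwise behave as a dumb pipeline stage. This is the isatty(3) test
    from the paragraph above, nothing more."""
    try:
        return os.isatty(sys.stdin.fileno()) and os.isatty(sys.stdout.fileno())
    except (OSError, ValueError):
        return False
```

Redirect either descriptor to a pipe or file and the same binary silently changes personality, which is exactly the inheritance-over-a-serial-line behaviour described above.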

The protocol/IPC marked blocks mask quite a bit of nuance as to how data exchange works, and they are not created equal; the sockets used to communicate with the display server may be unpleasant, but are still infinitely better than this mix of “tty” devices, signalling groups, sessions and stdio used after the ‘terminal emulator’ stage — you do have to sit down and implement both consumer and producer side of the terminal instruction sets to get a fair grasp on just how bad things are. For the Linux kernel alone, the TTY layer is one that not even the most seasoned of developers wants to touch.

While the ’emulator’ part is often stripped and just referred to as ‘the terminal’ it is very much an emulator of ancient hardware (or rather the amalgamation of tens to hundreds of different ones). That fact should be stressed to emphasise the absurdity of it all — especially given the end goal of reading key presses and writing characters into a grid of cells.

There is valuable simplicity to TUIs (out of which CLIs are but one possibility), but that simplicity is wholly undone by the complexity of terminals and how the ‘instruction set / device model’ they expose makes the shell user experience itself unnecessarily hard to develop and provide.

It should be emphasised that the terminal emulator is also a poor take on a display server. This will become relevant later on. As such, it is at a disadvantage against better display servers for many reasons – one being that each job/client is not given a distinct bidirectional connection for data exchange, but instead share a single triplet of “files” (stdin, stdout, stderr), combined with a protocol that was never designed or intended for this.

This shared triplet as well as ‘multiplexing’ is important. Say that one of the shell-launched jobs is another cli shell itself, like gdb or glorious ed. Since the data is in-band over the shared set of stdio slots mapped to the kernel provided device, the previous shell(s) either need to be full emulators on their own, or they cannot safely intervene or layer other things on top of whatever the job is doing.

Even then it has few options for reliably restoring the emulated device state. This is why accidentally cat:ing something like /dev/random will quickly give you a screwed up prompt; it is likely that some of the many sequences that change character map, cursor or flow control were triggered – yet if the shell continues on unawares, the scroll-back history is forever tainted.

There are certainly ways to hardcode and reset some state between jobs explicitly – and some shells do – but that also serves to mask the danger and the fundamental issue with the design: it is executing random instructions in a complex and varying instruction set.

Before and after cat:ing something seemingly harmless.
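Why random bytes wreck the display is easy to show: control rides in-band with the data, so any byte stream can carry instructions. A minimal scanner for just the CSI family of sequences (one small subset of what a real emulator must obey) finds the ‘commands’ hiding in otherwise innocent output:

```python
import re

# ESC '[' , then parameter bytes (0x30-0x3f), intermediate bytes (0x20-0x2f),
# then one final byte in 0x40-0x7e. This matches only CSI sequences, one small
# family among the many instruction forms a terminal emulator obeys.
CSI = re.compile(rb"\x1b\[[0-?]*[ -/]*[@-~]")

def hidden_commands(data: bytes):
    """Return the terminal commands embedded in a byte stream."""
    return CSI.findall(data)
```

Feed it /dev/random and sooner or later a valid sequence appears by chance, which is exactly how an innocent cat reprograms your terminal.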

Back to the model. The ‘graphical shell’ and ‘textual shell’ refer to the abstraction the user actually interacts with. The other ‘shell’ (as in bash, zsh, …) serves at least two roles. First there is the primary role as a ‘window manager’ of sorts. This provides the “prompt”, parsing the command-line into ‘built-in’ command execution, constructing processing pipelines or executing ‘fullscreen’ applications, and choosing which pipeline is currently being “presented”.

The other thing it provides is a scriptable programming environment (as in the scripting part of shell-scripting). This is a secondary feature at best, and not at all necessary. In a command-line environment free from the legacy of terminals, current shells can continue to play this specific role — even if that is a job that should be left to more competent designs.

For the ‘window manager’ role: these range from (visually) simple ones like the foreground/background of bash, zsh and fish to more complex ones such as tmux and screen (sometimes referred to as multiplexers). The first tend to focus on how to articulate jobs and their data exchange, the second on tiling-style window management.

In order for these more complex ones to achieve window management, they also went through the strides of writing additional terminal emulators and embedding other shells (recursive premature composition) in order to output to yet another terminal emulator as there is no proper handover or embedding mechanism in place. It is terminal emulators all the way down and a reason why we have to talk about how shell is complicit in all of this – even the basics of what is expected from ‘reading a line’ requires dipping into terminal drawing, cursor and flow control commands.

This division also reflects the ‘modes’ of how the terminal protocols operate, which, in turn, tie back to what the computer output device actually was in various parts of the timeline. Recall that once upon a time the output was a printer (“line-based”) and only later became monitors (“screen-based”) with incrementally added luxuries like colour and interesting tangents like vector graphics. Moving the cursor around arbitrarily back across previous lines is a privilege — not a right.

You can see some of this bleed through with ‘scroll-back/history’ working poorly (or not at all) in the screen mode, with “tab/context” suggestion popups causing scrolling and weird wrapping visuals when the prompt is at the edge of the last ‘line’ on the ‘paper’ or – heavens forbid – you try to erase previous characters across pages or newlines.

If you want the worst of both worlds, go no further than regular ‘gdb’ (as in the GNU debugger) and go back and forth between ‘tui enable’, for the luxury of seeing the source code you are debugging at the cost of scrolling back through the data you needed, and ‘tui disable’ where every ephemerally relevant output gets committed to the ‘paper’ and the data you needed quickly scroll off into the far away distance.

You can also see it in the ‘tab completion’ output in a line-mode shell having to ‘add lines’ in order to fill in the completion, and those are kept there, polluting the history — as well as in the special treatment certain characters like ‘erase’ received. The man page to ‘stty‘ (or worse still, how a tty driver is written) is a brief yet still frightening look into the special properties of the underlying device itself.

For both modes, the protocols restrict what these two kinds of text-dominant shells can do and how they can cooperate with an outer graphical one. In the model presented so far, there is zero real cooperation between the text and the graphics shells. In reality, there is some, but implemented in a near impenetrable soup of hacks involving a forest of possible sideband protocols — and availability varies wildly with your choice of emulator, the protocol set it is defined to follow, and the contents of a capability database (terminfo/termcap).

As an exercise for the reader, try to work out how and why you can – or why you cannot:

  1. Paste a block of text from your desktop clipboard into the current command-line.
  2. Drag and drop a file into your shell and have it stored into the current working directory.
  3. Click a URL in the command-line buffer history and have it open in your browser.
  4. Redirect the output of a previous command to another window or tab.
  5. Fold / unfold the presentation of output from previous commands.
  6. Have an accurate clock in your prompt that updates by itself.

None of these are particularly exotic use-cases; some would even go so far as to say that these are fairly obvious things that should be trivial to support — yet if you think the answers to any of these are simple and easy, you missed something; there is a Lovecraftian horror hiding behind each and every one.

Simplifying and Exemplifying

Using the model from the previous section, we restructure it to this:

Terminal, TTY and Signalling laid to rest – the shell being a regular client to the one true display server without an emulator of ancient hardware in between.

The terminal emulator is gone, the rightfully maligned ‘tty’ layer hiding in the kernel is gone. There are now a whole lot of ways for the graphical shell to cooperate with the textual one.

For this to work and provide enough gains, a lot of subtle nuances of the IPC system need to be in place; the one in Arcan (shmif) was specifically designed for this as one of several ‘grand challenges’ that were used to derive the intended feature set many, many, many years ago.

One of the main building blocks is ‘handover allocation’ – where the shell requests new resources in the graphical shell on behalf of an upcoming job, and then forwards the primitives needed to inherit a connection into the job, retaining the chain of trust and custody. Another is the live migration used as part of crash resilience, which eliminates the need for multiplexers as each client can redirect at runtime to other servers by design.

The main Arcan process takes the role of the display server. With that comes a pick of graphical shell, ranging from the modest ‘console‘ to the more advanced (durden, pipeworld, safespaces). Do note that there is a choice in building Arcan as the system display server with authority on GPUs, input devices and so on – or as a regular graphical client that you would run in place of your terminal emulator inside Xorg or some Wayland compositor. You will lose out on several performance gains, and some nuances in how window management integrates, but many features will remain.

For the ‘text shell’ block, there is a little bit more to think about. While it is perfectly valid and intended to use libarcan-tui to write your own here, one also comes included in the box. A regular Arcan build produces ‘afsrv_terminal’ (or arcterm as it is referred to internally).

This is a terminal emulator with a secret; if the argument “cli” is passed, it switches to an extremely barebones built-in text-shell and skips all the terminal emulation machinery. It is intended to provide only the absolutely necessary bits for something like booting a recovery image for an OS. If you are C inclined, this is a fair basis to expand on or borrow from.

In the following clip from the Pipeworld article (a graphical shell), you can see it in use in the form of the small CLI cell where I am typing in commands.

afsrv_terminal “cli” mode used to launch processes in cooperation with a graphical shell.

While things go quite fast, you might be able to spot how it transitions from a command-line as part of the graphical shell at 0:02 into a terminal emulator liberated textual CLI shell. You can then see that the jobs which spawn are their own separate processes and do not multiplex over the same pseudo-terminal devices (as that would make more than one ‘tui’ like job impossible or require nesting composition/state through something like screen or tmux, reintroducing the premature composition problem).

The twist is that these jobs are negotiated with the graphical shell being aware of their purpose and origin. This is a feature that runs deep and dates far back, already in use at the time of the One Night in Rio – Vacation Photos from Plan9 article. It is also the reason why jobs spawn as new detachable windows, yet retain hierarchy in the optics of the window management scheme.

Another setup can be found in this clip, also from Pipeworld:

Multiple afsrv_terminals built to cooperate and mix/match between interactive and pipeline- processing.

Here we demonstrate how a processing pipeline can be built with separate outputs for each task in the pipeline, while at the same time using stdin/stdout to convey the data that is to be processed. Any single one of these can be a strict text client, an arcan-tui one or wrapped around a terminal protocol decoder – yet each can be interacted with- and tracked- independently.

The ‘cli’ mode takes another argument, =lua. This enables a Lua VM, maps in API bindings, loads a basic script harness that provides some very crude and basic commands, but allows for plugging in a custom shell, like the one mentioned at the beginning of the article.

In this clip we can see a prompt from that shell where we run a job, and popup the ongoing results from that job into a window of its own with a hex view. The graphical shell, here operating in a tiling window manager setup, respects the request for this window to be a tab to the current one and creates it as such.

To add legacy to injury, this clip shows running a new job as a separate vertical-split window, wrapped around a terminal emulator. The standard error output, however, gets tracked and mapped into the shell view of ongoing jobs. Paste actions from the graphical shell have been set to accumulate into the data buffer of a job. With this feature disabled, the paste action instead copies into the readline completion set, inserting at the cursor if activated.

In this clip we go even further – the shell opens a media resource, requests it to be embedded into its window. The resource scales, repositions and folds accordingly, yet the user can ‘drag it out’ should she so desire. The video playback in this case is delegated to one-shot dedicated processes, no parsing or exotic dependencies are imposed on the shell process itself.

Embedded composition of an external delegate, with user initiated decomposition.

The process responsible for composition gets to composite and gives the user independent controls for lossless decomposition.

In the following clip we see other forms of metadata interaction – the shell requests that the user picks a file, which it then redirects into a local copy inside the current working directory. The file picking is outsourced to whatever an outer graphical shell provides, and the chosen descriptor is forwarded into the text shell process that then saves it to disk. The process is repeated by picking an image file that is then opened and embedded similarly to the previous clip. Had the textual shell been running remotely or in some distant container, the transfer would have gone through just the same. The underlying mechanism works for explicit load-store, cut and paste, as well as drag and drop.

Explicit file-picking into binary paste into embedded media viewing.

Gains

There is much more to be had than the parlour tricks shown in the previous section. What can, immediately, without rose-tinted glasses and speculation, be gained by leveraging this infrastructure?

Data Communication – With an actual IPC system to connect through, it is possible to:

  • Leave STDIN/STDOUT/STDERR as pure data channels, not mixing in UI events or draw commands.
  • Guarantee that accidentally catting a binary file or device cannot break the UI state machine.
  • Explicit serialisation of state (store / restore runtime config) without filesystem trails.
  • Every single command in a pipeline is left alone and kept separable between jobs and they do not interfere with shell communication. Thus each tool in a pipeline can provide both in-stream processing and an interactive user interface at the same time.
  • All interfaces strongly encourage asynchronous processing.
  • Binary blob transfers pass as file descriptors locally, and scheduled/multiplexed over the network.

Input: Having an event model that is not limited to a range of reserved values in the ASCII table delivered over a pipe allows:

  • Non-ambiguity – there is a discernible difference between pressing the ‘ESCAPE’ key and the ESC-ASCII character that was used to mark the beginning of an escape sequence.
  • Modifiers exist; CTRL+C is a symbolic C key with a CTRL modifier that does not equal ^C, nor does it get magically translated into broadcasting SIGINT.
  • Pasting is separated from entering text, is separated from pressing keys, and can undo as a discrete whole.
  • Mouse input is predictable and reliable and can be combined with keyboard modifiers.
  • Keyboard shortcuts are announced and semantically tagged, letting the graphical shell provide automation, rebinding and mapping to assistive devices.
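The CTRL+C point deserves a concrete illustration: over a tty there is no ‘CTRL modifier’, only a single byte produced by masking the key, and the line discipline turns that byte into SIGINT before the application ever sees it:

```python
def ctrl(key: str) -> bytes:
    """What a terminal sends for CTRL+<key>: the printable character with the
    high bit-groups masked off. All modifier information is destroyed."""
    return bytes([ord(key.upper()) & 0x1F])

# CTRL+C, CTRL+I and CTRL+[ collapse into ETX, TAB and ESC respectively,
# which is why 'Tab' and 'CTRL+I' are indistinguishable over a pipe.
```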

Integration: With the same language for expressing graphical clients as for textual ones:

  • Clients can be redirected between shells at runtime, even across a network connection.
  • Graphical shell capabilities can be leveraged for universal file picking.
  • Decorations like borders and scrollbars are deferred to the outer graphical shell, avoiding mixing data with metadata in the grid by drawing ‘line characters’.
  • If the graphical shell is sufficiently capable — notifications, alerts, file picking and popups become available and behave according to the rules of the graphical shell.

Visuals and Performance:

The rendering responsibilities have been moved to the display server end of the equation, while the fonts currently in use are being passed as reference objects for features that need it (ligatures, …). There are no pixel buffers being passed around from the ’emulator’ client and the shell is explicit about when it is time to synchronise content onwards.

  • Tear-free updates and resizing.
  • Presentation buffer back-pressure control is deferred to the job, no more heuristics in the emulator.
  • Colours are always 24-bit or from a semantic palette (no more “red” is now “green”).
  • Embeddable interactive media content (* assuming the outer graphical shell supports it).
  • Synchronised presentation, atomic commit of change sets and only updates between sets are synched.
  • Glyph caches can be shared between multiple shell instances and other clients.
  • Glyph indices and availability are queryable so fallbacks can be chosen.
  • Single buffered ‘chasing the beam’ style rasterisation for lowest possible latency.

Accessibility/Internationalisation:

  • Separate on-demand alt-views to propagate compacted accessibility friendly contents.
  • Semantically tagged input lets a screen reader say ‘paste into job #1…’ rather than ‘ctrl-v’, and proper separation between I/O streams makes it trivial to build an ‘audio-only’ shell.
  • Locale properties for input language and presentation language can change at runtime and are properties of a window, not process globals passed through environment variables.
  • Everything is unicode.

These are features with direct impact for writing better shells. Then there are parts for writing better TUI applications and other command-line tools in general, but that is for another time.


Arcan 0.6.1

Time for another fairly beefy Arcan release. For those still thinking in semantic versioning and surprised at the change-set versus version number here (‘just a minor update?’), do recall that, as per the roadmap and disregarding its optimistic timeline, we work with release-rarely-build-yourself thematic versions until that fated 1.0, where we switch to semantic versioning and release-often.

On that scale, 0.5.x was the display server role, 0.6.x is focused on the networking layer as the main feature and development driver. 0.7.x will be improving audio and some missing compatibility/3D/VR bits. Then it gets substantially easier/faster – 0.8.x will be optimization and performance. 0.9.x will be security — hardening attack surface, verification of protections, continuous fuzzing infrastructure and so on.

After the FreeNode IRC kerfuffle, do note that the oldtimers IRC channel has moved to Libera and whatever remains of the old freenode channel is to be assumed malicious. For some of the younger folks we have added a Discord server pending more manageable and sane alternatives – pick your poison(s).

The detailed changelog can be found here, and the release highlights are as follows:

Core

For the main engine there has been quite some refactoring to reduce input latency; better accommodate variable-refresh rate display; prepare for asymmetric uncooperative multi-GPU and GPU handover; explicit synchronisation and runtime transitions back and forth between low (16-bit) to standard (32-bit) to high-definition rendering (10-bit + fp16/fp32).

The biggest task has been to add support for “direct-to-drain” in the event queue management. This allows for more event producers (e.g. input drivers) to get a short path for minimising input latency. It also allows our scheduler (“conductor”) to reorder event delivery to better accommodate a WM specified focus target.

The major challenge has been that some operations are unsafe to perform when a GPU is busy drawing and scanning out to displays. Misuse can cause subtle visual glitches or stutter in inputs that are fiendishly hard to catch and diagnose. This was a big ticket item as part of the synchronization topic.

Since the Window Manager needs to be able to be first in line to filter- translate- remap- resample- mask- and otherwise respond- to input events — quite some care had to be taken to extend the API to make sure that it remains ergonomic enough to respond to input events in both a safe mode and a low latency one.

With this also comes the ability to allow selected clients to get a higher direct-to-drain queue dispatch so that we can have more efficient hot pluggable external input drivers, something that would, for example, benefit our VR service.

A12/Arcan-Net

As mentioned before, the main focus of this development branch is still the networking layer, and our related SSH/VNC/RDP/X11 replacement A12 network protocol, which is a priority for some of our sponsors.

While much of this work is not directly visible, some of the bigger changes have been to drop any and all traces of the DEFLATE support and go all in on ZSTD for binary transfers, TPACK and lossless image modes.

Connection-modes: The ‘last’ missing connection mode has mostly been sorted, though there are some nuances still to work out, particularly for serving complex clients like Xarcan.

To elaborate: since the tools (not the protocol or library) were built to mimic the use of X11 style forwarding, the ‘normal’ default push model worked like this:

"sink-inbound": arcan-net -l 6680
"source-outbound": ARCAN_CONNPATH=a12://some.ip:6680 my_client

This does not work for all cases and some find it rather non-intuitive to use. In some scenarios you might want to ‘serve’ an application instead:

"source-inbound": arcan-net -l 6680 -exec /usr/bin/afsrv_terminal
"sink-outbound": arcan-net some.ip 6680 (or arcan-net keyid@)

While simpler, this form has the downside that ‘crash recovery’ won’t work the same, as the listening end doesn’t know where to redirect ‘some_arcan_client’ here. For the other form, the keyid@ (see keystore below) can have multiple redundancies should one host lose connectivity or fail to respond.

Keystore: A basic keystore has been added for managing cryptographic key material and identities, and while the tooling is lacking, it should be sufficient for testing and non-sensitive use.

The following example shows how to set one up, using a one-time password for authenticating the public keys, with the ‘push’ form of connectivity:

"source-outbound":
         export ARCAN_STATEPATH=/path/to/store
         arcan-net keystore myhost 192.168.0.2 6680
         echo 'somepass' | arcan-net -a 1 -s out myhost@ &
         ARCAN_CONNPATH=out afsrv_terminal

"sink-inbound":
         echo 'somepass' | arcan-net -a 1 -l 6680

One difference from ssh “type yes to be subject to man in the middle” is that a shared secret is used the first time (-a n == read or prompt secret from stdin and accept the first n public keys that had a valid shared secret). Another one is that the host tag for outbound connections can have multiple possible endpoints by adding more hosts to the same keyid. This ties into crash recovery as that also covers loss of connectivity.
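To make the ‘-a n’ idea concrete, here is a hypothetical Python sketch of accept-first-n trust-on-first-use authentication. The HMAC construction, names and proof format are assumptions for illustration only, not the actual arcan-net wire format:

```python
import hmac
import hashlib

class Keystore:
    """Accept the first n unknown public keys that prove knowledge of a
    shared secret, then fall back to known-keys-only (trust on first use)."""
    def __init__(self, secret: bytes, accept_n: int):
        self.secret = secret
        self.accept_n = accept_n
        self.known = set()

    def authenticate(self, pubkey: bytes, proof: bytes, challenge: bytes) -> bool:
        if pubkey in self.known:
            return True
        expected = hmac.new(self.secret, challenge + pubkey, hashlib.sha256).digest()
        if self.accept_n > 0 and hmac.compare_digest(proof, expected):
            self.known.add(pubkey)   # first use: remember the key
            self.accept_n -= 1
            return True
        return False

ks = Keystore(b"somepass", accept_n=1)
challenge = b"nonce"
proof = hmac.new(b"somepass", challenge + b"pk1", hashlib.sha256).digest()
assert ks.authenticate(b"pk1", proof, challenge)      # accepted on first use
assert ks.authenticate(b"pk1", b"", challenge)        # now known, no proof needed
```

A second unknown key, even with a valid proof, would be rejected once the budget of n is spent, which is the property that distinguishes this from plain "type yes" prompting.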

The new default is that if there is no working keystore, arcan-net will refuse to run. You have to opt out of stronger authentication in one of two forms:

1: cat mypw | arcan-net -a --soft-auth -l 6680
2: arcan-net --soft-auth -l 6680

The first form uses a password stored in ‘mypw’, while the second will merely appear protected if you look at the network traffic (asymmetric primitives are generated on the fly and assumed trusted, so defeating it still requires an active MiM); anyone is still allowed to connect.

For anything more serious the suggestion would still be to tunnel over Wireguard. I personally have no problems running a12 over untrusted networks for certain tasks — but my threat model is not your threat model and all that.

Backpressure: Changes in both the protocol and the implementation have been made to better track the current video frame back-pressure as part of congestion control. This is a rather involved topic that will need several iterations to find good parameter sets to switch between depending on the contents — media playback that doesn’t depend on user interaction to produce input can go with deeper buffers, while games would prefer shallow ones, with depth based on latency and bandwidth estimation.
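As a toy illustration of the parameter-set switching described above (all thresholds and names are invented for illustration, not tuned values from the implementation):

```python
def buffer_depth(content: str, rtt_ms: float, bandwidth_mbit: float) -> int:
    """Pick a video frame queue depth: non-interactive media tolerates
    deep buffers, interactive content wants them shallow."""
    if content == "media":
        return 8                      # smoothness over latency
    # interactive: scale with how much latency the link already costs
    if rtt_ms < 10 and bandwidth_mbit > 100:
        return 1                      # LAN-ish, chase the lowest latency
    return 2 if rtt_ms < 60 else 3    # absorb some jitter on slower links

assert buffer_depth("media", 100, 10) == 8
assert buffer_depth("game", 5, 1000) == 1
```

The real policy has many more inputs (frame complexity, codec state, focus), but the shape of the decision is the same: content type first, then link estimation.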

The upside is that improvements here also translate to the local non-networked case — every layer is involved in getting timing better.

What is next for the networking stack is to find an optimal way of leveraging the keystore to answer the question of “which of the devices that know and have a previously established relationship to are currently reachable, at home or overseas” in a privacy preserving way (i.e. service discovery).

Other big ticket items are to make the keystore friendlier for setup and authentication of unknown public keys, to expand back-pressure management with dynamic video codec bitrate selection, and to add pre-compressed video passthrough so that a client that exposes a window with an h264 stream first tries to forward it intact, and only recompresses if the network conditions are more strict.

XArcan, Wayland and other transports

While not strictly part of this ‘version’ as it is a standalone client and separate repository, our Xorg fork “Xarcan” now handles GPU transitions more gracefully and has received basic clipboard integration as well as accelerated cursor support.

The last updates to this should be pushed and tagged within the coming weeks, but worth mentioning nonetheless, and much more work is planned for this one as we continue to move further away from using Wayland as a client compatibility solution — even if that means writing custom Qt and Chrome-ozone platform plugins.

One thing on the compatibility workbench, albeit still experimental, is that it is now possible to use X window managers to manage and decorate Arcan windows — while still not granting default access to their contents or input routing. Thus for those who are stuck in old ways when it comes to configuration, keyboards and window management, there is a migration path close to being usable.

A longer article on how/why this works, and how it can also be used to move Xorg forward, is being written and should be published shortly.

The QEmu ui-driver has been synched to the Qemu 6.x release series.

The Wayland implementation in the arcan-wayland service has also seen some updates, the main ones being repackaging wl-shm (shared memory) buffers to forward as dma-buf when possible (keeping the GPU transfer cost outside of the composition process), and forwarding pulseaudio sockets into the temporary directory used with -exec and -exec-x11 single client use.

TUI

Our “curses/vt100 replacement” client API switches to server-side text rendering only. Cell attributes can now operate in indexed-colour mode where the server-defined colour scheme gets resolved and applied automatically.

API calls for hinting about window contents for server-side scroll controls are now exposed.

The related Lua bindings have been updated and are closer to complete, and now quite usable — mostly the widgets are still missing a Lua-friendly API.

The NeoVIM frontend has been updated accordingly and now also allows buffer transfers. This is useful in Pipeworld for building mixed graphical/text/tui processing pipelines and using VIM as the editor for REPL-like workflows.

There is also experimental integration with the Kakoune editor that can be found here thanks to Cipharius.

Frameservers

Our “single-purpose” designated special clients for engine core offloading and security compartmentation of sensitive tasks have also seen some new capabilities:

Decode: Added support for proto=img with a similar codebase and sandboxing setup to what the aloadimage tool uses. It can also be run in a daemon-like mode where the main engine can offload image decoding and vector image re-rasterisation (for low<->high DPI transitions).

The UVC based decode video capture stage has also received controls for toggling dirty-region tracking when running multi-buffered, so if you are using UVC capture devices such as the ElGato Camlink 4k with stable, low-frequency changing sources (HDMI or good quality security cameras), fewer wasteful frames should propagate.

Terminal: the terminal emulator received support for dynamically switching colour schemes and server-side defined colour preferences. It can now resize to fit to contents when kept alive after the shell exits. It also received a pty-less ‘piped’ mode for having multiple instances displaying stdin/stdout contents as part of inspectable and individually controllable pipelines.

This is heavily used in Pipeworld as seen here:

Pipeworld staggered launch of independent terminal emulators with runtime stdio redirection

EGL-DRI

The low level graphics platform used when arcan acts as the native / primary display server has seen a number of smaller fixes to how modifiers are setup and propagated. More importantly though:

Front buffer text-surfaces: When the WM requests that a source with a TUI (TPACK) is to be mapped as the source to a single display, it will switch to single-front buffer rendering. This means that our terminal emulator and similar simple TUI clients can be run with close to the lowest possible latency and CPU cost.

With the pending updates to the default included ‘console’ window manager (see: Writing a console replacement using Arcan ) it is shaping up as a competent graphics-capable getty/fbcon/kmscon replacement.

FBO/EGLScreen scanout transitions: We can now also dynamically switch between the preferred ‘FBO’ direct scanout and the much maligned EGLScreen based recomposition scanout.

The reblit-to-EGL stage is still the safe default, as there are some really quirky multi-rotated-display resolution switching corner cases to work out still.

Low/SDR/Deep/HDR (+-VRR) transitions: The transitions back and forth between these were a big and fairly complicated part of the HDR/Color Processing work and are now working for the low/sdr/deep cases. There are low level dragons to slay here before the last stretch for working HDR composition, somewhat being blocked by my rather pricey DisplayHDR1000 monitor crashing outright on amdgpu when enabled, while on my intel GPUs the driver crashes instead. Fun times.

Lua

On the scripting side, there is now an optional “input_raw” entry point for indicating support for out-of-bound/direct-to-drain input event processing to reflect the changes to core input routing mentioned before.

Non-blocking I/O: The open_nonblock set of functions now has a “data_handler” function for setting a rising-edge event handler to make it cleaner to use than the non-blocking periodic polling approach. Similarly, the write call has added automatic queueing and a completion event. The scheduler will try to favour I/O operations while the platform is busy rendering/scanning out with GPUs in a locked state.
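The data_handler pattern is the classic rising-edge readiness callback; here is a conceptual Python analogue (not the Lua API itself) using a non-blocking pipe:

```python
import os
import selectors

sel = selectors.DefaultSelector()
r, w = os.pipe()
os.set_blocking(r, False)

received = []

def data_handler(fd):
    # rising-edge: invoked when data becomes available, instead of
    # the script polling the descriptor every tick
    received.append(os.read(fd, 4096))

# register the callback against the read end of the pipe
sel.register(r, selectors.EVENT_READ, data_handler)

os.write(w, b"hello")
for key, _ in sel.select(timeout=1):
    key.data(key.fileobj)

assert received == [b"hello"]
```

In the Lua bindings the selector loop is the engine's own scheduler, which is what lets it favour I/O while the GPUs are locked for scanout.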

Platform control functions: The video_displaymodes() function has received an overloaded form for switching between desired sink colour depth (Low, Normal, Deep, HDR) as well as variable refresh toggles. Similarly, the map_video_display() function now also supports mapping several layers for hardware platforms with hardware controlled basic composition.

For platforms with low level input translation tables on keyboards (e.g. evdev), the input_remap_translation() function has been added to control/hint/swap desired LMVO (layout/model/variant/option) being applied to one or many keyboards.

Staggered client launch: The state control functions resume_target/suspend_target controls can now be applied to clients that are still locked in the initial preroll/open. This allows for staggering client releases on event storms — or when you want to have more precise control over client environment stage, e.g. remap stdin/stdout/stderr to descriptors delivered over the shmif connection. Pipeworld, for example, uses this to implement its pipe(cmd_1, cmd_2, …) command where the entire pipeline is first built and executed, then activated in one go.

Build / Packaging / OS Specific

NixOS: Thanks to the tireless work of Albin (a12l) and AndersonTorres, we are now packaged in NixOS.

BSDs: We should now be usable on all the major BSDs (Open, Free, Net, Dragonfly) thanks to Jan Beich, Zrj, Antonio, Leonid and others.

Also worthwhile to note is that the OpenBSD performance regression has been solved for 6.9, but it seems 7.0 brought other timing issues. It turned out to be tied to the VRR conductor refactoring making assumptions about kernel tick granularity.

This is now dynamically probed and reverted to a display-clocked scheduling tactic if the OS Kernel has a low scheduler tick configured (the default in OpenBSD being 100Hz still). While not recommended for prolonged use, the “-W processing” synchronisation strategy mitigates sluggishness at the expense of CPU.

Tools

A new ‘arcan-dbgcapture’ tool has been added. It acts as a tui wrapper for core dumps, and is best used as a core_pattern handler on Linux. Here you can see it being used in this way inside Pipeworld:

arcan-dbgcapture tool wrapping coredumps around collection/debugging tooling

Arcan as Operating System Design

Time to continue to explain what Arcan actually “is” on a higher level. Previous articles have invited the comparison to Xorg (part1, part2). Another possibility would have been Plan9, but Xorg was also a better fit for the next (and last) article in this series.

To start with a grand statement:

Arcan is a single-user, user-facing, networked overlay operating system.

With “single-user, user-facing” I mean that you are the core concern; it is about providing you with controls. There is no compromise made to “serve” a large number of concurrent users, to route and filter the most traffic, or to store and access data the fastest anywhere on earth.

With “overlay operating system” I mean that it is built from user-facing components. Arcan takes whatever you have access to and expands from there. It is not hinged on the life and death of the Linux kernel, the BSD ones, or any other for that matter. Instead it is a vagabond that will move to whatever ecosystem you can develop and run programs on, even if that means being walled inside an app store somewhere.

As such it follows a most pervasive trend in hardware. That trend is treating the traditional OS kernel as a necessary evil to work around while building the “real” operating system elsewhere. For a more thorough perspective on that subject, refer to USENIX ATC’21: It’s time for Operating Systems to Rediscover Hardware (Video).

This is a trick from the old G👀gle book: they did it to GNU/Linux with Android and with ChromeOS. It is not as much the predecessor mantra of “embrace, extend and extinguish” as it is one of “living off the land” — understanding where the best fit is within the ecosystem at large.

From this description of what Arcan is — what is the purpose of that?

The purpose of Arcan is to give you autonomy over all the compute around you.

With “autonomy” I mean your ability to move, wipe, replace, relocate or otherwise alter the state that is created, mutated or otherwise modified on each of these computing devices.

The story behind this is not hard in retrospect; user-facing computers are omnipresent in modern life — they outnumber us. You have phones, watches, tablets, “IoT” devices, e-Readers, “Desktop” Workstations, Laptops, Gaming Consoles and various “smart”- fridges, cars and meters and so on. The reality is that if a computer can be fitted into something, be sure that one or several will be shoved into it, repeatedly.

The fundamentals of how these computers work differ very little; even “embedded” is often grossly overpowered for the task at hand. On the other hand, getting these things you supposedly own to collaborate, or even simply share state you directly or indirectly create, is often hard- or impossible- to achieve without parasitic intermediaries. The latest take on this subject at the time of writing is parts of “cloud”: routing- and subjecting- things to someone else’s computing between source (producer) and sink (consumer).

Part of the reason for this is persistent and deliberate balkanisation combined with manipulative monetisation strategies that have been permitted to co-evolve and steer development for a very long time; advances in cryptography have cemented this.

An example: In the nineties it was absurd to think that the entire vertical (all the ‘layers’) from datastore all the way to display would have an unbroken chain of trust. The wettest of dreams that the Hollywood establishment had was that media playback was completely protected from the user tampering with- or even observing- the data stream until presented. That is now far from absurd; it is the assumed default reality and you are rarely allowed to set or change the keys.

The scary next evolution of this is making you into a sensor, and it is sold through claims of stronger security features that are supposed to protect you from some aggressor. A convenient side effect is that it actually serves to safeguard the authenticity of the data collected from-, of- and about- you. As a simple indicator: when no authentication primitive (password etc.) is needed, the “ai in the cloud” model of you has locked things down to the point that you had best behave if you want to keep accessing those delicious services your comfort depends on.

The overall high-level vision of how this development can be countered on a design basis is covered by the 12 principles for a diverging desktop future on our sister blog — but the societal implementation is left as an exercise for the reader.

A short example of the general idea in play can be seen in this old clip:

This demonstrates live migration between different kinds of clients moving between arcan instances with different levels of native integration and no intermediaries: from a native display server on OpenBSD in the centre, to a limited application on the laptop to the left (OSX), and to a native display server on the Pinephone on the right.

The meat and remainder of this article will provide an overview of the following:

Building Blocks

The following diagram illustrates the different building blocks and how they fit together.

Illustration of Building Blocks and their Interactions

SHMIF – Shared Memory Interface

SHMIF is a privilege barrier between tasks built only using shared memory and synchronisation primitives (commonly semaphores) as necessary and sufficient components. This means that if an application has few other needs over what shmif provides, all native OS calls can be dropped after connection setup. That makes for a strong ‘least-privilege’ building block.

It fills the role of both asynchronous system calls and asynchronous inter-process communication (IPC), rolled into one interface. The main inspiration for this is the ‘cartridges‘ of yore — how entire computers were plugged in and removed at the user’s behest.

There is a lot of nuance to the layout and specifics of SHMIF which is currently out of scope; notes can be found in the Wiki. The main piece is a shared 128b event structure that is serialised over two fixed size ring buffers, one in-bound and one out-bound. The rest is contested metadata that is synchronously negotiated and transferred into current metadata that both sides maintain a copy of.
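A minimal sketch of the fixed-size ring queue idea in Python (slot counts and overflow behaviour are invented for illustration; the real buffers live in shared memory with semaphore signalling between the two processes):

```python
class EventRing:
    """Fixed-size ring of fixed-size events: the producer only advances
    'head', the consumer only advances 'tail', so no locking is needed
    for a single producer/single consumer pair."""
    def __init__(self, slots: int, event_size: int = 128):
        self.buf = [None] * slots
        self.slots = slots
        self.event_size = event_size
        self.head = self.tail = 0

    def enqueue(self, ev: bytes) -> bool:
        assert len(ev) <= self.event_size
        if (self.head + 1) % self.slots == self.tail:
            return False              # full: back-pressure, never realloc
        self.buf[self.head] = ev
        self.head = (self.head + 1) % self.slots
        return True

    def dequeue(self):
        if self.tail == self.head:
            return None               # empty
        ev, self.buf[self.tail] = self.buf[self.tail], None
        self.tail = (self.tail + 1) % self.slots
        return ev

ring = EventRing(4)
assert ring.enqueue(b"a") and ring.enqueue(b"b") and ring.enqueue(b"c")
assert not ring.enqueue(b"d")         # queue is bounded by design
assert ring.dequeue() == b"a"
```

The fixed sizes are what make the interface auditable and allow all native OS calls to be dropped after setup: nothing in the steady state needs to allocate.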

This is optionally extended with handle/resource-token blob references when conditions so permit (e.g. unix domain socket availability or the less terrible Win32 style DuplicateHandle calls), as that is useful for by-reference passing large buffers around which can be preferred for accelerated graphics among many other things.

The data model for the events passed around is static and lock-stepped. This model is derived from the needs of a single-user “desktop”. It has been verified against X11 and Android, as well as special purpose “computer-in-computer” clients like whole-system emulation à la QEmu, specialised hybrid input/output devices (e.g. streamdeck) and virtual- augmented- and mixed- reality (e.g. safespaces).

A summary of major bits of shmif:

  • Connection: named, inheritance, delegation, redirection and recovery.
  • Rendering: pixel buffers, formatted text, accelerated device buffers.
  • Audio: sources, sinks and synchronisation to/from video.
  • Input: digital, translated (keyboard), mice, touch / tablet, analog sensors, game / assistive, eye tracker, application announced custom labels.
  • Sensors: Positional, Locational and Analog.
  • Window management (relative position, presentation size, ordering, annotation).
  • Color profiles and Display controls (transfer functions, metadata).
  • Non-blocking State transfers (clipboard, drag and drop, universal open/save, fonts).
  • Blocking state controls (snapshot, restore, reset, suspend, resume).
  • Synchronisation (to display, to event handler, to custom clock, to fence, free-running).
  • Color scheme preferences.
  • Key/value config persistence and “per window” UUID.
  • Coarse grained timers.

Most of this is additive – there is a very small base and the rest are opt-in events to respond to or discard. All event routing goes through user-scriptable paths that are blocked by default, forcing explicit forwarding. A client does not get to know something unless the active set of scripts (typically your ‘window manager’) routes it.

Upon establishing a connection, requesting a new ‘window’ (pull) or receiving one initiated by the server side (push), the window is bound to an immutable type. This type hints at policy and content to influence window management, routing and to the engine scheduler.

A game has different scheduling demands from movie playback; an icon can attach to other UI components such as a tray or dock and so on. This solves for accessibility and similar tough edge cases, and translates to better network performance.

A12 – Network Protocol

SHMIF is locally optimal. Whenever the primitives needed for inter-process SHMIF cannot be fulfilled, there is A12. The obvious case is networked communication where there is no low latency shared memory, only comparably high-latency copy-in-copy-out transfers.

There are other less obvious cases, with the rule of thumb being that two different system components cannot synchronise over shared memory with predictable latency and bandwidth. For instance, ‘walled garden’ ecosystems tend to disallow most forms of interprocess communication, while still allowing networked communication to go through.

A12 has lossless translation to/from SHMIF, but comes with an additional set of constraints to consider and builds on the type model of SHMIF to influence buffer behaviour, congestion control and compression parameters.

The constraints placed on the design are many. A12 needs to be usable for bootstrapping; it must operate in hostile environments, on isolated/islanded networks, and between machines with unreliable clocks, incompatible namespaces and possibly ephemeral-transitive trust. For these reasons, A12 deviates from the TLS model of cryptography. It relies on a static selection of asymmetric- and symmetric- primitives with pre-shared secret ‘Trust On First Use’ management rather than Certificate Authorities.

Frameservers

Around the same era that browsers started investing heavily into sandboxing (late 2000s, early 2010s) Arcan, then a closed source research project, also focused on ephemeral least privilege process separation of security and stability sensitive tasks. The processes that carry out such tasks are referred to as ‘frameservers’.

In principle ‘frameservers’ are simply normal applications using SHMIF with an added naming scheme to them and a chainloader (arcan_frameserver) that is responsible for sanitising and setting up respective execution environments.

In practice ‘frameservers’ have designated roles (archetypes). These control how the rest of the system delegates certain tasks, and gives predictable consequences to what happens if one would crash or be forcibly terminated. It is also used to put a stronger contract on accepted arguments and behavioural response to the various SHMIF defined events.

The main roles worth covering here are ‘encode‘, ‘decode‘ and to some extent, ‘net‘.

Decode samples some untrusted input source e.g. a video file, a camera device or a vector image description, and converts it into something that you can see and/or hear, tuned to some output display. This consolidates ‘parsing’ to single task processes across the entire system. These processes have discrete, synchronous stages where incrementally more privileges, e.g. allocating memory or accessing file storage, can be dropped. The security story section goes a bit deeper into the value of this.

Encode transforms something you can see and/or hear into some alternate representation intended for an untrusted output. Some examples of this are video recording, image-to-text (OCR) and similar forms of lossy irreversible transforms.

Net sets up transition and handover between shmif and a12. It also acts as networked service discovery of pre-established trust relationships (“which devices that I trust are available”, “have any new devices that I trust become available”) and as a name resource intermediary e.g. “give me a descriptor for an outbound connection to <name in namespace>”.

Splitting clients into these “ephemeral one-task” ones and regular ones leads to dedicated higher level APIs for traditionally complex tasks, as well as letting frameservers act as delegates for other programs.

It is possible for another shmif client to say “allocate a new window, delegate this to a decode frameserver provided with this file and embed into my own at this anchor point” with very few lines of code. This lets the decode frameserver act as a parser/decode/dependency sponge. Clients can be made simpler and not invite more of the troubles of toolkits.

Engine

The ‘engine’ here is covered by the main arcan binary and invites parallels to app frameworks in the node.js/electron sense, as well as the ‘Zygote’ of Android.

It fills two roles — one is to act as the outer ‘display server’ which performs the last composition and binds with the host OS. The scripts that run within this role are roughly comparable to a ‘window manager’, but with much stronger controls, as it acts as ‘firewall/filter/deep inspection’ for all your interactions and data.

The other role is marked as ‘lwa’ (lightweight arcan) in the diagram and is a separate build of the engine. This build acts as a SHMIF client and is able to connect to another ‘lwa’ instance or the outermost instance running as the display server. This lets the same code and API act as display server, ‘stream processor’ (see: AWK for Multimedia) and the ‘primitives’ half of a traditional UI toolkit.

Both of these roles are programmable with the API marked as ‘ALT’ in the diagram and will be revisited in the sections on ‘Programmable Interfaces’ and ‘Appl’.

The architecture and internal design of the engine itself is too niche to cover in sufficient detail. Instead we will merely lay out the main requirements that distinguish it against the many strong players in the core 2D- supportive 3D- game engine genre.

Capability – enough advanced graphics and animation support for writing applications and user interfaces on the visual and interactive span of something ranging from your run-of-the-mill web or mobile ‘app’ to the Sci-Fi movie ‘flair over function’ UIs. It should not rely on techniques that would exclude networked rendering or exclude devices which cannot provide hardware acceleration.

Abstraction – The programmable APIs should be focused on primitives (geometry, texturing, filtering), not aggregates/patterns (look and feel). Transforms and animations should be declarative (“I want this to move to here over 20 units of time”), (“I want position of a to be relative to position of b”) and let the engine solve for scheduling, interpolation and other quality of experience parameters.
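The declarative style can be sketched as a transform the engine samples each tick. This is a hypothetical linear interpolation in Python; the names are invented, and the actual API offers more interpolation and scheduling options:

```python
def move_to(start, end, duration):
    """Declare 'move from start to end over duration ticks'; the engine
    samples the returned function rather than the caller stepping it."""
    def at(t):
        # clamp progress to [0, 1], then interpolate linearly per axis
        f = min(max(t / duration, 0.0), 1.0)
        return tuple(s + (e - s) * f for s, e in zip(start, end))
    return at

anim = move_to((0, 0), (100, 50), 20)
assert anim(10) == (50.0, 25.0)   # halfway through, halfway there
assert anim(25) == (100.0, 50.0)  # past the end, position is stable
```

Because the declaration is data rather than per-frame imperative calls, the engine is free to reschedule, retime or forward it over a network link without the application's involvement.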

Robust – The engine should be able to operate reliably in a constrained environment with little to no other support (service managers, display/device managers). It should avoid external dependencies. It should be able to run extended periods of time — months, not hours or days.

Resilient – The engine should be able to recover from reasonable amounts of failures in its own processing, and that of volatile hardware components (primarily GPU). It should be able to hand over/migrate clients to other instances of itself.

Recursive – The engine should be able to treat additional instances of itself as it would any other node in the scene graph, either as an external source node or a subgraph output sink node.

Programmable Interfaces

SHMIF has been covered already from the perspective of its use as an IPC system. As an effect of this, it is also a programmable low-level interface. A thorough article on using it can be found in (writing a low level arcan client), and a more complex/nuanced follow up in (writing a tray icon handler). The QEmu UI driver, the arcan-wayland bridge and Xarcan are also interesting points of reference to hack on.

TUI is an abstraction built on top of SHMIF. It masks out some features and provides default handlers for certain events — as well as translating to/from a grid-of-cells or row-list representation of formatted text. It comes with the very basics of widgets (readline, listview, bufferview). Its main role in the stack is to replace (n)curses style libraries and improve on text-dominant tools, as a migration strategy for finally putting terminal emulation to rest.

ALT is the high level API (and protocol*) for controlling the Engine. The primary way of using it is as Lua scripts, but the intention is a bit more subtle than that. For half of this story see ‘Appl’ below. Lua was chosen for the engine script interface in part for its small size (and, with that, low memory overhead and short start times), its easy binding API and minimal set of included functions. It is treated and thought of as a “safe” function decoration for C more than as a normal language.

The *protocol part is that the documentation for the API doubles as a high level Interface Description Language to generate bindings that would use the API out of process — allowing both Lua “monkey patching” by the user and process separation with an intermediate protocol. This makes the render process and ALT into a dynamic sort of postscript for applications with animations and composition effects rather than static printer-friendly pages.

Appl

This is not a discrete component, but rather a set of restrictions and naming conventions added on top of the core engine. To understand this, a rough comparison to Android is again in order.

The Android App is, grossly simplified, a Zip archive with some hacks, a manifest XML file, some Java VM byte code, optional resources and optional native code. The byte code traditionally came from compiling Java code, but several languages can compile to it. The manifest covers some metadata — importantly, which system resources the app should have access to.

The Arcan Appl (the ‘l’ is pronounced with a sigh or ‘blowing raspberries’) has a folder structure:

  • A subdirectory of some appl root store with a unique name
  • A .lua file in that directory with the same name.
  • A function with the same name as the directory and the .lua file.

Resources the appl can access and data stores it can create and delete files within are broken down into several namespaces. The main such namespaces are roughly: application-local-dynamic, application-local-static, fonts, library code and shared.

Similarly to how the Android app can load native code via JNI, the Arcan appl can dynamically load shared libraries. In contrast to Android, where native code is pulled in to support the high level Java/Kotlin/…, the high level scripting in Arcan is there to support the native code, so that the tedious and error prone tasks are, by default, written in memory safe, user hackable/patchable code.

The mapping of the namespaces themselves, restrictions or additional permissions, configuration database and even different sets of frameservers are all controlled by the arguments that were used to start each engine process.

The database acts as a key / value store for the appl itself, but also as a policy model for which other shmif-capable clients should be allowed to launch (* enforcement for native code is subject to controls provided by the host OS), as well as a key / value store for tracking information for each client launched in such a way.

Other resource permissions are not directly requested or statically defined by the appl itself; it is the window manager that ultimately maps and routes such things.

User Interfaces

There are a number of reference appls that have been written and presented on this page throughout the years. These are mainly focused on filling the ‘window manager’ role, but can indeed also be used as building blocks for other applications.

These have been used to drive the design of the ALT API itself, to demonstrate the rough scope of what can be accomplished, and they are usable tools in their own right.

The ones suitable as building blocks for custom setups or ‘daily drivers’ are as follows:

Console — (writing a console replacement using Arcan) provides fullscreen workspaces dedicated to individual clients, with a terminal spawned by default in an empty one. This is analogous to the console in Linux/BSD setups and comes bundled with the default build (but with unicode fonts, an OSD keyboard, touchscreen support, …).

Durden — Implements the feature set of traditional desktops and established window management schemes (tiling, stacking/floating and several hybrids). It is arranged as a virtual filesystem tree that UI elements and device inputs reference.

Safespaces — Structurally similar to Durden as far as a ‘virtual filesystem’ goes, but intended for Augmented-, Mixed- and Virtual Reality.

Pipeworld — Covers ‘Dataflow’; a hybrid between a programming environment, a spreadsheet and a desktop.

There are other, shorter ones that are not kept up to date, but were rather written to demonstrate something. A notable such example is the Plan9-like Prio (One night in Rio – Vacation Photos from Plan9).

Compatibility

There are several intimidating uphill battles with established conventions and the network/lock-in effect of large platforms — no interesting applications equals no invested users; no developers equals no interesting applications.

The most problematic part for compatibility comes with ‘toolkit’-built applications (e.g. Qt and GTK). Although often touted as ‘portable’, what has happened time and again is a convergence to some crude and uninteresting capability set tied to whatever platform abstraction can be found deep in the toolkit code base — it is never pretty.

There is fair reason why many impactful projects went with ‘the browser as the toolkit’ (i.e. Electron). The portability aspects of the big toolkits will keep on losing relevance; the long term survival rate for well-integrated ‘native’ feel portable software looks slim to none. The end-game for these rather looks like banking on one fixed idea/style or niche.

The compatibility strategy for Arcan is “emphasis at the extremes” — first to focus on the extreme that is treating other applications as opaque virtual machines (which includes browsers). Virtualization for compatibility is the strongest tactic we have for taking legacy with us. This calls for multiple tactics to handle integration edge cases and to incrementally break the opaqueness – such as forced decomposition through side-band communicated annotations, “guest additions” and virtual device drivers.

The second part of the strategy is to focus on the other extreme that is the ‘text dominant’ applications, hence all the work done on the TUI API. As mentioned before, it is needed as a way of getting command lines that are not stuck with hopelessly dated and broken assumptions about anything computing. Terminal emulators will be necessary for a long time, and Arcan comes with one by default — but as a normal TUI client.

TUI is also used as a way of building front-ends to notoriously problematic system controls such as WiFi authentication and dynamic storage management. It is also useful for ‘wrapping’ data around interactive transfer controls, leaving the UI wrapping and composition up to the appl stage.

The distant third part of the compatibility strategy is protocol bridges — the main one currently being ‘arcan-wayland’. For a while, this was the intended first strategy, but after so many years of the spec being woefully incomplete, then seriously confused, it is now completely deranged and ready for the asylum. That might sound grim, yet this is nothing compared to the ‘quality’ of the implementations out there.

Security Story

One area that warrants special treatment is security (and its overlap with privacy). This is an area where Arcan is especially opinionated. A much longer treatment of the topic is needed, and an article for that is in the queue.

The much condensed, overarching problem with the major platforms is that they keep piling on ‘security features’ (for your own good, they say) and (often pointless) restrictions or interruptions that are incrementally and silently added through updates — leaving you in the dark as to what they are actually supposed to protect you from, and the caveats that come with that.

The following two screenshots illustrate the entry level of this problem:

sudo-sickness: “bash wants access to control Finder”
“something” wants to “use your microphone”

Note: The very idea that the second one even became a dialog is surprising. Most UIs that predate this idiocy had trained users to route data on their own initiative, through interaction alone — “drag and drop”, “copy and paste” and so on. It is a dangerous pattern in its own right, and a mildly competent threat actor knows how to leverage this side channel.

There is a lot to unpack with these two alone, but that is for another time.

The core matter is that these fit some threat model, but are unlikely to be part of your threat model. The tools to actually let you express what your threat model currently is, and tools to select mitigations to fit your circumstances — both are practically nonexistent.

Compare to accessibility: supporting vision impairment, or blindness, has vastly different needs from deafness, which has different needs from immobility. Running a screen reader will provide little comfort to someone who is hard of hearing, and so on. Turning such features on without informing the user; or rudely interrupting by repeatedly asking at every possible junction, is rightly met with some contention.

At the other end, someone working on malware analysis has different needs from someone approving financial transactions for a company, who has different needs from someone forced to use a tracking app by an abusive partner or employer. Yet here, protections with different edge cases and failure modes are silently enabled without considering the user.

The security story is dynamic and context dependent by its very nature. A single person could be switching between having any of the needs expressed above at different times over the course of a single day. More technically, it might be fine for your laptop to “on-lid-open: automatically enable baseband radio, scan for known access points, connect to them, request/apply DHCP configuration” coupled with some other service waking up to “on network: request and apply pending updates” and so on from the comforts of your home. It might also get you royally owned while at an airport in seemingly infinitely many ways.

To tie things back to the Arcan design: the larger story comes from the 12 principles linked earlier, and a few of those are further expanded into the following maxims:

The Window Manager defines the set of mitigations for your current threat model.

This is hopefully the least complicated one to understand. To break it down further:

  1. The window manager is first in line to operationalise your intents and actions.
  2. The window manager reflects your preferences as to how your computer should behave.
  3. You, or someone acting on your behalf, should always have the agency to tune or work around undesired behaviours or appearances.
  4. Any interaction should be transformable into automation through your initiative and bindable to triggers that you pick.
  5. Automation and convenience should be easy to define and discover, but not a default.

The point is to make the set of scripts that is the Appl controlling the outermost instance of Arcan (as the display server) the primary control plane for your interests. If you think of each client/application as possibly sandboxed-local or sandboxed-networked, the scripts define the routing/filtering/translation rules from any source to any sink — a firewall for your senses.

There is no IPC but SHMIF.

Memory safety vulnerabilities (typically data/protocol parsers) were for a long time a cheap and easy way to gain access into a system (RCE – Remote Code Execution).

The cost and difficulty increased drastically with certain mitigations, e.g. ASLR, NX, W^X and stack canaries – but also through least-privilege separation of sensitive tasks (sandboxing). None of these are panaceas, but they have raised the price and effort so substantially that there is serious economy and engineering effort behind just remote code execution alone (public examples) — which is far from what goes into a full implant.

Bad programming patterns break mitigations. If you don’t design the entire solution around least-privilege, very little can safely be sandboxed. In UNIX, everything is a file-descriptor. Subsequently, blocking write() to file-descriptors breaks everything.

What happens when trying to sandbox around non least-privilege friendly components is that you get IPC systems. Without a systemic perspective you end up with a lot of them, and they are really hard to get right. Android developers put serious rigour and a lot of effort into Binder (their primary IPC system), yet it was both directly and indirectly used to break phones for many years — and probably still is.

Few IPC systems actually get tested or treated as security boundaries, and eventually you get what offensive security calls ‘lateral movement’.

This is the story of (*deep breath*) how the sandboxed but vulnerable GTK based image parser triggered by a D-Bus (IPC) activated indexer service on your mail attachment exposed RCE via a heap corruption exploited with a payload that proceeded to leverage one of the many Use-after-Frees in the compositor’s Wayland (IPC) implementation, where it gained persistence by dropping a second stage loader into dconf (IPC) that used PulseAudio (IPC) over a PipeWire (IPC) initiated SIP session to exfiltrate stolen data and receive further commands without your expensive NBAD/IDS or lazy blue team noticing — probably just another voice call.

In reality, the scenario above will just be used to win some exotic CTF challenge or impress women. What will actually happen is that some uninteresting pip-installed python script dependency (or glorious crash collection script) just netcats $HOME/.ssh/id_rsa (that you used everywhere, didn’t you?) to a pastebin-like service – but that’ll get fixed when everything is rewritten in Rust, so stay calm and continue doing nothing.

The point of SHMIF is to have that one IPC system (omitting link to the tired xkcd strip, you know the one); not to end them all, or be gloriously flexible with versioned code-gen ready for another edition of ‘programming pearls’ — but to solve for the data transport, hardening, monitoring and sandboxing for only the flows necessary and sufficient for the desktop.

Least privilege parsing and scrubbing

Far from all memory safety vulnerabilities are created equal. The interesting subset is quite small, and it somehow needs to be reachable by aggressor-controlled data. That tends to mean ‘parsers’ for various protocols and document/image formats. If you don’t believe me, believe Sergey and Meredith (Science of Insecurity).

This can (and should) be leveraged. Even parsing the most insane of data formats (PDF) has fairly predictable system demands. With a little care, parsers can do without any system calls at all once their input has been buffered, and it is really hard to do anything from that point — even with robust RCE.

This is where we return to the ‘decode’ frameserver — a known binary that any application can, and should, delegate parsing of non-native formats to; one which aggressively sandboxes the worst of offenders. With support from the IPC system that tunes the parsing and consumes the results, it also becomes analysis, collection and fuzzing harness in one — leveraging the display server to improve debugging.

Someone slightly more mischievous can then run these delegates on single-purpose devices that network boot and reset to a steady state on power-cycle. Let them consume the hand-crafted and targeted phishing expedition, remote-forward a debugger as a crash collector to a team that extracts and reverses the exploit chain and payload, then replicate-inject into a few honey pots with some prizes for the threat actor to enjoy. This clip from pipeworld looks surprisingly much like part of that scenario, no?

Really quick ‘gdb attach -p’

Many data formats are becoming apt at embedding compromising metadata. Most know about EXIF today; fewer are aware just how much can be shoved into XMP – where you can find such delicacies as metadata on your motion (gait), or tracking images hidden as a base64 encoded jpeg inside the xmp block of a jpeg. A good rule of thumb is to never let anything touched by Adobe near your loved ones. If you managed to systematically strip one thing, something new is bound to pop up elsewhere.

By splitting importing (afsrv_decode) and exporting (afsrv_encode) into very distinct tasks, with a human observable representation and a scriptable intermediary to model the transition from the one to the other, you also naturally get designs that let you define other metadata to encode. If that is what then gets forwarded and uploaded to whatever “information parasite” (social media, as some tend to call it) that pretends to strip it away but really collects it for future modelling, and they start to trust it — well shucks, that degrades the value of the signal/selector. The point is not to “win”; the point is to make this kind of creepiness cost more than what you are worth, so that some are incentivised to try and make a more honest living.

Compartment per device.

With A12 comes the capability to transition back and forth between device-bound desktop computing and several networked forms. This opens up for a mentality of ‘one task well’ per discrete (‘galvanically isolated’) device, practically the strongest form of compartmenting risk that we can reasonably afford.

The best client to compartment on ‘throwaway devices’ is the web browser. The browser has dug its claws so deep into the flesh of the OS and hardware itself, and exposes so much of it to web applications, that the distance between that and downloading and running a standalone binary is tiny and getting shorter by the day — we just collectively managed to design a binary format that is somehow worse than ELF, a true feat if there ever was one.

The browser offers ample opportunity for persistence and lateral movement, yet itself aggregates so much sensitive and useful information that you rarely need to look elsewhere on the system.

In these cases, lateral movement as covered before is less interesting. Enough ‘gold’ exists within the browser processes that they are a comfortable target for a disk-less ephemeral process parasite to sit in, scraping credentials and proxying through; ‘smash and grab’ as the kids say.

There is generally an overemphasis on memory safety to the point that it becomes the proverbial ‘finger pointing towards the moon’ and you miss out on all the other heavenly glory. There are enough great and fun vulnerabilities that require little of the contortionist practices of exploiting memory corruptions, and a few have been referenced already.

A category that has not been mentioned yet is micro-architectural attacks — one reason why the same piece of hardware is getting incrementally slower these days. You might have heard about these attacks through names indicative of movie villains and vaguely sexual sounding positions, e.g. SPECTRE and ROWHAMMER. Judging by various errata sections between CPU microcode revisions alone, there is a lingering scent in the air that we are far away from the end of this interesting journey.

Instead of handicapping ourselves further, assume that ‘process separation’ and similar forms of whole-system virtualization are still useful for resiliency and compatibility, but not a fair security mechanism; sorry, docker. Instead, split up the worst offenders over multiple devices that, again, are wiped and replaced on a regular basis. You now cost enough to exploit that a thug with a wrench is the cheaper solution.

At this juncture, we might as well also make it easier to extract state (snapshot/serialize) and re-inject into another instance of the same software on another device (restore/deserialize). In the end, it is a prerequisite for making the workflow transparent and quick enough that spinning up a new ephemeral browser tab should be near instant.

Another gain is that you reduce the amount of accidental state that accumulates, and you get the ability to inspect what that state entails. This is the story about how your filthy habits built up inside and between the tabs that you, for some reason, refuse to close. Think of the poor forensics examiner that has to sift through it — toss more of your data.

Anything that provides data, should be able to transition to producing noise.

Consider the microphone screenshot from earlier, or ‘screen’ sharing for that matter. What value does the device itself add to your actions, over abstracting away from it? Having a design language of ‘provide what you want to share’ might look similar enough to a browser asking for permission to use your microphone or record your desktop — but there is quite some benefit to basing the user interaction at this other level of abstraction.

Some are strictly practical, like getting the user to think of ‘what’ to ‘present’ rather than trying to clean the desktop of any accidentally compromising material. Being explicit about the source makes it much less likely that the iMessage notification popup from your loved one, showing something embarrassing, will appear in the middle of a zoom call with upper management.

By decoupling the media stream from the source (again, afsrv_decode and afsrv_encode), there is little stopping you from switching out or modifying the stream as it progresses. While this can be used for cutesy effects, such as adding googly eyes to everything without the application being any the wiser — it also permits a semantically valid stream to slowly be morphed to- and from- a synthetic one.

With style-transferring GANs improving, and adversarial machine learning for that matter, AI will no doubt push beyond that creepy point where synthetic versions of you plausibly pass the Turing test with colleagues and loved ones. This also implies that your cyber relations or verbal faux pas become more plausibly deniable. You can end a conversation and let the AI keep it going for a while. Let a few months pass and not even you will remember what you actually said. Taint the message history by inserting GPT3 stories in between. Ideally, separating truth from nature's lies will cost as much as dear old Science.

This is a building block for ‘Offensive Privacy’ — Just stay away from VR.


Introducing Pipeworld: Spreadsheet Dataflow Computing

Now for something completely different. In the spiritual vein of One Night in Rio: Vacation photos from Plan9 and AWK for multimedia, here is a tool that ties almost all the projects under the Arcan umbrella together into one – and one we have been building towards for a depressing number of years and tens of thousands of hours.

Pipeworld (github link) combines ‘dataflow‘ programming (like Excel or Userland) with a Zooming-Tiling User Interface (ZUI). It builds dynamic pipelines similarly to (Ultimate Plumber), and leverages most of the gamut of Arcan features — from terminal emulator liberated CLIs and TUIs to dynamic network transparency. It follows many of the principles for a Diverging Desktop Future, particularly towards the idea of making clients simpler and more composable by focusing on interactive “breadboarding” data exchange.

There is a whole lot of ground to cover, so let’s get started.

The following is a video (youtube) that joins all the clips in this article together, but you will likely understand more of what is going on by reading through the sections.

Zoomable (Tiling) UI

The core is based around ‘cells’ of various types. These naturally tile in two dimensions as rows and columns. The first cell on each row determines the default behaviour of the row, and moving selection around will ‘pan’ the view to fit the current cell.

Each row has a scale factor, and so does each cell. This means that different sets of cells can have different sizes (‘heterogeneous zooming’) with different post-processing based on magnification and so on. Some cells can switch contents based on magnification, while others stay blissfully ignorant.

In the following short clip you see some of this in play, using a combination of keybindings as well as mouse gestures. This might make some feel a bit nauseous, as it is not something that generalises well to groups of observers (we have solutions for that too) — but it is a different effect when it is your interaction that initiates and controls the “zoom”.

Tiling-Zoomable Window Management

The heterogeneous zooming allows for a large number of cells to be visible at the same time. Even when scaled down, client state such as notifications and alerts can still be communicated through decoration colours and animations.

Befitting of tiling workflows, everything can be keyboard controlled. Just like in Durden, mouse gestures, popup menus, keyboards and exotic input devices simply map to writes into file-system like paths.

When- or why- would tens to hundreds of simultaneous windows be useful? Some examples from my day to day would be monitoring, controlling and automating virtual machines, botnets, remote shell connections, video surveillance, ticketing systems and so on.

Dataflow Computing and the Expression Cell

Each cell has a type – with the default one being ‘expression’, where you type in expressions in a minimalistic programming language. The result of that expression mutates the type- or the contents- of the cell.

Expressions are processed differently based on which scope they are operating in – with the current scopes being:

  • Expression – Arithmetic, string processing and other ‘basic types’ operations.
  • Cell – Used for event handlers, changing, annotating or otherwise modifying visual or window management behaviour of the current cell.
  • System – Global configuration, shutdown, idle timers and so on.
  • Factory – Producing new cells or mutating existing ones.

In the following clip you can see some basic arithmetic; changing numeric representation; string processing functions and the ‘cell’ scope as a popup to tag a cell with a reference identifier which is then used in another expression.

Expression cells used for basic arithmetic, number and text processing

In a sense, this is three different kinds of command-line interfaces wrapped into one. These can modify, import from- and export to- cells of other types. This is where things get more interesting and powerful.

The following video is an example where I first copy the contents of a file to the clipboard of a terminal (the somewhat unknown OSC 52 sequence). I then create an expression cell where I set its contents to a text message. I then start vim in the first terminal and run the cell-scope expression:

type_keyboard(concat(a1.clipboard, b1), "fast")

This reads as ‘send the clipboard content of the A1 cell and the contents of the B1 cell as simulated keypresses into the currently selected cell, using a moderately fast typing speed model’. From the perspective of vim itself, this looks just like someone typing on a keyboard.

Simulated keypresses using a function composing the clipboard of a cell and the contents of another

There are many more cell types than just terminals, command lines and expressions. Video playback; capture devices; images; native Arcan clients, such as our Qemu backend or Xorg fork and many more. In this clip you can see some of those – including Pipeworld running pipeworld. You can also see the current state of autocompletion and expression helper.

In this video I first open a capture device (web camera). I then spawn a terminal and copy the contents of a shader (GPU processing program) to its clipboard. This gets compiled and assigned the name ‘red‘. Lastly I sample a frame from the capture device using this shader.

Sample and filter a frame from a capture device using a freshly compiled, all organic, shader.

What all of this means practically is that we can gather measurements, trigger inputs and stitch audio, video and generic data streams from clients together in a fashion that more resembles breadboards as used in electronics, and sequencers as used in sound production, than traditional desktop or terminal work.

Terminal, CLI and Pipelines

Let’s go back to the command-line for a bit, as we have yet to poke at pipelines. In this clip I run the system scope function: pipe("find /usr", "grep --line-buffered share")

Building a pipeline of shell commands

A lot of magic is happening inside this one. Each client is spawned separately and suspended. Then their runtime stdin/stdout pipes get remapped based on the current pipeline before the chain is activated. Note that when I reset the cell representing “find /usr”, the grep one remains intact and unaware that find was actually killed off and re-executed.

The API is lacking somewhat still, but technically there is nothing blocking the ‘pipes’ from being a mix between terminal emulators (isatty() == true), stdin/stdout redirected normal processes (isatty() == false) and fully graphical ones.

Typing # into an expression cell gives us a dusty old terminal emulator. We can also add some command to run afterwards, like #find /tmp. ‘Resetting’ a cell means re-evaluating the expression or command it represents.

In the following clip I first list the contents of /tmp and then set up a timer that resets the cell ~every second. This is indicated by the coloured flash. Then I spawn a normal terminal and create a file in tmp, and you can see it appearing. To show that the ‘terminal’ behind the first command is still alive, I also swap out its colour scheme and have it resize to fit its contents.

Per-cell reset timer repeating a single terminal command

Typing ! into an expression cell switches it into a CLI one. It uses a special mode in afsrv_terminal (the terminal that comes with Arcan) where no terminal protocols are emulated, no ‘pty devices’ are needed, and the CLI is native. As such we are no longer restricted to running a shell like bash that hides (“premature composition”) the identity, state and inputs/outputs of its children and commands.

Non-vt100 command-line with cell controlling new cell creation

Note that each discrete command becomes its own window, and the cell itself dictates the layout and scale of new and old command outputs.

Individual commands can be re-executed, clients run simultaneously and states such as clipboards are kept separate. Clients can switch to being graphical or embed media elements; they can announce their default keybindings; handle dynamic open and save from cut/paste and drag/drop; runtime redirect stdio; they can spawn sub-windows that attach on the same logical row and much more.

Advanced Networking, Sharing and Debugging

In the following clip you can see how quickly I go from a cell with external content to a debugger attached to the process associated with it, along with lots of other options for process manipulation and inspection. Protections like YAMA stay enabled, yet there is no sudo gdb -p mess going on. To learn more about this, see the article on Leveraging the Display Server to improve Debugging.

Let’s go deeper. In the following clip I have set ‘arcan-dbgcapture’ as the kernel ‘core_pattern’ handler. Whenever a process crashes, the collector grabs crash information and the underlying executable. These get wrapped in a textual UI with options for what to do with the information. I then add a ‘listen’ cell that exposes a connection point for this TUI (‘crash’ in the video). Anything that connects to this point gets added to the row that this cell controls. To show it off, I run a simple binary that just crashes outright a few times, and you can see how the collections appear.

Since it is built using arcan-shmif, these connection points can be routed over a network – swap that ‘crash’ point out for something like a12://my.machine, embed it into your base VM image, distribute it to your fuzzing cluster or CI, and enjoy network-transparent debugging.

Moving on to the networking path and streaming/sharing. The following clip shows me migrating a terminal and a qemu cell from a workstation to my laptop via the A12 network protocol. Note that the cell colours and font size change automatically, since it is the target a client presents on that controls the current look and feel.

(The slight delay for the qemu window is a bug in the qemu Arcan display backend: it does not properly re-submit a frame on migration, so the window does not appear until the next time the guest updates its output.)

Almost all cells can have an ‘encoder’ attached to them. This is simply a one-way composition transform that converts the contents to some other format. A very practical example is recording to a video file or an RTMP host; something more obscure is OCR for pixel-to-text conversion.

In the following clip I set a cell encoder that acts as a VNC server and then connect to it using the built-in VNC client that is part of Mac OS X. I then destroy the encoder and assign a new one to a terminal. The OS X VNC viewer reconnects automatically, and you can see that input also works over the return path.

A VNC cell encoder exposing the cell contents as a VNC server

A cell type that blends well with this is ‘composition’, which puts the contents of multiple cells together using some layout algorithm. The following clip shows that being used.

Composition cell used to stitch together multiple different sources

Attach an encoder to the composition cell and you have compartmented partial desktop sharing, another potent building block for interesting collaboration tools.

Trajectory and Future

Pipeworld will join Safespaces in acting as the main requirement ‘driver’ in improving Arcan and evolving its set of features, while Durden takes the backseat and moves more towards stabilisation.

These projects are not entirely disjoint. Pipeworld has been written in such a way that the dataflow and window management can be integrated as tools in these two other environments, so that you can mix and match – have Pipeworld be a pulldown HUD in Durden, or 360-degree programmable layers in Safespaces with 3D data actually looking that way.

The analysis and statistics tools that are part of Senseye will join in here, along with other security/reverse engineering projects I have around here.

Accessibility will be one major target for this project. The zoomable nature helps a bit, but much more interesting is the data-oriented workflow; with it comes the ability to logically address / route and treat clients as multi-representation interactive ‘data sources’ with typed inputs and outputs rather than mere opaque box-trees with prematurely composed (mixed contents) pixels and rigid ‘drag and drop’ as main data exchange.

Another major target is collaboration. Since we can dynamically redirect, splice, compose and transform clients in a network friendly way, new collaboration tools emerge organically from the pieces that are already present.

Where we need much more work is at the edges of client and device compatibility, i.e. modifying the bridge tools to provide translations for non-native clients. A direct and simple example is taking our Xorg fork, Xarcan, intercepting ‘screen reading’ requests and substituting whatever we route to it at the moment – as well as exposing composed cell output as capture devices over v4l2-loopback and so on.

I can editorialise about this for hours, and although the clips here show some of what is already in place, there is much more that already exists and much more to be done.


Durden 0.6 Released

Hot on the heels of the recent Arcan release, it is also time for a release of our reference desktop environment ‘Durden‘.

To refresh memory somewhat, the closest valid comparison is probably the venerable AwesomeWM – but there is quite a lot more to Durden. It has been my daily driver for around 5 years now and implements all the popular window management styles, plus some unique ones.

During this time, it has been used to drive development of Arcan itself — but that will start to wind down now as there is little need for more major features. Instead updates will mainly be improvements to the existing ones before we can safely go 1.0. The two other projects, Safespaces and [Undisclosed], will take its place in helping the engine reach new heights.

Durden has an unusual way of putting a desktop together where everything is structured as a file-system and reconfigurable at run-time with results immediately visible:

/global/workspace/background/color=127,0,0
/windows/type/terminal/input/mouse/click=0.1,0.1
/target/share/remoting/passive/vnc=5900:guest

This file system can be accessed through built-in HUD and popup menus, as well as controlled externally through a unix socket. Through the arcan-cfgfs tool it can even be mounted as a FUSE file system.

All higher level UI primitives – decoration buttons, keybindings, the statusbar and so on – are simply references to paths in this file system. There are over 600 such paths at the moment.

You can easily extend or ‘slim it down’ by adding or removing ‘tools’ scripts and ‘widgets’ scripts (for the HUD).

We are at the point where many traditional X window manager styles can be transferred through scripts that translate your old dot files into paths, which can then be run as schemes (atomic sets of paths).

If you are interested in helping out with developing such translation scripts, get in contact.

Here is a link to the full Changelog. The rest of the post will describe some of the major changes but since most of them are not particularly visual, videos will be used more sparingly this time around.

Core / Input

Shutdown has received an important new option: ‘silent’. This still shuts Durden down, but clients will treat it as if the connection crashed and enter a sleep-reconnect cycle. Whenever you start Durden again, the clients should come back as if nothing happened.

Touch – This layer has been refactored to make it much easier to add or modify classifiers. New classifiers have been added (relative mouse, touch-fit and touch-scaled).

More controls have been added to the touchscreen and trackpad input handler through /global/input/touch for runtime tuning, but many more controls still exist as part of the device profiles (see devmaps/touch/…). Touch device profiles have received more options and bindable gestures, as well as different handling for ‘enter n-finger drag’ versus ‘step n-finger drag’.

Rotary – Through /global/input/rotary and devmaps/rotary/… you can now control binding and mapping for devices like the Surface Dial and Griffin PowerMate. These are great alternatives to the mouse wheel for actions like scrolling. This tool also has basic experimental support for 3D mice, though those are much more complex.

New Tools

As always, we have some new tools:

streamdeck – This is a complex one. There is a separate article on the inner workings of it (Interfacing with a ‘Stream Deck’ Device). This video demos it:

showing off a stream deck working with Durden

This is not strictly limited to this narrow class of devices. The tool can be used to let any display+input device pair run as a ‘minimap’ screen, or integrate with wm-defined properties like titlebar buttons (in the video), client-announced custom keys, workspace and window miniatures, custom bindings and so on.

Popup – This tool allows you to spawn either a custom menu (defined in the devmaps/menus folder) or any subtree of the normal file system as a popup, either tied to another UI component or placed relative to the last known cursor position. The video below shows how they are mapped to parts of the statusbar and titlebar.

Todo – This is a simple task tracker / helper that can be found under /global/tools/todo. You set a task group, add some tasks with a shortname and description and activate. A task is picked at random and appears as a statusbar button. Click on it to postpone or mark for completion and you get a new one, and so on. The best use of this tool is integration with your other ticketing systems if you, like me, work on many projects and sometimes have a tough time choosing what to focus on.

Tracing – This is simply a debug assistant that merges the engine tracing facility in Arcan with all the logging going on in Durden into one coherent timeline. Traces can be saved in the chrome://tracing format, or fed through a converter into the much better Tracy.

Extbtn – The ‘external button’ tool was also covered in part by another article (Another low-level Arcan client: A tray icon handler). It allows you to map external clients as icon providers into the statusbar tray. This is also a way to mix the status bar contents with multiple external ‘bar-tools’. The tool inside the Arcan source repository has some helper scripts for doing so using the lemonbar protocol.

This video is from that article, showing off attaching a terminal emulator as well as Xorg with wmaker as a tray icon popup.

Autostart – This tool allows you to define, view and modify paths that will run on startup or reset.

UI Components

HUD – The built-in HUD file browser now triggers when clients request (on user input) universal save/load, and sorting/searching received a ‘fuzzy matching’ mode (thanks to Cipharus). By typing ‘%’ into the HUD command line you can switch sorting mode, where fuzzy_relevance is a new entry.

Colour-Picker Widget – HUD settings that request an input colour, such as setting the background to a fixed colour, will now pop up several different palettes or a reference image to pick from. In the video below you can see it being used to set a single-colour wallpaper.

Statusbar – The popup tool now gets mapped to custom menus for workspace-switching buttons when right-clicking. The same applies to window titlebars. Display buttons are now added dynamically on display hotplug, for quicker switching or migrating windows by drag-and-drop.

If window titlebars are hidden, whether because clients draw their own decorations or because you default them to off, they can now ‘merge’ into the center-fill area of the statusbar instead – saving precious vertical space.

Window Management

Mouse controls have been added to the tiling window management modes. This means that you can re-organise / swap / migrate and so on by drag-and-dropping, with a live preview as to where in the hierarchy they will attach. The video below shows that in action:

A ‘column-‘ tab mode has been added that dedicates a side column for each ‘window as tab’.

The previous ‘CLI group’ mode of handling new connections has been rewritten in favour of a new swallowing mode, where a window can share (and swap with) the hierarchy spot of another. It is more robust than the previous tactic, but currently more limited when dealing with multiple clients.

An option for controlling which workspace new clients spawn on has also been added, along with the (tiling) option of having subwindows spawn as children of their parent window.

Visual / Accessibility

There is now a shared core for caching, loading and generating non-textual icons. Previously all icons were picked from a font file; they can now also come from pre-rastered sources or shaders, and are resampled for the density of the target display output they are used on.

The /global/displays/<display or current> path has received a ‘zoom’ group that allows you to set or step per-screen magnification around a specific region or relative to the cursor.

Multiple UI components now have optional shadow controls. /global/settings/visual/shadow allows enabling/disabling the effect, adjusting size, colour, intensity and so on.

The flair tool has received an effect category for window selection, with ‘shake’ and ‘flash’ options included.

An ‘invert light’ (colour-preserving) shader has been added to the collection and can be applied per window; see /target/video/shaders.
