This is a weird/radical idea, but @oremanj's comment here (in response to @Badg) raised some red flags for me:
And the channel interface is nicer than the stream one for incremental processing – `async for chunk in channel:` rather than:

```python
while True:
    chunk = await stream.receive_some(ARBITRARILY_CHOSEN_POWER_OF_TWO)
    if not chunk:
        break
    ...
```
At the conceptual level, the output from a process is exactly a Stream (or ReceiveStream or whatever), and if people are trying to jump through hoops to avoid using our Stream ABC to represent the Stream concept, then that seems like a bad sign!
So, let's at least go through the thought experiment: what if we got rid of Stream and used Channel[bytes] instead?
Basic usability
Remembering which is a "stream" and which is a "channel" is super annoying. Merging them would eliminate this problem. Also annoying: constantly going through the pointless ritual of inventing a made-up buffer size (@oremanj's ARBITRARILY_CHOSEN_POWER_OF_TWO). And writing that while True loop over and over is also annoying. Making it a plain await channel.receive() or async for chunk in channel: would eliminate these annoyances.
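Concretely, here are the two rituals side by side (a sketch – `process()` stands in for whatever you do with each chunk):

```python
RECEIVE_SIZE = 4096  # the arbitrarily chosen power of two

def process(chunk: bytes) -> None:
    ...  # whatever you do with each chunk

async def consume_stream(stream):
    # Stream today: invent a buffer size, loop, detect EOF by hand.
    while True:
        chunk = await stream.receive_some(RECEIVE_SIZE)
        if not chunk:
            break
        process(chunk)

async def consume_channel(channel):
    # Channel[bytes]: iteration handles chunk sizes and EOF for us.
    async for chunk in channel:
        process(chunk)
```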
Conceptual level
For me, the major conceptual difference is that I've thought of `Channel` as inherently preserving object boundaries, i.e., whatever I pass to `send` is what comes out of `receive`. In this way of thinking, a `Stream` is equivalent to a `Channel[single_byte]`, but since handling single bytes individually would be inefficient, it uses batched-send and batched-receive operations. If we do decide to merge `Stream` and `Channel`, then we'd have to change this, and start saying that some `Channel`s don't preserve the 1-to-1 mapping between `send` and `receive`.
I'm not sure how I feel about this. It's certainly doable on a technical level. But conceptually – it feels weird to say that a websocket and a TCP socket are both Channel[bytes], given that one is framed and the other isn't – that's a fundamental difference in their usage. (Right now one is Channel[bytes] and the other is Stream.) It would mean Uint32Framing adaptor doesn't convert a Stream into a Channel[bytes], it converts a Channel[bytes] into another Channel[bytes]. And that a TCP socket and a UDP socket have the same type. Intuitively this feels weird. It seems like this is a distinction you want to expose, and emphasize, on the type level.
An interesting partial counter-example would be h11: an h11.Connection object is essentially a Channel[h11.Event]: you send a sequence of objects like h11.Request, h11.Data, h11.EndOfMessage, and then receive a sequence of similar objects. Sometimes, the objects on the sender and receiver sides match 1-to-1, like Request and EndOfMessage. But sometimes they don't, like Data, which might be arbitrarily rechunked! So if you want to treat h11.Connection as a Channel[h11.Event], it's sort of simultaneously a 1-to-1 Channel and also a re-chunking Channel.
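To make that concrete, here's a rough sketch of wrapping `h11.Connection` as a `Channel[h11.Event]` over a byte stream – `H11Channel` is made up for illustration, the chunk size is arbitrary, and error handling (plus `h11.PAUSED`) is elided:

```python
import h11

class H11Channel:
    """Hypothetical Channel[h11.Event] over a trio-style byte stream."""

    def __init__(self, stream, our_role):
        self._stream = stream
        self._conn = h11.Connection(our_role=our_role)

    async def send(self, event):
        data = self._conn.send(event)
        if data:
            await self._stream.send_all(data)

    async def receive(self):
        while True:
            event = self._conn.next_event()
            if event is h11.NEED_DATA:
                # Refill h11's internal buffer. Note that Data events may
                # come out re-chunked relative to how the peer sent them.
                self._conn.receive_data(await self._stream.receive_some(4096))
            else:
                return event
```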
One possibility is to distinguish them somehow at the type level, but make them more consistent, or identical, in terms of the operations they happen to implement.
In Liskov terms, a 1-to-1/framed `Channel[bytes]` IS-A rechunking/unframed `Channel[bytes]` (actually, they are not Liskov-compatible – see below!) – all it does is add stronger guarantees. So merging them would at least have that going for it. But I don't put a huge amount of weight on that – in practice they're used very differently.
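The substitutability claim, sketched: consumer code written against a rechunking `Channel[bytes]` makes no assumptions about chunk boundaries, so handing it a framed channel (which only adds a guarantee) changes nothing. `copy_all` here is a hypothetical helper:

```python
async def copy_all(src, dst):
    # Works whether chunks are framed messages or arbitrary re-chunked
    # pieces -- this consumer never relies on where the boundaries fall.
    async for chunk in src:
        await dst.send(chunk)
```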
Technical level
Currently we have:
Stream: send_all, wait_send_all_might_not_block, send_eof, receive_some
Channel: send, send_nowait, receive, receive_nowait, clone, iteration
Problematic bits:
- `wait_send_all_might_not_block`: we actually have an issue open about possibly getting rid of this: "Should send_all automatically do wait_send_all_might_not_block after sending the data?" #371. If we do that, it would remove this problem :-)
- `send_eof`: we could add this to a bidirectional `Channel`, it wouldn't be too weird
- `*_nowait`: we've been considering moving these to memory channels specifically, instead of the generic `Channel` interface
- `clone`: we've been considering dropping this (same link as for `*_nowait`)
So all those might get sorted out? And `send_all` and `send` are already basically the same. So that just leaves `async def receive()` versus `async def receive_some(max_nbytes)`. And `max_nbytes` is also the obstacle to having iteration. Basically, this is THE core distinction between the two APIs. So, what do we think about `max_nbytes`?
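To make the end state concrete, a merged bidirectional interface might look roughly like this – a sketch only, assuming #371 removes `wait_send_all_might_not_block` and the `*_nowait`/`clone` methods move off the generic ABC; the name `MergedChannel` is made up:

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

import trio

T = TypeVar("T")

class MergedChannel(ABC, Generic[T]):
    @abstractmethod
    async def send(self, value: T) -> None: ...

    @abstractmethod
    async def send_eof(self) -> None: ...

    @abstractmethod
    async def receive(self) -> T:
        """Return the next value; no caller-supplied max_nbytes."""

    def __aiter__(self) -> "MergedChannel[T]":
        return self

    async def __anext__(self) -> T:
        # Iteration comes for free once receive() takes no arguments.
        try:
            return await self.receive()
        except trio.EndOfChannel:
            raise StopAsyncIteration
```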
Specifying max_nbytes manually all the time is tiresome and annoying, as noted above.
Also, I note that Twisted/Asyncio/libuv always handle max_nbytes internally, and the user just deals with whatever they get.
Most `Stream` users basically want to read everything, and the only thing `max_nbytes` affects is efficiency, not correctness. In practice it's almost always set arbitrarily. I've never seen anyone even benchmark different values, except in extreme cases like trying to transmit multiple gigabytes/second through Python. For `SocketStream`, there's some penalty for setting it too big – Python has to first allocate a `max_nbytes`-sized buffer, then realloc it down to size. And of course if you set it too small then you pay some overhead from doing lots of small recvs instead of one big one. So you want some kind of "not too big, not too little" setting.
For other Stream implementations, this doesn't apply – for example, SSLStream.receive_some forces you to pass max_nbytes, and that controls how many bytes it reads out of its internal decrypted data buffer at any one time, but this has no effect at all on how much data it reads at a time from the underlying socket, when it needs to refill its buffer. That's controlled by the constructor argument SSLStream(max_refill_bytes=...).
There are also cases where there is a "natural" size to return from receive_some. For example:
- In most applications, `SSLStream` might as well return whatever data has already been decrypted and is sitting in memory, instead of spending instructions messing around with buffers.
- A `GunzipStream` might as well return whatever data it got from decompressing the last chunk it read (see the sketch after this list). This can avoid some non-trivial complications: urllib3.response.GzipDecoder is accidentally quadratic, which allows a malicious server to DoS urllib3 clients – urllib3/urllib3#1467.
- When reading from the Windows console, the underlying representation is natively unicode. This means that when we want to read bytes, we have to transcode into utf-8, which in turn means that it may be impossible to read less than 4 bytes at a time (at least without nasty buffering tricks).
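As a sketch of the "natural size" idea, here's a hypothetical `GunzipChannel` whose `receive()` returns whatever one underlying chunk happens to decompress to – the class, its `source` argument, and the refill size are all made up for illustration:

```python
import zlib

class GunzipChannel:
    """Hypothetical Channel[bytes] that decompresses a gzip byte stream.

    `source` is anything with an async receive_some(max_nbytes) method.
    """

    def __init__(self, source):
        self._source = source
        # wbits=16+MAX_WBITS tells zlib to expect gzip framing
        self._decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)

    async def receive(self) -> bytes:
        while True:
            compressed = await self._source.receive_some(16384)
            if not compressed:
                # Underlying EOF: return whatever is left in the
                # decompressor (b"" serves as the end marker here)
                return self._decomp.flush()
            data = self._decomp.decompress(compressed)
            if data:
                # The "natural" chunk: whatever this read decompressed to
                return data
```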
Given that most people don't tune it at all, I bet if we did a bit of benchmarking then we could pick a default `SocketStream` recv size that would work better than 99% of what people currently do. And I guess we'd make this an argument to the `SocketStream` / `SocketChannel` constructor, exactly like how `SSLStream` currently works, so people could override it if they want. This could complicate code where the stream is constructed implicitly, though, like `p = Process(..., stdout=PIPE)` – if you don't want `p.stdout` to use the default `max_nbytes` setting, then how do you specify something different? Some options:
- We could simply set the default and tell everyone to live with it.
- We could add some way to pass this through, like `Process(..., stdout=NewPipe(max_nbytes=...))`.
- We could provide some API to mutate it, like `process.stdout.max_nbytes = new_value`.
- We could tell people with this unusual requirement that they should create their own pipe with whatever settings they want (this functionality is somewhat needed anyway, see "Add support for talking to our stdin/stdout/stderr as streams" #174 and "support for windows named pipes" #824), then pass in one end by hand.
What about cases where correctness does depend on setting max_nbytes? It can never be the case that setting max_nbytes too small affects correctness, because Stream is already free to truncate max_nbytes to some smaller value if it wants to – no guarantees. But, we do make a guarantee that we won't return more than max_nbytes.
That... actually is important for correctness in some cases. For example, from this comment:
4. We should provide a `trio.input`, that's like `builtins.input` but async and routed through our stdio-handling machinery. Probably it just calls `receive_some(1)` a bunch of times until it sees a newline.
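A minimal sketch of that technique – `read_line` is hypothetical; the point is that `receive_some(1)` guarantees we never consume a byte past the newline:

```python
async def read_line(stream) -> bytes:
    # One byte at a time: slow, but never pulls data past the newline
    # out of a stdin that other code might also be reading.
    line = bytearray()
    while True:
        byte = await stream.receive_some(1)
        if not byte or byte == b"\n":
            return bytes(line)
        line += byte
```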
This is why we can't quite think of our current Channel[bytes] as being a sub-interface of Stream – in this one very specific case, Stream genuinely has slightly more functionality.
This is probably a rare occurrence in practice. Most protocols need an internal buffer anyway, so any over-reads just go into the buffer for next time. And sometimes you want to hand-off between protocols, e.g. SSLStream.unwrap, or something like switching from HTTP/1.1 to Websocket... but in those cases we generally don't try to avoid over-reading from the underlying stream. Instead, we just accept that some over-read may have happened, and give it to the user to deal with (example 1, example 2). And in many cases, it's actually impossible to avoid this in any efficient way – e.g. if you have a newline-delimited protocol, then you have no idea where the next line boundary will be, so the only way to avoid over-read is to read one-byte-at-a-time, which is way too inefficient. In theory we could avoid it for TLS (which is length-prefixed), or for other length-prefixed protocols (like Uint32FramedChannel), but it doesn't seem worth it in most cases.
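Here's what that "accept the over-read" pattern typically looks like, sketched – `LineReader` and its `detach` method are made up for illustration; the reader buffers freely, and whoever takes over the stream gets the leftover bytes:

```python
class LineReader:
    """Hypothetical buffered line reader that accepts over-reading."""

    def __init__(self, stream):
        self._stream = stream
        self._buf = bytearray()

    async def read_line(self) -> bytes:
        while b"\n" not in self._buf:
            data = await self._stream.receive_some(4096)
            if not data:
                raise EOFError("stream closed before newline")
            self._buf += data
        line, _, rest = bytes(self._buf).partition(b"\n")
        self._buf = bytearray(rest)
        return line

    def detach(self) -> bytes:
        # Hand any over-read bytes to whatever protocol takes over next.
        leftover, self._buf = bytes(self._buf), bytearray()
        return leftover
```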
The special thing about trio.input is that it's sharing the process's stdin with who-knows-what-else, so we can't coordinate our buffer usage with other users, and are reduced to this kind of stone-age receive_some(1) technique.
Some options here:
- Treat this as a special case for `trio.input`, and implement it using some specialized tools. E.g., make sure that `open_stdin` can safely be called repeatedly within a single program and returns different handles that don't interfere with each other, and then have `trio.input` do `async with trio.open_stdin(max_nbytes=1): ...`. Or provide some low-level `receive_some_from_stdin` function, or something.
- Have some `Channel[bytes]` implementations where `receive` takes an optional `max_nbytes` argument, as a matter of convention.
- Same as previous point, but also formalize this convention as a named sub-interface – though I'm having trouble thinking of a good name! This might help with our problem up above, about wanting some more informative way to describe the type of `Uint32Framing`? But of course proliferating names always has its own cost, especially if the names are awkward. Also, naming the interface creates an interesting challenge: how do you type `StapledChannel`? You want `StapledChannel.receive` to have the same signature as `StapledChannel.receive_channel.receive`, and at runtime this is easy – just use `*args, **kwargs` (see the sketch below). But if we name this sub-interface, then the proper static type for `StapledChannel` depends on the static type of its `ReceiveChannel`. I'm not sure whether giving `StapledChannel` the right static type matters or not.
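Here's a sketch of the runtime-easy part – a hypothetical `StapledChannel` that just forwards whatever arguments its receive half accepts; the static-typing question is exactly what the `*args, **kwargs` hides:

```python
class StapledChannel:
    """Hypothetical: glue a send half and a receive half together."""

    def __init__(self, send_channel, receive_channel):
        self.send_channel = send_channel
        self.receive_channel = receive_channel

    async def send(self, value):
        await self.send_channel.send(value)

    async def receive(self, *args, **kwargs):
        # At runtime this forwards an optional max_nbytes (or anything
        # else) just fine; the static type, though, now depends on the
        # static type of self.receive_channel.
        return await self.receive_channel.receive(*args, **kwargs)
```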