Skip to content

Conversation

@giampaolo
Copy link
Contributor

@giampaolo giampaolo commented Dec 18, 2018

@pitrou
Copy link
Member

pitrou commented Dec 18, 2018

You're hitting CI issues. In particular, IPv6 is well-known to be broken on Travis :-/

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!


.. function:: create_server_sock(address, *, dual_stack=True, queue_size=5, reuse_addr=True)

Convenience function which creates a TCP server bound to *address* (a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the function be extended to support UDP sockets or that would require a different implementation?
If the latter, perhaps you want to rename the function to create_tcp_server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I forgot about UDP. =) Yes, I think we can support it. I can simply add a type=SOCK_STREAM parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to support UDP please consider adding allow_broadcast argument as we have in asyncio: https://github.com/python/cpython/blob/master/Lib/asyncio/base_events.py#L1154-L1158

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worth adding because one can simply set SO_BROADCAST against the returned socket. reuse_addr and reuse_port parameters are different because they necessarily must be set before bind(). I would rather keep the API as simple as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

"""
if not has_ipv6 \
or not hasattr(_socket, 'AF_INET6') \
or not hasattr(_socket, 'IPV6_V6ONLY'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must also test for IPPROTO_IPV6 apparently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And note that due to a bug in the socket module, socket.IPPROTO_IPV6 is not exposed on Windows, even though this constant does exist: https://bugs.python.org/issue29515

# http://mail.python.org/pipermail/python-ideas/2013-March/019937.html
host = None
# If dual stack is supported and no host was specified '::' will
# accept AF_INET and AF_INET6 connections and all interfaces.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"on all interfaces"

if reuse_addr:
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
if af == AF_INET6_ and dual_stack:
if sock.getsockopt(IPPROTO_IPV6, IPV6_V6ONLY):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth calling getsockopt() before setsockopt()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm not sure. My reasoning here was: being setsockopt a write operation it may have more chances of failing (EPERM or whatever) than just a get operation.

sock.settimeout(2)
t = threading.Thread(target=run)
t.start()
time.sleep(.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is reliable. @vstinner will probably object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object. Don't do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I will use a sync primitive.

cl.sendall(b'foo')
self.assertEqual(cl.recv(1024), b'foo')

with socket.create_server_sock((None, 0)) as sock:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably want to skip this if the testing platform doesn't support IPv6 at all (there has to be a flag for this already :-)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will look into it

return False

def create_server_sock(address, *, dual_stack=has_dual_stack(), queue_size=5,
reuse_addr=os.name == 'posix' and sys.platform != 'cygwin'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In asyncio the flag is called reuse_address. Maybe use the same name here for sake of consistency?

queue_size is named backlog with the same meaning.
I suspect 5 as default is too small. In asyncio the parameter default value is 100. For blocking sockets high value is even more important than for async case.

Also asyncio loop.create_server() supports reuse_port() flag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, it would be useful to find out if asyncio's variant can be rewritten to benefit from this new function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, it would be useful to find out if asyncio's variant can be rewritten to benefit from this new function.

No because internally asyncio has to bind on multiple sockets (on purpose):
https://bugs.python.org/issue17561#msg185765

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In asyncio the flag is called reuse_address. Maybe use the same name here for sake of consistency?

Mmm. I think that for an asyncio app this function is fundamentally useless because it will use asyncio API. You're not supposed to import socket where you import asyncio, so I'm not sure how much consistency matters in this case.

queue_size is named backlog with the same meaning.

I will rename it to backlog

I suspect 5 as default is too small. In asyncio the parameter default value is 100. For blocking sockets high value is even more important than for async case.

Good point. I took a look at some projects. Tornado uses 128, Twisted uses 50, curio uses 100, Trio (and Golang) uses a big size (but never exceding 65536):
https://github.com/python-trio/trio/blob/cfa82d8abc42a0bd56be9bc788686cb22590ace6/trio/_highlevel_open_tcp_listeners.py#L10
One may argue whether we may want to set a default kwarg for socket.listen() and also socketserver module (but those are separate issues).
CC-ing @njsmith who I suspect may be interested in this.

Also asyncio loop.create_server() supports reuse_port() flag.

Interesting. I never needed SO_REUSEPORT myself but it appears it's useful so it probably deserves its own argument:
https://medium.com/uckey/the-behaviour-of-so-reuseport-addr-1-2-f8a440a35af6

Copy link
Contributor Author

@giampaolo giampaolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review.


.. function:: create_server_sock(address, *, dual_stack=True, queue_size=5, reuse_addr=True)

Convenience function which creates a TCP server bound to *address* (a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I forgot about UDP. =) Yes, I think we can support it. I can simply add a type=SOCK_STREAM parameter.

return False

def create_server_sock(address, *, dual_stack=has_dual_stack(), queue_size=5,
reuse_addr=os.name == 'posix' and sys.platform != 'cygwin'):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, it would be useful to find out if asyncio's variant can be rewritten to benefit from this new function.

No because internally asyncio has to bind on multiple sockets (on purpose):
https://bugs.python.org/issue17561#msg185765

if reuse_addr:
sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
if af == AF_INET6_ and dual_stack:
if sock.getsockopt(IPPROTO_IPV6, IPV6_V6ONLY):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm not sure. My reasoning here was: being setsockopt a write operation it may have more chances of failing (EPERM or whatever) than just a get operation.

sock.settimeout(2)
t = threading.Thread(target=run)
t.start()
time.sleep(.1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I will use a sync primitive.

cl.sendall(b'foo')
self.assertEqual(cl.recv(1024), b'foo')

with socket.create_server_sock((None, 0)) as sock:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will look into it

return False

def create_server_sock(address, *, dual_stack=has_dual_stack(), queue_size=5,
reuse_addr=os.name == 'posix' and sys.platform != 'cygwin'):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In asyncio the flag is called reuse_address. Maybe use the same name here for sake of consistency?

Mmm. I think that for an asyncio app this function is fundamentally useless because it will use asyncio API. You're not supposed to import socket where you import asyncio, so I'm not sure how much consistency matters in this case.

queue_size is named backlog with the same meaning.

I will rename it to backlog

I suspect 5 as default is too small. In asyncio the parameter default value is 100. For blocking sockets high value is even more important than for async case.

Good point. I took a look at some projects. Tornado uses 128, Twisted uses 50, curio uses 100, Trio (and Golang) uses a big size (but never exceding 65536):
https://github.com/python-trio/trio/blob/cfa82d8abc42a0bd56be9bc788686cb22590ace6/trio/_highlevel_open_tcp_listeners.py#L10
One may argue whether we may want to set a default kwarg for socket.listen() and also socketserver module (but those are separate issues).
CC-ing @njsmith who I suspect may be interested in this.

Also asyncio loop.create_server() supports reuse_port() flag.

Interesting. I never needed SO_REUSEPORT myself but it appears it's useful so it probably deserves its own argument:
https://medium.com/uckey/the-behaviour-of-so-reuseport-addr-1-2-f8a440a35af6

.. function:: has_dual_stack()

Return ``True`` if the kernel supports creating a socket which can handle
both :data:`AF_INET` and :data:`AF_INET6` (IPv4 / IPv6) connections.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name of this function is pretty confusing. A "dual stack host" is one that has both IPv4 and IPv6 support. A host can have dual stack support without being able to support both of them in a single socket.

Also, for Python docs it should probably be "the platform", not "the kernel".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the name is not the best. Actually, on a second thought, I don't think it's necessary to expose this function after all.

@njsmith
Copy link
Contributor

njsmith commented Dec 19, 2018

I'm not sure if this is a good idea. For users who don't understand all the intricacies of BSD sockets and IPv4/IPv6, dual-protocol sockets are kind of confusing. People expect them to handle IPv4 connections like IPv4 connections, but in fact they convert IPv4 connections into "virtual" IPv6 connections. This can be especially problematic when porting legacy IPv4-only code. For example, right now, python -m http.server prints the source address for each request:

~$ python -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [18/Dec/2018 19:32:10] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [18/Dec/2018 19:32:10] "GET /favicon.ico HTTP/1.1" 404 -

It does this by using sock.getpeername() to check the client address. But, if you're using a dual-protocol socket and gets a connection over IPv4, then sock.getpeername() doesn't return an IPv4 address, it returns a magical IPv6 address that has the IPv4 address encoded inside it. So flipping http.server to use this would produce:

~$ python -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
::ffff:127.0.0.1 - - [18/Dec/2018 19:32:10] "GET / HTTP/1.1" 200 -
::ffff:127.0.0.1 - - [18/Dec/2018 19:32:10] "GET /favicon.ico HTTP/1.1" 404 -

This is surprising and confusing (and in the case of http.server, might break existing code). In some cases it can create security vulnerabilities. (E.g., you blacklisted 1.2.3.4, but forget to blacklist ::ffff:1.2.3.4.)

I think this is why tornado/asyncio/trio all make a special effort to disable the dual-protocol mode, and instead handle dual-stack servers by opening multiple single-protocol sockets.

OTOH, for people who do understand the intricacies of BSD sockets and IPv4/IPv6, a helper is not really needed... and if you want to use the socket API directly you kind of have to know a lot of obscure intricacies.

I'm not like, massively opposed. If it's merged I'll just ignore it and it won't hurt me :-). http.server.HTTPServer could work around the logging issue by manually converting V4-mapped-V6-addresses back to V4 addresses before printing them.

If you do go ahead, the docs definitely need to highlight this issue, because otherwise you're going to get complaints from confused users.

@giampaolo
Copy link
Contributor Author

giampaolo commented Dec 20, 2018

People expect them to handle IPv4 connections like IPv4 connections, but in fact they convert IPv4 connections into "virtual" IPv6 connections. [...] if you're using a dual-protocol socket and gets a connection over IPv4, then sock.getpeername() doesn't return an IPv4 address, it returns a magical IPv6 address that has the IPv4 address encoded inside it.

That's a good point which I didn't consider. I guess that invalidates my original idea of indiscriminately using this function in other parts of the stdlib, like the tcpserver module. It also means assuming dual_stack=True as the default is not a good idea. I think a better signature could be the following:

def create_server_sock(address,
                       family=AF_UNSPEC,
                       type=SOCK_STREAM,
                       *,
                       reuse_addr=os.name == 'posix' and sys.platform != 'cygwin',
                       reuse_port=False,
                       backlog=100,
                       hybrid_v46=False):

Note: if family is AF_UNSPEC or None it will be determined from address via getaddrinfo(), similarly to create_connection(). This way one can force a given socket family if they want.

It probably also makes sense to rename has_dual_stack() to supports_hybrid_v46_socket() or something.
Thoughts?

@njsmith
Copy link
Contributor

njsmith commented Dec 20, 2018

reuse_addr=os.name == 'posix' and sys.platform != 'cygwin',

I wouldn't expose any option for controlling this. SO_REUSEADDR is so tricky that every library I've looked at gets it wrong, so what hope do regular people have.

The right way to do it is:

  • On Unix, everyone always wants SO_REUSEADDR
  • On Windows, you should never ever ever use SO_REUSEADDR (because it means something completely different than it does on Unix, and what it means is completely broken), and instead you should always use SO_EXCLUSIVEADDRUSE.
  • No idea about cygwin

@giampaolo
Copy link
Contributor Author

Closing this out. New PR is #11784.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants