15 changes: 14 additions & 1 deletion Lib/http/server.py
@@ -134,6 +134,10 @@

 DEFAULT_ERROR_CONTENT_TYPE = "text/html;charset=utf-8"
 
+# Data larger than this will be read in chunks, to prevent extreme
+# overallocation.
+_MIN_READ_BUF_SIZE = 1 << 20
+
 class HTTPServer(socketserver.TCPServer):
 
     allow_reuse_address = True    # Seems to make sense in testing environment
@@ -1284,7 +1288,16 @@ def run_cgi(self):
                                  env = env
                                  )
             if self.command.lower() == "post" and nbytes > 0:
-                data = self.rfile.read(nbytes)
+                cursize = 0
+                data = self.rfile.read(min(nbytes, _MIN_READ_BUF_SIZE))
+                while (len(data) < nbytes and len(data) != cursize and
+                        select.select([self.rfile._sock], [], [], 0)[0]):
Member:

the timeout=0 fourth arg here makes this non-blocking. the loop won't wait for further data to arrive. so we wind up not processing all of the data.

(Claude Code Opus 4.5 looking at the buildbot failure, issue, pr & branch triaged this FWIW)
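The non-blocking behavior described above is easy to demonstrate: a `select()` call with a timeout of `0` only polls, returning an empty ready list when no data has arrived yet. A minimal sketch (the `socketpair` setup here is illustrative, not from the PR):

```python
import select
import socket

# Two connected sockets; nothing has been sent yet.
a, b = socket.socketpair()

# timeout=0 means "poll": with no data buffered, select() returns
# immediately with an empty ready list -- which is exactly why the
# CGI read loop can exit before the full body arrives.
ready_before, _, _ = select.select([a], [], [], 0)

b.sendall(b"x")

# Once data is buffered, the same socket is reported readable.
ready_after, _, _ = select.select([a], [], [], 1.0)

a.close()
b.close()
```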

Member:

why select at all here given we just want to block on reading another chunk? if the socket closes or errors, that'd return anyways.

Member:

Quick PR to make it blocking: #142176

Member:

What should this do if the data doesn't arrive?
Wait forever, as it did before the fix?
Or is this PR also meant to fix the DoS? (I don't see how; is there a timeout setting or an implicit one?)

Member Author:

It was so long ago that I don't remember my train of thought, but playing around with the code I think I've recreated it.

I tested this code on Linux with have_fork = False. The tests failed because the underlying file object is unbuffered and read(n) can return fewer than n bytes. So the loop differs from the loops in other fixes -- it continues until it reads 0 bytes. select() is needed to avoid blocking. Until now it all worked on Linux and Windows, both on my computer and in CI tests. Perhaps this depends on system load or on a non-debug build (I only tested with the debug build).

Perhaps we need to use a non-zero timeout (self.timeout?) and set it to non-None for tests.
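For comparison, a blocking variant without `select()`, along the lines the reviewers suggest, would simply retry the read until the byte count is reached or the peer closes the connection. This is an illustrative sketch, not the actual patch; the helper name and `initial` parameter are invented here:

```python
import io

def read_exact(rfile, nbytes, initial=1 << 20):
    """Illustrative sketch: read nbytes from an unbuffered file-like
    object, growing the chunk size geometrically (never more than
    doubling), and stopping early only when read() returns b''
    (EOF / connection closed)."""
    data = rfile.read(min(nbytes, initial))
    while len(data) < nbytes:
        chunk = rfile.read(min(len(data), nbytes - len(data)))
        if not chunk:       # EOF: the peer sent less than promised
            break
        data += chunk
    return data

# With an in-memory stream, short reads never block, so the loop
# simply accumulates until the requested size is reached.
body = read_exact(io.BytesIO(b"a" * 5000), 5000, initial=1024)
```

Against a live socket, each `read()` would block until data arrives, which is the intended behavior per the discussion below.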

Member:

xmlrpc/server.py has a similar code path:

```python
max_chunk_size = 10*1024*1024
size_remaining = int(self.headers["content-length"])
L = []
while size_remaining:
    chunk_size = min(size_remaining, max_chunk_size)
    chunk = self.rfile.read(chunk_size)
    if not chunk:
        break
    L.append(chunk)
    size_remaining -= len(L[-1])
data = b''.join(L)
```
but without a select. The last change there was ec1712a in 2012 adding the break if no data was read. So without a timeout it will suffer the same problem if content-length lies and less data is sent?

OTOH, does anybody out there operate a server without a timeout?
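The "lying Content-Length" case is easy to simulate with an in-memory stream, where EOF shows up as an empty read; on a live socket with no timeout, that final read would block instead, which is the hang being discussed:

```python
import io

# Client claims 10000 bytes but only 100 ever arrive.
rfile = io.BytesIO(b"x" * 100)
size_remaining = 10_000
max_chunk_size = 10 * 1024 * 1024

L = []
while size_remaining:
    chunk = rfile.read(min(size_remaining, max_chunk_size))
    if not chunk:            # the break added in ec1712a: stop on EOF
        break
    L.append(chunk)
    size_remaining -= len(chunk)
data = b''.join(L)
```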

Member:

> Wait forever, as it did before the fix?

yes. this is intentionally a blocking read until sufficient data arrives or the connection dies. :)

fwiw, i missed this myself in the first code review. cursed unnamed parameters and presumptions about what 0 means. 😁

Member:

we do not need to add timeout logic here. just remove the select. this path was always intended to block until it got enough data or the socket is closed. the issue being protected against has nothing to do with blocking by holding a connection open, it's just about memory allocation.

This 1990s era server code was never meant to provide a robust server. There's a reason we got rid of it in 3.15.

Member Author:

We need timeout logic here to be able to test this change.

+                    cursize = len(data)
+                    # This is a geometric increase in read size (never more
+                    # than doubling the current length of data per loop
+                    # iteration).
+                    delta = min(cursize, nbytes - cursize)
Member:

a comment that this is a geometric increase in read size (never more than doubling the current length of data per loop iteration) could be good. otherwise it takes some staring at the code to understand what it is doing with the delta.

not a big deal though given CGIHTTPRequestHandler is gone in 3.15 onwards. :)

+                    data += self.rfile.read(delta)
             else:
                 data = None
             # throw away additional data [see bug #427345]
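The effect of the `delta` computation can be seen by tracing the chunk sizes the loop requests; a sketch assuming a hypothetical 10 MiB body and the 1 MiB initial read:

```python
# Trace the geometric read-size growth for a hypothetical 10 MiB body.
MiB = 1 << 20
nbytes = 10 * MiB
cursize = 1 * MiB            # after the initial _MIN_READ_BUF_SIZE read
sizes = []
while cursize < nbytes:
    delta = min(cursize, nbytes - cursize)   # never more than doubling
    sizes.append(delta)
    cursize += delta
# Requested chunks: 1 MiB, 2 MiB, 4 MiB, then the 2 MiB remainder.
```

So peak per-iteration allocation stays proportional to the data actually received, rather than to the attacker-controlled Content-Length.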
38 changes: 38 additions & 0 deletions Lib/test/test_httpservers.py
@@ -913,6 +913,20 @@ def test_path_without_leading_slash(self):
print("</pre>")
"""

cgi_file7 = """\
#!%s
import os
import sys

print("Content-type: text/plain")
print()

content_length = int(os.environ["CONTENT_LENGTH"])
body = sys.stdin.buffer.read(content_length)

print(f"{content_length} {len(body)}")
"""


@unittest.skipIf(hasattr(os, 'geteuid') and os.geteuid() == 0,
"This test can't be run reliably as root (issue #13308).")
@@ -952,6 +966,8 @@ def setUp(self):
        self.file3_path = None
        self.file4_path = None
        self.file5_path = None
        self.file6_path = None
        self.file7_path = None

        # The shebang line should be pure ASCII: use symlink if possible.
        # See issue #7668.
@@ -1006,6 +1022,11 @@ def setUp(self):
            file6.write(cgi_file6 % self.pythonexe)
        os.chmod(self.file6_path, 0o777)

        self.file7_path = os.path.join(self.cgi_dir, 'file7.py')
        with open(self.file7_path, 'w', encoding='utf-8') as file7:
            file7.write(cgi_file7 % self.pythonexe)
        os.chmod(self.file7_path, 0o777)

        os.chdir(self.parent_dir)

@@ -1028,6 +1049,8 @@ def tearDown(self):
            os.remove(self.file5_path)
        if self.file6_path:
            os.remove(self.file6_path)
        if self.file7_path:
            os.remove(self.file7_path)
        os.rmdir(self.cgi_child_dir)
        os.rmdir(self.cgi_dir)
        os.rmdir(self.cgi_dir_in_sub_dir)
@@ -1100,6 +1123,21 @@ def test_post(self):

        self.assertEqual(res.read(), b'1, python, 123456' + self.linesep)

    def test_large_content_length(self):
        for w in range(15, 25):
            size = 1 << w
            body = b'X' * size
            headers = {'Content-Length': str(size)}
            res = self.request('/cgi-bin/file7.py', 'POST', body, headers)
            self.assertEqual(res.read(), b'%d %d' % (size, size) + self.linesep)

    def test_large_content_length_truncated(self):
        for w in range(18, 65):
            size = 1 << w
            headers = {'Content-Length': str(size)}
            res = self.request('/cgi-bin/file1.py', 'POST', b'x', headers)
            self.assertEqual(res.read(), b'Hello World' + self.linesep)

    def test_invaliduri(self):
        res = self.request('/cgi-bin/invalid')
        res.read()
@@ -0,0 +1,5 @@
Fix a potential memory denial of service in the :mod:`http.server` module.
A malicious client connected to the CGI server on Windows could cause an
arbitrary amount of memory to be allocated. This could have led to symptoms
including a :exc:`MemoryError`, swapping, out-of-memory (OOM) killed
processes or containers, or even system crashes.