Dealing with the wrong content encoding provided by the repository #517

@jancespivo

Hi,
some repositories serve packages (*.tar.gz) with an incorrect Content-Encoding: gzip header, and requests automatically decompresses such responses (see http://docs.python-requests.org/en/master/user/quickstart/#raw-response-content). The resulting file is therefore no longer gzipped, only tarred, and the following exception is raised:

   tar = tarfile.TarFile(str(filepath), fileobj=gz)
 /usr/lib/python3.6/tarfile.py in __init__() at line 1480
   self.firstmember = self.next()
 /usr/lib/python3.6/tarfile.py in next() at line 2295
   tarinfo = self.tarinfo.fromtarfile(self)
 /usr/lib/python3.6/tarfile.py in fromtarfile() at line 1090
   buf = tarfile.fileobj.read(BLOCKSIZE)
 /usr/lib/python3.6/gzip.py in read() at line 276
   return self._buffer.read(size)
 /usr/lib/python3.6/_compression.py in readinto() at line 68
   data = self.read(len(byte_view))
 /usr/lib/python3.6/gzip.py in read() at line 463
   if not self._read_gzip_header():
 /usr/lib/python3.6/gzip.py in _read_gzip_header() at line 411
   raise OSError('Not a gzipped file (%r)' % magic)
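
The failure can probably be reproduced without any network access. The following is only a sketch: the helper names make_plain_tar and open_as_gzipped_tar and the member name pkg/setup.py are made up for illustration and are not part of poetry. It builds an uncompressed tar archive in memory (standing in for a .tar.gz that was already decompressed during download) and then opens it the same way as the failing call above:

```python
import gzip
import io
import tarfile


def make_plain_tar() -> bytes:
    """Build an uncompressed tar archive in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        info = tarfile.TarInfo(name="pkg/setup.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def open_as_gzipped_tar(payload: bytes) -> list:
    """Mimic the failing call: wrap the bytes in GzipFile, then in TarFile."""
    gz = gzip.GzipFile(fileobj=io.BytesIO(payload))
    tar = tarfile.TarFile(fileobj=gz)
    return tar.getnames()


if __name__ == "__main__":
    try:
        open_as_gzipped_tar(make_plain_tar())
    except OSError as exc:
        # The payload is a plain tar, so the gzip header check fails.
        print("failed as expected:", exc)
```

With a payload that really is gzipped (for example gzip.compress(make_plain_tar())), the same function returns the member names, so the error only appears when the compression layer has already been stripped.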

There are two options:

  1. Use tarfile.open, which can deal with this case, instead of calling tarfile.TarFile(str(filepath), fileobj=gz) directly.
  2. Use the requests raw stream instead of iter_content: replace r.iter_content(chunk_size=1024) with r.raw.stream(1024) in poetry.repositories.pypi_repository.PyPiRepository#_download and poetry.repositories.legacy_repository.LegacyRepository#_download. By the way, these two spots violate the DRY principle.

I suggest option 1, because it is more robust and does not depend on the download method.
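
A minimal sketch of what option 1 could look like, assuming the archive is available as bytes or as a seekable file. The helper names list_members and build_plain_tar are hypothetical and not taken from poetry. tarfile.open with mode "r:*" (which is also what the default mode "r" means) detects the compression itself, so the same call handles both a real .tar.gz and one that was already decompressed:

```python
import gzip
import io
import tarfile


def build_plain_tar() -> bytes:
    """Create a small uncompressed tar archive in memory (hypothetical test data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        info = tarfile.TarInfo(name="pkg/setup.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(payload: bytes) -> list:
    """Open the archive with transparent compression detection."""
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
        return tar.getnames()


if __name__ == "__main__":
    plain = build_plain_tar()
    print(list_members(plain))                 # already decompressed
    print(list_members(gzip.compress(plain)))  # genuinely gzipped
```

For a file on disk, tarfile.open(str(filepath)) should behave the same way, since the default mode also uses transparent detection.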

I can prepare a pull request, even though the tests will be tricky, because the _download methods are currently not tested at all ;)

Best regards
PS: pip can deal with it.