travis fails when remote data option is on in tests #491

eteq · 2012-11-19T20:57:08Z

As discussed in #481, turning on the --remote-data option in the tests causes the tests to all fail on travis. This is probably some permissions issue that may or may not be fixable. The error messages can be viewed at https://travis-ci.org/astropy/astropy/builds/3229836

eteq · 2012-11-19T19:17:30Z

Oh, and you can also look at the .travis.yml file at https://github.com/eteq/astropy/tree/travis-yml-remote-data

mdboom · 2012-11-19T20:35:11Z

I think you explicitly need to specify the encoding as utf-8 on the line that tries to decode the Google home page.

mdboom · 2012-11-19T20:37:55Z

Or, alternatively, don't decode and search for bytes rather than string, i.e. b'oogle</title>'
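
A minimal sketch of the two options suggested above, written for Python 3. The `page` value is a hard-coded stand-in for the downloaded bytes, and the helper names are illustrative only; none of this is taken from the test suite.

```python
# Stand-in for the raw bytes read from the downloaded file (assumed content).
page = b"<html><head><title>Google</title></head></html>"


def found_by_decoding(raw, encoding="utf-8"):
    """Option 1: decode with an explicit encoding, then search the text."""
    return raw.decode(encoding).find("oogle</title>") > -1


def found_in_bytes(raw):
    """Option 2: skip decoding and search the raw bytes directly."""
    return raw.find(b"oogle</title>") > -1


print(found_by_decoding(page), found_in_bytes(page))
```

The second form never decodes, so it cannot raise `UnicodeDecodeError`, whatever encoding the server used for the rest of the page.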

eteq · 2012-11-19T20:55:02Z

oh, tricky... but why would that be different on travis' machines?

eteq · 2012-11-19T20:57:41Z

I attached code just so travis will actually run as changes happen on this branch - this is not ready to be merged, though.

mdboom · 2012-11-19T21:08:44Z

The default encoding is platform and user-specific. Not sure the details of what Travis is using, but one should never depend on it being the same across different systems.
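
As a rough illustration of that point (Python 3, standard library only), the values printed below can differ from one machine or user account to another, while a decode with an explicit encoding gives the same result everywhere. The sample bytes and the helper name are assumptions made for this sketch.

```python
import locale
import sys

# These values can differ between machines, users, and Python versions.
print("locale preferred encoding:", locale.getpreferredencoding(False))
print("sys default encoding:", sys.getdefaultencoding())

raw = "caf\u00e9".encode("utf-8")  # bytes as they might arrive from a server

# Explicit: gives the same result on every platform.
text = raw.decode("utf-8")


def decode_with_platform_default(data):
    """Environment-dependent: may succeed, garble the text, or raise."""
    return data.decode(locale.getpreferredencoding(False))
```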

astrofrog · 2012-11-19T21:17:08Z

I think Travis runs on contributed/distributed machines, which would explain this kind of issue (maybe?)

eteq · 2012-11-22T06:47:15Z

@mdboom - as you can see here I tried a variety of different approaches, and all were unsuccessful (see the travis builds for the commits above). Or did I misunderstand what you were suggesting?

astrofrog · 2012-12-11T14:48:09Z

@eteq - just out of curiosity, do things work if you include @mdboom's recent PR (#539) which fixes some encoding-related bugs?

mdboom · 2012-12-11T15:06:31Z

@eteq: Sorry I missed your question from a few weeks ago... Let's confirm first that #539 doesn't solve this, and if not, I'll have another look. I think @astrofrog is right that it's probably somehow related.

eteq · 2012-12-12T12:42:20Z

Still failing... @mdboom did anything from #539 give any insights here? I had to wipe the other commits to do the rebase, but I basically tried all combinations of 'utf-8' and 'ascii' with encode and decode...

astrofrog · 2012-12-12T13:58:57Z

I managed to reproduce the issue locally! Will see if I can come up with a fix.

mdboom · 2012-12-12T14:12:54Z

I was also able to reproduce locally, and the following fixes it for me:

--- a/astropy/utils/tests/test_data.py
+++ b/astropy/utils/tests/test_data.py
@@ -199,7 +199,7 @@ def test_data_noastropy_fallback(monkeypatch, recwarn):
     #now try with no cache
     fnnocache = data.download_file(TESTURL, cache=False)
     with open(fnnocache, 'rb') as googlepage:
-        assert googlepage.read().decode().find('oogle</title>') > -1
+        assert googlepage.read().decode('utf8').find('oogle</title>') > -1

     #no warnings should be raise in fileobj because cache is unnecessary
     assert len(recwarn.list) == 0

If that isn't working for others, maybe Google is serving the page in a different encoding in different contexts? If that's the case, we probably need to be using a different reference URL (perhaps one we control on astropy.org?)

astrofrog · 2012-12-12T14:59:22Z

Yeah, I'm in Germany so maybe that's why I get the error. I agree we should just use http://www.astropy.org instead.

astrofrog · 2012-12-12T15:07:59Z

By the way, I still have issues even after @mdboom's suggested fix. The error is then:

    def decode(input, errors='strict'):
>       return codecs.utf_8_decode(input, errors, True)
E       UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 7133: invalid start byte

I did a print repr(googlepage.read()) and got: http://pastebin.com/cbqmRiVm

It turns out "I'm feeling lucky" in German doesn't decode to UTF8 ;-)

In [8]: "Auf gut Gl\xfcck!".decode('utf8')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/Volumes/Raptor/<ipython-input-8-ea8915abb885> in <module>()
----> 1 "Auf gut Gl\xfcck!".decode('utf8')

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.pyc in decode(input, errors)
     14 
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17 
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 10: invalid start byte

Anyway, this is probably a good reason to just switch to using the Astropy website.
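
For readers on Python 3, the same failure can be reproduced with a bytes literal. This is only a sketch of the session above, not code from the test suite; the variable names are made up for the example.

```python
raw = b"Auf gut Gl\xfcck!"

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the block ends

if error is not None:
    # error.start is the offset of the byte that could not be decoded.
    print(error.start, hex(raw[error.start]), error.reason)
```

The reported offset points at the byte `0xfc`, matching "position 10" in the traceback above.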

astrofrog · 2012-12-12T15:10:52Z

Ah now this is interesting - it looks like the issue is that we must already be converting to UTF8 beforehand:

In [13]: "Glück".decode('utf8')
Out[13]: u'Gl\xfcck'

which is what's in my string before we try and decode it. Is this a bug?

astrofrog · 2012-12-12T15:12:45Z

Quick update - it seems that in this case, urllib is returning output that's already in UTF8, hence the issue I'm seeing. We might want to think of a fix for this, because if we ever put a non-ascii character on e.g. the Astropy homepage, things won't work anymore. Just out of curiosity, why are we calling decode at all?

mdboom · 2012-12-12T15:44:12Z

I see. Indeed, it looks like Google is serving iso-8859-1, not utf-8 at least for me... and with only English on the page (for me) the two are equivalent, but when serving German (based on your IP, I assume) the two are not equivalent.
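
A short Python 3 sketch of why the English page hides the problem and the German page exposes it. The two sample strings are assumptions chosen to mirror the button labels discussed above.

```python
english = "I'm Feeling Lucky"
german = "Auf gut Gl\u00fcck!"

# Pure ASCII text encodes to identical bytes in both encodings.
same_for_ascii = english.encode("iso-8859-1") == english.encode("utf-8")

# U+00FC ("u" with umlaut) does not: one byte in ISO-8859-1, two in UTF-8.
latin1_bytes = german.encode("iso-8859-1")  # b'Auf gut Gl\xfcck!'
utf8_bytes = german.encode("utf-8")         # b'Auf gut Gl\xc3\xbcck!'

print(same_for_ascii, latin1_bytes, utf8_bytes)
```

Note that the `\xfc` in a Python 2 repr such as `u'Gl\xfcck'` is the code point U+00FC of an already decoded string, whereas the `\xfc` in the failing page is a single raw byte.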

We may not need the decode if we search for bytes rather than a unicode string, i.e.:

b'oogle</title>'

But ideally, we'd do this against a file we include on our website, as Google could change the encoding of their page at any time. (We could also be more robust to that by reading the "Content-Type" header, but as it stands now, download_file is completely encoding agnostic, and it should probably be kept that way).
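
One way the "Content-Type" idea could look on the caller's side, leaving the download step itself encoding agnostic. This is a sketch for Python 3: `decode_body` is a hypothetical helper, and the headers are simulated with `email.message.Message` rather than taken from a real response.

```python
from email.message import Message


def decode_body(body, headers, fallback="utf-8"):
    """Decode `body` using the charset named in Content-Type, if any."""
    charset = headers.get_content_charset() or fallback
    return body.decode(charset)


# Simulated response headers; in Python 3, a urllib response exposes a
# compatible object as `response.headers`.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"

print(decode_body(b"Auf gut Gl\xfcck!", headers))
```

When the server sends no charset, the helper falls back to the `fallback` argument, which is itself a guess and may still be wrong for a given page.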

Also, we should add a test that downloads a binary file (e.g. a PNG file or something) just to make sure it works. At present it does, but there's enough potential for accidentally introducing a codec in download_file that we should watch out for problems.
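
A rough sketch of what such a binary round-trip check might look like. It uses plain `urllib` and a local `file://` URL so that it runs offline; `fetch_bytes` is a hypothetical stand-in and not astropy's `download_file`, and a real test would point at a small binary file on a server the project controls.

```python
import pathlib
import tempfile
import urllib.request

# First 8 bytes of every PNG file; 0x89 is not valid as a UTF-8 start byte.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fetch_bytes(url):
    """Hypothetical stand-in for a downloader: return the payload unmodified."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def check_binary_roundtrip():
    """Write a binary payload, fetch it by URL, and compare byte for byte."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "tiny.bin"
        payload = PNG_SIGNATURE + bytes(range(256))
        source.write_bytes(payload)
        return fetch_bytes(source.as_uri()) == payload


print(check_binary_roundtrip())
```

Because the payload contains every possible byte value, any accidental decode or re-encode step in the download path would change the bytes and make the comparison fail.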

eteq · 2012-12-15T06:51:22Z

I see your points @astrofrog and @mdboom - the main reason I opted for google is that my general theory is that google is probably one of the most "reachable" sites with the best up-time of anywhere. I was trying to avoid the possibility of the tests failing because google was down rather than because of a problem with the code. That said, I see your points about the locality issues and I can't think of any other consistent way to deal with it.

(@mdboom - I tried the b'oogle</title>' trick earlier but it didn't seem to work... and anyway, isn't that the same as 'oogle</title>' in py 2.x ? I admit string encoding/decoding is something that has often confused me, though...)

So we could add a (small) page along the lines of http://www.astropy.org/test.html? And perhaps a very small binary file (like a zipped short text file or something) as http://www.astropy.org/test.tgz? Once those are up I could update the test appropriately.

mdboom · 2012-12-17T14:47:12Z

Yes, b'oogle</title>' is the same as 'oogle</title>' in Python 2.x, but not when run through 2to3: the former stays a byte string, whereas the latter gets converted to a unicode string.
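
The practical consequence on Python 3 can be sketched as follows. The `page` bytes and the helper name are assumptions for the example.

```python
page = b"<title>Google</title>"

# Bytes pattern against bytes data: works on both Python 2 and Python 3.
bytes_match = b"oogle</title>" in page


def text_pattern_fails(data):
    """Return True if searching bytes with a text pattern raises TypeError."""
    try:
        "oogle</title>" in data
    except TypeError:
        return True
    return False


print(bytes_match, text_pattern_fails(page))
```

On Python 3 the text pattern cannot be mixed with bytes data at all, so the `b` prefix is what keeps the search valid after conversion.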

Yes -- I'm all for adding those files at the top level of the website.

eteq · 2013-02-06T06:19:19Z

I think I got to the bottom of this and have some solutions... but I think it's best approached separately (switching to a different test URL and the underlying problem here with encoding), so I'm going to close this with the intent that it get replaced by #734 and #735

eteq added 2 commits January 29, 2013 18:29

make travis run tests with remote data

fc53787

try to wake up travis

88993cc

astrofrog mentioned this pull request Feb 5, 2013

VO Client and Server for Cone Search #552

Merged

This was referenced Feb 6, 2013

add remote-data to Travis tests #734

Closed

download_file does not do anything about encodings #735

Open

eteq closed this Feb 6, 2013

eteq mentioned this pull request May 1, 2014

change remote data tests to use astropy site instead of google #2412

Merged
