ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python #895
wesm wants to merge 5 commits into apache:master from
Conversation
xhochy
left a comment
I made some initial comments, but it seems to be still quite rough, so I'll hold off on more comments until it's settled.
python/testing/hdfs/Dockerfile
Outdated
Since you normally run as root inside Docker, sudo should not be required here.

Thanks. I had to change the user back to root from the base image.

Adding --rm to docker run should make this line redundant.

I had some issues when I removed this line and then added --rm to the docker run command. Perhaps we can tweak this in a subsequent PR when adding more features to these ad hoc Docker tests.
wesm
left a comment
Thanks. I'm adding the Python HDFS tests and will remove the WIP.
This is ready to go. This is good for one-off use, but we should see about refactoring our CI scripts to be able to share code more easily with this kind of thing. It's a little bit tricky because of the various interdependent environment variables. As part of ARROW-1213 I will add S3 testing to this setup so you can check things out locally against an access/secret key for a cloud bucket.
@xhochy let me know if you have any more comments on this. I'm going to look at the Parquet RC in the meantime.

+1; I'm going to keep adding some more ad hoc tests.
xhochy
left a comment
Added some comments about best practices I've gathered from developing a lot with Docker containers.
    # under the License.

    use_gcc() {
      export CC=gcc-4.9
It's better to set these variables in the Dockerfile via ENV CC gcc-4.9.

I wasn't sure about this, as the idea was to be able to easily switch between compilers depending on the test script.
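To illustrate the trade-off being discussed, here is a minimal sketch of the switch-at-runtime approach: shell helpers that a test script can call to pick a compiler, as an alternative to baking CC into the image with ENV. The function names and the clang version are illustrative assumptions, not taken from the PR.

```shell
# Hypothetical helpers for switching compilers per test script.
# Versions here (gcc-4.9, clang-4.0) are assumptions for illustration.
use_gcc() {
  export CC=gcc-4.9
  export CXX=g++-4.9
}

use_clang() {
  export CC=clang-4.0
  export CXX=clang++-4.0
}

# A test script picks the toolchain it needs:
use_gcc
echo "building with CC=$CC CXX=$CXX"
```

The ENV approach fixes one compiler per image; the function approach lets one image run the same build under several toolchains.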
    set -e

    export PATH="$MINICONDA/bin:$PATH"
    conda update -y -q conda
In my Dockerfiles, I normally include the conda installation in the image. That way I get faster iteration times on repeated test runs.

It is also helpful to separate the base installation and the project-specific dependencies into different layers of the Docker image so they are shared between similar images.

Good point, I will do that in the next patch.
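The layering suggestion above could look roughly like this hedged Dockerfile sketch; the base image, paths, and package list are assumptions for illustration, not the PR's actual contents. Each RUN/COPY step is a cached layer, so the base toolchain layer can be shared between similar test images and only the later layers are rebuilt when dependencies or scripts change.

```dockerfile
# Hypothetical sketch of splitting base tooling and project dependencies
# into separate cached layers. Image name and packages are assumptions.
FROM ubuntu:14.04

# Layer 1: base installation (rarely changes, cached and shared)
RUN apt-get update && apt-get install -y wget bzip2 \
    && wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
       -O /tmp/miniconda.sh \
    && bash /tmp/miniconda.sh -b -p /opt/conda \
    && rm /tmp/miniconda.sh
ENV PATH /opt/conda/bin:$PATH

# Layer 2: project-specific dependencies (changes occasionally)
RUN conda install -y -q numpy pandas cython

# Layer 3: test scripts (changes most often; only this layer is rebuilt)
COPY . /arrow-testing
```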
    set -ex

    docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
    bash hdfs/restart_docker_container.sh
For multiple Docker containers, have a look at docker-compose. It lets you start multiple containers and plug them together.
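As a rough illustration of that suggestion, a compose file for an HDFS test setup might look like the sketch below. The service names, image, and port numbers are assumptions (standard Hadoop 2.x namenode/datanode ports), not part of this PR.

```yaml
# Hypothetical docker-compose sketch wiring HDFS containers together.
# Service names, image, and ports are illustrative assumptions.
version: '2'
services:
  namenode:
    image: arrow-hdfs-test
    hostname: namenode
    ports:
      - "50070:50070"   # namenode web UI
      - "8020:8020"     # HDFS RPC
  datanode:
    image: arrow-hdfs-test
    links:
      - namenode
```

With a file like this, `docker-compose up` replaces the hand-rolled build/restart scripts for starting the cluster.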
We aren't testing this in Travis CI because spinning up an HDFS cluster is a bit heavyweight, but this will at least enable us to do easier ongoing validation that this functionality is working properly.