Skip to content

Conversation

@pcmoritz
Copy link
Contributor

@pcmoritz pcmoritz commented Jun 8, 2017

This supersedes #467

This is ready for review. Next steps are

  • Integration with the arrow CI
  • Write docs on how to use the object store

There is one remaining compilation error (it doesn't find Python.h for one of the Travis configurations, if anybody has an idea on what is going on, let me know).

@pcmoritz pcmoritz force-pushed the plasma-store-2 branch 3 times, most recently from 47a10ff to eec3e4c Compare June 8, 2017 06:54
@pcmoritz pcmoritz changed the title Integrate in-memory object store into arrow ARROW-1104: Integrate in-memory object store into arrow Jun 8, 2017
@wesm
Copy link
Member

wesm commented Jun 11, 2017

Can you rebase and get the CI build passing? I will start reviewing, mostly on IP documentation concerns over the code itself, so we can get this merged ASAP this coming so Ray can transition, and we can have this released in Arrow 0.5.0

@pcmoritz
Copy link
Contributor Author

Thanks, sounds good, I'll do these! I hope we can get this merged and do the switch soon to avoid having to backport changes from Ray.

Let me know about any IP issues you see.

@pcmoritz
Copy link
Contributor Author

Also let me know if you have ideas about improving the code!

@pcmoritz pcmoritz force-pushed the plasma-store-2 branch 2 times, most recently from 4f82c52 to eec3e4c Compare June 11, 2017 20:59
@wesm
Copy link
Member

wesm commented Jun 20, 2017

Looks like maybe just some linting now

@pcmoritz
Copy link
Contributor Author

Yeah, there are only a few small things left, will try to finish it today! Concerning Cython I agree we should do it, but suggest to first get this merged and rebase Ray on top of it to make the latter as easy as possible.

@wesm
Copy link
Member

wesm commented Jun 20, 2017

See google's guidance on naming: https://google.github.io/styleguide/cppguide.html#Function_Names

It seems that there used to be an allowance to name function with snake case when they are "cheap". This was recently changed; I think it would be a good idea to make all functions PascalCase instead of questioning "how cheap is this function to call" google/styleguide@db0a263#diff-26120df7bca3279afbf749017c778545R4277

I suspect that this issue came up too much in TensorFlow and they decided to drop it

@wesm
Copy link
Member

wesm commented Jun 21, 2017

That's weird that the tests are hanging. Maybe killall -9 would help (perhaps not without sudo)?

@pcmoritz
Copy link
Contributor Author

I can now reproduce it locally; if I just run "./client_tests", it doesn't hang, but

/home/pcmoritz/tmp/arrow/cpp/build-support/run-test.sh "/home/pcmoritz/tmp/arrow/cpp-build" "test" "/home/pcmoritz/tmp/arrow/cpp-build/debug//client_tests"

hangs and if I do control-c, I get

^CTraceback (most recent call last):
  File "/home/pcmoritz/tmp/arrow/cpp/build-support/asan_symbolize.py", line 368, in <module>
    loop.process_stdin()
  File "/home/pcmoritz/tmp/arrow/cpp/build-support/asan_symbolize.py", line 340, in process_stdin
    line = sys.stdin.readline()
KeyboardInterrupt

Let me know if you have an idea, I'm not familiar with the address sanitizer scripts but will keep digging.

@wesm
Copy link
Member

wesm commented Jun 21, 2017

What happens when you run only /home/pcmoritz/tmp/arrow/cpp-build/debug//client_tests?

@pcmoritz
Copy link
Contributor Author

pcmoritz@saturn:~/tmp/arrow/cpp-build$ /home/pcmoritz/tmp/arrow/cpp-build/debug//client_tests
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from TestPlasmaStore
[ RUN      ] TestPlasmaStore.ContainsTest
called plasma_create on conn 3 with size 100 and metadata size 1
[       OK ] TestPlasmaStore.ContainsTest (3 ms)
[ RUN      ] TestPlasmaStore.GetTest
called plasma_create on conn 4 with size 4 and metadata size 1
Allowing the Plasma store to use up to 1GB of memory.
starting server listening on /tmp/store
[       OK ] TestPlasmaStore.GetTest (2 ms)
[ RUN      ] TestPlasmaStore.MultipleGetTest
Allowing the Plasma store to use up to 1GB of memory.
starting server listening on /tmp/store
New connection with fd 6
called plasma_create on conn 5 with size 4 and metadata size 1
creating object 3e448ce234a4c780739ef29417467c2119b3e062
0x7f9968b6c008 = fake_mmap(131080)
called plasma_create on conn sealing object 3e448ce234a4c780739ef29417467c2119b3e062
5 with size 4 and metadata size 1
creating object 0f0dcf666c3bbb444d10d62c8890a78eb6c163a1
sealing object 0f0dcf666c3bbb444d10d62c8890a78eb6c163a1
[       OK ] TestPlasmaStore.MultipleGetTest (6 ms)
[----------] 3 tests from TestPlasmaStore (11 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (11 ms total)
[  PASSED  ] 3 tests.
Disconnecting client on fd 6
Disconnecting client on fd 6
pcmoritz@saturn:~/tmp/arrow/cpp-build$ Allowing the Plasma store to use up to 1GB of memory.
starting server listening on /tmp/store

pcmoritz@saturn:~/tmp/arrow/cpp-build$

@pcmoritz
Copy link
Contributor Author

Got it now, the ASAN script seems to get confused by the output of the store. Piping that to /dev/null seems to fix it.

@xhochy
Copy link
Member

xhochy commented Jun 21, 2017

@pcmoritz I still have to go over this code: But could there be problems with dlmalloc and the PR #761 that makes jemalloc the default allocator in Arrow?

@wesm
Copy link
Member

wesm commented Jun 21, 2017

Plasma runs as a separate process, is that right?

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jun 21, 2017

Yeah it is a separate process. Also, even in plasma we are not replacing the malloc function with the dlmalloc one (the #define USE_DL_PREFIX in malloc.cc makes sure that the dlmalloc malloc has "dl" as a prefix); and then we call dlmalloc with the prefix manually when we need it to get a chunk of share memory. So for the heap of plasma it will use whichever allocator is the default, right now that's the system malloc but it could be jemalloc in the future if we prefer that.

@pcmoritz
Copy link
Contributor Author

pcmoritz commented Jun 21, 2017

I'm done for now, the next step is to get this merged, rebase Ray and address some of points I haven't addressed yet, and write some more documentation. Shall I rebase this PR now or wait for other changes in Arrow like the jemalloc PR? I hope the remaining Java related test failures go away once it is rebased.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Since your test fails are only the Java things that Uwe fixed, I'm going to go ahead and merge so we can proceed to follow up work. Thank you for the hard labor on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants