ARROW-47: [C++] Preliminary arrow::Scalar object model#3604
ARROW-47: [C++] Preliminary arrow::Scalar object model#3604wesm wants to merge 10 commits intoapache:masterfrom
Conversation
cpp/src/arrow/scalar.h
Outdated
There was a problem hiding this comment.
If anything this should probably be arrow::Decimal128
There was a problem hiding this comment.
There's uint128 in abseil.io and in the future, int128.
There was a problem hiding this comment.
I don't think this is worth stressing over
cpp/src/arrow/scalar.h
Outdated
There was a problem hiding this comment.
Swap the order of is_valid and type, you'll get better struct packing & alignment.
There was a problem hiding this comment.
Will do, though if this ever mattered these classes are being used wrong ;)
There was a problem hiding this comment.
Should we write small document in the Kernel header about this subject?
cpp/src/arrow/scalar.h
Outdated
There was a problem hiding this comment.
Doesn't this need a virtual destructor if we're going to be using shared_ptr<Scalar>s?
There was a problem hiding this comment.
Good question. I just added some checks to see if shared_ptr<Buffer> in a subclass is properly destructed and it is. I'm happy to add a virtual dtor for good measure, though
There was a problem hiding this comment.
Ah, I forgot: shared_ptr's block manager is instantiated with the constructed type, so base_ref is pointing to a manager<BinaryScalar> which calls ~BinaryScalar. The memory won't leak with shared_ptr<Scalar>, but it will leak for unique_ptr<Scalar>. Probably best to add the virtual dtor
4fab706 to
fd90264
Compare
|
I added a note that the API is experimental in 0.13 so we can change things. @xhochy could you take a glance at this and let me know if you're OK with it going in? I think this will be the subject of some iteration as we write more kernels that use these objects. |
cpp/src/arrow/compute/kernel.h
Outdated
There was a problem hiding this comment.
given scalars are supposed to be "small" it might be worth comment on why you use a shared_ptr type here (i.e. instead of the previous variant)
There was a problem hiding this comment.
- Scalars of nested types may be semantically small but physically occupy a fairly large amount of memory
- The object model uses subclasses. So you need to use a pointer type of some kind here
cpp/src/arrow/type_traits.h
Outdated
There was a problem hiding this comment.
It seems like maybe one shouldn't exist? Otherwise just an int and a dictionary pointer?
There was a problem hiding this comment.
We're also lacking BuilderType here for fundamentally the same reason. pair<int64_t, std::shared_ptr<Array>> is attractive but is it acceptable for when the index type is uint64_t?
An option is to use/wrap Scalar or std::shared_ptr<Scalar> here. Then BuilderType could be TypeErasedDictionaryBuilder or so:
std:: unique_ptr<ArrayBuilder> builder;
RETURN_NOT_OK(MakeBuilder(pool, dictionary(int32(), string_array), &builder));
for (std::shared_ptr<Scalar> s : user_inputs) {
RETURN_NOT_OK(builder->Append(s));
// Append null for strings not in string_array
// and non-StringScalars
}
There was a problem hiding this comment.
I don't think a decision needs to be made right now. We need to write some kernels that involve dictionaries first
|
@wesm Looks good, the things that most annoys me is the GTEST_VENDORED option ;) |
|
@xhochy I can remove that, I was having some trouble installing gtest 1.8.0 in my toolchain |
Change-Id: Ie7b17cef495b5a6b047e9205cd0408c939774e88
Change-Id: I87f690898c7af7b028f2e50247e96703ede4f457
Change-Id: Ieaa467c3058a44c23f1e3f36176da1592814ffb5
…stamp types Change-Id: I00ca69a9c1ec1ddaa5b1fd9db35e9db09a12eb8b
Change-Id: I98ae62598dbc35c609dfb92f212465bc0d67f58a
|
+1 |
This is the start of a Scalar object model suitable for static and dynamic dispatch to correspond with the existing array and array builder types.
I modified the first aggregation kernel (sum) to use these types for outputs.