Skip to content

Conversation

@DanielMahu
Copy link
Contributor

@DanielMahu DanielMahu commented Jan 5, 2018

Implement the Table::ReadRows() call, fixes #32

The call returns a RowReader wrapper around the gRPC stream, that can also restart the call with updated arguments.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 5, 2018
Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this early. I think your iterator class does not conform to the requirements for a C++ iterator, and would need some changes to meet them.

Please take a look and let me know if my comments do not make sense.

}

RowReader Table::ReadRows(RowSet row_set, int rows_limit, Filter filter) {
return RowReader(client_, table_name(), row_set, rows_limit, filter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to std::move() the "expensive" parameteres (row_set, filter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

RowReader Table::ReadRows(RowSet row_set, int rows_limit, Filter filter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You return a RowReader but the class is defined in bigtable/client/internal and presumably in the internal namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,158 @@
// Copyright 2017 Google Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RowReader is part of the interface exposed to the user. They may need to define variables of this type:

bigtable::RowReader reader = table.ReadRows(...);

it cannot be hidden in the internal directory or namespace. We want to be able to say "do not use anything in internal/ or bigtable::internal directly".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the timely review! I still owe the tests, but replying to the comments:

done - moved

namespace bigtable {
inline namespace BIGTABLE_CLIENT_NS {
/**
* Object returned by Table::ReadRows(), enumerates rows in the response.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Iterate over the results of ReadRows() using the STL idioms"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


public:
/// Signifies that there is no limit on the number of rows to read.
static int const NO_ROWS_LIMIT = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constexpr maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

RowReaderIterator(RowReader* owner, bool is_end)
: owner_(owner), is_end_(is_end) {}

RowReaderIterator& operator++();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are required to implement the postfix ++ operator too, though it might be a bad idea to use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I followed your advice and refactored to hold the row in the iterator.

I kept to the absl::optional to be able to use emplace.

Updated the * and -> operators as well by basically snatching the ones from absl::optional.

}

private:
RowReader* const owner_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the constant pointer to RowReader disable copying for the iterator class? And I thought iterators were required to be CopyConstructible?

Please consider adding a test where you verify the compile time assertions about iterators:

  // RowReader::iterator must satisfy the requirements of an InputIterator.
  static void check_iterator_requirements() {
    using base_iterator = std::iterator<std::input_iterator_tag, CellReader>;
    static_assert(std::is_base_of<base_iterator, iterator>::value,
                        "iterator must be an InputIterator"); 
    static_assert(std::is_copy_constructible<iterator>::value,
                         "iterator must be CopyConstructible");
    static_assert(std::is_move_constructible<iterator>::value,
                         "iterator must be MoveConstructible");
    static_assert(std::is_copy_assignable<iterator>::value,
                         "iterator must be CopyAssignable");
    static_assert(std::is_move_assignable<iterator>::value,
                         "iterator must be MoveAssignable");
    static_assert(std::is_destructible<iterator>::value,
                         "iterator must be Destructible");
    static_assert(std::is_swappable<iterator>::value,
                         "iterator must be Swappable");
    // ugly hacks to check that if o is of type iterator then ++o and *o 
    // are valid expressions with the usual return types ...
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are static asserts, I've put them at the top of row_reader.cc in an anonymous namespace. They did fail until I removed the const field.

}

private:
RowReader* const owner_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you hold the already parse and read Row object in the iterator you can implement postfix, make it copy constructible, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks

};

std::shared_ptr<DataClient> client_;
absl::string_view table_name_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ties the lifecycle of the RowReader to the lifecycle of the Table used to create it. Consider making a copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

/// Number of rows read so far, used to set row_limit in retries.
int rows_count_;

/// Holds the last read row, non-end() iterators all point to it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can have the end() iterators have owner_ == nullptr, and move the Row to the actual iterators. That would make them behave like other iterators do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes so quickly. As you know, the tests need a lot of work, but the classes are pretty close.

namespace {
using base_iterator = std::iterator<std::input_iterator_tag, Row>;

static_assert(std::is_base_of<base_iterator, RowReader::iterator>::value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing these checks. Can you review the requirements for an InputIterator in this doc, or any documentation you like really, and make sure we are not missing anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are exercising the equality and non-equality in tests through assertions. I've also added an explicit use of the postincrement in ReadOneRowIteratorPostincrement.

RowReader::RowReader(std::shared_ptr<DataClient> client,
absl::string_view table_name, RowSet row_set,
int rows_limit, Filter filter,
std::unique_ptr<RPCRetryPolicy> retry_policy,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not copy or use retry_policy or backoff_policy. That is because this is WIP and you want to make the interface complete?
If we are going to merge this then we either need a (a) TODO saying why these are unused, or (b) remove them and put the TODO() from wherever this is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see below that you are using backoff_policy_, you probably want to transfer those policies then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed

do {
if (NextChunk()) {
parser_->HandleChunk(
std::move(*(response_.mutable_chunks(processed_chunks_))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to figure out what is going on here. Seems like you move the data into response_ and then immediately move it out. Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only move out one chunk at a time. I chose to simplify the parser (compared to what it used to do) and have it take single chunks. Therefore RowReader needs to handle multi-chunk responses, which is what happens here and in NextChunk(). If you agree with the approach, I'll add some comments explaining it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, processed_chunks_ is the count of processed chunks not the vector or list of processed chunks. Do you mind naming that member variable processed_chunks_count_?

With that I think no additional comments would be needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (NextChunk()) {
parser_->HandleChunk(
std::move(*(response_.mutable_chunks(processed_chunks_))));
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer to minimize if/else depth:

if (blah) {
  do thing;
  continue;
}
do other things...

over:

if (blah) {
  do thing;
} else {
  do other things;
}
// end of loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I've rearranged the whole thing a bit

std::move(*(response_.mutable_chunks(processed_chunks_))));
} else {
grpc::Status status = stream_->Finish();
if (status.ok()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same preference for shallower if/else here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Reads data from a set of rows.
*
* @param row_set the rows to read from.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blanks between the @param sections?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

class Table {
public:
/// Signifies that `Table::ReadRows()` should not limit the number of rows.
static int const NO_ROWS_LIMIT = RowReader::NO_ROWS_LIMIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use two overloads instead of a magic value... Not sure what is best, I just dislike magic values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero gets special treatment in the protocol, but I agree we can protect the users from accidentally passing it. I've added an overload and enforced that rows_limit, if given, is strictly greater than 0.

static_assert(std::is_destructible<RowReader::iterator>::value,
"RowReader::iterator must be Destructible");

// TODO(C++17): std::is_swappable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO was meant as a "when C++17 is available we can check that iterator works with std::swap() easily". You can probably remove it, assuming the iterator does work with std::swap?


// TODO(C++17): std::is_swappable

// TODO: check that if o is of type iterator then ++o and *o
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know the standard tricks to check such things? For example, we could do:

static_assert(
  std::is_convertible<decltype(*std::declval<RowReader::iterator>()), RowReader::value_type>::value,
  "*it when it is of RowReader::iterator type must be convertible to RowReader::value_type>);

You can do something similar for the prefix ++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! done, I think


private:
RowReader* owner_;
absl::optional<Row> row_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking for std::istream_iterator for inspiration on this. Your implementation seems like a modern version of the libstdc++ one:

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-3.4/classstd_1_1istream__iterator.html

I think that is pretty cool.

@DanielMahu
Copy link
Contributor Author

PTAL

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the namespace things before merging. Otherwise LGTM.

#ifndef GOOGLE_CLOUD_CPP_BIGTABLE_CLIENT_INTERNAL_ROWREADERITERATOR_H_
#define GOOGLE_CLOUD_CPP_BIGTABLE_CLIENT_INTERNAL_ROWREADERITERATOR_H_

#include "bigtable/client/version.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#include <thread>

namespace bigtable {
inline namespace BIGTABLE_CLIENT_NS {} // namespace BIGTABLE_CLIENT_NS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are opening and closing the namespace in the same line, are you sure that is what you wanted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, that was not supposed to close there. fixed

inline namespace BIGTABLE_CLIENT_NS {} // namespace BIGTABLE_CLIENT_NS

// RowReader::iterator must satisfy the requirements of an InputIterator.
namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This defines an anonymous namespace inside the bigtable namespace, are you sure that is what you wanted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

#ifndef GOOGLE_CLOUD_CPP_BIGTABLE_CLIENT_ROW_READER_H_
#define GOOGLE_CLOUD_CPP_BIGTABLE_CLIENT_ROW_READER_H_

#include "bigtable/client/version.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last minute question, we can resolve in a future PR if needed.


if (not last_read_row_key_.empty()) {
// There is a previous read row, so this is a restarted call
row_set_ = row_set_.Intersect(RowRange::Open(last_read_row_key_, ""));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this returns an empty row set? That means "everything" but should mean "we are done".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an excellent question, and I've just committed a test that fails in such a scenario. Will follow up with the fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of keeping other users from doing the same mistake, I've fixed this by handling the special case in RowSet. PTAL

Copy link
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes to the comments, otherwise LGTM.

namespace btproto = ::google::bigtable::v2;

RowSet RowSet::Intersect(bigtable::RowRange const& range) const {
RowSet result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move result until after the empty check.

if (row_set_.row_keys().empty() and row_set_.row_ranges().empty()) {
return RowSet(range);
}
// Normal case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider // Normal case: find the intersection with row keys and row ranges in the RowSet.

*result.row_set_.add_row_ranges() = std::get<1>(i).as_proto_move();
}
}
// Another special case: intersection is empty, return empty range
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider: // Another special case: a RowSet() with no entries means "all rows", but we want "no rows"

@DanielMahu DanielMahu merged commit 69e7cc7 into googleapis:master Jan 15, 2018
@DanielMahu DanielMahu deleted the row-reader branch January 15, 2018 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Table::ReadRows() wrapper

3 participants