Skip to content

Conversation

@lyubomyr-shaydariv
Copy link
Contributor

@lyubomyr-shaydariv lyubomyr-shaydariv commented Nov 4, 2017

Description

RethinkDB does not support bitwise operations that may be very important and useful in some cases. Despite bitwise operations cannot be implemented easily using the existing RethinkDB expression terms/operators, RethinkDB suggests a workaround of using r.js() where bitwise operations can be expressed with. For example:

const READ = 0x04;
const WRITE = 0x02;
const EXECUTE = 0x01;

const USER_1_PERMISSIONS = READ | WRITE | EXECUTE;
const USER_2_PERMISSIONS = READ;

r.expr({
  user1Write: r.expr([USER_1_PERMISSIONS])
    .map(r.js(`(function (permissions) { return permissions & ${WRITE}; })`))
    .nth(0)
    .ne(0)
    .branch('GRANTED', 'DENIED'),
  user2Write: r.expr([USER_2_PERMISSIONS])
    .map(r.js(`(function (permissions) { return permissions & ${WRITE}; })`))
    .nth(0)
    .ne(0)
    .branch('GRANTED', 'DENIED'),
})

The r.js() approach is not recommended by the official documentation due to performance costs and losing RethinkDB expressiveness. As an alternate approach, the basic bitwise operations can be implemented as built-in first class terms/operators.

The suggested bitwise operations are:

Since the r.and(), r.or(), and r.not() terms are already occupied, the operations above are suggested to have the bit prefix in order to avoid name clashing:

  • bitAnd
  • bitOr
  • bitXor
  • bitNot
  • bitSal/bitShl
  • bitSar
  • bitShr

The shift operation names (bitSal/bitShl, bitSar, and bitShr) borrow the naming mnemonics from the x86 Assembly languge (SAL/SHL, SAR, and SHR).

All the operations, except r.bitNot(), are variadic and can accept any number of arguments and implement both prefix (r.bitOP(a, b, ...)) and infix (a.bitOP(b, ...) or a.bitOP(b).bitOP(...)) forms.

Updated example from the above now becomes:

const READ = 0x04;
const WRITE = 0x02;
const EXECUTE = 0x01;

const USER_1_PERMISSIONS = READ | WRITE | EXECUTE;
const USER_2_PERMISSIONS = READ;

r.expr({
  user1Write: r.expr(USER_1_PERMISSIONS)
    .bitAnd(WRITE)
    .ne(0)
    .branch('GRANTED', 'DENIED'),
  user2Write: r.expr(USER_2_PERMISSIONS)
    .bitAnd(WRITE)
    .ne(0)
    .branch('GRANTED', 'DENIED'),
})

As the result, the query now looks simpler and is cheaper to execute.

RethinkDB protocol update

The protocol now maps the above operations to their respective Protocol Buffers term types:

  • BIT_AND = 191
  • BIT_OR = 192
  • BIT_XOR = 193
  • BIT_NOT = 194
  • BIT_SAL (virtually BIT_SHL) = 195
  • BIT_SAR = 196
  • BIT_SHR = 197

Drivers

Java driver

The Java driver is not included to the main build script for some reason, therefore its source code should be generated and built then separately. No implementation details.

JavaScript driver

The easiest driver to test since it can be tested directly in the Data Explorer. No implementation details.

Python driver

This driver, like the Ruby driver, uses snake case for the bitwise operations:

  • bit_and
  • bit_or
  • bit_xor
  • bit_not
  • bit_sal/bit_shl
  • bit_sar
  • bit_shr

Despite Python supports operator overloading, the bitwise operations are not bound to Python operators since some of them are already occupied by the logical operation terms counterparts (r.and() - &, r.or() - |, and r.not() - ~).

Ruby driver

The easiest driver to implement. The Ruby driver uses snake case similarly to the Python driver. However, the Ruby driver does not implement r.bit_shl(), an equivalent to r.bit_sal(). due to the specifics of the driver implementation (similarly, the Ruby driver implements r.to_json_string() but does not implement the r.to_json() overload as the other drivers do). The shift operators remain unchanged since the rest of operators might already have their operators overloaded for logical operation purposes.


External issues:

@bchavez
Copy link
Contributor

bchavez commented Nov 4, 2017

Pretty cool. It might help if the PR had some ReQL bitwise query tests to go with it. See: https://github.com/rethinkdb/rethinkdb/tree/next/test/rql_test/src/math_logic

Related issues: #1401, #1018

@lyubomyr-shaydariv
Copy link
Contributor Author

@bchavez Thank you for the feedback. Yes, I saw the tests in that directory, but I cannot run the tests for some reason. I have no idea what I was doing wrong, and this is what I've been stuck (logs from a Docker container, but also occurs on my host machine):

Got error while shutting down server at exit: Server cluster_uztbzj stopped unexpectedly with return code: -5
Traceback (most recent call last):
  File "/home/rethinkdb/src/test/rql_test/../common/driver.py", line 73, in endRunningServers
    server.check_and_stop()
  File "/home/rethinkdb/src/test/rql_test/../common/driver.py", line 881, in check_and_stop
    raise RuntimeError('%s %s stopped unexpectedly with return code: %r' % (self.server_type.capitalize(), self._name, self.returncode))
RuntimeError: Server cluster_uztbzj stopped unexpectedly with return code: -5

@srh
Copy link
Contributor

srh commented Nov 6, 2017

@lyubomyr-shaydariv You're under no obligation to write tests, I'd be happy to add the tests for you in order to get this merged.

I'll try and look into the test failure.

@lyubomyr-shaydariv
Copy link
Contributor Author

@srh It would be fantastic! Thank you!

(I do believe I just run the tests in a wrong way, and, if necessary, I'll provide the Docker files I've tried to build (succeeds) and test (fails) with.)

@srh
Copy link
Contributor

srh commented Nov 13, 2017

OK, here's some feedback about bit shifting. I think I'm going to make some changes before merging this so I want to get your opinion.

  1. To be consistent with other ReQL commands like mod, we should require that the values be integers instead of truncating them.
  2. The size of long isn't consistent across platforms. We should use int32_t and uint32_t, or int64_t and uint64_t.
  3. If the right hand argument to shift operators in C++ is out of range, we get undefined behavior, so we need to avoid that.
  4. We don't need an arithmetic / logical right shift distinction because users have the full range of integers from -2^53 to 2^53. If they want logical right shift behavior, they can just use non-negative values.

I think maybe we should make the behavior like this:

  1. Only offer the shift commands SAL and SAR. They require their parameters be integers in the -2^53<=..<2^53 range, the second parameter being non-negative, with the return value being defined to be sal(x, y) => (x * (2**y)) (handling an infinite result with error), and sar(x, y) => floor(x / (2**y)). It's the user's job to attach mod or bitAnd to sal operations if they need to simulate n-bit integers. (It's kind of uncomfortable for us to decide for the user what the sal behavior on too-negative values should be, otherwise.)

I'll note, if we ever add the datatype for unbounded integers ("bignums"), this produces the identical value to what bignum shift behavior looks like. This means we don't have multiple distinct notions of bitshifting floating around. Until we add fixed width integers.

  1. Bitwise and, or, xor, and not, should require integer parameters bounded in the range -2^53<=..<2^53, casts them to int64_t, and performs the integer bitwise operation, then converts back to double.

Having the range be 2^53<=..<2^53 is kind of weird-looking -- it lets not(2^53-1) evaluate to -2^53 and back. Treating it like an int64_t, the 53 least significant bits of the integer can be anything, and the remaining bits are either all 0 or all 1, for positive or negative values.

  1. Maybe I'll implement support for and/or/xor/not on binary objects.

  2. The Java driver should be variadic for the and/or/xor operations.

@lyubomyr-shaydariv
Copy link
Contributor Author

lyubomyr-shaydariv commented Nov 14, 2017

To be consistent with other ReQL commands like mod, we should require that the values be integers instead of truncating them.

I didn’t know that mod requires both arguments to be integer, and I mostly tried to mimic JavaScript behavior and JavaScript had real influence on how it now works in my branch for all bitwise operations. For example, it can easily operate on doubles truncating them where necessary. Why JavaScript? No clue, probably because of Data Explorer. :) Since I don’t write in C++, it may take longer to get it fixed from my side. Therefore all 8 operations should operate on explicit integers only, and if any of arguments are non-integers, an error should be thrown, right?
(EDIT: I'm not sure if I've done it right. Please see https://github.com/lyubomyr-shaydariv/rethinkdb/commits/bitwise-review for the commit with a log message "Make bitwise operations accept integer operands only" (no commit hash since I'm going to amend it if it's done wrong))

The size of long isn't consistent across platforms. We should use int32_t and uint32_t, or int64_t and uint64_t.

Oh, I knew that C/C++ int/long widths are not consistent across platforms, but I totally missed it and what you use for constant-width integers. I’ll fix it, thank you!
(EDIT: Fixed)

If the right hand argument to shift operators in C++ is out of range, we get undefined behavior, so we need to avoid that.

I guess, it should throw an error then?
(EDIT: See below)

We don't need an arithmetic / logical right shift distinction because users have the full range of integers from -2^53 to 2^53. If they want logical right shift behavior, they can just use non-negative values.

I was thinking of logical operations as operations that are consistent to their arithmetic counterparts with slightly different semantics doing exactly the same what JavaScript/node.js do. I don’t have any solid opinion here (+ neither for big numbers nor for binaries), but I think that having both arithmetic and logic bitwise operations may have sense for systems where numbers already have a specific integer range (at least from what I understood in your comment). Sorry it’s really hard to tell much about it, because I was just trying to add a simple bitwise operations scenario to get rid of r.js in my code, and now I see it requires much more design attention than I initially thought. :) You decide.

The Java driver should be variadic for the and/or/xor operations.

You’re right, I missed the , "*"] definition. Shift operations should be variadic too. I'll fix it.
(EDIT: Fixed)

@srh
Copy link
Contributor

srh commented Nov 15, 2017

It should throw an error (because C++ exceptions are used to implement errors in the ReQL interpreter). I'm implementing it right now.

@lyubomyr-shaydariv
Copy link
Contributor Author

@srh I've made some edits at the bitwise-review branch, but I'm really confused on how bitshift operations must really work. I don't believe I made it right (especially (DBL_MANT_DIG + 1) in order to make the right-shift get 0 from any value despite it seems to contradict with what the standard says).

At that branch

const P = 9007199254740992;
const N = -P;
r.range(55)
  .map(e => ({
    '#': e,
    '?<<': r.branch(e.le(53), r(1).bitSal(e), 'undefined'),
    'N>>': r(N).bitShr(e),
    'N>>>': r(N).bitSar(e),
    'P>>': r(P).bitShr(e),
    'P>>>': r(P).bitSar(e),
  }))

produces

# ?<< N>> N>>> P>> P>>>
0 1 18437736874454810000 -9007199254740992 9007199254740992 9007199254740992
1 2 9218868437227405000 -4503599627370496 4503599627370496 4503599627370496
2 4 4609434218613702700 -2251799813685248 2251799813685248 2251799813685248
3 8 2304717109306851300 -1125899906842624 1125899906842624 1125899906842624
4 16 1152358554653425700 -562949953421312 562949953421312 562949953421312
5 32 576179277326712800 -281474976710656 281474976710656 281474976710656
6 64 288089638663356400 -140737488355328 140737488355328 140737488355328
7 128 144044819331678200 -70368744177664 70368744177664 70368744177664
8 256 72022409665839100 -35184372088832 35184372088832 35184372088832
9 512 36011204832919550 -17592186044416 17592186044416 17592186044416
10 1024 18005602416459776 -8796093022208 8796093022208 8796093022208
11 2048 9002801208229888 -4398046511104 4398046511104 4398046511104
12 4096 4501400604114944 -2199023255552 2199023255552 2199023255552
13 8192 2250700302057472 -1099511627776 1099511627776 1099511627776
14 16384 1125350151028736 -549755813888 549755813888 549755813888
15 32768 562675075514368 -274877906944 274877906944 274877906944
16 65536 281337537757184 -137438953472 137438953472 137438953472
17 131072 140668768878592 -68719476736 68719476736 68719476736
18 262144 70334384439296 -34359738368 34359738368 34359738368
19 524288 35167192219648 -17179869184 17179869184 17179869184
20 1048576 17583596109824 -8589934592 8589934592 8589934592
21 2097152 8791798054912 -4294967296 4294967296 4294967296
22 4194304 4395899027456 -2147483648 2147483648 2147483648
23 8388608 2197949513728 -1073741824 1073741824 1073741824
24 16777216 1098974756864 -536870912 536870912 536870912
25 33554432 549487378432 -268435456 268435456 268435456
26 67108864 274743689216 -134217728 134217728 134217728
27 134217728 137371844608 -67108864 67108864 67108864
28 268435456 68685922304 -33554432 33554432 33554432
29 536870912 34342961152 -16777216 16777216 16777216
30 1073741824 17171480576 -8388608 8388608 8388608
31 2147483648 8585740288 -4194304 4194304 4194304
32 4294967296 4292870144 -2097152 2097152 2097152
33 8589934592 2146435072 -1048576 1048576 1048576
34 17179869184 1073217536 -524288 524288 524288
35 34359738368 536608768 -262144 262144 262144
36 68719476736 268304384 -131072 131072 131072
37 137438953472 134152192 -65536 65536 65536
38 274877906944 67076096 -32768 32768 32768
39 549755813888 33538048 -16384 16384 16384
40 1099511627776 16769024 -8192 8192 8192
41 2199023255552 8384512 -4096 4096 4096
42 4398046511104 4192256 -2048 2048 2048
43 8796093022208 2096128 -1024 1024 1024
44 17592186044416 1048064 -512 512 512
45 35184372088832 524032 -256 256 256
46 70368744177664 262016 -128 128 128
47 140737488355328 131008 -64 64 64
48 281474976710656 65504 -32 32 32
49 562949953421312 32752 -16 16 16
50 1125899906842624 16376 -8 8 8
51 2251799813685248 8188 -4 4 4
52 4503599627370496 4094 -2 2 2
53 9007199254740992 2047 -1 1 1
54 "undefined" 1023 -1 0 0

Could you please review? If you could make the necessary fixes at the bitwise-review branch, I'd appreciate it very and very much! Thank you!

@srh
Copy link
Contributor

srh commented Nov 16, 2017

Your use of DBL_MANT_DIG+1 is correct if you want it to be possible to shift 2^53 to zero. Negative numbers will stop at -1. Your code looks very good, especially for somebody that doesn't write C++. The problem that you're encountering is with bitShr -- it's converting to uint64_t (by adding 2^64) and then doing a right shift. This can make sense for 32-bit shifting (if we're copying Javascript) or 64-bit shifting (if we've only got a 64-bit signed integer type, like Java does) but it's pretty weird when the bound on integer range is 2^53.

Before your last update, I had already made a branch to improve performance and also to change the semantic behavior of the functions (to what I think they should be). You can see them in my branch here: https://github.com/srh/rethinkdb/commits/sam/bitwise

I'll go over them now. Some of this is obvious but I want to overexplain the changes because you say you're not used to C++.

The first commit avoids smart quotes in the docs, to be consistent with the rest of the docs, which also avoid smart quotes. (Maybe the docs should use smart quotes consistently instead.)

The second commit address the and/or/not/xor operators. The parameters are now restricted to the range 2^53<=..<2^53. The type of op is changed to a C-style function that works on int64_t values. This avoids the overhead of making intermediate datum_t objects and converting integers back and forth between doubles. (This does not change semantics. However, if 2^53 were a legal argument value, then semantics would have changed for the expression r.or(2^53, 2, 0), because the intermediate acc value, 2^53+2, would no longer be rejected by a call to as_int().) I also initialized op using lambdas, which does exactly the same thing as if they were implemented with static methods or global functions. The lambdas compile to a C-style function pointer because they do not capture any local variables. This just saves on lines of code: less scrolling for the reader.

The third commit changes the API so that bit_sal and bit_sar are the only functions, and they only take two parameters. I think this is better than making the functions variadic, because using a different number of parameters would be a very rare use case. In the C++ implementation, you can see that this time I didn't use lambdas, because the functions are pretty big. Also, I used nullptr (new in C++11) instead of 0. Personally I would have used NULL before, but that's just me.

The third commit also changes the behavior of shifting to that I described before.

My fourth commit is just idiotic. In my sleep-deprived stupor I looked at your most recent changes, noticed bit_not_term_t, and thought, "I didn't change this to check the range properly!" And somehow I managed to make that change to bit_not_term_t without noticing it already had bit_arith_ranged_int in there.

The fifth commit just adds a static assertion.

Edit: It still needs tests. And I might get rid of the function pointer and just use if-then (because the CPU usually runs that faster).

@srh
Copy link
Contributor

srh commented Nov 18, 2017

I added tests, and squashed/merged onto next as of 1601118.

Thank you!

There is still room for debate in terms of the question, what should the user API be. A name other than "sal" and "sar" might be appropriate for the shifting operations. Some people would like binary objects to be supported for the and/or/xor/not operations.

I didn't update the ReQL version, but that shouldn't be necessary because these are brand new commands. But... that should be investigated. Made issue #6550 to make a note of that.

@srh srh closed this Nov 18, 2017
@lyubomyr-shaydariv
Copy link
Contributor Author

@srh Wow, thank you! I didn't have much time the last days to complete everything to meet your detailed review comments, but I really appreciate how you took over all the necessary fixes and made it as it has to be keeping a sharp eye on my mistakes, typos and even tests I was not able to run. This is absolutely fantastic! It was a great time to bring a new microcontribuion to a great tool RethinkDB is and learn new stuff. Thank you very and very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants