Skip to content

Conversation

@asowani
Copy link

@asowani asowani commented Apr 3, 2017

Yes.

Description

These code changes are done for:

  1. Support for ppc64le.
  2. Use of v8 v4.7 (Can we use V8 v4.7 on Linux. #6315)

Thanks!

@srh
Copy link
Contributor

srh commented Apr 10, 2017

In my opinion the changes look great, at least the code diff has a good smell to it. (Some aspects, like build system decisions, I'm not qualified to judge.)

Why is the coroutine stack size increased from 128K to 256K on this platform? I'd like to add a comment telling people why.

@srh
Copy link
Contributor

srh commented Apr 11, 2017

Another weird thing: The change defining V8_NEEDS_BUFFER_ALLOCATOR to 1 in the middle of js_job.cc. That should presumably be done in some build config setting.

@srh
Copy link
Contributor

srh commented Apr 11, 2017

Another potential bug is that in src/build.mk the change only affects things if the compiler is GCC. What if it's another compiler?

@mundaym
Copy link
Contributor

mundaym commented Apr 11, 2017

Hi,

In the context_switching.cc change it looks like the new code saves registers below the stack pointer. All the other architectures decrement the stack pointer and then save the registers. Here are some diagrams of what the stacks look like after the artificial_stack_t constructor is called:

Existing architectures:

|   | <- unallocated memory (and maybe some allocated, but unaligned, memory)
+---+
|   | <- fake caller stack frame (of size min_frame)
+---+
|   |
|   | <- register save area (saved stack pointer points to top of register save area)
|   |
+---+ <- stack pointer

Power:

|   | <- unallocated memory (and maybe some allocated, but unaligned, memory)
+---+ <- stack pointer*
|   |
|   | <- register save area (saved stack pointer points to top of register save area)
|   |
+---+

 * sp += 24 removes the fake caller stack frame

Is there a reason power needs to behave differently? If so then it would be really good to add some comments.

Also I think there is a problem with removing the fake caller stack frame. If I understand the Power ELF V2 ABI correctly, then the min_frame constant should be 4 so that the initial_fun function can safely write LR to the LR save slot. As it stands I think initial_fun will in fact write LR into unallocated memory (beyond the top of the stack).

@asowani
Copy link
Author

asowani commented Apr 12, 2017

@srh @mundaym I appreciate your assessment and comments very much. I am working on the comments and suggestions. I will get back with appropriate code changes.

@asowani
Copy link
Author

asowani commented Apr 13, 2017

Implemented V8_NEEDS_BUFFER_ALLOCATOR related change. Pondering over the context switching discussion.

@asowani
Copy link
Author

asowani commented Apr 13, 2017

@srh About the potential bug in src/build.mk - present work used only GNU compiler hence added those GCC specific flags. Are there any specific compilers you would recommend for testing on ppc64le, apart from GNU's?

@srh
Copy link
Contributor

srh commented Apr 13, 2017

The change there adds general linking flags. I'd expect other compilers to need the same flags.

I don't expect you to test other compilers. If people see flags like that in a compiler-specific branch, they're likely to think the flags are GCC-only for some specific reason. And if Clang was used, you'd expect it to take the same flags.

Another bug in build.mk is that #if ! defined(__powerpc64__) isn't going to do anything there. It's a comment.

A better way might be to make pkg_link-flags in v8.sh output -licui18n, -licuuc, and -ldl. I believe the lines echoed out there get turned into linker parameters without any manipulation. This keeps the v8 library logic in the same place.

@asowani
Copy link
Author

asowani commented Apr 19, 2017

I have made change for V8_NEEDS_BUFFER_ALLOCATOR and the changes suggested in v8.sh, build.mk files.

@asowani
Copy link
Author

asowani commented Apr 19, 2017

Marked the review comments which are implemented.

@asowani
Copy link
Author

asowani commented Apr 21, 2017

@srh @mundaym All the review comments have been incorporated now. Could you please check and let me know if I missed out something? Also please let me know if you have additional queries, comments. Thanks!


#if defined (__powerpc64__)
// POWER ABI requires more number of registers to be saved
// in stack during context switch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. The context switch requires 168 bytes of stack space. I don't see why that would mean the stack size needs to be doubled. What error do you see? Does is still occur with the revised context switching code?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mundaym There are alloca() calls in glibc called by getifaddrs() which tries to
allocate 64KB memory in stack frame for powerpc. If I dont increase the stack
size, there are failures in test cases due to stack overflow checks in rethinkdb.
Hence increasing it for powerpc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this explanation be added as a comment?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AtnNn Added comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added code comments.

sp -= 16; // r6-r13 and f8-f15.
#elif defined (__powerpc64__)
sp += 24;
sp -= 20; // r14-r31, toc, cr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking you also need to save f14-f31 and v20-v31 to meet the ABI requirements. The compiler could choose to save something in one of these registers and a coroutine switch might end up clobbering it. It's the sort of thing that is very hard to debug. At the very least there should be a comment here explaining the danger.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mundaym Considering the factors performance and being light weight, I have not saved floating point and vector registers.There are no failures seen when non volatile floating point registers
are not saved and restored during context switch . I have tested on
ubuntu/RHEL powerpc64le systems. Also we can hit any bugs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment above that indicates that a context switch "doesn't swap the floating-point registers"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's safe to do on x86 because none of the SSE/floating point registers are callee-save (and so aren't 'live' on entry to the swapcontext function). On POWER and s390x however the ABIs have callee-save floating point registers and the compiler assumes they will be restored after a function call.

In practice the compiler probably doesn't use the callee-saved floating point registers very often in code like this, so not saving/restoring them might not cause any issues. But it is a bit fragile IMO. Worth a comment anyway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mundaym Added code comment.

@AtnNn
Copy link
Member

AtnNn commented Apr 26, 2017

Awesome work.

Ideally, if we set use_system_icu=1 when building v8, the configure script should test for ICU.

Otherwise the v8-related changes look okay to me. They aren't uglier than what we already have. Someday we'll get around to do #5112

@AtnNn AtnNn added this to the 2.5 milestone Apr 26, 2017
@asowani
Copy link
Author

asowani commented May 4, 2017

@AtnNn Thanks for adding this PR to 2.5 milestones. Any estimate on timeframe when 2.5 can be expected? Also, any plans to create a new branch for 2.5?
Thanks!

@AtnNn
Copy link
Member

AtnNn commented May 8, 2017

@asowani The branch for 2.5 is next. We don't have a timeframe for 2.5. If I had to guess, I would say 4 to 12 months.

@asowani
Copy link
Author

asowani commented May 12, 2017

@AtnNn Thanks for letting me know about the estimated time frame for 2.5 release. I don't know much about how much effort it generally takes for new release preparation (may be creating a new branch?), so please pardon if I sound too eager. But is there any way we can have 2.5 release a bit earlier? If there is any help required, I am willing to assist. Thanks!

@AtnNn AtnNn modified the milestones: 2.4, 2.5 May 12, 2017
@AtnNn
Copy link
Member

AtnNn commented May 12, 2017

We're trying to get 2.4 ready right now. I'll try to get this patch into 2.4 if it doesn't delay the release.

@asowani
Copy link
Author

asowani commented May 16, 2017

@AtnNn - is v2.4.x the correct branch to check for merged patches for upcoming 2.4 release?

@AtnNn
Copy link
Member

AtnNn commented May 16, 2017

@asowani yes.

@asowani
Copy link
Author

asowani commented May 19, 2017

@AtnNn I have setup a Jenkins job to test v2.4.x branch whenever ppc64le code gets merged. At the moment I have disabled the job as I don't see those changes in the branch, but it can start the moment the changes get merged in the branch. I guess I will get an automated mail from github when this PR is merged, but just in case, I would highly appreciate if you could intimate here once the code is merged. Thanks!

@clnperez
Copy link

@AtnNn -- what's your feeling on whether this will make it in to the 2.4 release or not? It still has the milestone, which gives me hope that it will. 😆

@mchasal
Copy link

mchasal commented Nov 21, 2017

Just following up on the previous comment. Any outlook on whether this will be merging into 2.4 at this point? Thanks!

@srh
Copy link
Contributor

srh commented Nov 22, 2017

I'd say it's likely to make it into 2.4.

@srh
Copy link
Contributor

srh commented Dec 23, 2017

I've rebased and cleaned up the changes in this PR and put them in branches sam/24x/powerpc64 (based off v2.4.x) and sam/powerpc64 (based off next) in my repo at https://github.com/srh/rethinkdb/.

There are two things I'd still like to understand before putting this into 2.4.

First, what is the point of setting V8_NEEDS_BUFFER_ALLOCATOR?

Second, why does getifaddrs use alloca on powerpc64, but not on amd64? Is there an implementation I could look at in order to see this?

Edit: It looks like getifaddrs calls __netlink_request in glibc's sysdeps/unix/sysv/linux/ifaddrs.c, which then has

#ifdef PAGE_SIZE
  /* Help the compiler optimize out the malloc call if PAGE_SIZE
     is constant and smaller or equal to PTHREAD_STACK_MIN/4.  */
  const size_t buf_size = PAGE_SIZE;
#else
  const size_t buf_size = __getpagesize ();
#endif

after which, if buf_size <= 64K, it gets passed to alloca. And the page size could be 64KB.

I don't want to throw in these mysterious configuration options without putting an explanation in the source code.

@srh
Copy link
Contributor

srh commented Dec 23, 2017

There's a third thing. Why is v8 4.7 used for powerpc64? Is it necessary?

@srh
Copy link
Contributor

srh commented Dec 24, 2017

I've updated the branches referenced above to compute the coroutine stack size based on the page size value.

So there are now still two things I'd like to understand:

  • why V8_BUFFER_ALLOCATOR?

  • why v8 4.7?

@asowani
Copy link
Author

asowani commented Feb 5, 2018

@srh The selection of v8 4.7.x was already done in the script but was limited to Windows. Older versions of v8 are not ported to ppc64le and v8 4.7 has been ported to and tested on ppc64le, and is found to be working good on it. V8_NEEDS_BUFFER_ALLOCATOR is required at build time to have 4.7 version of v8.

@RajalakshmiS
Copy link

@srh

Second, why does getifaddrs use alloca on powerpc64, but not on amd64? Is there an implementation I >could look at in order to see this?
This really depends on the pagezise configured on that arch and for powerpc64 it is passed as libc_use_alloca(65536) and it uses alloca().

@asowani
Copy link
Author

asowani commented Feb 14, 2018

@srh Any updates about code merge? Please let me know in case there are further queries. Thanks!

@srh
Copy link
Contributor

srh commented Feb 17, 2018

But what is it about ppc64le that makes v8 4.7 need V8_BUFFER_ALLOCATOR?

@srh
Copy link
Contributor

srh commented Feb 17, 2018

If I check out v8 4.7.80.23 and grep for V8_NEEDS_BUFFER_ALLOCATOR, I don't get any matches, and if I search on Google I don't get any results outside of this issue.

@asowani
Copy link
Author

asowani commented Feb 19, 2018

@srh I think this is with reference to commit e3de540

src/extproc/js_job.cc file has following code (around line 34):

#ifdef V8_NEEDS_BUFFER_ALLOCATOR
class array_buffer_allocator_t : public v8::ArrayBuffer::Allocator {
public:
...

I think the change is required to take care of this.

@srh
Copy link
Contributor

srh commented Feb 19, 2018

Oh, I'm a retard. Will merge this when I get back to a PC.

@srh
Copy link
Contributor

srh commented Feb 21, 2018

In next as of c300a18 and v2.4.x as of 4a71acf.

@srh srh closed this Feb 21, 2018
@asowani
Copy link
Author

asowani commented Feb 22, 2018

@srh Thanks very much for merging the changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants