Skip to content

C# Fix loading the native library on dotnet cli#7282

Closed
jskeet wants to merge 5 commits intogrpc:masterfrom
jskeet:fix-native
Closed

C# Fix loading the native library on dotnet cli#7282
jskeet wants to merge 5 commits intogrpc:masterfrom
jskeet:fix-native

Conversation

@jskeet
Copy link
Copy Markdown
Contributor

@jskeet jskeet commented Jul 8, 2016

Two separate commits:

  • Add a property to GrpcEnvironment to allow the user to specify the native library
  • Change how the library is loaded and used

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok to test.

3 similar comments
@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok to test.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok to test.

@grpc-kokoro
Copy link
Copy Markdown
Collaborator

Thanks for your pull request. The automated tests will run as soon as one of the admins verifies this change is ok to test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the stackoverflow post, this works for C __stdcall functions, which are common for 32 bits windows? Not sure if all of the 32bit windows libraries use __stdcall, but would there be C functions that the DLLImport could find that wouldn't match these names?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put it this way: running dumpbin /exports grpc_csharp_ext.dll showed everything being done this way.

Now it could be that if we change how the library is built, that would change what's exported...

(We could try also just finding the name without any prefix/suffix, of course.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, saw here https://msdn.microsoft.com/en-us/library/aa984739(v=vs.71).aspx that DLLImport was limited to __std_call anyways, so I think its the same then.

@apolcyn
Copy link
Copy Markdown
Contributor

apolcyn commented Jul 8, 2016

Not sure why some of the tests look to be unable to load the native library

@jskeet
Copy link
Copy Markdown
Contributor Author

jskeet commented Jul 8, 2016

Will have a look over the weekend, or on Monday. Fingers crossed it's something simple...

@jskeet
Copy link
Copy Markdown
Contributor Author

jskeet commented Jul 9, 2016

I've had a thought - I suspect it's due to where the Grpc.Core csproj files copies the native library - basically the code expects to find it as it's packaged in nuget. Fingers crossed that'll be easy to fix...

@jskeet
Copy link
Copy Markdown
Contributor Author

jskeet commented Jul 9, 2016

Okay, that seems to have improved the tests - not sure why the basic and performance tests are failing though. The basic tests doesn't seem to be the C# ones anyway...

@pgrosu
Copy link
Copy Markdown

pgrosu commented Jul 9, 2016

Hi Jon,

To make it easier to read I organized this in two parts, one for each test:

The Basic tests

The reason the Basic tests failed is as follows, and mainly because of the dbg,c,linux build is failing:

  1. I will reference the log which is located here:

https://grpc-testing.appspot.com/job/gRPC_pull_requests/10767/config=dbg,language=c,platform=linux/console

  1. The main error happens because the /home/jenkins/workspace/gRPC_pull_requests/config/dbg/language/c/platform/linux directory cannot be created created:
Building remotely on grpc-jenkins-linux2 (linux) in workspace /home/jenkins/workspace/gRPC_pull_requests/config/dbg/language/c/platform/linux
java.io.IOException: Failed to mkdirs: /home/jenkins/workspace/gRPC_pull_requests/config/dbg/language/c/platform/linux
    at hudson.FilePath.mkdirs(FilePath.java:1191)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1268)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
    at hudson.model.Run.execute(Run.java:1720)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:410)

The reason for this is that the directory cannot be created for several reasons:

  • The jenkins user does not have proper permissions to create a directory somewhere along the path /home/jenkins/workspace/gRPC_pull_requests/config/dbg/language/c/platform/linux
  • The SSH keys might not be properly set up.
  • Or another reason that eventually falls back to one or both of the above two.

I do not have access to the Google Jenkins setup to help with troubleshooting it, and determine what might be the precise cause - or if it is something else - but below is a link to a detailed manual for Distributed Builds using Jenkins:

https://wiki.jenkins-ci.org/display/JENKINS/Distributed+builds

The Performance smoke test

The reason the Peformance smoke test failed is as follows:

  1. I will reference the log which is located here:

https://grpc-testing.appspot.com/job/gRPC_performance_pull_requests/4389/console

  1. The performance test is triggered by the following script:

tools/jenkins/run_performance.sh

And the actual line that runs it is this:

tools/run_tests/run_performance_tests.py -l c++ node ruby csharp python --netperf --category smoketest

If the run_performance_tests.py script fails, it will cascade up to Jenkins to return the Build step 'Execute shell' marked build as failure status.

  1. The reason why it is failing is because of the following, stemming from the run_worker_python.sh script:
++ dirname tools/run_tests/performance/run_worker_python.sh
+ cd tools/run_tests/performance/../../..
+ PYTHONPATH=src/python/grpcio_tests:src/python/gens
+ py27/bin/python src/python/grpcio_tests/tests/qps/qps_worker.py --driver_port=10500
I0709 08:34:59.354176604   21856 ev_epoll_linux.c:84]        epoll engine will be using signal: 36
D0709 08:34:59.354213858   21856 ev_posix.c:106]             Using polling engine: epoll
Traceback (most recent call last):
  File "src/python/grpcio_tests/tests/qps/qps_worker.py", line 61, in <module>
    run_worker_server(args.port)
  File "src/python/grpcio_tests/tests/qps/qps_worker.py", line 43, in run_worker_server
    server = grpc.server((), futures.ThreadPoolExecutor(max_workers=5))
  File "/home/jenkins/workspace/gRPC_performance_pull_requests/py27/local/lib/python2.7/site-packages/grpc/__init__.py", line 1230, in server
    return _server.Server(thread_pool, () if handlers is None else handlers)
  File "/home/jenkins/workspace/gRPC_performance_pull_requests/py27/local/lib/python2.7/site-packages/grpc/_server.py", line 739, in __init__
    completion_queue, server, generic_handlers, thread_pool)
  File "/home/jenkins/workspace/gRPC_performance_pull_requests/py27/local/lib/python2.7/site-packages/grpc/_server.py", line 603, in __init__
    self.generic_handlers = list(generic_handlers)
TypeError: 'ThreadPoolExecutor' object is not iterable
Exception AttributeError: "'Server' object has no attribute '_state'" in <bound method Server.__del__ of <grpc._server.Server object at 0x7f7ee618dcd0>> ignored

The reason the above is failing, is because in line 603 of the src/python/grpcio/grpc/_server.py script, it tries to make generic_handlers a list which it cannot, since it is a ThreadPoolExecutor object - (the AttributeError exception of no attribute '_state' happens in lines 756-757, and is inconsequential since the object is not built because of the first error):

self.generic_handlers = list(generic_handlers)

A ThreadPoolExecutor is a type of object as noted in the following link, under section 17.4.2:

https://docs.python.org/3/library/concurrent.futures.html

To actually create a list of objects, it might be best to append them as follows:

>>> class MyClass:
...     """A simple example class"""
...     i = 12345
...
>>> listOfObjects = []
>>> listOfObjects.append(MyClass)
>>> listOfObjects.append(MyClass)
>>> listOfObjects
[<class __main__.MyClass at 0x7ffff7f0a6b0>, <class __main__.MyClass at 0x7ffff7f0a6b0>]
>>> type(listOfObjects)
<type 'list'>

Hope it helps,
Paul

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, this file is generated using a mako template
(e.g. you need to update the template and run tools/buildgen/generate_project.sh - otherwise the sanity test will fail to indicate that)

Feel free to look at my PR #7295 (I am updating templates/src/csharp/build_options.include there).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I suspect I haven't got everything I need on my Windows box to run that (as it's failing, in a way which is slightly annoying to debug due bash closing automatically on Windows...) Will look into it.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jon,

I would just open a cmd.exe window, and launch a Bash shell via Cygwin as follows:

c:\cygwin\bin\bash.exe --login -i

Then proceed to where your Bash script is located, and run it manually as on a normal Linux/Unix box. The drives on Windows are located under /cygdrive like this:

$ ls /cygdrive
c

This way when running the script, you can see the debug information without having to worry that the window will close.

Hope it helps,
~p

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I don't have cygwin installed and don't wish to go that way - I've made some progress on Linux, although the script is now failing in boringssl. I have other more urgent demands at the moment, but will come back to this when I get a chance.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, we'll be here if you need us. Not sure if it's the same thing, but I've run into issues with BoringSSL previously, where I had to make a change to the Makefile to disable hidden visibility:

$ cp Makefile Makefile-orig
$ sed s/-fvisibility=hidden//g Makefile > Makefile-not-hidden
$ cp Makefile-not-hidden Makefile

I've posted the whole thing with more details under issue #7298

~p

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very quick response - will give that a try while I'm waiting for other build services to probably fail. Today is apparently "swear at infrastructure" day :(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked it out now - I didn't have go installed. Looking better now. I suspect that CONTRIBUTING.md could do with being updated for the various dependencies required before running the toolchain. (I didn't have pyyaml or mako installed before today either...)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Glad to hear it's working out :)

@jtattermusch
Copy link
Copy Markdown
Contributor

Overall, I think the changes in this PR are reasonable.

C# tests are passing, which is good (but currently they are only running with .NET and mono, not CoreCLR).
it might be worth checking coreCLR tests manually:
tools/run_tests/run_tests.py -l csharp --compiler coreclr (and optionally --use_docker if you don't have dotnet cli installed locally).

We also have tests for distribution packages (the nugets) - but they have to be spawned manually as the build process is a bit more sophisticated:
I've started https://grpc-testing.appspot.com/view/Artifacts/job/gRPC_build_artifacts/1214/ and eventually, nuget packages will be built from the native artifacts built here and they will be tested by the https://grpc-testing.appspot.com/view/Artifacts/job/gRPC_distribtest/ build (I'll post a link there once the previous steps finish).
For dotnet CLI environment, we don't have any distribtests yet (currently only for desktop .NET 45 on windows and mono on Linux and Mac) - so with dotnet CLI the nugets need to be tested manually.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not planning to use this property anywhere in this PR, right?
I think it would make sense to add it in a separate PR, as this property is only intended for expert use and for the things we are trying to fix here, no one should be relying on it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in case you insist on adding this property here, please mark it as experimental API that can change any time - that's what I've done with other properties I wasn't 100% sure about)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move it to a separate PR, and mark it as experimental as suggested.

jskeet added 3 commits July 11, 2016 06:55
The goal of this is to fix grpc#7230. This turned out to be a little trickier than I thought,
but I'm happy with the result.

The changes here are:

- The layout in the nuget package; the files are now in /runtimes/{os-architecture}/native/{library}
- The targets file used to copy those files in msbuild-based projects; note that we now don't build up a folder structure
- The way the functions are found

Before this change, on Linux and OSX we used to find library symbols manually, and use DllImport on Windows.
With this change, the name of the library file changes based on architecture, so DllImport doesn't work. Instead, we
have to use GetProcAddress to fetch the function. This is further convoluted by the convention on Windows-x86 to prefix
the function name with _ and suffix it based on the stack size of the arguments. We can't easily tell the
argument size here, so we just try 0, 4, 8...128. (128 bytes should be enough for anyone.) This is inefficient, but
it's a one-time hit with a known number of functions, and doesn't seem to have any significant impact.

The benefit of this in code is we don't need the DllImports any more, and we don't need to conditionally use FindSymbol -
we just use it for everything, so things are rather more uniform and tidy.

The further benefit of this is that the library name is no longer tied to a particular filename format - so if someone
wanted to have a directory with the libraries for every version in, with the version in the filename, we'd handle that just fine.

I have tested this on Windows with a console app:
- Under dotnet cli, only on x64 (as running anything on x86 seems to be broken in dotnet cli at the moment)
- Under msbuild/csproj, in both x64 and x86
- Under dnx, in both x64 and x86

I have *not* tried this on Linux or OSX, but those paths haven't changed much, so I'd expect them to work fine (or
blow up immediately in CI if they fail; a failure here is unlikely to be subtle).
I suspect the differences in NativeDeps are the only ones required to
get the tests working, but we'll probably have to change everything
eventually. It's not clear why we have to specify the mappings in *so*
many places though - we should look at tidying that up.
This means we'll get both x86 and x64 versions on deployment,
which is a waste for environments where the right version would be
picked, but fixes issues where the wrong architecture is picked due
to dotnet cli tooling bugs.

Also changed the project template, but I'll need to run the generators
on a different machine as it looks like they don't work on Windows (at
least without more effort).
@apolcyn
Copy link
Copy Markdown
Contributor

apolcyn commented Jul 15, 2016

LGTM

@jskeet
Copy link
Copy Markdown
Contributor Author

jskeet commented Jul 15, 2016

@apolcyn: This isn't the PR that should be merged. See #7401. I'll close this now to avoid confusion :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants