DynamicLinking: Extend "dylink" section with [] of needed libraries by navytux · Pull Request #77 · WebAssembly/tool-conventions

navytux · 2018-11-16T13:21:46Z

This comes from my work to teach the Emscripten dynamic linker to support
library -> library linkage: emscripten-core/emscripten#7512

The motivation is that, without support for DSO -> DSO linking, things become
problematic when several shared libraries all need some common functionality
that should itself be shared, while linking that functionality into the main
module is not an option for size reasons. See the above pull request for more
details.

To support library -> library linkage, we have to store somewhere the
information about which other libraries a library needs.

This patch extends the "dylink" section with that information, similar to
how ELF handles the same situation with DT_NEEDED entries.
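
As a rough illustration of what a loader could do with the extended section, here is a minimal sketch. It assumes the payload is four varuint32 fields (memory size, memory alignment, table size, table alignment) followed by a count and that many length-prefixed UTF-8 library names. The Python names (`DylinkInfo`, `parse_dylink`, `encode_dylink`) are hypothetical and exist only for this example; the DynamicLinking.md text changed by this patch is the authoritative description of the layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


def read_varuint32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def write_varuint32(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


@dataclass
class DylinkInfo:
    # Field names are illustrative, not taken from the spec text.
    memory_size: int
    memory_align: int
    table_size: int
    table_align: int
    needed: List[str]  # the list this patch adds, akin to DT_NEEDED


def encode_dylink(info: DylinkInfo) -> bytes:
    """Serialize the assumed payload layout (used here to build test input)."""
    out = bytearray()
    for value in (info.memory_size, info.memory_align,
                  info.table_size, info.table_align):
        out += write_varuint32(value)
    out += write_varuint32(len(info.needed))
    for name in info.needed:
        raw = name.encode("utf-8")
        out += write_varuint32(len(raw)) + raw
    return bytes(out)


def parse_dylink(payload: bytes) -> DylinkInfo:
    """Parse the assumed payload layout back into a DylinkInfo."""
    pos = 0
    sizes = []
    for _ in range(4):
        value, pos = read_varuint32(payload, pos)
        sizes.append(value)
    count, pos = read_varuint32(payload, pos)
    needed = []
    for _ in range(count):
        length, pos = read_varuint32(payload, pos)
        needed.append(payload[pos:pos + length].decode("utf-8"))
        pos += length
    return DylinkInfo(sizes[0], sizes[1], sizes[2], sizes[3], needed)
```

A round trip such as `parse_dylink(encode_dylink(DylinkInfo(16, 4, 2, 0, ["libfoo.so"])))` gives back the same record, with `needed` holding the library names to load first.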

/cc @sbc100


navytux · 2018-11-29T10:34:11Z

( refreshed the patch with solving 1 minor conflict after db77cc7 "Fix typo in DynamicLinking.md (#80)" landed )

navytux · 2018-12-09T21:17:53Z

For reference: the patch that teaches Emscripten about the dylink section extension for DSO -> DSO dependencies has entered the Emscripten tree:

emscripten-core/emscripten@6410be8c

navytux · 2018-12-11T21:44:24Z

Thanks for merging.

This updates the format of the dylink section in accordance with recent "spec" change: WebAssembly/tool-conventions#77 Differential Revision: https://reviews.llvm.org/D55609 llvm-svn=348989


This change adds basic support for shared library dependencies via the dylink section. See WebAssembly/tool-conventions#77 Differential Revision: https://reviews.llvm.org/D59237 llvm-svn: 356102


ghost · 2021-05-24T17:13:02Z

Why not wait for the first call to be made from a recognized function inside the library, then load it in asynchronously as needed? I got dlopen working without the use of SIDE_MODULE/MAIN_MODULE, which make huge files with empty memory space in between. No header is necessary if you know a little bit about what it is trying to load. The wiring is already specified with DEFAULT_LIBRARY_FUNCS_TO_INCLUDE, which doesn't seem to do anything for me at link time.

Why is a header needed at "load-time" if this is already specified at "link-time"? Your main program should already know how much it needs to expand with the tableSize in the header. Your main should already know how many functions it is calling from a specific library. So the tableSize header is unnecessary. The memorySize header is unnecessary with GROW_MEMORY turned on. memoryAlign isn't necessary if it is the default, and if you know where the lib is aligned, that should be specified in main too.

Maybe instead of trying to fit every single use case, this could narrow down to doing just one use case REALLY well. The only other variable is the list of dylibs, and those could also be specified in DEFAULT_LIBRARY_FUNCS_TO_INCLUDE: you would list both the libs your MAIN_MODULE needs and the libs your SIDE_MODULE needs inside main's FUNCS_TO_INCLUDE, so that when the side module loads, main already knows it is expecting more libs.

My recommendation is to put as much of the smarts as possible into the main module and to change specifications only as a last resort, for things that have no other workaround. This is a solid workaround, and I'm so glad that after 6 months I finally got it working.

sbc100 · 2021-05-24T19:52:47Z

I think there are a few different points here, so I will try to respond to them individually. Some of these issues are probably best discussed in emscripten, since they concern emscripten-specific concepts.

Lazy code loading.
It sounds like this is what you are asking about when you say "Why not wait for the first call to be made from a recognized function inside the library, then load in asynchronously as needed?" Indeed, lazy code loading is very interesting and very useful. We have a few different approaches to this that we are experimenting with in emscripten, and many users who would like it. However, I don't know that specifying module dependencies in the "dylink" section makes this any easier or harder. The most obvious way to do lazy module loading would be to provide stubs to the module which, when invoked, trigger the loading of another module. The primary issue with lazy loading today in emscripten is that it's hard to do synchronously (without unwinding the stack), since we can't instantiate wasm modules synchronously on the main thread on the web (at least not in the general case). The second problem is that we would probably require more metadata to make this work, because the dynamic loader would need to know which library provides a given import (so it knows which of the dependencies to load). To solve this we would probably need to switch to a two-level system for imports/dependencies.
Why is this metadata needed at load time? The idea is that a general-purpose dynamic loader should be able to load a module without any other external or prior information, i.e. the module should be fully self-describing. With emscripten, we could instead embed the metadata in the custom JS loader code, which we tune on a per-module basis, but we would like to get to a place where the common/shared dynamic loader can operate on arbitrary modules.
Do we really need memorySize, memoryAlign? Again, the main reason is that we want modules to be fully self-describing. Indeed, in emscripten today you can call dlopen() and it needs to work for modules that were not present at link time, so there is no way the loading code can know this information unless the module exports it somehow. Regarding memorySize, it sounds like you are suggesting the dynamic library could just use memory.grow to allocate its own static memory at runtime. This doesn't work in emscripten because the heap is managed by malloc/sbrk. It is not possible to call memory.grow to allocate memory without somehow coordinating with malloc/sbrk, which generally assume they control a contiguous region of memory. Of course, the dynamic library could import a malloc function, use that to allocate its static data region, and then use memory.init to load the static data into this location. But I think that sounds like a more complicated ABI than the one we have today, which is that a dynamic library exports its static data size (via metadata) and then imports __memory_base, which is a location that the dynamic linker allocates for it.
Same for tableSize/tableAlign. It's not generally true that the main module knows anything up front about the dynamic libraries that it is loading. At least not in the dlopen case, but even in the non-dlopen case, the idea is that dynamic libraries can be updated/re-compiled/changed after the deployment of the main module.
Regarding a simpler way to specify dependencies at link time. I've actually done a lot of work in the last couple of months on improving how emscripten's MAIN_MODULE=2 mode works. As of today you can just list your dependencies on the command line (like emcc -s MAIN_MODULE=2 libside1.wasm libside2.wasm main.c) and emscripten will take care of including precisely the dependencies of each of the side modules. So there should no longer be any need to add to either DEFAULT_LIBRARY_FUNCS_TO_INCLUDE or EXPORTED_FUNCTIONS. Both of these can now be precisely derived from looking at the side modules. Of course, -s MAIN_MODULE=1 still works for the conservative case and protects against future changes to the libraries by simply including everything, at the cost of code size.
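
To make the "self-describing module" point concrete, here is a minimal sketch of how a generic loader could plan a load using nothing but per-module metadata: it visits the needed libraries first, then hands each module an aligned `__memory_base`. Everything here is an assumption for illustration: the `Meta` record, the `plan_load` helper, and the bump allocator standing in for malloc/sbrk are hypothetical and are not emscripten's actual loader code.

```python
from typing import Dict, List, NamedTuple


class Meta(NamedTuple):
    memory_size: int   # bytes of static data the library declares
    memory_align: int  # required alignment in bytes (a power of two);
                       # if metadata stores an exponent, use 1 << exponent
    needed: List[str]  # names of the libraries this one depends on


def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (align is a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def plan_load(root: str, metas: Dict[str, Meta], heap_start: int) -> Dict[str, int]:
    """Return {library: memory_base}, with dependencies placed before dependents.

    A simple bump allocator stands in for the real allocator here.
    """
    bases: Dict[str, int] = {}
    visiting = set()
    next_free = heap_start

    def visit(name: str) -> None:
        nonlocal next_free
        if name in bases:
            return  # already planned
        if name in visiting:
            raise ValueError(f"dependency cycle through {name}")
        visiting.add(name)
        meta = metas[name]
        for dep in meta.needed:
            visit(dep)  # dependencies are loaded first
        visiting.discard(name)
        base = align_up(next_free, meta.memory_align)
        bases[name] = base
        next_free = base + meta.memory_size

    visit(root)
    return bases
```

For example, with `libapp.so` needing `libutil.so`, `plan_load("libapp.so", metas, 1024)` places `libutil.so` first and then `libapp.so` at the next suitably aligned address. The loader needs no knowledge of either library beyond what each one declares about itself.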

ghost · 2021-05-25T04:06:32Z

> Again the main reason is that we want modules to be fully self describing.

Ah, so you are thinking that a module could be transferable to other apps and the app writer might not know all the details. I guess I was thinking of custom libraries like sub-projects.

It looks like I got this working using STANDALONE_WASM and IMPORT_MEMORY, then using the exact same dylink code as what emscripten provides, except that I commented out parseDylinkMetadata since there is no metadata, and I added references to my main program (NOT using MAIN_MODULE) for memory: wasmMemory and __indirect_function_table: wasmTable. I am also not using LINKABLE or RELOCATABLE, because these options produce massive files (~50MB) with lots of AAAs and 000s. I am not yet using --closure because I want it to be slim without it too.

But as you explain that won't necessarily make a difference transferring a compiled module to someone else's app.

I wonder if this is worth patching as a possible improvement to MAIN_MODULE=2, SIDE_MODULE=2 maybe? Basically, I arrived at this conclusion when looking at the comments on using USE_PTHREADS=1 being the only way to inject our own memory. Maybe this only works because my application (Quake 3) is so well isolated. But my intention was to pretend my module is a pthread, when it is actually a dylink. I also had this idea that dylibs get their own process context and memory space, which they can choose to share or not. So I thought it was odd that even if a dylib doesn't want to share memory, sharing is what the library_dylink.js code assumes. Memory sharing must be on, I think.

EDIT: Also, I don't know if it's worth mentioning somewhere in examples, but I didn't know ALLOW_MEMORY_GROWTH also should be applied to side_modules.

sbc100 · 2021-05-25T04:33:39Z

> Again the main reason is that we want modules to be fully self describing. Ah, so you are thinking that a module could be transferable to other apps and the app writer might not know all the details. I guess I was thinking custom libraries like sub-projects.

The model we are going for is similar to the native idea of a dynamic/shared library. The classic example would be something like zlib, which is shared between apps and where the author of the app and the author of zlib are different. As you suggest, the module can also be a part of an app that is split out, such as libGameEngine.so. Either way, I think it is useful if the modules self-describe rather than having the dynamic loader embed knowledge about specific libraries.

> It looks like I got this working using STANDALONE_WASM and IMPORT_MEMORY, then using the exact same dylink code as what emscripten provides except, I comments out the parseDylinkMetadata since there is none, and I added references to my main program (NOT using MAIN_MODULE) for memory: wasmMemory and __indirect_function_table: wasmTable. I also am not using LINKABLE or RELOCATABLE, because these options produce massive files (~50MB) with lots of AAAs and 000s. I am not yet using --closure because I want it to be slim without it too.

I believe that LINKABLE is the flag you want to avoid if you want smaller files, since that ends up including everything and not doing any DCE. This is what MAIN_MODULE=1 enables, but MAIN_MODULE=2 avoids it.

If you don't use RELOCATABLE (AKA -fPIC), I don't think it's possible to link two libraries together, since any static data or table slots will overwrite each other. For example, without RELOCATABLE (AKA -fPIC) set, the static data from both modules will end up at address 1024, and one will be corrupted by the other. Put another way, without RELOCATABLE each module is built to be loaded at a fixed address, both in memory and in the table.
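
The overlap can be shown with a toy model, assuming a flat byte array as linear memory and two modules that each carry one blob of static data. The helper names (`load_fixed`, `load_relocatable`) and the bump-style placement are hypothetical; only the address 1024 comes from the example above.

```python
from typing import Dict

FIXED_BASE = 1024  # the fixed address used in the example above


def load_fixed(modules: Dict[str, bytes], size: int = 4096) -> bytearray:
    """Non-relocatable case: every module writes its data at the same address."""
    memory = bytearray(size)
    for data in modules.values():
        memory[FIXED_BASE:FIXED_BASE + len(data)] = data  # later modules clobber earlier ones
    return memory


def load_relocatable(modules: Dict[str, bytes], size: int = 4096):
    """Relocatable case: the loader picks a distinct memory base per module."""
    memory = bytearray(size)
    bases: Dict[str, int] = {}
    next_free = FIXED_BASE
    for name, data in modules.items():
        bases[name] = next_free  # plays the role of __memory_base
        memory[next_free:next_free + len(data)] = data
        next_free += len(data)
    return memory, bases


modules = {"a": b"data of module A", "b": b"module B"}

fixed = load_fixed(modules)
# Module A's data has been partly overwritten by module B's.
a_survived_fixed = bytes(fixed[FIXED_BASE:FIXED_BASE + len(modules["a"])]) == modules["a"]

reloc, bases = load_relocatable(modules)
# Each module's data is intact at its own base.
a_survived_reloc = bytes(reloc[bases["a"]:bases["a"] + len(modules["a"])]) == modules["a"]
```

Here `a_survived_fixed` is `False` and `a_survived_reloc` is `True`, which is the corruption described above and the reason a side module needs to be position-independent.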

> EDIT: Also, I don't know if it's worth mentioning somewhere in examples, but I didn't know ALLOW_MEMORY_GROWTH also should be applied to side_modules.

Are you saying that ALLOW_MEMORY_GROWTH doesn't work today with SIDE_MODULE? It doesn't seem to give any error when I use those options together:

$ emcc hello.c -s SIDE_MODULE -s ALLOW_MEMORY_GROWTH

ghost · 2021-06-09T17:50:42Z

How about something like this but ported to webassembly? https://www.usenix.org/legacy/publications/library/proceedings/usenix05/tech/general/full_papers/collberg/collberg_html/main.html

It's called SLINKY

ghost · 2021-06-15T05:48:46Z

I just had a revelation. This is going to be used to obfuscate compiled/WebAssembly code and add to the mess that the entire tech industry already is.

The thing that makes me hate this idea is it promotes this idea that anyone can import functionality, without actually understanding it. No one can see the code anymore, which means they don't have to take responsibility over how well it functions. This happens everywhere already, SourceTree freezes when you copy and paste, all of JetBrains products are terrible at uploading because they imported a library that counts files before it starts uploading. I specifically fell in love with JavaScript when I was 12 years old because I could copy little mouse animations into my Geocities website. Everywhere you look there is free and open source javascript. All the libraries Emscripten imports are free and open source. I've downloaded a dozen libs over the last few months and they all compiled flawlessly on my Mac, including libvpx, libcurl, musl, libopus, RmlUI, SDL, libjpeg, freetype, zlib. All open source, but if they weren't open source, they would be a complete mystery to me.

EDIT: If no one is interested in the moral reasons for open-source. Here's just the pragmatic facts. Fact 1) Google/FAANG companies already obfuscate their code. Fact 2) Self-describing libraries make code obfuscation faster and easier. Fact 3) If society wants smarter programmers, every time someone clicks "View Source" it should be an educational experience. Fact 4) This is how I treat my code and development process, and we deserve better.

navytux mentioned this pull request Nov 16, 2018

Teach dynamic linking to handle library -> library dependencies emscripten-core/emscripten#7512

Merged

navytux force-pushed the y/dylink-needed branch from 576dff2 to 4cb9313 Compare November 29, 2018 10:33

sbc100 approved these changes Dec 11, 2018

sbc100 merged commit 864c5c9 into WebAssembly:master Dec 11, 2018

navytux deleted the y/dylink-needed branch December 11, 2018 21:44
