DynamicLinking: Extend "dylink" section with [] of needed libraries #77

Merged
sbc100 merged 1 commit into WebAssembly:master from navytux:y/dylink-needed
Dec 11, 2018

@navytux navytux commented Nov 16, 2018

This comes from my work to teach the Emscripten dynamic linker to support
library -> library linkage:

emscripten-core/emscripten#7512

The motivation is that, without support for DSO -> DSO linking, things become
problematic when several shared libraries all need the same functionality
that should itself be shared, while linking that functionality into the main
module is not an option for size reasons. See the pull request above for more
details.

To support library -> library linkage, we have to store somewhere the
information about which other libraries a library needs.

This patch extends the "dylink" section with such information, which is
similar to how ELF handles the same situation with DT_NEEDED entries.
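
As a rough illustration of what a loader could do with the extended section, here is a minimal Python sketch that decodes a "dylink" payload. The field order (memory size, memory alignment, table size, table alignment, then a count followed by that many length-prefixed library names, all integers as unsigned LEB128) is an assumption based on the fields discussed in this thread, not a quotation of the spec text; the function names and the `libz.so` example are hypothetical.

```python
def read_varuint32(buf, pos):
    """Decode one unsigned LEB128 integer; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def read_string(buf, pos):
    """Decode a length-prefixed UTF-8 string; return (text, next_position)."""
    length, pos = read_varuint32(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def parse_dylink_payload(payload):
    """Parse the assumed layout of a "dylink" section payload into a dict."""
    fields = {}
    pos = 0
    # Assumed order of the four pre-existing integer fields.
    for name in ("memory_size", "memory_align", "table_size", "table_align"):
        fields[name], pos = read_varuint32(payload, pos)
    # The part this patch adds: a list of needed libraries, akin to DT_NEEDED.
    count, pos = read_varuint32(payload, pos)
    needed = []
    for _ in range(count):
        library, pos = read_string(payload, pos)
        needed.append(library)
    fields["needed"] = needed
    return fields


# Hypothetical payload: 16 bytes of static data, 2 table slots, one dependency.
example_payload = bytes([16, 4, 2, 0, 1, 7]) + b"libz.so"
example_fields = parse_dylink_payload(example_payload)
```

With the list available in the module itself, a loader can load every entry of `needed` before instantiating the library, without any out-of-band knowledge.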

/cc @sbc100


navytux commented Nov 29, 2018

(Refreshed the patch, resolving one minor conflict after db77cc7 "Fix typo in DynamicLinking.md (#80)" landed.)


navytux commented Dec 9, 2018

For reference: the patch that teaches Emscripten about the dylink section extension for DSO -> DSO dependencies has entered the Emscripten tree:

emscripten-core/emscripten@6410be8c

@sbc100 sbc100 merged commit 864c5c9 into WebAssembly:master Dec 11, 2018

navytux commented Dec 11, 2018

Thanks for merging.

@navytux navytux deleted the y/dylink-needed branch December 11, 2018 21:44
llvm-git-migration pushed a commit to llvm-git-prototype/llvm that referenced this pull request Dec 12, 2018
This updates the format of the dylink section in accordance with
recent "spec" change:
  WebAssembly/tool-conventions#77

Differential Revision: https://reviews.llvm.org/D55609

llvm-svn=348989
earl pushed a commit to earl/llvm-mirror that referenced this pull request Dec 13, 2018
llvm-git-migration pushed a commit to llvm/llvm-project that referenced this pull request Mar 13, 2019
This change adds basic support for shared library dependencies
via the dylink section.

See WebAssembly/tool-conventions#77

Differential Revision: https://reviews.llvm.org/D59237

llvm-svn: 356102
dtzWill pushed a commit to llvm-mirror/lld that referenced this pull request Mar 13, 2019
dtzWill pushed a commit to llvm-mirror/llvm that referenced this pull request Mar 13, 2019
earl pushed a commit to earl/llvm-mirror that referenced this pull request Mar 13, 2019
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Apr 8, 2019
dylanmckay pushed a commit to dylanmckay/llvm that referenced this pull request May 16, 2019

ghost commented May 24, 2021

Why not wait for the first call to be made from a recognized function inside the library, then load it in asynchronously as needed? I got dlopen working without the use of SIDE_MODULE/MAIN_MODULE making huge files with empty memory space in between. No header is necessary if you know a little bit about what it is trying to load. The wiring is already specified with DEFAULT_LIBRARY_FUNCS_TO_INCLUDE, which doesn't seem to do anything for me at link time.

Why is a header needed at "load-time" if this is already specified at "link-time"? Your main program should already know how much it needs to expand with the tableSize in the header. Your main should already know how many functions it is calling from a specific library, so the tableSize header is unnecessary. The memorySize header is unnecessary with GROW_MEMORY turned on. memoryAlign isn't necessary if it is the default, and if you know where the lib is aligned, that should be specified in main too. Maybe instead of trying to fit every single use case, this could narrow down to doing just 1 use case REALLY well.

The only other variable is the list of dylibs, so those could also be specified in DEFAULT_LIBRARY_FUNCS_TO_INCLUDE: you would list both the libs your MAIN_MODULE needs and the libs your SIDE_MODULE needs inside main's FUNCS_TO_INCLUDE, so that when the side module loads, main already knows it's expecting more libs. My recommendation is to put as much smarts as possible into the main module and to leave changing specifications for things that have no other workarounds, as a last resort. This is a solid workaround, and I'm so glad that after 6 months I finally got it working.


sbc100 commented May 24, 2021

I think there are a few different points here, so I will try to respond to them individually. Some of these issues are probably best discussed in emscripten since they involve emscripten-specific concepts.

  1. Lazy code loading.
    It sounds like this is what you are asking about when you say, "Why not wait for the first call to be made from a recognized function inside the library, then load in asynchronously as needed?" Indeed, lazy code loading is very interesting and very useful. We have a few different approaches to this that we are experimenting with in emscripten, and many users who would like it. However, I don't know that specifying module dependencies in the "dylink" section makes this any easier or harder. The most obvious way to do lazy module loading would be to provide stubs to the module which, when invoked, trigger the loading of another module. The primary issue with lazy loading today in emscripten is that it's hard to do synchronously (without unwinding the stack), since we can't instantiate wasm modules synchronously on the main thread on the web (at least not in the general case). The second problem is that we would probably require more metadata to make this work, because the dynamic loader would need to know which library a given import is provided by (so it knows which of the dependencies to load). To solve this we would probably need to switch to a two-level system of imports/dependencies.

  2. Why is this metadata needed at load time? The idea is that a general-purpose dynamic loader should be able to load a module without any other external or prior information, i.e. the module should be fully self-describing. With emscripten, we could instead embed the metadata in the custom JS loader code, which we tune on a per-module basis, but we would like to get to a place where the common/shared dynamic loader can operate on arbitrary modules.

  3. Do we really need memorySize and memoryAlign? Again, the main reason is that we want modules to be fully self-describing. Indeed, in emscripten today you can call dlopen() and it needs to work for modules that were not present at link time, so there is no way the loading code can know this information unless the module exports it somehow. Regarding memorySize, it sounds like you are suggesting the dynamic library could just use memory.grow to allocate its own static memory at runtime. This doesn't work in emscripten because the heap is managed by malloc / sbrk. It is not possible to call memory.grow to allocate memory without somehow coordinating with malloc/sbrk, which generally assume they control a contiguous region of memory. Of course, the dynamic library could import a malloc function, use that to allocate its static data region, and then use memory.init to load the static data into this location. But I think that sounds like a more complicated ABI than the one we have today, which is that a dynamic library exports its static data size (via metadata) and then imports __memory_base, which is a location that the dynamic linker allocates for it.

  4. Same for tableSize/tableAlign. It's not generally true that the main module knows anything up front about the dynamic libraries that it is loading. At least not in the dlopen case, but even in the non-dlopen case, the idea is that dynamic libraries can be updated/re-compiled/changed after the deployment of the main module.

  5. Regarding a simpler way to specify dependencies at link time. I've actually done a lot of work in the last couple of months on improving how emscripten's MAIN_MODULE=2 mode works. As of today you can just list your dependencies on the command line (like emcc -s MAIN_MODULE=2 libside1.wasm libside2.wasm main.c) and emscripten will take care of including precisely the dependencies of each of the side modules. So there should no longer be any need to add to either DEFAULT_LIBRARY_FUNCS_TO_INCLUDE or EXPORTED_FUNCTIONS. Both of these can now be precisely derived from looking at the side modules. Of course, -s MAIN_MODULE=1 still works for the conservative case and protects against future changes to the libraries by simply including everything, at the cost of code size.
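
The stub approach from point 1 can be sketched in a few lines of Python. This is only a toy model under loud assumptions: `LazyLoader`, `fake_load_library` and `libmath.wasm` are hypothetical names, the symbol-to-library map stands in for the extra metadata mentioned above, and loading is synchronous here, which is exactly the part that is hard on the web main thread.

```python
class LazyLoader:
    """Toy loader whose stubs load the providing library on first call."""

    def __init__(self, symbol_to_library, load_library):
        # symbol_to_library: hypothetical metadata mapping symbol -> library name.
        # load_library: callable taking a library name, returning {symbol: function}.
        self._symbol_to_library = symbol_to_library
        self._load_library = load_library
        self._loaded = {}      # library name -> exports dict
        self.load_count = 0    # how many libraries were actually loaded

    def stub_for(self, symbol):
        """Return a callable that resolves `symbol` lazily."""
        def stub(*args):
            library = self._symbol_to_library[symbol]
            if library not in self._loaded:
                # First call into this library: load it now (synchronously).
                self._loaded[library] = self._load_library(library)
                self.load_count += 1
            return self._loaded[library][symbol](*args)
        return stub


def fake_load_library(name):
    """Stand-in for fetching and instantiating a module by name."""
    libraries = {
        "libmath.wasm": {
            "add": lambda a, b: a + b,
            "mul": lambda a, b: a * b,
        },
    }
    return libraries[name]


loader = LazyLoader({"add": "libmath.wasm", "mul": "libmath.wasm"}, fake_load_library)
add = loader.stub_for("add")
mul = loader.stub_for("mul")
```

Nothing is loaded until `add` or `mul` is first called, and both stubs share a single load of the library. Note that the stub needs the symbol-to-library map to decide what to load, which a plain list of needed libraries does not provide.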


ghost commented May 25, 2021

> Again the main reason is that we want modules to be fully self describing.

Ah, so you are thinking that a module could be transferable to other apps and the app writer might not know all the details. I guess I was thinking custom libraries like sub-projects.

It looks like I got this working using STANDALONE_WASM and IMPORT_MEMORY, then using the exact same dylink code as what emscripten provides, except that I commented out the parseDylinkMetadata call since there is no metadata, and I added references to my main program (NOT using MAIN_MODULE) for memory: wasmMemory and __indirect_function_table: wasmTable. I am also not using LINKABLE or RELOCATABLE, because these options produce massive files (~50MB) with lots of AAAs and 000s. I am not yet using --closure because I want it to be slim without it too.

But as you explain that won't necessarily make a difference transferring a compiled module to someone else's app.

I wonder if this is worth patching as a possible improvement to MAIN_MODULE=2, SIDE_MODULE=2 maybe? Basically, I arrived at this conclusion when looking at the comments on using USE_PTHREADS=1 being the only way to inject our own memory. Maybe this only works because my application (Quake 3) is so well isolated. But my intention was to pretend my module is a pthread, when it is actually a dylink. I also had this idea that dylibs get their own process context and memory space, which they can choose to share or not. So I thought it was odd that even if a dylib doesn't want to share memory, sharing is what is assumed in the library_dylink.js code. Memory sharing must be on, I think.

EDIT: Also, I don't know if it's worth mentioning somewhere in examples, but I didn't know ALLOW_MEMORY_GROWTH also should be applied to side_modules.


sbc100 commented May 25, 2021

> Again the main reason is that we want modules to be fully self describing. Ah, so you are thinking that a module could be transferable to other apps and the app writer might not know all the details. I guess I was thinking custom libraries like sub-projects.

The model we are going for is similar to the native idea of a dynamic/shared library. So the classic example would be something like zlib, which is shared between apps, where the author of the app and the author of zlib are different. As you suggest, the module can also be a part of the app that is split out, such as libGameEngine.so. Either way, I think it is useful if the modules self-describe rather than have the dynamic loader embed knowledge about specific libraries.

> It looks like I got this working using STANDALONE_WASM and IMPORT_MEMORY, then using the exact same dylink code as what emscripten provides except, I comments out the parseDylinkMetadata since there is none, and I added references to my main program (NOT using MAIN_MODULE) for memory: wasmMemory and __indirect_function_table: wasmTable. I also am not using LINKABLE or RELOCATABLE, because these options produce massive files (~50MB) with lots of AAAs and 000s. I am not yet using --closure because I want it to be slim without it too.

I believe that LINKABLE is the flag you want to avoid if you want smaller files, since that ends up including everything and not doing any DCE. This is what MAIN_MODULE=1 enables, but MAIN_MODULE=2 avoids it.

If you don't use RELOCATABLE (AKA -fPIC) I don't think it's possible to link two libraries together, since any static data or table slots will overwrite each other. For example, without RELOCATABLE (AKA -fPIC) set, the static data from both modules will end up at address 1024, and one will be corrupted by the other. Put another way, without RELOCATABLE each module is built to be loaded at a fixed address, both in memory and in the table.
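
To make the collision concrete, here is a small Python sketch of how a loader might hand out distinct memory bases to relocatable modules from their declared static data size and alignment. It is a simplified bump allocator, not emscripten's actual strategy (which goes through malloc/sbrk); `StaticDataAllocator` and the sizes are hypothetical, and treating the alignment as a power-of-two exponent is an assumption about the metadata encoding.

```python
def align_up(address, alignment):
    """Round `address` up to a multiple of `alignment` (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


class StaticDataAllocator:
    """Hands out non-overlapping static data regions, one per module."""

    def __init__(self, first_free_address=1024):
        # 1024 mirrors the fixed address mentioned above for non-relocatable builds.
        self._next = first_free_address

    def place(self, memory_size, memory_align_log2):
        """Return the base address for a module's static data region."""
        # Assumption: the alignment is stored as log2 of the byte alignment.
        base = align_up(self._next, 1 << memory_align_log2)
        self._next = base + memory_size
        return base


def ranges_overlap(base_a, size_a, base_b, size_b):
    """True if [base_a, base_a + size_a) intersects [base_b, base_b + size_b)."""
    return base_a < base_b + size_b and base_b < base_a + size_a


allocator = StaticDataAllocator()
base_a = allocator.place(memory_size=100, memory_align_log2=4)
base_b = allocator.place(memory_size=64, memory_align_log2=4)
```

Each module would then receive its own base as `__memory_base`. Two non-relocatable modules correspond to both bases being fixed at 1024, which is the overlapping case.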

> EDIT: Also, I don't know if it's worth mentioning somewhere in examples, but I didn't know ALLOW_MEMORY_GROWTH also should be applied to side_modules.

Are you saying that ALLOW_MEMORY_GROWTH doesn't work today with SIDE_MODULE? It doesn't seem to give any error when I use those options together:

$ emcc hello.c -s SIDE_MODULE -s ALLOW_MEMORY_GROWTH


ghost commented Jun 9, 2021


ghost commented Jun 15, 2021

I just had a revelation. This is going to be used to obfuscate compiled/WebAssembly code and add to the mess that is already the entire tech industry.

What makes me hate this idea is that it promotes the notion that anyone can import functionality without actually understanding it. No one can see the code anymore, which means they don't have to take responsibility for how well it functions. This happens everywhere already: SourceTree freezes when you copy and paste, and all of JetBrains' products are terrible at uploading because they imported a library that counts files before it starts uploading. I specifically fell in love with JavaScript when I was 12 years old because I could copy little mouse animations into my Geocities website. Everywhere you look there is free and open source JavaScript. All the libraries Emscripten imports are free and open source. I've downloaded a dozen libs over the last few months and they all compiled flawlessly on my Mac, including libvpx, libcurl, musl, libopus, RmlUI, SDL, libjpeg, freetype, zlib. All open source, but if they weren't open source, they would be a complete mystery to me.

EDIT: If no one is interested in the moral reasons for open source, here are just the pragmatic facts. Fact 1) Google/FAANG companies already obfuscate their code. Fact 2) Self-describing libraries make code obfuscation faster and easier. Fact 3) If society wants smarter programmers, every time someone clicks "View Source" it should be an educational experience. Fact 4) This is how I treat my code and development process, and we deserve better.
