
Add wasm to wat backend#4

Merged
ubaidsk merged 26 commits into certik:master from ubaidsk:add_wasm_to_wat
Jun 1, 2022

@ubaidsk
Collaborator

@ubaidsk ubaidsk commented May 24, 2022

This adds a wasm_to_wat backend. As this is the first version, it might not be the best/optimal in terms of performance or code size.

Notes:

  1. Currently, it uses std::vector instead of our custom Vec.
  2. It supports the operations supported by the current lfortran wasm backend.
  3. A few parts of the code are commented out (as they are not currently used), for example: https://github.com/Shaikh-Ubaid/test_wasm/blob/dd66795af2c832a4aaed7bf56addd7df1bc46e64/wasm_to_wat.hpp#L15-L26. They are kept to indicate the possibility of having them.
  4. WAT_DEBUG and DEBUG macros are provided at the top. They were useful to me for debugging the decoding of wasm. In the final version (the one we will add to LFortran), we should probably remove them.
  5. For testing purposes, the files wat_test.f90 and test2.wasm are provided. Detailed steps for testing/executing wasm_to_wat are below.

Steps to test/execute the wasm_to_wat backend

  1. Use LFortran to generate the wasm file for wat_test.f90 with the following command (assuming the lfortran binary is in /home/user/bin and /home/user/bin is included in the PATH variable; more details here):
lfortran wat_test.f90 --backend wasm
  2. Rename a.out to test2.wasm.
mv a.out test2.wasm
  3. Compile wasm_to_wat.cpp.
g++ wasm_to_wat.cpp
  4. Execute the generated a.out.
./a.out

Output:

(module
    (func $0
        (param i32) (result i32)
        (local i32)
        local.get 0
        local.get 0
        i32.mul
        local.set 1
        local.get 1
        return
    )
    (func $1
        (param i32 i32) (result i32)
        (local i32)
        local.get 0
        local.get 1
        i32.add
        local.set 2
        local.get 2
        return
    )
    (func $2
        (param i32) (result i32)
        (local i32 i32)
        i32.const 3
        local.set 2
        local.get 2
        local.get 0
        call 0
        i32.mul
        local.set 1
        local.get 1
        return
    )
    (func $3
        (param i32 i32) (result i32)
        (local i32)
        local.get 0
        local.get 1
        i32.add
        local.set 2
        local.get 2
        return
    )
    (export "a_sqr" (func $0))
    (export "add" (func $1))
    (export "computecirclearea" (func $2))
    (export "my_add" (func $3))
)

For testing the wat printed on the console:

  1. Paste the obtained WAT into the WAT box on https://webassembly.github.io/wabt/demo/wat2wasm/
  2. Use the following code in the JS box to export and test the functions that were defined in wat_test.f90:
const wasmInstance =
      new WebAssembly.Instance(wasmModule, {});
const { a_sqr, add, computecirclearea, my_add } = wasmInstance.exports;

console.log(add(-12, 15));
console.log(my_add(-12, 15));
console.log(a_sqr(4));
console.log(computecirclearea(5));
console.log("Success!");
  3. The output appears in the JS LOG box and is as follows:
3
3
16
75
Success!
  4. It can be verified that the output is the same as in https://gitlab.com/lfortran/lfortran/-/merge_requests/1713#note_929581644

Please review and share feedback.

@ubaidsk
Collaborator Author

ubaidsk commented May 24, 2022

Doubts:

  1. In LFortran, where do we add the wasm_to_wat.cpp and wasm_to_wat.hpp files (or the code in them)?
  2. How are we planning to call the get_wasm() function?
  3. Is the usage of classes (and function overriding for printing) for Instruction fine?

@ubaidsk ubaidsk changed the title from "Add wasm to wat" to "Add wasm to wat backend" May 24, 2022
@certik
Owner

certik commented May 24, 2022

Excellent, great job!

Let's brainstorm whether this can be simplified. I think WASM is a file format, just like ELF or Mach-O. So you read the sections and store them in some internal data structure, as you do. All the data structures are really simple, such as exports and func_types. When it gets to the instructions section, however, I would not create an intermediate structure to hold them, as there will be a lot of them. Rather, why not just call a virtual function / visitor in the switch that decodes the WASM? Then you simply subclass it in the WAT writer and implement the visit methods. I would use the same CRTP pattern that LFortran uses, which is effectively compile-time polymorphism, so there is no overhead. For large WASM files, I think this will be optimal: the list of functions can, I think, be read efficiently; it's the function bodies (instructions) that will be very long. We effectively treat the WASM binary as our representation, and just do a switch over the instructions and immediately call the WAT writer code.

Then, in the WASM->Mach-O/arm backend, I simply implement a different writer.

@certik
Owner

certik commented May 24, 2022

To be specific, this code:

                case 0x10: {  // call func
                    uint32_t index = read_unsinged_num(offset);
                    codes[i].instructions.push_back(
                        std::make_unique<CallInst>(cur_byte, index));
                    break;
                }

Is very slow, because make_unique is slow. Even with our fast custom allocator, it is slower than just this:

                case 0x10: {  // call func
                    uint32_t index = read_unsinged_num(offset);
                    self().visit_CallInst(cur_byte, index);
                    break;
                }

If you wanted to create an intermediate representation, you can still create a subclass visitor that implements visit_CallInst() as follows:

void visit_CallInst(uint32_t cur_byte, uint32_t index) {
    codes[i].instructions.push_back(
        std::make_unique<CallInst>(cur_byte, index));
}

But you can also simply print out WAT right away in the visitor. With the CRTP pattern, this approach has no overhead.

@ubaidsk
Collaborator Author

ubaidsk commented May 24, 2022

> rather why not just call a virtual function / visitor in the switch that decodes the WASM?

doubt: do we need to implement the visitor just for Instruction, or for each of FuncType, Export, Instruction, and Code?

> I would use the same CRTP pattern that LFortran uses

doubt: do we need to write asdl code that generates the visitors, or can we define the visitors ourselves directly, without involving asdl?

> But you can also just simply print out WAT right away in the visitor

wasm and WAT differ slightly. For example, in WAT the function signature and its body are together, whereas in wasm the function types are first declared in the type section, then referenced in the function section, and then their code (that is, their body) is defined in the code section. I am unsure whether it is possible to print WAT directly while visiting wasm. I will give it a try and share my attempt/experience.

@certik
Owner

certik commented May 24, 2022

Only for Instruction.

No asdl.

What I am imagining is that you load function_types as you do currently. But then when you iterate over instructions (for each function), it would call the visitor.

So the WASM->WAT would create the function signature, as it does now, using the function_types. But then when it is printing the instructions, it would use the visitor.

I think this design is equivalent to the Mach-O (and ELF) reader/writer: you have to read the format, and you load all the metadata in memory (Mach-O has info about the sections, symbols etc.; WASM has info about function_types / signatures). But then when you disassemble the actual machine instructions (whether in WASM, Mach-O, or ELF), you do it via the visitor. And when we print WASM/ELF/Mach-O, I think the assembler is only used for the actual instructions; all the metadata is handled separately.

@ubaidsk
Collaborator Author

ubaidsk commented May 26, 2022

I tried using a visitor-like pattern, although I am not sure whether this is how I was supposed to do it. Please review and share feedback.

If this code differs significantly from what we actually need, please let me know.

@certik
Owner

certik commented May 26, 2022

Rather than:

void visit_ControlInstruction(std::string &result, std::string &indent,
                              uint32_t &offset) {
    uint8_t cur_byte = wasm_bytes[offset++];
    switch (cur_byte) {
        case 0x0F: {  // return
            result += indent + "return";
            break;
        }
        case 0x10: {  // call function
            uint32_t func_index = read_unsinged_num(offset);
            result += indent + "call " + std::to_string(func_index);
            break;
        }
        default: {
            std::cout << "Control Instruction (" << std::hex << cur_byte
                      << std::dec;
            std::cout << ") Not yet supported" << std::endl;
            break;
        }
    }

    return;
}

I would have:

class WATVisitor : public ASR::BaseWASMVisitor<WATVisitor>
{
    std::string src;
...
    void visit_Return() {
        src += indent + "return";
    }
    
    void visit_FunctionCall(uint32_t func_index) {
        src += indent + "call " + std::to_string(func_index);
    }

    void visit_LocalSet(uint32_t local_index) {
        src += indent + "local.set " + std::to_string(local_index);
    }
};

@certik
Owner

certik commented May 26, 2022

The decoder for just the instructions (everything else is loaded ahead of time) would at its core contain just:

template <class Derived>
class BaseWASMVisitor
{
private:
    Derived& self() { return static_cast<Derived&>(*this); }
public:

    void decode_function_instructions(uint32_t offset, Vec<uint8_t> wasm_bytes) {
        uint8_t cur_byte = wasm_bytes[offset++];
        while (cur_byte != 0x0B) {
            switch (cur_byte) {
...
                case 0x21: {  // local.set
                    uint32_t index = read_unsinged_num(offset);
                    self().visit_LocalSet(index);
                    break;
                }
                case 0x6A: {  // i32.add
                    self().visit_I32Add();
                    break;
                }
                case 0x6B: {  // i32.sub
                    self().visit_I32Sub();
                    break;
                }
                case 0x6C: {  // i32.mul
                    self().visit_I32Mul();
                    break;
                }
                case 0x6D: {  // i32.div_s
                    self().visit_I32Div();
                    break;
                }
                case 0x10: {  // call func
                    uint32_t index = read_unsinged_num(offset);
                    self().visit_FunctionCall(index);
                    break;
                }
                case 0x0F: {  // return
                    self().visit_Return();
                    break;
                }
                default: {
                    std::cout << "Error: Instruction " << std::to_string(cur_byte)
                              << " not supported" << std::endl;
                    exit(1);
                }
            }
            cur_byte = wasm_bytes[offset++];  // fetch the next opcode
        }
    }

    void visit_Return() { throw LFortran::LFortranException("visit_Return() not implemented"); }
    void visit_FunctionCall(uint32_t /*func_index*/) { throw LFortran::LFortranException("visit_FunctionCall() not implemented"); }

};

wasm_to_wat.cpp Outdated
result += ")";

std::string inst_indent = "\n ";
visit_Instructions(result, inst_indent, codes[i].insts_start_index);
Owner

@certik certik May 26, 2022


Suggested change
- visit_Instructions(result, inst_indent, codes[i].insts_start_index);
+ {
+     WASMVisitor v;
+     v.decode_function_instructions(codes[i].insts_start_index, wasm_bytes);
+     result += v.src;
+ }

@certik
Owner

certik commented May 26, 2022

I don't think we need to, but if we ever wanted an intermediate representation (IR), we can do it like this, by creating a separate visitor:

class IRVisitor : public ASR::BaseWASMVisitor<IRVisitor>
{
    std::string src;
...
    void visit_Return() {
        codes.push_back(ReturnInst());
    }
    
    void visit_FunctionCall(uint32_t func_index) {
        codes.push_back(FunctionCall(func_index));
    }

    void visit_LocalSet(uint32_t local_index) {
        codes.push_back(LocalSet(local_index));
    }
};

@ubaidsk
Collaborator Author

ubaidsk commented May 30, 2022

Most of the code in the commit "Define BaseWASMVisitor Class and WATVisitor Child Class" was generated using this script. If repetitive changes are needed, the same script can hopefully be reused to regenerate the necessary code. Note: after running the script, the table index names need to be updated, as done in the commit "Mark table indices as src and des in visit_TableCopy()".

Please review and share feedback.

@certik
Owner

certik commented May 30, 2022

Awesome, this looks good.

Now, why don't we extract BaseWASMVisitor into a standalone header file, let's say wasm_visitor.h, and then completely autogenerate this file using your Python file. This is perfectly in line with how we autogenerate asr.h and ast.h, and in general our approach has been that if something can be generated from a simpler / smaller description, we should generate it.

Let's put the Python script in, make it reasonably "maintainable", and we can run it as part of build0.sh.

@ubaidsk
Collaborator Author

ubaidsk commented May 30, 2022

> Awesome, this looks good.
>
> Now, why don't we extract BaseWASMVisitor into a standalone header file, let's say wasm_visitor.h, and then completely autogenerate this file using your Python file. This is perfectly in line with how we autogenerate asr.h and ast.h, and in general our approach has been that if something can be generated from a simpler / smaller description, we should generate it.
>
> Let's put the Python script in, make it reasonably "maintainable", and we can run it as part of build0.sh.

Got it. On it.

@ubaidsk
Collaborator Author

ubaidsk commented Jun 1, 2022

A few instructions in wasm_instructions.txt are commented out (lines starting with --) and are not currently supported by the script, as they deal with lists/vectors. I will add support for them in further iterations.

Also, I collected the necessary variables, structs, and functions into a new file called wasm_utils.h.

Please review and share feedback.

@certik
Owner

certik commented Jun 1, 2022

I think this looks excellent! I think we can probably merge it as is.

@certik
Owner

certik commented Jun 1, 2022

I added you in, so you should be able to merge it.

@certik
Owner

certik commented Jun 1, 2022

How did you create the wasm_instructions.txt file?

@ubaidsk
Collaborator Author

ubaidsk commented Jun 1, 2022

> How did you create the wasm_instructions.txt file?

By (slightly careful) copy-pasting instructions from pages 136-147 of https://webassembly.github.io/spec/core/_download/WebAssembly.pdf.

After that, the parameters were modified to add their types. This was mostly done with VS Code's multiple-selection features (find and replace/select all, or Ctrl+D to capture all instances of parameters ending with idx). All indexes are of type u32 except for laneidx, which is of type u8.

@ubaidsk
Collaborator Author

ubaidsk commented Jun 1, 2022

> I added you in, so you should be able to merge it.

Yes, I received an invitation over mail. I accepted it. Thank you so much for the access.

@ubaidsk ubaidsk merged commit 188d989 into certik:master Jun 1, 2022
@certik
Owner

certik commented Jun 1, 2022

Excellent, very cool. Thanks for creating the file, that's a great simple reference and we generate things from it.
