Authors: Vaibhav Aparimit & Arun Nalla
Also, compilers have a lot of cool optimizations built in. Imagine you write `s = a + b` and a is 0. There is no point in performing this computation; the compiler can simply reference b's memory wherever s is used. Without compiler optimizations, there is a lot of overhead: reserving memory for s, loading a and b, performing the addition. Imagine doing this a million times in a loop. Not cool, not cool at all!
WebAssembly is, well, an assembly-like language that you write for the web. I mean, you don’t write assembly code directly! Instead, it acts as a compilation target for languages like C, C++ and Rust. What this means is that you can take a C++ codebase, compile it to WebAssembly, and run it in the browser at “near-native” speeds.
Who uses WebAssembly
A lot of companies use WebAssembly to power their front-end experience. Notable examples include:
- Google Earth, a large C++ codebase, now runs on the web because of WebAssembly
- AutoCAD ported their 30-year-old codebase to the web using WebAssembly
- Doom 3 was ported to the web with WebAssembly
- 1Password used WebAssembly to speed up their plugin
- Figma, a prototyping tool for designers, used WebAssembly to improve load time
How does WebAssembly work
A browser can run on a number of different processors from desktop computers to smartphones and tablets. Distributing a compiled version of the WebAssembly code for each potential processor would be a pretty bad strategy.
Instead, what happens is that the high-level code written in C++ is converted into an intermediate representation (IR), also known as the WebAssembly binary. This part of the compiler is known as the frontend.
The bytecode in the Wasm binary isn't machine code yet. It's a set of virtual instructions that browsers supporting WebAssembly understand. When the Wasm binary is loaded into such a browser, it is compiled into the machine code of the device the browser is running on.
By the way, some of you might have realised that this is the LLVM way of doing things, different from the traditional gcc way. That's astute of you: Emscripten (the most mature toolchain in the Wasm world) is based exactly on the LLVM architecture. LLVM itself was inspired by the awesomeness of Java's IR, i.e., bytecode. Ah, the beauty when ideas cross-pollinate ❤️.
Our own WASM experiment in the NetOpt pod
The NetOpt pod within Locus is building the next-generation intelligent supply-chain optimization product, which plans the flow and inventory of products across a complex network of factories, container ships, warehouses, and retailers around the world, without any human intervention.
The NetOpt pod has use cases that entail uploading and validating large CSVs (close to a million records). We did a small PoC and achieved CSV upload + validation at that scale in 16 seconds. Yup, 16 seconds. All in the browser!
And we didn't even use faster C++ regex libraries, faster parsers, C++ multithreading, or web workers; any of these could have reduced the time significantly further.
- Convert CSV to JSON, serialize the JSON, and pass it to WebAssembly: We chose this approach because it seemed simple enough. It delegated the entire parsing and validation of data to C++, which resulted in a more loosely coupled system. Also, there were plenty of libraries like Papa Parse already available that helped us implement it.
So, the final WASM approach we used had the following steps:
- User uploads data on the UI
- Convert CSV to JSON
- Serialize JSON and pass it to WASM
- Deserialize JSON in WASM
- Run all validations (validation logic was written in C++)
- WASM returns the data after validation, classifying each record as VALID, TEMPORARY, or INVALID
One ‘gotcha’ you ought to internalize before embarking on your WASM journey is that the time taken by WASM varies based on the client hardware. Consequently, your performance numbers can never be totally deterministic. So, it’s best to benchmark your WASM application’s performance across a representative sample of your customers’ machines.
The other thing to keep in mind is the optimization flag. LLVM/Clang has a lot of cool optimizations built in. An important reason we saw so much performance improvement is that our C++ code used regex, which can generate thousands of lines of assembly code without optimizations. So, LLVM's optimizations helped a lot here.
Overall WASM gave us great offline capabilities. No need for any backend API calls.
Ultimately, in this case, we did not use WASM for the large CSV validations, only because we also had to persist the invalidated entities. We had initially considered maintaining the dirty, invalidated entities in IndexedDB, but that could have resulted in unintended side effects, like a user clearing their browser storage and losing the data.
Having said that, we thoroughly enjoyed our WebAssembly journey and have identified areas in the product where WebAssembly can dramatically improve our frontend performance.
As part of our initial research, we were surprised to learn that not many tech companies have adopted WASM in their front-end stack. Through this blog, we wanted to evangelize this tech to the dev community.
One last thing, we are always looking for great engineering talent to join us. Do check out our careers page. We would love to hear your story.
Stay Tuned for More Updates!