I wrote the original version of this module on the assumption that NodeJS would act as the host environment.
This was all fine and dandy — everything worked as expected and also functioned correctly when invoked from wasmtime.
However, when I attempted to run the program from wasmer, it generated a nonsense hash value... 🤔
After some investigation it turned out that wasmer's implementation of the WASI interface to the fd_read function contained an unexpected difference.
This update accounts for that difference and, together with the extra SIMD acceleration code, yields an optimized binary of only 3.2Kb (😎)
If you simply want to run this app from the published package then, assuming you have already installed wasmer, use one of the following commands.

To calculate the 256-bit hash, set the --command-name argument to sha256:

wasmer run chriswhealy/sha256 --mapdir <guest_dir>::<host_dir> --command-name=sha256 <host_dir>/<some_file_name>

To calculate the 224-bit hash, the module name remains the same, but change the value of the --command-name argument to sha224:

wasmer run chriswhealy/sha256 --mapdir <guest_dir>::<host_dir> --command-name=sha224 <host_dir>/<some_file_name>

In order for the sha256 module to have access to your local file system, the host environment must pre-open the relevant files or directories on behalf of the WASM module, where:

- <guest_dir> is the virtual directory name used by WebAssembly, and
- <host_dir> is the name of the actual directory in your file system
For example, let's say you have a copy of "War and Peace" in your home directory and you want to calculate this file's 256-bit hash:
wasmer run chriswhealy/sha256 --mapdir /::/Users/chris --command-name=sha256 war_and_peace.txt
11a5e2565ce9b182b980aff48ed1bb33d1278bbd77ee4f506729d0272cc7c6f7 war_and_peace.txt

or the 224-bit hash:
wasmer run chriswhealy/sha256 --mapdir /::/Users/chris --command-name=sha224 war_and_peace.txt
93df4316673fc8ca9d9ab46e5804eb0101ac5bf89b15129999586f25 war_and_peace.txt

The functionality of the SHA2 algorithm is split into two distinct phases:

- Phase one
  Take 64 bytes from the input file and, from them, generate a 256-byte message digest
- Phase two
  Process the message digest to update the 8, 32-bit hash values
These two phases must be repeated in strict sequential order until the entire input file has been consumed.
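As a rough illustration of phase one, here is a minimal JavaScript sketch of how a single 64-byte block can be expanded into 64 32-bit words (256 bytes). It follows the published SHA-256 definition, which calls this structure the "message schedule", and it is not taken from this module's WAT source. The function and variable names (`expandBlock`, `sigma0`, `sigma1`, `abc`) are illustrative assumptions.

```javascript
// Rotate a 32-bit value right by n bits
const rotr = (x, n) => ((x >>> n) | (x << (32 - n))) >>> 0;

// The two "small sigma" functions from the SHA-256 definition
const sigma0 = (x) => (rotr(x, 7) ^ rotr(x, 18) ^ (x >>> 3)) >>> 0;
const sigma1 = (x) => (rotr(x, 17) ^ rotr(x, 19) ^ (x >>> 10)) >>> 0;

// Phase one: expand one 64-byte block into 64 32-bit words (256 bytes).
// `block` is assumed to be a Uint8Array holding exactly 64 bytes.
function expandBlock(block) {
  const view = new DataView(block.buffer, block.byteOffset, 64);
  const w = new Uint32Array(64);

  // Words 0..15 are the block itself, read as big-endian 32-bit values
  for (let i = 0; i < 16; i++) w[i] = view.getUint32(i * 4, false);

  // Words 16..63 are each derived from four earlier words.
  // Storing into a Uint32Array wraps the sum modulo 2^32.
  for (let i = 16; i < 64; i++) {
    w[i] = sigma1(w[i - 2]) + w[i - 7] + sigma0(w[i - 15]) + w[i - 16];
  }
  return w;
}

// The single padded block for the three-byte message "abc"
const abc = new Uint8Array(64);
abc.set([0x61, 0x62, 0x63, 0x80]); // "abc" followed by the 0x80 padding byte
abc[63] = 24;                      // message length in bits

const schedule = expandBlock(abc);
console.log(schedule[0].toString(16), schedule[17].toString(16)); // prints: 61626380 f0000
```

Phase two then consumes these 64 words, one per round, to update the eight hash values. Because each round depends on the result of the previous one, only phase one lends itself to being done for several blocks at once.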
However, using SIMD instructions, it is possible to accelerate the processing in phase 1 by generating 4 message blocks at once.
The updated version of the code now operates like this:
- If there are at least 256 bytes remaining in the file:
  - Phase one
    Distribute the first 64 bytes into lane 0 of 16 i32x4 SIMD vectors, then the next 64 bytes into lane 1, the next 64 bytes into lane 2, etc.
    From these 16 i32x4 SIMD vectors, generate 4, 256-byte message digests concurrently.
  - Phase two
    Process the 4 message digests sequentially, updating the 8, 32-bit hash values.
- If there are fewer than 256 bytes remaining in the file, use the original sequential implementation.
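The lane layout described above can be sketched in JavaScript as follows. JavaScript has no i32x4 type, so each `Uint32Array(4)` merely stands in for one SIMD vector and the inner `lane` loop stands in for what a single SIMD instruction would do. All names here (`interleave`, `expandLanes`, `laneSchedule`, `chunk`) are illustrative assumptions, and this is a sketch of the idea, not a transcription of the module's WAT code.

```javascript
const rotr32 = (x, n) => ((x >>> n) | (x << (32 - n))) >>> 0;
const s0 = (x) => (rotr32(x, 7) ^ rotr32(x, 18) ^ (x >>> 3)) >>> 0;
const s1 = (x) => (rotr32(x, 17) ^ rotr32(x, 19) ^ (x >>> 10)) >>> 0;

// Distribute 256 bytes (four 64-byte blocks) across 16 four-lane vectors.
// Vector n, lane k holds word n of block k.
function interleave(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, 256);
  const vecs = [];
  for (let word = 0; word < 16; word++) {
    const v = new Uint32Array(4);
    for (let lane = 0; lane < 4; lane++) {
      v[lane] = view.getUint32(lane * 64 + word * 4, false); // big-endian
    }
    vecs.push(v);
  }
  return vecs;
}

// Expand the 16 vectors to 64 vectors. The same arithmetic is applied to
// every lane, so the four blocks are expanded side by side.
function expandLanes(vecs) {
  const w = vecs.map((v) => Uint32Array.from(v));
  for (let i = 16; i < 64; i++) {
    const v = new Uint32Array(4);
    for (let lane = 0; lane < 4; lane++) {
      v[lane] =
        s1(w[i - 2][lane]) + w[i - 7][lane] + s0(w[i - 15][lane]) + w[i - 16][lane];
    }
    w.push(v);
  }
  return w;
}

// Pull the 64 words belonging to one block back out of its lane
const laneSchedule = (w, lane) => Uint32Array.from(w, (v) => v[lane]);

// Demo input: 256 bytes whose value equals their offset
const chunk = Uint8Array.from({ length: 256 }, (_, i) => i);
const vectors = interleave(chunk);
const expanded = expandLanes(vectors);

console.log(vectors[0][1].toString(16)); // prints: 40414243 (word 0 of block 1)
console.log(expanded.length, laneSchedule(expanded, 3).length); // prints: 64 64
```

Phase two would then walk through lanes 0 to 3 in order, since the hash values produced by one block are the starting point for the next.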
Install NodeJS plus one or more of these WebAssembly Host environments:
- Wasmer: https://docs.wasmer.io/runtime
- Wasmtime: https://wasmtime.dev/
- Wazero: https://wazero.io/
If you wish to run this app locally, clone the repo into some local directory, change into that directory, then:
$ npm run build:prod
> wasm_sha256@2.4.1 build:prod
> npm run compile:prod && npm run opt:prod
> wasm_sha256@2.4.1 compile:prod
> ./utils/strip_debug.mjs && wat2wasm ./src/sha256.prod.wat -o ./bin/sha256.prod.wasm
> wasm_sha256@2.4.1 opt:prod
> wasm-opt ./bin/sha256.prod.wasm --enable-simd --enable-multivalue --enable-bulk-memory -O4 -o ./bin/sha256.prod.opt.wasm

A WASM module only has access to the files or directories preopened for it by the host environment. This means that when invoking the WASM module, we must tell the host environment which files or directories need to be preopened.
The syntax for specifying such preopened resources varies between the different runtimes.
The JavaScript module invoked by NodeJS does not use very sophisticated logic for determining the location of the target file.
Instead, it assumes the current working directory is the one containing sha256sum.mjs and the WASI instance then preopens process.cwd().
This means the target file must live in (or beneath) that directory.
By default, ./sha256sum.mjs runs the prod version of the WebAssembly module.
256-bit Hash
$ ./sha256sum.mjs sha256 ./tests/war_and_peace.txt
11a5e2565ce9b182b980aff48ed1bb33d1278bbd77ee4f506729d0272cc7c6f7 ./tests/war_and_peace.txt

224-bit Hash
$ ./sha256sum.mjs sha224 ./tests/war_and_peace.txt
93df4316673fc8ca9d9ab46e5804eb0101ac5bf89b15129999586f25 ./tests/war_and_peace.txt

If present in the CWD, wasmer will read wasmer.toml to discover which WASM module is to be run.
In such cases, you need only specify wasmer run . where the meaning of . will be derived from the contents of wasmer.toml.
wasmer has both a --dir and a --mapdir argument, but you should always use the --mapdir argument.
See below for why this is the case.
The value passed to the --mapdir argument is in the form <guest_dir>::<host_dir>.
IMPORTANT
You cannot specify shortcuts such as . for the value of <guest_dir>, nor ~ for the value of <host_dir>.
Such shortcuts are expanded only by the shell, not by wasmer.
Since <guest_dir> identifies the name of the WebAssembly module's virtual root directory, you would typically specify this as /.
For <host_dir>, wasmer does not evaluate the shell shortcut to your home directory (~).
Instead, to grant access to your home directory, use the fully qualified path name, e.g. /Users/chris/.
In this example, the CWD contains the directory ./tests which then contains war_and_peace.txt.
Since ./tests becomes WASM's virtual root directory, the file name war_and_peace.txt does not need to be prefixed with the directory name.
256-bit Hash
$ wasmer run . --mapdir /::./tests --command-name=sha256 -- war_and_peace.txt
11a5e2565ce9b182b980aff48ed1bb33d1278bbd77ee4f506729d0272cc7c6f7 war_and_peace.txt

224-bit Hash
$ wasmer run . --mapdir /::./tests --command-name=sha224 -- war_and_peace.txt
93df4316673fc8ca9d9ab46e5804eb0101ac5bf89b15129999586f25 war_and_peace.txt

The same logic used by wasmer applies when wasmtime creates WASM's virtual root directory.
In this example, the --dir <host_dir> argument uses ./tests as the virtual root and from within WASM, / is implied.
256-bit Hash
$ wasmtime --dir ./tests ./bin/sha256.prod.opt.wasm -- sha256 war_and_peace.txt
11a5e2565ce9b182b980aff48ed1bb33d1278bbd77ee4f506729d0272cc7c6f7 war_and_peace.txt

224-bit Hash
$ wasmtime --dir ./tests ./bin/sha256.prod.opt.wasm -- sha224 war_and_peace.txt
93df4316673fc8ca9d9ab46e5804eb0101ac5bf89b15129999586f25 war_and_peace.txt

When using wazero, the --mount argument uses a syntax similar to wasmer's --mapdir argument.
256-bit Hash
$ wazero run -mount=.:. ./bin/sha256.prod.opt.wasm sha256 ./tests/war_and_peace.txt
11a5e2565ce9b182b980aff48ed1bb33d1278bbd77ee4f506729d0272cc7c6f7 ./tests/war_and_peace.txt

224-bit Hash
$ wazero run -mount=.:. ./bin/sha256.prod.opt.wasm sha224 ./tests/war_and_peace.txt
93df4316673fc8ca9d9ab46e5804eb0101ac5bf89b15129999586f25 war_and_peace.txt

- NodeJS passes a minimum of two values as command line arguments to the WASM module, but host environments such as wasmer or wasmtime pass a minimum of one.
  This program therefore assumes that the algorithm name ("sha256" | "sha224") will be the second last argument, and the filename will be the last.
- When calling this module via the Wasmer CLI, the --dir argument does not pre-open the directory in which the target files live. See here for an explanation of this behaviour.
  Instead, you need to use the --mapdir argument.
- When calling fd_read, some WebAssembly host environments such as NodeJS or Wasmtime allow you to specify a buffer size up to 4Gb. This means that the entire file will be returned in a single call to fd_read.
  However, wasmer imposes a 2Mb upper limit on the buffer size (see note 1). Therefore, in order to read files larger than 2Mb, multiple calls to fd_read are required.
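Two of the notes above describe logic that is easy to sketch: picking the algorithm and file name from the end of the argument list, and reading a file in bounded chunks until nothing is left. The JavaScript below is only an illustration of those two ideas. `pickArgs`, `readAll`, `MAX_READ` and the `readChunk` callback are hypothetical names, and `readChunk` is a stand-in for a single fd_read call, not the real WASI signature. The sample argument lists are also made up.

```javascript
// Take the last two arguments, however many the host puts in front of them
function pickArgs(argv) {
  if (argv.length < 2) throw new Error("expected <sha256|sha224> <file>");
  const [algorithm, fileName] = argv.slice(-2);
  if (algorithm !== "sha256" && algorithm !== "sha224") {
    throw new Error("unknown algorithm: " + algorithm);
  }
  return { algorithm, fileName };
}

// Assumed per-call limit, taken from the 2Mb figure quoted above
const MAX_READ = 2 * 1024 * 1024;

// readChunk(maxBytes) must return a Uint8Array of at most maxBytes bytes,
// and an empty one at end of file. Keep calling it until that happens.
function readAll(readChunk, maxBytes = MAX_READ) {
  const chunks = [];
  let total = 0;
  for (;;) {
    const chunk = readChunk(maxBytes);
    if (chunk.length === 0) break; // nothing left to read
    chunks.push(chunk);
    total += chunk.length;
  }
  // Join the chunks purely so the demo can inspect the result
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Demo: a 5-byte "file" read with a 2-byte limit
const fakeFile = Uint8Array.of(1, 2, 3, 4, 5);
let position = 0;
let calls = 0;
const fakeRead = (maxBytes) => {
  calls++;
  const chunk = fakeFile.subarray(position, position + maxBytes);
  position += chunk.length;
  return chunk;
};

const contents = readAll(fakeRead, 2);
console.log(contents.length, calls); // prints: 5 4
console.log(pickArgs(["node", "sha256sum.mjs", "sha224", "f.txt"]).algorithm); // prints: sha224
```

A hashing program does not need to hold the whole file in memory; each chunk could equally be fed to the hash as soon as it arrives. The concatenation here only keeps the demo short.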
Any calls to functions such as $hexdump, $write_msg or $write_step are delimited by the preprocessor markers ;;@debug-start and ;;@debug-end.
To compile for production, such function calls are removed from the source code by first running ./utils/strip_debug.mjs.
This produces a "production" version of the WAT source code (./src/sha256.prod.wat) from which these delimiters and all the code between them have been removed.
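To make the idea concrete, here is a minimal sketch of such a stripping step. It is not the project's actual utility script. `stripDebug` and the sample WAT fragment are illustrative assumptions, and the sketch assumes each marker sits on a line of its own.

```javascript
// Drop every line from a ;;@debug-start marker through the matching
// ;;@debug-end marker, markers included
function stripDebug(watSource) {
  const kept = [];
  let inDebug = false;
  for (const line of watSource.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith(";;@debug-start")) {
      inDebug = true;
      continue;
    }
    if (trimmed.startsWith(";;@debug-end")) {
      inDebug = false;
      continue;
    }
    if (!inDebug) kept.push(line);
  }
  return kept.join("\n");
}

// Made-up WAT fragment containing one debug call
const sample = [
  "(func $demo",
  "  ;;@debug-start",
  "  (call $write_msg (i32.const 1))",
  "  ;;@debug-end",
  "  (nop)",
  ")",
].join("\n");

console.log(stripDebug(sample));
```

Run against the sample, only the `(func $demo`, `(nop)` and closing `)` lines survive.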
In order to understand the inner workings of the SHA256 algorithm itself, take a look at this excellent SHA256 Algorithm website. Thanks @manceraio!
The SHA224 algorithm performs exactly the same calculation as the SHA256 algorithm, but it uses different starting values for the eight internal hash values and, when the calculation has finished, prints only the first 7 hash values, not all 8.
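The two differences can be captured in a few lines of JavaScript. The starting values below are the ones published in the SHA-2 standard. The names `INITIAL_HASH`, `OUTPUT_WORDS` and `formatDigest` are illustrative assumptions and do not come from this module.

```javascript
// Starting values for the eight internal hash values
const INITIAL_HASH = {
  sha256: [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
           0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19],
  sha224: [0xc1059ed8, 0x367cd507, 0x3070dd17, 0xf70e5939,
           0xffc00b31, 0x68581511, 0x64f98fa7, 0xbefa4fa4],
};

// How many of the eight hash values are printed at the end
const OUTPUT_WORDS = { sha256: 8, sha224: 7 };

// Print the leading hash values as 8 hex digits each
function formatDigest(algorithm, hashWords) {
  return Array.from(hashWords.slice(0, OUTPUT_WORDS[algorithm]), (w) =>
    (w >>> 0).toString(16).padStart(8, "0")
  ).join("");
}

// Formatting the starting values themselves, just to show the two lengths
console.log(formatDigest("sha256", INITIAL_HASH.sha256).length); // prints: 64
console.log(formatDigest("sha224", INITIAL_HASH.sha224).length); // prints: 56
```

Because the starting values differ, a SHA224 hash is not simply the first 56 characters of the SHA256 hash of the same file, as the two "War and Peace" results shown earlier confirm.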
An explanation of how this updated version has been implemented can be found here
1) I have only tested this on macOS