One Metadata to rule them all - Or how to implement offline signers

The Bybit "Hack" made it very clear again why blind-signing of transactions on offline signers is a very, very, very bad idea. In the Polkadot ecosystem we have implemented a solution that enables offline signers to decode transactions into some human-readable form leveraging the runtime metadata. Let's dive a little bit into how the Polkadot ecosystem achieves human-readable transactions.

What is the Metadata?

Polkadot provides an SDK for implementing parachains/rollups. The SDK separates the business logic (the runtime) from the node side. The runtime, aka the state transition function, is the defining part of your chain. As the functionality on every chain is different, these runtimes also look different. They expose different transactions, events, and storage items, and use different types for things like balances. To enable DApp builders to interact with all these different chains, the runtime exposes metadata about itself. The metadata exposes so much information that you can even feed an AI with it to auto-generate UIs and interactions with the chain (writing intents in human-readable format and letting the AI overlord do its job).

How to decode transactions on offline signers with Metadata?

One of the biggest problems with the metadata is its size. It easily grows to multiple megabytes per chain. In contrast, offline signers like Ledger have only a few kilobytes of main memory, so they could never load the entire metadata to decode a transaction. Other solutions like Polkadot Vault require holding your old phone steady for several minutes to transfer the entire metadata to the device via QR codes. It works, and they can decode the transactions this way, but the UX of transferring these metadata updates to the device is not the best.

Since the metadata is too big, it needed to lose some weight. First, all the information that is not needed to decode transactions (storage entries, events, etc.) was stripped. Next, the remaining metadata was chunked, so that each chunk represents one of the individual types required to decode a transaction. The idea behind this is that you don't need the types of transaction X when you want to decode transaction Y. The chunking also makes it possible to stream the chunks piece by piece: the device decodes a part of the transaction, shows it to the user, and repeats until the entire transaction has been decoded and shown. This is very important, as batch transactions in Polkadot can get quite big.
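To make the chunking idea a bit more concrete, here is a minimal sketch in Python. The type registry, its type names, and the functions chunks_for and stream_chunks are all made up for illustration; real runtime metadata is far richer. The sketch only shows that decoding one call needs the types reachable from that call and nothing else.

```python
# Hypothetical, heavily simplified type registry:
# type id -> (name, ids of the types it references).
TYPE_REGISTRY = {
    0: ("RuntimeCall", [1, 2]),
    1: ("BalancesCall::transfer", [3, 4]),
    2: ("StakingCall::bond", [4, 5]),
    3: ("AccountId", []),
    4: ("Balance", []),
    5: ("RewardDestination", [3]),
    6: ("Event", [3, 4]),  # stands in for information that gets stripped
}


def chunks_for(root_type_id, registry):
    """Return the ids of all types reachable from root_type_id, in first-visit order."""
    seen = []
    stack = [root_type_id]
    while stack:
        type_id = stack.pop()
        if type_id in seen:
            continue
        seen.append(type_id)
        _, referenced = registry[type_id]
        # Push in reverse so that referenced types are visited in declaration order.
        stack.extend(reversed(referenced))
    return seen


def stream_chunks(root_type_id, registry):
    """Yield one (type id, name) chunk at a time, as an online wallet might send them."""
    for type_id in chunks_for(root_type_id, registry):
        yield type_id, registry[type_id][0]


if __name__ == "__main__":
    # Decoding a transfer only needs the transfer call, AccountId and Balance.
    for type_id, name in stream_chunks(1, TYPE_REGISTRY):
        print(type_id, name)
```

In this toy registry, decoding the transfer call needs three chunks, and none of the staking types ever has to reach the device.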

Secure offline signing

Now you may want to ask: how does the offline device ensure that the chunks it receives from the online wallet are correct? If the offline device cannot ensure that the chunks are correct, the online wallet can construct chunks that cause the offline device to decode a transaction as Y for the user, but on-chain it will be executed as X. This is obviously a huge security hole and is basically what happened to the Bybit guys.

To solve this issue, there needs to be a trustless way to verify that the metadata chunks presented by the online wallet to the offline signer are indeed correct, and this check needs to happen on-chain before a transaction is executed. We have solved this by putting all the metadata chunks into one Merkle tree. Thus, the entire metadata required to decode all the possible transactions of a runtime can be represented by one hash (the metadata root hash). The offline signer requires the online wallet to send an inclusion proof for each chunk and verifies that these proofs and chunks are consistent with the metadata root hash. The offline signer then includes this metadata root hash in the signed data of the transaction. So, the offline signer not only signs the transaction itself but also the metadata hash.

Before applying such a signed transaction, the runtime verifies that the signed metadata root hash matches its own, which is computed at compile time for each runtime. If it does not match, the runtime rejects the transaction. An offline signer could still be tricked into showing some invalid data, as the online wallet is the one sending the hash and all the proofs, but a transaction signed this way would be rejected on-chain. The runtime thereby ensures that the user is not tricked into executing something different from what they approved.
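The flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scheme from RFC#78: BLAKE2b from the standard library is a stand-in hash function, the tree layout (odd nodes are promoted unchanged) is chosen for brevity, and all function names are hypothetical. The signature itself is left out; the sketch only compares the payload that would be signed.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash function (an assumption of this sketch).
    return hashlib.blake2b(data, digest_size=32).digest()


def leaf_hash(chunk: bytes) -> bytes:
    return h(b"\x00" + chunk)  # prefix separates leaves from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)


def build_levels(leaf_hashes):
    """Build all tree levels bottom-up; the last level holds only the root."""
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        prev, nxt = levels[-1], []
        for i in range(0, len(prev), 2):
            if i + 1 < len(prev):
                nxt.append(node_hash(prev[i], prev[i + 1]))
            else:
                nxt.append(prev[i])  # an unpaired node is promoted unchanged
        levels.append(nxt)
    return levels


def prove(levels, index):
    """Online wallet: collect the sibling hashes on the path from a leaf to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # (sibling hash, True if the sibling sits on the left)
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(chunk, proof, root):
    """Offline signer: recompute the root from a chunk and its inclusion proof."""
    node = leaf_hash(chunk)
    for sibling, sibling_is_left in proof:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == root


def signing_payload(call: bytes, metadata_root: bytes) -> bytes:
    # The signer commits to the call AND the metadata root it decoded against.
    return h(call + metadata_root)


def runtime_accepts(call, signed_payload, runtime_metadata_root):
    # The runtime rebuilds the payload with its own root. On a real chain the
    # signature check over this payload would fail if the roots differ.
    return signed_payload == signing_payload(call, runtime_metadata_root)
```

A short usage example with made-up chunk contents:

```python
chunks = [b"AccountId", b"Balance", b"Call::transfer", b"Call::bond", b"RewardDestination"]
levels = build_levels([leaf_hash(c) for c in chunks])
root = levels[-1][0]

# An honest chunk verifies, a forged chunk with the same proof does not.
print(verify(chunks[2], prove(levels, 2), root))     # True
print(verify(b"Call::evil", prove(levels, 2), root))  # False

# A payload signed against a forged root is rejected by the runtime check.
fake_root = build_levels([leaf_hash(b"Call::evil")])[-1][0]
print(runtime_accepts(b"call", signing_payload(b"call", fake_root), root))  # False
```

The last line is the important one: even if a malicious wallet convinces the device with a self-made tree, the resulting payload no longer matches what the runtime expects.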

All that is required is that the user trusts the offline signer device. If the device does not check the proofs correctly, the user could still be tricked into signing operations that are different from what they intended.

Can you be more technical?

A more technical explanation of how it works can be found in RFC#78, which specifies the implementation in detail.

Other implementations

Other ecosystems also provide solutions to the problem of blind signing. I'm aware of the following two:

Cosmos approached the problem of blind-signing with a textual representation of transactions. They have standardized some pretty-printing functions for their transactions. The downside of this solution is that every node on the network, when applying such a transaction, will need to create the textual representation. This is quite wasteful when you take into account the number of nodes in a blockchain network, all of which are required to redo this operation.
Sovereign SDK implemented a solution based on what we have implemented for the Polkadot ecosystem.

Future work

While the current implementation enables users to avoid blind-signing a transaction on their offline signer, the shown transaction is not really understandable by my grandmother. The information shown by the offline signer uses the same naming as the on-chain functions and doesn't provide any contextual information. So, a user doesn't know if a type is, for example, a balance or just some other integer value. It requires some technical understanding to be able to read these transactions. A future iteration that is already being discussed will include contextual information. This way, the developer can express to the offline signer that parameter_a is a Balance and parameter_b is something else.

After adding contextual information, it would be nice to attach some text for the user that explains what a function is doing. This already kind of exists today, because documentation can be attached. However, the documentation right now is developer-focused and not tailored for end users.

Recap

In the Polkadot ecosystem, we have implemented a solution that leverages the metadata of a runtime to decode transactions on offline devices. Alongside the generic decoding of transactions, we also introduced a way to prove that individual chunks of the metadata actually belong to the metadata of this particular runtime. The runtime then checks on-chain, before applying a transaction, that the transaction was built with the correct metadata. This solution also enabled the Polkadot ecosystem to have one unified Ledger app for all the different chains.