From the Labs: Handling Blockchain State

Every blockchain has some state associated with it. That state can be thought of as a collection of key/value pairs. This state might represent something simple, like how many tokens each account has. The state might represent something more complicated like the code of a smart contract. No matter what this state represents, everyone on the blockchain has to reach agreement. If I thought that you had ten tokens, but you thought that you had a million tokens, we obviously couldn’t get much done until we figured out which of us was right.

Unfortunately reaching agreement about the state is quite a big problem. One issue is how to set things up so that bad participants can never trick trustworthy participants into accepting a bad state. The other issue is that the state is very large. How can participants reach agreement quickly if even transmitting the state takes a long time? In this article we will talk about one possible solution to the latter issue, the Merkle trie.

What is a Merkle Trie?

A Merkle trie is a data structure that combines the features of two other data structures. The first is a Merkle tree and the second is a radix trie.

What is a Merkle Tree?

A Merkle tree is a special type of tree. The special feature of Merkle trees is that they generate “fingerprints” for each part of the tree, as well as the whole tree. Just like a real fingerprint, the ones generated by the tree are fairly unique. Whenever you change any data in the tree, the fingerprint changes as well.

This provides us with a solution to the problem of communicating what each participant thinks the state currently is. You can take the fingerprint that represents the entire tree and send it to other blockchain participants. If everyone has generated the same fingerprint for the state data, then they must have the same state. If the fingerprints are different then the state must have been different.

What is a Radix Trie?

A radix trie is a special type of trie (a trie is a special type of tree) that makes storing and retrieving key/value pairs easy. A Merkle tree is really good at allowing you to prove that some data is part of the tree, but it does not make it easy to look up the values associated with keys, so we combine the radix trie and the Merkle tree to get something that can do both.

How does it all work?

The exact details of how a Merkle trie works can vary as there are many different ways to implement similar functionality. Trying to explain all of these different ways is beyond the scope of this article, so the rest of this article will describe just one particular implementation.

It’s nodes all the way down

The Merkle trie is made up of just one object, nodes. Each node is a little chunk of data that is stored within the trie. The data each node contains is key information, value information (if one exists for the node’s key), and the fingerprints of any children nodes.

A visual representation of a node that will be used in future diagrams.

The nodes directly “under” a node are called children nodes. The node “above” a node is called the parent node. Each node has just one parent, but can have many children. There is one special node in the trie that has no parent. This is the node at the “top” of the trie and it is called the root node.

A visual representation of a Merkle trie. The node at the top is the root node. In this example, it has two children, marked Node A and Node B. Node A has two children, while Node B has three children. The parent of Node C is Node B. The parent of Node B is the root node. [Note: In this discussion the maximum number of children will be 4 to make the diagrams simpler. In reality most implementations would choose a larger number of children.]
Generating fingerprints

How do you use a Merkle trie to generate the fingerprint of each node? The most common way is to use a cryptographic hash function. A lot of different cryptographic hash functions exist, but they all share two features important to Merkle tries.

The first feature of a hash function is that it takes in data as input and outputs a number of fixed size that represents that data. Usually the number that is output is significantly smaller in size than the input data. For example, SHA-256, a popular hash function, takes inputs of any size and generates an output number 256 bits long. The function is very sensitive, so small changes in the input will lead to large differences in the output.

An example of hashing two inputs. In this case, the 400+ character long texts get turned into 64 characters. The difference between the first input and the second is only the first letter, L to M, but the SHA-256 outputs are completely different.

The number that the hash function outputs is the fingerprint that we want. For each node, we apply the hash function to the data in the node and use the generated number as the node’s fingerprint. The fingerprints of the node’s children are part of a node’s data, so the fingerprint for a node also covers all of the data in the trie that is under the node. We can now represent all of the data in the trie with the fingerprint of the root node.

The second feature of hash functions is that it needs to be very difficult to “reverse” the function, i.e. given some output number, find any inputs that would generate that number. If the hash function were easy to reverse, someone could find different state data that generated the same number. They could use this to trick other participants and cause problems. Luckily, this is not easy to do, which allows blockchain participants to trust that if they generated the same fingerprint as other participants, they have the same data.

One issue with cryptographic hash functions is that they tend to be somewhat slow to calculate. On one hand, this slowness is a security feature. If you wanted to create a fake state with the same hash function output, you could repeatedly try different inputs until you found something that worked. The slower it is to execute the hash function, the longer that search will take. On the other hand, slow calculation speed can make using the Merkle trie slower since it makes a lot of calls to the hash function in normal usage.

Finding a key/value pair

Recall that the data in this Merkle trie is the current blockchain state and that the state is made up of key/value pairs. One of the common actions that you have to do is look up the value associated with a key. This is done by turning the key into a path through the trie. By starting at the Root node and then following this key path, you will end up at the node that stores the value you want.

Turning the key into a path

Somehow we need to go from a key to a set of steps to take through the trie. So we need to break the key up into smaller parts and each part will determine our next step. To make this process easier, let’s represent keys as binary data (a number made up of just 0 and 1). This makes it easy to break up the key into smaller chunks and then convert the chunks into numbers. These numbers tell us which node to visit next.

In this implementation, let us limit each node to having at most 4 children. If we split up the key into pairs of bits, then each pair will tell us which of the 4 children to visit next. This correspondence between bits and child nodes is possible because there are the same number of both. The 4 different possible pairs of bits [00, 01, 10, 11] correspond to the 4 different child nodes [0, 1, 2, 3].

The four children of a node with their child number and corresponding bits in brackets.

For example, in the diagram below, you have key 110110. First you would break it up into pairs 11|01|10. Turning those into child numbers would be 3|1|2. All paths start at the root node, so to follow the path to the node with key 110110 we begin at the root node. Then you would move to child 3 of the root. Then you would move to child 1 of that node. Finally you would move to child 2 of that node. That last node that would then contain the value for key 110110.

Visualization of the search for key 110110. The green nodes represent the nodes that were visited during the search and the blue node is the node containing the value for key 110110.
Compressing the path

If you construct the key path exactly as described above, you end up creating a lot of empty nodes. These empty nodes are created because every pair of bits corresponds to a node. If the key had 1000 bit pairs, the path would have 1000 nodes in it. This slows down the search since each of the 1000 nodes would be visited before finding the node with the value we wanted.

To prevent searches from being slow, we can compress the extra nodes into a single node. For each node that gets compressed, we record which child number it was. Then we put that list of numbers into the combined node. While traveling along a key path, if the current node has a compressed path that matches the current key, you can skip ahead.

Visualization of path compression. The green nodes have the key path [child 2 -> child 1 -> child 3]. These 3 nodes can be combined into the blue node. Since the path that was combined was [child 2 -> child 1 -> child 3], this gets recorded in the blue node as 2->1->3.

With compressed paths in the mix, searching changes a little. If you come across a node with a compressed path during a search, then for each remaining step in the key path, check if it matches the next step from the compressed path. If they match, then move onto the next step in the key path and the compressed path. If they don’t then the key you were looking for wasn’t in the trie.

For example, let’s say you have been traveling along a key path and the node we have reached has a compressed path of [3->2] and the remaining steps of the key path are [child 3 -> child 2 -> child 0]. In this scenario, the next step of the key path is [child 3] and the next step of the compressed path is [child 3]. These match, so we move onto the next step. The next step from the key path is [child 2] and the next step of the compressed path is [child 2], these match so again we move onto the next step. The next step of the key path is [child 0] and the compressed path has no more steps, so we go to child 0 of the current node and we are done.

Visualization of searching when there is path compression. The blue node has the value for key 1110011101. The path for that key is [root -> child 3 -> child 2 -> child 1 -> child 3 -> child 1]. We begin following that key path by going from the root to child 3 of the root, the orange node. The orange node has a compressed path of [2->1->3] which means the path of [child 2 -> child 1 -> child 3] has been compressed into it. The next three steps of the key path match the three steps of the compressed path of the node. The three matching steps “cancel out” and we are left with the last step of [child 1]. Going from the orange node -> child 1, takes us to the blue node, which is the node that holds the value we were looking for.
Inserting a key/value pair

Inserting a key/value pair into the trie is very similar to searching for an existing key. The main difference is the key you are looking for might not be in the trie. While following the key path, one of several situations will occur. You will either find the node that corresponds to that key, run out of nodes to search, or you hit a node that has a compressed path that does not match the key path.

  • Scenario 1: Find a node corresponding to the key
    If there is a node with the key in the trie, then it will be at the end of the key path. Once we have found that node, simply add the new value to it.
  • Scenario 2: Ran out of nodes while following the key path
    When there is no node that corresponds to the key, we will get to a step in the key path that we can’t take since no node exists. In this case, we simply create a new node and add it as a child of that last node that did exist along the key path.
An illustration of what the trie looks like in scenarios 1 and 2

Scenario 3: Conflicting Compressed Path Found
While traversing the key path, you may hit a node that has a compressed path that does not match the key being inserted. This means that a new node that can allow the path to “branch” needs to be inserted along with the new key/value pair node. The entire process follows these steps:

​ ​ ​​​ ​ ​​​ ​ ​​​ ​ ​​​ ​ ​​1. Create a new “Branch” node. Its compressed path will be equal to​ any common path steps between the new key path and the existing ​node’s compressed path.

​ ​ ​​​ ​ ​​​ ​ ​​​ ​ ​​​ ​ ​​2. Move the existing node to be a child of the “Branch” node

​ ​ ​​​ ​ ​​​ ​ ​​​ ​ ​​​ ​ ​​3. Create a new node for the key/value pair and add that as a child of the “Branch” node

An illustration of what the trie looks like in scenario 3 before and after adding a branch node. The orange node is the new branch node which has the red existing node and the blue new node as children.
Sharing the state

When someone joins the blockchain for the first time, they don’t have any of the blockchain’s state. Without that state, they won’t be able to interpret and verify the transactions that occur. One way to get the state is to start at the very beginning of the blockchain and then play through every transaction that has ever occurred. If the blockchain has existed for a while, this might take a really long time.

An alternative is to request the state from other blockchain participants. Unfortunately in a blockchain, you cannot trust the other participants. The Merkle trie helps solve this trust problem as well. The Merkle trie can be used to “prove” that a key/value pair is in it. By sending along a proof for the key/value, it is possible to be sure that the data is accurate and trustworthy.

Proving the value of a Single Key

Given a Merkle Trie, a proof that a given key/value pair is in the trie can be formed by providing all of the data in the nodes along its key path.

Just like in earlier diagrams, the green nodes are the nodes along the key path. All of the information in these green nodes, plus the blue node, act as the proof.

The data from these nodes act as a proof of the blue node’s value. To verify the proof, you insert all of that data into an empty trie. That trie’s root will have some fingerprint. If that fingerprint matches the one that all of the blockchain participants have agreed upon, then the proof is considered verified. It is very difficult to generate fake data that will produce the same fingerprint, so this proof allows us to be confident that the key/value pair that was sent really is part of the blockchain’s state.

Proving values for a Range of Keys

We need to get all of the state data, but it would be inefficient to have to generate and send a proof for every pair. Instead, we can extend the proof concept to a proving a range of key/value pairs.

This range proof consists of three parts. The first is the proof of a starting key, the second is a proof of the ending key. These work just as described above, by sending all the nodes on the path to finding the starting and ending keys. The third and final part is all of the key/value pairs that fall between the start and end. Since only the key/value pairs are sent for the nodes between the state and end, this proof is much smaller than constructing proofs for each key/value pair individually. This reduced size allows for more efficient transmission of the data within the trie.

In this diagram, the green nodes represent the proof path of the starting key, the red nodes represent the proof path of the ending key, and the orange nodes are all of nodes whose key/value pairs would be included in the proof. All of the data in the green and red nodes gets sent, but only the key and value of the orange nodes gets sent.

Similar to the verification of the single key/value pair proof, the data from the start proof nodes, end proof nodes, and all of the key/value pairs can be inserted into an empty trie. If the fingerprint of the root node of this trie matches the agreed upon fingerprint, then all of the key/value pairs sent in the proof are considered valid state data.

Conclusion

With a Merkle trie you can insert/retrieve key/values pairs, reach agreement about the blockchain’s state, and share proofs of the state with other participants. These features cover a large portion of the functionality required to run a trustless blockchain. This is still an active area of research though, so maybe you can come up with something even better!

Where to go from here?

If you are still curious, there are a lot of options for further reading. If you are code inclined, a go implementation of a Merkle Trie similar to the one described above can be viewed here. If you are curious about other implementations of Merkle tries, check out this wiki entry on Ethereum’s Merkle implementation. If you are more academically inclined, this paper on polynomial commitments provides an interesting alternative for generating the node fingerprints and proofs of the blockchain state.

SHARE //
NEXT UP//
Enterprise

Avalanche to Power SI Tickets’ NFT Platform, Box Office

NFT

Looty and Inspect Launch Loyalty Platform with Loot Crate Rewards, Boosting Avalanche NFT Season

Platform

Durango: Avalanche Warp Messaging Comes to the EVM

Community

Avalanche DeFi Saga with Rep3

Institutions

Citi Tests Benefits of Private Markets Tokenization With Avalanche Evergreen Subnet ‘Spruce’

Enterprise

Avalanche Named Exclusive Sponsor of Collider on the Lot Startup Accelerator

Platform

Cortina: X-Chain Linearization

Avalanche Watch: January 2024

Gaming

Owned Blends SocialFi and Gaming on Avalanche with Battle Tech

Enterprise

The Empire State Building Launches NFT Loyalty Program on Avalanche Using Uptop

Institutions

Intain Launches Avalanche Subnet to Usher in New Era for Multi-Trillion Dollar Securitized Finance Market

Institutions

How Avalanche Uses Account Abstraction to Improve the Web3 Experience for Institutions

Institutions

Institutional Products, Pilots Signal Growing Interest in Tokenization

Enterprise

Edgevana to Provide Infrastructure to Avalanche Network, Expanding Validator Decentralization

Community

Avalanche Foundation: Eligibility Criteria Framework for Community Coins

Institutions

South Korean Digital Asset Custodian BDACS to Support Avalanche

DEFI

Struct Finance Joins Avalanche Rush with an Incentive Program of up to $1M

DEFI

Fonbnk Builds Avalanche On-Ramp for Cross-Border Payments in Emerging Markets

Platform

Avalanche Wallet Phase-Out Guide

Platform

Avalanche Watch: December Edition

Institutions

What is Asset Tokenization: Why & Why Now?

DEFI

Sub-Saharan Africa: A Land Of DeFi Opportunity

NFT

NFT-TiX Migrates to Avalanche and Announces Global Festival Partnerships

Enterprise

AR Platform Really to Upgrade Entertainment Using Avalanche

Gaming

Tiltyard Gives Web3 Games Tournaments and Fantasy Sports Features

DEFI

Hubble Exchange Launches Order Book DEX Built on a Custom Avalanche Subnet

Platform

Avalanche Watch: November Edition

Developers

NodeKit Raises a $1.2M Pre-Seed Round to Build a Shared Sequencer L1 with HyperSDK

Gaming

Mirai Labs Blends SocialFi and Web3 Gaming, Migrates to an Avalanche Subnet

Institutions

Republic Selects Avalanche for its Profit-Sharing Note, Gains Vista Support

Platform

Avalanche Watch: October Edition

Institutions

Avalanche Supports Citi FX Solution Under Project Guardian

Institutions

Onyx by J.P. Morgan Leverages Avalanche To Explore a New Paradigm for Portfolio Management

Platform

Avalanche Foundation Mission Statement and Reminder About System Unlocks

Developers

Web2 and Web3 Leaders Launch Codebase by Avalanche, an Accelerator Supporting Early-Stage Avalanche Projects

DEFI

Avalanche Expands Forex Market in Africa With Canza Finance’s Baki Launch

Platform

Avalanche Powers Metaverse Experience at Hong Kong FinTech Week

Gaming

5 Great Web3 Games Coming Soon

Institutions

Beneath the Surface: The Infrastructure Driving Tokenization Forward

Developers

Avalanche is Advancing Off-Chain Computation Services for Developers

Enterprise

Neal Stephenson’s LAMINA1 to Reimagine the Open Metaverse with New Layer 1 Built on Avalanche

Ava Labs Accelerates Push in India with Key Senior Hires

Platform

CCRI Finds Avalanche Emits 12x Less CO2 Than Ethereum, 300,000x Less CO2 Than Bitcoin

DEFI

The Tangible Benefits of Bringing Non-USD Stablecoins and FX On-Chain

Institutions

On-Chain Cash Management: Thematic Takeaways From Partner Discussions

Platform

7 Avalanche Use Cases

NFT

Builder Spotlight: Zeroone is Rethinking How NFT Enjoyers Create and Collect

Institutions

Navigating Tokenized Asset Investments for Institutional Buyers

NFT

Blockticity Mints $275M in Hemp and Other Product Certifications on Avalanche, Disrupting a $4.5T Industry

Developers

Introducing Firewood: A Next-Generation Database Built for High-Throughput Blockchains

Community

Builder Spotlight: Steven Gates Wants to Bring Subnets to the World

Community

Avalanche Foundation Launches Ted Yin Grant Program to Expand Open Source Technology Development

Enterprise

Korean Entertainment Giant Powers Ticketing Platform with Avalanche

Developers

Developer Spotlight: NodeKit’s Cofounder on Why Avalanche is “Unrivaled”

Platform

New Avalanche C-Chain Explorer Launches

Developers

Time to Finality (TTF): The Ultimate Metric for Blockchain Speed

Platform

Avalanche Watch: August Edition

Developers

Movement Labs Raises Pre-Seed Round, Launches Movement SDK To Reignite Web3’s Interoperable Future

Gaming

Builder Spotlight: Kam Punia’s Quest to Level Up Web3 RPGs

DEFI

Multiswap Launches with Plan to Unlock Swaps of 300+ Assets in a Single Transaction on Avalanche

Gaming

Korean Game Publisher Neowiz and Ava Labs Form Partnership

NFT

NFT Marketplace Hyperspace Makes EVM Debut on Avalanche

Enterprise

PlayThink and Loyalty Marketing Announce Plans for Web3 Ecosystem for 100M Japanese Users on Avalanche

NFT

Artist Spotlight: Three Creators Drawing the NFT Future

NFT

Artist Spotlight: Creators Expanding NFT Possibilities

DEFI

Dexalot Subnet Earns Avalanche Multiverse Incentives of up to $3 Million

Developers

Elastic Era: Considerations for Subnet Builders

Developers

Sigma Prime to Expand Security Tooling for Avalanche

Platform

Avalanche Watch: July Edition

Platform

Builder Spotlight: Blockpay Aims to Bring Blockchain Payments to the Masses

Platform

Avalanche Watch: April Edition

Core Mobile Now Supports iOS Devices

NFT

Momentum Accelerates in Avalanche NFTs

Platform

Ava Labs Announces AvaCloud: Empowering Businesses to Launch Custom, Fully Managed Blockchains in Minutes

Developers

Bware Labs’ Blast API uses Avalanche Network to Deploy Their Staking Protocol

Gaming

Avalanche Arcad3 Powers Up Web2 Gaming Studios, Arming Them to Thrive in Web3

DEFI

Struct Finance Launches Interest Rate Products for Institutions and Retail on Avalanche

NFT

Artist Roundtable: Mentors Changing the NFT World

DEFI

Uniswap, AMM Pioneer and the Largest DEX, Launches on Avalanche

NFT

Global Artist Network HUG Embraces Avalanche Foundation

NFT

Artist Spotlight: The Next Generation of NFT Photographers

DEFI

Balancer Deploys on Avalanche to Fuel Liquid Staking Growth and New DeFi Opportunities

NFT

Artist Spotlight: The Evolution of Music NFTs

Gaming

Solert Games Launches Dedicated Subnet, Debuts ‘Legends At War’ On Avalanche

Institutions

Avalanche Foundation Launches Avalanche Vista, a $50M Initiative to Pioneer the Future of Asset Tokenization

Developers

Builder Spotlight: Eric Forgy Unveils CavalRe’s Plan to Architect the Future of Capital Markets

Platform

Avalanche Watch: June Edition

Ava Labs and AWS Bring Scalable Blockchain Solutions to Enterprises and Governments

Education

Coinbase Users Now Have the Fastest Path from Cash to DeFi with Native USDC on Avalanche

Millions of Shopify Merchants Can Now Use Avalanche NFTs through Venly

Gaming

The International Chess Federation Brings Chess into Web3 on Avalanche

Artist Spotlight: Celine Manetta Helps Airbus Celebrate with Nearly 130,000 Employees

Platform

Avalanche Warp Messaging (AWM) Launches with the First Native Subnet-to-Subnet Message on Avalanche Mainnet

NFT

Numbers Protocol: Ushering in a New Era of Digital Media Trust with Subnets

Platform

Alibaba Cloud Expands to Support Validators and Infrastructure on Avalanche Public Blockchain

Platform

Avalanche Explorer Gets a Brand New Upgrade for All P-Chain Data

Developers

Developer Spotlight: Usman Asim has Big Plans for Rust and ZKs on Avalanche

Gaming

Japanese Media Giant GREE to Build Web3 Games and Run Nodes on Avalanche

Education

Robinhood to Educate Millions of Users on All Things Avalanche

Gaming

Monsterra, P2E Game with 50M Battles Fought in First 3 Months, Expands to Avalanche