How to Approach True Finality For Block DAG?

Importance of Finality

Most blockchain topologies today have probabilistic finality, in that you are never 100% sure whether or not the transactions are truly set in stone. For example, in BTC, you get an exponential falloff in the probability that an attacker will be able to catch up to the rest of the network and reorganize the blocks as time (or in the world of BTC, # of blocks) goes forward. This exponential risk falloff gave rise to the “6 block” rule of a thumb whereby once you’ve observed 5 blocks built on top of the block containing your transaction, making your transaction “6 blocks deep”, then it is statistically highly unlikely that your transaction would be part of some reordering attack.

In many cases, having a probabilistic finality is fine. But in cases where you’re executing a large number of transactions that are dependent on previous outcomes, or if you’re executing one extremely valuable transaction where you need to be absolutely sure that the transaction won’t be reversed, then finality becomes critically important.

In its native state, the ordering mechanism in Taraxa’s Block DAG is also one that provides probabilistic finality. To achieve true finality, we’ll need something extra.

What does Finality Mean in Taraxa?

Recall that reordering risk from our earlier ordering mechanism is primarily incurred when the anchor chain changes.

So as long as we can make sure that, periodically, an anchor chain can be nailed down, then the order for the blocks along that anchor chain will have finalized ordering no matter what happens.

How do we do that? The network will periodically take a vote, in parallel with the Block DAG’s construction, to place infinite weight on a specific block near the frontier of the DAG.

When a block has been given infinite weight, that means all the blocks it points to directly and indirectly via the ghost pointer now all have infinite weight, meaning it’s now impossible to override the ordering with an attack.

In the figure above, we selected the orange block to have infinite weight, and we can see that infinity has carried through backwards into the Block DAG to give infinite weights to all the blocks (marked in dark gray) along the anchor chain. Now we have effectively finalized the ordering of this anchor chain and all the blocks in its associated epochs.

So how do we select which block to give infinite weight to in the first place?

Period Block Selection via PBFT-like Algorithm

To select a block inside the DAG to finalize, we leverage a process that’s similar to the Practical Byzantine Fault Tolerance (PBFT) algorithm. Because PBFT is a well-studied and widely-deployed algorithm, we won’t spent too much time describing every detail and potential faults that could occur in this article, just the broad strokes.

There are roughly four (4) steps in our PBFT period block finalization process, as described in the highly simplified diagram.

Let’s go through this one by one. In each phase, a node …

1. Proposes New Block

Computes his eligibility via VRF (SK, previous_PBFT_block_hash, current_vote_type, current_round_number, current_step_number) = (e, π), where e is the eligibility value, and π is the proof of correct VRF calculation

Decides if e < threshold then he is eligible to propose a PBFT block this round

Picks a DAG block candidate to finalize, somewhere near the frontier but not at the frontier, this is the current Period block candidate Pt

Constructs a Period (more on this later in this article) between the Pt and P(t-1), find all the blocks included within the Period

Constructs a Concurrent Schedule, CS

Constructs a PBFT block candidate (Pc) that includes (Pt, CS), among other information (e.g., timestamp, signature, beneficiary)

Computes the hash of Pc

Broadcasts hash(Pc), Pc, along with his eligibility proof (e, π) to his peers

2. Votes on Leader

Computes his eligibility via VRF again, with current_round_number incremented by 1 and the current vote type, producing another (e, π)

Decides if e < threshold, then he is eligible to participate in this round

Waits for 2λ, where λ is the network diameter — the shortest distance (in time) between the two most distant nodes in the network

Computes the lowest e observed for which π is also correct, the creator if the lowest e is the “leader” — this node is the candidate to propose the next PBFT block

Broadcasts his vote for the hash(Pc) that corresponds to the lowest e to be the leader, along with his eligibility proof (e, π) to his peers

3. Votes on Block

Computes his eligibility via VRF again, with current_round_number incremented by 1 and the current vote type, producing another (e, π)

Decides if e < threshold, then he is eligible to participate in this round

Waits for 2λ

Computes if he has received 2T+1 votes for any given e_min, where T is 1/3 of the eligible voting nodes for the last round

Polls peers for the Pc that corresponds to the e_min and the associated hash(Pc) if he does not yet have the PBFT block

Validates Pc to be correctly constructed

Broadcasts his vote for Pc, along with his eligibility proof (e, π) to his peers

4. Votes to Move Forward

Computes his eligibility via VRF again, with current_round_number incremented by 1 and the current vote type, producing another (e, π)

Decides if e < threshold, then he is eligible to participate in this round

Waits for 2λ

Computes if he has received 2T+1 votes for any given Pc

Validates the winning Pc to be correctly constructed

Computes the newly validated Pc and commits results to permanent storage

Broadcasts his vote to move forward to proposing the next PBFT block, along with his eligibility proof (e, π) to his peers

A Few More Words on the PBFT

That was an extremely simplified description

That was an extremely simplified description of what happens in our PBFT process. It is simplified because we haven’t described all the ways things could go wrong, e.g., no node computed an e that was below the threshold, votes do not reach 2T+1 quorum, large numbers of nodes crashing during the middle of a round, etc.

This PBFT process is highly secure and scalable

Note that every time a node tries to speak, it computes a VRF eligibility value to make sure it is allowed to speak during that round. The eligibility threshold is set and dynamically adjusted to ensure that 2 things,

The nodes that participate in each round is random and likely different, meaning that once an attacker observes a specific node is a participant and marks him for attack, it’s likely that during the next round he is no longer eligible. This is in contrast to many other algorithms that keep the participant (or committee member) around for an extended period of time, making them prime targets for targeted attack. Only a subset of eligible nodes participates in any given round, making this PBFT process highly scalable. This means that even as the network size scales up and the number of eligible participants increase, the number of actual participants (committee size) of these PBFT rounds can easily be set up to scale sub-linearly vs. the network size. Smaller number of participants means faster voting processes.

Combine randomly selected participants and sub-linearly increasing committee size, we get a highly secure and scalable PBFT process.

The Parallel PBFT Chain

Taraxa’s PBFT process creates a linear chain of PBFT blocks alongside the existing Block DAG.

Each PBFT block has two primary purposes,

1. Finalize a DAG block into a Period blockHost a concurrent schedule that prescribes how transactions are to be computed

2. Finalize a DAG block.

This PBFT process confirms a single block inside the Block DAG. Hence, it is not used, like in most other networks that leverage the PBFT process, as the primary consensus algorithm which gates the progress of the entire blockchain. This is why in Taraxa the PBFT process happens in parallel and largely asynchronously vs. the Block DAG construction process.

Each time a new DAG block is finalized, we create a finalized anchor chain and an associated set of blocks along that anchor chain (via epochs along each anchor block). This entire set of blocks is called a Period, which can be thought of a snapshot of a cluster of blocks that have achieved finalized ordering.

Each Period contains many DAG blocks, which leads us to the other task of the PBFT block, to define the order in which transactions are to be computed, via the concurrent schedule.

The Concurrent Schedule

The concurrent schedule defines order in several ways:

The order of the blocks, as defined by the ordering mechanism, since there are many blocks within a Period.

Filtering out redundant transactions between blocks. Because we are working in a DAG data structure, it could very well happen that multiple block proposers would have included the same set of transactions into separate DAG blocks, leading to a certain amount of transaction overlap. Taraxa has a transaction jurisdiction mechanism (in a later article) to help tune this overlap — we want to minimize it but it cannot be kept at zero, as that could lead to excessive transaction orphaning. Splitting the transactions into concurrent vs. sequential sets. This is a critical part of our concurrent EVM design (a topic for a later article), whereby a set of speculative execution algorithms split a set of transactions into those that could be safely parallel-executed vs. those that must be sequentially executed.

Ultimately, you can think of the concurrent schedule as the combined result of all the individual DAG blocks into a single block, embedded into each PBFT block.

We'll be sharing more of our ongoing work in the realm of blockchain networks design beyond consensus only, and would love to hear your thoughts and critique.