Agoric Testnet Technical Analysis
The objective of this blog post is to find out why most of the validators were missing blocks in Testnet phase 4.5. The same issue also occurred back in phase 4.
We had already published a report after phase 4 listing possible reasons.
In the above issue I explained that the cause could be:
- Hardware Specs - Some people ran twice the listed specs, but to no avail
- Number of Peers - Ruled out
- Two Consensus Heights - Still to be explored
This boils down to two possibilities: either the hardware estimate for agoric-sdk is far off, or there is some other serious problem. Let's ignore the hardware spec for now.
To investigate the issue, I think we need to find the blocks at which the number of validator signatures changed drastically. To do that, I broke my work plan down into three steps:
- Get Block and Signature Data
- Plot the Data to a Dynamic Chart to identify the culprit blocks
- Analyse those blocks and compare them to others
1. Get Block and Signature Data
We collected the signature data for every block, from the beginning of the chain to the latest block height. To do that, I wrote a simple shell script:
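# Append the number of signatures in each block's last_commit to block_data.txt.
# Note: a block's last_commit holds the signatures for the previous height.
for block in {2..75736}
do
  ag-cosmos-helper query block "$block" | jq '.block.last_commit.signatures[].validator_address' | wc -l >> block_data.txt
done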
This script gets the number of signatures for each block from 2 to 75736 (the latest block height at the time of the experiment). I skipped block 1 because, as the first block, it has no signatures.
2. Plot the Data to a Dynamic Chart to identify the culprit blocks
Then I wrote some code 😎 to plot the data onto a dynamic chart, and to my surprise there were indeed some blocks at which the number of signatures dropped suddenly, meaning that after these blocks many validators started missing blocks.
I have highlighted these sudden cliffs, with their approximate block numbers, in the diagram below.
You can find this chart live at the link below.
It's a dynamic chart: you can zoom into different sections of the chart and it will rescale accordingly.
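The cliffs can also be located without a chart. The following is a minimal sketch, not the code used for the chart: it scans block_data.txt and prints every height at which the signature count fell by at least a given amount compared with the previous block. The function name find_cliffs, the default threshold of 5, and the assumption that the first line of the file corresponds to block 2 (as in the collection script) are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): list sudden drops in
# the per-block signature counts.
# Usage: find_cliffs FILE [MIN_DROP] [FIRST_HEIGHT]
#   FILE          one signature count per line (e.g. block_data.txt)
#   MIN_DROP      smallest drop between consecutive blocks to report (default 5)
#   FIRST_HEIGHT  block height of the first line in FILE (default 2)
find_cliffs() {
  awk -v min_drop="${2:-5}" -v first="${3:-2}" '
    # From the second line on, compare each count with the previous one.
    NR > 1 && prev - $1 >= min_drop {
      # Output columns: height, previous count, current count
      printf "%d %d %d\n", first + NR - 1, prev, $1
    }
    { prev = $1 }
  ' "$1"
}
```

Running find_cliffs block_data.txt 10 prints one line per cliff: the block height, the count before the drop, and the count after it. A larger threshold filters out the small fluctuations and leaves only the steep drops.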
3. Analyse those blocks and compare them to others
Unfortunately, I couldn't complete this step: while performing the restart task, our node crashed due to the known issue 33. I had to do an unsafe-reset-all to solve this, so my database folder is now empty and my node is syncing again.
But we do hope the chart and inputs we have provided will help the team investigate these block heights more precisely.
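For anyone picking up step 3, one possible starting point is to check which validators signed just before a cliff but not just after it. This is only a sketch and was not run against the testnet. The function name dropped_signers and the two input files are hypothetical, and each file is assumed to hold one validator address per line, for example the output of the jq filter from step 1 saved for two neighbouring heights.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): print the validator
# addresses present in BEFORE_FILE but absent from AFTER_FILE.
# Usage: dropped_signers BEFORE_FILE AFTER_FILE
# Each file is assumed to contain one validator address per line.
dropped_signers() {
  awk '
    # First file read (AFTER_FILE): remember every address that still signed.
    FILENAME == ARGV[1] { still_signing[$0] = 1; next }
    # Second file read (BEFORE_FILE): print each missing address once.
    !($0 in still_signing) && !seen[$0]++ { print }
  ' "$2" "$1"
}
```

If the same addresses disappear at several cliffs, that would point towards a problem on specific nodes. If a different set drops out each time, the blocks themselves would be the more likely suspects.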