How we built a State Machine to keep up with a 1200+ Txs/second blockchain protocol

Anomit Ghosh (~anomit)


Description:

Introduction to what we do

BlockVigil is an API gateway that lets developers write code for blockchain protocols without worrying about managing redundancy, integrity checks, synchronization, or parsing protocol communication primitives. It does so by:

  • exposing methods written in a native smart contract language (e.g. Solidity) as REST API endpoints
  • reading and monitoring protocol-specific transactions over webhooks, websockets and other integration tools such as Zapier, Slack and email (limited only by a developer's creativity)

Introduction to the problem at hand

Blockchain is a state machine


  • At their core, most modern blockchain protocols include a virtual machine that maintains a global state of accounts, transactions etc.
  • This state is agreed upon by consensus among peers participating in the network
  • Hence we can think of the blockchain protocol itself as a global state machine. For example, the Ethereum project is sometimes referred to as a "World Computer".

The state is subject to change according to the consensus algorithm (a chain reorganization)


The implications cascade

  • the linked list of blocks accepted as the global truth changes
  • the transactions that were part of the now-invalid chain of blocks might have been processed by certain peers as final
  • the processing of these transactions would have caused further changes in business logic, e.g. updating a balance sheet against which there no longer exists a valid transaction on the chain
    • often, transactions that signal certain state changes on the smart contract (or any programmable unit of the blockchain protocol) trigger subsequent transactions

As an API gateway that monitors and allows read capabilities on blockchain protocols, we needed to come up with a solution that would:

  • maintain a state as close as possible to the global state of the protocol
  • roll back the state when a situation like the one described above occurs
  • alert consumers of the API services about such incidents when transaction data was consumed from a section of the chain that has since been abandoned
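The rollback and alerting requirements above can be illustrated with a minimal sketch. This is not BlockVigil's implementation: the `Block` and `ChainTracker` names are hypothetical, and the sketch assumes the blocks of the new branch are fed in ascending order, starting with the first block after the fork point. A block whose parent hash does not match the cached tip signals a reorganization; cached blocks are popped back to the common ancestor and their transactions are collected so consumers can be alerted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical, stripped-down block header plus its transaction hashes."""
    number: int
    hash: str
    parent_hash: str
    tx_hashes: tuple = ()


class ChainTracker:
    """Local view of the canonical chain, newest block last (illustrative only)."""

    def __init__(self):
        self.blocks = []

    def add_block(self, block):
        """Append `block`; return tx hashes orphaned by a reorg (empty if none)."""
        orphaned = []
        # Roll back: drop cached blocks until the tip is the new block's parent.
        # Assumption: the new branch arrives in order, so the first mismatching
        # block points at the common ancestor.
        while self.blocks and self.blocks[-1].hash != block.parent_hash:
            orphaned.extend(self.blocks.pop().tx_hashes)
        self.blocks.append(block)
        return orphaned


tracker = ChainTracker()
tracker.add_block(Block(1, "a", "genesis", ("tx1",)))
tracker.add_block(Block(2, "b", "a", ("tx2",)))
tracker.add_block(Block(3, "c", "b", ("tx3",)))

# A competing block at height 2 arrives: blocks "b" and "c" are abandoned.
to_alert = tracker.add_block(Block(2, "b2", "a", ("tx2x",)))
```

A transaction orphaned this way may still be re-included in the new branch, so an alert here means "re-check this transaction", not "this transaction failed".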

How did we solve it with Python

A core process monitor/coordinator launches, kills and respawns child processes, which in turn:

  • fetch the latest blocks from the chain
  • cache past blocks
  • pass the transactions contained in blocks down messaging queues and data streams
  • listen for chain reorganization events

The above is implemented with the standard multiprocessing library included with Python 3.6, which we use to coordinate the rollback and restart of the processes for state resynchronization.
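As a rough illustration of the coordinator pattern described here, the sketch below launches named child processes with the standard multiprocessing library and respawns any that have exited. The worker functions, the `WORKERS` table and `supervise()` are hypothetical stand-ins, not the code from our repository, and the workers do no real chain access.

```python
import multiprocessing as mp
import time


def fetch_blocks(stop_event):
    """Hypothetical worker: stands in for polling the chain for new blocks."""
    while not stop_event.wait(0.1):
        pass  # real code would fetch and forward blocks here


def listen_for_reorgs(stop_event):
    """Hypothetical worker: stands in for watching for reorganization events."""
    while not stop_event.wait(0.1):
        pass


WORKERS = {"fetcher": fetch_blocks, "reorg_listener": listen_for_reorgs}


def spawn(name, stop_event):
    """Start the child process registered under `name`."""
    proc = mp.Process(target=WORKERS[name], args=(stop_event,),
                      name=name, daemon=True)
    proc.start()
    return proc


def dead_workers(children):
    """Names of children whose process is no longer running."""
    return [name for name, proc in children.items() if not proc.is_alive()]


def supervise(rounds=3, poll_interval=0.2):
    """Launch all workers, respawn the ones that die, then shut down."""
    stop_event = mp.Event()
    children = {name: spawn(name, stop_event) for name in WORKERS}
    try:
        for _ in range(rounds):
            time.sleep(poll_interval)
            for name in dead_workers(children):
                children[name].join()  # reap the exited child
                children[name] = spawn(name, stop_event)
    finally:
        stop_event.set()  # ask children to exit cleanly
        for proc in children.values():
            proc.join(timeout=2)
            if proc.is_alive():
                proc.terminate()  # last resort if a child ignores the event
```

`supervise()` should be called under an `if __name__ == "__main__":` guard so that it also works with the "spawn" and "forkserver" start methods. A resynchronization after a reorganization would follow the same shape: set the stop event, join the children, roll back the cached state, then spawn them again.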

A pipeline of messaging queues and data streams feeds microservices that function as dispatchers and loggers, serving registered clients with the specific contract/transaction/event data they want to monitor. This section will be covered only briefly, with a few architecture diagrams, since it is not the subject of this talk.

What can one learn from the talk

  • use the standard multiprocessing library along with psutil to spawn, suspend and kill child processes and detect any zombie/orphan processes
  • develop a smart backoff/delay capability for respawning processes that crash
  • use the synchronized data structures and locks exposed by SyncManager to avoid deadlocks
  • handle unexpected exceptions/signals to shut down a process safely and communicate this back to the coordinator
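Two of the points above, respawn backoff and safe shutdown reporting, can be sketched with the standard library alone. The function names, the default delays and the status strings below are assumptions made for illustration; the talk's actual code may differ. Zombie detection with psutil is not shown.

```python
import random
import signal


def backoff_delay(crash_count, base=0.5, cap=30.0, jitter=0.0):
    """Seconds to wait before the next respawn: exponential, capped, optional jitter."""
    exponent = min(max(crash_count - 1, 0), 32)  # bound the exponent to avoid overflow
    delay = min(cap, base * 2 ** exponent)
    return delay + random.uniform(0, jitter * delay)


class ShutdownRequested(Exception):
    """Raised inside a worker when SIGTERM or SIGINT arrives."""


def _raise_shutdown(signum, frame):
    # Turning the signal into an exception lets try/finally blocks run.
    raise ShutdownRequested(signum)


def run_and_report(work, status, name):
    """Run `work()` and record the outcome in the shared `status` mapping."""
    try:
        work()
        status[name] = "finished"
    except ShutdownRequested:
        status[name] = "stopped"  # clean shutdown, no respawn needed
    except Exception as exc:
        status[name] = f"crashed: {exc!r}"  # coordinator may respawn after a delay
    return status[name]


def worker_main(work, status, name):
    """Child-process entry point: install signal handlers, then run the work."""
    signal.signal(signal.SIGTERM, _raise_shutdown)
    signal.signal(signal.SIGINT, _raise_shutdown)
    run_and_report(work, status, name)
```

In a real deployment `status` would be a dictionary proxy from `multiprocessing.Manager()` (which returns a started SyncManager), created by the coordinator and passed to each child. The coordinator can then read it alongside `Process.exitcode` and wait `backoff_delay(crash_count)` seconds before respawning a crashed worker.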

Prerequisites:

A basic understanding of

  • the UNIX/Linux process model
  • the need for parallelization and concurrency

Content URLs:

The rough outline of the presentation content can be found here on a shareable Google Slides link

The accompanying code can be found here on Github -- this points to the dev branch.

A talk at a Blockchained India meetup on BlockVigil's research into Layer 2 scaling solutions on Ethereum.

I have written a fairly descriptive article on the issue and implications of chain reorganization on an engineering blog. It describes things at an architectural level.

Speaker Info:

I was a prolific contributor to the open source ecosystem between 2007 and 2010. At the time, I was one of the founding members of the Linux User Group, Manipal Institute of Technology, Manipal, Karnataka. I went on to work with a college senior and mentor, Swaroop Hegde, building job queues, message passing, caching and indexing systems for the web services that powered Racked Hosting LLC, at a time when there were no mature libraries or frameworks accepted as standards in the developer ecosystem. The language of choice? Yes, Python.

We launched a social calendar product, Planga, that empowered college campus communities to participate in and be closely involved with events, discussions, special interest groups etc. From the Wayback Machine: http://web.archive.org/web/20110207214907/http://planga.com/home.html

From July 2010 to April 2012 I worked at SAP Labs, Bangalore, specifically on transactional integrity and the consumption of database objects by UI/client frameworks. I hold a patent on enabling the same over API gateways that would enable SAP Business Suite applications on a mobile interface.

I was on sabbatical and away from coding for the next half a decade, devoting my time and energy to martial arts, yoga, philosophy, off-the-grid sustainable living and pretty much anything that would help me become a well-rounded human. On 1st Jan, 2018 I launched BlockVigil with Swaroop with the singular vision of bringing blockchain development to the masses and reducing the complexity associated with enterprise-scale applications that need to run on blockchain.

My last gig prior to launching BlockVigil was as an MMA coach at Cult.fit till the end of 2017.

Speaker Links:

Github

BlockVigil engineering blog on Medium

Some examples of the low level hacking I have indulged in (from the Wayback machine):

Section: Core Python
Type: Talks
Target Audience: Advanced
Last Updated: