
Google made the Bitcoin dataset publicly available for analysis in Google BigQuery in February this year. Along the same lines, on August 29th it announced that an Ethereum dataset is now available in BigQuery for smart contract analytics.

The Ethereum blockchain is an immutable distributed ledger, similar to its predecessor, Bitcoin. However, Vitalik Buterin, Ethereum’s creator, extended its capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.

The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.

Need for Ethereum blockchain data availability on Google Cloud

The Ethereum peer-to-peer software exposes an API for a subset of commonly used random-access functions, for instance, checking transaction status, looking up wallet-transaction associations, and checking wallet balances.
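
To make the random-access style concrete, here is a minimal sketch of the JSON-RPC request bodies a client might send to an Ethereum node for two of these lookups. `eth_getBalance` and `eth_getTransactionReceipt` are standard Ethereum JSON-RPC methods; the helper names, the sample address, and the sample hash are placeholders invented for illustration, and nothing is sent over the network.

```python
import json
from itertools import count

# Each JSON-RPC request carries an id; a simple counter is enough here.
_request_ids = count(1)


def build_rpc_request(method, params):
    """Return a JSON-RPC 2.0 request body as a dict (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": list(params),
    }


def balance_request(address, block="latest"):
    """Request for the balance (in wei) of one wallet address."""
    return build_rpc_request("eth_getBalance", [address, block])


def receipt_request(tx_hash):
    """Request for a transaction receipt, which includes its status."""
    return build_rpc_request("eth_getTransactionReceipt", [tx_hash])


# Placeholder values, not real on-chain identifiers.
SAMPLE_ADDRESS = "0x" + "00" * 20
SAMPLE_TX_HASH = "0x" + "00" * 32

if __name__ == "__main__":
    print(json.dumps(balance_request(SAMPLE_ADDRESS), indent=2))
    print(json.dumps(receipt_request(SAMPLE_TX_HASH), indent=2))
```

Each request answers one question about one address or one transaction, which is why this style of API is a poor fit for aggregate questions across the whole chain.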

However, no API endpoints exist for easy access to the data stored on-chain, or for viewing the blockchain data in aggregate. One example of such an aggregate view is a chart from Google's announcement showing the total Ether transferred and the average transaction cost, aggregated by day.

Such a visualization, and the database query underpinning it, can inform business decisions ranging from prioritizing improvements to the Ethereum architecture itself to making balance sheet adjustments.

BigQuery has strong OLAP capabilities that support this kind of analysis, both ad hoc and routine, without requiring any additional API implementation.

Accordingly, Google built a software system on Google Cloud that:

  • Synchronizes the Ethereum blockchain to computers running Parity in Google Cloud.
  • Performs a daily extraction of data from the Ethereum blockchain ledger, including the results of smart contract transactions, such as token transfers.
  • De-normalizes and stores date-partitioned data to BigQuery for easy and cost-effective exploration.
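
The last two steps can be sketched in a few lines. This is an illustrative toy under stated assumptions, not Google's pipeline: the record fields (`number`, `timestamp`, `block_number`, `hash`, `value`), the function names, and the sample data are all hypothetical. It copies each block's timestamp onto its transactions (de-normalization) and then groups the rows by UTC date (date partitioning).

```python
from collections import defaultdict
from datetime import datetime, timezone


def denormalize(transactions, blocks):
    """Copy each block's timestamp onto its transactions."""
    timestamp_by_block = {b["number"]: b["timestamp"] for b in blocks}
    rows = []
    for tx in transactions:
        row = dict(tx)  # copy, so the input records stay untouched
        row["block_timestamp"] = timestamp_by_block[tx["block_number"]]
        rows.append(row)
    return rows


def partition_by_date(rows):
    """Group rows by the UTC date of their block timestamp."""
    partitions = defaultdict(list)
    for row in rows:
        day = datetime.fromtimestamp(row["block_timestamp"], tz=timezone.utc).date()
        partitions[day.isoformat()].append(row)
    return dict(partitions)


# Made-up sample data: Unix timestamps in seconds, values in wei.
blocks = [
    {"number": 1, "timestamp": 1535500800},  # 2018-08-29 00:00:00 UTC
    {"number": 2, "timestamp": 1535587200},  # 2018-08-30 00:00:00 UTC
]
transactions = [
    {"hash": "0xaa", "block_number": 1, "value": 10**18},
    {"hash": "0xbb", "block_number": 1, "value": 2 * 10**18},
    {"hash": "0xcc", "block_number": 2, "value": 5 * 10**17},
]

partitions = partition_by_date(denormalize(transactions, blocks))
```

With rows stored this way, a query restricted to a date range only has to read the matching partitions, which is what keeps exploration cost-effective.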

Google has also demonstrated a number of interesting queries and visualizations based on the Ethereum dataset. The analyses focus on three topics:

  1. Smart contract function calls
  2. On-chain transaction time-series and transaction networks
  3. Smart contract function analytics

The Ethereum blockchain dataset is also available on Kaggle. You can query the live data in Kernels, Kaggle’s no-charge in-browser coding environment, using the BigQuery Python client library. The Ethereum ETL project on GitHub contains all the source code used to extract data from the Ethereum blockchain and load it into BigQuery.
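
As a rough sketch of what such a query might look like, the snippet below builds a daily aggregate of total Ether transferred and average transaction cost, then runs it through the BigQuery Python client library. The table path and the column names (`block_timestamp`, `value`, `gas_price`, `receipt_gas_used`) are assumptions based on the dataset name mentioned above and may not match the live schema, so check the table before running it. The function names are hypothetical.

```python
from datetime import date

# Assumed table path; verify it against the dataset before use.
DEFAULT_TABLE = "bigquery-public-data.ethereum_blockchain.transactions"


def daily_transfer_sql(start_date, end_date, table=DEFAULT_TABLE):
    """Build a standard-SQL query that aggregates transfers by day.

    Dates must be ISO strings (YYYY-MM-DD); anything else raises ValueError.
    Column names are assumptions, not confirmed against the schema.
    """
    # Validate the dates before interpolating them into the SQL text.
    start = date.fromisoformat(start_date).isoformat()
    end = date.fromisoformat(end_date).isoformat()
    return f"""
SELECT
  DATE(block_timestamp) AS tx_date,
  SUM(value / POWER(10, 18)) AS total_ether,
  AVG(gas_price * receipt_gas_used / POWER(10, 18)) AS avg_tx_cost_ether
FROM `{table}`
WHERE DATE(block_timestamp) BETWEEN '{start}' AND '{end}'
GROUP BY tx_date
ORDER BY tx_date
""".strip()


def run_daily_transfers(start_date, end_date):
    """Run the query; needs google-cloud-bigquery and valid credentials."""
    # Imported lazily so the SQL builder works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(daily_transfer_sql(start_date, end_date)).result()
    return [dict(row.items()) for row in rows]
```

Calling `run_daily_transfers("2018-01-01", "2018-08-29")` would return one dictionary per day. Dividing by 10^18 converts wei to Ether, and the date filter limits how much data the query has to scan.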

Read more about this news in detail on the Google Cloud blog.

