Skip to main content

Hubble

What is Hubble?

Hubble is an open-source, publicly available dataset that provides a complete historical record of the Stellar network. It ingests and presents the data produced by the Stellar network in a format that is easier to consume than the performance-oriented data representations used by Stellar Core. The dataset is hosted on BigQuery–meaning it is suitable for large, analytic workloads, historical data retrieval and complex data aggregation. Hubble should not be used for real-time data retrieval and cannot submit transactions to the network. For real time use cases, we recommend running an API server.

This guide describes when to use Hubble and how to connect. To view the underlying data structures, queries and examples, use the Viewing Metadata and Optimizing Queries tutorials.

Why Use Hubble?

Some questions are hard to answer with the RPC API as it's API is quite minimal. This is because its infrastructure is optimized for quick reads and writes so that it can process online transactions.

This is where Hubble comes in. It is optimized to execute complex queries and scan large amounts of data. Hubble can store orders of magnitude more data than RPC and will not run into storage constraints. Queries that require pagination in RPC or timeout can be returned in a single query. Hubble empowers users to explore, analyze, and derive meaningful conclusions from the data without the burden of maintaining a database.

Users should be aware of the following limitations:

  • Hubble is read-only; it cannot interact with the Stellar Network.
  • The database is updated in intraday batches. There is no guarantee for same-day data availability.
  • The SDF hosts a public instance of Hubble, and end users incur the cost to execute queries. Visit the BigQuery Pricing Page to learn more.

Why We Chose BigQuery

BigQuery is Google Cloud’s data warehouse that comes with some key features that fulfill Stellar’s analytic needs.

First, BigQuery allows anyone to make a dataset publicly available. This means that the SDF can contribute open source repositories to build and maintain a data warehouse and also host a public instance.

BigQuery also separates storage from compute, which makes it sustainable to host a public instance. The maintainer only has to pay the cost of storage without incurring the cost of the analytics running on the dataset.

Most importantly, BigQuery is the de facto platform for blockchain datasets. By selecting BigQuery, Stellar Network data is located with other blockchain data, which allows for cross-chain analytics.