What is Hubble?
Hubble is an open-source, publicly available dataset that provides a complete historical record of the Stellar network. Similar to Horizon, it ingests and presents the data produced by the Stellar network in a format that is easier to consume than the performance-oriented data representations used by Stellar Core. The dataset is hosted on BigQuery–meaning it is suitable for large, analytic workloads, historical data retrieval and complex data aggregation. Hubble should not be used for real-time data retrieval and cannot submit transactions to the network. For real time use cases, we recommend running an API server.
Why Use Hubble?
Some questions are hard to answer with the Horizon API and its underlying PostgreSQL database. This is because its infrastructure is optimized for quick database reads and writes so that it can process online transactions. Horizon can accurately store the results of these smaller transactions, however it sacrifices the ability to execute complex queries easily. The Stellar Network’s data footprint has also increased exponentially, which is creating space constraints and performance issues for Horizon instances that store the full historical record.
This is where Hubble comes in. It is optimized to execute complex queries and scan large amounts of data. Hubble can store orders of magnitude more data than Horizon and will not run into the same storage constraints. Queries that require pagination in Horizon or timeout can be returned in a single query. Hubble empowers users to explore, analyze, and derive meaningful conclusions from the data without the burden of maintaining a database.
Users should be aware of the following limitations:
- Hubble is read-only; it cannot interact with the Stellar Network.
- The database is updated in intraday batches. There is no guarantee for same-day data availability.
- The SDF hosts a public instance of Hubble, and end users incur the cost to execute queries. Visit the BigQuery Pricing Page to learn more.
Why We Chose BigQuery
BigQuery is Google Cloud’s data warehouse that comes with some key features that fulfill Stellar’s analytic needs.
First, BigQuery allows anyone to make a dataset publicly available. This means that the SDF can contribute open source repositories to build and maintain a data warehouse and also host a public instance.
BigQuery also separates storage from compute, which makes it sustainable to host a public instance. The maintainer only has to pay the cost of storage without incurring the cost of the analytics running on the dataset.
Most importantly, BigQuery is the de facto platform for blockchain datasets. By selecting BigQuery, Stellar Network data is located with other blockchain data, which allows for cross-chain analytics.