This document outlines methods for performing the initial backfill when setting up Hubble.
Use SDF ETL Data as a Source
1. Export Data to Cloud Storage and Load Data into BigQuery
- Use the EXPORT DATA statement provided by Google BigQuery to export the source dataset to Cloud Storage in the required format (e.g., Avro, Parquet).
- Use the LOAD DATA statement to load the exported files into your BigQuery dataset.
Example:
-- Export the SDF public history_transactions table to a Cloud Storage bucket you own
EXPORT DATA OPTIONS(
  uri='gs://my-bucket/history-transactions/*',
  format='PARQUET',
  overwrite=true) AS
SELECT * FROM `crypto-stellar.crypto_stellar.history_transactions`;
-- Load the exported Parquet files into your own BigQuery dataset
LOAD DATA INTO mydataset.transactions
FROM FILES(
  format='PARQUET',
  uris = ['gs://my-bucket/history-transactions/*']
)
WITH PARTITION COLUMNS;
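After the load completes, it can be worth verifying that the destination table matches the source. The query below is a minimal sketch using the illustrative table names from the example above.
-- Sanity check: row counts in the source and destination tables should match
SELECT
  (SELECT COUNT(*) FROM `crypto-stellar.crypto_stellar.history_transactions`) AS source_rows,
  (SELECT COUNT(*) FROM mydataset.transactions) AS destination_rows;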
2. Use BigQuery API / Console to Mirror SDF's Dataset
The Stellar Development Foundation provides public access to fully transformed Stellar network data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the Connecting section.
Use the CREATE TABLE COPY statement to copy tables across datasets.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]
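As a concrete sketch, the statement below copies SDF's public history_transactions table into a destination dataset; mydataset is an illustrative placeholder for your own dataset.
-- Copy the public history_transactions table into your own dataset
-- (mydataset is a placeholder for your destination dataset)
CREATE TABLE IF NOT EXISTS mydataset.history_transactions
COPY `crypto-stellar.crypto_stellar.history_transactions`;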
Use Galexie as a Source
Galexie is a tool for extracting, processing, and exporting Stellar ledger metadata to external storage, creating a data lake of pre-processed ledger metadata. It is an upstream data source for Hubble and is useful when you run a custom Stellar-ETL pipeline.
Steps
- Ensure you provide the correct value for datastore-path in the Stellar-ETL command flags; it is the name of the bucket where your Galexie instance writes ledger close metadata.
- Set up an orchestration system like Airflow.
- In your Airflow instance, trigger the history_table_export DAG for the older date ranges you want to backfill, as sketched below.
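One way to run the DAG for historical dates is Airflow's built-in backfill command; the DAG id comes from the step above, while the date range here is purely illustrative.
# Backfill the history_table_export DAG over an illustrative date range
airflow dags backfill \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  history_table_export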