This document outlines methods for performing the initial backfill when setting up Hubble.
Use SDF ETL Data as a Source
1. Export Data to Cloud Storage and Load Data into BigQuery
- Use the EXPORT DATA statement provided by Google BigQuery to export the source dataset to Cloud Storage in the required format (e.g., Avro, Parquet).
- Use the LOAD DATA statement to load the exported files into your BigQuery dataset.
Example:
-- Export the SDF public history_transactions table to a Cloud Storage bucket you own
EXPORT DATA OPTIONS(
  uri='gs://my-bucket/history-transactions/*',
  format='PARQUET',
  overwrite=true) AS
SELECT * FROM `crypto-stellar.crypto_stellar.history_transactions`;
-- Load the exported Parquet files into your own BigQuery dataset
LOAD DATA INTO mydataset.transactions
FROM FILES(
  format='PARQUET',
  uris = ['gs://my-bucket/history-transactions/*']
)
WITH PARTITION COLUMNS;
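After the load completes, it can be worth verifying that the destination table matches the source. The query below is a minimal sketch using the illustrative table names from the example above.
-- Sanity check: row counts in the source and destination tables should match
SELECT
  (SELECT COUNT(*) FROM `crypto-stellar.crypto_stellar.history_transactions`) AS source_rows,
  (SELECT COUNT(*) FROM mydataset.transactions) AS destination_rows;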
2. Use BigQuery API / Console to Mirror SDF's Dataset
The Stellar Development Foundation provides public access to fully transformed Stellar network data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the Connecting section.
Use the CREATE TABLE COPY statement to copy tables across datasets.
CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]
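As a concrete sketch, the statement below copies SDF's public history_transactions table into a destination dataset; mydataset is an illustrative placeholder for your own dataset.
-- Copy the public history_transactions table into your own dataset
-- (mydataset is a placeholder for your destination dataset)
CREATE TABLE IF NOT EXISTS mydataset.history_transactions
COPY `crypto-stellar.crypto_stellar.history_transactions`;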
Use Galexie as a Source
Galexie is a tool for extracting, processing, and exporting Stellar ledger metadata to external storage, creating a data lake of pre-processed ledger metadata. It is an upstream data source for Hubble and is useful when you run a custom Stellar-ETL pipeline.
Steps
- Ensure you provide the correct value for datastore-path in the Stellar-ETL command flags; it is the name of the bucket where your Galexie instance writes ledger close metadata.
- Set up an orchestration system like Airflow.
- In your Airflow instance, trigger the history_table_export DAG for the older date ranges you want to backfill, as sketched below.
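One way to run the DAG for historical dates is Airflow's built-in backfill command; the DAG id comes from the step above, while the date range here is purely illustrative.
# Backfill the history_table_export DAG over an illustrative date range
airflow dags backfill \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  history_table_export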