Backfill
Once scheduling and orchestration are set up, you may need to backfill data in the following scenarios:
| Use Case | Description | Solution |
|---|---|---|
| Initial backfill | You have just set up Hubble and want to ingest historical data. | Option 1: Data Import. Pros: cheap and fast. Option 2: Re-trigger DAGs for past dates. Cons: slow and expensive. |
| Bug fix | You resolved a bug and need to re-ingest or back-fix one or more data columns. | Option 1: JS UDF. Pros: cheap and fast. Cons: may require writing optimized queries and running them in batches. Option 2: Re-trigger DAGs for past dates. Cons: slow and expensive. |
| New data column extraction | You added one or more data columns as part of a feature request and need to backfill data for them. | Option 1: JS UDF. Pros: cheap and fast. Cons: may require writing optimized queries and running them in batches. Option 2: Re-trigger DAGs for past dates. Cons: slow and expensive. |
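Both the JS UDF approach and re-triggering DAGs are typically run over a historical date range split into smaller batches, as noted in the table above. The following is a minimal sketch, not part of Hubble itself, that splits an inclusive date range into contiguous windows and, assuming the DAGs run on Apache Airflow 2.x, builds one `airflow dags backfill -s START -e END DAG_ID` command per window. The function names and the DAG id `history_table_export` are hypothetical and used only for illustration; the same `date_batches` windows could equally bound the `WHERE` clause of batched UDF queries.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def date_batches(start: date, end: date, batch_days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (batch_start, batch_end) windows that cover [start, end]."""
    if batch_days < 1:
        raise ValueError("batch_days must be >= 1")
    if end < start:
        raise ValueError("end must not be before start")
    current = start
    while current <= end:
        # The last window is truncated so it never runs past `end`.
        batch_end = min(current + timedelta(days=batch_days - 1), end)
        yield current, batch_end
        current = batch_end + timedelta(days=1)


def backfill_commands(dag_id: str, start: date, end: date, batch_days: int) -> List[str]:
    """Build one Airflow backfill command per date window.

    Assumption: the Airflow 2.x CLI syntax `airflow dags backfill -s START -e END DAG_ID`.
    """
    return [
        f"airflow dags backfill -s {s.isoformat()} -e {e.isoformat()} {dag_id}"
        for s, e in date_batches(start, end, batch_days)
    ]


if __name__ == "__main__":
    # "history_table_export" is a hypothetical DAG id.
    for cmd in backfill_commands("history_table_export", date(2024, 1, 1), date(2024, 1, 10), 4):
        print(cmd)
```

Running the commands one batch at a time keeps each run small, so a failed window can be retried on its own instead of restarting the whole range.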
Backfill using JS UDF
This document outlines methods to extract the required fields from the XDR of raw data.
Data Import
This document outlines methods to perform an initial backfill when setting up Hubble.