The Ingestion Filtering feature is now released for public beta testing available from Horizon version 2.18.0 and up.
Ingestion Filtering enables Horizon operators to drastically reduce storage footprint of their Horizon DB by whitelisting Assets and/or Accounts that are relevant to their operations. This feature is ideally suited for private Horizon operators who do not need full history for all assets and accounts on the Stellar network.
Why is it useful:
Previously, the only way to limit data storage is by limiting the amount of history Horizon ingests, either by configuring the starting ledger to be later than genesis block or via rolling retention (ie: last 30 days). This feature allows users to store the full history of assets and accounts (and related entities) that they care about.
For further context, running a full history Horizon instance currently takes ~ 15TB of disk space (as of June 2022) with storage growing at a rate of ~ 1TB / month. As a benchmark, filtering by even 100 of the most active accounts and assets reduces storage by over 90%. For the majority of users who care about an even more limited set of assets and accounts, storage savings should be well over 99%. Other benefits are reducing operating costs for maintaining storage, improved DB health metrics and query performance.
How does it work:
This feature provides an ability to select which ledger transactions are accepted at ingestion time to be stored in Horizon’s historical database. Filter whitelists are maintained via an admin REST API (and persisted in the DB). The ingestion process checks the list and persists transactions related to Accounts and Assets that are whitelisted. Note that the feature does not filter the current state of the ledger and related DB tables, only history tables.
Whitelisting can include the following supported entities:
- Account id
- Asset id (canonical)
Given that all transactions related to the white listed entities are included, all historical time series data related to those transactions are saved in horizon's history db as well. For example, whitelisting an Asset will also persist all Accounts that interact with that Asset and vice versa, if an Account is whitelisted, all assets that are held by that Account will also be included.
The filters and their configuration are optional features and must be enabled with horizon command line or environmental parameters:
As Environment properties:
These should be included in addition to the standard ingestion parameters that must be set also to enable the ingestion engine to be running, such as
ingest=true, etc. Once these flags are included at horizon runtime, filter configurations and their rules are initially empty and the filters are disabled by default. To enable filters, update the configuration settings, refer to the Admin API Docs which are published as Open API 3.0 doc on the Admin Port at
http://localhost:<admin_port>/. You can paste the contents from that url into any OAPI tool such as Swagger which will render a visual explorer of the API endpoints. Follow details and examples for endpoints:
Adding and Removing Entities can be done by submitting PUT requests to the
To add new filtered entities, submit an
HTTP PUT request to the admin API endpoints for either Asset or Account filters. The PUT request body will be JSON that expresses the filter rules, currently the rules model is a whitelist format and expressed as JSON string array. To remove entities, submit an
HTTP PUT request to update the list accordingly. To retrieve what is currently configured, submit an
HTTP GET request.
The OAPI doc published by the Admin Server can be pulled directly from the Github repo here.
Disable both Asset and Account Filter config rules via the Admin API by setting
enabled=falsein each filter rule, or set
--exp-enable_ingestion_filtering=false, this will open up forward ingestion to include all data again. It is then your choice whether to run a Re-ingestion to capture older data from past that would have been dropped by filters but could now be re-imported with filters off, e.g.
horizon db reingest <from_ledger> <to_ledger>
If you have a DB backup:
- restore the DB
- run a Reingestion Gap Fill command to fill in the gaps to current tip of the chain
- resume Ingestion Sync
- Start over with a fresh DB (or see Patching Historical Data below)
Patching Historical Data:
If new Assets or Accounts are added to the whitelist and you would like to patch in its missing historical data, Reingestion can be run. The Reingestion process is idempotent and will re-ingest the data from the designated ledger range and overwrite or insert new data if not already on current DB.
Sample Use Case:
As an Asset Issuer, I have issued 4 assets and am interested in all transaction data related to those assets including customer Accounts that interact with those assets and the following:
- Claimable balances
I would like to store the full history of all transactions related from the genesis of those assets.
You have an existing Horizon installed, configured and has forward ingestion enabled at a minimum to be able to successfully sync to the current state of the Stellar network. Bonus if you are familiar with running re-ingestion.
Configure 4 whitelisted Assets via the Admin API. Also check the
HISTORY_RETENTION_COUNTand set it to
0if you don’t want any history purged anymore now that you are filtering, otherwise it will continue to reap all data older than the retention.
Decide if you want to wipe existing history data on the DB first before the filtering starts running, you can effectively clear the history by running
HISTORY_RETENTION_COUNT=1 stellar-horizon db reap
or drop/create the db and run
stellar-horizon db init.
Alternatively, if you do not need to free up old history tables, you can effectively stop here, anytime changes or enablement of filter rules are done, the history tables will immediately reflect filtered data per those latest rules from the time the filter config is updated and forward.
- If starting with a fresh DB, decide if you want to re-run ingestion from the earliest ledger # related to the whitelisted entities to populate history for just the allowed data from filters.
- Tip: To find this ledger number, you can check for the earliest transaction of the Account issuing that asset.
- Also consider running parallel workers to speed up the process.
Optional: When re-ingestion is finished, run an ingestion gap fill
stellar-horizon db fill-gapsto fill any gaps that may have been missed.
Verify that your data is there
- Do a spot check of Accounts that should be automatically be ingested against a full history Horizon instance such as SDF Horizon