Ingestion

Horizon provides access to both current and historical state on the Stellar network through a process called ingestion.

Horizon provides most of its utility through ingested data, and your Horizon server can be configured to listen for and ingest transaction results from the Stellar network. Ingestion enables API access to both current (e.g. someone's balance) and historical state (e.g. someone's transaction history).

Ingestion Types

There are two primary ingestion use-cases for Horizon operations:

  • ingesting live data to stay up to date with the latest, real-time changes to the Stellar network, and
  • ingesting historical data to examine how the Stellar ledger has changed over time.

Ingesting Live Data

Live ingestion is disabled by default, but this guide assumes you have turned it on. If you haven't, pass the --ingest flag or set INGEST=true in your environment.
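For example, depending on how you run Horizon, either of the following enables live ingestion (the config file path matches the package-manager installation mentioned later in this guide):

export INGEST=true   # in your environment, or INGEST=true in /etc/default/stellar-horizon
# or pass the flag when starting Horizon, alongside your other flags:
stellar-horizon --ingest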

For a serious setup, we highly recommend having more than one live ingesting instance, as this makes it easier to avoid downtime during upgrades and adds resilience to your infrastructure, ensuring you always have the latest network data.

Ingesting Historical Data

Providing API access to historical data is facilitated by a Horizon subcommand:

stellar-horizon db reingest range <start> <end>

(The command name is a bit of a misnomer: you can use reingest both to ingest new ledger data and reingest old data.)

You can run this process in the background while your Horizon server is up. It will continuously decrement the history.elder_ledger value in your /metrics endpoint until the <end> ledger is reached and the backfill is complete. If Horizon receives a request for a ledger it hasn't ingested yet, it returns a 503 error clarifying that it's Still Ingesting (see below).
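For example, assuming Horizon's /metrics endpoint is reachable on localhost port 8000 (the port, and whether metrics are served on a separate admin port, depend on your version and configuration), you can keep an eye on the backfill with something like:

# the exact metric name may differ slightly between Horizon versions
curl -s http://localhost:8000/metrics | grep elder_ledger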

Deciding how much history to ingest

You should think carefully about the amount of ingested data you'd like to keep around. Though the storage requirements for the entire Stellar network are substantial, most organizations and operators only need a small fraction of the history to fit their use case. For example:

  • If you just started developing a new application or service, you can probably get away with just doing live ingestion, since nothing you do requires historical data.

  • If you're moving an existing service away from reliance on SDF's Horizon, you likely only need history from the point at which you started using the Stellar network.

  • If you provide temporal guarantees to your users, such as a 6-month guarantee of transaction history like some online banks offer, or history only for the last thousand ledgers (see below), then you similarly don't have heavy ingestion requirements.

Even a massively popular, well-established custodial service probably doesn't need full history to serve its users. It will, however, need full history to be a Full Validator with published history archives.

Reingestion

Regardless of whether you are running live ingestion or building up historical data, you may occasionally need to reingest ledgers you've already processed (for example, after certain Horizon upgrades). For this, use the same command as above.

Parallel ingestion

Note that historical (re)ingestion happens independently for any given ledger range, so you can reingest in parallel across multiple Horizon processes:

horizon1> stellar-horizon db reingest range 1 10000
horizon2> stellar-horizon db reingest range 10001 20000
horizon3> stellar-horizon db reingest range 20001 30000
# ... etc.
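If you want to split a long range into evenly-sized chunks before handing them to separate processes or machines, a small helper like the following can generate the commands (the START, END, and CHUNK values are purely illustrative; pick boundaries that suit your history):

#!/bin/bash
# Print reingestion commands for evenly-sized subranges.
# START, END, and CHUNK below are example values only.
START=1
END=30000000
CHUNK=10000000
FROM=$START
while [ "$FROM" -le "$END" ]; do
  TO=$(( FROM + CHUNK - 1 ))
  if [ "$TO" -gt "$END" ]; then
    TO=$END
  fi
  echo "stellar-horizon db reingest range $FROM $TO"
  FROM=$(( TO + 1 ))
done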

Managing storage

Over time, the recorded network history will grow without bound, increasing the storage used by the database. Horizon needs sufficient disk space to store the data it ingests from Stellar Core. Unless you need to maintain a history archive, you should configure Horizon to retain only a certain number of ledgers in the database.

This is done using the --history-retention-count flag or the HISTORY_RETENTION_COUNT environment variable. Set the value to the number of recent ledgers you wish to keep around, and every hour the Horizon subsystem will reap expired data. Alternatively, Horizon provides a command to trigger this cleanup manually:

stellar-horizon db reap
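For example (the numbers are illustrative: at a roughly 5-second ledger close time, about 17,280 ledgers close per day, so ~518,400 ledgers corresponds to roughly 30 days of history):

# keep roughly 30 days of ledgers (illustrative value)
export HISTORY_RETENTION_COUNT=518400
# or, equivalently, on the command line:
# stellar-horizon --history-retention-count=518400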

Common Issues

Ingestion is a complicated process, so there are a number of things to look out for.

Some endpoints are not available during state ingestion

Endpoints that display state information are not available during initial state ingestion and will return a 503 Service Unavailable/Still Ingesting error. An example is the /paths endpoint (built using offers). Such endpoints will become available after state ingestion is done (usually within a couple of minutes).

State ingestion is taking a lot of time

State ingestion shouldn't take more than a couple of minutes on an AWS c5.xlarge instance or equivalent.

It's possible that the progress logs (see below) will not show anything new for an extended period of time, or will print many progress entries every few seconds. This happens because of the way history archives are designed.

Ingestion is still working in such cases, but it's processing entries of type DEADENTRY. If there are many of them in a bucket, there are no active entries to process. We plan to improve the progress logs to display an actual percentage so it's easier to estimate an ETA.

If you see that ingestion is not proceeding for a very long period of time:

  1. Check the RAM usage on the machine. It's possible that the system ran out of RAM and is using swap memory, which is extremely slow.
  2. If the above is not the case, file a new issue in the Horizon repository.

CPU usage goes high every few minutes

This is by design. Horizon runs a state verifier routine that compares state in local storage to history archives every 64 ledgers to ensure data changes are applied correctly. If data corruption is detected, Horizon will block access to endpoints serving invalid data.

We recommend keeping this security feature turned on; however, if it's causing problems (due to CPU usage) this can be disabled via the --ingest-disable-state-verification/INGEST_DISABLE_STATE_VERIFICATION parameter.
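For example, if you decide to accept that tradeoff, either of the following turns verification off (check your Horizon version's --help output for the exact flag syntax):

export INGEST_DISABLE_STATE_VERIFICATION=true
# or: stellar-horizon --ingest-disable-state-verification ...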

Ingesting Full Public Network History

In some (albeit rare) cases, it can be convenient to (re)ingest the full Stellar Public Network history into Horizon (e.g. when running Horizon for the first time). Using multiple Captive Core workers on a high performance environment (powerful machines on which to run Horizon + a powerful database) makes this possible in ~1.5 days.

The following instructions assume the reingestion is done on AWS. However, they should be applicable to any other environment with equivalent capacity. In the same way, the instructions can be adapted to reingest only specific parts of the history.

Prerequisites

Before we begin, we make some assumptions about the required environment. Please refer to the Prerequisites section for the current hardware requirements to run Horizon reingestion, whether for historical catch-up or for real-time ingestion (staying in sync with the ledger). A few things to keep in mind:

  1. For reingestion, the more parallel workers you provision to speed up the process, the larger the machine needs to be in terms of RAM, CPU, IOPS, and disk size. The RAM required per worker also increases over time (14GB per worker as of mid-2022) due to the growth of the ledger. Hardware can be downsized once reingestion is complete.

  2. The latest version of Horizon installed on the machine from (1).

  3. The latest version of Stellar Core installed on the machine from (1).

  4. A Horizon database into which to reingest the history. Preferably, the database should be empty to minimize storage (Postgres accumulates data during usage, which is only reclaimed when VACUUMed) and should meet the minimum specs for reingestion outlined in Prerequisites.

As the DB storage grows, its IO capacity will grow along with it. The number of workers (and the size of the instance created in (1)) should be increased accordingly to take advantage of it. To minimize reingestion time, watch write IOPS; it should ideally stay close to the theoretical limit of the DB.

Parallel Reingestion

Once the prerequisites are satisfied, we can spawn two Horizon reingestion processes in parallel:

  1. One for the first 17 million ledgers, which are almost empty.
  2. Another one for the rest of the history.

The split is needed because the rest of the ledgers are much more densely packed than the first 17 million. A single Horizon instance with enough workers to saturate the machine's IO capacity for the first 17 million ledgers would overwhelm the machine when reingesting the rest, during which CPU and memory consumption per worker are higher.

Using 64 workers for (1) and 20 workers for (2) saturates the instance's RAM and 15K of write IOPS. Again, as the DB storage grows, a larger number of workers and faster storage should be considered.

In order to run the reingestion, first set the following environment variables in the configuration (updating values to match your database environment, of course):

export DATABASE_URL=postgres://postgres:<password>@<db-host>:5432/horizon
export APPLY_MIGRATIONS=true
export HISTORY_ARCHIVE_URLS=https://s3-eu-west-1.amazonaws.com/history.stellar.org/prd/core-live/core_live_001
export NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015"
export STELLAR_CORE_BINARY_PATH=$(which stellar-core)
export ENABLE_CAPTIVE_CORE_INGESTION=true
# Number of ledgers per job sent to the workers.
# The larger the job, the better performance from Captive Core's perspective,
# but, you want to choose a job size which maximizes the time all workers are
# busy.
export PARALLEL_JOB_SIZE=100000
# Retries per job
export RETRIES=10
export RETRY_BACKOFF_SECONDS=20

# Optional configuration when running Captive Core ingestion:

# Where stellar-horizon should download buckets.
# If not set, stellar-horizon downloads data into the current working directory.
# export CAPTIVE_CORE_STORAGE_PATH="/var/lib/stellar"

# Make stellar-horizon keep ledger state in an on-disk database rather than in memory (RAM);
# this requires approximately 8GB of space, increasing as the size of ledger entries grows over time.
# export CAPTIVE_CORE_USE_DB=true

(Naturally, you can also edit the configuration file at /etc/default/stellar-horizon directly if you installed from a package manager.)

If Horizon was previously running, first ensure it is stopped. Then, run the following commands in parallel:

  1. stellar-horizon db reingest range --parallel-workers=64 1 16999999
  2. stellar-horizon db reingest range --parallel-workers=20 17000000 <latest_ledger>

(You can find <latest_ledger> under the core_latest_ledger field of SDF Horizon's root endpoint.)
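For example, assuming you have curl and jq installed, you can read that field from SDF Horizon's root endpoint:

curl -s https://horizon.stellar.org/ | jq '.core_latest_ledger'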

When saturating a database instance with 15K IOPS capacity:

(1) should take a few hours to complete.

(2) should take about 3 days to complete.

Although there is a retry mechanism, reingestion may fail halfway through. If it does, Horizon will print the recommended range to use in order to restart it.

When reingestion is complete, it's worth running ANALYZE VERBOSE [table] on all tables to recalculate their statistics, which should improve query speed.
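As a sketch (the table names below are examples of Horizon history tables; list your schema's actual tables with \dt in psql and run the command for each):

psql "$DATABASE_URL" -c "ANALYZE VERBOSE history_ledgers;"
psql "$DATABASE_URL" -c "ANALYZE VERBOSE history_transactions;"
psql "$DATABASE_URL" -c "ANALYZE VERBOSE history_operations;"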

Monitoring reingestion process

This script should help monitor the reingestion process by printing the ledger subranges being reingested:

#!/bin/bash
echo "Current ledger ranges being reingested:"
echo
I=1
for S in $(ps aux | grep stellar-core | grep catchup | awk '{print $15}' | sort -n); do
  printf '%15s' $S
  if [ $(( I % 5 )) = 0 ]; then
    echo
  fi
  I=$(( I + 1 ))
done

Ideally we would be using Prometheus metrics for this, but they haven't been implemented yet.

Here is an example run:

Current ledger ranges being reingested:
99968/99968 199936/99968 299904/99968 399872/99968 499840/99968
599808/99968 699776/99968 799744/99968 899712/99968 999680/99968
1099648/99968 1199616/99968 1299584/99968 1399552/99968 1499520/99968
1599488/99968 1699456/99968 1799424/99968 1899392/99968 1999360/99968
2099328/99968 2199296/99968 2299264/99968 2399232/99968 2499200/99968
2599168/99968 2699136/99968 2799104/99968 2899072/99968 2999040/99968
3099008/99968 3198976/99968 3298944/99968 3398912/99968 3498880/99968
3598848/99968 3698816/99968 3798784/99968 3898752/99968 3998720/99968
4098688/99968 4198656/99968 4298624/99968 4398592/99968 4498560/99968
4598528/99968 4698496/99968 4798464/99968 4898432/99968 4998400/99968
5098368/99968 5198336/99968 5298304/99968 5398272/99968 5498240/99968
5598208/99968 5698176/99968 5798144/99968 5898112/99968 5998080/99968
6098048/99968 6198016/99968 6297984/99968 6397952/99968 17099967/99968
17199935/99968 17299903/99968 17399871/99968 17499839/99968 17599807/99968
17699775/99968 17799743/99968 17899711/99968 17999679/99968 18099647/99968
18199615/99968 18299583/99968 18399551/99968 18499519/99968 18599487/99968
18699455/99968 18799423/99968 18899391/99968 18999359/99968 19099327/99968
19199295/99968 19299263/99968 19399231/99968

Reading Logs

To check the progress and status of ingestion, review your logs regularly; all logs related to ingestion are tagged with service=ingest.
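For example, if your installation runs Horizon under a systemd unit named stellar-horizon (an assumption that depends on how you installed it), you could filter the ingestion logs with:

journalctl -u stellar-horizon | grep 'service=ingest'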

The logs start by informing you about state ingestion:

INFO[...] Starting ingestion system from empty state...  pid=5965 service=ingest temp_set="*io.MemoryTempSet"
INFO[...] Reading from History Archive Snapshot pid=5965 service=ingest ledger=25565887

During state ingestion, Horizon will log the number of processed entries every 100,000 entries (there are currently around 10M entries in the public network):

INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=100000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot ledger=25565887 numEntries=200000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot ledger=25565887 numEntries=300000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot ledger=25565887 numEntries=400000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot ledger=25565887 numEntries=500000 pid=5965 service=ingest

When state ingestion is finished, it will proceed to ledger ingestion starting from the next ledger after the checkpoint ledger (25565887+1 in this example) to update the state using transaction metadata:

INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=5400000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot ledger=25565887 numEntries=5500000 pid=5965 service=ingest
INFO[...] Processed ledger ledger=25565887 pid=5965 service=ingest type=state_pipeline
INFO[...] Finished processing History Archive Snapshot duration=2145.337575904 ledger=25565887 numEntries=5529931 pid=5965 service=ingest shutdown=false
INFO[...] Reading new ledger ledger=25565888 pid=5965 service=ingest
INFO[...] Processing ledger ledger=25565888 pid=5965 service=ingest type=ledger_pipeline updating_database=true
INFO[...] Processed ledger ledger=25565888 pid=5965 service=ingest type=ledger_pipeline
INFO[...] Finished processing ledger duration=0.086024492 ledger=25565888 pid=5965 service=ingest shutdown=false transactions=14
INFO[...] Reading new ledger ledger=25565889 pid=5965 service=ingest
INFO[...] Processing ledger ledger=25565889 pid=5965 service=ingest type=ledger_pipeline updating_database=true
INFO[...] Processed ledger ledger=25565889 pid=5965 service=ingest type=ledger_pipeline
INFO[...] Finished processing ledger duration=0.06619956 ledger=25565889 pid=5965 service=ingest shutdown=false transactions=29
INFO[...] Reading new ledger ledger=25565890 pid=5965 service=ingest
INFO[...] Processing ledger ledger=25565890 pid=5965 service=ingest type=ledger_pipeline updating_database=true
INFO[...] Processed ledger ledger=25565890 pid=5965 service=ingest type=ledger_pipeline
INFO[...] Finished processing ledger duration=0.071039012 ledger=25565890 pid=5965 service=ingest shutdown=false transactions=20

Managing Stale Historical Data

Horizon ingests ledger data from a managed, pared-down Captive Stellar Core instance. In the event that Captive Core crashes, lags, or if Horizon stops ingesting data for any other reason, the view provided by Horizon will start to lag behind reality. For simpler applications, this may be fine, but in many cases this lag is unacceptable and the application should not continue operating until the lag is resolved.

To help applications that cannot tolerate lag, Horizon provides a configurable "staleness" threshold. If enough lag accumulates to surpass this threshold (expressed in number of ledgers), Horizon will only respond with an error: stale_history. To configure this option, use the --history-stale-threshold/HISTORY_STALE_THRESHOLD parameter.
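For example, to start returning stale_history errors once Horizon falls more than 100 ledgers behind (the value is purely illustrative):

export HISTORY_STALE_THRESHOLD=100
# or: stellar-horizon --history-stale-threshold=100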

Note: Non-historical requests (such as submitting transactions or checking account balances) will not error out if the staleness threshold is surpassed.