Skip to content
Docs
Submit bugs here.

Please submit any typos you come across to GitHub issues.


Ingestion

Horizon provides access to both current and historical state on the Stellar network through a process called ingestion.

Horizon provides most of its utility through ingested data, and your Horizon server can be configured to listen for and ingest transaction results from the Stellar network. Ingestion enables API access to both current (e.g. someone’s balance) and historical state (e.g. someone’s transaction history).

Ingestion Types

There are two primary ingestion use-cases for Horizon operations:

  • ingesting live data to stay up to date with the latest, real-time changes to the Stellar network, and
  • ingesting historical data to peek how the Stellar ledger has changed over time

Ingesting Live Data

Though this option is disabled by default, in this guide we’ve assumed you turned it on. If you haven’t, pass the --ingest flag or set INGEST=true in your environment.

For a serious setup, we highly recommend having more than one live ingesting instance, as this makes it easier to avoid downtime during upgrades and adds resilience to your infrastructure, ensuring you always have the latest network data.

Ingesting Historical Data

Providing API access to historical data is facilitated by a Horizon subcommand:

stellar-horizon db reingest range <start> <end>

(The command name is a bit of a misnomer: you can use reingest both to ingest new ledger data and reingest old data.)

You can run this process in the background while your Horizon server is up. It will continuously decrement the history.elder_ledger in your /metrics endpoint until the <end> ledger is reached and the backfill is complete. If Horizon receives a request for a ledger it hasn’t ingested, it returns a 503 error and clarify that it’s Still Ingesting (see below).

Deciding on how much history to ingest

You should think carefully about the amount of ingested data you’d like to keep around. Though the storage requirements for the entire Stellar network are substantial, most organizations and operators only need a small fraction of the history to fit their use case. For example,

  • If you just started developing a new application or service, you can probably get away with just doing live ingestion, since nothing you do requires historical data.

  • If you’re moving an existing service away from reliance on SDF’s Horizon, you likely only need history from the point at which you started using the Stellar network.

  • If you provide temporal guarantees to your users—a 6-month guarantee of transaction history like some online banks do, or history only for the last thousand ledgers (see below), for example—then you similarly don’t have heavy ingestion requirements.

Even a massively-popular, well-established custodial service probably doesn’t need full history to service its users. It will, however, need full history to be a Full Validator with published history archives.

Reingestion

Regardless of whether you are running live ingestion or building up historical data, you may occasionally need to _re_ingest ledgers anew (for example on certain upgrades of Horizon). For this, you use the same command as above.

Parallel ingestion

Note that historical (re)ingestion happens independently for any given ledger range, so you can reingest in parallel across multiple Horizon processes:

horizon1> stellar-horizon db reingest range 1 10000
horizon2> stellar-horizon db reingest range 10001 20000
horizon3> stellar-horizon db reingest range 20001 30000
# ... etc.

Managing storage

Over time, the recorded network history will grow unbounded, increasing storage used by the database. Horizon needs sufficient disk space to expand the data ingested from Stellar Core. Unless you need to maintain a history archive, you should configure Horizon to only retain a certain number of ledgers in the database.

This is done using the --history-retention-count flag or the HISTORY_RETENTION_COUNT environment variable. Set the value to the number of recent ledgers you wish to keep around, and every hour the Horizon subsystem will reap expired data. Alternatively, Horizon provides a command to force a collection:

stellar-horizon db reap

If you configure this parameter, we also recommend configuring the CATCHUP_RECENT value for your Captive Core instance.

Common Issues

Ingestion is a complicated process, so there are a number of things to look out for.

Some endpoints are not available during state ingestion

Endpoints that display state information are not available during initial state ingestion and will return a 503 Service Unavailable/Still Ingesting error. An example is the /paths endpoint (built using offers). Such endpoints will become available after state ingestion is done (usually within a couple of minutes).

State ingestion is taking a lot of time

State ingestion shouldn’t take more than a couple of minutes on an AWS c5.xlarge instance or equivalent.

It’s possible that the progress logs (see below) will not show anything new for a longer period of time or print a lot of progress entries every few seconds. This happens because of the way history archives are designed.

The ingestion is still working but it’s processing entries of type DEADENTRY. If there is a lot of them in the bucket, there are no active entries to process. We plan to improve the progress logs to display actual percentage progress so it’s easier to estimate an ETA.

If you see that ingestion is not proceeding for a very long period of time:

  1. Check the RAM usage on the machine. It’s possible that system ran out of RAM and is using swap memory that is extremely slow.
  2. If above is not the case, file a new issue in the Horizon repository.

CPU usage goes high every few minutes

This is by design. Horizon runs a state verifier routine that compares state in local storage to history archives every 64 ledgers to ensure data changes are applied correctly. If data corruption is detected, Horizon will block access to endpoints serving invalid data.

We recommend keeping this security feature turned on; however, if it’s causing problems (due to CPU usage) this can be disabled via the --ingest-disable-state-verification/INGEST_DISABLE_STATE_VERIFICATION parameter.

Ingesting Full Public Network History

In some (albeit rare) cases, it can be convenient to (re)ingest the full Stellar Public Network history into Horizon (e.g. when running Horizon for the first time). Using multiple Captive Core workers on a high performance environment (powerful machines on which to run Horizon + a powerful database) makes this possible in ~1.5 days.

The following instructions assume the reingestion is done on AWS. However, they should be applicable to any other environment with equivalent capacity. In the same way, the instructions can be adapted to reingest only specific parts of the history.

Prerequisites

Before we begin, we make some environment assumptions:

  1. An instance with 32 CPU cores, 64 GB of RAM and at least 200 GB of disk capacity from which to run Horizon (for example AWS m5.8xlarge). This is needed to fit 20 Horizon parallel workers (each with their own Captive Core instance). Each Core instance can take up to 3GB of RAM and a full CPU core (more on why 20 workers below). If the number of workers is increased, you may need a larger machine.

  2. Horizon version 1.6.0 or newer (ideally, 2.x) installed on the machine from (1).

  3. Core version 17 or newer installed on the machine from (1), given the network is on Protocol 17.

  4. A Horizon database where to reingest the history. Preferably, the database should have at least 32 CPUs, 256 GB of RAM and at least 15K IOPS (for example AWS RDS r5.8xlarge) and should be empty to minimize storage (Postgres accumulates data during usage, which is only deleted when VACUUMed).

As the DB storage grows, the IO capacity will grow along with it. The number of workers (and the size of the instance created in (1), should be increased accordingly if we want to take advantage of it. To make sure we are minimizing reingestion time, we should watch write IOPS. It should ideally always be close to the theoretical limit of the DB.

Parallel Reingestion

Once the prerequisites are satisfied, we can spawn two Horizon reingestion processes in parallel:

  1. One for the first 17 million ledgers (which are almost empty).
  2. Another one for the rest of the history.

This is due to first 17 million ledgers being almost empty whilst the rest are much more packed. Having a single Horizon instance with enough workers to saturate the IO capacity of the machine for the first 17 million would kill the machine when reingesting the rest (during which there is a higher CPU and memory consumption per worker).

64 workers for (1) and 20 workers for (2) saturates instance with 64GB of RAM and 15K IOPS. Again, as the DB storage grows, a larger number of workers and faster storage should be considered.

In order to run the reingestion, first set the following environment variables in the configuration (updating values to match your database environment, of course):

export DATABASE_URL=postgres://postgres:[email protected]:5432/horizon
export APPLY_MIGRATIONS=true
export HISTORY_ARCHIVE_URLS=https://s3-eu-west-1.amazonaws.com/history.stellar.org/prd/core-live/core_live_001
export NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015"
export STELLAR_CORE_BINARY_PATH=$(which stellar-core)
export ENABLE_CAPTIVE_CORE_INGESTION=true
# Number of ledgers per job sent to the workers.
# The larger the job, the better performance from Captive Core's perspective,
# but, you want to choose a job size which maximizes the time all workers are
# busy.
export PARALLEL_JOB_SIZE=100000
# Retries per job
export RETRIES=10
export RETRY_BACKOFF_SECONDS=20

(Naturally, you can also edit the configuration file at /etc/default/stellar-horizon directly if you installed from a package manager.)

If Horizon was previously running, first ensure it is stopped. Then, run the following commands in parallel:

  1. stellar-horizon db reingest range --parallel-workers=64 1 16999999
  2. stellar-horizon db reingest range --parallel-workers=20 17000000 <latest_ledger>

(Where you can find <latest_ledger> under SDF Horizon’s core_latest_ledger field.)

When saturating a database instance with 15K IOPS capacity:

(1) should take a few hours to complete.

(2) should take about 3 days to complete.

Although there is a retry mechanism, reingestion may fail half-way. Horizon will print the recommended range to use in order to restart it.

When reingestion is complete it’s worth running ANALYZE VERBOSE [table] on all tables to recalculate the stats. This should improve the query speed.

Monitoring reingestion process

This script should help monitor the reingestion process by printing the ledger subranges being reingested:

#!/bin/bash
echo "Current ledger ranges being reingested:"
echo
I=1
for S in $(ps aux | grep stellar-core | grep catchup | awk '{print $15}' | sort -n); do
    printf '%15s' $S
    if [ $(( I % 5 )) = 0 ]; then
      echo
    fi
    I=$(( I + 1))
done

Ideally we would be using Prometheus metrics for this, but they haven’t been implemented yet.

Here is an example run:

Current ledger ranges being reingested:
    99968/99968   199936/99968   299904/99968   399872/99968   499840/99968
   599808/99968   699776/99968   799744/99968   899712/99968   999680/99968
  1099648/99968  1199616/99968  1299584/99968  1399552/99968  1499520/99968
  1599488/99968  1699456/99968  1799424/99968  1899392/99968  1999360/99968
  2099328/99968  2199296/99968  2299264/99968  2399232/99968  2499200/99968
  2599168/99968  2699136/99968  2799104/99968  2899072/99968  2999040/99968
  3099008/99968  3198976/99968  3298944/99968  3398912/99968  3498880/99968
  3598848/99968  3698816/99968  3798784/99968  3898752/99968  3998720/99968
  4098688/99968  4198656/99968  4298624/99968  4398592/99968  4498560/99968
  4598528/99968  4698496/99968  4798464/99968  4898432/99968  4998400/99968
  5098368/99968  5198336/99968  5298304/99968  5398272/99968  5498240/99968
  5598208/99968  5698176/99968  5798144/99968  5898112/99968  5998080/99968
  6098048/99968  6198016/99968  6297984/99968  6397952/99968 17099967/99968
 17199935/99968 17299903/99968 17399871/99968 17499839/99968 17599807/99968
 17699775/99968 17799743/99968 17899711/99968 17999679/99968 18099647/99968
 18199615/99968 18299583/99968 18399551/99968 18499519/99968 18599487/99968
 18699455/99968 18799423/99968 18899391/99968 18999359/99968 19099327/99968
 19199295/99968 19299263/99968 19399231/99968

Reading Logs

In order to check the progress and status of ingestion you should check your logs regularly; all logs related to ingestion are tagged with service=ingest.

It starts with informing you about state ingestion:

INFO[...] Starting ingestion system from empty state...  pid=5965 service=ingest temp_set="*io.MemoryTempSet"
INFO[...] Reading from History Archive Snapshot          pid=5965 service=ingest ledger=25565887

During state ingestion, Horizon will log the number of processed entries every 100,000 entries (there are currently around 10M entries in the public network):

INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=100000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=200000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=300000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=400000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=500000 pid=5965 service=ingest

When state ingestion is finished, it will proceed to ledger ingestion starting from the next ledger after the checkpoint ledger (25565887+1 in this example) to update the state using transaction metadata:

INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=5400000 pid=5965 service=ingest
INFO[...] Processing entries from History Archive Snapshot  ledger=25565887 numEntries=5500000 pid=5965 service=ingest
INFO[...] Processed ledger                              ledger=25565887 pid=5965 service=ingest type=state_pipeline
INFO[...] Finished processing History Archive Snapshot  duration=2145.337575904 ledger=25565887 numEntries=5529931 pid=5965 service=ingest shutdown=false
INFO[...] Reading new ledger                            ledger=25565888 pid=5965 service=ingest
INFO[...] Processing ledger                             ledger=25565888 pid=5965 service=ingest type=ledger_pipeline updating_database=true
INFO[...] Processed ledger                              ledger=25565888 pid=5965 service=ingest type=ledger_pipeline
INFO[...] Finished processing ledger                    duration=0.086024492 ledger=25565888 pid=5965 service=ingest shutdown=false transactions=14
INFO[...] Reading new ledger                            ledger=25565889 pid=5965 service=ingest
INFO[...] Processing ledger                             ledger=25565889 pid=5965 service=ingest type=ledger_pipeline updating_database=true
INFO[...] Processed ledger                              ledger=25565889 pid=5965 service=ingest type=ledger_pipeline
INFO[...] Finished processing ledger                    duration=0.06619956 ledger=25565889 pid=5965 service=ingest shutdown=false transactions=29
INFO[...] Reading new ledger                            ledger=25565890 pid=5965 service=ingest
INFO[...] Processing ledger                             ledger=25565890 pid=5965 service=ingest type=ledger_pipeline updating_database=true
INFO[...] Processed ledger                              ledger=25565890 pid=5965 service=ingest type=ledger_pipeline
INFO[...] Finished processing ledger                    duration=0.071039012 ledger=25565890 pid=5965 service=ingest shutdown=false transactions=20

Managing Stale Historical Data

Horizon ingests ledger data from a managed, possibly-remote, pared-down Captive Stellar Core instance. In the event that Captive Core crashes, lags, or if Horizon stops ingesting data for any other reason, the view provided by Horizon will start to lag behind reality. For simpler applications, this may be fine, but in many cases this lag is unacceptable and the application should not continue operating until the lag is resolved.

To help applications that cannot tolerate lag, Horizon provides a configurable “staleness” threshold. If enough lag accumulates to surpass this threshold (expressed in number of ledgers), Horizon will only respond with an error: stale_history. To configure this option, use the --history-stale-threshold/HISTORY_STALE_THRESHOLD parameter.

Note: Non-historical requests (such as submitting transactions or checking account balances) will not error out if the staleness threshold is surpassed.

Last updated Jul. 27, 2021

Next Up: Monitoring
Page Outline