Skip to main content

Backfill using JS UDF

This document outlines methods to extract required fields from the XDR of raw data.

Use Cases:

  • Backfix for a bugfix
  • Backfix for a new column added
  • Temporary extraction of fields not parsed by stellar-etl

We'll take the example of extracting the fee_account_muxed field from a transaction envelope (tx_meta XDR). However, this method can be adapted to other fields as well. It is worth noting that most users will not need to standup and run their own Hubble. The Stellar Development Foundation provides public access to the data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the Connecting section.

We will use the js-stellar-base library to parse the XDR and employ JavaScript UDFs (User Defined Functions) in BigQuery to apply the transformation to the dataset.

For a deeper understanding, check out the Medium article on using NPM libraries in Google BigQuery UDFs.

Step 1: Setting Up JS UDF in BigQuery

To set up the JS UDF in BigQuery, follow these steps:

1. Clone the js-stellar-base repository

First, clone the repository to your local machine and install the dependencies:

git clone https://github.com/stellar/js-stellar-base.git
cd js-stellar-base
yarn
yarn build:prod

This process will generate the following file in the js-stellar-base/dist/ directory:

  • stellar-base.min.js

2. Upload the JS file to Google Cloud Storage (GCS)

Once the build process is completed, upload the stellar-base.min.js file to a Google Cloud Storage bucket to be used in the UDF.

Step 2: Writing the JavaScript Function

Here is an example JavaScript function to extract the fee_account_muxed field from the transaction envelope (tx_meta):

let tx_meta =
"AAAABQAAAQAAABYMYQ4r9W/uB9X6q6VU6feQhS2kQoRy9CjvwtYXdPRSih2hZeSSAAAAAAAAAZAAAAACAAAAAJwLL0Ul/CyRZdXuenmdXrzVyX9X56m4kYPYmgppVIj8AAAAZAAF9PwAAAABAAAAAAAAAAAAAAABAAAAAQAAAQAAAFj7+8N85JwLL0Ul/CyRZdXuenmdXrzVyX9X56m4kYPYmgppVIj8AAAAAAAAAADN5igtu93OKhkj2NrSHuPEJktU+0gJ0LiNavJirLAmRwAAAAAF9eEAAAAAAAAAAAFpVIj8AAAAQElnt70S4sGicHyhsN1S29DEREZ7i2HU96+8DfyshlFLCoQudDIxThnVEg2KQDrW61R19M7Ms9IAsznURc5y3wIAAAAAAAAAAaFl5JIAAABAIf9/ecA3id1mbHzJ2S9W5bRVqrjQr/c2+jHEuDNZevt3LDVSc+DmRMYie0eQ+vE7B3D+fRPb9yFzpfx4meTfBg==";

let txe = StellarBase.xdr.TransactionEnvelope.fromXDR(tx_meta, "base64");
let tx = txe.feeBump();
let sourceAccount = StellarBase.encodeMuxedAccountToAddress(
tx.tx().feeSource(),
);
console.log(sourceAccount);

This script will output the fee_account_muxed value: MBX64B6V7KV2KVHJ66IIKLNEIKCHF5BI57BNMF3U6RJIUHNBMXSJEAAACYGGCDRL6UFO2

Step 3: Wrapping the JavaScript Function as a UDF

CREATE TEMP FUNCTION getFeeBumpAccountIfExists(tx_meta STRING)
RETURNS STRING
LANGUAGE js
OPTIONS (
library=["gs://stellar-test-js-udf/stellar-base.min.js"] -- Path to JS library in GCS
)
AS r"""
return StellarBase.encodeMuxedAccountToAddress(
StellarBase.xdr.TransactionEnvelope.fromXDR(tx_meta, 'base64')
.feeBump()
.tx()
.feeSource()
);
""";

WITH fee_bump_transactions AS (
SELECT batch_run_date, transaction_hash, tx_envelope AS tx_meta
FROM `test_crypto_stellar.history_transactions`
WHERE batch_run_date BETWEEN DATETIME("2024-07-01") AND DATETIME_ADD("2024-07-20", INTERVAL 1 MONTH)
AND inner_transaction_hash IS NOT NULL -- filter in fee bump transactions
),
calculated_fee_account AS (
SELECT batch_run_date, transaction_hash, getFeeBumpAccountIfExists(tx_meta) AS fee_account
FROM fee_bump_transactions
),
calculated_fee_muxed_account AS (
SELECT batch_run_date, transaction_hash, fee_account
FROM calculated_fee_account
WHERE fee_account LIKE 'M%' -- muxed accounts
)
SELECT batch_run_date, transaction_hash, fee_account AS fee_account_muxed
FROM calculated_fee_muxed_account

Sample Output for the JS UDF

After running the above query, you should receive output similar to the following:

Rowtransaction_hashfee_account_muxed
1f5f5b0aaf758896ef8c5b4807f41c77d15c11977eecf2b0e4769d777324a2d11MCBD54KAHHA4AK4DOZWOSX5O5OZ4OI54N24QITDSFLPD7EG2WY2AMAAACYGGCDRL6UBUA
2a9e49dff6202663633b83f3645fbf8c2cfeb915db99b2b884a86791b9f8eae2fMBX64B6V7KV2KVHJ66IIKLNEIKCHF5BI57BNMF3U6RJIUHNBMXSJEAAACYGGCDRL6UFO2
300dba50c8689477e6990103338a0eb326725e07a7b7ff187359abf11c23c582aMC5BEU3DCIMHOHRQDVDAPEPZGMBBALPJ3IQY23VTXC3454SQMNWVSAAACYGGCDRL6UX42
42e1c53a9fe1d48ddc493febe467178994e669e3eebf3a4cca646b3cb666616deMAMYAUW45TC54C3QORQP7OOFYKOXCJTXOG2WIV5LP2HDMR67MWP6IAAACYGGCDRL6VCZM

Step 4: Updating Column Values Using UDF

You can also use this UDF to update values in a BigQuery table. Here’s an example of how to do this:

CREATE TEMP FUNCTION getFeeBumpAccountIfExists(tx_meta STRING)
RETURNS STRING
LANGUAGE js
OPTIONS (
library=["gs://stellar-test-js-udf/stellar-base.min.js"]
)
AS r"""
let txe = StellarBase.xdr.TransactionEnvelope.fromXDR(tx_meta, 'base64');
let tx = txe.feeBump();
let sourceAccount = StellarBase.encodeMuxedAccountToAddress(tx.tx().feeSource());
return sourceAccount
""";

MERGE `crypto_stellar.history_transactions` AS target
USING (
WITH fee_bump_transactions AS (
SELECT batch_run_date, transaction_hash, tx_envelope AS tx_meta
FROM `crypto_stellar.history_transactions`
WHERE batch_run_date > '2020-08-03'
AND batch_run_date < '2020-08-05'
AND inner_transaction_hash IS NOT NULL
),
calculated_fee_account AS (
SELECT batch_run_date, transaction_hash, getFeeBumpAccountIfExists(tx_meta) AS fee_account
FROM fee_bump_transactions
),
calculated_fee_muxed_account AS (
SELECT batch_run_date, transaction_hash, fee_account
FROM calculated_fee_account
WHERE fee_account LIKE 'M%' -- muxed accounts
)
SELECT batch_run_date, transaction_hash, fee_account AS fee_account_muxed
FROM calculated_fee_muxed_account
) AS source
ON target.batch_run_date = source.batch_run_date
AND target.transaction_hash = source.transaction_hash
WHEN MATCHED THEN
UPDATE SET target.fee_account_muxed = source.fee_account_muxed;

This query will update the fee_account_muxed field in the crypto_stellar.history_transactions table using values calculated from the UDF.

Conclusion

By following these steps, you can effectively extract and manipulate XDR data fields in BigQuery using JavaScript UDFs, making it easier to process and analyze your Stellar network data. You can also apply bugfix for existing dataset.