Backfill using JS UDF
This document outlines methods to extract required fields from the XDR of raw data.
Use Cases:
- Backfix for a bugfix
- Backfix for a new column added
- Temporary extraction of fields not parsed by stellar-etl
We'll take the example of extracting the fee_account_muxed
field from a transaction envelope (tx_meta
XDR). However, this method can be adapted to other fields as well. It is worth noting that most users will not need to standup and run their own Hubble. The Stellar Development Foundation provides public access to the data through the public datasets and tables in GCP BigQuery. Instructions on how to access this data can be found in the Connecting section.
We will use the js-stellar-base library to parse the XDR and employ JavaScript UDFs (User Defined Functions) in BigQuery to apply the transformation to the dataset.
For a deeper understanding, check out the Medium article on using NPM libraries in Google BigQuery UDFs.
Step 1: Setting Up JS UDF in BigQuery
To set up the JS UDF in BigQuery, follow these steps:
1. Clone the js-stellar-base
repository
First, clone the repository to your local machine and install the dependencies:
git clone https://github.com/stellar/js-stellar-base.git
cd js-stellar-base
yarn
yarn build:prod
This process will generate the following file in the js-stellar-base/dist/ directory:
- stellar-base.min.js
2. Upload the JS file to Google Cloud Storage (GCS)
Once the build process is completed, upload the stellar-base.min.js file to a Google Cloud Storage bucket to be used in the UDF.
Step 2: Writing the JavaScript Function
Here is an example JavaScript function to extract the fee_account_muxed field from the transaction envelope (tx_meta):
let tx_meta =
"AAAABQAAAQAAABYMYQ4r9W/uB9X6q6VU6feQhS2kQoRy9CjvwtYXdPRSih2hZeSSAAAAAAAAAZAAAAACAAAAAJwLL0Ul/CyRZdXuenmdXrzVyX9X56m4kYPYmgppVIj8AAAAZAAF9PwAAAABAAAAAAAAAAAAAAABAAAAAQAAAQAAAFj7+8N85JwLL0Ul/CyRZdXuenmdXrzVyX9X56m4kYPYmgppVIj8AAAAAAAAAADN5igtu93OKhkj2NrSHuPEJktU+0gJ0LiNavJirLAmRwAAAAAF9eEAAAAAAAAAAAFpVIj8AAAAQElnt70S4sGicHyhsN1S29DEREZ7i2HU96+8DfyshlFLCoQudDIxThnVEg2KQDrW61R19M7Ms9IAsznURc5y3wIAAAAAAAAAAaFl5JIAAABAIf9/ecA3id1mbHzJ2S9W5bRVqrjQr/c2+jHEuDNZevt3LDVSc+DmRMYie0eQ+vE7B3D+fRPb9yFzpfx4meTfBg==";
let txe = StellarBase.xdr.TransactionEnvelope.fromXDR(tx_meta, "base64");
let tx = txe.feeBump();
let sourceAccount = StellarBase.encodeMuxedAccountToAddress(
tx.tx().feeSource(),
);
console.log(sourceAccount);
This script will output the fee_account_muxed
value: MBX64B6V7KV2KVHJ66IIKLNEIKCHF5BI57BNMF3U6RJIUHNBMXSJEAAACYGGCDRL6UFO2
Step 3: Wrapping the JavaScript Function as a UDF
CREATE TEMP FUNCTION getFeeBumpAccountIfExists(tx_meta STRING)
RETURNS STRING
LANGUAGE js
OPTIONS (
library=["gs://stellar-test-js-udf/stellar-base.min.js"] -- Path to JS library in GCS
)
AS r"""
return StellarBase.encodeMuxedAccountToAddress(
StellarBase.xdr.TransactionEnvelope.fromXDR(tx_meta, 'base64')
.feeBump()
.tx()
.feeSource()
);
""";
WITH fee_bump_transactions AS (
SELECT batch_run_date, transaction_hash, tx_envelope AS tx_meta
FROM `test_crypto_stellar.history_transactions`
WHERE batch_run_date BETWEEN DATETIME("2024-07-01") AND DATETIME_ADD("2024-07-20", INTERVAL 1 MONTH)
AND inner_transaction_hash IS NOT NULL -- filter in fee bump transactions
),
calculated_fee_account AS (
SELECT batch_run_date, transaction_hash, getFeeBumpAccountIfExists(tx_meta) AS fee_account
FROM fee_bump_transactions
),
calculated_fee_muxed_account AS (
SELECT batch_run_date, transaction_hash, fee_account
FROM calculated_fee_account
WHERE fee_account LIKE 'M%' -- muxed accounts
)
SELECT batch_run_date, transaction_hash, fee_account AS fee_account_muxed
FROM calculated_fee_muxed_account
Sample Output for the JS UDF
After running the above query, you should receive output similar to the following:
Row | transaction_hash | fee_account_muxed |
---|---|---|
1 | f5f5b0aaf758896ef8c5b4807f41c77d15c11977eecf2b0e4769d777324a2d11 | MCBD54KAHHA4AK4DOZWOSX5O5OZ4OI54N24QITDSFLPD7EG2WY2AMAAACYGGCDRL6UBUA |
2 | a9e49dff6202663633b83f3645fbf8c2cfeb915db99b2b884a86791b9f8eae2f | MBX64B6V7KV2KVHJ66IIKLNEIKCHF5BI57BNMF3U6RJIUHNBMXSJEAAACYGGCDRL6UFO2 |
3 | 00dba50c8689477e6990103338a0eb326725e07a7b7ff187359abf11c23c582a | MC5BEU3DCIMHOHRQDVDAPEPZGMBBALPJ3IQY23VTXC3454SQMNWVSAAACYGGCDRL6UX42 |
4 | 2e1c53a9fe1d48ddc493febe467178994e669e3eebf3a4cca646b3cb666616de | MAMYAUW45TC54C3QORQP7OOFYKOXCJTXOG2WIV5LP2HDMR67MWP6IAAACYGGCDRL6VCZM |
Step 4: Updating Column Values Using UDF
You can also use this UDF to update values in a BigQuery table. Here’s an example of how to do this:
CREATE TEMP FUNCTION getFeeBumpAccountIfExists(tx_meta STRING)
RETURNS STRING
LANGUAGE js
OPTIONS (
library=["gs://stellar-test-js-udf/stellar-base.min.js"]
)
AS r"""
let txe = StellarBase.xdr.TransactionEnvelope.fromXDR(tx_meta, 'base64');
let tx = txe.feeBump();
let sourceAccount = StellarBase.encodeMuxedAccountToAddress(tx.tx().feeSource());
return sourceAccount
""";
MERGE `crypto_stellar.history_transactions` AS target
USING (
WITH fee_bump_transactions AS (
SELECT batch_run_date, transaction_hash, tx_envelope AS tx_meta
FROM `crypto_stellar.history_transactions`
WHERE batch_run_date > '2020-08-03'
AND batch_run_date < '2020-08-05'
AND inner_transaction_hash IS NOT NULL
),
calculated_fee_account AS (
SELECT batch_run_date, transaction_hash, getFeeBumpAccountIfExists(tx_meta) AS fee_account
FROM fee_bump_transactions
),
calculated_fee_muxed_account AS (
SELECT batch_run_date, transaction_hash, fee_account
FROM calculated_fee_account
WHERE fee_account LIKE 'M%' -- muxed accounts
)
SELECT batch_run_date, transaction_hash, fee_account AS fee_account_muxed
FROM calculated_fee_muxed_account
) AS source
ON target.batch_run_date = source.batch_run_date
AND target.transaction_hash = source.transaction_hash
WHEN MATCHED THEN
UPDATE SET target.fee_account_muxed = source.fee_account_muxed;
This query will update the fee_account_muxed
field in the crypto_stellar.history_transactions table using values calculated from the UDF.
Conclusion
By following these steps, you can effectively extract and manipulate XDR data fields in BigQuery using JavaScript UDFs, making it easier to process and analyze your Stellar network data. You can also apply bugfix for existing dataset.