Fix/bitcoin block null check#9003
Commit 40dd780 switched from pull_bitcoin_tx to pull_bitcoin_tx_only but did not add a null check, causing a segfault (FATAL SIGNAL 11) if pull_bitcoin_tx_only returns NULL due to a malformed block response from bitcoind.

Add null check consistent with existing patterns in the codebase and with pull_bitcoin_tx itself.

Fixes: 40dd780 bitcoin_block_from_hex: avoid creating PSBT wrappers for finalized block txs
a8528ac to
38c5e56
Thanks @Schnema1, that's great work hunting down the issue. However, I'm afraid the resolution is a bit more complicated and could involve upgrading the libwally library so that it does not fail when decoding the block.

Your current code returns

lightning/lightningd/bitcoind.c Lines 458 to 461 in b7c05f6

This in turn calls
OK, you are the pro regarding the code and its next steps.

I am not 100% sure it was always the same block causing the error, but I am pretty sure it was. I am sorry, but most logs were deleted because my directory was full of crash logs.

Here are some hints I found during my research:

After the AI found the SIGNAL 11 error, I added the suggested fix and recompiled, and the SIGNAL 6 error appeared. It apparently accelerated the troubleshooting.

After suggesting some smart grep commands we arrived here:

After digging more:

AI: Check the block stats which use a different code path

This then returned:

Check the block stats which use a different code path

`bitcoin-cli getblockchaininfo | grep -E "pruned|pruneheight|prune"`

`bitcoin-cli getblockchaininfo | grep -E "blocks|headers|pruneheight"`

`bitcoin-cli getblockfrompeer 0000000000000000000151b3a6e293f443602e1ad770b3578feeffd1d6eb8fe9 0`

So the block is supposedly downloaded but not on disk: that's a corrupted or incomplete bitcoind block database. This is the real root cause of everything.

Nailed down error:

AI: There it is! The block data at file 5066, position 63528873 is corrupted on disk. This is the root cause of everything, not a CLN bug at all (well, the missing null check is still a real bug, but it's not your actual problem).

Interestingly enough, I did not touch the bitcoin chain. The blocks directory has a symlink to the SATA drive, and that drive was not touched during OS cloning.

If the fix really causes trouble elsewhere, this is not good. But at least after that we got SIGNAL 6, which led to the solution.

Take this with a grain of salt, as it is really beyond my knowledge. I could follow the AI suggestions, and I learned a lot during this process.

Let me know if you need more information.
Problem described in #9002.
Found during investigation of #8973.
Commit 40dd780 switched from pull_bitcoin_tx to pull_bitcoin_tx_only
but did not add a null check, causing a segfault (FATAL SIGNAL 11) if
pull_bitcoin_tx_only returns NULL due to a malformed block response
from bitcoind.
Add null check consistent with existing patterns in the codebase and
with pull_bitcoin_tx itself.
This commit adds the corrections.
Please review as AI found this fix.
This is my first ever commit, please excuse errors and feel free to edit.
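
For reviewers, here is a minimal, self-contained sketch of the pattern described above: check the parser's return value before dereferencing it, and fail the whole block parse if any transaction cannot be decoded. It does not use the real CLN code. The names `demo_tx`, `demo_block`, `demo_pull_tx`, `demo_block_parse` and `demo_block_free` are hypothetical stand-ins, and plain `calloc`/`free` are used here in place of the codebase's own allocator. The stand-in parser simply consumes one byte per "transaction" and returns NULL when the input runs out, which mimics a truncated or malformed block response.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed transaction (not CLN's struct bitcoin_tx). */
struct demo_tx {
	uint8_t version;
	const void *chainparams;
};

/* Hypothetical stand-in for a parsed block. */
struct demo_block {
	size_t num_txs;
	struct demo_tx **txs;
};

/* Stand-in parser: consumes one byte per "transaction" and returns NULL
 * when no input is left, mimicking a parser that fails on truncated data. */
struct demo_tx *demo_pull_tx(const uint8_t **cursor, size_t *remaining)
{
	struct demo_tx *tx;

	assert(cursor && *cursor && remaining);
	if (*remaining == 0)
		return NULL;

	tx = calloc(1, sizeof(*tx));
	if (!tx)
		return NULL;

	tx->version = **cursor;
	(*cursor)++;
	(*remaining)--;
	return tx;
}

/* Frees a block and every transaction parsed so far (free(NULL) is a no-op). */
void demo_block_free(struct demo_block *b)
{
	if (!b)
		return;
	if (b->txs) {
		for (size_t i = 0; i < b->num_txs; i++)
			free(b->txs[i]);
		free(b->txs);
	}
	free(b);
}

/* Parses `num` transactions from `data`. Returns NULL if any of them
 * fails to parse, instead of dereferencing the failed result. */
struct demo_block *demo_block_parse(const uint8_t *data, size_t len,
				    size_t num, const void *chainparams)
{
	struct demo_block *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	b->txs = calloc(num ? num : 1, sizeof(*b->txs));
	if (!b->txs) {
		free(b);
		return NULL;
	}
	b->num_txs = num;

	for (size_t i = 0; i < num; i++) {
		b->txs[i] = demo_pull_tx(&data, &len);
		/* The null check: without it, the assignment below would
		 * dereference NULL on malformed input and crash. */
		if (!b->txs[i]) {
			demo_block_free(b);
			return NULL;
		}
		b->txs[i]->chainparams = chainparams;
	}
	return b;
}
```

In this sketch, asking for three transactions from a two-byte buffer makes `demo_block_parse` return NULL rather than crash, so the caller can decide how to handle the bad block. How the caller should react to a NULL block is a separate question and is not covered here.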
Checklist
Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:
tools/lightning-downgrade