Core Lightning node recovery on FreeBSD after unclean shutdown

This guide documents the recovery procedure for a Core Lightning (CLN) node running on FreeBSD after an unclean shutdown corrupted the gossip database and caused lightningd to reinitialise with a fresh identity. No funds were at risk as this was a test node, but the procedure applies equally to production nodes.


Symptoms

  • Node starts but shows a different public key than expected
  • num_peers: 0 and num_active_channels: 0 after restart
  • Connections to peers fail with general SOCKS server failure or peer closed connection
  • Log shows Creating database and created new hsm_secret file on startup — these lines only appear on first-ever initialisation
  • lightningd.sqlite3 is unexpectedly small (a few MB instead of several GB)

Root cause

The node's real data lived in a double-nested bitcoin/bitcoin/ directory. That directory had been created by a previous manual invocation of lightningd with an incorrect --lightning-dir flag that pointed one level too deep into the data directory: CLN appends the network name bitcoin/ to whatever path it is given, which produced the nested structure. When the node was later restarted via the rc script, which used the correct top-level path, CLN looked in the outer bitcoin/ directory, found no database or hsm_secret there, and created fresh ones, giving the node a completely new identity.
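
The path arithmetic behind the nesting can be sketched in a few lines of shell. This is only an illustration of the behaviour described above (network name appended to the supplied directory); effective_dir is a hypothetical helper, not part of CLN.

```sh
#!/bin/sh
# Sketch: reproduce how the effective data directory is derived.
# Assumption: per-network data lives in <lightning-dir>/<network>,
# as described in the root-cause paragraph.

effective_dir() {
    # $1 = value given as --lightning-dir, $2 = network name
    # ${1%/} drops a single trailing slash so the result has no "//".
    printf '%s/%s\n' "${1%/}" "$2"
}

# rc script, correct top-level path:
#   effective_dir /var/db/lightning-ssd bitcoin
#   -> /var/db/lightning-ssd/bitcoin
# manual invocation, one level too deep:
#   effective_dir /var/db/lightning-ssd/bitcoin bitcoin
#   -> /var/db/lightning-ssd/bitcoin/bitcoin
```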


Recovery procedure

Step 1 — Verify Tor is running and healthy

CLN on a Tor-only node cannot connect to peers if Tor is down. Check both status and connectivity:

service tor status
sockstat -l | grep tor
curl --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip

The curl command should return {"IsTor":true,...}. If Tor is not running:

service tor start

Ensure it starts on boot by confirming /etc/rc.conf contains:

tor_enable="YES"
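
If you want the connectivity check to be scriptable, the response can be tested instead of read by eye. The following is a minimal sketch; is_tor is a hypothetical helper name, and it assumes the response contains the IsTor field in the form shown above.

```sh
#!/bin/sh
# Sketch: succeed only if the check.torproject.org response says IsTor is true.

is_tor() {
    # Reads the JSON response on stdin. Whitespace around the colon is tolerated.
    grep -Eq '"IsTor"[[:space:]]*:[[:space:]]*true'
}

# Usage on the node:
#   curl -s --socks5-hostname 127.0.0.1:9050 https://check.torproject.org/api/ip \
#     | is_tor && echo "Tor OK" || echo "Tor check failed"
```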

Step 2 — Locate all hsm_secret files and databases

find /var/db -name "hsm_secret" 2>/dev/null
find /var/db -name "lightningd.sqlite3" -ls 2>/dev/null

Compare the checksums of all hsm_secret files found:

sha256 /path/to/each/hsm_secret

Identify the correct database by size and modification date — the real one will be significantly larger and dated before the problematic reboot. The correct hsm_secret must match the database you intend to use.
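
When several copies turn up, it can help to print all checksums side by side so that identical files sort next to each other. The sketch below is one way to do that; hash_file and list_secrets are hypothetical helper names, and the sha256sum fallback is only there so the script also runs on systems without FreeBSD's sha256(1).

```sh
#!/bin/sh
# Sketch: print "<checksum>  <path>" for every hsm_secret under a directory,
# sorted so that identical files end up on adjacent lines.

hash_file() {
    # FreeBSD ships sha256(1); -q prints only the digest.
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"
    else
        sha256sum "$1" | cut -d' ' -f1
    fi
}

list_secrets() {
    # $1 = directory to search (the guide uses /var/db)
    find "$1" -name hsm_secret -type f 2>/dev/null | while IFS= read -r f; do
        printf '%s  %s\n' "$(hash_file "$f")" "$f"
    done | sort
}

# Usage on the node (run as a user allowed to read the files):
#   list_secrets /var/db
```

Only the digests are printed, never the file contents, so the output is safe to paste into notes.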

Step 3 — Stop lightningd if running

lcli stop

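Wait for the process to exit fully before proceeding. Here and in the steps below, lcli is assumed to be a local shell alias for lightning-cli; substitute the full command if your setup has no such alias.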
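
Rather than guessing when the daemon has gone, you can poll for it. This is a minimal sketch built on pgrep(1); wait_for_exit is a hypothetical helper, and the 60-second default timeout is an arbitrary choice.

```sh
#!/bin/sh
# Sketch: block until no process with the given name remains.
# Returns 0 once the process is gone, 1 if the timeout expires first.

wait_for_exit() {
    # $1 = exact process name, $2 = timeout in seconds (default 60)
    name=$1
    remaining=${2:-60}
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage on the node:
#   wait_for_exit lightningd 120 || echo "lightningd is still running"
```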

Step 4 — Restore the correct database and hsm_secret

Back up the current (incorrect) database first, then replace it with the correct one:

cp /var/db/lightning-ssd/bitcoin/lightningd.sqlite3 \
   /var/db/lightning-ssd/bitcoin/lightningd.sqlite3.bak

cp /var/db/lightning-ssd/bitcoin/bitcoin/lightningd.sqlite3 \
   /var/db/lightning-ssd/bitcoin/lightningd.sqlite3

If the hsm_secret was also replaced, restore it from the known-good copy:

cp /path/to/correct/hsm_secret \
   /var/db/lightning-ssd/bitcoin/hsm_secret

The hsm_secret and database must be from the same node or the wallet sanity check will fail on startup.
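
The backup-then-copy pattern used in this step can be wrapped in a small function that also confirms the copy is byte-identical to its source. This is a sketch, not a hardened tool: restore_file is a hypothetical name, and it assumes you run it as root (or as the owner of both files) while lightningd is stopped.

```sh
#!/bin/sh
# Sketch: replace a file with a known-good copy, keeping the old one as .bak.

restore_file() {
    # $1 = known-good source, $2 = destination to overwrite
    src=$1
    dst=$2
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    # Back up the file being replaced, but never clobber an earlier backup.
    if [ -f "$dst" ] && [ ! -e "$dst.bak" ]; then
        cp -p "$dst" "$dst.bak" || return 1
    fi
    # -p preserves mode, ownership, and timestamps where permitted.
    cp -p "$src" "$dst" || return 1
    # cmp -s exits non-zero if the copy differs from the source.
    cmp -s "$src" "$dst"
}

# Usage on the node:
#   restore_file /var/db/lightning-ssd/bitcoin/bitcoin/lightningd.sqlite3 \
#                /var/db/lightning-ssd/bitcoin/lightningd.sqlite3
```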

Step 5 — Start lightningd and verify identity

service lightningd start
service lightningd status
lcli getinfo | grep -E '"(id|alias|num_peers)"'

Confirm the public key matches your original node identity. If you see Wallet sanity check failed, the hsm_secret and database are still mismatched — recheck step 4.
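
Comparing 66-character keys by eye is error-prone, so the comparison can be scripted. The sketch below assumes getinfo output is pretty-printed with one key per line and a top-level "id" field; node_id and check_identity are hypothetical helpers, and a real JSON parser is the better choice if one is installed.

```sh
#!/bin/sh
# Sketch: compare the node id reported by getinfo with the expected public key.

node_id() {
    # Reads getinfo JSON on stdin and prints the first "id" value found.
    sed -n 's/^[[:space:]]*"id"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p' | head -n 1
}

check_identity() {
    # $1 = expected public key; getinfo JSON on stdin
    actual=$(node_id)
    if [ "$actual" = "$1" ]; then
        echo "identity OK: $actual"
    else
        echo "identity MISMATCH: got ${actual:-nothing}, expected $1" >&2
        return 1
    fi
}

# Usage on the node:
#   lcli getinfo | check_identity <your-original-public-key>
```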

Step 6 — Manually connect to a peer to bootstrap gossip

After a recovery, the gossip store will be empty or minimal. CLN will eventually find peers on its own, but with always-use-proxy=true and an empty graph this can take a long time. Manually connect to a well-known reliable node to accelerate the process.

First fetch a current onion address for the target node (ACINQ in this example):

curl -s --socks5-hostname 127.0.0.1:9050 \
  https://mempool.space/api/v1/lightning/nodes/03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f \
  | grep -Eo '"sockets": *"[^"]*"'

Then connect using the .onion address returned:

lcli connect 03864ef025fde8fb587d989186ce6a4a186895ee44a926bfc370e2c366597a3f8f@<onion-address>:9735

Once connected, gossip will begin flowing. Monitor progress:

lcli getinfo | grep num_peers
lcli listchannels | grep -c short_channel_id

The channel count should climb over the following 30–60 minutes as the gossip graph is rebuilt.
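
To watch the graph fill in without rerunning the command by hand, the count can be sampled at intervals. A minimal sketch follows; count_entries is a hypothetical helper, and the five-minute interval is arbitrary. Note that listchannels reports each channel direction as its own entry, so the figure is roughly twice the number of channels.

```sh
#!/bin/sh
# Sketch: count "short_channel_id" occurrences in listchannels JSON on stdin.

count_entries() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '"short_channel_id"' || true
}

# Usage on the node, one sample every five minutes:
#   while :; do
#       printf '%s %s\n' "$(date +%H:%M)" "$(lcli listchannels | count_entries)"
#       sleep 300
#   done
```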


Preventing recurrence

Set lightning-dir explicitly in config

Add this line to /usr/local/etc/lightningd-bitcoin.conf to prevent the double-nesting problem:

lightning-dir=/var/db/lightning-ssd

CLN will then append bitcoin/ to produce the correct path /var/db/lightning-ssd/bitcoin/.

Fix pid-file path

Ensure the pid-file line in your config is a plain path and not mangled by a markdown renderer (a common issue when copying config snippets from web pages or chat interfaces):

pid-file=/var/db/lightning-ssd/bitcoin/lightningd-bitcoin.pid

The file must be owned by the c-lightning user:

chown c-lightning:c-lightning /var/db/lightning-ssd/bitcoin/lightningd-bitcoin.pid

Never start lightningd manually with ad-hoc flags

This entire incident was triggered by a manual invocation of lightningd with an incorrect --lightning-dir flag. Always start, stop, and restart the node exclusively through the rc script:

service lightningd start
service lightningd stop
service lightningd restart

This guarantees that the options used at runtime are always identical to those in your config file. If you need to test a different configuration, use a completely separate instance with a clearly separate data directory — never override flags on a node that has real data.

Check rc.conf for lightning-dir override

grep lightning_dir /etc/rc.conf
cat /usr/local/etc/rc.d/lightningd

Ensure the rc script and config agree on the directory so they do not conflict after a reboot.
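
A quick automated check can catch the nested layout before it causes another identity swap. The sketch below reads lightning-dir from a config file and warns if a <network>/<network> directory exists beneath it. conf_lightning_dir and check_nesting are hypothetical helpers, and taking the last lightning-dir line when several are present is a simplification, not documented CLN behaviour.

```sh
#!/bin/sh
# Sketch: detect the double-nested network directory described in this guide.

conf_lightning_dir() {
    # $1 = config file; prints the last lightning-dir= value, if any.
    sed -n 's/^lightning-dir=//p' "$1" | tail -n 1
}

check_nesting() {
    # $1 = lightning-dir value, $2 = network name (e.g. bitcoin)
    # Fails when <dir>/<network>/<network> exists, the layout left behind by
    # pointing --lightning-dir one level too deep.
    dir=${1%/}
    if [ -d "$dir/$2/$2" ]; then
        echo "warning: nested directory $dir/$2/$2 exists" >&2
        return 1
    fi
    return 0
}

# Usage on the node:
#   check_nesting "$(conf_lightning_dir /usr/local/etc/lightningd-bitcoin.conf)" bitcoin
```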


Notes on always-use-proxy

The always-use-proxy=true setting forces all outgoing connections through the Tor SOCKS proxy, including connections to clearnet peers. This protects your IP address but has tradeoffs: some clearnet peers actively reject connections from Tor exit nodes, and DNS seed bootstrapping via Tor is slower. For a privacy-focused production node the setting is appropriate. For a test node where peer connectivity matters more than IP privacy, consider removing it temporarily until the gossip graph is populated.