
Why We Chose a Hash Chain Over a Merkle Tree

When we designed the MFTPlus audit chain, Merkle trees were the obvious choice. They're elegant. They're used in Bitcoin, in certificate transparency logs, in distributed databases. If you're building anything with "cryptographic audit" in the requirements, the Merkle tree is the first thing you reach for.

We didn't use one.

The reason is specific: Merkle trees solve a problem that MFT audit doesn't have.

What Merkle trees are actually for

A Merkle tree lets you prove that a single leaf belongs to a large set without fetching the whole set. You get a root hash and a proof path — a small number of sibling hashes — and you can verify inclusion without downloading all the data.

This matters for Bitcoin light clients: you have an 80-byte block header and you want to verify that a specific transaction is in that block without downloading the full block, which can run to 1 MB or more. The Merkle proof is a few hundred bytes: one 32-byte sibling hash per tree level. The same logic applies to certificate transparency logs, where a client can verify that a certificate is included without fetching the entire log.

That's the use case: sparse verification at scale. Prove one leaf without fetching the tree.
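To make the proof path concrete, here is a minimal Go sketch of inclusion verification. It assumes a plain binary tree over single SHA-256, and the names ProofStep, VerifyInclusion, and buildDemo are hypothetical. Real systems differ in the details: Bitcoin hashes twice, and certificate transparency prefixes leaf and node inputs differently.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// ProofStep is one sibling hash on the path from a leaf to the root.
// (Hypothetical type, for illustration only.)
type ProofStep struct {
	Sibling [32]byte
	Left    bool // true if the sibling sits to the left of the running hash
}

// hashPair hashes the concatenation of two child hashes into a parent hash.
func hashPair(left, right [32]byte) [32]byte {
	return sha256.Sum256(append(left[:], right[:]...))
}

// VerifyInclusion recomputes the root from a leaf and its proof path.
func VerifyInclusion(leaf []byte, proof []ProofStep, root [32]byte) bool {
	h := sha256.Sum256(leaf)
	for _, step := range proof {
		if step.Left {
			h = hashPair(step.Sibling, h)
		} else {
			h = hashPair(h, step.Sibling)
		}
	}
	return h == root
}

// buildDemo builds a four-leaf tree by hand and returns its root together
// with the proof path for the leaf "tx-c".
func buildDemo() ([32]byte, []ProofStep) {
	a, b := sha256.Sum256([]byte("tx-a")), sha256.Sum256([]byte("tx-b"))
	c, d := sha256.Sum256([]byte("tx-c")), sha256.Sum256([]byte("tx-d"))
	ab, cd := hashPair(a, b), hashPair(c, d)
	root := hashPair(ab, cd)
	// Sibling d is on the right of c; then ab is on the left of cd.
	proof := []ProofStep{{Sibling: d, Left: false}, {Sibling: ab, Left: true}}
	return root, proof
}

func main() {
	root, proof := buildDemo()
	fmt.Println(VerifyInclusion([]byte("tx-c"), proof, root)) // true
	fmt.Println(VerifyInclusion([]byte("tx-x"), proof, root)) // false
}
```

The verifier touches two sibling hashes for four leaves, and the count grows with the height of the tree rather than with the number of leaves. That is the property the rest of this post argues MFT audit does not need.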

What MFT audit actually needs

When you run mftctl audit verify TXFR-2847, you fetch the full chain for that transfer. Not a proof path. Not a subset. The complete sequence of entries, from the genesis entry to the latest append.

There is no sparse verification problem in MFT audit. You're not trying to prove one transfer exists in a set of a billion. You're verifying that a specific chain of 30, 50, or 500 entries is internally consistent and unmodified. The verifier needs all of it.

A Merkle tree optimizes for a constraint that doesn't exist. What you get in return is significant complexity: tree construction, sibling proof generation, proof serialization, root hash management, tree reconstruction on verify. That complexity has to live in both the server (TypeScript) and the CLI verifier (Go) and stay identical across both.

The hash chain alternative

A linear SHA-256 hash chain is simpler by an order of magnitude. Each entry computes a hash over its own fields plus the previous entry's hash. The genesis entry uses a fixed 64-character zero string as its prevHash. That's the entire data structure.
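As a rough illustration of how small this is, the Go sketch below hashes two linked entries. The Entry fields, their order, and the hex encoding of the hash are assumptions made for the example, not the documented MFTPlus format; only the pipe separator, SHA-256, and the 64-zero genesis prevHash come from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// GenesisPrevHash is the fixed 64-character zero string used by the first entry.
var GenesisPrevHash = strings.Repeat("0", 64)

// Entry is a hypothetical chain entry. The field set and order are
// assumptions for illustration only.
type Entry struct {
	Seq         int
	TransferRef string
	Event       string
	ContentHash string
	PrevHash    string
}

// Canonical joins the fields with pipes in a fixed order.
func (e Entry) Canonical() string {
	return strings.Join([]string{
		strconv.Itoa(e.Seq), e.TransferRef, e.Event, e.ContentHash, e.PrevHash,
	}, "|")
}

// Hash returns the hex-encoded SHA-256 of the canonical form.
func (e Entry) Hash() string {
	sum := sha256.Sum256([]byte(e.Canonical()))
	return hex.EncodeToString(sum[:])
}

func main() {
	genesis := Entry{Seq: 0, TransferRef: "TXFR-2847", Event: "created",
		ContentHash: "ab12", PrevHash: GenesisPrevHash}
	// The next entry links to the genesis entry through its hash.
	next := Entry{Seq: 1, TransferRef: "TXFR-2847", Event: "downloaded",
		ContentHash: "ab12", PrevHash: genesis.Hash()}

	fmt.Println(genesis.Canonical())
	fmt.Println(len(next.Hash())) // 64 hex characters
}
```

One caveat with any separator-based format: a field value that itself contains a pipe would make the canonical form ambiguous, so a scheme like this presumably restricts or escapes field contents.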

Verification: fetch entries in sequence, recompute each hash, compare to stored value. If every hash matches, output PROVEN. If any hash doesn't match, output TAMPERED at seq N. The sequence number is the location of the tamper.
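The verification loop can be sketched in a few lines of Go. This is a simplified model under stated assumptions: Payload stands in for the pipe-separated canonical fields, and Entry, Append, and Verify are hypothetical names rather than the mftctl implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Entry is a hypothetical stored chain entry. Hash is the value that was
// computed and stored at append time.
type Entry struct {
	Seq      int
	Payload  string
	PrevHash string
	Hash     string
}

var genesisPrev = strings.Repeat("0", 64)

// computeHash recomputes an entry hash from its fields and the previous hash.
func computeHash(seq int, payload, prevHash string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%s|%s", seq, payload, prevHash)))
	return hex.EncodeToString(sum[:])
}

// Append adds an entry linked to the current tail of the chain.
func Append(chain []Entry, payload string) []Entry {
	prev := genesisPrev
	if len(chain) > 0 {
		prev = chain[len(chain)-1].Hash
	}
	seq := len(chain)
	return append(chain, Entry{
		Seq: seq, Payload: payload, PrevHash: prev,
		Hash: computeHash(seq, payload, prev),
	})
}

// Verify walks the chain in order and reports the first entry that fails.
func Verify(chain []Entry) string {
	prev := genesisPrev
	for i, e := range chain {
		ok := e.Seq == i &&
			e.PrevHash == prev &&
			e.Hash == computeHash(e.Seq, e.Payload, e.PrevHash)
		if !ok {
			return fmt.Sprintf("TAMPERED at seq %d", i)
		}
		prev = e.Hash
	}
	return "PROVEN"
}

func main() {
	var chain []Entry
	for _, p := range []string{"created", "uploaded", "downloaded"} {
		chain = Append(chain, p)
	}
	fmt.Println(Verify(chain)) // PROVEN

	chain[1].Payload = "deleted" // modify a stored entry in place
	fmt.Println(Verify(chain))   // TAMPERED at seq 1
}
```

In this sketch, checking prevHash as well as the entry's own hash means that rewriting the stored hash of entry 1 to match the altered payload only moves the failure to seq 2, where the link no longer matches.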

| | Merkle Tree | Hash Chain |
|---|---|---|
| Verification output | "Proof verified against root" | PROVEN or TAMPERED at seq N |
| Sparse verification | Yes (the point) | Not needed for MFT audit |
| Implementation complexity | Tree construction, sibling proofs, proof serialization | 10 lines of canonical serialization + SHA-256 |
| Cross-language verification | Hard to keep identical | Feasible: same field order, same SHA-256 |
| Tamper localization | Root hash changes; proof paths for each entry | Exact sequence number of tamper |

Why simpler matters for a verifiable system

The MFTPlus audit chain is verified by an independent CLI client written in Go. The server is TypeScript. Both implement the same canonical serialization: pipe-separated fields in a fixed order, the same genesis convention, SHA-256 throughout.

For a Merkle tree, keeping both implementations identical across two languages and two development timelines is a real maintenance burden. A bug in one sibling proof implementation produces silent divergence: the server says verified, the client rejects it, or vice versa. Debugging which implementation is wrong requires understanding the full tree construction.

For a hash chain: canonical serialization is a list of fields and a separator. SHA-256 is the same everywhere. If the implementations diverge, verification fails at the first entry where they disagree, and the mismatch is immediately visible.

Simpler is more auditable. Not just for us — for anyone who wants to implement their own verifier. The serialization format is documented. SHA-256 runs on every platform. An auditor can verify the chain with nothing but the spec and a SHA-256 implementation.

Two other properties worth noting

PII separation. The chain stores only opaque transfer references and content hashes. Sender email, recipient, filename, and IP address live in a separate AuditChainContext table that can be deleted independently. Deleting PII doesn't break the chain. GDPR right-to-erasure compliance doesn't require modifying the audit record.
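A small Go model shows why erasure leaves the chain intact. The field names and the in-memory map standing in for the AuditChainContext table are assumptions for illustration; the point is only that the hash input never includes context data.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ChainEntry holds only opaque references; nothing here identifies a person.
type ChainEntry struct {
	TransferRef string
	ContentHash string
	PrevHash    string
}

// Context mirrors a row of the separate AuditChainContext table
// (field names are assumed).
type Context struct {
	SenderEmail string
	Recipient   string
	Filename    string
	IP          string
}

// Hash covers chain fields only, so context rows never influence it.
func (e ChainEntry) Hash() string {
	sum := sha256.Sum256([]byte(e.TransferRef + "|" + e.ContentHash + "|" + e.PrevHash))
	return hex.EncodeToString(sum[:])
}

func main() {
	entry := ChainEntry{TransferRef: "TXFR-2847", ContentHash: "ab12", PrevHash: "0000"}
	contexts := map[string]Context{
		"TXFR-2847": {
			SenderEmail: "alice@example.com", Recipient: "bob@example.com",
			Filename: "report.pdf", IP: "203.0.113.7",
		},
	}

	before := entry.Hash()
	delete(contexts, "TXFR-2847") // erasure request: drop the PII row

	fmt.Println(before == entry.Hash(), len(contexts)) // true 0
}
```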

Per-org advisory locking. PostgreSQL pg_advisory_xact_lock(hashtext(companyId)) serializes chain appends per organization. Concurrent transfers within one org append in order, so the chain cannot fork. Appends for different organizations run in parallel, because the lock is scoped to the org, not the table.
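Here is a hedged Go sketch of how that lock might be taken at the start of an append transaction. The execer interface, LockOrg, recorder, and demo are hypothetical names; recorder is a stand-in so the example runs without a database, while a real *sql.Tx from database/sql satisfies the same interface.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.Tx this sketch needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// lockQuery takes a transaction-scoped advisory lock keyed on the org ID.
// PostgreSQL releases the lock automatically at commit or rollback.
const lockQuery = `SELECT pg_advisory_xact_lock(hashtext($1))`

// LockOrg blocks until this transaction holds the org's append lock.
func LockOrg(ctx context.Context, tx execer, companyID string) error {
	_, err := tx.ExecContext(ctx, lockQuery, companyID)
	return err
}

// recorder is a stand-in for *sql.Tx that only records statements.
type recorder struct{ log []string }

func (r *recorder) ExecContext(_ context.Context, q string, args ...any) (sql.Result, error) {
	r.log = append(r.log, fmt.Sprintf("%s %v", q, args))
	return nil, nil
}

// demo runs LockOrg against the recorder and returns the recorded statement.
func demo() string {
	rec := &recorder{}
	if err := LockOrg(context.Background(), rec, "org-42"); err != nil {
		return "error: " + err.Error()
	}
	// With a real transaction, reading the chain tail and inserting the
	// new entry would follow here, before commit releases the lock.
	return rec.log[0]
}

func main() {
	fmt.Println(demo()) // SELECT pg_advisory_xact_lock(hashtext($1)) [org-42]
}
```

Because hashtext maps the ID to a 32-bit integer, two organizations can in principle collide and share a lock. That would cost some parallelism but not correctness, since each org's appends are still serialized.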

Boring is better for your audit trail

Merkle trees are the right tool for proving leaf inclusion in large sets. Certificate transparency needs them. Bitcoin needs them. MFT audit doesn't.

The hash chain gives you O(n) verification, a single unambiguous output, a 10-line implementation that any developer can audit, and cross-language verification that actually holds up. That's the right trade for what mftctl audit verify needs to do.

Armin Marxer

Building MFTPlus. Spent years managing file transfer infrastructure before deciding there had to be a better way.

FAQ

Why not use a Merkle tree for the audit chain?

Merkle trees optimize for sparse verification: proving a single leaf belongs to a large set without fetching the whole set. MFT audit always fetches the full chain for a transfer. The Merkle tree solves a constraint that doesn't exist in this use case, and adds significant implementation complexity in return.

How does tamper detection work with a hash chain?

Each entry's hash is computed over its own fields plus the previous entry's hash. Any modification to any entry changes that entry's hash, which cascades to every subsequent entry. The verifier recomputes all hashes in sequence and reports the exact sequence number where the first mismatch occurs.

Can I verify the chain without MFTPlus tooling?

Yes. The canonical serialization format is documented: pipe-separated fields in a fixed order, 64-character zero string as prevHash for the genesis entry, SHA-256 throughout. Any SHA-256 implementation can verify the chain given a chain export.

How does GDPR erasure work if names are in the chain?

PII (sender email, recipient, filename, IP) is stored in a separate AuditChainContext table, not in the chain entries themselves. The chain stores only opaque transfer references and content hashes. Deleting the relevant rows from the context table satisfies right-to-erasure without modifying the chain.
