Bitcoin lightweight clients are wallets that do not download and store the whole blockchain locally. Currently (Oct 2015), the Bitcoin blockchain is about 45GB and growing. Downloading the whole blockchain onto a smart phone makes no sense. Satoshi envisioned lightweight clients in the original whitepaper [1]. The whitepaper introduced Simple Payment Verification – a way to verify payments without having to download the complete blockchain. A thin client (another name for a lightweight client) only downloads the block headers by connecting to a full node. Then it requests transactions matching its own addresses. To be clear a lightweight client sends its addresses to a full node requesting all the transaction that mach those addresses. The full node responds with those transactions and also the Merkle tree branch where the transactions appear. Recall that a block header includes the root of the Merkle tree of transactions included in that block. As a result the lightweight node can verify the transaction without needing the whole Merkle tree and the full contents of the block. Lightweight clients are implemented in bitcoinj [2]. As one can see, sending all of your addresses to a full node is not very desirable from a privacy point of view. To alleviate that concern BIP-37 [3] introduces “Connection Bloom filtering”.
Bloom Filters
A bloom filter [4] is a probabilistic data structure that allows to test whether a particular element is a member of a set. Bloom filters do not store the elements themselves, but instead they store patterns that match the elements. A bloom filter is a binary array, an array that stores in each of its cells either a zero or a one. It is initialized to all zeros (Fig. 1).
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
Fig.1 An empty bloom filter
To add elements (technically patterns) we use a number of hash functions, through which we run the element. Each result from hashing is recorded as a one in the array. To make this concrete, let’s say that we have four hash functions, the results of hashing our first element through those functions are 0, 4, 7, 9. We set the array at indices 0, 4, 7, 9 to a one (Fig 2).
1 |
0 |
0 |
0 |
1 |
0 |
0 |
1 |
0 |
1 |
0 |
0 |
Fig.2 Adding a pattern to our bloom filter, indices 0, 4, 7, 9 are set to one
Let’s say when adding a second element, running it through the hash functions would yield 1, 4, 8, 9. Our array would then look like figure 3.
1 |
1 |
0 |
0 |
1 |
0 |
0 |
1 |
1 |
1 |
0 |
0 |
Fig.3 Adding a second pattern to our bloom filter, indices 1, 4, 8, 9 are set to one
We continue this for all other elements that we want to include. As we add more and more patterns, we may be creating matches for elements that we did not explicitly include in the filter. That is not a problem. In fact a bloom filter is designed so that it may give us false positives, that is if we ask is some element included in this filter, we may get Yes as an answer even if the element is not included. This is sometimes (as we will shortly see) a desirable property. On the other hand, bloom filters do not give false negatives. That is, if an element is included in a bloom filter querying the filter will never yield a no answer for an element that was included.
In our case a lightweight client adds all of its addresses to a bloom filter and sends them to a full node, requesting all transactions matching the addresses.
Next time we will take a look at the privacy implications of using bloom filters in SPV nodes.
References:
[1]Nakamoto, Satoshi. “Bitcoin: A peer-to-peer electronic cash system.” https://bitcoin.org/bitcoin.pdf
[2] https://en.bitcoin.it/wiki/Bitcoinj
[3] https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki
[4] https://en.wikipedia.org/wiki/Bloom_filter