Can you identify me from a crowd?

A modern “Where’s Waldo?” problem

How much data do you need to identify a single person?

A sample of your data

Does this sample uniquely identify you?

Probability P[all] that you are identifiable in a population of S=10⁶ with K bits of information.

In a population of S individuals, you become identifiable once at least K = 1.5 log₂(S) bits of information are available.
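To get a feel for this threshold, here is a minimal Python sketch. It rests on an assumption the article does not state: each person's K bits are independent and uniformly random, so any other person matches your pattern with probability 2^-K. The function names `bits_threshold` and `prob_unique` are illustrative only.

```python
import math


def bits_threshold(population: int, factor: float = 1.5) -> float:
    """Number of bits K = factor * log2(S) for a population of size S."""
    return factor * math.log2(population)


def prob_unique(population: int, bits: float) -> float:
    """Probability that nobody else shares your K-bit pattern.

    Assumption (not from the article): every person's bits are
    independent and uniformly random, so each of the other S - 1
    people matches you with probability 2**-bits.
    """
    if bits <= 0:
        # With no information, anyone else in the population matches you.
        return 0.0 if population > 1 else 1.0
    # (1 - 2**-K) ** (S - 1), computed via log1p for numerical stability.
    return math.exp((population - 1) * math.log1p(-(2.0 ** -bits)))


if __name__ == "__main__":
    S = 10**6
    K = bits_threshold(S)
    print(f"K = {K:.1f} bits, P[unique] = {prob_unique(S, K):.4f}")
```

Under this simplified model, S = 10⁶ gives a threshold of roughly 30 bits, at which point the chance that someone else shares your pattern is on the order of 1/√S, about one in a thousand. With only log₂(S) ≈ 20 bits, the same model puts the probability of being unique near 1/e, which is why the extra factor of 1.5 matters.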

How realistic is this?

Back to Maths — a Compression Problem

Statistical Modelling and Probability Distributions


Suppose a service has collected a database of background data for its users. The service then makes an observation of user activity. How much background and observation data does it need to identify that user in a given population?

