Can you identify me from a crowd?

A modern “Where’s Waldo?” problem

How much data do you need to identify a single person?

A sample of your data

Does this sample uniquely identify you?

Probability P[all] that you are identifiable in a population of S=10⁶ with K bits of information.

In a population of S individuals, you become identifiable once at least K = 1.5 log₂(S) bits of information are available.
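To get a feel for the numbers, the rule of thumb above can be evaluated directly. The following is only a minimal sketch: the function name `bits_to_identify` and the example population sizes (including a rough world population of 8 × 10⁹) are illustrative assumptions, not taken from the article. The factor 1.5 is the one stated in the rule.

```python
import math


def bits_to_identify(population: int, factor: float = 1.5) -> float:
    """Bits of information K = factor * log2(S) for a population of size S.

    `factor` defaults to the 1.5 quoted in the text; the function name is
    a hypothetical helper introduced for this example.
    """
    if population < 1:
        raise ValueError("population must be at least 1")
    return factor * math.log2(population)


if __name__ == "__main__":
    # Example population sizes are assumptions chosen for illustration.
    for s in (10**3, 10**6, 8 * 10**9):
        print(f"S = {s:>13,}  ->  K = {bits_to_identify(s):.1f} bits")
```

For S = 10⁶ this gives roughly 30 bits. Because K grows only logarithmically with S, even a much larger population needs only modestly more information.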

How realistic is this?

Back to Maths — a Compression Problem

Statistical Modelling and Probability Distributions

Conclusions

Suppose a service has collected a database of background data for its users. The service then makes an observation of user activity. How much background and observation data does it need to identify that user in a given population?

An excited researcher of life and everything. Associate Professor in Speech and Language Technology at Aalto University, Finland.
