How computers generate (not really) random data ft. pseudo-random generators and the quantum band.
If computers couldn’t generate random data we’d be screwed. There’d be no security of communications, no privacy and no data security. But how can a logical (deterministic, if you know what I mean) machine like a computer generate random data?
It can’t. That’s why we have pseudo-random generators (PRG). A PRG takes random data as input and generates more “random” data. It’s like a Petri dish, at the perfect conditions for life. You put five bacteria, you find 500 the next day. PRGs won’t infect you, and will only generate a set number of pseudorandom numbers before they become predictable.
Note: I’ll put on my 🎩 when I talk about technicalities.
Another note: By string, I mean a sequence of 0s and 1s.
🎩 What is a PRG?
A Pseudo-Random Generator, denoted by G is a function that maps an s-bit string (seed, this should be random) to an n-bit string (output, this is the generated string).
n is also much larger than s (seed is much shorter than the generated random data).
Hat off.
A PRG is like a machine that takes random numbers and outputs more artificially random numbers with really high efficiency (takes in one number, outputs a gigabyte of random data).
By artificially random I mean computer-generated-not-really-random data.
Numbers for everything
Numbers, numbers, numbers. I can hear you saying, how can you use them to encrypt the important stuff, like your messaging history with your mom. All text, video, basically anything that you can put inside a computer is represented by numbers. For text, there are standards for conversion. ASCII is like the old man in the room, supporting only 128 characters. UTF-8, on the other hand, supports multiple alphabets and even emojis 🚀.
Following is an ASCII table, the decimals correspond to the number representing the letter.
Cryptographic Randomness
A cryptographically random PRG can be safely used in a cipher (to encrypt something) as a random number generator. To achieve this it should be
- indistinguishable from a random string,
- unpredictable.
In the following section I’ll explain what these mean in detail.
Statistical Testing
That out of the way let’s get into the fun stuff. How do we determine how random some data is? Please welcome to the stage, statistical tests 🎉.
Think of a real random sequence of 0s and 1s. It has certain statistical properties, right? For example, its 0 to 1 ratio is roughly one (there are about as many 0s as 1s).
These properties are used to test how distinguishable the generated string is from a random one. The generated string should be indistinguishable from a random string.
This is the first requirement of cryptographic randomness.
Predictability
Given the first some number of bits, there must not exist an algorithm that will compute (in reasonable time) the next bit.
🎩Example
Consider 01 010011 010011000111 01001100011100001111 …
You start with 01, then add 0011 then add 000111, increase the number of repetitions every round.
1st round 01
2nd round 01 0011
3rd round 01 0011 000111
This satisfies the statistical tests, but it is very predictable. A generator that outputs this would be useless in cryptography. Because any attacker could predict the pseudorandom numbers which would allow them to break the encryption.
…
Hat off.
A random string should not be predictable. Here’s what this means: given the first n -1 bits, you should not be able to predict the nth bit.
This is the second requirement of cryptographic randomness.
The king of random: quantum computers
Disclaimer: I’m not an expert on quantum computers, merely an amateur.
You’ve probably heard that a qubit could be between two states: 0 and 1. This is called quantum superposition. When you measure which one it is, it collapses into one state, either 0 or 1. Here comes the good part: RANDOMLY.
This is because quantum computers have grey areas, not only black and white. Meaning quantum computers are not completely deterministic machines.
So generating a 50 (qu)bit string of randomness is as easy as putting qubits in superposition and measuring their state.
I can hear you saying “well if it’s easy then do it!”
Experiment
Note: I’ve used IBM Q for all of this.
Here’s my plan: I’ll put as many qubits as I can in superposition as possible, measure their state and go over this cycle a few times.
In the IBM Q Circuit Composer, there are horizontal lines, which represent qubits and there are gates that are located on the lines. The positions (left-right) of these gates determine when they’ll be applied to the qubit.
There are two components in this circuit: the blue H and the speedy z.
The Blue H
This is what puts the qubits into superposition.
The Speedy z
This is what measures the state of the qubit.
Expected result
There are 2⁵ = 32 possible configurations of 5 bits, since every qubit will be one of 2 states when they collapse. The experiment will run 8192 times in IBM Q’s quantum computer for higher statistical accuracy. I am expecting about 100/32 = 3.125 probability for each of the 32 configurations. Let’s see what happens.
Result
Soo, the average is as I expected, although a bit fuzzy. Thus, I conclude that the random number generation algorithm I have implemented above does work.
It’s been a fun time writing this. I hope you had as much if not more excitement reading ;)
Key Takeaways
- Classical computers can’t generate random data, they just create it artificially from random data.
- There are 2 requirements for a PRG to be considered cryptographically random: it should be indistinguishable from a random string and should be unpredictable.
- Quantum computers will completely replace PRGs for cryptographically random data generation. You’ll probably still see PRGs in games and programming tutorials though.