UUID

Preparation

v1 - based on {timestamp} x {random} + {mac address}

v2 - DCE Security version, rarely used in practice

v3 - non random, {string} x {uuid}. Also known as "name and namespace". Deterministic! Based on an MD5 hash of the name and namespace.

v4 - totally random

v5 - non random, {string} x {uuid}. Also known as "name and namespace". Deterministic! Based on a SHA-1 hash of the name and namespace.

v6 - timestamp based, field-compatible with v1, but with the timestamp bits reordered so that values sort by creation time

v7 - timestamp based

v8 - custom format. Only the version and variant bits are fixed, the rest depends on the implementation (it can be timestamp based, but does not have to be)

v1, v2, v3, v5 - https://www.uuidtools.com/uuid-versions-explained

v6, v7, v8 - https://blog.scaledcode.com/blog/analyzing-new-unique-id/
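
A quick sketch of the versions above using Python's standard `uuid` module. The namespace and name are arbitrary values picked for illustration. v6, v7 and v8 are left out here because their availability depends on the Python release.

```python
import uuid

# Arbitrary inputs for the name-based versions (illustration only).
namespace = uuid.NAMESPACE_DNS
name = "example.com"

v1 = uuid.uuid1()                 # timestamp + clock sequence + node (often the MAC address)
v3 = uuid.uuid3(namespace, name)  # MD5 hash of namespace + name
v4 = uuid.uuid4()                 # random
v5 = uuid.uuid5(namespace, name)  # SHA-1 hash of namespace + name

# Name-based versions are deterministic: same inputs give the same UUID.
assert uuid.uuid3(namespace, name) == v3
assert uuid.uuid5(namespace, name) == v5

# Random UUIDs differ between calls.
assert uuid.uuid4() != v4

for u in (v1, v3, v4, v5):
    print(u.version, u)
```

The v1 and v4 values change on every run, while the v3 and v5 lines stay the same for the same name and namespace.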

Title

  • Why UUID?
  • Why does FAANG use UUIDs?
  • This is stupid

Script

  • When we start learning relational DBs, we learn that there is a ready-made solution for IDs: serial ints.

  • DBs already have a nice instrument for generating IDs: serial ints.

  • But when I started working with real systems, I noticed that UUIDs were quite often used as the primary key.

  • Why? A UUID is bigger, UUIDs are hard for humans to compare, and they are randomly generated with a low but still real probability of collision.

  • Why not pick a simple, compact serial integer, which is guaranteed not to collide?

  • reasons: scalability, handling unexpected errors

  • indeed, if you are creating a site for your local library, ints work just fine!

  • If something goes wrong, users will just repeat the registration process

  • although INTs will expose the number of users in your system

  • let's imagine we are building a system for transferring money from one account to another.

  • Transferring is a complex process. It involves lots of checks, so it may take up to several minutes to complete.

  • Thus, we can't do it while the user's HTTP request is in flight. Instead, we need to create an asynchronous process.

  • It looks like this: the HTTP request for a transfer goes to the first, front-facing service, but instead of starting the transfer inside the original request, it asks the second, money transferring service to create a job for the transfer. The second service responds to the first: "Success, I've created a job for your transfer, here is an ID which you can use for checking its status."

  • OK, but that is the happy path. What if the second, transferring service does not respond to a request?

  • That can happen: the request may hit a node which is starting to fail, or one stuck in a garbage collection pause. The reason can be anything. Ideally, we want to retry the request a couple more times, to see if we can reach a healthy node which will accept our request to create the transfer job.

  • but suppose the first service did not receive a response from the second one, and not because the job was not created. The job for the transfer was actually created, but something went wrong on the response path. It may be network trouble, or the endpoint contract was changed without backwards compatibility, so the first service couldn't parse the response.

  • as a result, the retries created several duplicates of the same transfer. Instead of $100, the user just sent several hundred dollars.

  • to solve it, have the client provide a unique ID when creating the job, so a retried request can be recognized as a duplicate.

  • we can't use a serial int for the ID, because several clients may make such requests, and they have no shared counter to draw from

  • Thus, we need an ID that each client can generate independently, so we use a UUID

  • There are several UUID versions, but really we need only 2 of them

  • there has been one very successful version, v4, but it is too random for a DB.

  • B-trees

  • Thus use v7

  • In order to have better control over
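
To make the "too random for a DB" point concrete, here is a minimal sketch of a v7-style UUID built by hand: a 48-bit Unix timestamp in milliseconds in the top bits, then the version and variant bits, with the rest filled randomly. The function name `uuid7_sketch` is made up for this example, and this is an illustration of the layout, not a production generator. Because the timestamp comes first, newer IDs sort after older ones, so inserts land near the right edge of a B-tree index instead of on random pages.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value (hypothetical helper, illustration only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp in the top bits
    value |= 0x7 << 76                 # version = 7
    value |= rand_a << 64
    value |= 0b10 << 62                # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# IDs created in later milliseconds sort after earlier ones,
# both as UUID objects and as plain strings.
ids = [uuid7_sketch(unix_ms=ms) for ms in (1_700_000_000_000,
                                           1_700_000_000_001,
                                           1_700_000_000_002)]
assert ids == sorted(ids)
assert [str(u) for u in ids] == sorted(str(u) for u in ids)

# v4 has no such ordering: the whole value is random.
print(uuid7_sketch())
print(uuid.uuid4())
```

Note that two IDs generated within the same millisecond are ordered randomly relative to each other in this sketch.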

Let's say we are building a money transferring service.

We accept requests which contain information about how much money is being transferred, and from which and to which account number. When we receive a request for a transfer, we just create a new record in the DB, and the ID of the transaction is a big serial, so it is created automatically. Success! Right?

But let's imagine a situation where we receive one transfer request. We accept it and start processing. But 5 seconds later we receive a new request with identical information: amount of money, "from" and "to" accounts. What is happening? Is it an identical but valid new transaction? Or is it a duplicate that should be discarded?

How do we solve this situation? Well, we could ask the requesting service to generate the transaction ID on its side and include it in the request. Since we don't have
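
A minimal sketch of the client-generated ID idea, assuming an in-memory dictionary in place of a real DB table. `TransferService`, `create_transfer` and the account names are all hypothetical names made up for this example.

```python
import uuid


class TransferService:
    """Hypothetical in-memory stand-in for the transfer service."""

    def __init__(self):
        self.jobs = {}  # transfer_id -> job details

    def create_transfer(self, transfer_id, from_account, to_account, amount):
        # A retried request carries the same ID, so return the existing job
        # instead of creating a duplicate.
        if transfer_id in self.jobs:
            return self.jobs[transfer_id]
        job = {
            "id": transfer_id,
            "from": from_account,
            "to": to_account,
            "amount": amount,
            "status": "pending",
        }
        self.jobs[transfer_id] = job
        return job


service = TransferService()

# The client generates the ID once, before the first attempt.
transfer_id = uuid.uuid4()

# Simulate the client retrying because earlier responses were lost.
for _ in range(3):
    service.create_transfer(transfer_id, "ACC-1", "ACC-2", 100)
assert len(service.jobs) == 1  # still a single transfer

# A genuinely new transfer with identical details gets its own ID.
service.create_transfer(uuid.uuid4(), "ACC-1", "ACC-2", 100)
assert len(service.jobs) == 2
```

With a real database, the same effect would typically come from a unique constraint on the ID column rather than a dictionary lookup.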

In order to understand this let's walk through history.

You want to automatically generate ID. You decide that ID is enough.

Personal experience

Why would you use SERIAL and BIGSERIAL

When you start being held back by SERIAL IDs

  • When you need DB sharding
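
A toy sketch of the sharding problem, assuming two shards that each keep their own independent counter (modelled here with `itertools.count`; the shard names are made up). The serial IDs overlap as soon as both shards insert rows, while independently generated UUIDs do not.

```python
import itertools
import uuid

# Two hypothetical shards, each with its own independent serial counter.
shard_a = itertools.count(1)
shard_b = itertools.count(1)

serial_ids_a = [next(shard_a) for _ in range(3)]
serial_ids_b = [next(shard_b) for _ in range(3)]

# Both shards hand out 1, 2, 3, so the IDs clash when the data is merged.
serial_overlap = set(serial_ids_a) & set(serial_ids_b)
assert serial_overlap == {1, 2, 3}

# UUIDs generated independently on each shard do not clash in practice.
uuid_ids_a = [uuid.uuid4() for _ in range(3)]
uuid_ids_b = [uuid.uuid4() for _ in range(3)]
uuid_overlap = set(uuid_ids_a) & set(uuid_ids_b)
assert not uuid_overlap
```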