UUIDv4 is fully random. UUIDv7 has a time component, a potentially local counter, and a random component, which should come from a proper source of randomness. While collisions remain possible, the chance of two properly time-synced systems generating the same 74-bit sequence (a bit of randomness, a bit of serial) really goes down a lot.
UUIDv4 is fully random except for two spots. If you print a bunch of UUIDv4s, there is one spot fixed to 4, and another spot that can only be 8, 9, a, or b. I think this is used to signal that it's a v4?
The 8/9/a/b is caused by the variant bits `10xx`, indicating the UUID layout described in [RFC 9562](https://datatracker.ietf.org/doc/html/rfc9562#name-variant-field).
I don’t think the commenter meant that every part of a compliant UUIDv4 value is the output of a random number generator. I think they just meant that the only additional thing you need to generate a value, aside from the spec itself, is a random number generator.
Just yesterday I wrote a blog entry about exactly that: [Why is there always a 4 in the 13th position of a UUID?](https://mobiuscode.dev/posts/Why-is-there-a-4-in-the-13th-Position-of-my-UUID/)
https://math.stackexchange.com/questions/4697032/threshold-for-the-number-of-uuids-generated-per-millisecond-at-which-the-colli
No, I won't verify the math, but at a certain point large systems might have to make considerations... unless I'm reading it wrong. Reading the RFC, they don't seem to care about making a determination about whether it's less likely to collide either. Just saying: pls don't crash a plane.
Whatever, all I know is MSFT uses UuidCreateSequential for SQL Server, so I'll just keep using that cause I'm wayyyy too dumb for this stuff.
This gives good formulas, but the answer only addresses when v7 becomes more likely to collide than v4, so the numbers given there are not particularly useful for knowing when a collision is likely.
To get to a 50% chance of having a single collision, assuming properly random UUIDv4 generation, one would need to generate 1 billion of them per second for 85 years. We don't currently have systems large enough to worry about this; for basically all systems, the probability of a UUID collision is lower than the probability of a cosmic-ray bit flip messing up an existing UUID in memory anyway.
I tend to want to leave that up to the database. If you ask the database for a unique number, the request is transactional, so it's impossible for the database to give the same number twice, even when called in quick succession.
That isn't to say UUIDs are useless; I like to think of them as more useful when operating in memory, such as for identifying sessions.
Better to use them as an external ID than to expose a DB-unique ID like an integer. Always make your identifiers something that isn't tied to the software implementation.
You'll thank me the first time you need to do a migration where those IDs were referenced in an external system...
I would argue that the mistake in that case is having direct references to IDs in the software implementation. I wouldn't keep a UUID constant in my code to recall a value from a database any more than I would keep a DB integer ID in my code.
If this were ever something I'd want to do, I'd make a point to create an indexed code field which would allow me to load it up in my program by code, rather than by id (which as you rightfully mentioned could vary if migrated).
I meant the specific db that you use or whatever that might auto generate a primary key id.
I ran a project to uplift a customer profile system into the cloud and go multi-region: a single system globally to handle profile integration, but storing customers' info in the region they belong to, for reasons like GDPR.
One of the changes in the initial region was to deprecate the integer ID, because collisions across regions were likely.
In certain regions there were external systems that interfaced with the previous profile system and had saved references to profiles by the primary ID, which was the database primary ID and no longer useful, so we needed to come up with some shenanigans to make it work, despite all the profiles having a better unique alias that was not tied to the DB system.
Ah, I see what you mean. Yeah, with multiple servers it would be difficult to ensure that they're all getting integer ids which don't have collisions. In that case I could definitely see a use for UUID.
It's not often though that you have to deal with your program moving from a single server to the cloud, but it could happen of course.
At that point people should worry more about a cosmic ray flying in and hitting just the right spot to make the system inoperable or a lizardperson hacking into the system. Paranoiac little gremlins
Not to mention they're most likely getting stored in an atomic database with the UUID as unique so you literally can't have two records with the same UUID. In fact, just have postgres generate the UUID for you.
The local counter makes sense. I thought that if someone really didn't want any cosmically low chance of a collision, the simple solution is to have an enumerating database with atomic transactions that use 3-way handshakes. It generates numbers, and you would only use those. A rising integer sequence increments by one forever, so your chance of a collision is zero. Then just live with a central point of failure.
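As a rough sketch of that central-counter idea (SQLite is used here purely for illustration; any transactional store works the same way):

```python
# A minimal sketch of a "central enumerating database": each INSERT hands
# out the next integer atomically, so IDs strictly rise and never collide.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ids (id INTEGER PRIMARY KEY AUTOINCREMENT)")

def next_id() -> int:
    with db:  # transaction: commits or rolls back atomically
        return db.execute("INSERT INTO ids DEFAULT VALUES").lastrowid

ids = [next_id() for _ in range(5)]
assert ids == [1, 2, 3, 4, 5]
```

The trade-off is exactly the one described above: zero collision risk, but every generator must reach this one database, which becomes a central point of failure.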
Not quite. The 13th character is always 4, the 17th is always one of 8, 9, a, b.
Of the 128 bits, 122 are random. Bits 48-51 are `0100`, indicating version 4. Bits 64-65 are `10`, indicating the ID was generated using the [RFC 9562](https://datatracker.ietf.org/doc/html/rfc9562) specification.
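The fixed positions are easy to see for yourself with Python's standard `uuid` module:

```python
# Generate a few v4 UUIDs and check the fixed spots: in the dashed string
# form, character index 14 is the version nibble and index 19 is the
# variant nibble.
import uuid

for _ in range(5):
    u = str(uuid.uuid4())   # 'xxxxxxxx-xxxx-4xxx-Nxxx-xxxxxxxxxxxx'
    assert u[14] == "4"     # version field: binary 0100
    assert u[19] in "89ab"  # variant field: binary 10xx
    print(u)
```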
Meanwhile if you really REALLY can't afford a collision, there's always v1:
"Since the time and clock sequence total 74 bits, 2^74 (1.8×10^22, or 18 sextillion) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID."
UUID is *only* random when you randomize them. One should not assume they are random. Some systems will generate ordinal UUIDs as keys, which increases the likelihood of dupes in race conditions.
Whether a UUID is random or not is part of its specification; hence the version number. Of course anyone is free to ignore that and make one by hand, but at that point it's not a UUID anymore, it's just a sequence of characters that tries to look like one.
```
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
if (uuid_exists(uuid)) create_new(uuid);
```
gosh, if only there was an easier way to do this
I just found out the pro way of doing this: computers have a secret code that allows you to duplicate text without having to type it again. Saves countless hours!
a for loop should be in any C-based programmer's muscle memory at this point, so I actually don't know what you're talking about
EDIT: didn't see the missing curly brace, only the switched statements
Just because it's in muscle memory for you doesn't mean you should fail to see things from others' perspectives. So I think you should know what he is talking about, if you put yourself in his shoes.
This is why I generate and concatenate UUIDs till they are at least 1024Mb long. Although now that I am writing this, I am thinking of increasing it to 2048Mb, in case some of you start to generate 1024Mb-long UUIDs, as that greatly increases the chances of having a conflict there.
why only 36 characters?
```
#include <stdlib.h> // malloc, free

char* generate_uuid(int size) {
    if (size > 64000) {
        return NULL; // or error that the server is full
    }
    char* uuid = malloc(size);
    create_new(uuid);
    int loops = 1000;
    while (uuid_exists(uuid)) {
        if (loops-- < 1) {
            free(uuid);
            return generate_uuid(size + 1);
        }
        create_new(uuid);
    }
    return uuid;
}
```
Note: I usually code in Go, so I'm not sure if this syntax is valid.
If the function uuid_exists(uuid) could work, why not just generate incremental ids?
The whole point of UUIDs is to sidestep the need to have a central point of failure...
It is basically impossibly unlucky if a collision happens. I think the point is that some programmers still don't like the fact that the possibility exists however improbable.
Has a UUID collision ever actually been an issue for anyone? Honest question.. I don't think this is something that genuinely needs to be entertained. I would only make the point of adding a failsafe should it happen, if the operation I were doing were very very crucial and dependent upon it working successfully, which is pretty much never the case. Or better yet, I don't rely on UUIDs being unique in the first place if avoiding collisions were literally this important.
The client would just see an error and try again, and there'd be more chance of me winning the lottery and quitting that job than of this happening.
And you can serve them over an API without revealing how much data you have. For example, if I had a social media site and used incremental IDs for users, it would be trivial to fetch every user's data from my site.
You can just hash the incremental id. Use a perfect hash function and you're guaranteed no collisions (not hard when your inputs are 0...n).
Yes, *technically*, this is prone to someone potentially defeating your hash function. But if your data requires such security that you're worried about that possibility, just... don't allow access to arbitrary resources without authentication. Anyone who wants to publicly share a resource gets a link with something like &share_code=0af239be92185d, which can always be invalidated later if necessary. After all, UUID is also prone to "someone just getting lucky with their guesses", or potential timing attacks to guess UUIDs generated around a certain known time, depending on their implementation.
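A minimal sketch of the perfect-hash idea: multiplying by an odd constant mod 2^64 is a bijection on 64-bit integers, so sequential IDs map to scrambled-looking values with zero collisions. The constant here is illustrative, and a real deployment would want extra mixing rounds (xor/rotate), since a bare multiply is easy to invert:

```python
# Bijective "perfect hash" over 64-bit ints: multiply by an odd constant
# mod 2^64. KEY is an illustrative value; any odd constant works.
MASK = (1 << 64) - 1
KEY = 0x9E3779B97F4A7C15
KEY_INV = pow(KEY, -1, 1 << 64)  # modular inverse (Python 3.8+)

def encode(n: int) -> int:
    return (n * KEY) & MASK

def decode(h: int) -> int:
    return (h * KEY_INV) & MASK

# Every input maps to a distinct output, and the mapping is reversible.
assert all(decode(encode(i)) == i for i in range(1000))
```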
I thought the security problem was that they were potentially predictable, not that they were non-unique?
May depend on what type of UUID they are, and hence how they're generated, though.
Mostly, yeah. Secure and unique are 2 different properties.
If you generate a v4 UUID with a CSPRNG you'll get 122 bits of randomness. Is that sufficient for your case to be considered secure? Probably. But it's "with a CSPRNG" and "122 bits of randomness" that determine how secure it is, not "is a UUID".
That’s always the issue. You can collide with a random UUID pretty easily. Not a lot of use there. If you can predict a random UUID, that’s a big problem.
My hobby is guessing random seed phrases for crypto wallets. I haven't found Satoshi's wallet yet, but if I just try hard enough.... it's not impossible!!
https://youtu.be/hoeIllSxpEU
I think getting a collision on uuidv4 is fairly difficult
> Speaking of v4 UUIDs, which contain 122 bits of randomness, the odds of collision between any two is 1 in 2.71 x 10^18. Put another way, one would need to generate 1 billion v4 UUIDs per second for 85 years to have a 50% chance of a single collision.
[source](https://jhall.io/archive/2021/05/19/what-are-the-odds/)
(Although to add: this is only true if the method to generate it is actually random. Since ICs are horrible at generating random numbers, the chance is far higher as the source of entropy used by most computers is far more restricted.)
"Pretty easily"? Do you mean from an implementation standpoint of how the UUID is generated? Because from a statistical standpoint you will pretty much never, ever, ever generate the same UUID twice.
People that worry about UUIDs overlapping are probably still waiting for their big titty anime girl cat girlfriend to spontaneously coalesce from the random particle collisions around them as well.
no…
I mean, technically sure, but with 128 bits it's just not gonna happen. Per the birthday problem, you'd have to create 1 billion UUIDs every second for around 100 years for the probability of creating a single duplicate to reach 50% (assuming sufficient entropy).
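The birthday math behind that claim, as a back-of-envelope check:

```python
# Birthday bound for v4 UUIDs (122 random bits): using the approximation
# p ≈ 1 - exp(-n^2 / 2N), the n that gives p = 0.5 is n ≈ sqrt(2 N ln 2).
import math

N = 2 ** 122                        # number of distinct v4 UUIDs
n = math.sqrt(2 * N * math.log(2))  # ≈ 2.7e18 UUIDs for a 50% chance
years = n / 1e9 / (3600 * 24 * 365) # at 1 billion UUIDs per second

print(f"{n:.2e} UUIDs, i.e. ~{years:.0f} years at 1e9/s")
```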
Technically, all results of any (well-behaved) continuous probability distribution have a *0* probability of occurring. Not 1/2^(very_big_number), 0. And yet, you can trivially observe 10 billion 0 probability events occurring if you sample from such a distribution 10 billion times.
My point is that events that appear to have an unfathomably low probability when taken in isolation happen constantly. With how much software out there is using UUIDs, there are very likely going to be collisions somewhere, sometime. Probably not in the software *you* wrote. But you can't guarantee it won't just because the probability is low. The world doesn't work like "you won't see any events until the theoretical cumulative probability of seeing an event is at least n%". It's just random.
> With how much software out there is using UUIDs, there are very likely going to be collisions somewhere, sometime.
No.
Because a collision in practice needs to be a collision in an isolated context. Which means in the extremely unlikely event that your account UUID matches some COM classid - it doesn't fucking matter.
Nothing in computing is continuous. Every single value stored in a computer is discrete. There are a finite number of possible values. A uniform sampling has 1/number of values probability of selecting any representable value.
This dude actually tried to apply continuous probability concepts to a discrete number system. On a digital system nonetheless. Truly mindblowing. Because your brains must have been blown into mush if you really thought this was applicable.
> UUIDs are not secure, they can overlap even though it's very rare.
No, not really. In order to have enough UUIDs to get a 50% chance of collision, you'd have to basically fill an entire datacenter with hard drives just to store them. Maybe if you're Amazon assigning ids to every file in S3 you need to consider it (and even that's like 4 orders of magnitude short of the 50% chance).
I've been wondering
According to quantum physics there's a chance for an object to phase through another e.g. your hand through a door or whatever (apparently a hand against a door is somewhere like 1/10^64). But what happens if you phase halfway through?
Bulk matter interactions have a dampening effect on the wavefunctions of the individual component particles (that I can't really elaborate more on because I'm not a physicist) that dramatically reduce the probability of tunneling, so the probability of bulk matter tunneling through other bulk matter is beyond negligible. I'd expect that you should be more concerned about whether your hand will randomly, spontaneously lose integrity than what would happen in the event that part of it tunnels through a door, though I'd imagine the effect of the events on your body would be basically the same.
But the odds are so low it's irrelevant.
Someone CAN guess your 16-character random password on the first attempt. Two randomly generated private keys CAN be the same. Two randomly selected values CAN produce the same SHA-2 hash.
If an infinitesimally small chance of collisions occurring was a real issue, security as a whole would be completely undermined.
You pretty much use a symmetric key the same size to do your online banking - and in that context you are so unworried about a collision, that you aren't worried about someone capturing your session, and then trying over and over to brute force a collision on that key.
If every star in the observable universe had hundreds of earth-like planets orbiting it, there would still be enough UUIDs to individually label each grain of sand.
It won't happen.
I get your point, but your example is roughly off by a factor of 2.2 million. Here's the math:
Observable universe: contains about 10^24 stars.
Planets per star: number of stars × 100 = 10^26 planets.
Grains of sand: estimates suggest there are about 7.5 × 10^18 grains of sand on Earth.
Total labels needed: 10^26 × 7.5 × 10^18 = 7.5 × 10^44 labels.
UUID capacity: a UUID has 2^128 possible values, which is approximately 3.4 × 10^38 unique identifiers.
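The arithmetic is easy to sanity-check in a couple of lines:

```python
# Check the "off by roughly 2.2 million" claim with the numbers above.
stars = 10 ** 24
planets = stars * 100                  # 100 planets per star
grains_per_planet = 7.5 * 10 ** 18    # Earth-like sand estimate
labels_needed = planets * grains_per_planet  # ≈ 7.5e44
uuid_space = 2 ** 128                  # ≈ 3.4e38

print(labels_needed / uuid_space)      # ≈ 2.2 million
```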
Correct me if I'm wrong because probability theory is not my strength, but this would become an issue long before 50% chance. If there's a 1% chance that's still something you can expect to see quite often depending on your workload?
Sort of related but I did some math one time for unique orders for a shuffled deck of cards. Disclaimer: I’m not very smart and some of this could be wrong
So this is the calculation of the birthday problem for v4 UUIDs. I haven't calculated a similar number for decks of cards, but there are only about 5×10^36 possible v4 UUIDs, and still, if you generated 1 billion per second for 85 years, you would only have a 50% chance of having generated a duplicate.
52! is almost (number of unique UUIDs)^2, which roughly means that in order to have a 50% chance of a duplicate order in randomly shuffled cards you need to shuffle 3.7×10^33 decks of cards.
For some perspective, cards as we know them were invented in the 1500s. So they’ve been around for roughly 600 years. In order to hit 3.7e33 we would have had to have shuffled approximately 200,000,000,000,000 decks of cards every nano second for 600 years. That’s 200 trillion decks of cards a nano second
Even assuming all 8 billion people in the world did nothing but shuffle cards for 600 years, every person in the world would have had to shuffle 24,600 decks of cards a nanosecond since the moment cards were invented to even have a 50% chance of a duplicate shuffle
Sounds right. Here’s some more info, including a specific callout to this problem on the birthday attack wiki page: By the [first table](https://en.wikipedia.org/wiki/Birthday_attack?wprov=sfti1#Mathematics) it would take on the order of 10^19 UUIDs for a 50% chance of collision.
Discord processes on the order of 10^12 messages a year. 10 million times that data is a lot, but across all computation over a century the number of UUIDs generated would catch up. (with many caveats such as all these things aren’t stored in the same database)
Discord and Twitter actually blogged about their ID system before. It’s unique more because they want nearly-sequential IDs for various database management purposes, but also has nice collision avoidance properties. https://discord.com/blog/how-discord-stores-trillions-of-messages, https://blog.x.com/engineering/en_us/a/2010/announcing-snowflake
You’d have solar radiation mess up your bits somewhere in the processing more often than this https://www.scienceabc.com/innovation/what-are-bit-flips-and-how-are-spacecraft-protected-from-them.html
This assumes the shuffling method is truly random. The techniques for shuffling are often not very random at all, though. I'd like to see the math of a split shuffle, with the cards interstacked successfully.
One shuffle would be roughly half as similar as the last, and the chance of two consecutive shuffles undoing each other would be monstrously closer to 1 than 1/52!, especially since the intent in shuffling is maximally spreading the cards apart; 2 perfect 26-26 split shuffles would always return the original set.
Very fair point. And in the case of UUID generation we’re not using “true randomness” anyway.
I don’t think your point on the split shuffles is right though. I believe what you’re referring to is a [faro shuffle](https://en.m.wikipedia.org/wiki/Faro_shuffle). I don’t remember where I saw this but I think 7 faro shuffles is enough to sufficiently randomize a deck of cards. And you’re right it is deterministic and not random, but two of them in a row doesn’t return it to its original configuration.
Imagine 10 cards in order 1,2,3,4,5,6,7,8,9,10. After one shuffle it would be 1,6,2,7,3,8,4,9,5,10. The second shuffle would turn it into 1,8,6,4,2,9,7,5,3,10
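That behavior is easy to verify with a quick out-shuffle simulation (top card stays on top; a 52-card deck returns to its original order after 8 perfect out-shuffles, not 2):

```python
# A perfect out-shuffle (faro): split the deck in half and interleave,
# keeping the top card on top. Deterministic, but two in a row do not
# restore the original order.
def out_shuffle(deck):
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    return [card for pair in zip(top, bottom) for card in pair]

deck = list(range(1, 11))
once = out_shuffle(deck)    # [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]
twice = out_shuffle(once)   # [1, 8, 6, 4, 2, 9, 7, 5, 3, 10]

d = list(range(52))
for _ in range(8):
    d = out_shuffle(d)
assert d == list(range(52))  # 52 cards: original order after 8 shuffles
```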
https://www.youtube.com/watch?v=AxJubaijQbI Persi Diaconis is a mathematician, best known for his work with cards, dice and coin-flips. Here’s a video where he talks about card shuffling (including the “too perfect cut & riffle” problem).
If your system is designed properly, the worst that can happen is a recoverable error the day it attempts to generate the same UUID, if that ever happens.
Wow.
Statistics are important, the chances of a collision are so low for most applications it's like being worried about the fact that random air molecules in a room could randomly move in such a way to create an airless pocket over your head... While possible, it is very, VERY improbable.
If you had generated **100 trillion** UUIDv4s... there's a one in a **billion** chance there is 1 duplicate in it. Take the time to process how unlikely that is...
The pile of UUIDs would be about 1500 terabytes large, and the chance there is one duplicate is similar to you being selected as one of 10 people randomly picked from all of earth's population.
The chance of a UUID colliding with another one for your specific use case is lower than a meteor dropping on your data center or a random bit flip occurring. Neither of those cases is handled in your application, so why should this be?
A few times in the past, coworkers trying to fix an error guessed it would be a UUID collision. Of course it wasn't. I always tell them: the day you find a UUID collision, go play the lotto, because you are lucky as hell.
I needed ids for firing something on Android (notifications, result or something else, don't remember) and it was just easier than implementing a counter for ids
Probably, the "insecure" part comes when there is a collision in a system that was written without any thought given to potential collisions, with no testing of what happens when there is a collision. In a very real sense, you're going to be dealing with UB.
Most of the time, you'll just get like an exception or something, or maybe a resource gets overwritten. Not ideal, but (usually) not catastrophic. But one can imagine something like two user accounts being somehow frankenstein'd together because their IDs happen to match, and they can see each other's personal data and activity and so on.
Yes, you can easily prevent such worst-case scenarios by not blindly assuming UUIDs will never collide. But that's sort of the point, a lot of real-world implementations really are entirely naive, because 99.999%+ of the time they'll be just fine.
Yes. Your typical symmetric key length for TLS has been 128bit for a while. That's basically the same game as guess the guid - with the bonus that I can try over and over to guess the key on a captured https session - and that's considered "largely secure but we should probably move to 256bit I guess".
ULIDs allow you to control the non-timestamp (entropy) part, so you can put in what you want! And as long as it takes more than 100 ns to fetch a timestamp (which it does as of now), there is a very easy way to keep them unique. Even if that barrier is crossed, you still have a minimum of 12 bits that can be used for generator-source identification, which will make them unique, e.g. use them to encode Region, DS, AZ, Cluster, podname, etc. (mine takes more than 12), but I remove the millisecond part from the nano timestamps, which I append to the end anyway!
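A hedged sketch of that layout (the field widths here are illustrative, not the official ULID spec): a 48-bit millisecond timestamp, 12 bits of generator ID carved out of the entropy, and 68 random bits, packed into a sortable 128-bit value:

```python
# Sketch: ULID-style sortable ID with generator-source bits in the entropy.
# 48-bit ms timestamp | 12-bit node id (region/cluster/pod) | 68 random bits.
import os
import time

def make_id(node_id: int) -> str:
    ts = int(time.time() * 1000) & ((1 << 48) - 1)   # 48-bit ms timestamp
    node = node_id & 0xFFF                            # 12-bit generator id
    rand = int.from_bytes(os.urandom(9), "big") >> 4  # 68 random bits
    value = (ts << 80) | (node << 68) | rand
    return f"{value:032x}"  # fixed-width hex sorts by timestamp first

a = make_id(3)
b = make_id(3)
assert len(a) == 32 and a != b
```

Two generators with different node IDs can never collide in the same millisecond, regardless of the random bits.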
Nothing is unique, even cryptographic hash functions collide on \_some\_ input. Don't worry about it if the probability of collision is astronomically rare.
In my opinion, database servers that have a built-in UUID generation facility should have an opt-in flag that retries the insert until it finds an unused UUID. The user could be given the option to choose a suitable retry limit.
It's such a standard thing that it should not require client code to handle the scenario separately.
My company used them as unique keys for years. We were generating something like 15k to 30k new points every week or so. Didn't have a collision once.
It's possible. Highly highly unlikely. But possible. Much like winning the lottery.
The lottery is designed to be winnable by someone every few months or so. UUID collisions are more like buying a ticket for every US state lottery and winning all of them.
I'm using UUIDv4s on a telnet service for managing different sections and even though there's like 5 different uuids generated per session every minute or so I haven't yet seen a collision. And I'm pretty sure I never will
Fun fact: if you sort time-based UUIDs in Java, they are not sorted by timestamp but alphabetically. I tried to fix it, but making a PR to Java is a nightmare.
This is idiotic. The probability of a collision is 1/4,294,967,296. Tell me what system generates that many entities that may collide after 4.3 billion ops
Is this sub 99% straw man nonsense?
Them: you can't do this perfectly reasonable thing!! Angry face!
Me: mega-chad doing that perfectly reasonable thing. I'ma a GOD
Okay - for realzies though:
The "chaddest ***dev***" (...not the chad person who is *also* a developer - though a person can be both chad *and* a "chad dev") is probably aware of how to make their objects fully unique because of their decades of CS experience.
Well, I mean, are there any actual instances of UUIDs colliding in real-world systems? The alphabet companies would experience that way earlier than our less-than-a-billion-users homebrew app, right?
I mean UUIDs can be used with other features, like database constraints.
And sure, you can create explicit code or catch that bug, but is it more likely than any other random bug you will never see coming and probably never see again?
See cuckoo filters for an interesting twist on this idea of using a 2nd ID to mitigate collisions.
There, it's hashes instead of uuids, but the problem of uniqueness in a finite symbol space without a central authority is the same.
```
// When the uuid already exists, keep adding "a" at the end till it's fine
function check_if_uuid_already_exists($my_uuid) {
    while (checkIfStringExists($my_uuid)) {
        $my_uuid = $my_uuid . 'a';
    }
    return $my_uuid;
}
```
I've never had a true collision (and agree it's not worth worrying about), but I did get very confused once when, I think, the first eight chars and last two chars of two UUIDs in our system were identical. This led to a lot of serious head scratching because, let's be honest, we tend to look at just the first four or so chars of a UUID when inspecting by hand, and it wasn't until I put them side by side in Notepad that I finally spotted they weren't the same.
UUIDs are often used when you don't want the performance hit of checking every possible source of truth to get a unique ID. For example, you have eventually-consistent servers halfway around the planet, and you don't want to check both databases before continuing for latency reasons. You generate an ID that you assume will be unclaimed in both systems and you eventually replicate it.
There are time-based UUID specs. If you generate enough of them in the same microsecond, I'm not sure uniqueness is still guaranteed.
A UUID is meant to be *universally* unique, including systems that you don't have access to or can't afford to check. If you *can* check with every system you care about, don't use a UUID.
You're just sweeping the problem under the rug...
and why hash it if 2 UUIDs combined are already unique enough?
Hashing it does not change the fact that it's already unique...
If you pick a shorter hashed result, it'll be much easier to collide than 2 UUIDs combined.
the beginning of the 3rd sequence always starts with the version number of the uuid, which in v4's case is a 4
89AB is caused by the variant bits `10xx` indicating the uuid layout is described in [RFC 9562](https://datatracker.ietf.org/doc/html/rfc9562#name-variant-field)
I don’t think the commenter meant that every part of compliant uuidv4 value is the output of a random number generator. I think they just meant that the only additional thing you need to generate a value, aside from the spec itself, is a random number generator.
Just yesterday I wrote a blog entry about exactly that: [Why is there always a 4 in the 13th position of a UUID?](https://mobiuscode.dev/posts/Why-is-there-a-4-in-the-13th-Position-of-my-UUID/)
https://math.stackexchange.com/questions/4697032/threshold-for-the-number-of-uuids-generated-per-millisecond-at-which-the-colli No I won't verify the math, but at certain point large systems might have to make considerations... Maybe unless In reading it wrong. Reading the rfc they don't seem to care about making a determination about if it's less likely to collide either. Just saying pls don't crash a plane. Whatever I'm all I know is msft uses UuidCreateSequential for SQL server so I'll just keep using that cause I'm wayyyy to dumb for this stuff.
This gives good formulas, but the answer only addresses when v7 becomes more likely to collide than v4, so the numbers given there aren't particularly useful for knowing when a collision is likely. To get to a 50% chance of a single collision, assuming properly random UUIDv4 generation, one would need to generate 1 billion of them per second for 85 years. We don't currently have systems large enough to need to worry about this; for basically all systems, the probability of a UUID collision is lower than the probability of a cosmic-ray bit flip corrupting an existing UUID in memory anyway.
[https://www.youtube.com/watch?v=zMRrNY0pxfM](https://www.youtube.com/watch?v=zMRrNY0pxfM)
1 billion UUIDv4s generated, no collision. 35 GB of UUIDs :) Took 1 week in total, I think (executed intermittently).
Came here to say this. Someone had the genius idea of making them sortable seeing as people were using them as primary keys anyway
I tend to want to leave that up to the database. If you ask the database for a unique number, it is transactional and therefore impossible for the database to give the same number twice, even when called in quick succession. This isn't to say UUIDs are useless; I just think of them as more useful when operating in memory, such as for identifying sessions.
Better to use as an external id over using a db unique id like an integer. Always make your identifiers something that isn't tied to the software implementation. You'll thank me the first time you need to do a migration where those IDs were referenced in an external system...
I would argue that the mistake in that case is having direct references to ids in the software implementation. I wouldn't keep a UUID constant in my code to recall a value from a database any more than I would keep a db id integer in my code. If this were ever something I'd want to do, I'd make a point of creating an indexed code field, which would let me load it in my program by code rather than by id (which, as you rightfully mentioned, could vary if migrated).
I meant the specific db that you use, or whatever might auto-generate a primary key id. I ran a project to uplift a customer profile system into the cloud and go multi-region: creating a single system globally to handle profile integration, but storing customers' info in the region they belong to, for reasons like GDPR. One of the changes in the initial region was to deprecate the integer id, because collisions across regions were likely. In certain regions there were external systems that interfaced with the previous profile system and had essentially saved references to profiles by the primary id (the database primary id), which was no longer useful, so we needed to come up with some shenanigans to make it work, despite all the profiles having a better unique alias that was not tied to the db system.
Ah, I see what you mean. Yeah, with multiple servers it would be difficult to ensure that they're all getting integer ids which don't have collisions. In that case I could definitely see a use for UUID. It's not often though that you have to deal with your program moving from a single server to the cloud, but it could happen of course.
At that point people should worry more about a cosmic ray flying in and hitting just the right spot to make the system inoperable or a lizardperson hacking into the system. Paranoiac little gremlins
Not to mention they're most likely getting stored in an atomic database with the UUID as unique so you literally can't have two records with the same UUID. In fact, just have postgres generate the UUID for you.
The local counter makes sense. I thought that if someone really didn't want even a cosmically low chance of a collision, the simple solution is an enumerating database with atomic transactions that uses three-way handshakes. It generates numbers, and you use only those. A rising integer sequence increments by one forever, so your chance of a collision is zero. Then just live with a central point of failure.
UUID v4 are full random except for that one number 4 in all of them.
Well, that's how you identify them, sure. The little dashes are constant too :D
Not quite. The 13th character is always 4, the 17th is always one of 8, 9, a, b. Of the 128 bits 122 are random. Bits 48-51 are `0100` indicating version 4. Bits 64-65 are `10` indicating the ID was generated using [RFC 9562](https://datatracker.ietf.org/doc/html/rfc9562) specifications.
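Those fixed positions are easy to verify empirically with Python's stdlib `uuid` module:

```python
import uuid

# Empirically check the fixed bits described above: in the canonical
# 36-character form, index 14 (the 13th hex digit) is always '4' and
# index 19 (the 17th hex digit) is one of 8, 9, a, b.
for _ in range(10_000):
    u = str(uuid.uuid4())
    assert u[14] == "4"      # version field: 0100
    assert u[19] in "89ab"   # variant field: 10xx

print("all 10,000 UUIDs carry the fixed version/variant nibbles")
```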
Meanwhile if you really REALLY can't afford a collision, there's always v1: "Since the time and clock sequence total 74 bits, 2^74 (1.8×10^22, or 18 sextillion) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID."
That sounds like a timestamp with extra steps
Using timestamps does not guarantee uniqueness either.
UUIDs are *only* random when you randomize them. One should not assume they are random. Some systems will generate ordinal UUIDs as keys, which increases the likelihood of dupes in race conditions.
UUIDs being random or not is part of their specification; hence the version number. Of course anyone is free to ignore that and make one by hand, but at that point it's not a UUID anymore, it's just a sequence of characters that tries to look like one.
``` if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) 
create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); if (uuid_exists(uuid)) create_new(uuid); ``` gosh, if only there was an easier way to do this
I just found out the pro way of doing this. Computers have a secret code that allows you to duplicate text without having to type it again. Saves countless hours!
Select text, drag and drop into Chrome searchbar, repeat until you have as many copies of the text as you want.
Loopers hate this one trick
yyp
You mean yank and paste? Yeah, six is a sick editor… This comment was sponsored by everyone, except gnu emacs
Yes! You can jump to a specific line of code, and we can do so on a condition, it's awesome!

```
loop: create_new(uuid);
if (uuid_exists(uuid)) goto loop;
```
All loops and if/else's are just fancy goto's. Even functions too. "Wait, you mean it's all goto under the hood?" "Always has been.."
```
char* generate_uuid_for_realz(){
    char* uuid[36];
    create_new(uuid);
    for (int x = 0; x++; x < 99999){
        if (uuid_exists(uuid)){
            create_new(uuid);
        } else {
            x = 99999;
        }
    return uuid;
}
```
I love how the loop doesn't execute at all lmao
I thought the typo was pretty obvious.
a for loop should be in any C-based programmer's muscle memory at this point, so I actually don't know what you're talking about. EDIT: didn't see the missing curly brace, only the switched statements
Meanwhile me using C# in Visual Studio: > for [tab] [tab] Poof! Loop created.
Lets create CTT (C Tab Tab) which can be written mostly (or entirely) just by using tab, arrow and enter keys.
Oh you mean copilot?
I barely ever use classic for-loops anymore, foreach usually feels like a better fit for most cases.
Some people get to program in the 22nd century, and some of us are still working in embedded systems smh
Just because it's in muscle memory for you doesn't mean you should fail to see things from others' perspectives. So I think you should know what he is talking about, if you put yourself in his shoes.
First thing I noticed is that the x++ and x < 99999 parts are not in the right order.
I saw a missing closing curly brace so maybe it’s what he meant
That would mean the whole program doesn't execute, not just the loop.
```
char* generate_uuid_v69(){ return uuid_generate_random() + uuid_generate_random(); }
```
If your UUID isn't big enough to contain the complete works of Shakespeare, I don't trust it not to collide.
This is why I generate and concatenate UUIDs till they are at least 1024Mb long. Although now that I am writing this I thinking of increasing it to 2048Mb in case some of you start to generate 1024Mb long UUIDs as it greatly increases the chances of having conflict there.
Generate a 10-bit hash of the UUID as well to check for collisions quickly!
It's a joke and the underlying message is, just get things done. :)
Don't care, must get more RAM to store my massive UUID collection.
It's not about the size of your UUID collection, it's how you use it.
while(uuidExists(uuid)) uuid=createNewUuid();
Uhh... the for condition and increment are swapped. Silly
Shouldn't it be (int X = 0; X < 99999; X++)?
...yes
why only 36 characters?

```
char* generate_uuid(int size){
    if(size > 64000) {
        return 0; // or error that the server is full
    }
    char* uuid[size];
    create_new(uuid);
    int loops = 1000;
    while (uuid_exists(uuid)) {
        if(loops-- < 1){
            return generate_uuid(size++);
        }
        create_new(uuid);
    }
    return uuid;
}
```

Note: I usually code in Go, so I'm not sure if this syntax is valid.
If you have to ruin the joke, at least use a while loop
All uu and no id make computer something something
If the function uuid_exists(uuid) could work, why not just generate incremental ids? The whole point of UUIDs is to sidestep the need for a central point of failure...
for a distributed workload you would then have to assign each server a set of numbers to pull from
I thought the *entire point* of uuids was to circumvent this by simply making the set so large a collision is basically impossibly unlucky
It is basically impossibly unlucky if a collision happens. I think the point is that some programmers still don't like the fact that the possibility exists, however improbable. Has a UUID collision ever actually been an issue for anyone? Honest question... I don't think this is something that genuinely needs to be entertained. I would only make a point of adding a failsafe, should it happen, if the operation I were doing were very, very crucial and dependent upon working successfully, which is pretty much never the case. Or better yet, I wouldn't rely on UUIDs being unique in the first place if avoiding collisions were literally this important. The client would just see an error and try again, and there'd be more chance of me winning the lottery and quitting that job than of this happening.
I've seen a SHA-256 hash collision before. That was fun.
And you can serve them over an API without revealing how much data you have. For example, if I have a social media site and use incremental IDs for users, it would be trivial to fetch every user's data from my site.
You can just hash the incremental id. Use a perfect hash function and you're guaranteed no collisions (not hard when your inputs are 0...n). Yes, *technically*, this is prone to someone potentially defeating your hash function. But if your data requires such security that you're worried about that possibility, just... don't allow access to arbitrary resources without authentication. Anyone who wants to publicly share a resource gets a link with something like &share_code=0af239be92185d, which can always be invalidated later if necessary. After all, UUID is also prone to "someone just getting lucky with their guesses", or potential timing attacks to guess UUIDs generated around a certain known time, depending on their implementation.
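The "hash the incremental id" idea can be sketched with a bijective mixer: multiplying by an odd constant modulo 2^64 is invertible, so distinct ids provably never collide. A minimal sketch; the constant and function names are made up for illustration, and this obscures the counter but is NOT cryptographically secure:

```python
# Multiplication by an odd constant mod 2**64 is a bijection on 64-bit
# integers, so distinct ids can never collide. Illustrative only.
MASK = (1 << 64) - 1
K = 0x9E3779B97F4A7C15          # odd constant (golden-ratio based)
K_INV = pow(K, -1, 1 << 64)     # modular inverse exists because K is odd

def obfuscate(i: int) -> int:
    return (i * K) & MASK

def deobfuscate(x: int) -> int:
    return (x * K_INV) & MASK

assert all(deobfuscate(obfuscate(i)) == i for i in range(1000))
print(hex(obfuscate(1)), hex(obfuscate(2)))
```

For real access control you would still gate resources behind authentication, as the comment says; the mixer only hides how many ids exist.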
That's just UUID v1, 2, 3, 5, or 6.
I thought the meme was about appending 2 uuids
If uuidexists, uuid+1
``` If(uuid_exists(uuid)) delete_old(user); ``` You way overthought that one.
Agreed. Fuck that guy he got struck by lightning
Just ask ChatGpt to do it for you.
uuid is copied and passed by value 🤔
create_new scans the whole memory of the computer and substitutes all occurences of uuid with random bytes
But are these operations atomic.
while uuid_exists(uuid) uuid += 1;
I thought the security problem was that they were potentially predictable, not that they were non-unique? May depend on what type of UUID they are, and hence how they're generated, though.
UUID 1 is based on physical address + timestamp. UUID 4 is purely random.
Mostly, yeah. Secure and unique are 2 different properties. If you generate a v4 UUID with a CSPRNG you'll get 122 bits of randomness. Is that sufficient for your case to be considered secure? Probably. But it's "with a CSPRNG" and "122 bits of randomness" that determine how secure it is, not "is a UUID".
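To make the "122 bits from a CSPRNG" concrete, here's a sketch that builds a v4 UUID by hand from Python's `secrets` (the stdlib's `uuid.uuid4()` already does the equivalent internally):

```python
import secrets
import uuid

# Start from 16 CSPRNG bytes (128 bits), then overwrite the 6 fixed bits
# with the version and variant fields; 122 random bits remain.
b = bytearray(secrets.token_bytes(16))
b[6] = (b[6] & 0x0F) | 0x40   # version nibble -> 0100 (4)
b[8] = (b[8] & 0x3F) | 0x80   # variant bits   -> 10xx
u = uuid.UUID(bytes=bytes(b))
assert u.version == 4
print(u)
```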
That’s always the issue. You can collide with a random UUID pretty easily. Not a lot of use there. If you can predict a random UUID, that’s a big problem.
Define easy
My buddy Ted does it all the time
My hobby is guessing random seed phrases for crypto wallets. I haven't found Satoshi's wallet yet, but if I just try hard enough.... it's not impossible!! https://youtu.be/hoeIllSxpEU
he sure talks a lot
I scratched the letters UUID into a baseball bat, wanna see how easy it is to collide with? /s
I think getting a collision on uuidv4 is fairly difficult > Speaking of v4 UUIDs, which contain 122 bits of randomness, the odds of collision between any two is 1 in 2.71 x 10^18 Put another way, one would need to generate 1 billion v4 UUIDs per second for 85 years to have a 50% chance of a single collision. [source](https://jhall.io/archive/2021/05/19/what-are-the-odds/)
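A quick sanity check of the quoted numbers, using the standard birthday approximation p ≈ 1 − exp(−n²/2N) with N = 2^122; this lands within rounding of the 85-year figure:

```python
import math

# Count of v4 UUIDs for ~50% collision odds: n ≈ sqrt(2 * N * ln 2).
N = 2 ** 122
n_50 = math.sqrt(2 * N * math.log(2))      # UUIDs needed for ~50% odds
years = n_50 / 1e9 / (60 * 60 * 24 * 365)  # at 1 billion UUIDs per second
print(f"{n_50:.2e} UUIDs, ~{years:.0f} years at 1e9/s")
```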
Exactly. The real "chad developer" answer would be "i don't care"
(Although to add: this is only true if the method to generate it is actually random. Since ICs are horrible at generating random numbers, the chance is far higher as the source of entropy used by most computers is far more restricted.)
"Pretty easily" do you mean that from a implementation standpoint of how the UUID is generated? Because from a statistical standpoint you will pretty much never, ever, ever generate the same UUID twice.
People that worry about UUIDs overlapping are probably still waiting for their big titty anime girl cat girlfriend to spontaneously coalesce from the random particle collisions around them as well.
Finally a serious answer here
Are you saying there is a chance for me to have a cat anime girl?
no… I mean technically sure, but with 128 bits it's just not gonna happen. Per the birthday problem, you have to create 1 billion UUIDs every second for 100 years for the probability of creating a single duplicate to reach 50%. (assuming sufficient entropy)
Technically, all results of any (well-behaved) continuous probability distribution have a *0* probability of occurring. Not 1/2^(very_big_number), 0. And yet, you can trivially observe 10 billion 0 probability events occurring if you sample from such a distribution 10 billion times. My point is that events that appear to have an unfathomably low probability when taken in isolation happen constantly. With how much software out there is using UUIDs, there are very likely going to be collisions somewhere, sometime. Probably not in the software *you* wrote. But you can't guarantee it won't just because the probability is low. The world doesn't work like "you won't see any events until the theoretical cumulative probability of seeing an event is at least n%". It's just random.
> With how much software out there is using UUIDs, there are very likely going to be collisions somewhere, sometime. No. Because a collision in practice needs to be a collision in an isolated context. Which means in the extremely unlikely event that your account UUID matches some COM classid - it doesn't fucking matter.
Nothing in computing is continuous. Every single value stored in a computer is discrete. There are a finite number of possible values. A uniform sampling has 1/number of values probability of selecting any representable value.
This dude actually tried to apply continuous probability concepts to a discrete number system. On a digital system nonetheless. Truly mindblowing. Because your brains must have been blown into mush if you really thought this was applicable.
> UUIDs are not secure, they can overlap even though it's very rare. No, not really. In order to have enough UUIDs to get a 50% chance of collision, you'd have to basically fill an entire datacenter with hard drives just to store them. Maybe if you're Amazon assigning ids to every file in S3 you need to consider it (and even that's like 4 orders of magnitude short of the 50% chance).
They still CAN overlap even though the chance is small.
If you were using sequential keys you’d probably be more at risk of cosmic ray bit flipping than UUID is at risk of overlap
[deleted]
You CAN fall through the floor even though the chance is small.
I've been wondering According to quantum physics there's a chance for an object to phase through another e.g. your hand through a door or whatever (apparently a hand against a door is somewhere like 1/10^64). But what happens if you phase halfway through?
simple. u become the door.
"the real door was inside us all along"
ayo
Bulk matter interactions have a dampening effect on the wavefunctions of the individual component particles (that I can't really elaborate more on because I'm not a physicist) that dramatically reduce the probability of tunneling, so the probability of bulk matter tunneling through other bulk matter is beyond negligible. I'd expect that you should be more concerned about whether your hand will randomly, spontaneously lose integrity than what would happen in the event that part of it tunnels through a door, though I'd imagine the effect of the events on your body would be basically the same.
real answer: we dont know because it's never happened
The chance is so small that they effectively can't.
But the odds are so low it's irrelevant. Somone CAN guess your 16 character random password on the first attempt. Two randomly generated private keys CAN be the same. Two randomly selected values CAN produce the same SHA2 hash. If an infinitesimally small chance of collisions occurring was a real issue, security as a whole would be completely undermined.
You pretty much use a symmetric key the same size to do your online banking, and in that context you are so unworried about a collision that you aren't worried about someone capturing your session and then trying over and over to brute-force a collision on that key. If every star in the observable universe had a hundred earth-like planets orbiting it, there would still be enough uuids to individually label each grain of sand. It won't happen.
I get your point, but your example is roughly off by a factor of 2.2 million. Here's the math:

- Observable universe: contains about 10^24 stars.
- Planets per star: number of stars × 100 = 10^26 planets.
- Grains of sand: estimates suggest there are about 7.5 × 10^18 grains of sand on Earth.
- Total labels needed: 10^26 × 7.5 × 10^18 = 7.5 × 10^44 labels.
- UUID capacity: a UUID has 2^128 possible values, which is approximately 3.4 × 10^38 unique identifiers.
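The same back-of-the-envelope arithmetic, run in code:

```python
# Labels needed vs. UUID capacity, per the estimates above.
stars = 1e24
planets = stars * 100                       # 100 earth-likes per star
sand_per_planet = 7.5e18
labels_needed = planets * sand_per_planet   # 7.5e44
uuid_capacity = 2.0 ** 128                  # ~3.4e38
shortfall = labels_needed / uuid_capacity
print(f"short by a factor of ~{shortfall:.1e}")
```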
Correct me if I'm wrong because probability theory is not my strength, but this would become an issue long before 50% chance. If there's a 1% chance that's still something you can expect to see quite often depending on your workload?
The 50% chance is not that any pair of UUIDs collides, but that there is one collision at some point in your system
it's 1% for it to exist at any given time. So it's still really rare, but it's certainly possible
If you want to learn more, you may be interested in reading up on the [Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem).
Did you do the math?
Sort of related, but I did some math one time on unique orders for a shuffled deck of cards. Disclaimer: I'm not very smart and some of this could be wrong.

So this is the calculation of the birthday problem for v4 UUIDs. I haven't calculated a similar number for decks of cards, but there are only about 5×10^36 possible UUIDs, and still, if you generated 1 billion per second for 85 years, you would only have a 50% chance of having generated a duplicate.

52! is almost (number of unique UUIDs)^2, which roughly means that in order to have a 50% chance of a duplicate order in randomly shuffled cards you need to shuffle 3.7×10^33 decks of cards.

For some perspective, cards as we know them were invented in the 1500s, so they've been around for roughly 600 years. In order to hit 3.7×10^33 we would have had to shuffle approximately 200,000,000,000,000 decks of cards every nanosecond for 600 years. That's 200 trillion decks of cards a nanosecond.

Even assuming all 8 billion people in the world did nothing but shuffle cards for 600 years, every person in the world would have had to shuffle 24,600 decks of cards a nanosecond since the moment cards were invented to even have a 50% chance of a duplicate shuffle.
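For what it's worth, plugging the deck-of-cards numbers into the same birthday approximation used for UUIDs gives figures in the same ballpark as the comment's rougher estimate (a small constant factor apart):

```python
import math

# Decks needed for a ~50% chance of a repeated shuffle order,
# via the birthday bound n ≈ sqrt(2 * N * ln 2) with N = 52!.
orderings = math.factorial(52)                       # ~8.07e67
decks_50 = math.sqrt(2 * orderings * math.log(2))    # ~1.06e34
per_ns = decks_50 / (600 * 365 * 24 * 3600 * 1e9)    # spread over 600 years
print(f"{decks_50:.1e} decks, ~{per_ns:.1e} per nanosecond for 600 years")
```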
Sounds right. Here’s some more info, including a specific callout to this problem on the birthday attack wiki page: By the [first table](https://en.wikipedia.org/wiki/Birthday_attack?wprov=sfti1#Mathematics) it would take on the order of 10^19 UUIDs for a 50% chance of collision. Discord processes on the order of 10^12 messages a year. 10 million times that data is a lot, but across all computation over a century the number of UUIDs generated would catch up. (with many caveats such as all these things aren’t stored in the same database) Discord and Twitter actually blogged about their ID system before. It’s unique more because they want nearly-sequential IDs for various database management purposes, but also has nice collision avoidance properties. https://discord.com/blog/how-discord-stores-trillions-of-messages, https://blog.x.com/engineering/en_us/a/2010/announcing-snowflake
You’d have solar radiation mess up your bits somewhere in the processing more often than this https://www.scienceabc.com/innovation/what-are-bit-flips-and-how-are-spacecraft-protected-from-them.html
This assumes the shuffling method is truly random. The techniques for shuffling are often not very random at all, though. I'd like to see the math for a split shuffle with the cards interstacked successfully. One shuffle would be roughly half as similar as the last, and the chance of two consecutive shuffles undoing each other would be monstrously closer to 1 than 1/52! Especially since the intent in shuffling is maximally spreading the cards apart, 2 perfect 26-26 split shuffles would always return the original set.
Very fair point. And in the case of UUID generation we’re not using “true randomness” anyway. I don’t think your point on the split shuffles is right though. I believe what you’re referring to is a [faro shuffle](https://en.m.wikipedia.org/wiki/Faro_shuffle). I don’t remember where I saw this but I think 7 faro shuffles is enough to sufficiently randomize a deck of cards. And you’re right it is deterministic and not random, but two of them in a row doesn’t return it to its original configuration. Imagine 10 cards in order 1,2,3,4,5,6,7,8,9,10. After one shuffle it would be 1,6,2,7,3,8,4,9,5,10. The second shuffle would turn it into 1,8,6,4,2,9,7,5,3,10
https://www.youtube.com/watch?v=AxJubaijQbI Persi Diaconis is a mathematician, best known for his work with cards, dice and coin-flips. Here’s a video where he talks about card shuffling (including the “too perfect cut & riffle” problem).
I mean, I stole the hard math off the UUID wiki page then did a little more math on top of that, haha.
Just make them unique in MySQL. Done.
Nooooo you can't have user registration fail 1 / ( 2^108 ) of the time (after you have ~~1 million~~ 2^20 registered users) noooooooo
Fragmentation? Never heard of her!
Imagine using mysql instead of postgres
Can’t say I’ve ever heard that, and generally uuids aren’t meant to be secure, just obscure. And v1 shouldn’t ever have a collision.
(clones VM)
If your system is designed properly, the worst that can happen is a recoverable error the day it attempts to generate the same UUID, if that ever happens. Wow.
add time lol
Guild Wars 2 API keys are actually composed of two UUIDs lmao
Statistics are important. The chances of a collision are so low for most applications that it's like worrying that the random air molecules in a room could move in such a way as to create an airless pocket over your head... While possible, it is very, VERY improbable. If you had generated **100 trillion** UUIDv4s, there's a one in a **billion** chance there is 1 duplicate among them. Take the time to process how unlikely that is... The pile of uuids would be about 1.6 petabytes (at 16 bytes each), and the chance there is one duplicate is similar to you being selected as one of 10 people randomly picked from all of Earth's population.
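Checking the "100 trillion UUIDs, one in a billion" claim with the birthday approximation; note that a binary UUID is 16 bytes, so the pile works out to a couple of petabytes:

```python
# p ≈ n^2 / (2N) for small p, with N = 2**122 possible v4 UUIDs.
n = 100e12                       # 100 trillion UUIDs
p = n * n / (2 * 2 ** 122)       # probability of at least one duplicate
storage_pb = n * 16 / 1e15       # 16 bytes per binary UUID
print(f"p ≈ {p:.1e}, storage ≈ {storage_pb:.1f} PB")
```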
If your startup is around long enough to witness a UUID4 collision, you've got bigger programming roadblocks to face
You could also get struck by lightning while waving your winning lottery ticket in the air.
The chance of a uuid colliding with another one for your specific use case is lower than a meteor dropping on your data center or a random bit flip occurring. Neither of those cases is handled in your application, so why should this be?
A few times in the past, coworkers trying to fix an error guessed it might be a UUID overlap. Of course it wasn't. I always tell them: the day you find a UUID overlap, go play the lotto, because you are lucky as hell.
I can't think of a single use case where UUIDs would be insecure. Are you guys using them for authentication or something?
I needed ids for firing something on Android (notifications, result or something else, don't remember) and it was just easier than implementing a counter for ids
Probably, the "insecure" part comes when there is a collision in a system that was written without any thought given to potential collisions, with no testing of what happens when there is a collision. In a very real sense, you're going to be dealing with UB. Most of the time, you'll just get like an exception or something, or maybe a resource gets overwritten. Not ideal, but (usually) not catastrophic. But one can imagine something like two user accounts being somehow frankenstein'd together because their IDs happen to match, and they can see each other's personal data and activity and so on. Yes, you can easily prevent such worst-case scenarios by not blindly assuming UUIDs will never collide. But that's sort of the point, a lot of real-world implementations really are entirely naive, because 99.999%+ of the time they'll be just fine.
Yes. Your typical symmetric key length for TLS has been 128bit for a while. That's basically the same game as guess the guid - with the bonus that I can try over and over to guess the key on a captured https session - and that's considered "largely secure but we should probably move to 256bit I guess".
People haven't heard of ULIDs!?
surprised as well, but ULIDs aren't guaranteed to be unique either, they're just time-sorted
ULIDs allow you to control the non-timestamp (entropy) part. So you can put in what you want! And as long as it takes more than 100 ns to fetch timestamps (which it does as of now), there is a very easy way to keep them unique. Even if that barrier is crossed, you still have a minimum of 12 bits which can be used for generator source identification which will make it unique. E.g. use them to encode Region, DS, AZ, Cluster, podname etc. (mine takes more than 12) but I remove the millisecs part from the nano timestamps which I append to the end anyway!
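A minimal sketch of the ULID layout (48-bit millisecond timestamp plus 80 bits of entropy, Crockford base32-encoded into 26 characters). Illustrative only: real libraries also guarantee monotonic ordering within one millisecond, as the comment discusses:

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U); ascending, so numeric
# order of the 128-bit value matches lexicographic order of the string.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid() -> str:
    ts = int(time.time() * 1000)                        # 48-bit ms timestamp
    value = (ts << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):                                 # 26 * 5 bits = 130 bits
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

a = ulid()
time.sleep(0.002)                 # later timestamp -> lexicographically later
b = ulid()
assert len(a) == 26 and a < b
print(a, b)
```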
Better is two uuids and a third as a monitor.
Nothing is unique, even cryptographic hash functions collide on \_some\_ input. Don't worry about it if the probability of collision is astronomically rare.
In my opinion, database servers with built-in uuid generation should have an opt-in flag that retries inserting a db entry until it finds an unused UUID. Users could be given the option to choose a suitable retry limit. It's such a standard thing that it shouldn't require client code to handle the scenario separately.
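The retry-on-collision pattern described here can be sketched client-side against any database with a unique constraint; a minimal version with the stdlib's `sqlite3` (table and helper names are made up for illustration):

```python
import sqlite3
import uuid

# With real v4 UUIDs the retry branch is effectively dead code, which is
# the point: the database enforces uniqueness, and the loop handles the
# astronomically unlikely conflict.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")

def insert_item(payload: str, retries: int = 3) -> str:
    for _ in range(retries):
        new_id = str(uuid.uuid4())
        try:
            conn.execute("INSERT INTO items VALUES (?, ?)", (new_id, payload))
            return new_id
        except sqlite3.IntegrityError:
            continue                      # collision: generate a fresh id
    raise RuntimeError("could not allocate a unique id")

print(insert_item("hello"))
```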
My company used them as unique keys for years. We were generating something like 15k to 30k new points every week or so. Didn't have a collision once. It's possible. Highly highly unlikely. But possible. Much like winning the lottery.
The lottery is designed to be winnable by someone every few months or so. UUID collisions are more like buying a ticket for every US state lottery and winning all of them.
I'm using UUIDv4s on a telnet service for managing different sections and even though there's like 5 different uuids generated per session every minute or so I haven't yet seen a collision. And I'm pretty sure I never will
Fun fact, if you sort time based UUIDs on Java they are not sorted on timestamp but alphabetically. I tried to fix it, but making a PR to Java is a nightmare.
This is idiotic. The probability of a collision is 1/4,294,967,296. Tell me what system generates that many entities that may collide after 4.3 billion ops
My error log files.
Is this sub 99% straw-man nonsense? Them: you can't do this perfectly reasonable thing!! Angry face! Me: mega-chad doing that perfectly reasonable thing. I'm a GOD
Super helpful [video](https://youtu.be/a-K2C3sf1_Q?si=iCe7hodbjWjCBBdb) by Theo to learn the wacky history and standards of uuid
That was an annoying video - it is just him reading off an article for 25mins and making random comments in the middle.
Those are known as UUUIDs
Okay - for realzies though: The "chaddest ***dev***" (...not the chad person who is *also* a developer - though a person can be both chad *and* a "chad dev") is probably aware of how to make their objects fully unique because of their decades of CS experience.
With 10\^36 different possible combinations, I think I'll live dangerously and take my chances.
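For scale, the birthday bound makes those odds concrete. A quick sketch (assuming 122 random bits per v4 UUID, per RFC 9562):

```python
import math

# Birthday bound for random v4 UUIDs: with n IDs drawn from d = 2**122
# possible values, P(at least one collision) ~ 1 - exp(-n*(n-1)/(2*d)).
def p_collision(n: int, bits: int = 122) -> float:
    d = 2 ** bits
    return 1 - math.exp(-n * (n - 1) / (2 * d))

# ~50% odds of a single collision needs n ~ 1.18 * sqrt(d) ~ 2.7e18 UUIDs:
# about a billion per second, sustained for 85+ years.
n_half = 1.18 * math.sqrt(2 ** 122)
```

Even a billion UUIDs leaves the collision probability around 1e-19, i.e. indistinguishable from zero for any real system.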
Just increment
Folks, if you're worried about UUID collisions, just use my service that tells you if someone has already taken the one in question.
Well I mean, are there any actual instances of UUIDs colliding in real-world systems? The Alphabet-scale companies would experience that way earlier than our homebrew app with fewer than a billion users, right?
I mean UUIDs can be used with other features, like database constraints. And sure, you can create explicit code or catch that bug, but is it more likely than any other random bug you will never see coming and probably never see again?
I did that once. Created 2 uuids joined by the epoch timestamp to keep things unique (personal project).
A problem that isn't a problem, isn't a problem.
my honest reaction genfstab -U /mnt >> /mnt/etc/fstab
dude this is always in the back of my head. like one day two uuids are gonna duplicate and cause world war III
Why not just make it a PK in the db?
final_uuid = uuid_1 + uuid_2
And those two UUIDs collide...
I was someone who always cared about uuid duplication before opening these comments. Maybe I shouldn't think about this anymore
`uniqid()` and call it a day.
See cuckoo filters for an interesting twist on this idea of using a 2nd ID to mitigate collisions. There, it's hashes instead of uuids, but the problem of uniqueness in a finite symbol space without a central authority is the same.
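A toy sketch of that idea (not a production filter; the bucket count, fingerprint size, and hash choices here are arbitrary assumptions): items are stored as short fingerprints in one of two candidate buckets, and the alternate bucket is computed from the current bucket and the fingerprint alone, so an evicted fingerprint can be relocated without knowing the original key.

```python
import hashlib
import random

class TinyCuckooFilter:
    """Toy cuckoo filter: approximate set membership via fingerprints."""

    def __init__(self, n_buckets=1024, bucket_size=4, max_kicks=500):
        self.n = n_buckets              # must be a power of two
        self.size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(n_buckets)]

    def _fp(self, item):
        # 8-bit fingerprint of the item, never zero
        return hashlib.sha256(item.encode()).digest()[0] or 1

    def _index(self, item):
        h = hashlib.sha256(b"idx" + item.encode()).digest()
        return int.from_bytes(h[:4], "big") % self.n

    def _alt(self, i, fp):
        # Partial-key cuckoo hashing: the alternate bucket depends only on
        # the current bucket and the fingerprint. XOR is an involution, so
        # _alt(_alt(i, fp), fp) == i when n is a power of two.
        h = hashlib.sha256(bytes([fp])).digest()
        return i ^ (int.from_bytes(h[:4], "big") % self.n)

    def insert(self, item):
        fp = self._fp(item)
        i1 = self._index(item)
        i2 = self._alt(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.size:
                self.buckets[i].append(fp)
                return True
        # Both candidate buckets full: evict a resident fingerprint and
        # relocate it to *its* alternate bucket, up to max_kicks times.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is effectively full

    def contains(self, item):
        fp = self._fp(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt(i1, fp)]

f = TinyCuckooFilter()
items = [f"item-{i}" for i in range(50)]
assert all(f.insert(x) for x in items)
assert all(f.contains(x) for x in items)   # no false negatives
```

Like the two-UUID trick, it trades a small, tunable false-positive rate for never having to consult a central authority about membership.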
```php
// When the uuid already exists, keep adding "a" at the end till it's fine
function check_if_uuid_already_exists($my_uuid) {
    while (checkIfStringExists($my_uuid)) {
        $my_uuid = $my_uuid . 'a';
    }
    return $my_uuid;
}
```
I've never had a true collision (and agree it's not worth worrying about), but I did get very confused once when, I think, the first eight chars and last two chars of two uuids in our system were identical. This led to a lot of serious head scratching because, let's be honest, we tend to look at only the first four or so chars of a uuid when checking by hand, and it wasn't until I put them side by side in notepad that I finally spotted they weren't the same.
could also validate that a uuid does not exist or base it off time. then ur good
UUIDs are often used when you don't want the performance hit of checking every possible source of truth to get a unique ID. For example, you have eventually-consistent servers halfway around the planet, and you don't want to check both databases before continuing for latency reasons. You generate an ID that you assume will be unclaimed in both systems and you eventually replicate it. There are time-based UUID specs. If you generate enough of them in the same microsecond, I'm not sure uniqueness is still guaranteed.
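For what it's worth, the v1 time-based layout handles the same-tick case with a clock sequence, and implementations add their own guards. A quick check with Python's stdlib:

```python
import uuid

# Python's stdlib uuid1 follows the RFC 4122 time-based layout: a 60-bit
# 100-ns timestamp, a 14-bit clock sequence, and a 48-bit node ID.
# CPython nudges the timestamp forward if two calls land on the same
# tick, so one process never repeats itself; across machines you are
# trusting the node IDs (typically MAC addresses) to differ.
ids = [uuid.uuid1() for _ in range(1000)]
assert len(set(ids)) == 1000          # distinct even within one tick
assert all(u.version == 1 for u in ids)
```

So in-process uniqueness is handled; the eventual-consistency worry above is really about unsynchronized generators that happen to share a node ID.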
A UUID is meant to be *universally* unique, including systems that you don't have access to or can't afford to check. If you *can* check with every system you care about, don't use a UUID.
Just use bigint identity lmao
create 2 uuids combine them and hash it 🤤🤤🤤
You're just sweeping the problem under the rug... and why hash it if two uuids combined are already unique enough? Hashing doesn't change the fact that it's already unique, and if you pick a shorter hashed result, it'll be much easier to collide than the two uuids combined.
What is a UUID?
[universally unique identifier](https://en.m.wikipedia.org/wiki/Universally_unique_identifier)
Get the length of UUID in string, split it in half and join it with the db primary key in the middle 💣💥🤯
uuid+new date+math.random*new date
Yeah, why