Nature of Man
… nothing that happens to Man is ever natural

16 January 2006, Monday

Astrology and Other Hash Functions for the Masses

Filed under: Uncategorized — Mordred @ 22:01

(A short explanation for noncoders and other muggles)
A hash function is a method for reducing an arbitrarily long piece of data to a smaller, fixed-size piece, with the added bonus that the smaller piece will be a sorta kinda unique representation of the long piece. Because of the assumed uniqueness, the hash value is often used as an identifier of the long piece of data. The uniqueness is not guaranteed, of course: you can’t shrink a book to one word and hope that no other book will shrink to that same word. But if you want to compare two books and their corresponding “hash words” are different, you can be absolutely sure that the books are different as well, and skip the lengthy word-for-word check. If two different books end up with the same hash value, we call it a collision. In cryptographic applications collisions can be a bad thing, while in other scenarios they are merely a nuisance (it depends on how you choose to decide whether two things are the same or not). This explanation is beginning to steer away from the purpose of the article, so it will stop HERE.
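For the coders in the audience, the book comparison can be sketched in a few lines of Python. This is only an illustration: SHA-256 from the standard `hashlib` module stands in for "a hash function", and the names `hash_word`, `definitely_different`, `book_one` and `book_two` are made up for the example.

```python
import hashlib


def hash_word(text: str) -> str:
    """Shrink a text of any length to a fixed-size 'hash word' (64 hex characters)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def definitely_different(book_a: str, book_b: str) -> bool:
    """True: the books certainly differ. False: only now is a word-for-word check needed."""
    return hash_word(book_a) != hash_word(book_b)


# Two made-up "books" of very different lengths
book_one = "It was the best of times. " * 1000
book_two = book_one + "The end."

# The hash word has the same size no matter how long the book is
print(len(hash_word(book_one)), len(hash_word(book_two)))

# Different hash words: the lengthy comparison can be skipped
print(definitely_different(book_one, book_two))

# Same hash word: the books are probably the same, but only probably
print(definitely_different(book_one, book_one))
```

Note the asymmetry: different hash words prove the books differ, while equal hash words prove nothing on their own.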

You may think that this is a highly technical concept which concerns only programmers and other such lowlife geeks, but like many specialized concepts it can be applied to Real Life ™ with interesting results.

In Bulgaria we have something called a Unified Citizen Number (UCN), which is a unique identifier of any person in the country. It is used to identify people in official contexts instead of, for example, their name (which can be changed, or can duplicate the name of another person). Apart from information about date and place of birth and gender, the number contains a “checksum”: a one-digit hash value appended to the 9 digits of data, which is used to make sure the whole number is intact and no digits have been altered (in reality, because the hash value is so short, if you picked random 10-digit numbers, about one in ten would have a valid checksum). But that’s not the most interesting part. The interesting part comes when the UCN itself fails to serve as an identifier (by virtue, or shall I say vice, of being too short, or just unsuitable for the purpose). I know of several quite severe problems arising from two different persons being assigned the same number (Bulgarians can google for “едно и също ЕГН”, “същото ЕГН” and similar strings and see for themselves. Don’t forget to use the quotes.), and of hundreds of possibilities for a light version of “identity theft” where, just by knowing people’s numbers and names, you can pull off many interesting tricks in the style of Gogol’s “Dead Souls”.
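The “one in ten” remark can be checked with a toy example. The sketch below does NOT use the real UCN algorithm (which this post does not describe); `toy_check_digit` is a hypothetical stand-in that simply adds up the nine data digits and keeps the last digit of the sum. The point holds for any one-digit checksum: for each 9-digit prefix, exactly one of the ten possible final digits is accepted.

```python
def toy_check_digit(data_digits: str) -> int:
    """Hypothetical checksum (not the real UCN one): sum of the data digits, modulo 10."""
    return sum(int(d) for d in data_digits) % 10


def is_valid(number: str) -> bool:
    """A 10-digit number passes when its last digit equals the checksum of the first nine."""
    if len(number) != 10 or not number.isdigit():
        return False
    return toy_check_digit(number[:9]) == int(number[9])


prefix = "123456789"  # made-up data digits, not a real person's number

# Try every possible final digit for this prefix
candidates = [prefix + str(last) for last in range(10)]
valid = [n for n in candidates if is_valid(n)]

print(valid)                    # the single candidate that passes
print(is_valid("2234567895"))   # one altered data digit is caught by this toy checksum
```

So a one-digit checksum catches a slipped digit, but it lets through a tenth of all random numbers, and it does nothing at all to prevent the same valid number from being handed out twice.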

On a side note, legislation forbidding the general public from having access to anyone’s personal data WAS passed, but anyone wise enough to grab a few sheets of voter lists some 2-3 years ago should have enough dead souls to serve him for a lifetime. And don’t tell anyone, but I chanced to obtain a list of all people (names, UCNs, place of birth) born in one entire region of Bulgaria, so imagine what a truly dedicated person could do.

A similar problem is found in barcode labels; see this classic +ORC cracking tutorial (ca. 1996, I think), which explains how to do it.

The best known, and probably the oldest, example of Real Life ™ hash functions is found in Astrology. From the simplest function, taking 365 (okay, or 366, you pedants) days as its input and returning 12 categories as its output, to the most complex divisions into zodiacal signs, houses, etc., based on elaborate calculations of birth minutes, trigonometry, the angle of the shadow cast by a three-foot-long pole exactly 13 weeks after the birth, and the direction vector of the migratory birds: all this is simply a hash function.
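The simplest of these functions fits in a dozen lines. The sketch below assumes the commonly quoted start dates for the twelve sun signs; the exact boundaries differ by a day or so between sources and years, so treat the `SIGN_STARTS` table as an approximation and the names `sun_sign`, `days` and `signs` as made up for the example.

```python
from datetime import date, timedelta

# (month, day) on which each sign conventionally starts; boundaries vary slightly by source
SIGN_STARTS = [
    ((1, 20), "Aquarius"), ((2, 19), "Pisces"), ((3, 21), "Aries"),
    ((4, 20), "Taurus"), ((5, 21), "Gemini"), ((6, 21), "Cancer"),
    ((7, 23), "Leo"), ((8, 23), "Virgo"), ((9, 23), "Libra"),
    ((10, 23), "Scorpio"), ((11, 22), "Sagittarius"), ((12, 22), "Capricorn"),
]


def sun_sign(birthday: date) -> str:
    """Hash a birthday (the year is ignored) into one of 12 categories."""
    key = (birthday.month, birthday.day)
    sign = "Capricorn"  # early January still belongs to the sign that began in December
    for start, name in SIGN_STARTS:
        if key >= start:
            sign = name  # keep the latest sign whose start date has already passed
    return sign


# Feed in every day of a leap year (okay, 366, you pedants) and count the outputs
days = [date(2004, 1, 1) + timedelta(days=i) for i in range(366)]
signs = {sun_sign(d) for d in days}

print(len(days), len(signs))
print(sun_sign(date(2006, 1, 16)))
```

With 366 possible inputs squeezed into 12 outputs, each category collects about thirty birthdays, which is a truly dreadful collision rate by any programmer's standards.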

The important feature of the astrological hash functions is not avoiding collisions. On the contrary, they WANT to have many people in the same category (so they can sell them ’special’ jewellery, etc.). No, the best feature is that the algorithm for calculating them is known to all the sca… practitioners, and whichever of them you visit for a second opinion (as a layman’s scientific test), he will arrive at the same predictions as the first one. Obviously, as the two haven’t met (or didn’t have time to, or live 1000 km apart), their ’science’ must be correct!

Notice the use of the word ’science’ above. After all, if a phenomenon is repeatable and we can predict it, then the prediction is scientifically valid. What is actually predictable here is the algorithm the astrologers use (the same function, given the same input data, will of course give you the same output!). The actual content of their predictions has nothing to do with science, or indeed, reality.

The same thing is observable in those idiotic SMS ’services’ I saw on TV (I have a glimpse at TV about once every three months, and they manage to appall me every time with something completely new). They work like this: you send them your name and the name of your heart’s desire, and they tell you how well you will match, with an accuracy of one percent! And it’s obviously not random, no sir, see: I will send it a second time, and they will predict the same!

Oh, puh-lease…
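For the record, here is how little it takes to build such a ’service’. This is a guess at the principle, not the code any real SMS provider runs: `love_match` is a hypothetical function that hashes the two names with SHA-256 from the standard `hashlib` module and squeezes the result into the range 0 to 100.

```python
import hashlib


def love_match(name_a: str, name_b: str) -> int:
    """Hypothetical 'compatibility' percentage (0-100), derived from nothing but the two names."""
    key = f"{name_a.strip().lower()}+{name_b.strip().lower()}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first four bytes as a number and fold it into 0..100
    return int.from_bytes(digest[:4], "big") % 101


# Made-up names; send the same SMS twice
first = love_match("Ivan", "Maria")
second = love_match("Ivan", "Maria")

# Same input, same function, same 'prediction'
print(first == second)
```

The repeatability that is supposed to impress you is a property of the function, not of your love life.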
