The Algorithm
This time, we did invent the algorithm!
You don't have to understand this to be able to geohash.
- Strings of the current date (in yyyy-mm-dd format, which should be the only date format you use anyway) and the daily opening price of the Dow Jones Industrial Average (as quoted at finance.google.com) are concatenated, with a hyphen separating the two.
- West of -30° longitude: If there is no opening price for the Dow on the desired day, the opening price from the most recent trading day is used instead. So Saturday and Sunday both use Friday's opening price.
- East of -30° longitude: Same as west, except that the opening price from the most recent trading day before the desired day is always used, even if a new one becomes available later in the day in your time zone. That is, Thursday uses Wednesday's open, Friday uses Thursday's open, and Saturday through Monday all use Friday's opening price. (See 30W Time Zone Rule.)
- When there's a Dow holiday, the most recent trading day's opening price is used.
- The resulting string is then fed through the well-documented MD5 cryptographic algorithm to generate a pseudo-random (yet easily verifiable) "hash" of 32 hexadecimal digits.
- The "hash" is then split into two halves of 16 hexadecimal digits each.
- Each half of the "hash" is prepended with a decimal point (so as to represent a hexadecimal fraction) and is converted to a base-10 fraction.
- The resulting decimal fractions are appended to the integral (lat,lon) values of any given graticule to produce that graticule's geohash target for the day.
Specification
This is a more precise specification for anyone attempting to implement the algorithm.
The process to calculate the geohash coordinates for a given date "GD" and graticule "[Lat,Lon]" is as follows.
Note: A graticule consists of an integer latitude (in the range -90 to -0, and +0 to +90) and longitude (in the range -179 to -0 and +0 to +179). [-0,5] and [0,5] are separate graticules with different geohashes, for example.
- Determine the applicable Dow Jones opening date "DJOD" : (see 30W Time Zone Rule)
- If the longitude is between -179 and -30 inclusive, use the latest date up to and including GD on which a Dow Jones opening price has been or will be published. (On weekends and Dow holidays, DJOD will be earlier than GD.)
- Otherwise (if the longitude is between -29 and +179 inclusive), use the latest date up to and including one day before GD on which a Dow Jones opening price has been or will be published. (DJOD will always be at least one day earlier than GD.)
- Obtain the opening price of the Dow Jones Industrial Average for the DJOD. This is usually available from 9:30 am New York time, and is published to two decimal places.
- Form a string by concatenating GD (in YYYY-MM-DD format), a hyphen "-", and the applicable opening price. For example: "2005-05-26-10458.68"
- Pass this string through the MD5 cryptographic algorithm to generate an MD5 hash of 32 hexadecimal digits.
- Split the hash into two halves of 16 hexadecimal digits each.
- Prepend a decimal point before each half, forming a hexadecimal number between 0 and 1. (Example: 0.db9318c2259923d0)
- Convert each half to decimal. (Example: 0.857713267707002344)
- Append the first decimal number formed, without the leading 0, to the graticule's latitude to form the geohash latitude. (Note this is a string operation, not an addition: appending .8577 to latitude -1 yields -1.8577, not -0.1423.)
- Similarly, append the second decimal number formed to the graticule's longitude to form the geohash longitude.
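The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `geohash`, its parameters, and the choice to truncate to six decimal places are assumptions made for the example. The graticule is passed as strings so that "-0" and "0" stay distinct, and the opening price is passed exactly as published.

```python
import hashlib

def geohash(date_str, dow_open, lat, lon, places=6):
    """Return (latitude, longitude) strings for one graticule.

    date_str: GD as 'YYYY-MM-DD'; dow_open: opening price for the DJOD,
    e.g. '10458.68'; lat, lon: graticule as strings such as '37', '-122', '-0'.
    (All names here are illustrative, not taken from the specification.)
    """
    digest = hashlib.md5(f"{date_str}-{dow_open}".encode("ascii")).hexdigest()
    parts = []
    for half in (digest[:16], digest[16:]):
        # 0.<16 hex digits> equals int(half, 16) / 16**16; integer arithmetic
        # truncates to `places` decimal digits without floating-point rounding.
        digits = int(half, 16) * 10**places // 16**16
        parts.append("." + str(digits).zfill(places))
    # String append, not addition: '-122' + '.5' gives '-122.5'.
    return lat + parts[0], lon + parts[1]

lat, lon = geohash("2005-05-26", "10458.68", "37", "-122")
print(lat, lon)
```

With the example string from the specification, the latitude comes out as 37.857713, which matches the decimal fraction quoted above.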
Calculational Aids
- Online md5() generator
- Fractional hexadecimal-to-decimal calculation
- Dow source
- As of 2022, it is no longer trivial to pagescrape from Google Finance. For those wishing to capture the opening price directly from finance.google.com, you will need to parse the arrays from the AF_initDataCallback blocks.
- Or, you could use one of the other sources listed at Dow Jones Industrial Average.
- The Dow Jones opening time is Monday to Friday (except Dow holidays) 09:30:00-05:00 or 09:30:00-04:00 (EST/EDT; reference city in this timezone: New York).
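Choosing the DJOD from the specification can be sketched as below. This is only a sketch under stated assumptions: the function name `dow_opening_date` is made up for the example, and Dow holidays are not computed but must be supplied by the caller as a set of dates.

```python
from datetime import date, timedelta

def dow_opening_date(gd, longitude, holidays=frozenset()):
    """Return the Dow Jones opening date (DJOD) for date gd and an integer longitude.

    holidays is a caller-supplied set of Dow holidays (an assumption of this
    sketch). Graticule -0 can be passed as 0, since both fall under the east rule.
    """
    # West of -30 (inclusive): start at GD. Otherwise: start one day before GD.
    day = gd if longitude <= -30 else gd - timedelta(days=1)
    # Step back over weekends (Saturday=5, Sunday=6) and holidays.
    while day.weekday() >= 5 or day in holidays:
        day -= timedelta(days=1)
    return day

# Thursday 2008-05-22: west uses Thursday's open, east uses Wednesday's.
print(dow_opening_date(date(2008, 5, 22), -122))
print(dow_opening_date(date(2008, 5, 22), 10))
```

Because the Dow only opens at 09:30 New York time, the price for the returned date may not be published yet when the function is called.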
Quirks
There are a number of results of the application of the algorithm that may not be immediately apparent, but which make for interesting thought experiments and even more interesting achievements under increasingly bizarre or adventurous scenarios.
- On weekends and Dow holidays, all graticules around the globe share the same base coordinates (although mirrored in the different hemispheres). On other days, the coordinates east of -30 longitude are completely different from those west.
- The coordinates at the prime meridian are mirrored instead of following the grid pattern. This means that a low longitude value (say, .001) could yield two hash points very close together on opposite sides of 0°. At the prime meridian, once every 111 days or so there will be two hashes within 1km of each other due to this phenomenon. The equator displays the same phenomenon, and the intersection of the two can put 4 hashes in close proximity to each other, unfortunately in the Atlantic Ocean.
- The 180th meridian behaves similarly (mirroring) on weekends and Dow holidays. Otherwise, the points either side of the line are independently random, and may be arbitrarily close together or up to two graticule-diagonal-distances apart, depending on the two random coordinates involved. These also interact with the equator as above.
- The line of -30 longitude has no special properties on weekends or Dow holidays. Otherwise, the points on one side bear no relationship to the points on the other, as for the antimeridian.
- However, the interaction of the 180th meridian with the international date line leads back into interesting territory. Where they diverge, up to three hash points can be found arbitrarily close together, even when not near the equator. Standing just west of 180E, just east of the date line, you might be on the 179E hash for "today", with the 179E hash for "tomorrow" just west of you, and the 179W hash for "today" just east of you. Getting three different random coordinates to come close to each other is an order of magnitude less likely than the previous case.
- The date line can also produce a single graticule with two (or zero) hash points at the same time, depending on which side of the line the hash points for each of the two days fall on. This can happen anywhere the line divides a graticule.
- Graticules are mostly square at the equator, while at the poles they are roughly 2km wide by 111km tall triangles. If you manage to visit either pole, which itself surely proves that you are One with Nature, you could conceivably visit a large number of hash points in a short period of time. Due to the 30W rule the hash points will be split into two groups, comprising uneven halves of two different sized 360gons. At worst you will have to travel 700km to visit all 360 hash points, at best they could be arbitrarily close together (July 28 2008 exhibited .985 and .855 offsets, putting 150 hash points along a mere 4.4km trip near the north pole, and the other 210 along a larger 59km arc just 14.4km away. All 360 points would have required just 78km of travel to reach!).
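The polar figures quoted for July 28 2008 can be roughly checked with a little arithmetic. The sketch below assumes a flat circle around the pole and the 111 km per degree figure used above. The helper name `polar_arc_km` is made up for the example. There are 150 graticules from -179 to -30, and 210 from -29 to +179 (counting -0 and +0 separately).

```python
import math

KM_PER_DEGREE = 111.0  # rough length of one degree of latitude, as used in the text

def polar_arc_km(lat_fraction, n_points):
    """Approximate arc length through n_points hash points spaced 1 degree apart."""
    # Distance of the hash points from the pole, e.g. (1 - 0.985) degrees.
    radius_km = (1.0 - lat_fraction) * KM_PER_DEGREE
    # n points are separated by n - 1 one-degree gaps.
    span_degrees = n_points - 1
    return 2 * math.pi * radius_km * span_degrees / 360.0

west = polar_arc_km(0.985, 150)  # graticules west of -30
east = polar_arc_km(0.855, 210)  # graticules east of -30
print(round(west, 1), round(east, 1))
```

The results come out close to the 4.4km and 59km quoted above. The small differences depend on whether the arc is counted as n or n - 1 degrees.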
Q: I tried to use the md5sum from my unix/linux command line and the hash was not the same as the comic. What is the correct command line?
- A: You probably forgot the “-n” (echo without a newline). Try this to match the example:
echo -n 2005-05-26-10458.68 | md5sum
- The md5sum command may be called md5 instead.
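The effect of the stray newline can also be seen without a shell. The following sketch uses Python's standard hashlib module to hash the example string with and without a trailing newline. The variable names are illustrative only.

```python
import hashlib

text = "2005-05-26-10458.68"

# What `echo -n` feeds to md5sum: the string alone.
without_newline = hashlib.md5(text.encode("ascii")).hexdigest()
# What plain `echo` feeds to md5sum: the string plus a newline.
with_newline = hashlib.md5((text + "\n").encode("ascii")).hexdigest()

print(without_newline)
print(with_newline)
```

Only the first value begins with db9318c2259923d0, the half used in the specification's example.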
Implementations
If you don't want to do the calculations yourself, you don't have to. A full list of reference and practical implementations can be found on the Implementations page.
Known Issues
Several known issues are presented at Known Issues. There is an ongoing discussion on these and other issues at Talk:Main Page.
Randomness
Some people have questioned the algorithm's randomness. To analyze this issue, a tiny Python program was written that uses The Algorithm to fire at a window's canvas, marking pixels red. In the resulting image, the background is black and a pixel becomes brighter the more often it is hit. If you can see any regularity or a pattern, please watch A Beautiful Mind and check yourself into an appropriate facility.
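A similar check can be sketched without a canvas by counting hits in a coarse grid. This is not the original program. The date and the 10,000 opening prices below are made-up inputs chosen only to exercise the hash, and the 10 by 10 grid size is arbitrary.

```python
import hashlib

GRID = 10  # arbitrary grid resolution for this sketch

counts = [[0] * GRID for _ in range(GRID)]
for cents in range(10_000):
    # Hypothetical prices 10000.00 .. 10099.99, not real Dow data.
    price = f"{10000 + cents // 100}.{cents % 100:02d}"
    digest = hashlib.md5(f"2008-01-01-{price}".encode("ascii")).hexdigest()
    # Map each 16-digit half to a grid cell using integer arithmetic.
    row = int(digest[:16], 16) * GRID // 16**16
    col = int(digest[16:], 16) * GRID // 16**16
    counts[row][col] += 1

flat = [c for row_counts in counts for c in row_counts]
print(min(flat), max(flat))  # each cell should hold roughly 100 hits
```

If the fractions are spread evenly, every cell should end up near 10,000 / 100 = 100 hits, with only modest variation between cells.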