Difference between revisions of "Implementations/Libraries/Perl"

From Geohashing
imported>Relet
(Created page with "== Lazy Geohasher == {{30w compliant|partial=1}} If you'd rather just stay at home and wait for a geohash to come to you (or perhaps have some other reason to participate in [[Ge...")
 
imported>Relet
Line 1: Line 1:
 +
__TOC__
 +
 +
== Alternative Perl Implementation ==
 +
 +
Because there is no way this page is complete without a lesson on endianness...
 +
 +
In writing the above "lazy" script I needed to come up with a way to do the actual hashing in Perl. So, here it is pulled out from that, and using today's data from the web service:
 +
 +
{{30w compliant|yes=1}}
 +
<pre>
 +
#!/usr/bin/perl -w
 +
require 5.9.2;
 +
 +
use POSIX qw(strftime);
 +
use LWP::Simple;
 +
use Digest::MD5 qw(md5);
 +
 +
die "usage: $0 lat long" unless @ARGV == 2;
 +
my @graticule = map { int } @ARGV;
 +
 +
sub dow_open {
 +
    my $date = strftime("%Y/%m/%d", localtime(shift));
 +
    get("http://carabiner.peeron.com/xkcd/map/data/$date");
 +
}
 +
 +
sub geohash {
 +
    map { $_ / 2**64 } unpack("Q>Q>", md5(shift));
 +
}
 +
 +
my $today = strftime("%F", localtime);
 +
my $dow = dow_open(time - ($graticule[1] > -30 ? 86400 : 0));
 +
 +
for (map { substr($_, 1) } geohash("$today-$dow")) {
 +
    print shift @graticule, $_, "\n";
 +
}
 +
</pre>
 +
 +
This is the same bare output format as the shell script, for piping into your favorite formatter or google-launcher or whatever. The reason it requires Perl 5.10, however, is the code used to interpret the hash:
 +
 +
<pre>
 +
map { $_ / 2**64 } unpack("Q>Q>", md5(shift))
 +
</pre>
 +
 +
One unfortunate thing I have noticed across implementations is that a lot of effort has been spent converting hexadecimal into decimal. It is important to remember that MD5 hashes are 128 *bits* -- hexadecimal is just a form of representation. (It's very convenient, and we see it everywhere. But even in the original comic, it's representation.) In languages that provide the power to deal with data directly, the familiar hex representation can frequently be skipped in our quest to present numbers as decimal. This is what unpack does above (The Python implementation uses their flavor of unpack as well. I am sure Ruby has something similar).
 +
 +
I point this out only because Perl neglected to get this right until recently, and so the code as written will not run on 5.8 or earlier. As they are presented in the algorithm, the two 64-bit binary fractions (think: 1/2's place, 1/4's place, 1/8's place... with the caveat that we only see the numerator digits; the denominators are all 1!) from the 128 bits of MD5 are big-endian. Sadly, the machine running Perl might be big-endian or little-endian (think: the first 8 places at the end, then the next 8 places before that, etc... it's reversed by bytes, not bits). 5.10 provides the "<" and ">" endianness specifiers for all types, even 64-bit numbers. But on 5.8, attempting to directly grab the bits will always use the machine's native arrangement, and so only work if you're in big-endian-land. Most of us (running x86 CPUs) are not. Here is one (ugly) way to deal with it:
 +
 +
<pre>
 +
map { $_ / 2**64 } reverse unpack("QQ", reverse md5(shift));
 +
</pre>
 +
 +
Evaluation works backward from the right; by reversing the byte string (remember how it's backwards by bytes? the word that looks forward to us looks backward to them, too), we are able to unpack two 64-bit chunks that *would* be the right numbers, but are now (since the whole thing was reversed) in the wrong order, so we flip them around again. If they weren't the same sizes, we would have had to reverse the unpack specification as well (in fact, we did; you just couldn't tell). Fun, no? So if you need to run this on 5.8, patch this in and remember to delete the "require" line.
 +
 +
==All-Time Closest Geohash to Where You Lived At The Time==
 +
{{30w compliant|yes=1}}
 +
 +
This Perl script takes as its input a comma-separated text file giving the starting and ending dates and latitude and longitude of every place you've ever lived (or worked, or any other place you're interested in), and finds the closest hashpoint ever to come to any of those places while they were current for you.  See if you have a retroactive [[Couch Potato Geohash]] somewhere in your history!  (You're out of luck if you lived anywhere before 1928-10-01, though, since that's as far back as the data goes.)
 +
 +
<pre>
 +
#!/usr/bin/perl -w
 +
 +
# Program to find nearest historical geohash to place(s) you're interested
 +
# in, such as your homes or workplaces.  As it goes forward through the
 +
# dates it's checking for hashes in, it outputs each one that sets a new
 +
# record for closest yet to a place of interest.
 +
 +
use strict;
 +
use Geo::Hashing;
 +
 +
# set current date
 +
my $currdate = sprintf("%04d-%02d-%02d", (localtime)[5]+1900,
 +
                                        (localtime)[4]+1,
 +
                                        (localtime)[3]);
 +
 +
# Initialize extremes so they're later replaced by actual values
 +
my $mindate = $currdate;
 +
my $maxdate = '1928-10-01';
 +
 +
# Read in places and dates of interest.  File should have all the locations
 +
# you want to check for nearby past geohashes in, such as all the places
 +
# you've ever lived (or worked), with the date range for each.
 +
# Date ranges may overlap, so you can track multiple homes/workplaces at once.
 +
# Put in file places.csv in same directory as this script
 +
# Each line is in format: FromDate, ToDate, Latitude, Longitude, Title
 +
# Dates are in form YYYY-MM-DD
 +
# Lat/Lon are in decimal degrees, negative numbers for W / S
 +
# Empty FromDate means 'from beginning of time', 1928-10-01 in this case
 +
# Empty ToDate means 'until present', going up to and including current date
 +
 +
my $i = 0;
 +
my @fromdate = ();
 +
my @todate = ();
 +
my @lat = ();
 +
my @lon = ();
 +
my @title = ();
 +
open INFILE, "<places.csv" or die "You must create a places.csv file in the same directory as this script.
 +
This is intended to contain a list of every place you've lived
 +
(or worked, etc.), by date and location.
 +
Format: 1 line per place
 +
FromDate,ToDate,Latitude,Longitude,Title
 +
Dates in format YYYY-MM-DD
 +
Lat/Lon in decimal degrees
 +
Title is whatever text string you want to identify a place with.\n";
 +
while (<INFILE>)
 +
{
 +
  chomp;
 +
  ($fromdate[$i], $todate[$i], $lat[$i], $lon[$i], $title[$i]) = split(/\,/);
 +
  $fromdate[$i] = '1928-10-01' if !$fromdate[$i];
 +
  $todate[$i] = $currdate if !$todate[$i];
 +
 
 +
  $mindate = $fromdate[$i] if $fromdate[$i] lt $mindate;
 +
  $maxdate = $todate[$i] if $todate[$i] gt $maxdate;
 +
 
 +
  $i++;
 +
}
 +
close INFILE;
 +
 +
my $latmiles = 69.1703234283616; # Conversion factor of lat degrees to miles; always constant
 +
 +
my @MonthLen = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
 +
 +
my $date = $mindate;
 +
my ($yr, $mo, $dy) = split(/-/, $date);
 +
$MonthLen[2] = 29 if &is_leap($yr);
 +
 +
my $mindist = 9999; # Initialize min distance to high number
 +
 +
while ($date le $maxdate)
 +
{
 +
  for ($i = 0; $i<=$#fromdate; $i++)
 +
  {
 +
    if ($date ge $fromdate[$i] && $date le $todate[$i])
 +
    {
 +
      my ($lat, $lon) = (int($lat[$i]), int($lon[$i])); # Integer lat/lon for hashpoint lookup
 +
   
 +
      my $geo = new Geo::Hashing(lat => $lat, lon => $lon, date => $date); # Get hashpoint for date
 +
     
 +
      my $latdiff = abs($geo->lat - $lat[$i]); # Find absolute distance in each axis direction
 +
      my $londiff = abs($geo->lon - $lon[$i]);
 +
     
 +
      my $lonmiles = 69.1703234283616 * cos($lat[$i]*0.0174532925199433); # Conversion factor of lon degrees to miles
 +
      my $latdiffmiles = $latdiff * $latmiles; # Convert to miles
 +
      my $londiffmiles = $londiff * $lonmiles;
 +
     
 +
      my $distance = sqrt($latdiffmiles**2 + $londiffmiles**2); # Find hypotenuse
 +
     
 +
      if ($distance < $mindist) # Found new shortest distance so far
 +
      {
 +
        $mindist = $distance;
 +
        printf "$date: %0.6f, %0.6f -- %0.4f miles from $title[$i]\n", $geo->lat, $geo->lon, $distance;
 +
      }
 +
    }
 +
  }
 +
 
 +
  # Advance date
 +
  $dy++;
 +
  if ($dy > $MonthLen[$mo])
 +
  {
 +
    $dy = 1;
 +
    $mo++;
 +
    if ($mo > 12)
 +
    {
 +
      $mo = 1;
 +
      $yr++;
 +
      if (&is_leap($yr))
 +
      {
 +
        $MonthLen[2] = 29;
 +
      }
 +
      else
 +
      {
 +
        $MonthLen[2] = 28;
 +
      }
 +
    }
 +
  }
 +
 
 +
  $date = sprintf("%04d-%02d-%02d", $yr, $mo, $dy);
 +
}
 +
 +
print "Finished searching.\n";
 +
 +
sub is_leap {
 +
  # Returns 1 if leap year, 0 if not
 +
  my($yr) = @_;
 +
 
 +
  if ($yr%4)
 +
  {
 +
    0;
 +
  }
 +
  elsif (!($yr%400))
 +
  {
 +
    1;
 +
  }
 +
  elsif ($yr%100)
 +
  {
 +
    1;
 +
  }
 +
  else
 +
  {
 +
    0;
 +
  }
 +
}
 +
</pre>
 +
 +
Sample places.csv file:
 +
 +
<pre>
 +
1982-08-26,1983-05-10,40.441164,-79.939020,Freshman Dorm
 +
1983-08-25,1986-05-12,40.452789,-79.948439,Off-Campus Living
 +
2000-02-03,,26.340754,-80.129356,Apartment
 +
</pre>
 +
 
== Lazy Geohasher ==
 
== Lazy Geohasher ==
 
{{30w compliant|partial=1}}
 
{{30w compliant|partial=1}}

Revision as of 07:52, 10 April 2012

Alternative Perl Implementation

Because there is no way this page is complete without a lesson on endianness...

In writing the above "lazy" script I needed to come up with a way to do the actual hashing in Perl. So, here it is pulled out from that, and using today's data from the web service:

This implementation IS FULLY 30W-compliant.
#!/usr/bin/perl -w
require 5.9.2;

use POSIX qw(strftime);
use LWP::Simple;
use Digest::MD5 qw(md5);

die "usage: $0 lat long" unless @ARGV == 2;
my @graticule = map { int } @ARGV;

sub dow_open {
    my $date = strftime("%Y/%m/%d", localtime(shift));
    get("http://carabiner.peeron.com/xkcd/map/data/$date");
}

sub geohash {
    map { $_ / 2**64 } unpack("Q>Q>", md5(shift));
}

my $today = strftime("%F", localtime);
my $dow = dow_open(time - ($graticule[1] > -30 ? 86400 : 0));

for (map { substr($_, 1) } geohash("$today-$dow")) {
    print shift @graticule, $_, "\n";
}

This is the same bare output format as the shell script, for piping into your favorite formatter or google-launcher or whatever. The reason it requires Perl 5.10, however, is the code used to interpret the hash:

map { $_ / 2**64 } unpack("Q>Q>", md5(shift))

One unfortunate thing I have noticed across implementations is that a lot of effort has been spent converting hexadecimal into decimal. It is important to remember that MD5 hashes are 128 *bits* -- hexadecimal is just a form of representation. (It's very convenient, and we see it everywhere. But even in the original comic, it's representation.) In languages that provide the power to deal with data directly, the familiar hex representation can frequently be skipped in our quest to present numbers as decimal. This is what unpack does above (The Python implementation uses their flavor of unpack as well. I am sure Ruby has something similar).

I point this out only because Perl neglected to get this right until recently, and so the code as written will not run on 5.8 or earlier. As they are presented in the algorithm, the two 64-bit binary fractions (think: 1/2's place, 1/4's place, 1/8's place... with the caveat that we only see the numerator digits; the denominators are all 1!) from the 128 bits of MD5 are big-endian. Sadly, the machine running Perl might be big-endian or little-endian (think: the first 8 places at the end, then the next 8 places before that, etc... it's reversed by bytes, not bits). 5.10 provides the "<" and ">" endianness specifiers for all types, even 64-bit numbers. But on 5.8, attempting to directly grab the bits will always use the machine's native arrangement, and so only work if you're in big-endian-land. Most of us (running x86 CPUs) are not. Here is one (ugly) way to deal with it:

map { $_ / 2**64 } reverse unpack("QQ", reverse md5(shift));

Evaluation works backward from the right; by reversing the byte string (remember how it's backwards by bytes? the word that looks forward to us looks backward to them, too), we are able to unpack two 64-bit chunks that *would* be the right numbers, but are now (since the whole thing was reversed) in the wrong order, so we flip them around again. If they weren't the same sizes, we would have had to reverse the unpack specification as well (in fact, we did; you just couldn't tell). Fun, no? So if you need to run this on 5.8, patch this in and remember to delete the "require" line.

All-Time Closest Geohash to Where You Lived At The Time

This implementation IS FULLY 30W-compliant.

This Perl script takes as its input a comma-separated text file giving the starting and ending dates and latitude and longitude of every place you've ever lived (or worked, or any other place you're interested in), and finds the closest hashpoint ever to come to any of those places while they were current for you. See if you have a retroactive Couch Potato Geohash somewhere in your history! (You're out of luck if you lived anywhere before 1928-10-01, though, since that's as far back as the data goes.)

#!/usr/bin/perl -w

# Program to find nearest historical geohash to place(s) you're interested
# in, such as your homes or workplaces.  As it goes forward through the
# dates it's checking for hashes in, it outputs each one that sets a new
# record for closest yet to a place of interest.

use strict;
use Geo::Hashing;

# set current date
my $currdate = sprintf("%04d-%02d-%02d", (localtime)[5]+1900,
                                         (localtime)[4]+1,
                                         (localtime)[3]);

# Initialize extremes so they're later replaced by actual values
my $mindate = $currdate;
my $maxdate = '1928-10-01';

# Read in places and dates of interest.  File should have all the locations
# you want to check for nearby past geohashes in, such as all the places
# you've ever lived (or worked), with the date range for each.
# Date ranges may overlap, so you can track multiple homes/workplaces at once.
# Put in file places.csv in same directory as this script
# Each line is in format: FromDate, ToDate, Latitude, Longitude, Title
# Dates are in form YYYY-MM-DD
# Lat/Lon are in decimal degrees, negative numbers for W / S
# Empty FromDate means 'from beginning of time', 1928-10-01 in this case
# Empty ToDate means 'until present', going up to and including current date

my $i = 0;
my @fromdate = ();
my @todate = ();
my @lat = ();
my @lon = ();
my @title = ();
open INFILE, "<places.csv" or die "You must create a places.csv file in the same directory as this script.
This is intended to contain a list of every place you've lived
(or worked, etc.), by date and location.
Format: 1 line per place
FromDate,ToDate,Latitude,Longitude,Title
Dates in format YYYY-MM-DD
Lat/Lon in decimal degrees
Title is whatever text string you want to identify a place with.\n";
while (<INFILE>)
{
  chomp;
  ($fromdate[$i], $todate[$i], $lat[$i], $lon[$i], $title[$i]) = split(/\,/);
  $fromdate[$i] = '1928-10-01' if !$fromdate[$i];
  $todate[$i] = $currdate if !$todate[$i];
  
  $mindate = $fromdate[$i] if $fromdate[$i] lt $mindate;
  $maxdate = $todate[$i] if $todate[$i] gt $maxdate;
  
  $i++;
}
close INFILE;

my $latmiles = 69.1703234283616; # Conversion factor of lat degrees to miles; always constant

my @MonthLen = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

my $date = $mindate;
my ($yr, $mo, $dy) = split(/-/, $date);
$MonthLen[2] = 29 if &is_leap($yr);

my $mindist = 9999; # Initialize min distance to high number

while ($date le $maxdate)
{
  for ($i = 0; $i<=$#fromdate; $i++)
  {
    if ($date ge $fromdate[$i] && $date le $todate[$i])
    {
      my ($lat, $lon) = (int($lat[$i]), int($lon[$i])); # Integer lat/lon for hashpoint lookup
    
      my $geo = new Geo::Hashing(lat => $lat, lon => $lon, date => $date); # Get hashpoint for date
      
      my $latdiff = abs($geo->lat - $lat[$i]); # Find absolute distance in each axis direction
      my $londiff = abs($geo->lon - $lon[$i]);
      
      my $lonmiles = 69.1703234283616 * cos($lat[$i]*0.0174532925199433); # Conversion factor of lon degrees to miles
      my $latdiffmiles = $latdiff * $latmiles; # Convert to miles
      my $londiffmiles = $londiff * $lonmiles;
      
      my $distance = sqrt($latdiffmiles**2 + $londiffmiles**2); # Find hypotenuse
      
      if ($distance < $mindist) # Found new shortest distance so far
      {
        $mindist = $distance;
        printf "$date: %0.6f, %0.6f -- %0.4f miles from $title[$i]\n", $geo->lat, $geo->lon, $distance;
      }
    }
  }
  
  # Advance date
  $dy++;
  if ($dy > $MonthLen[$mo])
  {
    $dy = 1;
    $mo++;
    if ($mo > 12)
    {
      $mo = 1;
      $yr++;
      if (&is_leap($yr))
      {
        $MonthLen[2] = 29;
      }
      else
      {
        $MonthLen[2] = 28;
      }
    }
  }
  
  $date = sprintf("%04d-%02d-%02d", $yr, $mo, $dy);
}

print "Finished searching.\n";

sub is_leap {
  # Returns 1 if leap year, 0 if not
  my($yr) = @_;
  
  if ($yr%4)
  {
    0;
  }
  elsif (!($yr%400))
  {
    1;
  }
  elsif ($yr%100)
  {
    1;
  }
  else
  {
    0;
  }
}

Sample places.csv file:

1982-08-26,1983-05-10,40.441164,-79.939020,Freshman Dorm
1983-08-25,1986-05-12,40.452789,-79.948439,Off-Campus Living
2000-02-03,,26.340754,-80.129356,Apartment

Lazy Geohasher

This implementation IS PARTIALLY 30W-compliant.

If you'd rather just stay at home and wait for a geohash to come to you (or perhaps have some other reason to participate in Geohash Hacking), there's a perl script which will search for combinations of times and DOW Jones opening prices to find when you could win the much coveted Couch Potato Geohash or Cubicle Geohash award. Just put in the ranges of values for the stock market and the time you want to wait for, and watch those CPU cycles fly! -- thaniel.drake@gmail.com.

30w compliance statement: This code does not use live data. Users are expected to determine from the date of the Geohash which opening value is relevant for their locale.

Disclaimer: The author humbly requests that you do not manipulate the Dow Jones Industrial Average to ensure that a geohash collides with your address. The author accepts no responsibility for any resulting financial instability you cause by such manipulations.

Here is an improved implementation which sorts results by great-circle distance. Distance cutoff, length of search, and dollar spread searched may be adjusted at the command line. Days may be a positive integer count of days following today or a comma-separated list of %F datespecs. Same 30W caveats apply.

This implementation IS PARTIALLY 30W-compliant.
#!/usr/bin/perl -w
require 5.9.2;

use Digest::MD5 qw(md5);
use POSIX qw(strftime);
use Math::Trig qw(deg2rad great_circle_distance);

die "usage: $0 lat lon orig-dow [spread] [days] [max-km]" unless @ARGV >= 3;
my ($lat, $lon, $orig, $spread, $days, $max_km) = @ARGV;
my ($g_lat, $g_lon) = map { int } $lat, $lon;
$spread ||= 500; $days ||= 7; $max_km ||= 1;

sub geohash { map { $_ / 2**64 } unpack("Q>Q>", md5(shift)) }
sub genday {
    my $date = shift;
    for (($orig - $spread) * 100 .. ($orig + $spread) * 100) {
        my $dow = sprintf("%.2f", $_ / 100);
        ($f_lat, $f_lon) = map { substr($_, 1) } geohash("$date-$dow");
        $gh{"$date-$dow"} = [$g_lat . $f_lat, $g_lon . $f_lon];
    }
}

if ($days =~ /^\d{4,}-\d{2}-\d{2}/) { genday($_) for split(/,/, $days) }
else { genday(strftime("%F", localtime(time + 86400 * $_))) for (1...$days) }

sub d2r { deg2rad($_[0]), deg2rad(90 - $_[1]) }
sub gdist { great_circle_distance(d2r($lat, $lon), d2r(@{$_[0]}), 6378) }

printf "%s -> %f,%f (%.3fkm)\n", $_->[0], @{$gh{$_->[0]}}, $_->[1] for
    sort { $a->[1] <=> $b->[1] }
        grep { $_->[1] < $max_km }
            map { [$_, gdist($gh{$_})] } keys %gh;

For discussion of the actual algorithm stuff hiding in there, please see the next section.