Sourcerer Implementation Notes

Plan of Campaign

My plan is to create some new analysis tools for this wiki. This page might help others to implement their tools.

I'm using a standard Windows 7 PHP installation and running the examples from the command line. PHP on Linux or other platforms should work the same way.

  1. Get some page content ("Hello World!").
  2. Write some code (perhaps PHP command-line code) to do the same, making sure it does not overload the wiki.
  3. Download different kinds of pages.
  4. Do some simple statistics on the downloaded data.
  5. Create reports in wiki markup (see the sketch after this list).
  6. If it works, upload the reports to the wiki.
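
For step 5, something along these lines could turn analysis results into a wiki-markup table. This is only a minimal sketch: the makeWikiTable() helper and the sample data are hypothetical, not part of any existing tool.

<?php
  // Minimal sketch for step 5: turn rows of data into a MediaWiki table.
  // makeWikiTable() and the sample rows below are hypothetical placeholders.
  function makeWikiTable($headers, $rows)
  {
    $markup = "{| class=\"wikitable\"\n";
    $markup .= "! " . implode(" !! ", $headers) . "\n";
    foreach ($rows as $row)
    {
      $markup .= "|-\n";
      $markup .= "| " . implode(" || ", $row) . "\n";
    }
    $markup .= "|}\n";
    return $markup;
  }

  echo makeWikiTable(array("User", "Expeditions"),
                     array(array("ExampleUser", "3"), array("AnotherUser", "1")));
?>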

Various Downloads

Help Page

http://wiki.xkcd.com/wgh/api.php

JSON Page Content Dump

http://wiki.xkcd.com/wgh/api.php?format=json&action=query&titles=User:Sourcerer&prop=revisions&rvprop=content

List first twenty of Consecutive geohash achievement

http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20

List twenty more of Consecutive geohash achievement

http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmcontinue=page%7C323030392d30332d31362035322030%7C17773&cmlimit=20

PHP Code - List first twenty of Consecutive geohash achievement - Command line application

<?php
   $html = file_get_contents('http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20');
   echo $html;
?>

PHP Code - As above but use JSON

<?php
  $json     = file_get_contents('http://wiki.xkcd.com/wgh/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20');
  $readable = json_decode($json, true);
  print_r($readable);
?>
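
Once decoded, the page titles sit under ['query']['categorymembers']; each member is an array with 'pageid', 'ns' and 'title' keys, as the larger script further down also assumes. A small follow-on sketch that prints just the titles, one per line:

<?php
  // Fetch the first twenty category members and print only their titles.
  $json     = file_get_contents('http://wiki.xkcd.com/wgh/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20');
  $readable = json_decode($json, true);

  foreach ($readable['query']['categorymembers'] as $member)
  {
    echo $member['title'] . "\n";
  }
?>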

PHP Expedition as JSON

<?php
  $json     = file_get_contents('http://wiki.xkcd.com/wgh/api.php?format=json&action=query&titles=2015-02-07_52_1&prop=revisions&rvprop=content');
  $readable = json_decode($json, true);
  print_r($readable);
?>
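
In the JSON this returns, the raw wikitext is nested under ['query']['pages'][pageid]['revisions'][0]['*'] (the layout used by the MediaWiki API of this era; newer format versions nest it differently). A sketch that digs the page source out without knowing the page id in advance:

<?php
  // Fetch one expedition page and extract its raw wikitext.
  $json    = file_get_contents('http://wiki.xkcd.com/wgh/api.php?format=json&action=query&titles=2015-02-07_52_1&prop=revisions&rvprop=content');
  $decoded = json_decode($json, true);

  // The page id is not known in advance, so walk the 'pages' entries.
  foreach ($decoded['query']['pages'] as $pageid => $page)
  {
    if (isset($page['revisions'][0]['*']))
    {
      echo $page['revisions'][0]['*'] . "\n";   // '*' holds the page source in this response format
    }
  }
?>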

PHP Get every page title for a specified category

This fetches 100 page titles at a time, pausing 5 seconds between requests to avoid putting too much load on the wiki.

<?php
  // =======================================================================
  // === These constant values must be set before running the script =======
  // =======================================================================
  
  define("API_URL", 'http://wiki.xkcd.com/wgh/api.php');    // Search any wiki page source for "api.php" to find this path on other MediaWiki sites
  
  // =======================================================================
  // Return an array containing the cmcontinue value followed by page titles
  // Index [0] contains the cmcontinue value needed to fetch the next batch
  // Indexes [1] up to [cmlimit] contain the page titles (there may be none)
  // =======================================================================
  function getTitlesInCategory($cmtitle, $cmlimit = 10, $cmcontinue = "")
  {
    // cmcontinue values can contain characters such as '|', so URL-encode them
    if ($cmcontinue == "") { $continue = ""; } else { $continue = "&cmcontinue=" . urlencode($cmcontinue); }

    $url         = API_URL . "?action=query&format=json&list=categorymembers&cmtitle=$cmtitle&cmlimit=$cmlimit$continue";
    $json        = file_get_contents($url);
    $decodedjson = json_decode($json, true);
    
    echo $url . "\n";
    
    $titles = array();
    
    if (isset($decodedjson['query-continue']))
    {
      $titles[] = $decodedjson['query-continue']['categorymembers']['cmcontinue'];  // Continuation token for the next batch, if there is one
    }
    else
    {
      $titles[] = "";
    }

    foreach($decodedjson['query']['categorymembers'] as $value)
    {
      $titles[] = $value['title'];
    }
    
    return $titles;
  }
  // =======================================================================

  // =======================================================================
  // Return an array containing ALL page titles for the category
  // =======================================================================
  function getAllTitlesInCategory($cmtitle)
  {
    $allTitles = array();
  
    $titles = getTitlesInCategory($cmtitle, 100);  
    // print_r($titles);  
    $cmcontinue = $titles[0];
    unset($titles[0]);
    $allTitles = array_merge($allTitles, $titles);
  
    while($cmcontinue != "")
    {
      sleep(5);
      $titles = getTitlesInCategory($cmtitle, 100, $cmcontinue);
      // print_r($titles);  
      $cmcontinue = $titles[0];
      unset($titles[0]);
      $allTitles = array_merge($allTitles, $titles);
    }
    
    return $allTitles;
  }
  // =======================================================================
  
  // =======================================================================
  // main program
  // =======================================================================
  
  print_r(getAllTitlesInCategory("Category:Consecutive_geohash_achievement"));
?>
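
As a step towards the "simple statistics" part of the plan, the title list returned by getAllTitlesInCategory() above can be summarised. Here is a sketch that counts category members per year, assuming the page titles begin with a YYYY-MM-DD date as expedition pages such as 2015-02-07 52 1 do; titles without a leading date are skipped.

<?php
  // Sketch: count titles per year. Assumes titles start with YYYY-MM-DD;
  // anything else is ignored. Intended to be run on the output of
  // getAllTitlesInCategory() from the script above.
  function countTitlesPerYear($titles)
  {
    $counts = array();
    foreach ($titles as $title)
    {
      if (preg_match('/^(\d{4})-\d{2}-\d{2}/', $title, $matches))
      {
        $year = $matches[1];
        if (!isset($counts[$year])) { $counts[$year] = 0; }
        $counts[$year]++;
      }
    }
    ksort($counts);
    return $counts;
  }

  // Example (uncomment when the functions above are in the same file):
  // print_r(countTitlesPerYear(getAllTitlesInCategory("Category:Consecutive_geohash_achievement")));
?>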