Sourcerer Implementation Notes

From Geohashing
Revision as of 13:31, 10 February 2015 by Sourcerer (Planing Campaign)

Plan of Campaign

My plan is to create some analysis tools for this wiki. These notes may help others implement their own tools.

I'm using a standard Windows PHP 5.3.5 installation and running the examples from the command line. PHP on Linux, or other PHP versions, should behave the same way.
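Before running the examples, it may help to confirm that the PHP installation can fetch URLs at all. The sketch below is not from the original notes, and `checkEnvironment()` is a hypothetical helper. It checks the two things the examples depend on: `file_get_contents()` can only open `http://` URLs when the `allow_url_fopen` setting is on, and `json_decode()` comes from the JSON extension, which has been bundled with PHP since 5.2.

```php
<?php
// Hypothetical preflight check for the command-line examples on this page.
// Returns one entry per requirement so the results can be printed or tested.
function checkEnvironment()
{
    return array(
        'php_version'     => PHP_VERSION,
        'allow_url_fopen' => (bool) ini_get('allow_url_fopen'), // needed to read http:// URLs
        'json'            => function_exists('json_decode'),    // needed for format=json
        'cli'             => (PHP_SAPI === 'cli'),              // true when run from the command line
    );
}

// Print one line per check, e.g. when run as "php check.php"
foreach (checkEnvironment() as $name => $value) {
    echo str_pad($name, 16), var_export($value, true), "\n";
}
```

If `allow_url_fopen` comes out as `false`, it has to be enabled in php.ini before the later examples can reach the wiki.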

  1. Get some page content ("Hello World!").
  2. Write some PHP command-line code to do the same, making sure it does not overload the wiki.
  3. Download lists of pages, including pages of different kinds.
  4. Do some simple statistics on the downloaded data.
  5. Create reports in wiki markup.
  6. If it works, upload the reports to the wiki.
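Steps 4 and 5 might look something like the following minimal sketch. The helper names `countTitlesByYear()` and `makeWikiTable()` are hypothetical, and the sketch assumes that expedition page titles begin with a date in YYYY-MM-DD form, as in "2015-02-07 52 1"; titles that do not match are skipped.

```php
<?php
// Step 4 (sketch): count titles per year, assuming they start with "YYYY-MM-DD"
function countTitlesByYear(array $titles)
{
    $counts = array();
    foreach ($titles as $title) {
        if (preg_match('/^(\d{4})-\d{2}-\d{2}/', $title, $m)) {
            $year = $m[1];
            $counts[$year] = isset($counts[$year]) ? $counts[$year] + 1 : 1;
        }
    }
    ksort($counts);  // oldest year first
    return $counts;
}

// Step 5 (sketch): format key => value pairs as a MediaWiki "wikitable"
function makeWikiTable($keyHeader, $valueHeader, array $rows)
{
    $lines   = array();
    $lines[] = '{| class="wikitable"';
    $lines[] = "! $keyHeader !! $valueHeader";   // header row
    foreach ($rows as $key => $value) {
        $lines[] = '|-';                          // start a new row
        $lines[] = "| $key || $value";
    }
    $lines[] = '|}';
    return implode("\n", $lines) . "\n";
}
```

For example, `echo makeWikiTable('Year', 'Expeditions', countTitlesByYear($titles));` would print a table ready to paste into a wiki page. The code sticks to PHP 5.3-compatible syntax.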

Various Downloads

Help Page

http://wiki.xkcd.com/wgh/api.php

JSON Page Content Dump

http://wiki.xkcd.com/wgh/api.php?format=json&action=query&titles=User:Sourcerer&prop=revisions&rvprop=content
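Typing parameters straight into a URL works for simple titles, but anything containing spaces, colons or "|" characters needs URL encoding. A small helper built on PHP's `http_build_query()` can take care of that; `buildApiUrl()` is a hypothetical name, not part of MediaWiki or PHP.

```php
<?php
// Hypothetical helper: build an api.php URL from an array of parameters.
// http_build_query() URL-encodes every value; the explicit '&' separator
// avoids surprises if arg_separator.output is set to something else in php.ini.
function buildApiUrl($apiUrl, array $params)
{
    return $apiUrl . '?' . http_build_query($params, '', '&');
}

// Usage: the page content dump request above
// echo buildApiUrl('http://wiki.xkcd.com/wgh/api.php', array(
//     'format' => 'json', 'action' => 'query', 'titles' => 'User:Sourcerer',
//     'prop'   => 'revisions', 'rvprop' => 'content'));
```

With the parameters of the URL above, this produces the same request with the colon in "User:Sourcerer" encoded as `%3A`.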

List the first twenty members of Category:Consecutive geohash achievement

http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20

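List the next twenty members of Category:Consecutive geohash achievement. The cmcontinue value is taken from the previous response; its "|" characters are URL-encoded as %7C.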

http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmcontinue=page%7C323030392d30332d31362035322030%7C17773&cmlimit=20
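The `cmcontinue` value in this URL is not random. Decoding it suggests three "|"-separated parts: a type ("page"), a hex-encoded sort key that decodes to "2009-03-16 52 0", and what looks like a page ID. The hypothetical helper below splits it that way to show what the API is doing. This layout is an observation, not a documented format, so scripts should simply pass the token back unchanged.

```php
<?php
// Hypothetical helper: split a cmcontinue token into its apparent parts.
// ASSUMPTION: "type|hex sort key|page id"; continuation tokens are meant to be
// passed back to the API as-is, so use this for curiosity, not for logic.
function describeCmcontinue($token)
{
    $parts = explode('|', $token);
    if (count($parts) !== 3) {
        return null;  // not in the expected three-part form
    }
    return array(
        'type'    => $parts[0],
        'sortkey' => pack('H*', $parts[1]),  // hex string -> raw bytes (works on PHP 5.3)
        'pageid'  => (int) $parts[2],
    );
}

// print_r(describeCmcontinue('page|323030392d30332d31362035322030|17773'));
```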

PHP Code - List the first twenty members of Consecutive geohash achievement (command-line application)

<?php
   // No format parameter is given, so the API returns its default human-readable HTML page
   $html = file_get_contents('http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20');
   echo $html;
?>
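`file_get_contents()` also accepts a stream context as its third argument. The sketch below, an addition rather than part of the original code, uses one to set a User-Agent header and a timeout. Some MediaWiki sites reject requests without a descriptive User-Agent, and PHP's HTTP wrapper normally sends none unless the `user_agent` setting is configured. `makeApiContext()` and the agent string are placeholders.

```php
<?php
// Hypothetical helper: a stream context that identifies the script and
// gives up on a stalled request after $timeoutSeconds.
function makeApiContext($userAgent, $timeoutSeconds = 30)
{
    return stream_context_create(array(
        'http' => array(
            'user_agent' => $userAgent,       // sent as the User-Agent header
            'timeout'    => $timeoutSeconds,  // read timeout in seconds
        ),
    ));
}

// Usage (makes a network request, so it is left commented out):
// $context = makeApiContext('SourcererStats/0.1 (User:Sourcerer)');
// $html    = file_get_contents('http://wiki.xkcd.com/wgh/api.php?action=query&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20', false, $context);
```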

PHP Code - As above, but using JSON

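<?php
  // format=json asks for machine-readable JSON instead of the default HTML output
  $json     = file_get_contents('http://wiki.xkcd.com/wgh/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Consecutive_geohash_achievement&cmlimit=20');
  $readable = json_decode($json, true);  // true: decode into associative arrays, not objects
  print_r($readable);
?>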
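With `json_decode($json, true)` the response becomes nested arrays, so the titles can be read directly from `['query']['categorymembers']`. A minimal sketch, using the hypothetical helper name `extractCategoryTitles()`, that also copes with an invalid or empty response:

```php
<?php
// Hypothetical helper: return just the page titles from a categorymembers
// JSON response, or an empty array if the JSON is invalid or has no members.
function extractCategoryTitles($json)
{
    $data = json_decode($json, true);
    if (!isset($data['query']['categorymembers'])) {
        return array();
    }
    $titles = array();
    foreach ($data['query']['categorymembers'] as $member) {
        $titles[] = $member['title'];  // each member also carries pageid and ns
    }
    return $titles;
}

// Usage: print_r(extractCategoryTitles($json));
```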

PHP Code - Get an expedition page as JSON

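<?php
  // titles= names one expedition page; prop=revisions&rvprop=content returns the wikitext of its latest revision
  $json     = file_get_contents('http://wiki.xkcd.com/wgh/api.php?format=json&action=query&titles=2015-02-07_52_1&prop=revisions&rvprop=content');
  $readable = json_decode($json, true);
  print_r($readable);
?>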
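The expedition response nests the page text fairly deeply: pages are keyed by page ID, and in the default (legacy) JSON format the wikitext of each revision sits under the `"*"` key. The sketch below assumes that layout; other API output versions arrange it differently. `extractWikitext()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: wikitext of the first page in a
// prop=revisions&rvprop=content response, or null if there is none.
// ASSUMPTION: legacy JSON layout, query.pages.<pageid>.revisions[0]["*"].
function extractWikitext($json)
{
    $data = json_decode($json, true);
    if (!isset($data['query']['pages'])) {
        return null;
    }
    foreach ($data['query']['pages'] as $page) {
        if (isset($page['revisions'][0]['*'])) {
            return $page['revisions'][0]['*'];
        }
    }
    return null;  // e.g. the page does not exist (key "-1", "missing" flag)
}

// Usage: echo extractWikitext($json);
```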

PHP Code - Get every page title in a specified category

This fetches 100 page titles at a time, pausing for 5 seconds between requests so as not to overload the wiki.

<?php
  // =======================================================================
  // === These constant values must be set before running the script =======
  // =======================================================================
  
  define("API_URL", 'http://wiki.xkcd.com/wgh/api.php');    // Search any wiki page source for "api.php" to find this path on other MediaWiki sites
  
  // =======================================================================
  // Return an array containing the cmcontinue value followed by page titles
  // Index [0] contains cmcontinue needed to get the next page
  // Index [1] up to [cmlimit] contains the page titles, could be zero items
  // =======================================================================
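  function getTitlesInCategory($cmtitle, $cmlimit = 10, $cmcontinue = "")
  {
    // http_build_query() URL-encodes every value, so category names with
    // spaces and cmcontinue tokens containing "|" are sent correctly
    $params = array(
      'action'  => 'query',
      'format'  => 'json',
      'list'    => 'categorymembers',
      'cmtitle' => $cmtitle,
      'cmlimit' => $cmlimit
    );
    if ($cmcontinue != "") { $params['cmcontinue'] = $cmcontinue; }

    $url         = API_URL . '?' . http_build_query($params, '', '&');
    $json        = file_get_contents($url);
    $decodedjson = json_decode($json, true);
    
    echo $url . "\n";  // progress report
    
    // Stop with an error instead of looping over nothing if the request failed
    if (!isset($decodedjson['query']['categorymembers']))
    {
      fwrite(STDERR, "Request failed or returned unexpected data: $url\n");
      exit(1);
    }

    $titles = array();
    
    if (isset($decodedjson['query-continue']['categorymembers']['cmcontinue']))
    {
      $titles[] = $decodedjson['query-continue']['categorymembers']['cmcontinue'];  // Next page (older MediaWiki format)
    }
    elseif (isset($decodedjson['continue']['cmcontinue']))
    {
      $titles[] = $decodedjson['continue']['cmcontinue'];  // Next page (format used by newer MediaWiki versions)
    }
    else
    {
      $titles[] = "";  // No more pages
    }

    foreach($decodedjson['query']['categorymembers'] as $value)
    {
      $titles[] = $value['title'];
    }
    
    return $titles;
  }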
  // =======================================================================

  // =======================================================================
  // Return an array containing ALL page titles for the category
  // =======================================================================
  function getAllTitlesInCategory($cmtitle)
  {
    $allTitles  = array();
    $cmcontinue = "";

    do
    {
      if ($cmcontinue != "")
      {
        sleep(5);  // pause between requests so the wiki is not overloaded
      }
      $titles     = getTitlesInCategory($cmtitle, 100, $cmcontinue);
      $cmcontinue = array_shift($titles);  // index [0] holds the cmcontinue value
      $allTitles  = array_merge($allTitles, $titles);
    }
    while ($cmcontinue != "");

    return $allTitles;
  }
  // =======================================================================
  
  // =======================================================================
  // main program
  // =======================================================================
  
  print_r(getAllTitlesInCategory("Category:Consecutive_geohash_achievement"));
?>
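The paging loop above is tied to the live wiki, which makes it awkward to test. One way to separate the two is to pass the request in as a callable. This is a sketch under an assumed batch format, `array('continue' => token, 'items' => array(...))`; `fetchAllPages()` and that format are hypothetical, not part of the code above.

```php
<?php
// Hypothetical generic pager: call $fetchBatch($token) until it reports no
// further continuation token, pausing between requests.
// Each batch is assumed to be array('continue' => string, 'items' => array).
function fetchAllPages($fetchBatch, $pauseSeconds = 5)
{
    $all   = array();
    $token = '';
    do {
        if ($token !== '' && $pauseSeconds > 0) {
            sleep($pauseSeconds);  // be gentle with the server between requests
        }
        $batch = call_user_func($fetchBatch, $token);
        $all   = array_merge($all, $batch['items']);
        $token = $batch['continue'];  // '' means there are no more batches
    } while ($token !== '');
    return $all;
}
```

A real fetcher could wrap `getTitlesInCategory()`, using `array_shift()` to move index [0] into `'continue'`. A fake one returning canned batches lets the loop be checked offline with a pause of 0.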