WikiSwarmPerl Redux

November 15, 2017

I’m revisiting some old projects, and I’ve updated my WikiSwarmPerl script (from this post) to work with the latest MediaWiki API and rictic’s fork of codeswarm.

I ran it on the revision history of the Donald Trump article, and it seems to work. Code below.

#!/usr/bin/perl
# by Jay Savage
# 24 Sep 2010 (9 Nov 2017)
# This code is distributed under the same license as Perl itself:
# http://dev.perl.org/licenses/artistic.html

use warnings;
use strict;

use XML::Simple;
use LWP::UserAgent;
use Time::Piece;
use URI::Escape;

my $more    = 0;
my $offset  = undef;
my $rvlimit = 500;
my $page_name     = "Donald_Trump";    # <-- The Page
my $page          = uri_escape_utf8($page_name);
my $stop          = undef;
my $accumulator   = [];
my $wikipedia_api = qq{http://en.wikipedia.org/w/api.php};

my $ua = LWP::UserAgent->new();
$ua->agent('WikiSwarm-Perl/0.1 (daggerquill@gmail.com); ');

# print $ua->agent ."\n";

until ($stop) {
    my $wikipedia_query = $wikipedia_api
        . qq{?action=query&prop=revisions&titles=$page&rvprop=timestamp|user|size&rvlimit=$rvlimit&format=xml};
    if ($offset) {
        #my $rvstart = "&rvstartid=$offset";
        my $rvstart = "&rvcontinue=$offset";
        $wikipedia_query .= $rvstart;
    }
    my $result = $ua->get($wikipedia_query);
    # bail out instead of feeding an error page to the XML parser
    die "Request failed: " . $result->status_line . "\n" unless $result->is_success;
    # print $result->decoded_content . "\n";

    # ForceArray keeps {rev} an array ref even when a batch holds a single revision
    my $xml = XMLin($result->decoded_content, ForceArray => ['rev']);
    push @$accumulator, @{ $xml->{query}->{pages}->{page}->{revisions}->{rev} };
    undef $offset;
    if ( defined $xml->{'continue'}->{'rvcontinue'} ) {
        # this *will* cause an undef in match warning on the last run
        # I'm lazy
        $offset = $xml->{'continue'}->{'rvcontinue'};
        undef $stop;
    }
    else {
        $stop = "yes";
    }
}

# codeswarm reads a <file_events> document with one <event> per revision
print <<EOF;
<?xml version="1.0"?>
<file_events>
EOF

foreach my $revision (@$accumulator) {
    # print $revision->{timestamp} . "\n";
    my $tp = Time::Piece->strptime($revision->{timestamp}, "%Y-%m-%dT%H:%M:%SZ");
    printf qq{<event filename="%s" date="%s" author="%s" />\n}, $page, $tp->epoch * 1000, uri_escape_utf8($revision->{user});
    # yes, codeswarm uses milliseconds.
    # got bitten by that.
    # uri_escape is important. Codeswarm throws exceptions on wide chars in usernames.
}

print "</file_events>\n";

 __END__
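
The two conversions in that last loop are the ones that bit me, so here is a minimal standalone sketch of them: the MediaWiki timestamp turned into epoch milliseconds, and the username percent-encoded as UTF-8. The helper names (timestamp_to_millis, escape_username, event_line) are made up for illustration. escape_username is a hand-rolled stand-in for uri_escape_utf8 so the sketch needs only core modules, and the event layout is my assumption about what codeswarm expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# Convert a MediaWiki UTC timestamp (e.g. 2001-09-09T01:46:40Z)
# to epoch milliseconds, the unit codeswarm works in.
sub timestamp_to_millis {
    my ($timestamp) = @_;
    my $tp = Time::Piece->strptime($timestamp, "%Y-%m-%dT%H:%M:%SZ");
    return $tp->epoch * 1000;
}

# Hypothetical stand-in for URI::Escape's uri_escape_utf8:
# encode to UTF-8 bytes, then percent-escape everything
# outside the unreserved set A-Z a-z 0-9 - . _ ~
sub escape_username {
    my ($user) = @_;
    my $bytes = $user;
    utf8::encode($bytes);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

# Build one event line (assumed codeswarm layout: filename, date, author).
sub event_line {
    my ($page, $timestamp, $user) = @_;
    return sprintf qq{<event filename="%s" date="%s" author="%s" />\n},
        $page, timestamp_to_millis($timestamp), escape_username($user);
}

print event_line("Donald_Trump", "2001-09-09T01:46:40Z", "J\x{f6}rg M");
```

Forgetting the factor of 1000 squeezes the whole history into a few days in early 1970, which is an easy thing to spot once you know to look for it.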

Catching up

January 31, 2012

Making up for time without a reliable internet connection: a week’s worth of project 365.

 
023:366

024:366

025:366

026:366

027:366

028:366

029:366

030:366

022:366

January 23, 2012

022:366, a photo by daggerquill on Flickr.

021:366

January 23, 2012

021:366, a photo by daggerquill on Flickr.

020:366

January 23, 2012

020:366, a photo by daggerquill on Flickr.

018:336

January 19, 2012

018:336, a photo by daggerquill on Flickr.

Outtake from a web photo shoot for our department.

016:366

January 17, 2012

016:366, a photo by daggerquill on Flickr.