One more bit of Perl and Wikipedia code to throw out there, as a follow-up to my previous post.

This leverages the MediaWiki API to generate quick and dirty analytics. A couple of my frequently-used examples are shown–top days, top users, recent edits, histogram–but I use it as a general-purpose platform that I can customize for specific questions. Sample output looks like the screen shot below.

The screen shot also answers one of the questions people ask me most frequently, why I hard code values into my scripts instead of parsing @ARGV. I usually run things directly from within my editor (normally emacs or TextMate) and it’s often easier in that scenario to edit the code than the command line.
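To give a flavor of the "top days" and "top users" tallies, here is a minimal sketch. It is not the script from this post: the revision records are made up and hard-coded (standing in for whatever the API returns when asked for each revision's timestamp and user), and the `top` helper is a hypothetical name of my own.

```perl
use strict;
use warnings;

# Made-up revision records; a real run would fetch these from the API.
# The timestamps use the ISO 8601 form that MediaWiki reports.
my @revisions = (
    { user => 'Alice', timestamp => '2011-01-20T14:03:11Z' },
    { user => 'Bob',   timestamp => '2011-01-20T15:40:02Z' },
    { user => 'Alice', timestamp => '2011-01-21T09:12:45Z' },
    { user => 'Alice', timestamp => '2011-01-20T22:17:30Z' },
);

# Hypothetical helper: count how often each key occurs and return
# [ key, count ] pairs, busiest first, ties broken alphabetically.
sub top {
    my %count;
    $count{$_}++ for @_;
    return map  { [ $_, $count{$_} ] }
           sort { $count{$b} <=> $count{$a} || $a cmp $b }
           keys %count;
}

# The first ten characters of the timestamp are the date (YYYY-MM-DD).
my @top_days  = top(map { substr($_->{timestamp}, 0, 10) } @revisions);
my @top_users = top(map { $_->{user} } @revisions);

# A crude text histogram for days, plain counts for users.
printf "%-12s %s\n", $_->[0], '#' x $_->[1] for @top_days;
printf "%-12s %d\n", @$_ for @top_users;
```

Everything is hard-coded on purpose: when running from the editor, changing `@revisions` or the sort order is a one-line edit.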

Code below the break.


Wikiswarm Perl

January 24, 2011

This is a little something that didn’t make it into my latest project. After reading James Bridle’s post on Wikipedia and historiography last fall–where he uses the Iraq War article as an example–I thought it would be interesting to visualize the article in a different way: using Jamie Wilkinson’s Wikiswarm. The result was quite interesting, particularly how long it took for things to heat up, and the way no one seemed to care about the war during the World Cup. Unfortunately, both the Wikipedia API and the wikipedia and URI gems have changed over the last two years. Wilkinson’s code no longer works, and I didn’t know enough Ruby to fix it quickly. So I re-wrote the download routine and XML accumulator in Perl. The video is best in HD at YouTube. Code below.


Some things never get old

August 27, 2007

This little Perl script I wrote 2 1/2 years ago to recover images from digital camera media is one of them.

It’s weird the way sometimes you spend months working on projects you never look at again, and then other times you write 25 lines of what you think is a one-off, and end up using it almost every day.


Somehow, I managed to resist posting this reply on beginners-perl. Apparently I hate flame wars more than I hate Perl6. Who knew? I still want to put it out there, though:

On 5/14/07, Chas Owens wrote:

Not that I use them myself, but the “proper” (for various values of
proper) way to use POD for multi-line comments is

=for comment
Commented text

Yes it sucks, yes it will be fixed* in Perl 6* (or possible earlier if
the trend of adopting Perl 6 features in Perl 5 releases continues).
It will be a quote-like operator, so you will be able to say

This is
a multi-line comment

so is this
#[ and this one is nested ]

#{{{ multiple brackets must match on both sides }}}

* for various definitions of fixed, there are some caveats, the
biggest being that a # as the first character on a line is always
considered a single line comment even if the next character is a
bracketing character. Don’t ask me why.


Actually, only for a special subset of fixed where 'fixed' eq 'broken'. All I have to say to the Perl6 team on this and a thousand other syntax issues is: “if it ain’t broke, don’t fix it.” We don’t need Perl to be Python or Java. We already have Python and Java, and Ruby besides. We need Perl to be Perl. Perl6, however, is manifestly not Perl, and this is a great example of why.

One of the hallmarks of Perl has always been a loose disregard for whitespace. “hash-bracket begins a multi-line comment except at the beginning of a line, or except succeeding a backslash and preceding a period, or unless preceded by a space” sounds like the demented ramblings of a man trying to write a Java compiler in Haskell. Oh, wait… But I digress.

Fortunately, the POD syntax remains fairly sensible. The only change, so far as I know, is a fix to something that was actually broken, namely the =for comment syntax. And what’s more, it’s usable by mere mortals.


After a good night’s sleep, I’m glad I didn’t hit send. Griping over Perl6 syntax doesn’t belong on the beginner’s list. Of course, I would argue that references to Larry’s Apocalypses don’t, either. Especially not in re FAQs. But that’s a different story.

For those of you playing along at home, the thread started with a question about multi-line comments, and why Perl doesn’t have an equivalent to C’s ‘/* ... */’ notation. It’s a common enough question. The simple answer, of course, is POD. For a few corner cases, though, it’s a less-than-optimal solution, so people occasionally clamor for something else. Perl6 is going to give it to them. A hash followed immediately by a bracket of any kind, possibly doubled or tripled, will begin a multi-line comment.
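For anyone who hasn’t seen the POD answer in action, here is a minimal Perl 5 sketch. The `$total` variable is made up for illustration; the `=begin comment`, `=end comment` and `=cut` directives are ordinary POD.

```perl
use strict;
use warnings;

my $total = 0;

$total += 1;

=begin comment

Everything from the =begin line down to =cut is skipped by the
perl parser, so this statement never runs:

$total += 100;

=end comment

=cut

$total += 1;

print "total = $total\n";   # prints "total = 2"
```

The directives have to start in column one, which is one of the corner cases where POD is less than optimal: you can’t tuck one of these into the middle of an indented expression.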

The problem, though, is that the new syntax doesn’t play nice with oh, about a thousand bits and pieces of preprocessors and common utilities. So a hash in column one will remain the beginning of a single-line comment, regardless of what comes after it:

# this is a single-line comment

#{ so is this }

this is a multi-line comment

this is a syntax error

Confused? You will be.

Ok, last post about twitter for a while, but I thought I’d share.

One of the things I don’t like about twitter is logging on in the morning to find more tweets than my client will display. Going to the web site and hitting “next” a few times isn’t really any answer; it would take all morning at the current refresh rate. To address the issue, I whipped up a little Perl script called twittersleep. It uses LWP::UserAgent and XML::Simple for the grunt work. LWP::UserAgent isn’t strictly part of the core, but it comes with libwww-perl, which is installed nearly everywhere, and if you do any web work at all, you probably have XML::Simple.

When I’m going to be AFK for a while, I run twittersleep. It grabs my friends’ statuses from the XML API every five minutes and logs them to a simple *DBM database.

When I come back, I run twittersleep -r to read them back out. Currently it generates a simple text dump suitable for passing to less or parsing with grep.
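The log-and-replay part can be sketched roughly as follows. This is not the twittersleep source: the fetching is left out entirely, the statuses are made up, `log_statuses` and `read_statuses` are hypothetical names, and SDBM_File is used only because it ships with Perl and stands in for whichever DBM flavor you prefer.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;                    # assumption: any *DBM module would do
use File::Temp qw(tempdir);

# Hypothetical helper: store each status under its ID, so a status that
# shows up in two consecutive polls is only logged once.
sub log_statuses {
    my ($db, @statuses) = @_;
    for my $s (@statuses) {
        next if exists $db->{ $s->{id} };
        $db->{ $s->{id} } = "$s->{user}: $s->{text}";
    }
}

# Hypothetical helper: read everything back in ID order.
sub read_statuses {
    my ($db) = @_;
    return map { $db->{$_} } sort { $a <=> $b } keys %$db;
}

my $dir = tempdir(CLEANUP => 1);
tie my %db, 'SDBM_File', "$dir/statuses", O_RDWR | O_CREAT, 0666
    or die "Cannot open DBM file: $!";

# Two made-up polling results that overlap on status 102.
log_statuses(\%db,
    { id => 101, user => 'alice', text => 'good morning' },
    { id => 102, user => 'bob',   text => 'coffee' },
);
log_statuses(\%db,
    { id => 102, user => 'bob',   text => 'coffee' },
    { id => 103, user => 'carol', text => 'lunch?' },
);

my @lines = read_statuses(\%db);
print "$_\n" for @lines;          # one status per line, ready for less or grep

untie %db;
```

Keying on the status ID is the design choice that matters here: polling every five minutes inevitably re-fetches statuses already seen, and the hash makes the duplicates disappear for free.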

You can grab the code here.

In the learn something new every day department, we have this gem from John W. Krahn on the perl beginners list in response to the following question:

I have a file that I would like to read in then
do the following:

– Read in each line and remove any duplicate text
with tags
– Sort the file so all tag IDs are in sequential
– Save the results to a different file name.

Can this be done easily? If so, how? I’m really a
newbie at this stuff. Any help would be greatly

The sample data looked like `Data 1`. My own advice to the poster involved a `while ()` loop, a hash, a couple of splits and regexes, and then sorting the result. I didn’t even bother posting it once I saw John’s elegant reply:

my %seen;
print $out map $_->[ 1 ],
sort { $a->[ 0 ] <=> $b->[ 0 ] }
map [ //, $_ ],
grep />([^< ]+)<!– && !$seen{ $1 }++,

The real brilliance here is line 4, and I have to admit it had me stumped for a minute. `m//` returns true or false (1 or the empty string) for success or failure in scalar context, and a list of the parenthesis captures, if any, in list context. When I first read this code, I was convinced that the second `map` should only be passing 1’s to `sort`, because the result of the match was being assigned to an index of an anonymous array, which it seems should be a scalar. You would think that `$x = [ /(.)/ ]` should be pretty much equivalent to `$y = /(.)/; $x = [ $y ]`. You would, however, be wrong. It’s actually functionally equivalent to `@y = /(.)/; $x = [ @y ]`. Expressions inside `[]` are evaluated in list context. This makes sense, too, because it enables things like `$x = [0..255]`. And, of course, it means that `m//` returns a list of paren matches.
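Here is a small sketch that shows the difference, followed by the same idiom in a dedupe-and-sort pipeline. The sample lines and regexes are made up for illustration, since the poster’s real data didn’t survive on this page; only the shape of the pipeline follows John’s reply.

```perl
use strict;
use warnings;

$_ = 'id=42';

my $y        = /(\d+)/;        # scalar context: true (1) on success
my $x_scalar = [ $y ];         # so this array holds 1
my $x_list   = [ /(\d+)/ ];    # list context: this array holds the capture

print "via scalar: $x_scalar->[0]\n";   # via scalar: 1
print "via list:   $x_list->[0]\n";     # via list:   42

# The same trick in a pipeline (read it bottom to top).
my @lines = ('tag 3: gamma', 'tag 1: alpha', 'tag 3: gamma', 'tag 2: beta');

my %seen;
my @sorted =
    map  { $_->[1] }                      # 4. unwrap the original line
    sort { $a->[0] <=> $b->[0] }          # 3. numeric sort on the captured ID
    map  { [ /^tag (\d+):/, $_ ] }        # 2. build [ captured ID, line ]
    grep { /: (.+)$/ && !$seen{$1}++ }    # 1. keep the first sighting of each text
    @lines;

print "$_\n" for @sorted;
```

If the match in step 2 ran in scalar context, every pair would start with 1 and the sort would have nothing to work with.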

Update: Ok, so I lied about not missing the formatting plugins. I’m not sure what happened here, but all of my angle brackets keep getting turned into comments. Just imagine that it looks like what I say it should. Better yet, just read the thread.


September 21, 2005

I want to apologize to anyone who’s commented here lately, particularly to Autrijus and grumpy who commented on my Perl6 rant. For some reason, I don’t seem to be getting emails when people comment. I’m looking into it.