One more bit of Perl and Wikipedia code to throw out there, as a follow-up to my previous post.

This leverages the MediaWiki API to generate quick and dirty analytics. A few of my frequently used examples are shown–top days, top users, recent edits, histogram–but I use it as a general-purpose platform that I can customize for specific questions. Sample output looks like the screen shot below.

The screen shot also answers one of the questions people ask me most frequently, why I hard code values into my scripts instead of parsing @ARGV. I usually run things directly from within my editor (normally emacs or TextMate) and it’s often easier in that scenario to edit the code than the command line.
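For readers who want a feel for the idea without the full script, here is a minimal sketch in Python (not the Perl from this post). It assumes revision records carrying a `user` and an ISO 8601 `timestamp`, which is the shape the MediaWiki API returns for revisions; the sample data, the function names, and the `TOP_N` and `BAR_WIDTH` settings are all invented for illustration. In keeping with the approach described above, the settings are hard-coded at the top rather than read from the command line.

```python
from collections import Counter

# Hard-coded settings, edited in place instead of parsed from the command line.
# Both values are illustrative assumptions.
TOP_N = 3
BAR_WIDTH = 20

# Sample records shaped like MediaWiki revision entries (user + ISO 8601 timestamp).
# The data itself is invented for illustration.
REVISIONS = [
    {"user": "Alice", "timestamp": "2011-01-20T09:15:00Z"},
    {"user": "Bob", "timestamp": "2011-01-20T11:40:00Z"},
    {"user": "Alice", "timestamp": "2011-01-21T08:05:00Z"},
    {"user": "Carol", "timestamp": "2011-01-21T17:30:00Z"},
    {"user": "Alice", "timestamp": "2011-01-21T22:10:00Z"},
]


def top_users(revisions, n=TOP_N):
    """Return the n most active editors as (user, edit_count) pairs."""
    return Counter(r["user"] for r in revisions).most_common(n)


def top_days(revisions, n=TOP_N):
    """Return the n busiest days as (YYYY-MM-DD, edit_count) pairs."""
    # The first ten characters of an ISO 8601 timestamp are the date.
    return Counter(r["timestamp"][:10] for r in revisions).most_common(n)


def histogram(counts, width=BAR_WIDTH):
    """Render (label, count) pairs as text bars scaled to the largest count."""
    peak = max((c for _, c in counts), default=0)
    lines = []
    for label, count in counts:
        bar = "#" * (round(width * count / peak) if peak else 0)
        lines.append(f"{label:<12} {bar} {count}")
    return lines


if __name__ == "__main__":
    print("\n".join(histogram(top_days(REVISIONS))))
    print("\n".join(histogram(top_users(REVISIONS))))
```

Answering a different question then means adding another small function over the same list of records and changing the last two lines.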

Code below the break.

Read the rest of this entry »


Wikiswarm Perl

January 24, 2011

This is a little something that didn’t make it into my latest project. After reading James Bridle’s post on Wikipedia and historiography last fall–where he uses the Iraq War article as an example–I thought it would be interesting to visualize the article in a different way: using Jamie Wilkinson’s Wikiswarm. The result was quite interesting, particularly how long it took for things to heat up, and the way no one seemed to care about the war during the World Cup. Unfortunately, both the Wikipedia API and the wikipedia and URI gems have changed over the last two years. Wilkinson’s code no longer works, and I didn’t know enough Ruby to fix it quickly. So I re-wrote the download routine and XML accumulator in Perl. The video is best in HD at YouTube. Code below.
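As a rough illustration of what an "XML accumulator" might do, here is a short Python sketch (not the Perl from this post, and not Wilkinson's Ruby). It assumes the visualizer reads a code_swarm-style event log: a `<file_events>` root holding one `<event>` per edit, with `date` in epoch milliseconds plus `filename` and `author` attributes. That format, the function names, and the input record shape are assumptions made for the example.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def to_epoch_ms(timestamp):
    """Convert an ISO 8601 UTC timestamp (e.g. 2011-01-24T12:00:00Z) to epoch milliseconds."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return int(moment.replace(tzinfo=timezone.utc).timestamp() * 1000)


def accumulate_events(revisions, filename):
    """Build a <file_events> tree with one <event> per revision, oldest first."""
    root = ET.Element("file_events")
    # ISO 8601 timestamps in the same zone sort correctly as plain strings.
    for rev in sorted(revisions, key=lambda r: r["timestamp"]):
        ET.SubElement(root, "event", {
            "date": str(to_epoch_ms(rev["timestamp"])),
            "filename": filename,
            "author": rev["user"],
        })
    return root


if __name__ == "__main__":
    # Invented sample revisions; a real run would feed in downloaded history.
    sample = [
        {"user": "Bob", "timestamp": "2011-01-24T12:05:00Z"},
        {"user": "Alice", "timestamp": "2011-01-24T12:00:00Z"},
    ]
    print(ET.tostring(accumulate_events(sample, "Iraq_War"), encoding="unicode"))
```

Keeping the download step separate from the accumulator means the XML-building part can be run against saved revision data without touching the network.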

 Read the rest of this entry »

Savage’s Theorem

April 14, 2008

I don’t remember where or when, now, but a few years ago I came across a piece of advice from a respected security expert that ran something like this: “If you treat your users like criminals, they will invariably prove you right.” Even though I don’t remember who said it or where (I think it may have been an article somewhere by Rob Flickenger), it’s stuck with me, because there are several important ideas packed into it.

The first has to do with administrator mindset: The network exists for the users, and you should be protecting it for them, not from them. If your users are in your threat model, the problem is probably you, not them (of course, we’re talking about sysadmins, not webmasters of public sites, here). If you’re suspicious and go looking for trouble, though, you’ll probably find it. We’ve all worked with admins like that–and at some point, some of us have probably fallen into the trap of being admins like that–so most of us can recognize why that attitude isn’t productive.

The second idea is potentially transformative: policy and attitude influence user behaviors as much as they respond to them. Part of it has to do with the path of least resistance. If the policy makes it difficult for regular users to do their jobs because of fear that some users will abuse their privileges, then even normal users will start looking for ways to circumvent the system. This is why the RIAA approach to copyright fails so miserably. But part of it also has to do with fostering a general spirit of trust, and with the way technocultural knowledge is disseminated. Users look to policy to establish norms. If the policy implies that most users are devious hackers attempting to subvert the system to their own uses, then that is what users will assume they should be. If, on the other hand, the cues point toward a norm of responsible use, the majority of users will pick up on that, too.

This is why CYA is a horrible guiding principle for any organization, and why one of the worst things policy makers can do is write policy for corner cases. There will always be bad apples, but write the policy for the general case–for how to use the system, not for how not to use the system–and deal with the exceptions as exceptions.

This insight, of course, has a much wider application than computing systems. It applies in almost any social setting. It is closely related, for instance, to the problems we see throughout the academy with “helicopter parents” and the resurrection of in loco parentis on campus: if you treat students like they’re not adults, they’ll never start to act like adults.

We talk about people “rising to the challenge,” but we never stop to realize that the reverse is also true. Thus, Savage’s Theorem:

People will generally meet your expectations of them.

GTD: Email in reverse

March 19, 2008

I’ve heard this email tip before, but I always forget to use it, even though it’s probably the best single productivity enhancer I could implement on a daily basis. The advice? Write your email in reverse. If you resist the temptation to just hit “reply all” and instead compose the body of the message, then write a descriptive subject, and then add the recipients, you’ll be less likely to hit send prematurely, and less likely to needlessly cc people who don’t really have a stake in the conversation. Making sure you get it right the first time, so that you don’t have to send a correction later or get bogged down in a long convo with people who don’t need to be involved, will save both you and your readers time. [via @steverubel]


March 6, 2008

Watching Lee LeFever’s Ignite presentation from last February. Every time I see something that came out of the year that went into The World Is Not Flat, I’m more amazed.

With a little help from the folks on the Literature and Latte forums, I have been gently massaging Scrivener into something I might want to write a dissertation in. Or, rather, something I can write a dissertation in. A big part of that has been learning XSLT, since Scrivener uses Fletcher Penney’s MultiMarkdown for LaTeX export. I am agog. I am aghast. I am a number of other descriptors that imply the combination of unpleasant surprise and simple stupefaction.

Who designed this monstrosity? As a culture, programmers share two main features. We’re lazy, and we’re creatures of habit. Whence, then, XSLT? Just because it transforms XML doesn’t mean it has to look like XML. In what world does this:

            <xsl:with-param name="substring">
                <xsl:text>~</xsl:text>
            </xsl:with-param>
            <xsl:with-param name="replacement">
                <xsl:text>\ensuremath{\sim}</xsl:text>
            </xsl:with-param>

make sense? Most text processing (or, if you prefer, “transforming”) languages have an idiom for this, and usually it looks more like s/~/\\ensuremath{\\sim}/ or maybe sub('\ensuremath{\sim}', '~'). Where’s the laziness, XSLT? And why are you reinventing the wheel–and a square wheel at that?
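For comparison, the same tilde replacement as a one-line substitution, sketched here in Python. The function name `tilde_to_latex` is made up for the example. The replacement is passed as a callable so that `re.sub` inserts the backslashes literally instead of treating them as escape sequences.

```python
import re


def tilde_to_latex(text):
    """Replace every literal tilde with LaTeX's math-mode similarity sign."""
    # A callable replacement is inserted as-is, with no backslash processing.
    return re.sub(r"~", lambda _match: r"\ensuremath{\sim}", text)


if __name__ == "__main__":
    print(tilde_to_latex("roughly ~40 pages"))
```

Because the pattern here is a fixed string, `text.replace("~", r"\ensuremath{\sim}")` would do the same job; the regex form is shown because it is the closer cousin of `s/…/…/`.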

Not quite sure how I missed this, but apparently U-M’s collaboration with Google to put its works in the public domain online hit a major milestone on Friday: one million books scanned and put online.