The Wiki is in French...

A few years ago, I was managing a team that, among other things, was responsible for the administration of our internal tools, including the company's wiki. One day, while I was in a meeting, my phone started going crazy with calls and emails. When I pulled up a message from one of the engineering directors, it became clear that we had a serious problem...

		
 From: 	David ____
 Subject: The Wiki is in French...
 Date: 	June 30, 2008 6:38:18 PM PDT
 To: 	wiki-admin@redacted.com

 In both Firefox and IE6, the Wiki is suddenly localized to French. I just had Ken check this and he's seeing the same...

 Merci,

 David

I hastily excused myself from the meeting, and at my desk I was able to verify that, while the user-generated content was still in English, all of the wiki's navigation, controls, labels, etc. were indeed in French! Qu'est-ce qui se passe?!

Well, it didn't take long to find the admin panel that allowed for the selection of English, Spanish or French as the default language. I queried the admin team, but no one admitted to making such a change. In fact, none of us had known, up until this moment, that such a panel existed, as it was buried deep in the hierarchy of config pages.

Ok, so what happened? We began poring over the HTTP access logs and immediately noticed something strange: thousands of GET requests over the last 20 minutes...way more traffic than the wiki normally gets (it was a small company, so normally we'd see a few hundred requests an hour). On further examination we noticed that each requested URL was unique. Given the speed of the requests and the sequencing of the requested links, it looked like a bot was crawling the site. Ok, now a quick search for the URL of the admin page containing the language settings and...bingo! This crawler had hit the admin page that controls the localization settings. But how could it have actually reset the configuration? Well, as it turns out, language changes were effected by making a GET request to a URL of the form /admin?defaultLang=<LANG>. In fact, a link for each of English, Spanish and, lastly, French was embedded in the admin page HTML. So, as the crawler indexed the admin pages, it followed each of those links in turn and, as a side effect, changed configuration settings for the wiki, including, but not limited to, the default language for the site. Since the French link came last, French is what stuck.
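To make the failure mode concrete, here is a minimal simulation of it. It is only a sketch: the `config` dict, `unsafe_get` and `crawl` are hypothetical stand-ins I made up for illustration, not the vendor's code and not Nutch. Only the /admin?defaultLang=<LANG> URL shape comes from the incident itself.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory stand-in for the wiki's configuration.
config = {"defaultLang": "en"}

# The admin page embedded one link per language, French last.
ADMIN_PAGE_LINKS = [
    "/admin?defaultLang=en",
    "/admin?defaultLang=es",
    "/admin?defaultLang=fr",
]


def unsafe_get(url):
    """Serve a GET the unsafe way: query parameters mutate server state."""
    query = parse_qs(urlparse(url).query)
    if "defaultLang" in query:
        config["defaultLang"] = query["defaultLang"][0]  # side effect on GET
    return ADMIN_PAGE_LINKS  # links found on the returned page


def crawl(start_url):
    """Naive breadth-first crawler: fetch every link it discovers, once."""
    seen, queue = set(), [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        queue.extend(unsafe_get(url))


crawl("/admin")
print(config["defaultLang"])  # prints "fr"
```

The crawler never "intends" to change anything. It just fetches links in page order, and whichever state-changing link it fetches last wins.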

We determined the source IP of the crawler (one of the admins was experimenting with Nutch as a supplement to the wiki's impoverished search capabilities and had authenticated the crawler using their admin credentials) and stopped the crawl. Then we stepped back in wonder at how a major application vendor had missed such a core tenet of web application architecture: GET requests (along with the HEAD, OPTIONS and TRACE methods) should have no side effects and must not change application state. If the wiki had used discrete forms that POSTed the changes for language and other settings, the crawler would not have triggered state changes. Furthermore, if an admin had bookmarked one of the URLs for whatever reason, they could easily have revisited it later and implicitly re-configured the wiki without realizing it. Users often bookmark result pages from state-changing activities...if you are not careful to avoid unsafe GETs, you will find that bookmarks, browser refreshes and use of the back button can wreak havoc on your application's state.
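For contrast, here is a rough sketch of the safe version of that admin endpoint. Everything in it is an assumption for illustration: `handle_admin`, `settings` and the status codes are names and choices of mine, not anything from the actual wiki. It also answers a successful POST with a 303 redirect (the Post/Redirect/Get pattern), which is my addition rather than part of the original story, so that a refresh or the back button re-issues a harmless GET.

```python
from typing import Dict, Optional

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
SUPPORTED_LANGS = {"en", "es", "fr"}

# Hypothetical in-memory stand-in for the wiki's configuration.
settings = {"defaultLang": "en"}


def handle_admin(method: str, url: str,
                 form: Optional[Dict[str, str]] = None) -> int:
    """Return an HTTP status code; only POST may change settings."""
    if method in SAFE_METHODS:
        # The query string in `url` is deliberately ignored here:
        # safe methods only read state, so crawlers are harmless.
        return 200
    if method == "POST":
        lang = (form or {}).get("defaultLang")
        if lang not in SUPPORTED_LANGS:
            return 400  # reject missing or unknown languages
        settings["defaultLang"] = lang
        return 303  # redirect to a GET so a refresh does not re-submit
    return 405  # any other method is not allowed on this resource


# A crawler following the old-style link changes nothing.
print(handle_admin("GET", "/admin?defaultLang=fr"), settings["defaultLang"])
# An explicit form submission does.
print(handle_admin("POST", "/admin", {"defaultLang": "fr"}), settings["defaultLang"])
```

A real admin form would also need authentication and CSRF protection, which this sketch leaves out.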

It's an all-too-common mistake, and one I'll admit to having made over and over again in my early days building web applications...but understanding how HTTP works, and the implicit contracts of its methods, is a must if you are serious about building legit apps. I'm always surprised at how many folks claim to be building RESTful APIs and apps without understanding these core concepts (which means they are missing the fundamental properties of REST).

Here is some required reading on the subject:

Hopefully this anecdote helps folks understand how key these concepts are to building robust web applications. Merci.