Searching Wikipedia Sucks!

Have you tried searching Wikipedia lately? Don’t bother, because you probably won’t find what you’re looking for! I am continually amazed at how terrible the Wikipedia search results are. Here’s an example of what I mean. Go to Wikipedia, type “al gor” in the search box, and click the search button. You should see something like this. That’s right, the top results are Al-Merrikh, Cy-Gor, Firouzabad, and Kagame Inter-Club Cup.

Absolutely terrible! If you type the same thing in the search box at Google, not only do you get accurate results, but Google prompts you with “Did you mean: al gore”. Why yes, I did! So why is searching Wikipedia so bad?

Part of the problem is that Wikipedia actually has two search modes: “Go” and “Search”. If you type “Al Gore” (spelled correctly) in the box and click Go, you’re taken right to the entry about Al Gore. If you instead click Search, you’re taken to a list of articles that contain or reference “Al Gore”. You can read more about searching Wikipedia here. So they’ve sort of complicated things by including two buttons instead of just one. The Go button is useful when you know the name of the article you want, but useless otherwise.

The other part of the problem is that the search algorithm just plain sucks. I know they don’t have a lot of resources, but you’d think that one of the most popular websites on the web could have a decent search feature. Matching “al gor” with “al gore” is a problem that has been solved for years, yet Wikipedia doesn’t even come close to accomplishing it!
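To show how little it takes to match “al gor” with “al gore”, here is a minimal sketch using edit distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into another). This is not how Wikipedia or Google actually works; the function names `edit_distance` and `suggest`, the `max_distance` cutoff, and the tiny title list (borrowed from the search results above) are all assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; we build the table one row at a time.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the char
            ))
        previous = current
    return previous[-1]


def suggest(query, titles, max_distance=2):
    """Return the title closest to query, or None if nothing is close.

    max_distance is an arbitrary cutoff chosen for this example.
    """
    q = query.lower()
    best = min(titles, key=lambda t: edit_distance(q, t.lower()), default=None)
    if best is None or edit_distance(q, best.lower()) > max_distance:
        return None
    return best


# Hypothetical title index, taken from the results mentioned in this post.
TITLES = ["Al Gore", "Al-Merrikh", "Cy-Gor", "Firouzabad", "Kagame Inter-Club Cup"]

if __name__ == "__main__":
    print("Did you mean:", suggest("al gor", TITLES))
```

A real search engine would need something far more scalable than comparing the query against every title, but even this brute-force version picks “Al Gore” over Cy-Gor and Al-Merrikh.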

Wikipedia itself mentions external search engines as a way to find what you’re looking for, but they aren’t really much better. For instance, if you type “al gor” at the special Google search for Wikipedia page, you do get the correct Al Gore entry as the first result, but the rest are not relevant at all.

So here’s where we’re at. Google knows that if you type “al gor” you probably mean “Al Gore”. Wikipedia knows about all of the entries that reference “Al Gore”. What we need is a way to combine the two! Is that really so much to ask?

If you know of a better way to search Wikipedia, please let me know!

Amapedia by Amazon.com

Wikipedia is a superb resource for general information, but I think there’s room (and demand) for topical “wikipedias” too, such as a wikipedia for product information. That is exactly what Amazon.com recently launched:

Amazon has just released a new Wikipedia clone, called Amapedia. It’s described as “a community for sharing information about the products you like the most.”

I took a quick look at the site, and so far it’s not very impressive. It has potential though. I have to agree with Richard:

The site looks pretty raw currently and has little info in it – it is after all brand new. But a wikipedia for products makes perfect sense for Amazon. Who better to spotlight products and gather product information from the community, than Amazon?

With enough contributions, Amapedia could become the site to check before you purchase something. Good idea Amazon!

Read: Read/Write Web

Why nofollow at Wikipedia is a good thing

You may have heard that Wikipedia recently decided that all outbound links would be coded with the “nofollow” attribute, meaning that search engines do not give the links any weight in their algorithms. The idea is that it will make it much less desirable for spammers to add their links to the thousands of pages at Wikipedia. Sounds good, right? Well, so far the reaction has been pretty negative:

Although the no-follow move is certainly understandable from a spam-fighting perspective, it turns Wikipedia into something of a black hole on the Net. It sucks up vast quantities of link energy but never releases any.

Lots of bloggers are worried that the new scheme does not properly recognize the original sources of information. A blog or other site will still be cited on the Wikipedia page, but that citation no longer carries any weight with the search engines.

I think that argument is fairly weak. If you are really deserving of some major “link energy” then you’ll get it, because chances are, Wikipedia won’t be the only site linking to you. So worries about not getting “credit” in the form of Google-juice are pretty unfounded, I think.

I suppose it comes down to the “perfect world” scenario. In a perfect world, there would be no spam, and everyone would benefit maximally from linking to one another. Thing is, we don’t live in a perfect world – thus we have to attempt to reduce the imperfections. This policy is an attempt to do that with spam.

I see the nofollow policy as serving the greater good. Is an individual’s link juice more important than everyone’s access to a reliable, spam-free Wikipedia? The answer is no, and that’s why I think the nofollow policy is good.
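For anyone who hasn’t seen it, “nofollow” is just a value in a link’s `rel` attribute, as in `<a href="..." rel="nofollow">`. Here is a minimal sketch of how a crawler might separate the links it counts from the ones it ignores. The `LinkCollector` class and `classify_links` function are names I made up for illustration, and this is only an assumption about the general idea, not how any real search engine is implemented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects link targets, split by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []   # links that would pass along "link energy"
        self.nofollow = []   # links a crawler is asked not to count

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def classify_links(html):
    """Return (followed, nofollow) lists of hrefs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.followed, collector.nofollow


if __name__ == "__main__":
    page = (
        '<p><a href="http://example.org/source">a cited blog</a> and '
        '<a href="http://example.com/other" rel="external nofollow">another</a></p>'
    )
    followed, ignored = classify_links(page)
    print("counted:", followed)
    print("ignored:", ignored)
```

The link is still there for readers to click either way; the only thing that changes is whether it counts as a vote in the search rankings.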

Wikiasari Search Engine

The Times of London is reporting that Jimmy Wales, founder of Wikipedia, is planning to launch a search engine next year in collaboration with Amazon.com. Dubbed Wikiasari, the search engine will allow users to rank web pages in an effort to create more accurate results (via Techmeme):

“Essentially, if you consider one of the basic tasks of a search engine, it is to make a decision: ‘this page is good, this page sucks’,” Mr Wales said. “Computers are notoriously bad at making such judgments, so algorithmic search has to go about it in a roundabout way.

It appears the big selling point of the search engine will be that it harnesses the wisdom of crowds. Google already does this, with PageRank, but in a less direct way. I am not sure if the new idea is going to fly – how many people really want to rank pages when they search? Usually you just want the results immediately. I’d bet most people won’t want to invest an extra few minutes to visit and rank the results.

I really have no idea what Amazon.com has to do with this project, but recall they too have their own search engine, A9.

Read: Times of London

Students using Wikipedia

Wikipedia has become pretty popular in the last couple of years, and I am sure that most students have at least seen the site, even if they don’t use it regularly. I think the online encyclopedia is an excellent resource, full of really great information. I also think it should be treated like any other resource, whether online or offline – with caution. That said, I don’t think there’s any reason students should not use it. An intern at CNET News.com thinks otherwise:

Wikipedia is one of the Internet’s latest additions to the information revolution. More importantly, it’s the reason I was able to finish my massive second-semester AP English research final project in less than 45 minutes.

As the deadline loomed, I knew there was no way I would be able to sort through thousands of Google search results or go to the library to research while simultaneously performing other vital homework completion functions like talking online, reading celebrity gossip and downloading music. So I did what any desperate, procrastinating student would do–I logged on to Wikipedia, pulled up the entries on Renaissance literature and filled in the gaps until I had a presentable product.

Until recently, many kids in my high school, myself included, used Wikipedia without questioning the integrity of its content. Before Colbert highlighted the unreliability of the site’s information, I doubt many people even realized it isn’t an authoritative, credible source.

So please take my advice, students: Wikipedia is a great place to find out about local bands or start doing research. However, before including Wikipedia information in a term paper or using Wikipedia entries to study for exams, make sure you support your findings with more legitimate sources.

So let me get this straight – you’re an advanced placement English student, with a major research project, and you’re waiting until the last minute? Then you rely solely on Wikipedia entries and a few blanks you filled in? As one student to another, I hope you failed. And are you really so unable to think for yourself that you just assume Wikipedia is the be all end all of accurate information? Pretty sad it takes a comedian on television to teach you that it isn’t.

Wikipedia has been found to be about as accurate as Britannica (granted, I would like to see some additional studies back this up). The difference is that Britannica entries are shorter and contain a neutral perspective, while Wikipedia entries can be longer and include multiple perspectives, links to other resources, pictures and other multimedia, and much more. Wikipedia is also able to offer a much wider range of topics, including some very specific articles on niche subjects. There’s no reason to think that Wikipedia can’t be as comprehensive or accurate as traditional encyclopedias, though it varies from article to article. In fact, on average, I bet it is better.

I guess this really isn’t so much about whether students should use Wikipedia or not – to me, it’s clear they should. The point that needs to be made is that students always need to find multiple sources for information they want to use, and they’ve always got to add something extra. Even in a research paper, a little commentary and analysis will help your paper rise to the top of the pile when the time comes for it to be graded.

Don’t use only Wikipedia, but don’t be afraid to use it in addition to your other resources either.

Read: CNET News.com

Old School 2

Oh man, I just can’t believe this. Well that’s not true, I can, I just can’t wait! Via Sharon, I got this link to the IMDB site with an entry for Old School 2! Yes, a sequel to one of the funniest movies I have ever seen. The information there is sparse, but Wikipedia has the goods:

Old School 2 is the announced sequel to Old School. This film will be made by DreamWorks SKG and will be distributed by Paramount Pictures. It is scheduled to be released some time in 2007.

That’s just the description though. The real goods are in the cast list they provide: Luke Wilson, Will Ferrell, Vince Vaughn, Jack Black, Ben Stiller, Owen Wilson, Steve Carell, Elisha Cuthbert, Kathy Bates, and Christopher Walken.

Is that even possible? The Wikipedia entry has a warning that sources have not been cited, so take that list with a grain of salt, but still! That would be absolutely amazing! IMDB has the movie listed for a 2007 release. I hope it is correct.

Black van of death, Frank the tank, and all the other greatness that was Old School. Just imagine!

I wish IMDB had RSS feeds for movies, then I could easily see when the cast is updated.

Wikipedia Under Fire

Wikipedia is without a doubt one of my favorite websites. Even though I have only ever made one or two contributions to Wikipedia, I find the site invaluable for research. The vast amount of information immediately available is hard to overlook for research of any sort (there are 848,598 English language articles as of this post). If you have a question about something, you can probably find the answer at Wikipedia.

Called “the self-organizing, self-repairing, hyperaddictive library of the future” by Wired Magazine in March of 2005, Wikipedia has enjoyed much success. The Wired article is just one of many mainstream media articles praising the site, and there are many thousands if not millions of bloggers and others who use and recommend Wikipedia each and every day. The New York Times offers some numbers describing Wikipedia’s success:

The whole nonprofit enterprise began in January 2001, the brainchild of Jimmy Wales, 39, a former futures and options trader who lives in St. Petersburg, Fla. He said he had hoped to advance the promise of the Internet as a place for sharing information.

It has, by most measures, been a spectacular success. Wikipedia is now the biggest encyclopedia in the history of the world. As of Friday, it was receiving 2.5 billion page views a month, and offering at least 1,000 articles in 82 languages. The number of articles, already close to two million, is growing by 7 percent a month. And Mr. Wales said that traffic doubles every four months.

Lately though, despite all of the success and impressive usage numbers, cracks have started to appear. Two questions, both of which have been asked before, have once again been brought into the spotlight – just how reliable is the information found on Wikipedia, and where is the accountability?

Consider what happened to John Seigenthaler Sr.:

ACCORDING to Wikipedia, the online encyclopedia, John Seigenthaler Sr. is 78 years old and the former editor of The Tennessean in Nashville. But is that information, or anything else in Mr. Seigenthaler’s biography, true?

The question arises because Mr. Seigenthaler recently read about himself on Wikipedia and was shocked to learn that he “was thought to have been directly involved in the Kennedy assassinations of both John and his brother Bobby.”

If any assassination was going on, Mr. Seigenthaler (who is 78 and did edit The Tennessean) wrote last week in an op-ed article in USA Today, it was of his character.

Whoever added that false information to the article did so anonymously, so beyond publicly stating the truth, Mr. Seigenthaler really had no recourse. So there’s the issue of false information, and how to stop people from entering it. Wikipedia works on the premise that mistakes are caught by later contributors, and regular users who monitor changes. Clearly, that doesn’t always work.

If reliability and accountability weren’t enough, how about ethics? Should you edit the entry for something you were involved in? The question was raised earlier this week when Adam Curry attempted to make some changes to the entry for Podcasting. Dave Winer explains:

Now after reading about the Seigenthaler affair, and revelations about Adam Curry’s rewriting of the podcasting history – the bigger problem is that Wikipedia is so often considered authoritative. That must stop now, surely. Every fact in there must be considered partisan, written by someone with a conflict of interest. Further, we need to determine what authority means in the age of Internet scholarship. And we need to take a step back and ask if we really want the participants in history to write and rewrite the history. Isn’t there a place in this century for historians, non-participants who observe and report on the events?

Dave makes some very good points. Upon first reading his entry, I thought the question of historians and third-party observers was very obvious and a simple way to resolve these kinds of issues. The more I thought about it though, the less sure I felt. Requiring historians and non-participants to write the entries simply because that’s the way we’ve always done it may not be the best way to move forward. Thanks to Wikipedia and the web in general, we have the ability to turn the conventional wisdom “the winners write the history books” completely upside down. By editing websites like Wikipedia as events are taking place (such as the creation of podcasting), do we not have a better chance of capturing a more realistic view of history? If all sides of an issue can enter their views, do we not have a more accurate and complete entry? Of course, we unfortunately need to deal with flame wars in many of these cases, but maybe that will change as the process matures.

The issues I mentioned above are currently getting a lot of attention, and are pretty natural in the evolution of a system like Wikipedia. I don’t think anyone should be surprised that questions of reliability, accountability and ethics are being asked. And if you really stop and think, you’ll probably realize that the solution to all of these problems has been around for a very long time. As with all websites on the Internet, it is up to the reader to use his or her best judgement in evaluating the accuracy and relevancy of the information on a web page. Searching the information available at Wikipedia should be no different than searching the information available in Google – reader/searcher/user beware.