Wednesday, April 17, 2013

Graphing the Links between Articles on Wikipedia

One of my summer projects last year was to graph the links between Wikipedia articles. I was inspired by the Wiki Wars concept and wanted to see if I could programmatically beat the game. I used C# and the Mono .NET runtime to build a Wikipedia pathfinder.

Here are a couple of graphical representations of the results. They make great wallpapers!

My Current Wallpaper
A Zoomed Out Link Graph

Obtaining the Data

My first task was to download the Wikipedia dataset in its' entirety. I chose the XML format which compressed is approximately 10GB. The uncompressed file is nearly 40GB. I naively attempted to load the dataset using the .NET runtime XML parsers. Unfortunately they attempt to inflate the data before it can be queried. My server has 4GB of RAM and I quickly learned that this simply wouldn't work.

I decided to work with a subset of the data to ease the load on my server. This worked out well.

Assembling a Graph

The links on Wikipedia can be modeled using the abstract graph data type.

I have a type that consists of the page name and a list of all other pages it links to. To determine a path, I recursively search for the destination page. Once I find a path to the destination I record the hops necessary to get there. I try to find an even shorter path until I can no longer find a path that has not already been found.

Graphing the Results!

Once I realized that I could definitely beat this game with some simple programming, I quickly lost interest in the project. I decided to try graphing the output. Here is one of the first graphs that I created. This one originates at Dinosaur and performs 6 hops. There is lots of overlapping text, but it looks good!

An early prototype!
I was motivated to press on. I decided to have all links originate from the center of the viewport. I distribute outbound links evenly around the source article. As the number of outbound links increases, the distance between the source and destination article must increase to avoid overlapping text. Here is a screenshot of this idea in its' infancy.

Linux is the Source
In this screenshot you can see the idea coming together. Linux is at the center of the screen and outbound links are distributed in a circle around it. Unfortunately there are still some bugs at this point and things don't look quite right yet. I tweak the calculations involved in the locations of destination articles to yield a better result.

Starting to look Good
You can see the arrangement is starting to come together but there are still some rather large gaps between articles. It is really quite interesting to see what articles are linked to one another without the rest of the text "in the way". Apple Inc. was chosen as the test case for the following graph.

Apple Inc.
All of these graphs are generated as SVG files which render nicely using Inkscape providing that they aren't too large. Here is a screenshot of Inkscape showing the chaos involved in one of these images. You can see how a huge amount of outbound links actually exceed the bounds of the viewport.

Inkscape, most bugs are fixed at this point
Releasing the SVGs

If anyone wishes to see the original SVG files, I will be happy to share them. They are rather large so I won't bother hosting them until I have some serious interest.

Releasing the Code

I have decided not to release this code. There is nothing proprietary about what I have done and I have given you more than enough information to create your own implementation. This was a weekend experiment and as such is riddled with dependencies on my server and specific files that existed on my hard drive at the time. It is untested and is definitely not production code.

This became more of an art project than anything and the results are fantastic. If you decide to implement your own, I would love to see it!


  1. You can surf the varieties of escorts here. Escorts Service in Haridwar In the list of factors to get various people around the world. You can make your evening amazing with this quality Call Girls in Haridwar Every one of our escorts is completely ready to give you personal moments packed with the colour tone of avarice. Check our other services...
    Russian Call Girls in Gurgaon 
    Russian Call Girls in Jaipur
    Call Girls in Aerocity


  2. Our Escorts in Agra, They are very adventurous; therefore they will take you to places other tour guides wouldn’t dare to tread. Give these girls a chance to lead your way and you will live to remember this powerful tour. Check our other services...
    Female Escorts in Agra
    Russian Escorts in Agra
    Escorts in Agra
    Call Girls in Faridabad


  3. If you would like to book an escort like Call Girl in Mahipalpurfor an upcoming time, you can use our mail reservation type where you can specify a substitute woman should your perfect option not be available. You will need to provide us with your real name on the reservation type, but be confident we are very distinct and will keep your private information private and protected. Check our other Services...
    Russian Escorts in Delhi
    Escorts Service in Aerocity
    Escorts Service in Mahipalpur
    Escorts in Aerocity
    Escorts in Mahipalpur

  4. عزل اسطح بالدمام لحماية وزيادة عمر الخزان الافتراضي.

    شركه عزل فوم بالدمام عزل فوم


  5. Also known to come with crucial offerings for its sensible, down-to-earth and important clients,Call Girls in Gurgaon really means in fulfilling their lives with cheerful moments to make love in concern to ensure them live their lives in abetter way. Check our other Services...
    Charges of Call Girls in Gurgaon
    Escorts Service in Gurgaon
    Female Escorts in Gurgaon
    Gurgaon Escorts Service
    Housewife Escorts in Gurgaon