City Council data now available in Edmonton’s open data catalogue

Yesterday Edmonton became the first city in Canada to release “a fully robust set” of City Council datasets to its open data catalogue. A total of five datasets were released, including meeting details, agenda items, motions, attendance, and voting records. There are now more than 100 datasets available in the catalogue, with more on the way.

Here’s the video recording of the news conference:

The City also produced a video about the new datasets:

The Office of the City Clerk is responsible for managing Council & Committee meetings, boards, elections, and more. The release of this data (referred to as “Clerk’s data” by some City employees) is another example of the way that office has embraced technology over the years. Kudos to Alayne Sinclair and her team, as well as Chris Moore, Ashley Casovan, and the rest of the IT team for making this data available!

I’m really excited about the potential for this data. The information has long been available on the City’s website, but it was locked away in meeting minutes as “unstructured” data: relatively easy for humans to read, but not for software. Now that it is available as “structured” data in the open data catalogue, developers can write applications that take advantage of it. You can find the data under the City Administration tab of the catalogue. Unfortunately, the datasets only go back to June 1, 2011, rather than to the start of Council’s term in October 2010. The datasets are currently updated daily.

I’ve now had a chance to look through the data, and while it looks good, it is unfortunately incomplete at the moment. There’s quite a bit of data missing. I would love to do some statistical analysis on the data, but with so many missing records there’s a good chance that my conclusions would be incorrect. I have already summarized my findings and passed them along to the team, so hopefully they can resolve the issues quickly!
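Once the records are complete, even a simple tally becomes possible with structured data. The following is a minimal Python sketch that counts each member’s votes from a voting-records CSV. The column names `Member` and `Vote` are assumptions, not the catalogue’s actual schema, so check the real headers first; and with records still missing, any tally like this would be incomplete.

```python
import csv
from collections import Counter, defaultdict


def tally_votes(csv_file):
    """Count votes per council member from a voting-records CSV.

    Assumes hypothetical columns named "Member" and "Vote"; adjust
    them to match the headers in the actual catalogue download.
    """
    tallies = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        vote = row["Vote"].strip().lower()  # normalize "Yes", "YES", " yes"
        tallies[row["Member"].strip()][vote] += 1
    return tallies


# Usage (hypothetical local filename):
# with open("voting_records.csv", newline="", encoding="utf-8") as f:
#     for member, counts in sorted(tally_votes(f).items()):
#         print(member, dict(counts))
```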

I have already added functionality to ShareEdmonton for this data, and as soon as the datasets are fixed, I’ll release it. I hate to say “stay tuned” but there’s not much choice right now!

1.2 zettabytes of data created in 2010

For the last five years or so, IDC has released an EMC-sponsored study on “The Digital Universe” that looks at how much data is created and replicated around the world. When I last blogged about it back in 2008, the number stood at 281 exabytes per year. Now the latest report is out, and for the first time the amount of data created has surpassed 1 zettabyte! About 1.2 zettabytes were created and replicated in 2010 (that’s 1.2 trillion gigabytes), and IDC predicts that number will grow to 1.8 zettabytes this year. The amount of data is more than doubling every two years!
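To put the units in perspective: in decimal (SI) units a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes, so 1.2 ZB is 1.2 × 10^12 GB, the 1.2 trillion gigabytes quoted above. The Python sketch below does that conversion and, as a rough check on the growth claim, computes the doubling time implied by going from 1.2 ZB to 1.8 ZB in one year, assuming that rate held steady. It comes out to about 1.7 years, consistent with “more than doubling every two years”. The use of decimal rather than binary units is an assumption.

```python
import math

BYTES_PER_GB = 10**9   # decimal (SI) gigabyte, assumed
BYTES_PER_ZB = 10**21  # decimal (SI) zettabyte, assumed


def zb_to_gb(zettabytes):
    """Convert zettabytes to gigabytes using decimal units."""
    return zettabytes * BYTES_PER_ZB / BYTES_PER_GB


def doubling_time(start, end, years):
    """Years needed to double, assuming steady compound growth from start to end."""
    annual_factor = (end / start) ** (1 / years)
    return math.log(2) / math.log(annual_factor)


print(zb_to_gb(1.2))               # about 1.2e12, i.e. 1.2 trillion GB
print(doubling_time(1.2, 1.8, 1))  # about 1.71 years
```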

Here’s what the growth looks like:

How much data is that? Wikipedia has some good answers: exabyte, zettabyte. EMC has also provided some examples to help make sense of the number. 1.8 zettabytes is equivalent in sheer volume to:

  • Every person in Canada tweeting three tweets per minute for 242,976 years nonstop
  • Every person in the world having over 215 million high-resolution MRI scans per day
  • Over 200 billion HD movies (each two hours in length) – would take one person 47 million years to watch every movie 24/7
  • The amount of information needed to fill 57.5 billion 32GB Apple iPads. With that many iPads we could:
    • Create a wall of iPads, 4,005 miles long and 61 feet high extending from Anchorage, Alaska to Miami, Florida
    • Build the Great iPad Wall of China – at twice the average height of the original
    • Build a 20-foot high wall around South America
    • Cover 86 per cent of Mexico City
    • Build a mountain 25 times higher than Mt. Fuji
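A quick sanity check on EMC’s comparisons, assuming decimal units (1 GB = 10^9 bytes): 57.5 billion 32 GB iPads hold about 1.84 ZB, slightly more than 1.8 ZB, and 1.8 ZB spread over 200 billion movies works out to roughly 9 GB per two-hour HD movie. The Python sketch below does that arithmetic; whether EMC used decimal or binary units is an assumption.

```python
BYTES_PER_GB = 10**9   # decimal (SI) gigabyte, assumed
BYTES_PER_ZB = 10**21  # decimal (SI) zettabyte, assumed


def items_to_zb(count, gb_per_item):
    """Total volume, in zettabytes, of `count` items of `gb_per_item` GB each."""
    return count * gb_per_item * BYTES_PER_GB / BYTES_PER_ZB


def gb_per_item(total_zb, count):
    """Average size in GB if `total_zb` zettabytes were split across `count` items."""
    return total_zb * BYTES_PER_ZB / count / BYTES_PER_GB


print(items_to_zb(57.5e9, 32))  # 57.5 billion 32 GB iPads: about 1.84 ZB
print(gb_per_item(1.8, 200e9))  # 200 billion movies: about 9 GB each
```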

That’s a lot of data!

EMC/IDC has produced a great infographic that explains more about the explosion of data – see it here in PDF. One of the things that has always been fuzzy for me is the difference between data we’ve created intentionally (like a document) and data we’ve created unintentionally (sharing that document with others). According to IDC, one gigabyte of stored data can generate one petabyte (1 million gigabytes) of transient data!

Cost is, of course, one of the biggest factors behind this growth. The cost of creating, capturing, managing, and storing information is now just one-sixth of what it was in 2005. Another big factor is that most of us now carry the tools of creation with us everywhere we go: digital cameras, mobile phones, and so on.

You can learn more about all of this and see a live information growth ticker at EMC’s website.

This seems as good a time as any to remind you to back up your important data! It may be easy to create photos and documents, but it’s even easier to lose them. I use a variety of tools to back up data, including Amazon S3, Dropbox, and Windows Live Mesh. The easiest by far, though, is Backblaze: unlimited storage for $5 per month per computer, and it all happens automagically in the background.

Exploring Apps4Edmonton using Microsoft Live Labs Pivot

You’re going to hear a lot more about apps over the next few weeks! The deadline for submissions for the City of Edmonton’s Apps4Edmonton competition was Friday evening. Local developers came up with more than 30 really interesting and useful local apps, which will now compete for your votes and for the attention of the judges. You can learn more about the prizes and the competition here.

I started looking at some of the apps, and decided I wanted a better interface to browse them. I thought it would be nice to be able to sort the apps, to see a screenshot of each one, and to see which datasets each of the apps made use of. I also didn’t want to spend too much time on it, so with all of that in mind, this seemed like the perfect opportunity to experiment with Pivot.

Here’s what I came up with! Click on the image below to load the Apps4Edmonton Apps Directory in Pivot. You’ll need Silverlight 4 installed for it to work. Alternatively, if you have downloaded Pivot and have it installed on your computer, you can browse to this URL inside Pivot.


It might take a minute or two to load; if it doesn’t, just refresh the page. What you see are all the apps from the contest page, each with a screenshot, description, contest URL, and list of datasets. If you want to see only the apps that use the “Police Stations” dataset, for example, you can select it in the navigation pane on the left and the view will update.

Ever since TechEd, I’ve been really interested in Microsoft Live Labs Pivot, an interactive data visualization technology. It’s great for exploring large datasets, identifying relationships, visualizing patterns, etc. The Apps4Edmonton dataset isn’t very large of course, but the tool still does a great job.

How It Works

I started out by building a Pivot Collection in Microsoft Excel. Pivot has a pretty handy tool for turning spreadsheets into collections, so that’s what I used initially. I quickly realized, though, that I wanted to host the collection on the web and that I wanted others to help me refine the dataset.

I uploaded the spreadsheet to Google Docs and then downloaded the Just In Time Pivot Collection sample. After a little experimentation with the Google Docs API (which I had never used before), I had code working that creates my collection on the fly: it loads the spreadsheet from Google Docs, creates the collection, and then serves up the XML and Deep Zoom images.
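For anyone curious what the collection XML looks like, here is a minimal Python sketch that turns spreadsheet rows into a Pivot collection (CXML) document with a single “Dataset” facet. It is not the code behind the collection above, which builds on the Just In Time sample. The element and attribute names follow my reading of the Pivot XML schema and should be checked against it; the row keys (`name`, `url`, `description`, `datasets`) and the `apps.dzc` image path are hypothetical, and generating the Deep Zoom images is out of scope.

```python
import xml.etree.ElementTree as ET

# Namespace of the Pivot collection (CXML) schema; verify against the published XSD.
CXML_NS = "http://schemas.microsoft.com/collection/metadata/2009"
ET.register_namespace("", CXML_NS)  # serialize as the default namespace, no prefix


def _tag(name):
    """Qualify an element name with the CXML namespace."""
    return f"{{{CXML_NS}}}{name}"


def build_collection(title, apps, img_base="apps.dzc"):
    """Build a CXML string; `apps` is a list of dicts with hypothetical keys."""
    root = ET.Element(_tag("Collection"), {"Name": title, "SchemaVersion": "1.0"})
    categories = ET.SubElement(root, _tag("FacetCategories"))
    ET.SubElement(categories, _tag("FacetCategory"), {"Name": "Dataset", "Type": "String"})
    items = ET.SubElement(root, _tag("Items"), {"ImgBase": img_base})
    for index, app in enumerate(apps):
        item = ET.SubElement(items, _tag("Item"), {
            "Id": str(index),
            "Img": f"#{index}",  # index of the image inside the Deep Zoom collection
            "Name": app["name"],
            "Href": app["url"],
        })
        ET.SubElement(item, _tag("Description")).text = app["description"]
        if app["datasets"]:  # apps with no known datasets get no facet values
            facets = ET.SubElement(item, _tag("Facets"))
            facet = ET.SubElement(facets, _tag("Facet"), {"Name": "Dataset"})
            for dataset in app["datasets"]:
                ET.SubElement(facet, _tag("String"), {"Value": dataset})
    return ET.tostring(root, encoding="unicode")
```

Serving this string over HTTP, regenerated from the spreadsheet on each request, is what makes a collection “just in time”.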

The spreadsheet is mostly complete, but a few apps are missing datasets. This is because either it wasn’t immediately obvious which they were using, or they simply don’t use any that are part of the data catalogue. You can update the spreadsheet here.

If you’d like to experiment with creating your own just-in-time Pivot Collection, you can download the sample code here and the code for the collection I wrote here. I also made use of CutyCapt to generate screenshots. You’ll also want to check the XML schema.
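Screenshots for every app can be scripted around CutyCapt’s `--url` and `--out` options. The Python sketch below builds the command and runs it; the filename scheme is hypothetical, and on a headless Linux machine CutyCapt typically needs a virtual X display (for example, by wrapping the command in `xvfb-run`).

```python
import re
import subprocess


def screenshot_filename(app_name):
    """Turn an app name into a safe PNG filename (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", app_name.lower()).strip("-")
    return f"{slug}.png"


def cutycapt_command(url, out_path, binary="CutyCapt"):
    """Build the CutyCapt argument list for capturing `url` to `out_path`."""
    return [binary, f"--url={url}", f"--out={out_path}"]


def capture(url, app_name):
    """Run CutyCapt for one app and return the screenshot path."""
    out_path = screenshot_filename(app_name)
    subprocess.run(cutycapt_command(url, out_path), check=True)  # raises on failure
    return out_path
```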

Apps4Edmonton

There are some really great apps in the Apps4Edmonton competition, so check them out. You’ve got until September 10 to vote for your favorite ideas and apps!

And for full disclosure, I submitted ShareEdmonton to the competition. If you like it, vote for it!

UPDATE: Thanks to John for helping me get the Pivot Collection right!

Edmonton Neighbourhood Census Data

For a long time I’ve wanted to get the City of Edmonton’s neighbourhood census data in CSV format (or really any usable format other than PDF). Recently, with the help of Laura (and Sandra) at the City’s Election & Census Services department, whom I met at the Open City Workshop, I finally got it. And now you can have it too!

Download the Edmonton Neighbourhood Census Data in CSV

I’ve also emailed this to the City’s open data team, so hopefully they can get it in the data catalogue soon.

Visualizing the Data

Why is having the census data in a format like CSV useful? Well for one thing, it enables creatives to do stuff with that data through code or other tools. For instance, I was able to generate a heat map for the City of Edmonton:

The darker sections are more heavily populated; the lighter yellow regions are less populated.
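One simple way to shade a map like this is to bin each neighbourhood’s population into a handful of classes and give each class a colour from light yellow to dark red. The Python sketch below does that with `bisect`; the break points and hex colours are illustrative assumptions, not the ones used for the map above, and drawing the polygons is left to whatever mapping tool you prefer.

```python
from bisect import bisect_right

# Illustrative class breaks (residents per neighbourhood); not the original map's.
BREAKS = [1000, 2500, 5000, 10000]
# One colour per class, light yellow (sparse) to dark red (dense): len(BREAKS) + 1 entries.
COLOURS = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]


def colour_for(population):
    """Return the fill colour for a neighbourhood population."""
    return COLOURS[bisect_right(BREAKS, population)]


def colour_map(populations):
    """Map {neighbourhood: population} to {neighbourhood: hex colour}."""
    return {name: colour_for(pop) for name, pop in populations.items()}
```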

Not all neighbourhoods are reflected, as the City does not release details for neighbourhoods with a population between 1 and 49. Here are some other things we can learn from the data set:

  • Total population in the data set is 777,811, which means there are 4628 individuals unaccounted for (total for 2009 was 782,439).
  • The average neighbourhood population is 2424, or 3039 if you exclude neighbourhoods with a reported population of 0.
  • The median neighbourhood population is 2216.
  • Oliver and Downtown are the only two neighbourhoods with a population greater than 10,000.
  • More dwellings are owned (192,171) than rented (121,953).
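Figures like these are easy to reproduce from the CSV. A minimal Python sketch, assuming the file has columns named `Neighbourhood` and `Population` (check the actual headers); passing the official 2009 total of 782,439 as `official_total` gives the unaccounted-for count. The post doesn’t say whether the median included zero-population neighbourhoods, so the sketch simply uses every row.

```python
import csv
import statistics


def summarize(csv_file, official_total=None):
    """Summary statistics from a census CSV with hypothetical
    "Neighbourhood" and "Population" columns."""
    rows = [
        (row["Neighbourhood"], int(row["Population"].replace(",", "")))
        for row in csv.DictReader(csv_file)
    ]
    populations = [pop for _, pop in rows]
    nonzero = [pop for pop in populations if pop > 0]
    summary = {
        "total": sum(populations),
        "mean": statistics.mean(populations),
        "mean_excluding_zero": statistics.mean(nonzero),
        "median": statistics.median(populations),  # over every row, zeros included
        "over_10000": sorted(name for name, pop in rows if pop > 10000),
    }
    if official_total is not None:
        summary["unaccounted"] = official_total - summary["total"]
    return summary
```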

ShareEdmonton

Another reason having this data in CSV is useful is that app developers can more easily integrate it into the things they are building. For example, all the census data is now available at ShareEdmonton! When you view a neighbourhood, you’ll see the census data on the right side (see Alberta Avenue, for example). You can also browse neighbourhoods by population. I’ve also fixed the neighbourhood search, so it works better now.

This is just the first of a few neighbourhood-related updates this month, so stay tuned for more!

Apps4Edmonton

Yesterday the City released more information on the Apps4Edmonton competition. The first phase, from now until May, is “accepting community ideas”. Basically they want you to tell them what data you want. Aside from the obvious “we don’t know what we don’t know” problem, I think the community has done a pretty good job of defining desired data sets already.

The City had a great start in January with the launch of the data catalogue, but we need more data, especially data like the census data, which many others and I have been asking for since the day the PDFs were released. If I was able to acquire this before the open data team was, there are clearly some internal issues that need to be worked out. I hope they get everything resolved for the competition, because it’ll be a pretty boring one if we still only have twelve data sets (New York and other cities had dozens, maybe even hundreds, before their competitions).

That said, I know there are passionate, smart people working on it. Email opendata@edmonton.ca if you have data set requests or want to get involved in Apps4Edmonton.

Open Data comes to Edmonton

Today I’m excited to share the news that Open Data has arrived in Edmonton! In a presentation to City Council this afternoon, Edmonton CIO Chris Moore will describe what the City has accomplished thus far and will outline some of the things we can look forward to over the next six months (I’ll update here after the presentation with any new information). This morning, he announced the initial release of data.edmonton.ca, the City of Edmonton’s open data catalogue. Starting immediately, developers can access 12 different data sets, including the locations of City parks, locations of historical buildings, and a list of planned road closures.

You can download the report to Executive Committee here in PDF.

The report was created in an open fashion – the information inside was provided by 39 contributors who had access to a shared document on Google Docs.

Data Catalogue

The data catalogue is currently in the “community preview” phase, which basically means that the City of Edmonton may make breaking changes. Critically, the data available in the catalogue is licensed under very friendly terms:

“The City of Edmonton (the City) now grants you a worldwide, royalty-free, non-exclusive licence to use, modify, and distribute the datasets in all current and future media and formats for any lawful purpose.”

Developers access the data in the catalogue using the APIs. This might seem a little cumbersome at first, but it actually means you can programmatically traverse and download the entire catalogue! Developers can also run simple queries and view preview data on each data set page.

The catalogue features a prominent “feedback” link on every page, so check it out and let the City know how to make it better.

OGDI

The City of Edmonton’s data catalogue is built on Microsoft’s Open Government Data Initiative (OGDI) platform. OGDI is an open source project that makes it easy for governments to publish data on the web. The City of Edmonton, the first major government agency in North America to use OGDI, will be contributing enhancements back to the project. OGDI is built atop the Windows Azure platform and exposes a REST interface for developers. By default it supports the OData, JSON, and KML formats. Developers can access OGDI using their technology of choice, and C#, Java, and PHP developers can make use of the toolkits provided by Microsoft.
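As a rough illustration of using that REST interface without a toolkit, the Python sketch below builds a query URL with the standard OData `$filter` and `$top` options and fetches the result as JSON. The service root is a placeholder, the dataset name in the usage is hypothetical, and the `format=json` parameter is an assumption about how OGDI selects its output format; check the catalogue’s developer documentation for the real endpoint and parameters.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder service root: substitute the endpoint listed on data.edmonton.ca.
SERVICE_ROOT = "https://example.invalid/v1/edmonton"


def query_url(dataset, filter_expr=None, top=None, fmt="json"):
    """Build a query URL using OData system query options."""
    params = {}
    if filter_expr:
        params["$filter"] = filter_expr  # e.g. "name eq 'Example'"
    if top is not None:
        params["$top"] = str(top)  # limit the number of rows returned
    if fmt:
        params["format"] = fmt  # assumption: OGDI's output-format switch
    # Keep "$" and quotes readable; encode spaces as %20 rather than "+".
    query = urlencode(params, safe="$'", quote_via=quote)
    return f"{SERVICE_ROOT}/{dataset}" + (f"?{query}" if query else "")


def fetch(dataset, **options):
    """Download and parse one query result (makes a network request)."""
    with urlopen(query_url(dataset, **options)) as response:
        return json.load(response)


# Usage (hypothetical dataset name): fetch("Parks", top=10)
```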

History of Open Data in Edmonton

We have been talking about open data for roughly a year now (and probably even longer). On February 18, 2009, Edmonton Transit officially launched Google Transit trip planning, which made use of a GTFS feed provided by ETS. At TransitCamp Edmonton on May 30, 2009, that data was made available to local developers. I led a discussion about open data a couple of weeks later at BarCampEdmonton2, on June 13, 2009. Councillor Don Iveson submitted a formal inquiry on open data to City administration on October 14, 2009. A few days later, the community talked again about open data at ChangeCamp Edmonton on October 17, 2009, focusing on Councillor Iveson’s inquiry. That event led to the creation of the #yegdata hashtag, a UserVoice site to identify potential data sets, and a number of smaller follow-up events. It also prompted Chris Moore to open up access to the creation of his report. On November 23, 2009 the City of Edmonton hosted an Open Data Workshop at City Hall that was attended by about 45 people.

What’s next?

First and foremost, developers need to start using the data! There will also be opportunities to provide feedback on the catalogue, to help prioritize new data sets, and to get involved with crafting the City’s strategy. Here’s the Program Plan for the City’s Open Data Initiative:

  • January 13, 2010: Initial release of City of Edmonton data catalogue
  • January 2010: Sessions with utility & organizational partners to obtain more data
  • February 2010: Public Involvement Plan
  • February – April 2010: Official data catalogue release, application competition!
  • March – April 2010: Development & approval of open data strategy for the City of Edmonton
  • May 2010: Open Data Administrative Directive, approved by City Manager
  • May – June 2010: Open Data Road Show, to communicate the strategy

In Vancouver, the policy came first and the data catalogue came second. In Edmonton we’re doing the reverse. We end up with the same result though: by the spring we’ll have a data catalogue in use by developers, and an official policy and strategy for open data in the future. This is fantastic news for all Edmontonians!

Congratulations & Thanks

Congrats and thanks to: Chris Moore for providing the leadership necessary at the City of Edmonton for all of this to become a reality; James Rugge-Price and Devin Serink, for organizing the workshop in November, for doing most of the behind-the-scenes work, and for always keeping the discussion alive and interesting; Jacob Modayil, Stephen Gordon, Jason Darrah, and Gordon Martin for supporting this initiative from the beginning, and for bringing valuable experience and leadership to the table; Don Iveson, for recognizing the positive role that open data will play in building a better Edmonton; all of the members of the community who have contributed ideas and helped to spread the word about open data; all of the other City of Edmonton employees who have supported open data in Edmonton. And finally, thanks to Vancouver, Toronto, and everyone else who came before us for leading the charge.

Enough reading – go build something amazing!

Edmonton Sun violates the EPS Crime Map Terms of Use

Back in July, the Edmonton Police Service launched its Neighbourhood Crime Mapping site. Like most people I was quite enthusiastic about the site, until I read the terms of use and realized how restrictive they were. Basically, you can look at the numbers, but you can’t do anything with them (such as publish them on a blog). The Crime Mapping site is not open data. I emailed back and forth with the EPS, and was told that they wouldn’t be changing the terms of use. And they haven’t.

That didn’t stop the Edmonton Sun, however. They apparently ignored the terms of use altogether, and published an article on December 20th summarizing a number of statistics from the website:

Some of Edmonton’s roughest neighbourhoods faced markedly fewer crimes in 2009, according to police statistics.

The statistics came through a new crime mapping system launched by Edmonton police last summer.

I had asked for permission to do something similar and was turned down. After reading the Sun article, I emailed the EPS to find out if the terms of use had been changed (despite the text on the website staying the same). Here’s what Acting S/Sgt. John Warden wrote back:

The Edmonton Sun did not have the EPS’ permission to use the information from the Crime Mapping website and the EPS is dealing directly with the Edmonton Sun in relation to this.

I emailed back a couple of follow-up questions, but have not yet received a response. The Edmonton Sun article is still active on the website, so I’m not exactly sure what “dealing directly with the Edmonton Sun” means.

I’m annoyed by this, obviously. Was it an honest mistake? Maybe. Is it a case of a large media organization getting off the hook? Maybe. Will it happen again? Probably. No one reads the fine print, we all know that.

I don’t think the current terms of use are appropriate, and I strongly urge the Edmonton Police Service to change them.

Canadian Finals Rodeo (CFR) Attendance Numbers

The 2009 Canadian Finals Rodeo wrapped up on Sunday, and although attendance was down from previous years, it was still pretty good. I’m always disappointed, however, when the press release or media article comes out and compares attendance only with the previous year, or sometimes with the record year. I’m often more interested in trends, and in comparing with other events. Slowly but surely, I’ll gather all of the data to make that easier! So far I have:

And now, I have some data for CFR. Here are the attendance numbers for this year compared with last year:

Day 4 is the Saturday, and is always higher because there are both matinée and evening events. Here are the attendance numbers from 2005 to 2009:

As you can see attendance peaked in 2006, the record year for CFR.

Download the 2008/2009 attendance data in CSV

Download the 2005-2009 attendance data in CSV
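Comparing two years day by day is straightforward once the numbers are in CSV. A minimal Python sketch, assuming a file with a `Day` column plus one column per year (for example `2008` and `2009`); the actual layout of the downloadable files may differ.

```python
import csv


def percent_change(old, new):
    """Percentage change from old to new, rounded to one decimal place."""
    return round((new - old) * 100 / old, 1)


def compare_years(csv_file, old="2008", new="2009"):
    """Return per-day (day, old, new, % change) rows and the overall % change."""
    rows = []
    for row in csv.DictReader(csv_file):
        before = int(row[old].replace(",", ""))
        after = int(row[new].replace(",", ""))
        rows.append((row["Day"], before, after, percent_change(before, after)))
    total_before = sum(r[1] for r in rows)
    total_after = sum(r[2] for r in rows)
    return rows, percent_change(total_before, total_after)
```

Pointing the same function at other year columns in the 2005-2009 file gives the longer trend rather than just the one-year comparison.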

Open Data at ChangeCamp Edmonton

Tomorrow morning local politicians, bureaucrats, and ordinary citizens will gather at the University of Alberta for ChangeCamp Edmonton. I’m encouraged by the number of people that have registered, and by the conversations that have already started. That’s what tomorrow is all about: getting people together to discuss ideas and solutions.

I don’t know exactly which topics people will want to discuss tomorrow, but I know for sure that open data will be one of them. There’s significant momentum building for the concept, and we’re starting to see progress on making it happen throughout Canada (and elsewhere).

Open data here in Edmonton received a nice boost this week from Councillor Don Iveson when he submitted a formal inquiry to City administration:

In local, national and sub-national governments around the world there is a trend toward making up-to-date government information freely available on-line in generically accessible data formats as so-called ‘Open Data’.

  1. What level of awareness does the City Administration have regarding Open Data in municipal government?
  2. What current initiatives are underway within City Administration that might qualify under the spirit of Open Data?
  3. What further initiatives are under consideration within the city, and on what basis are they being evaluated?
  4. Is Administration monitoring any successes and or challenges with this trend in other jurisdictions, especially large Canadian cities, and if so what can be shared with Council?
  5. What would City Administration’s recommendation be on next steps regarding Open Data plans or strategies?

I know there were already some things going on behind the scenes at the City of Edmonton, but Don’s inquiry should speed them up and lend them credibility. This is an important step.

I’ve been pushing for open data in Edmonton for a while now, along with many others. I think ChangeCamp will be a great opportunity to further discuss the concept and next steps. I generally think about open data in the context of a municipality, but there’s room for discussion at the provincial and federal levels too. Here are some of the key things I think we can cover:

  • Let’s make sure everyone (citizens, politicians, City administration) is on the same page about what we mean by “open data”. This could cover both the high level (what kinds of data are open) and the low level (what formats are considered open).
  • What is the City working on? What are citizens working on? Let’s get a status report from both sides.
  • What kinds of data could be made open? Which data is most in demand by citizens? What data has been made available in other cities, such as Vancouver or Toronto?
  • Licensing is vital for open data to work. We need to ensure data is licensed as permissively as possible, otherwise we’re restricting its utility. Which licenses make sense? What have other municipalities used?
  • Often lost in the discussion about what data to make available is how to be notified of changes to that data. RSS feeds, email subscriptions – how should citizens be notified when data is updated or otherwise changed?
  • Another aspect that we need to consider: the creation of data. There is lots and lots of data that our governments can start making available in open formats, but there’s even more data created on a daily basis. What can we do to ensure that it is open data also? How about APIs or other mechanisms for citizens to provide input/data? Open 311 comes to mind.

Here are some links that might be useful tomorrow:

See you in the morning!

Open Data in Edmonton? Follow Vancouver’s lead

Last week Vancouver launched an open data portal, providing one-stop-shopping for open data provided by the city. David Eaves called the launch “a major milestone for Vancouver” and explained:

The Data Portal represents an opportunity for citizens, especially citizen coders, to help create a City that Thinks Like the Web: a city that enables citizens to create and access collective knowledge and information to create new services, suggest new ideas, and identify critical bugs in the infrastructure and services, among other a million other possibilities.

He was also quick to point out that getting access to the data is just the beginning. Citizens have to use it, or risk losing it. The next day he launched VanTrash, an application to make garbage collection sexier. Use it or lose it indeed!

I think it’s interesting that he started with garbage collection, because I too identified that as an area that could use some innovation. A couple of months ago, I spent about an hour on the phone with a manager in the Waste Management department at the City of Edmonton, trying to get access to the data behind the garbage collection schedules. Currently you can enter your address here to download your collection schedule in PDF. But if you want to find the schedule for a different part of the city, you’re out of luck. And even if you manually tried enough addresses to find all the zones and collection schedules, they’d be in PDF, which means you can’t easily add them to a calendar.

By the end of the call, I think he finally understood what I was after, and he said he’d have to get back to me. He never did, unfortunately. I can only hope that my request had an impact and that it will eventually help to open the data floodgates in Edmonton.
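To make the point about calendars concrete: if the collection dates for each zone were published as plain data, turning them into a calendar file anyone could subscribe to would take only a few lines. The Python sketch below writes an iCalendar (RFC 5545) file of all-day events; the zone name, dates, product ID, and UID domain are hypothetical, since the City hasn’t published this data.

```python
from datetime import date, datetime, timezone


def collection_calendar(zone, dates):
    """Build an iCalendar document with one all-day event per collection date."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//garbage collection schedule//EN",  # hypothetical product id
    ]
    for day in dates:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{zone}-{day:%Y%m%d}@example.invalid",  # must be unique per event
            f"DTSTAMP:{stamp}",
            f"DTSTART;VALUE=DATE:{day:%Y%m%d}",  # all-day event
            f"SUMMARY:Garbage collection ({zone})",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # iCalendar uses CRLF line endings


# Hypothetical zone and dates, for illustration only.
example_ics = collection_calendar("zone-a", [date(2009, 11, 2), date(2009, 11, 9)])
```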

Open Data doesn’t have to be difficult!

Take a look at the data available at Vancouver’s data portal. Most of the data there is simple and exists elsewhere, in a less “creative friendly” format. A good example is the list of libraries. You can download the data in CSV, XLS, or KML formats, but it really just comes from the Vancouver Public Library website. The CSV contains each library’s name, latitude, longitude, and address. Simple stuff, but potentially really useful if combined with other data sets.

Here’s an example in Edmonton. Let’s say I want to know how the crime rate of neighbourhoods with libraries compares to those without. What data would I need for that?

  • A list of libraries, with their locations (see below)
  • A list of neighbourhoods, with their boundaries
  • Crime statistics by neighbourhood
  • Census data for neighbourhoods to find comparable ones without libraries

Could you find this today? Yes, but it’s definitely not easy! The EPL website lists the libraries with addresses, so you’d need to figure out the lat/long on your own. The City of Edmonton website lists the neighbourhoods, but you’d need to figure out the boundaries on your own. The EPS website provides reported crimes by neighbourhood. And finally, the City of Edmonton provides census data for neighbourhoods in PDF.
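The fiddliest step is working out which neighbourhood each library falls in once you have its latitude/longitude and the neighbourhood boundaries. A standard way to do it is the ray-casting point-in-polygon test, sketched below in Python. It assumes each boundary is a simple polygon given as a list of (longitude, latitude) vertices, a hypothetical format since no official boundary file was available, and it treats coordinates as planar, which is a reasonable approximation at city scale.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray heading east from the point crosses."""
    inside = False
    j = len(polygon) - 1  # previous vertex, so edge (j, i) wraps around
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > lat) != (yj > lat):  # edge straddles the ray's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:  # crossing lies east of the point
                inside = not inside
        j = i
    return inside


def neighbourhood_for(lon, lat, boundaries):
    """Return the first neighbourhood whose polygon contains the point, or None."""
    for name, polygon in boundaries.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```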

If all of the above data were available in CSV format, finding the answer would have taken a matter of minutes (I should point out that not all of that data exists at Vancouver’s portal either). Instead, I had to do a lot more work. The very rough result (rough because I compared against a random sample of similarly populated neighbourhoods) is that neighbourhoods with libraries were 1.5 times more likely to have crime than neighbourhoods without libraries in 2008. If you don’t count Downtown, though, the crime rate is about the same for neighbourhoods with and without libraries.
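With everything in one table, the comparison itself is only a few lines. The Python sketch below computes a pooled crime rate (reported crimes per 1,000 residents) for neighbourhoods with and without a library and returns their ratio, optionally excluding neighbourhoods such as Downtown. The record layout is hypothetical, and this pooled-rate approach is a simpler, swapped-in method, not the random-sample comparison described above.

```python
from typing import NamedTuple


class Neighbourhood(NamedTuple):
    name: str
    has_library: bool
    crimes: int      # reported crimes in the year
    population: int  # census population


def crime_rate(group):
    """Pooled reported crimes per 1,000 residents across a group of neighbourhoods."""
    crimes = sum(n.crimes for n in group)
    population = sum(n.population for n in group)
    return crimes * 1000 / population


def library_rate_ratio(neighbourhoods, exclude=()):
    """Ratio of the crime rate with libraries to the rate without."""
    rows = [n for n in neighbourhoods if n.name not in exclude]
    with_library = [n for n in rows if n.has_library]
    without_library = [n for n in rows if not n.has_library]
    return crime_rate(with_library) / crime_rate(without_library)
```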

Maybe you’re thinking “what a useless example” and that’s fine – it is one of just hundreds or thousands of possible uses for that data! Just imagine what would be created if software developers and other creatives in Edmonton had access to the data.

Libraries Data

With all this talk of open data, why not give you some? I’ve created a CSV of the Edmonton Public Library locations in exactly the same format as the Vancouver Public Library data (minus eplGO in the Cameron Library). Enjoy!

Download the Edmonton Public Library location data in CSV

Onward in Edmonton

I’ve heard rumblings that the City of Edmonton will be doing some stuff in the open data space in the next couple of months, but I’m not holding my breath. There haven’t been enough conversations taking place. I’m hopeful that the right people are envious of the progress that has been made in Vancouver, however. I sure am!

Attendance Numbers for Edmonton’s Capital EX

Edmonton’s Capital EX wrapped up yesterday. Sharon and I visited on Thursday evening and had a good time. Today Northlands released the attendance numbers, and though slightly lower than previous years, the ten-day festival still recorded an impressive 717,966 visits. I had been looking forward to the final numbers so that I could compare them with previous years.

Here are the attendance numbers for the last ten years (you can download the raw data below):

Though much of the data is missing, I was able to track down some numbers going all the way back to 1879:

After getting this information, I decided to compare it to the population of Edmonton for the same time periods. Here is the comparison for the last ten years:

And the same comparison starting in 1879:

 

A couple things to note about the data in this post:

  • The event changed from Klondike Days (adopted in 1962) to Capital EX in 2006. This explains the large drop that year.
  • The event was a six-day fair from 1912 to 1967 and a ten-day fair thereafter (I believe; it has certainly been ten days for the last 20 years or so). I haven’t adjusted the figures for this.
  • The population data, which comes from the City of Edmonton, doesn’t account for surrounding communities.
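The attendance-to-population comparison boils down to a ratio per year. A minimal Python sketch, assuming the CSV has `Year`, `Attendance`, and `Population` columns with blank cells where figures are missing (check the actual file). Note that the result is visits per resident, not the share of residents who attended, since one person can visit several times.

```python
import csv


def visits_per_resident(csv_file):
    """Map year -> attendance / population, skipping years missing either figure."""
    ratios = {}
    for row in csv.DictReader(csv_file):
        attendance = row["Attendance"].strip()
        population = row["Population"].strip()
        if attendance and population:
            ratios[int(row["Year"])] = (
                int(attendance.replace(",", "")) / int(population.replace(",", ""))
            )
    return ratios
```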

Download the Capital EX Attendance & Edmonton Population data in CSV

Sources: iNews880, CBC, Edmonton Journal, Amusement Business (1, 2, 3, 4), City of Edmonton, Capital EX Fair History