EPS responds to my query about the Crime Mapping terms of use

Last week I posted about the new Crime Mapping site launched by the Edmonton Police Service (EPS). One of my criticisms of the site was the very restrictive terms of use or disclaimer that you must agree to before you can use the site. In particular:

While it is acceptable to pass the website link on to others in your community, you will not share the information found on the website with others other than with members of the Edmonton Police Service or other law enforcement agencies; and

You will only use this website and the information in it so you can inform yourself of, and participate in, this community policing initiative;

This is problematic because it effectively means that you can’t do anything with the data that EPS has now made available. You can look at it using their site, but you can’t then blog about that data, or add it to a PowerPoint presentation.

I emailed a request for clarification and received a response from Amit Sansanwal, Criminal Statistics Coordinator at EPS. I asked for and was granted permission (by their legal department) to publish his response:

The EPS views the Neighbourhood Crime Mapping website as a valuable addition to our community policing initiative.

The EPS, however, is of the view that this tool can only be effective and achieve its community policing objectives if people seeking the information visit the Neighbourhood Crime Mapping website directly themselves.

By visiting the website, hopeful participants in this EPS community policing initiative can learn about what kind of information is available to them (e.g. crime prevention and partnership programs) and how it fits within this program.

We appreciate your interest in this program and hope that you tell others about the existence of the Neighbourhood Crime Mapping website.

In a later email, Amit pointed out that the current preferred way to get EPS statistics is through Statistics Canada.

The crux of their position, if I understand it correctly, is that they don’t want people looking for crime statistics to come across an inaccurate or malicious source. That seems reasonable. The problem is that such a position assumes people are actively seeking the information. By opening up access to the data and allowing others to make use of it, they can potentially reach far more Edmontonians, not to mention the benefits that could come from mashups or other data visualizations. Furthermore, it seems as though they just want to force people to use the Crime Mapping site so that they can promote additional programs to users.

The Crime Mapping site is fun to look at, but I would argue its utility is restricted by the current terms of use. Unfortunately, it doesn’t look like that’ll be changing any time soon.

Data on Edmonton’s new 12-ward system

Last night City Council voted in favor of changing from the current 6-ward system to the more common 12-ward system used throughout North America. The change will take effect for next year’s municipal election. For more background, check out Dave’s post. You can also check out the City of Edmonton’s page for more information.

As an advocate of open data, I thought I’d share with you some data related to the new wards below. All of the data is available on the City website somewhere, but not in an easily consumable form. I’ve done the legwork to make it accessible.

Amendments made to the motion last night affected the wards a little:

  • Grovenor and McQueen neighbourhoods moved from Ward 1 to Ward 6.
  • CPR West moved to Ward 8 from Ward 10.
  • Calgary Trail North and Calgary Trail South moved from Ward 11 to Ward 10.
  • Some ravine boundaries were changed from “in-the-middle” to “top-of-bank”.

Here are the stats on the new wards:

In a table (download CSV file here):

Ward Population Electors
1 62,625 51,061
2 67,306 54,704
3 63,819 49,465
4 67,811 52,666
5 62,424 49,615
6 70,840 62,152
7 63,549 51,865
8 66,196 57,189
9 68,214 53,889
10 61,276 49,935
11 64,770 51,329
12 63,609 48,529

The average population of each ward is 65,203 and the average number of electors for each ward is 52,700. This data comes from the 2009 Municipal Census.
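If you want to check those averages yourself, here is a quick sketch in Python. The figures are copied straight from the table above; the variable names are just my own choices.

```python
# Population and electors per ward, copied from the table above.
wards = {
    1: (62625, 51061),
    2: (67306, 54704),
    3: (63819, 49465),
    4: (67811, 52666),
    5: (62424, 49615),
    6: (70840, 62152),
    7: (63549, 51865),
    8: (66196, 57189),
    9: (68214, 53889),
    10: (61276, 49935),
    11: (64770, 51329),
    12: (63609, 48529),
}

total_population = sum(pop for pop, _ in wards.values())
total_electors = sum(electors for _, electors in wards.values())

avg_population = total_population / len(wards)
avg_electors = total_electors / len(wards)

print(f"Average population: {avg_population:,.0f}")  # 65,203
print(f"Average electors: {avg_electors:,.0f}")      # 52,700
```

The same dictionary makes it easy to answer other questions, such as which ward has the highest share of electors.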

Here is the number of neighbourhoods in each ward:

I’ve also compiled a list of neighbourhoods in each ward which you can download in CSV here. Or if you’d rather just look, you can download the list in PDF here.

I’m trying to track down or create a good quality map of the 12 wards, but this’ll have to do for now. What I’d really love is lat/long coordinates for each ward. If you have something better than that graphic, let me know!

Go do something useful or interesting with this data, and then tell me about it. I’m looking to collect local examples to strengthen the case for open data at the City of Edmonton!

UPDATE: Here’s a better map in PDF format.

UPDATE2: Here’s an even better color map showing the wards and neighbourhoods in PDF format.

Calgary takes first steps toward becoming an Open City

A motion will go before Calgary’s City Council next week that outlines the first steps in the process of making Calgary an Open City. Calgary follows in the footsteps of Vancouver, which passed a similar motion back in May. DJ has all the details on the Calgary motion here. I think it’s pretty cool that the news is first announced on a blog!

Calgary’s motion will result in a report from City Administration to be presented to Council no later than December 2009, outlining the overall strategy for making Calgary an open city. In particular, the report will identify “opportunities to make more of The City’s data open and accessible while respecting privacy and security concerns, and ensuring that data is available through use of open standards, interfaces and formats.” Other aspects of the strategy will include increasing online citizen participation, procuring and supporting open source technologies, and increasing the number of City services available online.

This is exciting news for developers and other creative professionals in Calgary and elsewhere. I’ve been pushing for open data in Edmonton recently, and I really hope we’re not too far behind our southern neighbours on this issue. There are a number of advantages to making data available in open standards and formats:

  • Citizens can subscribe to data that is of interest to them
  • Data can be mashed together in new ways, revealing new information
  • Visualization of data can help citizens make better decisions
  • Citizens can work together to organize data
  • Government can learn more about its data from citizen contributions

Additionally, using well-understood, open formats such as XML or CSV helps to “future-proof” the data. You don’t need proprietary technology to read a CSV file – any programming language or software platform will work.
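To illustrate how little is needed, here is a minimal sketch in Python that reads CSV using only the standard library. The sample text and its column names are assumptions made up for illustration; they do not correspond to any file Calgary has published.

```python
import csv
import io

# Sample CSV text. In practice this would come from a downloaded file;
# the header names and values here are assumptions for illustration.
sample = """ward,population,electors
1,62625,51061
2,67306,54704
3,63819,49465
"""

rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    share = int(row["electors"]) / int(row["population"])
    print(f"Ward {row['ward']}: {share:.1%} of residents are electors")
```

Nothing here depends on a particular vendor or product, which is exactly the point of publishing in an open format.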

One issue that isn’t mentioned in Calgary’s motion, but which is very important, is licensing. When Calgary does make data available, it should do so in the least restrictive way possible: public domain, Creative Commons, or something similar. It would be a shame if they made a ton of data available and then wrapped it in ridiculous terms of use.

Open data is about empowering citizens to work with their governments. I’m encouraged by the recent interest among municipalities in Canada, and I hope the trend continues.

Edmonton Police Service (EPS) Crime Mapping tool now online

Back in June we learned that the Edmonton Police Service was planning to launch a new website that would enable citizens to find crime statistics for their neighbourhoods. This afternoon, the EPS Crime Mapping tool went online, and it does just that. You can search for stats on eight types of crimes in any neighbourhood across any time period since 2007. From the press release:

The new crime mapping tool will provide members and citizens with a better understanding of what is going on the neighbourhoods they work and live in.

I’ve been playing with the site today, and I like it. There are pros and cons, however.

How It Works

The first step is to agree to the disclaimer – more on that in a minute. Next, you pick the crimes you want statistics for. The eight types include assault, break and enter, homicide, robbery, sexual assaults, theft from vehicle, theft of vehicle, and theft over $5000. Third, you pick the neighbourhood – there are 357 listed in the system. Finally, you select the time period. There are some quick selections such as yesterday or the last 30 days, or you can enter any two dates. Click “Show Crimes” and your neighbourhood appears on the map, covered in colored dots to represent the reported crimes. Here’s what Oliver looks like for the last 30 days with all crime types selected:

There’s also a “View Statistics” tab above the map that will show you a table for the last three years broken down by month, with a graph below that.

The Good

There are some really good things about this site. First and foremost, the data is excellent. I’m glad that they included everything up-front, instead of doing a test release or something to start. Second, it’s built using Google Maps. This is a big win for EPS – it’s a stable technology that Google is continually making better, and I would guess that most Edmontonians are familiar with it. Third, it’s fast. Almost as soon as you click the button, your stats appear.

The Bad

There are two things about the site that I don’t like. First is the disclaimer – it’s too restrictive. These two points in particular are problematic:

While it is acceptable to pass the website link on to others in your community, you will not share the information found on the website with others other than with members of the Edmonton Police Service or other law enforcement agencies; and

You will only use this website and the information in it so you can inform yourself of, and participate in, this community policing initiative;

That effectively means you can’t do anything with the data. This is in direct contrast with what the press release would lead you to believe:

Providing our citizens with the real picture of neighbourhood crime is the first step in engaging them to do something about it. Members of the public will be better equipped with knowledge to work collaboratively with the EPS to reduce and prevent crime.

What’s the point of making the data available if you can’t do anything with it? Why can’t I blog about the crime stats in a particular neighbourhood? Or mash the crime stats up with some other data? I challenge the notion that simply being able to see the dots on a map equips me to do something about crime in my neighbourhood.

I’ve emailed the feedback address listed on the site asking about this, but I haven’t yet received a response.

The second bad thing about the site is that while it does make data available, it does so in an opaque and closed way. If Edmonton is going to become an open city (with respect to data), sites like the crime mapping tool need to provide information for multiple audiences. One is the average citizen who is happy to click around on the map. Another increasingly important audience is the creative professional who wants to do something with the data, and needs it in a machine-readable format such as a CSV or XML file.

The Undocumented API

The first thing I did after testing the site with my neighbourhood was poke around for clues about where the data comes from. It didn’t take long to realize that there’s a JSON web service behind the application. You can access it here. It’s probably not meant for public consumption, but it’s there and it works. I was able to throw some code together in about 30 minutes to get data out of the service. While it would still be good to have static data files available, the API largely negates the con I mentioned above. As it is unofficial however, who knows if it will remain active and working, so enjoy it while you can.

Final Thoughts

Overall I think the Crime Mapping tool is excellent. We need more applications and services like this, though with less restrictive terms/licensing and easier-to-access data. Kudos to EPS for building this, and let’s hope they improve it.

UPDATE: There are more details in this article. For instance, the tool apparently cost $20,000 to build, and is automatically updated each morning.

Foundations for an Open Edmonton

Today at BarCamp, I led a discussion around building an open Edmonton. Inspired by the great things happening in Vancouver, I wanted to stimulate the discussion here. I started with two fundamentals:

  1. The City of Edmonton must have the desire to be an open city.
  2. The primary audience is the Creative Class of Edmonton, the secondary audience is all citizens.

Next, I shared what I feel are the five basic foundations of an open city:

  1. Free – both financially and philosophically
  2. Permissive Licensing – things like Creative Commons, or ideally public domain
  3. Open Standards – formats that anyone can read and write
  4. Plentiful Data – make as much data available as possible
  5. Timely Access – eliminate delays and give everyone equal access

After my five slides (a photo for each of the above) we got into a great discussion about the idea. Here are some of the questions that came up:

  • Are citizens ready for so much data?
  • Why would City Council not want to be an open city?
  • What is the current state of progress on the idea in Edmonton?
  • How does privacy & security factor in?
  • What are some great examples of other cities doing this?

All things that we need to explore further. I’m not sure what the next step is, but eventually, I think it would be great to make a presentation on becoming an open city to Council.

In the meantime, Edmonton has already made some data available – a Google Transit data feed – and some other examples include London’s mySociety. Also, be sure to read Vancouver’s Open City Motion.

TransitCamp Edmonton: Data for Developers

I’ve been looking forward to this presentation for a long time! As you may know, I’ve been one of the more vocal citizens asking for an API or data dump from Edmonton Transit. I think only positive things will result from giving everyone access to the data! ETS simply doesn’t have the resources to build interfaces for the iPhone, SMS, etc., so releasing the data would enable other people to build them instead!

Today at TransitCamp Edmonton, I’m pleased to share with you that ETS has become the 2nd transit authority in Canada (and 29th in the world) to release their route and schedule information for free in the GTFS format!

Here are the slides from my presentation:

The ETS GTFS data is about 16 MB compressed and 177 MB uncompressed, so it’s quite a bit of data. If you’re looking for some help getting started, I’d suggest checking out the googletransitdatafeed project and the timetablepublisher project.
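If you just want to peek inside a feed, a GTFS feed is a zip archive of CSV text files such as stops.txt and routes.txt, so the standard library is enough. The sketch below is only a starting point: it builds a tiny stand-in feed in memory with two made-up stops so that it runs anywhere, and the helper name `read_gtfs_table` is my own. The column names (stop_id, stop_name, stop_lat, stop_lon) come from the GTFS specification.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, name):
    """Return the rows of one GTFS text file (e.g. stops.txt) as dicts."""
    with feed.open(name) as raw:
        # GTFS files are CSV text; utf-8-sig tolerates an optional BOM.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


# Build a tiny stand-in feed in memory so the sketch runs anywhere.
# The two stops are made up; a real feed would be opened with
# zipfile.ZipFile("path/to/feed.zip") instead (hypothetical path).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "1001,Example Stop A,53.5400,-113.5000\n"
        "1002,Example Stop B,53.5450,-113.4900\n",
    )

buffer.seek(0)
with zipfile.ZipFile(buffer) as feed:
    stops = read_gtfs_table(feed, "stops.txt")

for stop in stops:
    print(stop["stop_id"], stop["stop_name"],
          float(stop["stop_lat"]), float(stop["stop_lon"]))
```

With 177 MB of uncompressed data, loading a whole file into a list may not be what you want for the larger files; iterating over the reader row by row is a simple alternative.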

We’re also going to be holding a programming competition, as a little extra incentive for you to build something cool and useful with the data. So far we’ve got three prizes: 6 months of free transit for first place, 4 months for second place, and 2 months for third place (to clarify: that’s 6 months for the team, not for each individual on the team). I don’t have all the details yet, but stay tuned. I’ll be posting more information on the TransitCamp site (and here).

I think this is fantastic. Open cities are the future, and this is a big step in the right direction for the City of Edmonton.

Mountains of data, right at your fingertips

Last week, two announcements caught my eye. The first was from Amazon.com, which announced that there is now more than 1 TB of public data available to developers through its Public Data Sets on AWS project. The second was from the New York Times, which announced its Newswire API, providing access to all NYTimes articles as they are published.

This is a big deal. Never before has so much data been so readily available to anyone. The AWS data is particularly interesting. All of a sudden, any developer in the world has cost-effective access to all publicly available DNA sequences (including the entire Human Genome), an entire dump of Wikipedia, US Census data, and much more. Perhaps most importantly, the data is in machine-readable formats. It’s relatively easy for developers to tap into the data sources for cross-referencing, statistical analysis, and who knows what else.

The Newswire API is also really intriguing. It’s part of a growing set of APIs that the New York Times has made available. With the Newswire API, developers can get links and metadata for new articles the minute they are published. What will developers do with this data? Again, who knows. Imagination is the only limitation now that everyone can have immediate access.

Both of these projects remove barriers and will help foster invention, innovation, and discovery. I hope they are part of a larger trend, where simple access to data becomes the norm. Google’s mission might be to organize the world’s information and make it universally accessible and useful, but it’s projects like these that are making that vision a reality. I can’t wait to see what comes next!

Thoughts on backups with MozyPro

At around 1:30am on August 6th, a hard drive in one of our database servers died. It took down our mail server and WordPress blogs, but everything else (such as Podcast Spot) was unaffected. It sucks, but it happens. We’ve had many drives die over the last few years, unfortunately. All you can do is learn from each experience.

In this case, we had a full image of the server backed up. All we had to do was stick in a new hard drive, and deploy the image. That worked fairly well, though it did take some time to complete. The only problem was that the image was about 24 hours old – fine for system files, but not good for the data files we needed. For the most up-to-date data files, we relied on MozyPro.

(I should point out that we generally configure things so that data files are on separate drives from the system. In this case, we had about 250 MB of data files on the system drive. I have since reconfigured that.)

For the most part, Mozy has worked well for us. We’ve had a few bumps along the way, but no major complaints or problems. Until I tried to restore the data files yesterday, that is. The first problem was that I couldn’t use the Windows interface. The Mozy client would not “see” the last backup, presumably because the image was older than the last backup. You’d think it could connect to Mozy and figure that out, but apparently not. So I tried to use the Web Restore. It eventually worked, but it took about four hours to get the files. I don’t mean to download them, but for Mozy to make them available for download. Thank goodness it was only about 1000 files and 250 MB or it could have taken days!

So I learned that Mozy is reliable, but certainly not quick. If you need to restore something quickly, make sure you have a local backup somewhere. If you’re just looking for reliable, inexpensive, offsite storage then Mozy will probably work fine for you.

My next task is to upgrade this particular server to a RAID configuration, something we had been planning to do anyway. Should have done it sooner!

281 exabytes of data created in 2007

I typed the title for this post into Windows Live Writer, and a red squiggly appeared under the word “exabytes”. I just added it to the dictionary, but I can’t help but think that it’ll be in there by default before long.

Either it takes three months to crunch the data or March is just the unofficial “how much did we create last year” month, because researchers at IDC have once again figured out how many bits and bytes of data were created in 2007. You’ll recall that in March of last year, they estimated the figure for 2006 to be 161 exabytes. For 2007, that number nearly doubled, to 281 exabytes (which is 281 billion gigabytes):

IDC attributes accelerated growth to the increasing popularity of digital television and cameras that rely on digital storage. Major drivers of digital content growth include surveillance, social networking, and cloud computing. Visual content like images and video account for the largest portion of the digital universe. According to IDC, there are now over a billion digital cameras and camera phones in the world and only ten percent of photos are captured on regular film.

This is obviously a very inexact science, but I suspect their estimates become more accurate with experience.
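For a sense of scale, here is the arithmetic behind those headline numbers as a small Python sketch. It assumes decimal (SI) prefixes, so an exabyte is 10^18 bytes and a gigabyte is 10^9 bytes; the constant and variable names are my own.

```python
EXABYTE = 10**18  # bytes, decimal (SI) prefix
GIGABYTE = 10**9  # bytes, decimal (SI) prefix

created_2006 = 161 * EXABYTE
created_2007 = 281 * EXABYTE

gigabytes_2007 = created_2007 // GIGABYTE
growth = created_2007 / created_2006

print(f"{gigabytes_2007:,} GB")           # 281,000,000,000 GB
print(f"{growth:.2f}x the 2006 figure")   # 1.75x the 2006 figure
```

So the 2007 estimate is roughly one and three quarter times the 2006 one.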

Interestingly, this is the first time that we’ve created more data than we have room to store (though one wonders if that’s due more to a lack of historical data than anything else).

Read: ars technica

161 exabytes of data created in 2006

There’s a new report out from research firm IDC that attempts to count up all the zeroes and ones that fly around our digital world. I remember reading about the last such report, from the University of California, Berkeley. That report found that 5 exabytes of data were created in 2003. The new IDC report says the number for 2006 is 161 exabytes! Why the difference?

[The Berkeley researchers] also counted non-electronic information, such as analog radio broadcasts or printed office memos, and tallied how much space that would consume if digitized. And they examined original data only, not all the times things got copied.

In comparison, the IDC numbers ballooned with the inclusion of content as it was created and as it was reproduced – for example, as a digital TV file was made and every time it landed on a screen. If IDC tracked original data only, its result would have been 40 exabytes.

Even still, that’s an incredible increase in just three years. Apparently we don’t even have enough space to store all that data:

IDC estimates that the world had 185 exabytes of storage available last year and will have 601 exabytes in 2010. But the amount of stuff generated is expected to jump from 161 exabytes last year to 988 exabytes (closing in on 1 zettabyte) in 2010.

Pretty hardcore, huh? You can read about zettabytes at Wikipedia. I’m not too worried about not having enough space though, even if we were attempting to store all that data (which we aren’t). Hard drives are already approaching the terabyte mark, so who knows how big they’ll be in 2010. Then of course there’s also the ever-falling cost of DVD-like media.

More importantly, I bet a lot of the storage we “have available” right now is totally underutilized. You’d be hard pressed to find a computer that comes with less than 80 GB of storage these days, and I can assure you there are plenty of users who never even come close to filling it up. Heck, even I am only using about 75% of the storage I have available on my computer (420 GB out of 570 GB) and I bet a lot of it could be deleted (I’m a digital pack rat).

Read: Yahoo! News