
Hakuna My Data: NBO Data Bootcamp

January 30, 2012 in Data Journalism, events

This post is by Friedrich Lindenberg, developer on OpenSpending.

“My Name is XXXX, I am a member of the Kenyan parliament for the constituency of XXXX in the 2007-2012 election cycle. During my time in parliament, I have positioned myself against taxes for MPs.

Of the Development Funds allocated to my constituency, I have spent 12mn KSH in 2010 and 8mn KSH in 2009. Since 2007, I’ve funded 201 projects, of which 72 (9mn KSH) related to Education, 56 (7.2mn KSH) related to Health and 20 (4.2mn KSH) to Infrastructure.

The largest projects I have funded include… “

Auto-generated campaign speeches like this, driven by spending data, are just one of the many ideas to come out of the Data Bootcamp that took place in Nairobi last week. Invited by the African Media Initiative and the World Bank Institute, about 70 participants – both journalists and developers – met on Strathmore University’s campus to learn and practise the skills and tools required for data-driven reporting.

The four-day programme combined tools training with practical work in small groups. Elena Egawhary (BBC NewsNight) gave a workshop on data analysis in Excel, and Sreeram Balakrishnan (Google Fusion Tables) introduced both Refine and Fusion Tables. Team members from both the Kenya data portal and the World Bank finance site presented their respective offerings, while Gregor and I from the OpenSpending team gave intros to web scraping and advanced map visualisation.

During group work, journalists and developers teamed up to try their newly learned skills in different domains ranging from sports (football player profiles) to education (missing toilets in schools, “The Shit Ordeal”) and the financial transparency story-telling mentioned above.

The workshop also served as a community-building event for Kenya’s young and impressive Open Data initiative. Future events, aimed at civil society organisations and political actors, will help to further promote the re-use of government information released through the initiative.

All this is happening in a place where transparency is an essential tool to be developed: not only is access to information now guaranteed by the 2010 Kenyan constitution, there are also major political issues that deserve close attention from local and international watchdogs. These include the ongoing incursion of Kenyan troops into Somalia in an effort to fight Al-Shabaab terrorist groups, as well as the upcoming nationwide elections in December 2012. The elections will instate a new bicameral system of government, with many previously unknown candidates standing for office. In the previous vote in 2007, bad polling station data quite literally led to widespread unrest and thousands of deaths across the nation.

In all, it was fantastic to get in touch with the Kenyan participants of the workshop and to see how the organizers of the event – a brilliant team including Craig Hammer, Justin Arenstein and Jay Bhalla – are working to foster an open data community in this bustling developing nation. Given the great ideas generated during the team sessions, I’m sure this work will soon bear its first fruits.

Data = Seized, Sanitised and Sanity-checked. Open Data Day 2011

December 12, 2011 in events

This post is by Mark Brough, Research Officer at Publish What You Fund, Lucy Chambers, Community Coordinator for OpenSpending, and Irina Bolychevsky, Product Owner for CKAN. It is cross-posted on the OpenSpending Blog and the Open Knowledge Foundation Blog and Mark Brough’s contribution is also featured on aidinfolabs.org.

Saturday, December 3rd was Open Data Day, and London took the challenge to throw a hackday to help data be opened, cleaned and shown off to the world…

Fuelled only by enthusiasm, caffeine and 5 packets of ready-made popcorn, the CKAN, OpenSpending and IATI teams, along with some new faces, joined forces to liberate as much data as they could…

OpenSpending + IATI + CKAN

As part of the IATI Open Data Day challenges, Mark Brough did some work to get the existing IATI Data into OpenSpending. David Read, from the CKAN team, and a new face to the data wrangling crew, Johannes, scraped data on aid donations from France and Austria that were locked-up in web apps in order to help fill in the gaps in the global aid data jigsaw puzzle. You can see the results on OpenSpending.

The French (AFD) and Austrian (ADA) aid data appears to be incomplete: the AFD’s 2010 Annual Report (http://www.afd.fr/jahia/webdav/site/afd/shared/PUBLICATIONS/Colonne-droite/Rapport-annuel-AFD-VF.pdf) suggests that South Africa is the biggest recipient country, receiving €403 million, but in the data, Morocco is the biggest recipient and there are no transactions in South Africa.

The Austrian Development Agency data was carefully cleaned by Johannes, with region and country codes being added for all entries to create a tidier dataset. However, the original data contained, for example, four different spellings of Bosnia and Herzegovina, suggesting that countries are being manually entered rather than selected from an existing list. For 2010 (http://openspending.org/ada/?_time=2010&_view=country), the second biggest recipient of the Austrian Development Agency’s aid (after aid not going to a specific country) appears to be Austria.

Nevertheless, despite the issues surrounding data quality, it was a useful exercise to show both the value of open data – that if you release your data, you can do pretty cool things with it – and the costs of keeping it locked away, namely that the data then has to be scraped from sites in quite a labour-intensive way.

These, along with many other datasets discovered on the day via tweets and emails, have been added to the Open Data Day Group on theDataHub.org.

On the same day, we worked to get the data released as part of the International Aid Transparency Initiative into OpenSpending. You can see the results of the IATI wrangling process on OpenSpending.org/iati. The following section is written by Mark.

1. Getting the data

Downloading the existing IATI data has already become quite a big task; with 19 publishers so far, the data currently amounts to over 750MB with 1169 packages. Fortunately this is made easier by the IATI Registry, which provides an API to access all existing datasets, and a simple script (links at end) can retrieve all of the data.
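The retrieval step could look roughly like the sketch below. It assumes the registry hands back CKAN-style package records, each carrying a list of resources with a `url` field. The function names and the sample records are illustrative and are not taken from the script mentioned above.

```python
import urllib.request
from pathlib import Path


def resource_urls(packages):
    """Collect every resource URL from CKAN-style package records."""
    urls = []
    for package in packages:
        for resource in package.get("resources", []):
            url = resource.get("url")
            if url:
                urls.append(url)
    return urls


def download_all(urls, target_dir):
    """Save each URL into target_dir, numbering files to avoid name clashes."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(urls):
        with urllib.request.urlopen(url) as response:
            (target / f"{index:05d}.xml").write_bytes(response.read())


# Hypothetical records, shaped like the result of a registry query.
sample_packages = [
    {"name": "publisher-a-ke", "resources": [{"url": "http://example.org/a-ke.xml"}]},
    {"name": "publisher-b-empty", "resources": []},
]
print(resource_urls(sample_packages))
```

`download_all` is defined but not called here, since running it against more than a thousand packages means fetching several hundred megabytes.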

2. Extracting the data

Extracting the data from the XML files is more complicated. Although IATI data uses a standard schema, there are a few cases where publishers have either used the markup incorrectly, or else interpreted the definitions slightly differently. These range from simple problems, such as stating that an organisation is “implementing” rather than “Implementing”, or placing the date within the text of the tag and not in the “iso-date” attribute of that tag, to more significant problems such as placing implementing organisations in the “accountable” organisation field.
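The two simpler problems can be handled with small normalising helpers. The following is a minimal sketch, not the extraction code that was actually used: the helper names and the sample activity are made up for illustration, and only the `role` and `iso-date` attributes come from the description above.

```python
import xml.etree.ElementTree as ET


def normalise_role(role):
    """Map 'implementing', ' IMPLEMENTING ' etc. to 'Implementing'."""
    return role.strip().capitalize()


def activity_date(element):
    """Prefer the iso-date attribute; fall back to the text of the tag."""
    value = element.get("iso-date") or (element.text or "")
    return value.strip() or None


# Hypothetical activity showing both quirks.
sample = """
<iati-activity>
  <participating-org role="implementing">Example NGO</participating-org>
  <activity-date type="start-planned">2010-03-01</activity-date>
  <activity-date type="end-planned" iso-date="2011-03-01"/>
</iati-activity>
""".strip()

activity = ET.fromstring(sample)
roles = [normalise_role(org.get("role", "")) for org in activity.findall("participating-org")]
dates = [activity_date(date) for date in activity.findall("activity-date")]
print(roles, dates)
```

The third problem, implementing organisations placed in the accountable field, cannot be detected from the markup alone, so it would need a per-publisher rule rather than a generic helper.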

However, these problems are still fairly limited and follow fairly regular patterns, so they are not too hard to overcome. There are more significant problems when, for example, some donors have used three-letter (ISO-3) country codes rather than two-letter (ISO-2) country codes. (This is considered below in “next steps”.)
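One way to reconcile the two kinds of country code is a lookup table applied at extraction time. This is a sketch under the assumption that a complete ISO-3 to ISO-2 table is available; the four entries below are only a sample, and `to_iso2` is a hypothetical helper name.

```python
# Partial sample table; a real run would need every country.
ISO3_TO_ISO2 = {
    "KEN": "KE",  # Kenya
    "MAR": "MA",  # Morocco
    "ZAF": "ZA",  # South Africa
    "BIH": "BA",  # Bosnia and Herzegovina
}


def to_iso2(code):
    """Return a two-letter code, or None if the input cannot be mapped."""
    code = code.strip().upper()
    if len(code) == 2:
        return code
    if len(code) == 3:
        return ISO3_TO_ISO2.get(code)
    return None


print(to_iso2("ken"), to_iso2("KE"), to_iso2("XXX"))
```

Returning None for unknown codes, rather than guessing, keeps unmapped entries visible so they can be reported back to the publisher.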

3. Wrangling the data

OpenSpending is designed to show spending data, and has a powerful aggregation system to show large collections of transactions in a meaningful way. However, IATI data is organised by activities, with transactions nested within activities (projects), and – reflecting the business models of funders – activities sit within other activities (e.g., projects within programs), although they are not nested in the actual XML. Furthermore, one of the significant advantages of IATI compared to other aid data formats is that it permits multiple sectoral classifications, allowing you to assign a proportion of the value of an activity to each sector. So, you might have an activity that is 50% related to health and 50% to education.

To prepare the data for OpenSpending, each transaction inherits the properties of its activity (and, if that activity has a parent, that parent activity’s title and description). Then, the transaction is broken out into mini transactions, with the proportion of the activity assigned to each sector used to assign a proportion of the value of the transaction to each sector. So, from transactions, you get mini “sector-transactions”.
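The flattening described above can be sketched as follows. The record layout (plain dictionaries with `title`, `sectors` and `transactions` keys) is an assumption made for the example and is not the structure of the actual import script.

```python
from decimal import Decimal


def split_by_sector(transaction, sectors):
    """Break one transaction into one row per (sector, percentage) pair."""
    if not sectors:
        # No sector information: keep the transaction whole.
        return [dict(transaction, sector=None)]
    rows = []
    for code, percentage in sectors:
        share = Decimal(str(percentage)) / Decimal(100)
        rows.append(dict(transaction, sector=code, value=transaction["value"] * share))
    return rows


def sector_transactions(activity, parent=None):
    """Flatten an activity into sector-transactions that inherit its properties."""
    inherited = {
        "activity_title": activity["title"],
        "parent_title": parent["title"] if parent else None,
    }
    rows = []
    for transaction in activity["transactions"]:
        rows.extend(split_by_sector(dict(inherited, **transaction), activity.get("sectors", [])))
    return rows


# Hypothetical project inside a programme, split 50/50 between two sectors.
programme = {"title": "Example programme"}
project = {
    "title": "Example project",
    "sectors": [("health", 50), ("education", 50)],
    "transactions": [{"value": Decimal("1000"), "currency": "USD"}],
}
rows = sector_transactions(project, parent=programme)
for row in rows:
    print(row["sector"], row["value"], row["parent_title"])
```

Because each mini transaction carries only its share of the value, summing over all sectors gives back the original transaction amount, which is what lets an aggregation by sector stay consistent with an aggregation by country or funder.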

This takes about 40 minutes to compile, and then one final step remains: to convert the currencies to a single currency. Currently, USD, EUR and GBP amounts are used in the IATI data. All data is converted to USD using the average for 2010 from the OECD’s Financial Indicators (MEI) dataset. (This is also considered below in “next steps”.)
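The conversion itself is a single division per transaction. In the sketch below the rates are placeholder values chosen to make the arithmetic easy to follow; they are not the OECD figures, and they are assumed to be expressed as units of currency per US dollar.

```python
from decimal import Decimal

# Placeholder annual averages (currency units per 1 USD), NOT the OECD values.
RATES_PER_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.75"),
    "GBP": Decimal("0.65"),
}


def to_usd(amount, currency, rates=RATES_PER_USD):
    """Convert an amount to USD using a single rate per currency."""
    try:
        rate = rates[currency]
    except KeyError:
        raise ValueError(f"no rate for currency {currency!r}")
    return amount / rate


print(to_usd(Decimal("75"), "EUR"))
```

Raising an error for an unknown currency is deliberate: silently passing the amount through would mix currencies in the aggregated totals.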

4. Loading the data

OpenSpending’s new web-based loading interface makes it relatively easy to load data in, although you currently also have to write a model and views (links at end).

Results

The results can be viewed in the OpenSpending IATI dataset. You can explore the data by recipient country, sectors, funding organisation, and drill down through the data to see the data for an individual country.

Problems with the data

So far I’ve noticed the following problems:

  • “Unknown” recipient location is incorrectly marked as “South Sudan”
  • Recipient countries are listed twice, as Spain has used ISO3 rather than ISO2 country codes.
  • Sweden is listed as “Ministry of Foreign Affairs” (this is how they have listed themselves as the Funding Organisation in the data)
  • Sweden’s implementing organisations have been lost as they placed them in the accountable organisation field.

Please let me know if you see anything else problematic, if you have any criticisms or feedback on the way the data has been presented, or if you think there are other ways you’d like to be able to explore the data, based on the available dimensions.

Next steps

As mentioned above, there are some problems with the data which should properly be dealt with at the level of the donor agency. But there are others that will probably have to be dealt with by users of the data:

  • Mapping between different sector vocabularies, so that you can see all “Health” projects, and not only the health projects according to a single vocabulary
  • Mapping between countries and regions, so that every project in a country has a related region
  • Correctly converting currencies using the “value-date” column to get a more precise (at least month-specific) conversion.
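The last of these steps could be approached by looking up a rate for the month of each transaction's value date and falling back to the annual average where no monthly rate is known. This is only a sketch: the rates are placeholders rather than real exchange rates, the value date is assumed to be an ISO string such as "2010-06-15", and the function names are hypothetical.

```python
from decimal import Decimal

# Placeholder rates (currency units per 1 USD), NOT real exchange rates.
MONTHLY_RATES = {
    ("EUR", "2010-01"): Decimal("0.70"),
    ("EUR", "2010-06"): Decimal("0.80"),
}
ANNUAL_RATES = {
    ("EUR", "2010"): Decimal("0.75"),
}


def rate_for(currency, value_date):
    """Pick the monthly rate for value_date, else the annual average."""
    if currency == "USD":
        return Decimal("1")
    rate = MONTHLY_RATES.get((currency, value_date[:7]))
    if rate is None:
        rate = ANNUAL_RATES.get((currency, value_date[:4]))
    if rate is None:
        raise ValueError(f"no rate for {currency} on {value_date}")
    return rate


def to_usd_on(amount, currency, value_date):
    """Convert an amount to USD using the rate that applies on value_date."""
    return amount / rate_for(currency, value_date)


print(to_usd_on(Decimal("80"), "EUR", "2010-06-15"))
print(to_usd_on(Decimal("75"), "EUR", "2010-03-02"))
```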

What else have you noticed with the data? Is there anything else that should be changed? Anything interesting?

You can contact Mark about this data via the OpenSpending mailing list.

Useful Links
