Connecting the Aid Flows — making sense of the IATI data

Walkthrough by Mark Brough, Publish What You Fund, and Rufus Pollock, Open Knowledge Foundation.

The International Aid Transparency Initiative is a political agreement by the world’s major donors – including international banks, private foundations and NGOs – on a common way to publish aid information. It also defines a technical standard for exactly how that information should be published, IATI-XML.

So far, 29 donors representing 74% of Official Development Finance (ODF) have committed to publishing to IATI. Of these, 13 donors representing 45% of ODF have already published, and 12 NGOs and foundations have published their own data.

This page details how we used simple scripts and open source tools to convert each donor’s data from raw XML in the IATI Registry into a consolidated dataset and then, by loading it into OpenSpending, into visualisations like those shown above and an easy-to-use RESTful API.


The IATI Registry and pulling data together

Data publishers convert their data to the IATI format, publish it on their websites, and then register it with the IATI Registry, which runs on the CKAN data portal software.

This decentralised structure of open data feeds, rather than a centralised database, is an important part of IATI. It creates flexibility to allow data providers to publish data in a format that makes sense for their business model, provide live feeds coming straight out of their system, and continuously update and improve the data, in terms of the coverage, proportion of fields used, and quality of the data entered within those fields.

It also means that many applications can easily take the data – because it is publicly accessible, openly licensed, and in a standard, comparable format – and create interesting tools and visualisations.

However, this decentralised structure also creates challenges for people who want to use the data: quite a lot of technical knowledge is required to convert IATI-XML data into a format that is suitable for analysis. The data has to be downloaded from a range of websites, and the information then has to be extracted from the XML files into a format better suited to analysis, e.g. to aggregate and analyse project funding. This is why it makes sense to place it in (for example) a relational database, a spreadsheet or another format for presenting the data.
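To give a feel for the parsing step, here is a minimal sketch in Python using only the standard library. The XML fragment is a simplified, hand-written example in the style of IATI-XML (the identifier and values are made up), and the function name flatten_transactions is ours for illustration; real files carry many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment in the style of IATI-XML.
# The identifier and the figures are invented for illustration.
SAMPLE_XML = """\
<iati-activities>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
    <title>Example education project</title>
    <recipient-country code="UG"/>
    <sector code="11220" percentage="60"/>
    <sector code="13040" percentage="40"/>
    <transaction>
      <transaction-type code="D"/>
      <value value-date="2011-03-01">1000</value>
      <transaction-date iso-date="2011-03-01"/>
    </transaction>
  </iati-activity>
</iati-activities>
"""


def flatten_transactions(xml_text):
    """Return one flat dict per transaction, tagged with its activity."""
    root = ET.fromstring(xml_text)
    rows = []
    for activity in root.iter("iati-activity"):
        identifier = activity.findtext("iati-identifier", default="").strip()
        countries = [c.get("code") for c in activity.findall("recipient-country")]
        for tx in activity.findall("transaction"):
            tx_type = tx.find("transaction-type")
            tx_date = tx.find("transaction-date")
            rows.append({
                "activity": identifier,
                "countries": countries,
                "type": tx_type.get("code") if tx_type is not None else None,
                "date": tx_date.get("iso-date") if tx_date is not None else None,
                "value": float(tx.findtext("value", default="0")),
            })
    return rows


rows = flatten_transactions(SAMPLE_XML)
for row in rows:
    print(row)
```

Each row is now a plain record that can go straight into a spreadsheet or a database table.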


Consolidating the data into a simple format

So a first step we took was to consolidate the IATI data into a simple CSV-based format:

  1. Get data files from the IATI Registry. Fortunately, this first step is made easier because the IATI Registry exposes the CKAN API, which makes it easy to list and download the entire corpus of data.

  2. Convert into an SQLite database. IATI data contains detailed lists of activities (projects or programmes) that contain many transactions (incoming or outgoing financial flows). Activities are classified in various ways, with multiple recipient countries and sectors. Activities can also be hierarchically related to each other (a project can be part of a bigger programme). A database structure is therefore a good first step to ensure that these relationships are maintained, and that the data is accurately represented.
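Step 1 can be sketched roughly as follows. This is not the script we ran; it assumes the Registry is reachable at iatiregistry.org and exposes CKAN's version 3 action API (package_search), and the sample response is a hand-written stand-in with made-up URLs. The helper names are ours.

```python
from urllib.parse import urlencode

# Assumption: the Registry's CKAN action API lives at this address.
REGISTRY_API = "https://iatiregistry.org/api/3/action/package_search"


def package_search_url(start=0, rows=100, base=REGISTRY_API):
    """Build the URL for one page of registered datasets."""
    return base + "?" + urlencode({"start": start, "rows": rows})


def resource_urls(response):
    """Collect the file URLs from a decoded package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    urls = []
    for package in response["result"]["results"]:
        for resource in package.get("resources", []):
            if resource.get("url"):
                urls.append(resource["url"])
    return urls


# Hand-written stand-in for a decoded JSON response (URLs are invented).
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "donor-a-ug", "resources": [{"url": "http://example.org/a-ug.xml"}]},
            {"name": "donor-b-ke", "resources": [{"url": "http://example.org/b-ke.xml"}]},
        ],
    },
}

print(package_search_url(start=0, rows=100))
print(resource_urls(SAMPLE_RESPONSE))
```

In practice you would fetch each page, decode the JSON, and download every URL returned, paging with start until the reported count is reached.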

At this point we have a nice complete version of the IATI data in an SQLite file.
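The relationships described in step 2 might be laid out along the following lines. The table and column names are hypothetical, chosen for this illustration rather than taken from our actual database, and the rows are invented.

```python
import sqlite3

# Hypothetical schema: one table per kind of thing described in step 2.
SCHEMA = """
CREATE TABLE activities (
    id        TEXT PRIMARY KEY,
    title     TEXT,
    parent_id TEXT REFERENCES activities(id)  -- project -> bigger programme
);
CREATE TABLE activity_sectors (
    activity_id TEXT NOT NULL REFERENCES activities(id),
    sector_code TEXT NOT NULL,
    percentage  REAL NOT NULL
);
CREATE TABLE activity_countries (
    activity_id  TEXT NOT NULL REFERENCES activities(id),
    country_code TEXT NOT NULL
);
CREATE TABLE transactions (
    id          INTEGER PRIMARY KEY,
    activity_id TEXT NOT NULL REFERENCES activities(id),
    tx_type     TEXT,
    tx_date     TEXT,
    value       REAL
);
"""


def build_example_db():
    """Create an in-memory database holding a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?, ?)",
        [("XX-PROG-1", "Example programme", None),
         ("XX-PROJ-1", "Example project", "XX-PROG-1")],
    )
    conn.executemany(
        "INSERT INTO activity_sectors VALUES (?, ?, ?)",
        [("XX-PROJ-1", "11220", 60.0), ("XX-PROJ-1", "13040", 40.0)],
    )
    conn.execute("INSERT INTO activity_countries VALUES (?, ?)", ("XX-PROJ-1", "UG"))
    conn.executemany(
        "INSERT INTO transactions (activity_id, tx_type, tx_date, value) "
        "VALUES (?, ?, ?, ?)",
        [("XX-PROJ-1", "D", "2011-03-01", 1000.0),
         ("XX-PROJ-1", "D", "2011-09-01", 500.0)],
    )
    conn.commit()
    return conn


conn = build_example_db()
# Total disbursed per activity, alongside the programme it belongs to.
totals = conn.execute(
    "SELECT a.id, a.parent_id, SUM(t.value) "
    "FROM activities a JOIN transactions t ON t.activity_id = a.id "
    "GROUP BY a.id"
).fetchall()
print(totals)
```

Keeping sectors and countries in their own tables means an activity can have several of each without duplicating its transactions.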


Loading into OpenSpending

A large part of IATI is about money: which projects are being funded, where, and the transactions from one organisation (the funder) to another (the implementer). A natural next step was therefore to get this data into OpenSpending, which requires data in a simple CSV-based structure. To do this:

  1. scripts create a file in the OpenSpending CSV format by breaking each transaction in each activity into sectoral sub-components

  2. the value of each transaction is then prorated to the proportion of the activity that is assigned to each sector (again, activities can be assigned to several sectors)

Aside: This means that when the data is aggregated by sector, the aggregate values should be accurate. However, it also means that artificial transactions are created in order to make this representation possible. A sensible way to deal with this would be to create a view of the data that shows only activities, or only “real” transactions, rather than the mini-transactions necessarily created through this process.
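As a worked illustration of the prorating, the sketch below splits one transaction across an activity's sectors. It is our own minimal version, not the script we used: the function name is hypothetical, and dividing by the sum of the percentages (rather than by 100) is an assumption we make so that the parts always add back up to the original value.

```python
from decimal import Decimal


def prorate_by_sector(transaction_value, sector_percentages):
    """Split a transaction value across sectors in proportion to their share."""
    value = Decimal(str(transaction_value))
    total_pct = sum(Decimal(str(p)) for p in sector_percentages.values())
    if total_pct == 0:
        raise ValueError("activity has no sector percentages to prorate by")
    # Assumption: normalise by the total so parts sum to the original value
    # even when the reported percentages do not add up to exactly 100.
    return {
        sector: value * Decimal(str(pct)) / total_pct
        for sector, pct in sector_percentages.items()
    }


# One disbursement of 1000 on an activity that is 60% sector 11220
# and 40% sector 13040 becomes two "mini-transactions".
parts = prorate_by_sector(1000, {"11220": 60, "13040": 40})
for sector, amount in parts.items():
    print(sector, amount)
```

Summing the mini-transactions for an activity recovers the original transaction value, which is why the sector aggregates stay accurate.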

With that done, we now have a 600 MB CSV file of IATI transactions, which we uploaded as part of the IATI dataset on the DataHub.

Finally, we can use the OpenSpending import process and its model editor to import the data and provide relevant information such as the types and roles of the various columns.


API Access

As well as the user-friendly interface, you can query OpenSpending via an API. This allows you to pull out data for use in other applications.

For example, you can see the biggest implementing organisations of education projects in Uganda in 2011:

http://openspending.org/api/2/aggregate
    ?dataset=iati
    &cut=time.year:2011
        |recipient_country.name:ug
        |sector.name:11220
        |transaction_type.name:d
    &drilldown=to

Or the biggest recipient countries in 2011 for HIV/AIDS projects:

http://openspending.org/api/2/aggregate
    ?dataset=iati
    &cut=time.year:2011
        |sector.name:13040
        |transaction_type.name:d
    &drilldown=recipient_country

Or the largest funders of projects that have been reported so far in 2012:

http://openspending.org/api/2/aggregate
    ?dataset=iati
    &cut=time.year:2012
        |transaction_type.name:d
    &drilldown=from

This allows you to visualise the data very easily using something like the Bubbletree, seen here.
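The three queries above share one shape: a dataset, a list of cuts joined with "|", and a drilldown. A small helper can assemble them; this is a sketch only, the function name is ours, and it builds the URL without fetching it.

```python
from urllib.parse import urlencode

API_ROOT = "http://openspending.org/api/2/aggregate"


def aggregate_url(dataset, cuts, drilldown):
    """Assemble an aggregate query URL from its parts."""
    # Cuts are written as key:value pairs separated by "|".
    cut = "|".join(f"{key}:{value}" for key, value in cuts.items())
    # Keep ":" and "|" unescaped so the URL matches the hand-written form.
    query = urlencode(
        {"dataset": dataset, "cut": cut, "drilldown": drilldown},
        safe=":|",
    )
    return f"{API_ROOT}?{query}"


# Biggest implementing organisations of education projects in Uganda, 2011.
uganda_education = aggregate_url(
    "iati",
    {
        "time.year": 2011,
        "recipient_country.name": "ug",
        "sector.name": 11220,
        "transaction_type.name": "d",
    },
    "to",
)
print(uganda_education)
```

Swapping the cuts and the drilldown gives the other two queries, for example drilldown "recipient_country" with sector 13040, or drilldown "from" for 2012.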