Post mortem: scraper

This is a post mortem of an older project. Somebody on Facebook was after the utilization rates of parking garages in Jyväskylä, Finland, and how they develop over time. The operating company, Jyväs-Parkki, was not publishing any statistics. I believe this was in the spring of 2012. There was a lot of buzz around open data at the time, so +1 web scraper project into the queue.

I was looking for a project to try out RRDTool and thought this was a perfect fit. Figuring out the current numbers didn't take long:
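The original was a bash one-liner in the curl-and-grep style. The real URL and page markup are not preserved here, so this sketch runs the same kind of extraction against a stand-in HTML snippet:

```shell
# Stand-in for the page content; the actual Jyväs-Parkki markup is unknown,
# so the class names here are purely illustrative.
html='<span class="name">Forum</span><span class="free">123</span>'

# In the real script the snippet came from the live page, roughly:
#   html=$(curl -s "$GARAGE_URL")
free_spaces=$(printf '%s' "$html" | grep -o 'class="free">[0-9][0-9]*' | grep -o '[0-9][0-9]*')
echo "$free_spaces"
```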

Getting the graph drawn didn't take much longer:
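The graphs came from RRDTool. My exact datasource names, intervals and colours were not kept, so the commands below are a hedged reconstruction of the plumbing: create a round-robin database per garage, feed each scraped count in, render a PNG out.

```shell
# Create the round-robin database once per garage, with an assumed
# 30-minute step: one week of raw points, one year of daily averages.
rrdtool create forum.rrd --step 1800 \
  DS:free:GAUGE:3600:0:U \
  RRA:AVERAGE:0.5:1:336 \
  RRA:AVERAGE:0.5:48:365

# Each scraper run feeds the latest count in at the current time:
rrdtool update forum.rrd "N:$free_spaces"

# And a PNG is rendered from the accumulated data:
rrdtool graph forum-week.png --start -1w \
  DEF:free=forum.rrd:free:AVERAGE \
  LINE2:free#0000ff:"Free spaces"
```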

So there was now a steady stream of RRDTool graphs, one per garage and time interval, updated every 30 minutes except at night:
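The cadence was plain cron. The exact night-time window is not recorded, so the hours below are an assumption, as is the script path:

```
# m  h    dom mon dow  command
0,30 6-23 *   *   *    /path/to/parking-scraper.sh
```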

I presented the idea and the implementation briefly at the Open Knowledge Roadshow event held in Jyväskylä in Aug/Sep 2013.

A couple of years later I decided I had wasted enough inodes on my server, as the script saved a new file every 30 minutes. So I decided to put the data into a database, but I wasn't going to keep doing this in bash, and converted the scraper to Python. It only took 20-something lines and depended only on the lxml and requests libraries.
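A hedged sketch of what those 20-something lines looked like. The original depended on the third-party requests and lxml libraries; to stay self-contained this version uses only the standard library, and the URL and page markup are assumptions since the real Jyväs-Parkki layout is not preserved here.

```python
import re
import urllib.request


def parse_free_spaces(html: str) -> dict[str, int]:
    """Pull (garage name, free spaces) pairs out of the page HTML."""
    # Assumed markup: each garage row carries a name span and a count span.
    pattern = r'class="name">([^<]+)</span><span class="free">(\d+)'
    return {name: int(count) for name, count in re.findall(pattern, html)}


def scrape(url: str = "https://www.jyvas-parkki.fi/") -> dict[str, int]:
    """Fetch the page and return the counts per garage."""
    with urllib.request.urlopen(url) as resp:
        return parse_free_spaces(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print to stdout, just like the grep script did.
    for garage, free in scrape().items():
        print(garage, free)
```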

During that session I only replicated the existing functionality of printing the numbers to stdout, just like the grep script did. I didn't get around to incorporating a proper database then.

I remembered the inodes again a year or two later and thought there was now some easy integration work to be done.

While I was pondering which database library to use, I noticed that a distribution upgrade had broken my Python 2 install: error messages about a failing datetime import were flowing to stderr and nothing was coming out on stdout. Having never invested a moment in a more robust solution, I had ended up with almost a year's worth of empty files and no benefit from switching to Python.
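In hindsight the robust solution would have been small. A hedged sketch of the kind of guard that would have caught the silent breakage: only keep a run's output if the command both exits cleanly and actually prints something.

```shell
# Hypothetical wrapper: "$@" stands in for the real scraper invocation
# (e.g. python2 scraper.py).  Output is emitted only when the command
# succeeds AND prints something, so a broken interpreter can no longer
# quietly produce empty files for a year.
run_guarded() {
    out=$("$@") && [ -n "$out" ] && printf '%s\n' "$out"
}

# Cron would then run something like:
#   run_guarded python2 scraper.py >> parking.log
# and a failed run appends nothing instead of an empty snapshot.
```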

So that went well.