OpenRefine for NHS Librarians

I have been lobbying for Library Carpentry courses for NHS librarians for years now and they are finally happening – yay! Thank you so much to Holly Case Wyatt and Health Education England for making this happen. I think that data skills can be quite intimidating for many librarians for a couple of reasons: 1) the terminology/jargon is impenetrable! 2) it’s hard to see how it can be useful in a practical, day-to-day sense. To help encourage other librarians to get involved with data manipulation using OpenRefine, I’m going to write some step-by-step tutorials on how to accomplish some basic tasks relevant to NHS librarianship.

There are loads of reasons to learn to code. For me personally, it comes down to two main ones:

  1. Save time – it really is quicker in the long run. Yes, you have to invest the time and effort to learn initially – it took me about 7 hours to figure out how to configure the spreadsheet that I share in example 2, and I could probably have catalogued all 136 books by hand in that time. But now I can repeat this task with a spreadsheet of 10,000 books in less time!
  2. It is empowering. I think we should be better equipped to communicate and negotiate with vendors. I also think that librarians should actually be at the forefront of developing our own software and championing open source solutions – we know our needs and our users’ needs best, so we are best equipped to create solutions that work for our context.

First, some background and disclaimers. I am largely self-taught and am by no means a programming wizard. I learned HTML and CSS as a kid to make my Final Fantasy fansite pretty, took an Introduction to Python short course in 2019, and did a MySQL course about 10 years ago, of which I remember very little. I have also started a few Codecademy courses online but never finished any! I think the key to learning basic coding is to just be willing to try things out and make mistakes. I also learn a lot by example and do plenty of shameless googling and copying of other people’s code, editing it so it works for me. Often I don’t actually know why it works, but as long as it does what I need it to do, that’s enough for me. My Google search history looked like this: ‘openrefine add column from another spreadsheet’, ‘openrefine delete comma end of string’, ‘openrefine extract text after a string’. Someone somewhere has probably tried to do what you are trying to do, and there will be a post on the internet with the answer.

If you have any questions, do let me know and I will do my best to answer! You can reach me at yiwen.hon@rmh.nhs.uk or on Twitter as @yiwen_h.

Step-by-step guides for three practical use cases for OpenRefine for NHS librarians:

Useful resources: