Your Bot Reaps What Judges Sow: RPA Data Harvesting from the United Kingdom Supr...

 4 years ago
source link: https://www.tuicool.com/articles/67BzM3m

… without using a single line of code, of course

Oct 6 · 5 min read

Image by BBC News

This month marks ten years since the Supreme Court of the United Kingdom (UKSC) gained its independence from the upper House of Parliament. To data enthusiasts this can mean only one thing: a decade’s worth of data on the latest law of the land. But people who have read law, myself included, might find it tricky to write code from scratch that extracts information directly from a webpage. This post shows readers how to avoid that hassle by using robotic process automation (RPA).

Robotic Process Automation

Image by Rock’n Roll Monkey on Unsplash

Robotic process automation is an automation technology built around a bot, or your own AI workforce. Unlike conventional automation tools, RPA works through the graphical user interface (GUI), which allows your computer to “watch” your actions and repeat them after you. You merely show, not tell, what is to be done. You do not have to speak a complicated machine language to make yourself understood. The application will then repeat the task as many times as needed. This is simple yet powerful.

There are four kinds of RPA, listed below in ascending order of sophistication and specialisation.

  1. Web scraping software: collects data and saves it into a structured, processable format
  2. Templatised software: provides a kit with which specialist programmers can build a bot that delivers beyond data collection and synthesis
  3. Enterprise-level software: scalable, reusable, and can automate large business operations
  4. Sector-specific software: customised to a particular and complex procedure, eg accounting

In this guide, we are going to try basic web scraping. The RPA platform I will use is called Automation Anywhere, but there are other providers such as Blue Prism and UiPath. You might want to consider the scale and complexity of the automation you aim to achieve when choosing a bot provider, but community editions should suffice in most cases and are available free of charge.

For our purposes, the harvesting process is straightforward. It is threefold: (i) setting up, (ii) showing your bot what to do, and (iii) letting it do the job for you.

Step 1: Setting up

Image by Lorenzo Cafaro on Pixabay

Launch the RPA application and change the recording option in the top left-hand corner to “Web Recorder.” This is found in the drop-down menu.

Open the website that you want to harvest data from. In my case, that is the UKSC website, specifically the page on which the decided cases are published. Note that the data should be available in a tabular format.

https://www.supremecourt.uk/decided-cases/index.html

Copy the URL and go back to the application. Hit the Record button and paste the address into the box that pops up.

Press Start. This will enable you to demonstrate to your bot what to do. Upon clicking Start, the URL will load in a new browser window. The website should look the same as when you initially visited it to retrieve the URL, except that a control bar will be hovering over it.

Step 2: Showing your bot what to do

Image by Frank Busch on Unsplash

There are two modes of extraction on the control bar: “Extract Data” and “Extract Table.” Since I want a wholesale extraction of data, that is, every single row and column, I will proceed with “Extract Table.” If you would rather drop certain columns, you may click on “Extract Data” and show the bot which fields to include and which to leave out.

Once the bot identifies the tabular data that you wish to extract, it will enclose the content in a green boundary. Click on the table to confirm.

You are then shown a preview of up to 50 rows of what the harvested data will look like. Everything looks fine, so I will proceed.
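None of the following is needed to follow this guide, but for readers curious about what “Extract Table” roughly amounts to, here is a minimal Python sketch that uses only the standard library. It is not how Automation Anywhere is implemented. The names `TableExtractor`, `extract_table` and `preview`, as well as the sample HTML and its column headings, are made up for illustration and do not reflect the real markup of the UKSC page.

```python
from html.parser import HTMLParser

# Made-up stand-in for a page containing a table of decided cases.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Citation</th><th>Case name</th></tr>
  <tr><td>01 Jan 2019</td><td>[2019] UKSC 0</td><td>Example v <em>Sample</em></td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # the row currently being built, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while we are inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return all table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def preview(rows, limit=50):
    """Return at most `limit` rows, like the bot's preview pane."""
    return rows[:limit]
```

Calling `preview(extract_table(SAMPLE_HTML))` returns the header row followed by the single made-up case, each as a list of strings.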

Be sure to check the box if the table spans multiple pages. You will also need to show the bot how to move on to the next page by capturing the link to that page.
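Capturing the “next page” link tells the bot to keep extracting until there is no further page. As a rough illustration of that loop, and not of the tool's actual internals, here is a Python sketch. The function `harvest_all_pages`, its callback parameters and the in-memory `FAKE_SITE` are all hypothetical; a real version would download and parse live pages instead.

```python
def harvest_all_pages(first_url, fetch, parse_rows, find_next):
    """Collect rows from a chain of pages linked by a 'next' link.

    fetch(url) returns a page, parse_rows(page) returns its rows, and
    find_next(page) returns the next URL, or None on the last page.
    """
    rows = []
    seen = set()  # guards against a 'next' link that loops back
    url = first_url
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch(url)
        rows.extend(parse_rows(page))
        url = find_next(page)
    return rows


# Made-up two-page site held in memory, so the sketch runs offline.
FAKE_SITE = {
    "page-1": {"rows": [["Case A"], ["Case B"]], "next": "page-2"},
    "page-2": {"rows": [["Case C"]], "next": None},
}

all_rows = harvest_all_pages(
    "page-1",
    fetch=FAKE_SITE.__getitem__,
    parse_rows=lambda page: page["rows"],
    find_next=lambda page: page["next"],
)
```

Here `all_rows` ends up holding the rows of both pages in order. The `seen` set is a small safety net so that a last page linking back to the first does not send the loop round forever.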

Click Next, give this set of data a name so it can be saved as a CSV file, and select an appropriate encoding if necessary. Hit the “Finish” button and then “Stop Recording.” This will take you back to the application, where you can save the entire process as a single task for your bot to repeat.
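The encoding choice matters because case names can contain accented or other non-ASCII characters. As an illustration only, this is what saving rows to a CSV file with an explicit encoding looks like in Python. The helper names `save_rows` and `load_rows` are made up, and the default of `utf-8-sig` is my assumption (it writes a byte-order mark that helps Excel recognise the file as UTF-8), not a setting taken from the RPA tool.

```python
import csv


def save_rows(rows, path, encoding="utf-8-sig"):
    """Write rows (lists of strings) to a CSV file in the given encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding=encoding) as handle:
        csv.writer(handle).writerows(rows)


def load_rows(path, encoding="utf-8-sig"):
    """Read the rows back, mainly to check the file round-trips."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))
```

For example, `save_rows(rows, "uksc_cases.csv")` would write the harvested rows to a file of that (hypothetical) name, and `load_rows("uksc_cases.csv")` would read them back unchanged, commas and accents included.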

Step 3: Letting the bot do the job for you

Image by Franck V. on Unsplash

We are almost there. Run the saved task.

A run-time window will appear and, finally, a fresh set of data: sown by the UK Supreme Court Justices over the past 10 years and reaped by my bot in less than 10 seconds.

When you are a beginner, coding to gather data online can feel daunting, if not impossible. But with RPA, you do not have to give up on the data that you want to work with.

X. Lhuer, The Next Acronym You Need to Know about: RPA (2016), McKinsey Digital

