
Getting Stuff Done at Hackathons for Rookies

 4 years ago
source link: https://www.tuicool.com/articles/uIfQj2R


I thoroughly enjoyed my first hackathon (you can read about my experience with scope in a previous post). The opportunity arose through BetaNYC to participate in the Mobility for All Abilities Hackathon, part of the larger 2019 National Day of Civic Hacking.

I was on the Reliable Access to Subways team, partnered with TransitCenter and United for Equal Access NY, and our prompt was this:

We want to explore *why* elevators and escalators breakdown, and present the data in a way that illuminates patterns and insight into solutions.

Here are some things to keep in mind if you’re going to your first hackathon, from a hackathon sophomore. I’ll work them in between descriptions of the work I did.

Take Stock of Your Team

It’s tempting to get started coding immediately, but if you jump in right away you’ll never have the focus necessary as a team to succeed. Is everyone technical? Does everyone write in the same language? What're your backgrounds? What does everyone have an interest in?

Get an understanding of the problem at hand

In our case, the problem was much larger than an afternoon of work could solve. We had two non-technical subject matter experts on our team, Colin from TransitCenter and Dustin from United for Equal Access NY, who helped guide the conversation around our prompt.

The MTA releases a PDF from its quarterly transit and bus committee meetings. Each is hundreds of pages long, but we were focused on the report of the elevators and escalators with less than 85% availability. Buried in the 491-page report from Q2 of 2019 is a table on pages 380–384.

Each row covers one elevator or escalator and lists the subway station it serves, its availability as a percentage of the quarter, and comments. The comments are the most problematic part of the report. Here’s an example of the comments from escalator ES622 at the Hudson Yards station:

The escalator was out of service from 4/15/19 to 4/24/19 to repair and adjust the combstop and impact safety devices. The controller was also repaired due to a loose wire. The sprinkler system failed and caused a flood. The water was pumped out and the sprinkler system was repaired; the escalator was tested and returned to service. The escalator was out of service between 5/17/19 to 5/23/19 due to a safety check and related repair work. The combstop safety device was adjusted, the left handrail chain was replaced and adjusted; the escalator was tested and returned to service.

Within this, there were two outages and for each there were multiple things that caused the outage and multiple things done to fix it. The information isn’t available anywhere else, so our goal was to extract the table from the PDF.
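To make "two outages in one comment" concrete, here is a minimal sketch of pulling the dated outages out of a comment with a regular expression. It assumes the phrasing seen in the ES622 comment ("out of service from ... to ..." or "out of service between ... to ..."); the names `OUTAGE_PATTERN` and `find_outages` are made up for this illustration and are not part of the hackathon code.

```python
import re
from datetime import datetime

# Two sentences copied from the ES622 comment quoted above.
COMMENT = (
    "The escalator was out of service from 4/15/19 to 4/24/19 to repair and "
    "adjust the combstop and impact safety devices. The escalator was out of "
    "service between 5/17/19 to 5/23/19 due to a safety check and related "
    "repair work."
)

# Assumption: outages are always phrased "out of service from|between <date> to <date>".
OUTAGE_PATTERN = re.compile(
    r"out of service (?:from|between) "
    r"(\d{1,2}/\d{1,2}/\d{2}) to (\d{1,2}/\d{1,2}/\d{2})"
)


def find_outages(comment):
    """Return a list of (start_date, end_date, days_between) per dated outage."""
    outages = []
    for start_text, end_text in OUTAGE_PATTERN.findall(comment):
        start = datetime.strptime(start_text, "%m/%d/%y").date()
        end = datetime.strptime(end_text, "%m/%d/%y").date()
        outages.append((start, end, (end - start).days))
    return outages


for start, end, days in find_outages(COMMENT):
    print(start, end, days)
```

Comments that describe outages without dates ("out multiple times due to...") would slip past this pattern, so it is only a starting point.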

Take an assignment

The first step to making this information usable is to extract it from the PDF. From there we could start to separate out each incident from the comments.

Someone was assigned to determine all the types of reasons the elevators and escalators broke down and how they were fixed, and someone was assigned to attempt to build a script to interpret the different types of problems and repairs. Multiple people took assignments to find different ways to convert the PDF table into a database that’s able to be manipulated.
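I don't have my teammates' script, but a first pass at interpreting the types of problems could look something like the keyword tagger below. It is only a sketch: the category names and keywords in `COMPONENT_KEYWORDS` are hypothetical, picked from words that appear in the ES622 comment, and a real list would come from reading many more reports.

```python
# Hypothetical keyword map: category label -> words that suggest it.
COMPONENT_KEYWORDS = {
    "controller": ["controller"],
    "handrail": ["handrail"],
    "safety device": ["safety device", "combstop"],
    "water damage": ["flood", "sprinkler"],
}


def tag_components(comment):
    """Return the sorted category labels whose keywords appear in the comment."""
    text = comment.lower()
    return sorted(
        label
        for label, keywords in COMPONENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


example = (
    "The controller was also repaired due to a loose wire. "
    "The sprinkler system failed and caused a flood."
)
print(tag_components(example))
```

Plain substring matching is crude (it cannot tell a cause from a repair), but it is quick to build in an afternoon and easy to refine as the subject matter experts review the output.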

Heads Down, Do Some Work

You’ve got your assignment and you know how it’s going to play into the larger project; now it’s time to do some work. In my case, I was to create CSV files from the PDFs.

[Image: the first page of the PDF I needed to convert]

To get everything working quickly, I used PyPDF2 to read the PDF. I had initially intended to use something more complex to read the file, but by the time I started coding I had less than four working hours to get something functional.

PyPDF2 pulled the text and revealed some problems with the formatting of the file. Here is an example:

'ES235',
'34 St-Herald Sq ',
'BDFM84.34%',
'The escalator was out of service from 12/4/18 to 12/11/18 due to worn out handrail and countershaft chains ',
'as well as defective brake assemblies. The countershaft assembly and chain were replaced and adjusted. ',
'The right handrail chain was adjusted. The main brakes were replaced and adjusted as well as a controller ',
'',
'relay; the escalator was tested and returned to service. The escalator was out multiple times due to the ',
'',
'activation of various safety devices. Those safety devices were tested and adjusted as needed.',

The line breaks are hard-coded, the subway lines are fused onto the same line as the availability percentage, and there are a bunch of extra line breaks in the comments for no apparent reason. It’s messy, but it’s what we have to work with.
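The fused field is the easiest of these to untangle. Here is a minimal sketch of splitting something like 'BDFM84.34%' into subway lines and availability, assuming the percentage always has two decimal places and at most two digits before the decimal point (the table only lists units under 85%). `FUSED_FIELD` and `split_lines_and_availability` are names invented for this example.

```python
import re

# Assumption: "<subway lines><one or two digits>.<two digits>%" with nothing after.
FUSED_FIELD = re.compile(r"^(?P<lines>.*?)(?P<availability>\d{1,2}\.\d{2})%$")


def split_lines_and_availability(field):
    """Split 'BDFM84.34%' into ('BDFM', 84.34); return None if it doesn't fit."""
    match = FUSED_FIELD.match(field.strip())
    if match is None:
        return None
    return match.group("lines"), float(match.group("availability"))


print(split_lines_and_availability("BDFM84.34%"))
print(split_lines_and_availability("A0.00%"))
```

Numbered subway lines make the field genuinely ambiguous: '120.00%' could mean the 1 train at 20.00% or the 1 and 2 trains at 0.00%. This sketch always takes the shortest run of line characters, so it reads that as the 1 train at 20.00%, and rows like that would need a manual check against the PDF.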

Take A Break, Grab A Snack

Remember you’re at an event with other people. Socialize. If there’s food, get a meal in, and check in with the rest of the room too. These are great ways to meet people.

Ask Around

If you’re working on a portion of the project solo, you’re probably sitting with a bunch of people who do similar work and, in the case of this hackathon, there were a few people working towards the same goals. I happened to be sitting near my Flatiron School cohort-mate Jen McKaig and being able to talk through some of the issues we were encountering was immensely helpful.

Let’s take a look at where I landed with my function:

import re

import pandas as pd
import PyPDF2


def convert_transit_pdf_to_csv(pdf, start, end):
    '''
    Input is a PDF file in the local directory, the first page
    of the PDF and the last page of the PDF. The function adjusts
    the first page to account for zero-based numbering. It will
    output a csv file back into the directory.

    There are a few issues with the code as-written. If an escalator
    or elevator is available 0.00% of the time, it will add an
    additional digit or character to the availability column. There
    is one other issue I've encountered where the subway lines
    aren't formatted properly.

    The comments will occasionally cut off and a fix for that is the
    first priority once this code is re-visited.
    '''
    page_range = list(range(start - 1, end))
    lines = []
    availability = []
    units = []
    stations = []
    conditions = []
    condition = ''
    # PdfFileReader, getPage and extractText are the PyPDF2 1.x names;
    # PyPDF2 3.x renamed them to PdfReader, reader.pages[n] and extract_text()
    with open(pdf, 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        for page in page_range:
            pageObj = pdfReader.getPage(page)
            current_page = pageObj.extractText().split('\n')
            # the last two lines of each page are the page identifiers,
            # so it's the current page without the last two lines
            for i in range(len(current_page[:-2])):
                # removes some titles that would otherwise be caught
                if not re.match(string=current_page[i], pattern=r'.* THAN 85% AVAILABILITY'):
                    if len(current_page[i]) > 1:
                        # this is less than ideal and occasionally cuts off the last line
                        # of a comment if it's under 40 characters. This was about as quick
                        # and dirty as it comes.
                        if len(current_page[i]) > 40:
                            condition += current_page[i]
                        # this would be [-6:] if all availabilities were above 10%,
                        # but some are available 0.00% of the time
                        if re.match(string=current_page[i][-5:], pattern=r'\d\.\d{2}\%'):
                            availability.append(current_page[i][-6:])
                            lines.append(current_page[i][:-6])
                        # identifies the elevator or escalator unit
                        if re.match(string=current_page[i], pattern=r'E[LS]\d{3}'):
                            units.append(current_page[i])
                            stations.append(current_page[i + 1])
                            if len(condition) > 1:
                                conditions.append(condition)
                                condition = ''
                        # specifically looks for the end of the page and ends the 'condition'
                        if i == len(current_page[:-2]) - 1:
                            conditions.append(condition)
                            condition = ''
    df_stations = pd.DataFrame(
        {'units': units,
         'stations': stations,
         'lines': lines,
         'availability': availability,
         'condition': conditions
         })
    df_stations.to_csv(pdf + ' converted.csv')
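One way to avoid the 40-character heuristic that cuts off short comment lines is to group lines by record instead of by length. The sketch below is not what I wrote at the hackathon; it assumes every record follows the order seen in the PyPDF2 output earlier (unit ID, station, fused lines and availability, then comment fragments) and that titles and page identifiers have already been filtered out as in the function above. `group_records` is a hypothetical helper name.

```python
import re

UNIT_ID = re.compile(r"E[LS]\d{3}")

# The ES235 lines from the PyPDF2 output shown earlier, lightly trimmed.
SAMPLE_LINES = [
    "ES235",
    "34 St-Herald Sq ",
    "BDFM84.34%",
    "The escalator was out of service from 12/4/18 to 12/11/18 due to worn out handrail and countershaft chains ",
    "as well as defective brake assemblies. The countershaft assembly and chain were replaced and adjusted. ",
    "",
    "activation of various safety devices. Those safety devices were tested and adjusted as needed.",
]


def group_records(lines):
    """Group extracted lines into one dict per elevator or escalator unit."""
    grouped = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip the stray empty strings
        if UNIT_ID.fullmatch(line):
            # a line that is exactly a unit ID starts a new record
            current = {"unit": line, "fields": []}
            grouped.append(current)
        elif current is not None:
            current["fields"].append(line)
    records = []
    for item in grouped:
        fields = item["fields"]
        records.append({
            "unit": item["unit"],
            "station": fields[0] if len(fields) > 0 else "",
            "lines_and_availability": fields[1] if len(fields) > 1 else "",
            # everything after the first two fields is comment text, long or short
            "comment": " ".join(fields[2:]),
        })
    return records


for record in group_records(SAMPLE_LINES):
    print(record["unit"], record["station"], record["lines_and_availability"])
```

Because a record only ends when the next unit ID appears, a comment that runs over a page break stays attached to its unit, which the page-by-page version does not handle.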

You can see the latest update to this on my GitHub repository for the BetaNYC Mobility for All Abilities hackathon.

