Introduction to data science at Trainline

source link: https://engineering.thetrainline.com/introduction-to-data-science-at-trainline-a47be79be8a9


Hello and welcome to Trainline’s data science team! We know that the role of a data scientist can vary from company to company — this blog post aims to introduce you to who we are and how we work here at Trainline.

One team

As data scientists, we are part of a wider data team, along with Data Engineering and Business Intelligence. Even though we are a single team, we’re usually all working on different projects with teams across the business — though sometimes we pair up on a particularly large or time-sensitive project. Most of the team work on more general data science projects — a mixture of building data products with the product team and investigative work providing insights for teams across the business — but we’ve also got data scientists who focus purely on experimentation, or on real-time data.

We’re a close team, and meet daily to check in on each other’s projects and discuss any problems we’re having, developing solutions together. It’s often the case that one person in the team has already solved an issue that someone else is tackling — talking things through together saves so much time and frustration! We also tend to hold team data discovery sessions when one of us is kicking off a new project, again helping to come up with ideas that one person alone might not have thought of.

We’re a pretty diverse team and have all become data scientists through different paths. Some of us have physics or maths backgrounds, whereas others of us studied Computer Science, Economics, Bio-Med or even Psychology as undergraduate degrees and beyond. There really is no right way into data science, and we think the variety really helps us to see things from different viewpoints, bringing different insights and ideas to the table. Beyond the coding and tech knowledge, the most important thing for us is that we all have a quantitative focus and an analytic, enquiring mind!

A Typical Project

Our work tends to fall into one of two categories: building data products (such as SplitSave or Crowd Alerts), or analysing data to provide insights for a team.

Data products start with a big discovery session with the rest of the product team. These sessions can last for up to a week and involve lots of post-it notes and sketching to settle on the product’s design. When building SplitSave, for example, we worked alongside Customer Research, Design, product owners and developers to understand the problem and customers’ opinions. We also discussed potential designs and particular pain points we wanted to solve, and debated how much was actually technically feasible! Data science can often be seen as a magical solution to everything, and it’s important for our team to be realistic about what we can achieve!

When working on more investigative projects, we will kick off by sitting down with the team and really understanding the problem they want to solve — for example, helping the CRM team to understand customer bookings. With our knowledge of the data, we’re often able to suggest additional questions to explore and help guide the analysis forward — for example, we could look into whether customers tend to rebook the same journeys at regular intervals, or whether there are related stations that customers who travel to one also tend to visit. These can then lead on to new data products later on — we’ve now built a rebooking model to help customers rebook their regular journeys earlier, before prices rise, and an inspiration model to help customers find new destinations based on where they have visited in the past.
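To make the rebooking idea concrete, here is a minimal sketch in plain Python of how you might check whether customers rebook the same journey at regular intervals. The booking records, station names and field layout are all invented for illustration; the real pipeline and data model aren’t described in this post.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical booking records: (customer_id, origin, destination, travel_date)
bookings = [
    ("c1", "London", "Manchester", date(2021, 1, 4)),
    ("c1", "London", "Manchester", date(2021, 1, 11)),
    ("c1", "London", "Manchester", date(2021, 1, 18)),
    ("c2", "Leeds", "York", date(2021, 1, 5)),
    ("c2", "Leeds", "York", date(2021, 2, 2)),
]

# Group travel dates by (customer, journey) so we can look at repeat bookings
journeys = defaultdict(list)
for customer, origin, dest, travel_date in bookings:
    journeys[(customer, origin, dest)].append(travel_date)

def median_rebooking_gap(dates):
    """Median gap in days between consecutive bookings of the same journey."""
    dates = sorted(dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return median(gaps) if gaps else None

gaps = {key: median_rebooking_gap(ds) for key, ds in journeys.items()}
```

A regular weekly or monthly gap here would suggest the journey is a good candidate for a rebooking prompt before prices rise.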

Preparation is key

Once we have agreed the scope of the project, the next step is to make sure we have a data science plan in place to solve it. Sometimes this is very obvious — we can get really powerful insights simply from basic maths and aggregations — but some projects require a new machine learning model that we haven’t used before. As a team, we run regular “Journal Club” sessions to keep each other updated on the latest developments in the world of data science, which can really help when sketching out new projects. As this can be very experimental, it’s really important for us to allow enough time for this research, and not to over-promise before we know a technique can work!
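As a toy illustration of how far basic aggregations can go, even a simple frequency count over a search log can surface an insight. The log below is entirely made up:

```python
from collections import Counter

# Hypothetical search log: one destination station per customer search
searches = ["York", "Bath", "York", "Edinburgh", "York", "Bath"]

# The two most-searched destinations, with their counts
top_destinations = Counter(searches).most_common(2)
```

No model needed: a count like this might already tell a team which destinations to prioritise.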

Before we can get working with the data, it’s also really important to spend time preparing the data. This part of a project can often get brushed over — it’s a lot less glamorous than machine learning — but it’s also a really important step. As they say, “Garbage In = Garbage Out”: if we don’t spend enough time making sure our data is ready for an algorithm to learn from, then we can’t really expect to get anything useful out the other end. This work can involve dealing with badly formatted data, fixing incorrect data, or finding alternative sources if we are missing data or have very little of it. If we’re going to be using machine learning, we also need to do some feature engineering to ensure that the features of the data are ready for a model to learn from, as well as having a clear target. If we wanted to train a classifier to distinguish between customers travelling for leisure and for business, for example, we need to give the model examples of both types of customers to learn from: features could perhaps include the time of day or the ticket type, and the label would be whether the journey was indeed leisure or business.
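Here is a minimal sketch of that feature engineering step. The records, field names and ticket types are all illustrative assumptions, not Trainline’s real schema; the point is just turning raw journeys into numeric features plus a clear target label:

```python
# Hypothetical raw journey records; field names are illustrative only
raw_journeys = [
    {"departure": "07:45", "ticket_type": "flexible", "purpose": "business"},
    {"departure": "10:30", "ticket_type": "advance",  "purpose": "leisure"},
    {"departure": "18:05", "ticket_type": "flexible", "purpose": "business"},
]

TICKET_TYPES = ["advance", "flexible", "season"]

def to_features(journey):
    # Numeric feature: departure hour (time of day)
    hour = int(journey["departure"].split(":")[0])
    # One-hot encode the ticket type so a model can learn from it
    one_hot = [1 if journey["ticket_type"] == t else 0 for t in TICKET_TYPES]
    return [hour] + one_hot

# Feature matrix, and the target label the classifier would learn to predict
X = [to_features(j) for j in raw_journeys]
y = [1 if j["purpose"] == "business" else 0 for j in raw_journeys]
```

From here, `X` and `y` could be fed to any standard classifier; the hard part in practice is collecting reliable labels.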

Focus on impact

This middle coding section is where we spend most of our time. Depending on the project, we might be focused on building and training a machine learning model, exploring the data and producing graphs showing relationships between different features, or setting up pipelines to ensure jobs continue to run automatically each day.

When finishing a data product, we usually build an API for the app/web developers to integrate with. Through this they send us information about the customer’s session and we return the relevant results. For example, when a customer is searching for a UK journey in our app, our Split Ticket Recommender predicts the best place to split the ticket for that journey in order to maximise savings, and serves this information back to the customer as part of their search results. It often takes a bit of work and testing to make sure that the API requests and results are in the format everyone is expecting.
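The real API contract isn’t described in the post, so this is a purely hypothetical sketch of the kind of request/response shape a split-ticket endpoint might agree with the app developers. The field names, station names and hard-coded split are all made up; a real service would call the model rather than return a fixed answer:

```python
import json

def recommend_split(request_json):
    """Hypothetical handler illustrating an agreed JSON contract:
    the caller sends the searched journey, and we return the legs of
    the recommended split plus an estimated saving."""
    request = json.loads(request_json)
    origin, dest = request["origin"], request["destination"]
    # A real model would predict the best split point; hard-coded here
    split_station = "Crewe"
    response = {
        "origin": origin,
        "destination": dest,
        "legs": [
            {"from": origin, "to": split_station},
            {"from": split_station, "to": dest},
        ],
        "estimated_saving_pence": 450,
    }
    return json.dumps(response)

result = json.loads(recommend_split(
    json.dumps({"origin": "London", "destination": "Glasgow"})
))
```

Agreeing a schema like this up front is exactly the “format everyone is expecting” work mentioned above.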

When we’re providing insights back to a team, this is usually done in the form of a presentation. Being able to communicate clearly is a valued skill in data science — we often have to explain some really complicated ideas when presenting our findings and recommendations. It’s always great to help teams gain a deeper understanding of data they use so often, and the possibilities of where to go next are often endless!

