How a Wildlife AI Platform Solved its Data Challenge

WildMe is a non-profit machine learning service provider for field biologists studying wildlife and conservation. But before you can create whale shark algorithms, you need good data.

Credit: WildMe

Anyone working in data management and data science can attest to how challenging and time-consuming it is to map a set of data from a new source into a platform where it can be cleaned, validated, and ultimately analyzed and used to train algorithms. After all, your algorithms are only as good as the data used to train them.

Now imagine that these data sets come from hundreds of external users who have employed any number of systems to collect the data, from Excel files to actual shoeboxes full of photos. That is the challenge that WildMe, a non-profit provider of machine learning and artificial intelligence services for wildlife conservation, has faced over its more than a decade of operation. The organization builds open software and AI for the conservation research community. It is made up of technologists -- software and machine learning pros -- and it is designed to be the "trusted engineering powerhouse for wildlife biologists across the globe."

This AI software enables researchers to track individuals among different species -- whale sharks, for example -- identifying them by unique patterns of spots. WildMe created the algorithm and technology for this initial use case by modifying a Hubble Space Telescope algorithm that looked at the pattern of stars in the night sky, according to Jason Holmberg, the organization's executive director, co-founder, and director of engineering.

Jason Holmberg Credit: via WildMe

During a scuba trip in Djibouti in 2002, Holmberg saw his first whale shark and learned how researchers physically tagged and tracked the animals. He thought there might be a better way, through computer vision algorithms that could identify individuals by their unique spot patterns. This work turned into Whaleshark.org, a library of encounters and individual whale sharks used and maintained by marine biologists.

But that was just the first use case. From there WildMe expanded as a platform for other animal researchers, allowing them to upload their data to catalog a series of other species from manta rays to giraffes to sea dragons. The platform serves more than 200 organizations and nearly 1,000 researchers tracking nearly 90,000 animals around the world with close to 444,000 sightings in its database.

The challenge of moving biologists' catalogs of encounters and sightings and individuals into the WildMe platforms has been a thorny problem from the start.

"It's been an evolving process," said Holmberg. "When we first started working with biologists across the globe, we would write custom importers for every piece of data. That custom one-off code would take weeks."

Ben Scheiner, a WildMe senior software engineer, describes it this way: "We had our own hand-rolled JavaScript framework for doing data imports. But it was buggy. We are focused on ecological problems, and AI and machine learning is our key service. Understanding this data onboarding deserves its own company and suite of solutions. That's something we were unable to do on a non-profit bank account."

Ben Scheiner Credit: via WildMe

There were no universal standards for how individual researchers cataloged their data. Each researcher created their own system.

Because of this, the idea of a "universal data importer is sort of farcical," Holmberg said. "But we were able to solve half the problem." WildMe started using a tool to let field biologists begin mapping their data to a common set of fields and descriptors. These biologists could review the data in the system and then approve it.
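To make the idea of mapping researchers' data to a common set of fields concrete, here is a minimal Python sketch. The field names, the alias table, and the function `map_to_common_fields` are all hypothetical illustrations; the article does not describe WildMe's actual schema or tooling.

```python
# Hypothetical common schema and header aliases; not WildMe's actual field names.
COMMON_FIELD_ALIASES = {
    "latitude": {"lat", "latitude", "gps lat"},
    "longitude": {"lon", "long", "longitude", "gps lon"},
    "sighting_date": {"date", "sighting date", "encounter date"},
    "individual_id": {"id", "animal id", "individual", "name"},
}


def map_to_common_fields(row):
    """Rename one row's column headers to the common field names.

    Returns (mapped, unmapped): recognized columns under their common names,
    and leftover columns that a person would need to review by hand.
    """
    mapped, unmapped = {}, {}
    for header, value in row.items():
        # Normalize the header so "GPS_Lat" and "gps lat" compare equal.
        key = header.strip().lower().replace("_", " ")
        for field, aliases in COMMON_FIELD_ALIASES.items():
            if key in aliases:
                mapped[field] = value
                break
        else:
            unmapped[header] = value
    return mapped, unmapped


# Made-up example row, as it might appear in a researcher's spreadsheet.
row = {"Animal ID": "A-001", "GPS Lat": "11.59", "Long": "43.15", "Notes": "example note"}
mapped, unmapped = map_to_common_fields(row)
print(mapped)
print(unmapped)
```

Columns the alias table does not recognize are returned separately instead of being dropped, which mirrors the review-and-approve step: a biologist still decides what happens to anything the mapping could not place.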

While this streamlined the process and made it faster, there were still issues that could be improved. The system wasn't all that scalable, and it didn't let the researchers validate their own data. WildMe began piloting a tool from a company called Flatfile, designed to solve the issues of processing and validating external data from multiple sources.

David Boskovic founded Flatfile after working at a few different SaaS companies and running into the same annoying problem each time: how to get new customers' data into the system when each customer had used different systems.

"It has been a universal problem. The cost and effort of bringing data in is one of the costs of innovation," Boskovic said. But it was very frustrating. "I like to say I rage-designed this product."

David Boskovic Credit: via Flatfile

The other aspect of bringing data into a system is that your customers need to maintain ownership and control of that data. That's important for marketers. It's also important for field biologists. It's one of the reasons why WildMe pursued the pilot with Flatfile.

"It's an intuitive system whereby a field biologist can maintain ownership of their data through the process of importing it into our system, and it will do things that we didn't currently have like data validation," Holmberg said. For instance, it will help "make sure all the GPS coordinates are in the right format. These are human-curated data catalogs. They do have errors."

During the validation process, anomalies are presented back to the biologists who curated the data so that they can go back and clean it up. This lets biologists see their data in one of the WildMe platforms and work with that data in the platform.
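As a rough illustration of this kind of check, the Python sketch below validates decimal-degree GPS coordinates and collects anomalies to show back to the curator instead of rejecting the whole file. The field names and the functions `parse_coordinate` and `find_coordinate_anomalies` are hypothetical; this is not Flatfile's API or WildMe's implementation.

```python
import math


def parse_coordinate(text, limit):
    """Parse a decimal-degree value; return a float, or None if it is invalid.

    `limit` is 90 for latitude and 180 for longitude.
    """
    try:
        value = float(str(text).strip())
    except ValueError:
        return None
    if not math.isfinite(value) or abs(value) > limit:
        return None
    return value


def find_coordinate_anomalies(rows):
    """Return (row_number, field, raw_value) for every coordinate that fails."""
    anomalies = []
    for number, row in enumerate(rows, start=1):
        for field, limit in (("latitude", 90), ("longitude", 180)):
            raw = row.get(field, "")
            if parse_coordinate(raw, limit) is None:
                anomalies.append((number, field, raw))
    return anomalies


# Made-up rows: one clean, one in degrees-minutes notation, one out of range.
rows = [
    {"latitude": "11.59", "longitude": "43.15"},
    {"latitude": "11°35'N", "longitude": "43.15"},
    {"latitude": "25.0", "longitude": "-190"},
]
for number, field, raw in find_coordinate_anomalies(rows):
    print(f"row {number}: {field} value {raw!r} needs review")
```

The sketch flags a degrees-and-minutes value for review and does not convert it, on the assumption that the person who curated the catalog is best placed to confirm what a nonstandard entry means.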

The platforms are changing biologists' knowledge of the species they study.

"When I first started on whale shark research, everyone thought the Indian Ocean was the big spot for that," Holmberg said. "As we built these online platforms, we could identify the movement of individuals ... We now see the Gulf of Mexico as one of the biggest hotspots for studying whale shark behavior."

In many cases, WildMe is a researcher's first experience with cloud computing and storage and analysis for their data, so the goal is to make the system easy to use for people whose primary job is not technology.

Holmberg said that the data processing needs to be fast so that biologists can react to population changes with better conservation policy and strategies.

"Maybe that means to put up a fence, or take down a fence, or allow fishing, or ban fishing, depending on how variables impact population numbers," he said. "The faster we can estimate population numbers, the faster we can respond to changes and make sure our conservation strategies are iterating towards ever more successful solutions that help increase population numbers, especially for threatened and endangered animals."


Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology.
