
refinery - Treating training data as if it was source code | Product Hunt

source link: https://www.producthunt.com/posts/refinery-2
Ranked #4 for today

refinery

Treating training data as if it was source code

refinery is the data-centric sibling of your favorite programming IDE. It provides an easy-to-use interface for weak supervision and data management, neural search, and monitoring to ensure that the quality of your training data is as good as possible.

Hey ProductHunt community,

I'm Johannes, a data engineer + co-maker of refinery. We've built refinery with the belief that working on training data should, at some point, feel like you're programming.

Why? Because we believe that otherwise, engineers and scientists with great ideas (and businesses with core processes, too) are limited in what they can build. When we think of natural language interfaces of the future, we're sure they simply can't be built with today's tools. And this is what we aim to change.

What does that mean?
- We believe that developers should be able to debug and document training data
- Building training data must be easy and quick, so that you can build prototypes with ease
- On the other hand, once you see that your use case is working, building training data is no one-time job, so you should be able to improve the data in a structured manner

We believe in open-source and communities, so we've published the source code on GitHub, and we have a community on Discord. Also, we have an online playground. Check it out :)

We're getting there step by step, with the goal of, at some point, not only treating training data as if it were source code but essentially making complex NLP problems easier.

Community, we're so excited to share this with you today. Feel free to leave a comment below or on GitHub, join our Discord, or reach out via Twitter. Cheers! 🙏🏻

Hi, my name is Anton and I have been with Kern since the company's founding.

In my bachelor's, I studied economics and transitioned to business for my master's. During my time at Kern, I have worked on a wide variety of topics on the sales and marketing side of things. Over the past few months, my primary role was community manager, which entails interacting with our community on Discord and in our comment sections, as well as organizing the newsletter.

I am going to leave the company soon, but it has been a great journey up to this point!

Hello everyone 👋

I'm Jens, the CTO of kern.ai.

Johannes and I met about two jobs ago in a totally different environment: SAP data migration. Fun times indeed, but as life goes, we went our separate ways after a few years of pushing data from left to right. He went on to study and start his first company; I worked as a lecturer and created one or two games you might find online.

Back then we wouldn't, or rather couldn't, have imagined where we are now.

So enough of the preamble, let's talk about the product 🙂

The app has many possible applications. One option: you can optimize your AI labeling workflow. Let me give you an example:

I always wanted to have a personalized newsfeed from different sources, matched to my specific taste. But who has the time to scroll through thousands of articles every day? So let's slap some AI on that problem, right?

Now some of you might know: AI works best with a lot of training data. But again... who has the time to label a bunch of old articles?

I didn't, so no dice, I guess. Enter refinery. Not only did it help me get a better overview of my data points (e.g. by using embedding-based similarity), but it also helped me extrapolate the given information through a combination of heuristics, active learning, and weak supervision. I know, I know, that's a lot of technical terms to throw around, but it's an application for you, the data scientist. To keep it simple: instead of manually labeling the 2,500 articles I scraped from different websites, I achieved good results after the first 50 manual labels or so. Even better after some data exploration, but that's going a bit too far for now.
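To make the weak supervision idea concrete, here is a minimal, self-contained sketch of how several noisy heuristics can vote on a label for each article. All function names, labels, and the majority-vote combination are illustrative assumptions for this example; refinery's actual API and its weak supervision model are more sophisticated than a plain vote.

```python
# Weak supervision sketch: several noisy heuristics ("labeling functions")
# each either propose a label for an article or abstain (return None).
# A simple majority vote over the non-abstaining heuristics yields a
# training label without hand-labeling every article.
from collections import Counter

def lf_mentions_ml(article: str):
    # Heuristic: articles mentioning machine learning are tech news.
    return "tech" if "machine learning" in article.lower() else None

def lf_mentions_stocks(article: str):
    # Heuristic: articles mentioning stocks are finance news.
    return "finance" if "stocks" in article.lower() else None

def lf_short_teaser(article: str):
    # Heuristic: very short texts are teasers, not worth reading.
    return "skip" if len(article) < 80 else None

LABELING_FUNCTIONS = [lf_mentions_ml, lf_mentions_stocks, lf_short_teaser]

def weak_label(article: str):
    """Majority vote over all heuristics that fired; None if none fired."""
    votes = [lf(article) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]
```

Each heuristic alone is noisy, but combining many of them lets the first 50 or so manual labels go a long way, since the heuristics extrapolate your labeling intent across the rest of the dataset.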

If you have further questions please don't hesitate to reach out 😀

So please let us know what your favorite features are (or what you'd love to see added) in the comments 👇

Bonus points for the first person to find the easter egg I've hidden in the application. Without spoiling too much: in our team, the current high score on medium difficulty is 81 😉

Hello everyone 😊 I am Felix, developer at kern.ai.

Since my first day as a developer, I have loved working with open source. I want to build applications the way I want, without being forced to follow a prescribed path. Open source enables me to do this, and my personal goal with refinery is to allow data scientists to do the same.

Customisability is a key feature of refinery. For that, we introduced IDE-like interfaces into the application. You can write labeling functions and active learning algorithms the way you want. But if you prefer to follow a template, refinery provides those too. Best of all, we want to extend this approach to further features of the application; next up is embedding creation.
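As a flavor of what "write active learning algorithms the way you want" can mean, here is a minimal sketch of uncertainty sampling, one common active learning strategy. The function name, the probability format, and the example data are assumptions made for illustration, not refinery's API.

```python
# Uncertainty sampling sketch: given model-predicted class probabilities
# for unlabeled examples, ask the human to label the examples the model
# is least confident about first.
def least_confident(probabilities):
    """Return example ids sorted so the least confident come first.

    `probabilities` maps an example id to its list of predicted
    class probabilities; confidence is the maximum class probability.
    """
    return sorted(probabilities, key=lambda ex_id: max(probabilities[ex_id]))

# Illustrative predictions for three unlabeled articles:
probs = {
    "article_1": [0.98, 0.02],  # model is confident
    "article_2": [0.55, 0.45],  # model is unsure, label this one first
    "article_3": [0.80, 0.20],
}
queue = least_confident(probs)  # article_2 first, article_1 last
```

Prioritizing uncertain examples is what lets a handful of manual labels move the model further than labeling at random would.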

I am very curious about your thoughts on refinery 😄

