source link: https://hackernoon.com/jina-a-deep-learning-powered-search-framework-can-help-you-build-your-neural-search-zx7635gs

Jina, a Deep Learning-Powered Search Framework, Can Help You Build Your Neural Search

May 2nd 2021

Alex C-G (@alexcg)

Developer Relations Lead at Jina AI. Maker of animatronic butterflies and AIs that write bad Star Trek

Do you ever think, “Darn this stupid cloud. Why can’t there be an easier way to build a neural search on it?”

Well, if you have, this article is for you. I'm going to walk through how to use Jina's new Streamlit component to build a neural search front end for text or images. Want to jump right in? Check out our text search app or image search app, and here's the component's repo.

Why use Jina to build a neural search?

Jina is an open-source deep learning-powered search framework for building cross-/multi-modal search systems (e.g. text, images, video, audio) on the cloud. Essentially, it lets you build a search engine for any kind of data with any kind of data.

So you could build your own text-to-text search engine à la Google, a text-to-image search engine à la Google Images, a video-to-video search engine, and so on. Companies like Facebook, Google, and Spotify build these searches with state-of-the-art AI tooling: models like DistilBERT, and similarity-search libraries like FAISS and Annoy.

Why use Streamlit with Jina?

I was a big fan of Streamlit before I even joined Jina. I used it on a project to create terrible Star Trek scripts that later turned into a front-end for text generation with Transformers. So I'm over the moon to be using this cool framework to build something for our users.

Building a Streamlit component helps the data scientists, machine learning enthusiasts, and all the other developers in the Streamlit community build cool stuff powered by neural search. It offers flexibility and, being written in Python, it can be easier for data scientists to get up to speed.

Out of the box, the streamlit-jina component has text-to-text and image-to-image search, but Jina offers a rich search experience for any kind of data with any kind of data so there's plenty more to add to the component!

How does it work?

Every Jina project includes two Flows:

Indexing: for breaking down and extracting rich meaning from your dataset using neural network models

Querying: for taking a user input and finding matching results

  1. Our Streamlit component is a front end for end-users, so it doesn't worry about the indexing part.
  2. Admin spins up a Jina Docker image:
    docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1
  3. User enters a query into the Streamlit component (currently either a text input or an image upload) and hits 'search'
  4. The input query is wrapped in JSON and sent to Jina's query API
  5. The query Flow does its thing and returns results in JSON format (along with lots of metadata)
  6. The component parses out the useful information (e.g. text or image matches) and displays them to the user
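To make steps 4 to 6 a bit more concrete, here's a minimal sketch of wrapping a query in JSON and picking the matched text out of a response. The field names ("top_k", "mode", "data", "search", "docs", "matches", "text") and both helper functions are assumptions for illustration, not the component's confirmed code, so check them against the Jina version you're running. The response is a hand-written stand-in, so nothing here touches the network.

```python
import json


def build_query(text: str, top_k: int) -> str:
    """Wrap a text query in a JSON body (field names are assumed)."""
    return json.dumps({"top_k": top_k, "mode": "search", "data": [text]})


def extract_texts(response: dict) -> list:
    """Keep only the matched text and drop the rest of the metadata."""
    docs = response.get("search", {}).get("docs", [])
    if not docs:
        return []
    return [match.get("text", "") for match in docs[0].get("matches", [])]


# Hand-written stand-in for what a query Flow might send back
fake_response = {
    "search": {
        "docs": [
            {
                "matches": [
                    {"text": "Paris is the capital of France", "score": 0.12},
                    {"text": "France is a country in Europe", "score": 0.34},
                ]
            }
        ]
    }
}

body = build_query("capital of France", top_k=2)
texts = extract_texts(fake_response)
print(body)
print(texts)
```

In a real app the body would be POSTed to the query endpoint and the parsed response passed to the second helper.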

Example code

Let's look at our text search example since it's easier to see what's going on there:

import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)

endpoint = "http://0.0.0.0:45678/api/search"

st.title("Jina Text Search")
st.markdown("You can run our [Wikipedia search example](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences) to test out this search")

jina.text_search(endpoint=endpoint)

As you can see, the above code:

  • Imports streamlit and streamlit_jina
  • Sets the REST endpoint for the search
  • Sets the page title
  • Displays some explanatory text
  • Displays the Jina text search widget with endpoint defined

For the Jina Streamlit widgets, you can also pass in other parameters to define the number of results you want back or if you want to hide certain widgets.

Behind the scenes

The source code for our module is just one file, __init__.py. Let's just look at the high-level functionality for our text search example for now:

Set configuration variables

headers = {
    "Content-Type": "application/json",
}

# Set default endpoint in case user doesn't specify an endpoint
DEFAULT_ENDPOINT = "http://0.0.0.0:45678/api/search"

Render component

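class jina:
    def text_search(endpoint=DEFAULT_ENDPOINT, top_k=10, hidden=[]):
        # st.beta_container() was renamed st.container() in later Streamlit releases
        container = st.beta_container()
        with container:
            # Show the endpoint box unless the caller hid it
            if "endpoint" not in hidden:
                endpoint = st.text_input("Endpoint", endpoint)

            query = st.text_input("Enter query")

            # Slider runs from 1 to top_k and starts at half of top_k
            if "top_k" not in hidden:
                top_k = st.slider("Results", 1, top_k, int(top_k / 2))

            button = st.button("Search")

            # Only query Jina once the user clicks the button
            if button:
                matches = text.process.json(query, top_k, endpoint)
                st.write(matches)

        return container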

In short, the jina.text_search() method:
  • Creates a Streamlit container to hold everything, with sane defaults if not specified
  • Presents each widget to the user unless it's set to hidden
  • [User types query]
  • [User clicks button]
  • Sends query to Jina API and returns results
  • Displays results in the component
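If you'd like to see how the hidden list and top_k shape what ends up on screen, here's a small Streamlit-free sketch that mirrors the branching in the method above. The plan_widgets() helper is hypothetical (it isn't part of streamlit-jina); it just returns a description of each widget in the order it would be drawn.

```python
def plan_widgets(top_k: int = 10, hidden=()):
    """Describe the widgets text_search() would draw, in order (illustrative only)."""
    widgets = []
    if "endpoint" not in hidden:
        widgets.append("endpoint text input")
    widgets.append("query text input")
    if "top_k" not in hidden:
        # Mirrors st.slider("Results", 1, top_k, int(top_k / 2)):
        # minimum 1, maximum top_k, starting value top_k halved and rounded down
        widgets.append(f"results slider 1..{top_k}, starting at {int(top_k / 2)}")
    widgets.append("search button")
    return widgets


print(plan_widgets())
print(plan_widgets(top_k=20, hidden=["endpoint", "top_k"]))
```

One thing the sketch makes visible: with top_k=1 the starting value int(1 / 2) is 0, which is below the slider's minimum of 1, so very small top_k values are probably best avoided.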

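Our method's parameters are:

  • endpoint: the REST endpoint of the Jina query API (default: http://0.0.0.0:45678/api/search)
  • top_k: the maximum number of results to request, which is also the upper end of the "Results" slider (default: 10)
  • hidden: a list of widgets to hide from the user, e.g. "endpoint" or "top_k" (default: an empty list)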

jina.text_search() calls upon several other methods, all of which you can find in __init__.py. For image search there are some additional ones:
  • image.encode.img_base64() encodes a query image to base64 and wraps it in JSON before passing it to the Jina API
  • Jina's API returns matches in base64 format. The image.render.html() method wraps these in <img> tags so they'll display nicely
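Here's a rough sketch of what those two image helpers boil down to, using only the standard library. The function names encode_image() and render_html() are stand-ins rather than the component's real methods, and the data URI format is an assumption about how the base64 string gets packaged.

```python
import base64


def encode_image(raw: bytes, mime: str = "image/png") -> str:
    """Base64-encode raw image bytes and package them as a data URI (assumed format)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def render_html(data_uri: str) -> str:
    """Wrap a base64 data URI in an <img> tag so a browser can display it."""
    return f'<img src="{data_uri}">'


# Three placeholder bytes stand in for a real image file's contents
uri = encode_image(b"abc")
tag = render_html(uri)
print(uri)
print(tag)
```

In a Streamlit app, the resulting tag could be shown with st.markdown(tag, unsafe_allow_html=True).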

Use it in your project

In your terminal:

Create a new folder with a virtual environment and activate it. This will prevent conflicts between your system libraries and your individual project libraries:

mkdir my_project
cd my_project
virtualenv env
source env/bin/activate

Install the Streamlit and Streamlit-Jina packages:

pip install streamlit streamlit-jina

Index your data in Jina and start a query Flow. Alternatively, use a pre-indexed Docker image:

docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1

Create your app.py:

import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)

endpoint = "http://0.0.0.0:45678/api/search" # This is Jina's default endpoint. If your Flow uses something different, switch it out

st.title("Jina Text Search")

jina.text_search(endpoint=endpoint)

Run Streamlit:

streamlit run app.py

And there you have it – your very own text search!

For image search, simply swap out the text code above for our image example code and run a Jina Docker image (like our Pokemon example).

What to do next

Thanks for reading the article! I'm looking forward to hearing what you think about the component. If you want to learn more about Jina and Streamlit, here are some helpful resources:

A big thank you!

Major thanks to Randy Zwitch, TC Ricks and Amanda Kelly for their help getting our component live. And thanks to all my colleagues at Jina for building the backend that makes this happen!

Previously published at https://blog.streamlit.io/streamlit-jina-neural-search/
