

Vespa.ai and the CORD-19 public API
source link: https://mc.ai/vespa-ai-and-the-cord-19-public-api/

A taste of what you can do with Vespa
The Vespa team has been working non-stop to put together the cord19.vespa.ai search app, based on the COVID-19 Open Research Dataset (CORD-19) released by the Allen Institute for AI. Both the frontend and the backend are fully open source. The backend is built on vespa.ai, a powerful open-source computation engine. Since everything is open source, you can contribute to the project in multiple ways.
As a user, you can either search for articles through the frontend or perform advanced searches through the public search API. As a developer, you can contribute by improving the existing application through pull requests to the backend and frontend, or you can fork it and create your own application, either locally or through Vespa Cloud, to experiment with different ways to match and rank the CORD-19 articles. My goal with this piece is to give you an overview of what can be accomplished with Vespa by using the public API of the cord19 search app. This only scratches the surface, but I hope it helps direct you to the right places to learn more about what is possible.
Simple query language
The cord19.vespa.ai query interface supports the Vespa simple query language, which allows you to quickly perform simple queries. Examples:
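A minimal sketch of how such query strings might be composed, assuming the usual operators of the Vespa simple query language: `+term` for a required term, `-term` for an excluded term, double quotes for a phrase, and `field:term` to restrict a term to one field. The helper name `simple_query` and the example terms are illustrative and are not part of the cord19 app.

```python
def simple_query(terms=(), required=(), excluded=(), phrases=(), fields=None):
    """Compose a simple-query-language string (illustrative helper)."""
    parts = []
    parts += ['+' + t for t in required]        # terms that must match
    parts += ['-' + t for t in excluded]        # terms that must not match
    parts += ['"' + p + '"' for p in phrases]   # exact phrases
    for field, value in (fields or {}).items():
        # Quote multi-word values so the whole phrase is matched in the field
        if ' ' in value:
            value = '"' + value + '"'
        parts.append(field + ':' + value)
    parts += list(terms)                        # plain terms
    return ' '.join(parts)


q1 = simple_query(required=['covid-19'], terms=['temperature', 'impact'])
q2 = simple_query(required=['covid-19'], fields={'title': 'reproduction number'})
print(q1)  # +covid-19 temperature impact
print(q2)  # +covid-19 title:"reproduction number"
```

Strings like these can be typed into the search box or passed as the 'query' value of an API request.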
Vespa Search API
In addition to the simple query language, Vespa also has a more powerful search API that gives full control over the search experience through the Vespa query language, called YQL. We can send a wide range of queries by sending a POST request to the search endpoint of cord19.vespa.ai. The following Python code illustrates the API:
import requests  # Install via 'pip install requests'

endpoint = 'https://api.cord19.vespa.ai/search/'
# 'body' is the query dictionary; the sections below show how to define it
response = requests.post(endpoint, json=body)
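For readers who prefer to avoid a third-party dependency, here is a hedged sketch of the same POST request using only the Python standard library. The function names `build_search_request` and `send` are made up for this illustration; the endpoint is the one given above. Building the request is kept separate from sending it, so the request can be inspected without network access.

```python
import json
import urllib.request

ENDPOINT = 'https://api.cord19.vespa.ai/search/'  # endpoint from the article


def build_search_request(body, endpoint=ENDPOINT):
    """Return a POST request carrying `body` as JSON (nothing is sent yet)."""
    data = json.dumps(body).encode('utf-8')
    headers = {'Content-Type': 'application/json'}
    return urllib.request.Request(endpoint, data=data, headers=headers, method='POST')


def send(request, timeout=10):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


request = build_search_request({'query': 'coronavirus', 'hits': 1})
print(request.get_method(), request.full_url)
# result = send(request)  # uncomment to actually query the service
```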
Search by query terms
Let’s break down one example to give you a hint of what is possible with the Vespa search API:
body = {
    'yql': 'select title, abstract from sources * where userQuery() and has_full_text=true and timestamp > 1577836800;',
    'hits': 5,
    'query': 'coronavirus temperature sensitivity',
    'type': 'any',
    'ranking': 'bm25'
}
The match phase: The body parameter above selects the title and abstract fields of all articles that match any ('type': 'any') of the 'query' terms, that have full text available (has_full_text=true), and that have a timestamp greater than 1577836800.
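The timestamp in the filter is a Unix epoch value in seconds; 1577836800 corresponds to midnight UTC on 1 January 2020. A small sketch of how the match clause might be assembled from a date follows. The helper names `epoch_seconds` and `match_clause` are illustrative only.

```python
from datetime import datetime, timezone


def epoch_seconds(year, month, day):
    """Unix timestamp in seconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def match_clause(since, full_text_only=True):
    """Build the `where` part of the YQL statement used in the example."""
    clause = 'userQuery()'
    if full_text_only:
        clause += ' and has_full_text=true'
    clause += ' and timestamp > %d' % since
    return clause


since = epoch_seconds(2020, 1, 1)
yql = 'select title, abstract from sources * where %s;' % match_clause(since)
print(since)  # 1577836800
print(yql)
```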
The ranking phase: After matching the articles by the criteria described above, Vespa ranks them according to their BM25 scores ('ranking': 'bm25') and returns the top 5 articles ('hits': 5) according to this ranking criterion.
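Once a response comes back, the ranked hits can be pulled out of the JSON. The sketch below assumes the default Vespa JSON result layout, in which hits are listed under root.children and each hit carries a relevance score and a fields object. The `sample` dictionary is made up for illustration and is not real output from the cord19 service.

```python
def extract_hits(result):
    """Return (relevance, title) pairs from a decoded Vespa JSON result."""
    # Use empty defaults so a result without hits yields an empty list
    children = result.get('root', {}).get('children', [])
    return [
        (hit.get('relevance'), hit.get('fields', {}).get('title'))
        for hit in children
    ]


# Made-up response fragment shaped like the default JSON result format
sample = {
    'root': {
        'fields': {'totalCount': 2},
        'children': [
            {'id': 'doc-1', 'relevance': 12.5, 'fields': {'title': 'First made-up title'}},
            {'id': 'doc-2', 'relevance': 9.1, 'fields': {'title': 'Second made-up title'}},
        ],
    }
}

for relevance, title in extract_hits(sample):
    print(relevance, title)
```

With the requests snippet shown earlier, the decoded result would be obtained with response.json().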
The example above gives only a taste of what is possible with the search API. We can tailor both the match phase and the ranking phase to our needs. For example, we can use more complex match operators such as the Vespa weakAND, or restrict the search to look for matches only in the abstract by adding 'default-index': 'abstract' to the body above. We can also experiment with different ranking functions at query time by changing the 'ranking' parameter to one of the rank-profiles available in the search definition file.
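One way to experiment with these variations is to derive new request bodies from a base body instead of editing it by hand. This is only a sketch: `with_overrides` is a hypothetical helper, and 'some-other-profile' is a placeholder that would need to be replaced with a rank-profile name that actually exists in the search definition file.

```python
def with_overrides(body, overrides):
    """Return a copy of `body` with some request parameters replaced or added."""
    updated = dict(body)  # shallow copy, so the base body is left untouched
    updated.update(overrides)
    return updated


base = {
    'yql': 'select title, abstract from sources * where userQuery();',
    'hits': 5,
    'query': 'coronavirus temperature sensitivity',
    'type': 'any',
    'ranking': 'bm25',
}

# Match only against the abstract field
abstract_only = with_overrides(base, {'default-index': 'abstract'})

# Try a different rank-profile (placeholder name, see the note above)
other_ranking = with_overrides(base, {'ranking': 'some-other-profile'})

print(abstract_only['default-index'], other_ranking['ranking'])
```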
Additional resources:
- The Vespa text search tutorial shows how to create a text search app step by step. Part 1 shows how to create a basic app from scratch. Part 2 shows how to collect training data from Vespa and improve the application with ML models. Part 3 shows how to get started with semantic search by using pre-trained sentence embeddings.
- More YQL examples specific to the cord19 app can be found in the cord19 API doc.
Search by semantic relevance
In addition to searching by query terms, Vespa supports semantic search.
# 'embedding' is the query embedding vector, computed at query time
body = {
    'yql': 'select * from sources * where ([{"targetNumHits":100}]nearestNeighbor(title_embedding, vector));',
    'hits': 5,
    'ranking.features.query(vector)': embedding.tolist(),
    'ranking.profile': 'semantic-search-title',
}
The match phase: In the query above we use the nearestNeighbor operator to match at least 100 articles ([{"targetNumHits":100}]) that have the smallest (Euclidean) distance between the title_embedding and the query embedding vector.
The ranking phase: After matching, we can rank the documents in a variety of ways. In this case we use a specific rank-profile named 'semantic-search-title' that was pre-defined to order the matched articles by the distance between the title and query embeddings.
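To make the two phases concrete, here is a brute-force sketch of matching by Euclidean distance and then ranking by the same distance. It is a conceptual illustration only, not how Vespa implements the nearestNeighbor operator, and the two-dimensional toy embeddings are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, embeddings, target_hits, hits):
    """Match the `target_hits` closest documents, then return the top `hits`."""
    scored = sorted((euclidean(query, vec), doc_id) for doc_id, vec in embeddings.items())
    matched = scored[:target_hits]                    # match phase: closest candidates
    return [doc_id for _, doc_id in matched[:hits]]   # ranking phase: smallest distance first


# Toy title embeddings keyed by document id
title_embeddings = {'a': [0.0, 0.0], 'b': [1.0, 1.0], 'c': [3.0, 4.0]}
query_embedding = [0.9, 1.0]

print(nearest_neighbors(query_embedding, title_embeddings, target_hits=2, hits=1))  # ['b']
```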
The title embeddings were created while feeding the documents to Vespa, whereas the query embedding is created at query time and sent to Vespa through the ranking.features.query(vector) parameter. This Kaggle notebook illustrates how to perform semantic search in the cord19 app by using the SCIBERT-NLI model.
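The request body for a semantic query can be wrapped in a small helper that accepts the query embedding as any sequence of numbers. This is a sketch under stated assumptions: `semantic_search_body` is a hypothetical name, and the three-element vector is a placeholder, since a real query embedding must come from the same model and have the same dimension as the stored title embeddings.

```python
import json


def semantic_search_body(embedding, target_hits=100, hits=5):
    """Build the request body for a nearest-neighbor search over title embeddings."""
    vector = [float(x) for x in embedding]  # plain floats, so the body is JSON-serializable
    yql = (
        'select * from sources * where '
        '([{"targetNumHits":%d}]nearestNeighbor(title_embedding, vector));' % target_hits
    )
    return {
        'yql': yql,
        'hits': hits,
        'ranking.features.query(vector)': vector,
        'ranking.profile': 'semantic-search-title',
    }


# Placeholder vector for illustration only
semantic_body = semantic_search_body([0.1, 0.2, 0.3])
print(json.dumps(semantic_body, indent=2))
```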
Additional resources:
- Part 3 of the text search tutorial shows how to get started with semantic search by using pre-trained sentence embeddings.
- Go to the Ranking page to learn more about ranking in general and how to deploy ML models in Vespa (including TensorFlow, XGBoost, etc.).