12 ways to improve your search index

source link: https://www.algolia.com/blog/engineering/12-ways-to-improve-your-search-index/

Nov 11th 2022 engineering

Search indexing is often the first topic we discuss with customers when starting a new business engagement. Whether it’s a large enterprise-scale site or a small ecommerce store, the first step to adding search to a website is indexing your content through a website crawler or API. Your site’s architecture, schema, and content can all affect indexing.

In this article, we’ll cover a lot of the topics we discuss with our customers and share some actionable tips for improving a search index.

Note: this is not an article about SEO. While on-site search optimization and SEO are related — the work you do to optimize your search index for on-site search also helps with Google search or Bing visibility — they address different needs. SEO is geared towards internet visibility, whereas on-site search addresses user experience. However, the XML sitemaps, internal links, meta tags, etc., you create for one will help the other!

What is a search index?

A search index helps users quickly find information on a website by mapping search queries to web pages, documents, or other site content. It’s analogous to the index in a book: it lets users find useful information by keyword, but it has many advantages over a printed index, such as returning results instantly. Search indexes can be built with web crawlers or through API access, and each approach has benefits in different situations.

What is full-text search?

Full-text search involves indexing every word on your site so that the search engine can quickly find matching records, even across a large collection. Traditionally, full-text search engines used an “inverted index”: essentially, a map of all the keywords in your documents and the locations of those keywords.
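To make the inverted-index idea concrete, here is a minimal Python sketch that maps each term to the documents and word positions where it appears, then answers queries by intersecting the document sets. The tokenizer (lowercase, split on non-alphanumeric characters) and the sample product titles are simplifying assumptions; production engines add stemming, typo tolerance, ranking, and compression.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase, then split on anything that isn't a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    # Map each term to {doc_id: [word positions where the term occurs]}.
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index


def search(index, query: str) -> set[str]:
    # AND semantics: a document matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], {}))
    for term in terms[1:]:
        results &= set(index.get(term, {}))
    return results


docs = {
    "p1": "Portable Bluetooth speaker with deep bass",
    "p2": "Wireless headphones with noise cancelling",
    "p3": "Portable phone charger",
}
index = build_inverted_index(docs)
print(dict(index["portable"]))            # {'p1': [0], 'p3': [0]}
print(search(index, "portable speaker"))  # {'p1'}
print(search(index, "portable sound"))    # set()
```

The last query returns nothing because “sound” never appears in any document, which is the limitation of pure keyword matching discussed next.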

[Figure: full-text search]

In the example above, the keywords “portable” and “sound” aren’t in the index, but an AI-powered search engine understands context to deliver great results.

AI-powered search engines can now go beyond keywords and understand context to provide richer results. Take the query “portable sound” as an example. If a keyword-based search engine has the terms “portable” and “speaker” (but not “sound”) in its index, it can match only on “portable”, so the results page may or may not include the correct item. With machine learning, you can get good results even if the exact keywords aren’t on the site, by detecting context and similarities between words. A machine can learn, for example, that the word “portable” is similar to “handheld”, “mobile”, and “phone”, all of which are near in meaning but not necessarily synonymous.
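One rough way to picture how a machine “learns” that words are related is to represent each word as a vector and compare vectors with cosine similarity. The three-dimensional vectors below are made up purely for illustration. Real systems learn embeddings with hundreds of dimensions from large text corpora, and this is not a description of any particular engine’s model.

```python
import math

# Hypothetical, hand-picked word vectors for illustration only (not learned embeddings).
VECTORS = {
    "sound": [0.9, 0.1, 0.2],
    "speaker": [0.8, 0.2, 0.3],
    "audio": [0.85, 0.15, 0.1],
    "charger": [0.1, 0.9, 0.2],
    "portable": [0.2, 0.3, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(word: str, k: int = 2) -> list[str]:
    # Rank every other word by its similarity to `word`, most similar first.
    others = [w for w in VECTORS if w != word]
    return sorted(others, key=lambda w: cosine(VECTORS[word], VECTORS[w]), reverse=True)[:k]


print(nearest("sound"))  # ['audio', 'speaker']
```

With vectors like these, a query containing “sound” can be expanded to terms such as “speaker” that do appear in the index, even though the two words are not exact synonyms.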

Search crawlers and APIs

There are two primary ways to build a search engine index — a search crawler or directly pulling data from a database via APIs. Each of these has benefits for different situations.

For most static websites, a crawler works well: it’s fast and comprehensive. API-driven indexing is ideal for sites with dynamic or constantly changing data, and it has its own advantages, such as the ability to quickly add new data sources.

What is fast indexing?

When you add new content or change existing content, you want it to be searchable in real time. Fast indexing is a must-have for retailers and brands selling new products or launching campaigns. When our customers do have problems with fast indexing, it’s typically due to an issue such as:

  • Content isn’t getting indexed fast enough due to a complex site architecture or an API issue
  • Content is in the index, but not getting displayed in results
  • PDF and DOC files fail to index

Most problems can be resolved relatively quickly. The first thing to do is check how the crawler views your website documents, or whether your data pipeline is blocked. Using a sitemap.xml file to guide the crawler is always good practice and can help get your content indexed quickly. If you’re indexing your site via API, there is likely an integration issue that needs to be resolved.
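Because a sitemap.xml file is recommended here, below is a minimal sketch that generates one with Python’s standard library, following the sitemaps.org protocol: a `urlset` element in the sitemap namespace containing one `url` entry per page, each with a `loc` and an optional `lastmod`. The page URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Return a sitemap.xml document listing each page URL and its last-modified date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in URLs automatically.
        ET.SubElement(url, "loc").text = loc
        # lastmod uses the W3C Datetime format; a plain YYYY-MM-DD date is allowed.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://www.example.com/", date(2022, 11, 11)),
    ("https://www.example.com/products/portable-speaker", date(2022, 11, 2)),
]))
```

Keeping `lastmod` accurate gives a crawler a cheap signal for which pages changed since its last visit.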

To help with all this, and to simplify the indexing process, we offer API clients in many programming languages, dashboards that help you visualize the indexing and crawling processes, and a CLI tool for interacting with the API in a variety of convenient ways.

12 ways to optimize and enrich your search index

Whether you are using a search crawler or connecting your site via API, there are many ways to configure and improve a search index. The real-world suggestions below come directly from the conversations we often have with customers who are building their index via crawler or API. Some of these methods are more appropriate for crawler-based indexing, others are relevant to API-based indexing, and a few apply to both.

Here are 12 ways you can optimize your search index:

1. Open Graph metadata

Facebook released their Open Graph protocol in 2010 and since then it has become widely used by search engines. Search results often include an image preview, and most often this is powered by Open Graph.

By adding Open Graph tags to your content, you can enrich a search index with information such as:

  • Title and content type
  • Image and URL
  • Additional Open Graph properties

There are heaps of other data you can use with Open Graph to enrich a search index besides just title, description, and images, but many people don’t know or use them all. For more information, visit https://ogp.me/
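To show how Open Graph tags can feed an index, here is a minimal sketch that pulls `og:` properties out of a page with Python’s built-in `html.parser` and turns them into a flat record. The sample HTML and the record layout (property names without the `og:` prefix) are illustrative assumptions; a real crawler would also fetch pages and handle many more edge cases.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        # Keep only og:* properties; if a property repeats, the first value wins.
        if prop.startswith("og:") and content is not None:
            self.properties.setdefault(prop[len("og:"):], content)


def og_record(html: str) -> dict[str, str]:
    """Return the page's Open Graph properties as a flat record, e.g. {"title": ...}."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


page = """
<html><head>
  <meta name="viewport" content="width=device-width">
  <meta property="og:title" content="How to choose a portable speaker">
  <meta property="og:type" content="article">
  <meta property="og:image" content="https://www.example.com/img/speaker.jpg">
  <meta property="og:url" content="https://www.example.com/guides/portable-speakers">
</head><body><p>...</p></body></html>
"""
print(og_record(page))
```

Non-Open Graph tags such as the viewport `meta` are ignored, so the resulting record contains only the four `og:` fields.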

2. Schema.org formats

Open Graph is just one of several open protocols for enriching web and search engine indexing data. There are different kinds of schemas you can mark up your page content with. For example, if you’re a recipe site, you will have different standards for how you mark up content than, say, an event website.

Schema.org publishes and maintains schema vocabularies for different kinds of sites. For events such as a concert, lecture, or festival, for example, ticketing information can be added with the offers property, either in the HTML markup or in JSON-LD format. Repeated events may be structured as separate Event objects.
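With JSON-LD, the structured data sits in a `<script type="application/ld+json">` block. The sketch below extracts those blocks with `html.parser` and flattens each schema.org `Event` and its `offers` into a few index fields. The sample event and the fields kept are illustrative assumptions. The sketch also skips `@graph` containers and nested types for brevity.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list = []
        self._in_jsonld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def event_records(html: str) -> list[dict]:
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    records = []
    for block in parser.blocks:
        # A JSON-LD script may hold a single object or a list of objects.
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Event":
                continue
            offers = item.get("offers", [])
            if isinstance(offers, dict):  # a single Offer may be given as an object
                offers = [offers]
            records.append({
                "name": item.get("name"),
                "startDate": item.get("startDate"),
                "prices": sorted(float(o["price"]) for o in offers if "price" in o),
                "currency": offers[0].get("priceCurrency") if offers else None,
            })
    return records


page = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event", "name": "Summer Jazz Night",
 "startDate": "2023-07-14T19:30",
 "offers": {"@type": "Offer", "price": "35.00", "priceCurrency": "USD",
            "url": "https://www.example.com/tickets"}}
</script>
"""
print(event_records(page))
```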

3. Article publish and modified times

The article publish and article modified dates and times are essential for sorting content by recency. Both Open Graph and schema.org support these timestamps (schema.org uses datePublished and dateModified). The Open Graph properties are:

  • article:published_time (datetime): when the article was first published.
  • article:modified_time (datetime): when the article was last changed.
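Once these timestamps are stored in your records, sorting by recency is straightforward. Below is a minimal sketch that assumes each record keeps the Open Graph values as ISO 8601 strings under hypothetical `published_time` and `modified_time` fields. The sample uses `+00:00` offsets because `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11.

```python
from datetime import datetime

# Hypothetical records carrying the Open Graph timestamps as ISO 8601 strings.
records = [
    {"title": "Search indexing basics",
     "published_time": "2021-03-02T09:00:00+00:00",
     "modified_time": "2022-10-01T12:00:00+00:00"},
    {"title": "12 ways to improve your search index",
     "published_time": "2022-11-11T08:00:00+00:00",
     "modified_time": "2022-11-11T08:00:00+00:00"},
    {"title": "Faceting 101",
     "published_time": "2020-06-15T10:30:00+00:00"},  # never modified
]


def last_changed(record: dict) -> datetime:
    # Fall back to the publish time when the article was never modified.
    value = record.get("modified_time") or record["published_time"]
    return datetime.fromisoformat(value)


def sort_by_recency(items: list[dict]) -> list[dict]:
    # Newest first. If your search engine sorts on numeric attributes, you could also
    # store int(last_changed(r).timestamp()) as a number at indexing time.
    return sorted(items, key=last_changed, reverse=True)


for r in sort_by_recency(records):
    print(r["title"])
```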

4. Identify header and footer content

Miscellaneous content such as your navigation, footer, and anything else not specific to the page should be placed within the header and footer elements so search engines know to ignore it. By marking up header and footer content, you give the search engine a better chance of understanding what the page is about so it can be indexed properly (in this case, distinguishing navigational data from body data).
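The same markup is what lets an indexer separate page-specific content from site chrome. Here is a minimal sketch using `html.parser` that skips any text inside `header` and `footer` elements and keeps the rest for the index. Also skipping `nav`, `script`, and `style` is an assumption added for illustration, as is the sample page.

```python
from html.parser import HTMLParser

# Elements whose text is treated as site chrome (header, footer, nav) or as
# non-visible (script, style). This list is an assumption of the sketch.
SKIPPED_TAGS = {"header", "footer", "nav", "script", "style"}


class BodyTextParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.depth_in_skipped = 0  # greater than 0 while inside any skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.depth_in_skipped += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.depth_in_skipped > 0:
            self.depth_in_skipped -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.depth_in_skipped == 0:
            self.chunks.append(text)


def main_content(html: str) -> str:
    parser = BodyTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """
<body>
  <header><nav><a href="/">Home</a> <a href="/shop">Shop</a></nav></header>
  <main><h1>Portable speaker</h1><p>Waterproof, 12-hour battery.</p></main>
  <footer>© 2022 Example Store</footer>
</body>
"""
print(main_content(page))  # Portable speaker Waterproof, 12-hour battery.
```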

5. Augmenting your search index

Search indexes can be enriched with data in a variety of ways such as:

  • Adding color metadata via the Google Vision API
  • Using third-party data such as product ratings
  • Extracting attributes from incoming data to create filters and facets

Data can be enriched as it is added to the index. Search engines use this enriched data to provide better results and help shoppers find what they are looking for faster. Ecommerce sites update their products frequently, so enrichment can be incorporated into each update.
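Here is a minimal sketch of enrichment at indexing time. It joins third-party ratings onto product records and derives a facetable `color` attribute from the incoming product titles. The ratings lookup, the field names, and the color vocabulary are all hypothetical, and the Google Vision API step from the list above is not shown.

```python
# Hypothetical third-party ratings, keyed by product SKU.
RATINGS = {
    "SPK-1": {"average": 4.6, "count": 312},
    "HP-7": {"average": 4.1, "count": 58},
}

# A tiny, illustrative vocabulary for extracting a color facet from product titles.
KNOWN_COLORS = {"black", "white", "red", "blue", "green"}


def enrich(product: dict) -> dict:
    record = dict(product)  # copy so the caller's data isn't mutated
    rating = RATINGS.get(product["sku"])
    if rating:
        record["rating"] = rating["average"]
        record["rating_count"] = rating["count"]
    # Derive a "color" facet from words that appear in the title.
    words = {w.strip(",.").lower() for w in product["title"].split()}
    record["color"] = sorted(words & KNOWN_COLORS)
    return record


products = [
    {"sku": "SPK-1", "title": "Portable Speaker, Blue"},
    {"sku": "HP-7", "title": "Black Wireless Headphones"},
    {"sku": "CHG-2", "title": "USB-C Charger"},
]
enriched = [enrich(p) for p in products]
for record in enriched:
    print(record)
```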

6. Business performance data

Your index is more than your content. Off-site data, such as product ratings, margins, and inventory levels, can be very useful in a search index for ranking results. Many products may be relevant to a customer searching your site, but your business data can be used to ensure the best ones are pushed to the top. We offer custom ranking and boosting that help customers build conversion flywheels using this kind of business data.
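One common approach is to apply business data as a tie-breaker. Results are ordered by textual relevance first, and business metrics decide the order among equally relevant items. The sketch below illustrates that idea in plain Python. It is conceptual, not a description of any engine’s internals, and the relevance scores, metric names, and metric order are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    relevance: int  # hypothetical textual-relevance score (higher is better)
    in_stock: bool
    margin: float   # hypothetical business metric
    rating: float


def rank(hits: list[Hit]) -> list[Hit]:
    # Relevance comes first; business data only breaks ties between equally relevant hits.
    # Tuples compare element by element, and reverse=True puts the largest values first
    # (True sorts above False, so in-stock items come first within a relevance tier).
    return sorted(hits, key=lambda h: (h.relevance, h.in_stock, h.margin, h.rating), reverse=True)


hits = [
    Hit("Speaker A", relevance=3, in_stock=True, margin=0.20, rating=4.2),
    Hit("Speaker B", relevance=3, in_stock=True, margin=0.35, rating=4.0),
    Hit("Speaker C", relevance=3, in_stock=False, margin=0.50, rating=4.9),
    Hit("Cable D", relevance=1, in_stock=True, margin=0.60, rating=4.8),
]
print([h.name for h in rank(hits)])  # ['Speaker B', 'Speaker A', 'Speaker C', 'Cable D']
```

Cable D has the best margin and a high rating, but it still ranks last because it is less relevant to the query. That ordering is the point of using business data only as a tie-breaker.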

7. Merchandising and Campaign data

Many retailers run quarterly, seasonal, or holiday sales. By adding merchandising and campaign data to your site index, you can adjust results to display sale items.

You could add a dedicated sale field, or use a discount field to determine when an item is on sale. In the latter case, the search engine knows that your display price is lower than your regular price, which is helpful for sorting by discount so visitors can find the best savings. You can then also use an algorithm (via our ranking formula) to boost items based on their sale status or other properties.
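For the discount-field approach, one option is to derive the sale fields at indexing time from the display price and the regular price, then sort on the derived discount. Below is a minimal sketch. The field names `price`, `regular_price`, `on_sale`, and `discount_pct` are hypothetical.

```python
def add_sale_fields(record: dict) -> dict:
    """Return a copy of `record` with derived on_sale and discount_pct fields."""
    out = dict(record)
    price, regular = record["price"], record["regular_price"]
    out["on_sale"] = price < regular
    # Discount as a whole-number percentage of the regular price (0 when not marked down).
    out["discount_pct"] = round(100 * (regular - price) / regular) if out["on_sale"] else 0
    return out


catalog = [
    {"name": "Speaker A", "price": 79.0, "regular_price": 99.0},
    {"name": "Speaker B", "price": 49.0, "regular_price": 49.0},
    {"name": "Speaker C", "price": 60.0, "regular_price": 120.0},
]
indexed = [add_sale_fields(r) for r in catalog]

# "Best savings" view: only discounted items, largest discount first.
best_savings = sorted((r for r in indexed if r["on_sale"]), key=lambda r: r["discount_pct"], reverse=True)
print([(r["name"], r["discount_pct"]) for r in best_savings])  # [('Speaker C', 50), ('Speaker A', 20)]
```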

[Figure: merchandising]

The search index should include fields and data that can be used for building filters and facets.

8. Filters

Search filters and facets can be built using your search index. We can infer and create filters automatically (with Query Categorization, for instance), but you can also design custom filters when needed. Determining the best filters to offer comes down to understanding your customers and how they want to slice and dice your products. Check out our guide on filters and facets for more.
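At their core, a facet is a count of matching records per attribute value, and a filter narrows the results to records with the selected values. Below is a minimal in-memory sketch of both. The records and attribute names are hypothetical, and it says nothing about how Query Categorization works.

```python
from collections import Counter

records = [
    {"name": "Speaker A", "brand": "Acme", "color": "blue", "price": 79},
    {"name": "Speaker B", "brand": "Acme", "color": "black", "price": 49},
    {"name": "Speaker C", "brand": "Sonix", "color": "blue", "price": 120},
    {"name": "Headphones D", "brand": "Sonix", "color": "black", "price": 199},
]


def apply_filters(items: list[dict], filters: dict[str, set]) -> list[dict]:
    # A record matches when, for every filtered attribute, its value is one of the selected values.
    return [r for r in items if all(r.get(attr) in allowed for attr, allowed in filters.items())]


def facet_counts(items: list[dict], attribute: str) -> dict:
    # Count how many records carry each value of the attribute (in first-seen order).
    return dict(Counter(r[attribute] for r in items if attribute in r))


blue_items = apply_filters(records, {"color": {"blue"}})
print(facet_counts(records, "color"))    # {'blue': 2, 'black': 2}
print(facet_counts(blue_items, "brand"))  # {'Acme': 1, 'Sonix': 1}
```

Recomputing facet counts on the filtered results shows shoppers how many items remain for each further choice.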

9. Content type

There are different meta tags available to help a search index understand content by type. Is the content going to take visitors to a video, a document, a page, or something else? Use HTML or JSON-LD markup to identify your content as a video, audio, abstract, etc., so your search index can sort or filter content by type.

10. Personalization

Customers expect and want search results to be personalized. If you offer free shipping for members, that information should be in your data. If there’s a discount by location, you’ll want geo data in your records, too. By connecting your search index to this data, you can easily personalize search results.
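For the location and membership examples, here is a minimal sketch. It stores latitude and longitude on each record, keeps only results within a radius of the visitor (closest first), and flags the member perk only for members. Distances use the haversine formula with a mean Earth radius of 6,371 km. The store records, coordinates, and the `free_shipping_for_members` and `is_member` fields are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    # Great-circle distance between two points given in decimal degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def personalize(records: list[dict], user: dict, radius_km: float) -> list[dict]:
    nearby = []
    for r in records:
        distance = haversine_km(user["lat"], user["lng"], r["lat"], r["lng"])
        if distance <= radius_km:
            hit = dict(r, distance_km=round(distance, 1))
            # Surface the member perk only to visitors who are members.
            hit["show_free_shipping"] = user["is_member"] and r.get("free_shipping_for_members", False)
            nearby.append(hit)
    return sorted(nearby, key=lambda h: h["distance_km"])


stores = [
    {"name": "Paris store", "lat": 48.8566, "lng": 2.3522, "free_shipping_for_members": True},
    {"name": "Lyon store", "lat": 45.7640, "lng": 4.8357},
    {"name": "Versailles store", "lat": 48.8049, "lng": 2.1204, "free_shipping_for_members": True},
]
visitor = {"lat": 48.8606, "lng": 2.3376, "is_member": True}

for h in personalize(stores, visitor, radius_km=25):
    print(h["name"], h["distance_km"], h["show_free_shipping"])
```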

11. Integration with other third-party systems

Big businesses often have complex infrastructure with data coming from various systems. Need to integrate data with your supply chain management or PIM? You’ll want your search solution to support an API to enable instant indexing of data between systems.
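When data comes from several systems, one common pattern is to make each system authoritative for certain fields. Partial updates from each system are then merged into a single record per product before that record is sent to the search index. Below is a minimal sketch. The PIM and supply-chain payloads, the field-ownership map, and the `id` key are hypothetical, and the call that pushes the merged record to an index is not shown.

```python
# Which upstream system is authoritative for each field (a hypothetical ownership map).
FIELD_OWNER = {
    "title": "pim",
    "brand": "pim",
    "description": "pim",
    "stock": "supply_chain",
    "warehouse": "supply_chain",
}


def merge_update(records: dict[str, dict], source: str, record_id: str, fields: dict) -> dict:
    """Apply a partial update from `source`, ignoring fields that system doesn't own."""
    record = records.setdefault(record_id, {"id": record_id})
    for name, value in fields.items():
        if FIELD_OWNER.get(name) == source:
            record[name] = value
    return record


records: dict[str, dict] = {}
merge_update(records, "pim", "SPK-1", {"title": "Portable Speaker", "brand": "Acme"})
# The supply-chain system doesn't own "title", so that part of its update is ignored.
merge_update(records, "supply_chain", "SPK-1", {"stock": 42, "title": "stale title"})
print(records["SPK-1"])  # {'id': 'SPK-1', 'title': 'Portable Speaker', 'brand': 'Acme', 'stock': 42}
```

Explicit field ownership keeps two systems from silently overwriting each other’s data when their updates arrive in an unpredictable order.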

12. Review your analytics and search metrics

Site owners should plan to spend some amount of time reviewing their analytics and search metrics to identify the keywords customers are querying. Understanding how customers search can help identify opportunities to enrich the index, add or adjust filters, and improve search engine results.
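As a starting point for that review, here is a minimal sketch that scans a query log for the most frequent searches and for searches that returned no results. Zero-result searches are often the clearest signal of gaps in the index. The log format (a normalized query plus the number of hits returned) is a hypothetical simplification of what an analytics export might contain.

```python
from collections import Counter

# Hypothetical query log: (normalized query, number of hits returned).
query_log = [
    ("portable speaker", 14), ("portable sound", 0), ("usb-c charger", 6),
    ("portable sound", 0), ("portable speaker", 14), ("waterproof speaker", 3),
    ("boombox", 0), ("portable sound", 0),
]


def top_queries(log, n=3):
    # Most frequent queries overall; ties keep the order in which queries first appeared.
    return Counter(q for q, _ in log).most_common(n)


def zero_result_queries(log, n=3):
    # Queries that returned nothing: candidates for synonyms, new content, or enrichment.
    return Counter(q for q, hits in log if hits == 0).most_common(n)


print(top_queries(query_log))          # [('portable sound', 3), ('portable speaker', 2), ('usb-c charger', 1)]
print(zero_result_queries(query_log))  # [('portable sound', 3), ('boombox', 1)]
```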

It’s all about indexing great content

Building a rich search index can greatly improve search performance and customer satisfaction. By understanding the different types of data that can be included, site owners can make sure they are providing the best possible search experience for their customers.

To learn more about how to set up your search index or take advantage of our personalization and custom ranking features, contact us today! We offer a free trial and demo so you can explore all that our solution has to offer.

