source link: https://www.algolia.com/blog/engineering/building-a-multi-lingual-rhyming-dictionary-part-2-the-gui/

Building a multi-lingual rhyming dictionary with Wiktionary and Algolia — part 2, the GUI

Mar 29th 2024 · Engineering

In the last article in this series, we built out a dataset of all the words in Portuguese by scraping Wiktionary for their definitions and pronunciations. Since pronunciation is listed on Wiktionary using a super regularized system called IPA (short for the International Phonetic Alphabet), we can directly compare the IPA representations of any two words, and without any more context, determine if they rhyme (and if the word stress comes on the same syllable, something important to people who would also be concerned about rhyme, like poets and songwriters). We also generalized this process so that with a settings tweak in our Algolia index, we could automate building the same index for any language on Wiktionary. If you haven’t seen that first article, it’s not completely necessary to understand what we’re doing here, but you might like to see how we accomplished all of that, so read it here. In this article, we’re going to add the frontend and the logic that actually lets us run the searches. Let’s get started!

It always helps me when designing some algorithm to first think of a high-level goal, and then break it into smaller and smaller steps until my algorithm has essentially built itself. In this case, our high-level goal is to take some word that the user types in and search through the database for all words that rhyme with it. Your first thought here might be the same as mine, namely to just use a template Algolia app and let the searchBox InstantSearch component do its job. However, we’re not searching by matching words in our results here like how searches normally work. We’re searching by IPA, which brings some interesting and unique requirements.

For example, the word “Algolia” would be represented in the American dialect as ælˈɡoʊ.li.ə. The . symbols represent syllable breaks, and the ˈ indicates that the next syllable is stressed. All of the other symbols represent specific sounds that you can make with your mouth to form the word “Algolia” (in my particular accent, specifically). When you compare words in this representation (which describes pronunciation instead of spelling), you can compare them end to beginning, syllable by syllable, to determine rhyme matches. For example, somebody who just loves words and wordplay might say they have “logophilia” (pronunciation in IPA, loʊ.ɡoʊˈfɪ.li.ə). That word has the same ɡoʊ syllable as the word “Algolia” has, but it’s in a different place, so it doesn’t count. Only the last two syllables match perfectly, meaning this is a two-syllable rhyme. The country name Mongolia (pronunciation in IPA, mɑːŋˈɡoʊ.li.ə), on the other hand, has its final three consecutive syllables matching Algolia’s IPA perfectly, so it’s a three-syllable rhyme. Do you see where our out-of-the-box search system might struggle here? Requirements like this are uncommon, so they’re not built directly into Algolia’s search algorithms.
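To make that end-to-beginning comparison concrete, here is a minimal sketch of counting how many trailing syllables two IPA strings share. The helper names splitSyllables and countRhymingSyllables are illustrative and not part of the project's code, and the sketch assumes all three example words are transcribed with the same ɡoʊ vowel.

```javascript
// Hypothetical helpers for illustration; they are not part of the project code.

// Both "." (syllable break) and the stress marks "ˈ" / "ˌ" start a new syllable,
// so "ælˈɡoʊ.li.ə" splits into ["æl", "ɡoʊ", "li", "ə"].
const splitSyllables = ipa => ipa.split(/[.ˈˌ]/).filter(s => s.length > 0);

// Compare end to beginning, syllable by syllable, and stop at the first mismatch.
const countRhymingSyllables = (ipaA, ipaB) => {
	const a = splitSyllables(ipaA).reverse();
	const b = splitSyllables(ipaB).reverse();
	let count = 0;
	while (count < a.length && count < b.length && a[count] === b[count]) {
		count++;
	}
	return count;
};

console.log(countRhymingSyllables("ælˈɡoʊ.li.ə", "mɑːŋˈɡoʊ.li.ə")); // 3
console.log(countRhymingSyllables("ælˈɡoʊ.li.ə", "loʊ.ɡoʊˈfɪ.li.ə")); // 2
```

This pairwise comparison is fine for two words, but running it against every record in an index is exactly the kind of work a generic text search isn't designed to do.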

Luckily, we can work around that limitation. Let’s make the executive decision now that we’ll only match on up to three syllables, or fewer if the user allows it. This means we can split up the IPA strings by their syllable breaks, toss out anything before the third-to-last (or “antepenultimate”) syllable, and save a record of which syllable was marked as stressed with the ˈ symbol. Then, adding dictionary definitions and the whole IPA string for display later on, we end up with a record that looks like this:

{
	"word": "anagrama",
	"definitions": [
		"noun: anagrama (plural anagramas); anagram"
	],
	"pronunciation": "a.naˈɡɾɐ.ma",
	"ultimate_syllable": "ma",
	"penultimate_syllable": "ɡɾɐ",
	"antepenultimate_syllable": "na",
	"stress_from_end": 2
}
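As a rough illustration of how those syllable fields can be derived from a pronunciation string, here is a minimal sketch. The function name buildRhymeFields is hypothetical, and so are its conventions for edge cases (an empty string for a missing syllable, 0 when no stress mark is present); the scraper from the first article may handle these differently.

```javascript
// Hypothetical helper for illustration; not the scraper from the first article.
const buildRhymeFields = pronunciation => {
	// Put a syllable break in front of every stress mark, then split on breaks.
	// "a.naˈɡɾɐ.ma" becomes ["a", "na", "ˈɡɾɐ", "ma"].
	const syllables = pronunciation
		.replace(/ˈ/g, ".ˈ")
		.split(".")
		.filter(s => s.length > 0);

	// The stressed syllable is the one that still carries the stress mark.
	const stressedIndex = syllables.findIndex(s => s.startsWith("ˈ"));
	const clean = syllables.map(s => s.replace("ˈ", ""));

	// Assumption: a syllable the word doesn't have is stored as "".
	const fromEnd = n => clean[clean.length - n] ?? "";

	return {
		ultimate_syllable: fromEnd(1),
		penultimate_syllable: fromEnd(2),
		antepenultimate_syllable: fromEnd(3),
		// Counted from the end, starting at 1; 0 means no stress mark was found.
		stress_from_end: stressedIndex === -1 ? 0 : syllables.length - stressedIndex
	};
};

console.log(buildRhymeFields("a.naˈɡɾɐ.ma"));
```

For the anagrama pronunciation, this yields the same four syllable and stress values as the record shown above.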

You might recognize this record format as the one in which we stored our dataset in the last article. This little venture into linguistic theory explains why we did that. It’s now much, much easier to break our big-picture goal of searching through the dataset for rhymes into several discrete tasks that involve facets instead of a generic search:

  1. First, send some user-inputted word to Algolia to get back the object associated with it.
  2. Second, we allow the user to choose whether they want to see only full three-syllable rhyming matches, or if two- or one-syllable matches will do. We’ll also have them choose whether the meter (how far the stressed syllable is from the end of the word) matters to them.
  3. Next, we prepare up to four facets (for the ultimate_syllable, penultimate_syllable, antepenultimate_syllable, and stress_from_end keys) and apply only the ones the user selected in step two to our actual search results.
  4. Then we give the user the true searchBox and let them search through our dataset, but restricted by the filters we set up so that the only hits displayed to them will be rhymes of their original query.

That’s a far simpler logical flow and a far better use of Algolia’s strengths.
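To show what steps two and three boil down to, here is a minimal sketch that turns a word's record plus the user's choices into plain attribute:value strings, the shape that Algolia's facetFilters search parameter accepts. The article itself applies these restrictions through InstantSearch menu widgets, so the helper buildFacetFilters and its options object are purely illustrative assumptions.

```javascript
// Hypothetical helper for illustration; the article uses menu widgets instead.
const SYLLABLE_KEYS = [
	"ultimate_syllable",
	"penultimate_syllable",
	"antepenultimate_syllable"
];

// `syllables` is how many trailing syllables must match (1 to 3);
// `matchStress` adds the stress_from_end restriction on top.
const buildFacetFilters = (record, { syllables = 1, matchStress = false } = {}) => {
	const keys = SYLLABLE_KEYS.slice(0, syllables);
	if (matchStress) keys.push("stress_from_end");
	return keys.map(key => `${key}:${record[key]}`);
};

const anagrama = {
	word: "anagrama",
	ultimate_syllable: "ma",
	penultimate_syllable: "ɡɾɐ",
	antepenultimate_syllable: "na",
	stress_from_end: 2
};

console.log(buildFacetFilters(anagrama, { syllables: 2, matchStress: true }));
// [ 'ultimate_syllable:ma', 'penultimate_syllable:ɡɾɐ', 'stress_from_end:2' ]
```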

Before we get started implementing these steps, I want to show you some of the boilerplate I’ll be using. This HTML includes the Algolia InstantSearch JS library, its CSS minimal stylings, and a link to my custom CSS and JS:

<!DOCTYPE html>
<html lang="en">
	<head>
		<meta charset="utf-8" />
		<meta
			name="viewport"
			content="width=device-width, initial-scale=1, shrink-to-fit=no"
		/>

		<link
			rel="stylesheet"
			href="https://cdn.jsdelivr.net/npm/instantsearch.css@7/themes/algolia-min.css"
		/>
		<link rel="stylesheet" href="./index.css" />

		<title>Portuguese Rhyming Dictionary</title>
	</head>

	<body>
		<main>
			<h1>Portuguese Rhyming Dictionary</h1>

			<!-- the fun stuff will happen here -->
		</main>

		<script
			src="https://cdn.jsdelivr.net/npm/[email protected]/dist/algoliasearch-lite.umd.js"
			integrity="sha256-dImjLPUsG/6p3+i7gVKBiDM8EemJAhQ0VvkRK2pVsQY="
			crossorigin="anonymous"
		></script>
		<script src="https://cdn.jsdelivr.net/npm/instantsearch.js@4"></script>
		<script src="./index.js"></script>
	</body>
</html>

It’s not terribly important for this article, but here is the custom CSS I wrote for this project if you’d like to come back to it as you see each component implemented.

/* boilerplate */

* {
	margin: 0;
	padding: 0;
	outline: 0;
	border: 0;
	box-sizing: border-box;
	font-family: sans-serif;
	text-align: center;
}

body {
	background: #eee;
	color: #3a4570;
}

input[type=text], select {
	padding: 0.3rem;
	border: 1px solid #c4c8d8;
	border-radius: 5px;
	color: #3a4570;
	font-size: 1rem;
	max-width: 100vw;
}

input[type=checkbox] {
	cursor: pointer;
}

/* layout */

main {
	width: 100%;
	padding: 5vmin;
	display: flex;
	flex-direction: column;
	min-height: 100vh;
	justify-content: space-evenly;
	align-items: center;
}

main > * {
	margin: 2vmin 5vmin;
}

/* specific elements */

#rhyme-search-row {
	display: flex;
	align-items: center;
}

#rhyme-search-row > span {
	padding: 1vh;
}

#filters-row {
	display: flex;
	align-items: center;
	justify-content: space-around;
	width: 100%;
}

#filters-row > * {
	margin: 1vh;
}

@media (max-width: 800px) {
	#filters-row {
		flex-direction: column;
	}
}

#rhymes-with {
	flex: 1;
}

#definitions:empty {
	margin: 0;
}

#definitions > li {
	text-align: left;
	font-size: 1.5vh;
}

#searchbox {
	margin-top: 10px;
}

#hits.hidden {
	display: none;
}

.ais-Hits-item {
	border: 0;
	box-shadow: none;
	padding: 0;
	margin: 0.5vh 2vw;
	display: inline-block;
	width: auto;
	font-size: 2vh;
}

.hit {
	cursor: pointer;
}

.result-definitions-container {
	opacity: 0;
	pointer-events: none;
	z-index: 2;
	position: absolute;
	top: 0;
	left: 0;
	width: 100vw;
	height: 100vh;
	transition: opacity 0.5s;
	display: flex;
	justify-content: center;
	align-items: center;
	background: rgba(0, 0, 0, 0.7);
	cursor: pointer;
}

.hit.activated > .result-definitions-container {
	opacity: 1;
	pointer-events: all;
	transition: all 0.8s;
}

.result-definitions {
	background: #eee;
	border-radius: 5px;
	padding: 5vmin;
	border: 1px solid #c4c8d8;
	cursor: text;
	max-width: 80%;
	max-height: 80%;
}

And in my index.js JavaScript file, we set up a few things just to get InstantSearch up and running and our workflow mapped out:

let filters = {
	ultimate_syllable: "",
	penultimate_syllable: "",
	antepenultimate_syllable: "",
	stress_from_end: ""
};

let currentlyAppliedFilters = {};

const getSearchWordData = async e => {};

const widgets = [];

const onload = () => {
	window.searchClient = algoliasearch('3HQRD5FDNO', 'e1ceb9c8c9e6e287d14eb1ba8bc433d1');
	window.search = instantsearch({
		indexName: 'rhyme-dictionary',
		searchClient
	});
	search.addWidgets(widgets);
	search.start();
};

window.addEventListener("load", onload);

A few miscellaneous notes:

  • This boilerplate fires up InstantSearch when the page finishes loading.
  • Feel free to use my application ID and public key if you’d like to mess around with this yourself!
  • Note that I made searchClient and search global variables. That’ll come in handy later when other functions have to run searches of their own.

Let’s get started with the logic!

Getting the initial word

To allow the user to type in the word for which they want to find rhymes, let’s add an input in the <main> element. We’ll also add a list for the definitions of the input word, which the same function will fill in shortly.

<div id="rhyme-search-row">
	<span>What rhymes with</span>
	<input id="rhymes-with" type="text"/>
	<span>?</span>
</div>
<ul id="definitions"></ul>

Then, in the outermost level of the JavaScript file, let’s hook up the getSearchWordData function to run any time the user changes that input.

document.getElementById("rhymes-with").addEventListener("input", getSearchWordData);

Now, we can start putting some search logic in the getSearchWordData function. This function isn’t going to affect the main results we’re displaying on the page (since we’re just trying to find the details of a single result to set the filters for the real search later), so we’re going to do it outside of InstantSearch. Here’s a commented version of the code to help you follow what it’s doing:

const getSearchWordData = async e => {
	// Step 1: get a reference to the index that contains our word data.
	const index = searchClient.initIndex('rhyme-dictionary');

	// Step 2: search whatever is in the input box through our index and pull the first hit out into a variable called `result`.
	const result = (await index.search(e.target.value.toLowerCase())).hits[0];

	// Step 3: get the element where we'll put the definitions of the inputted word
	const ul = document.getElementById("definitions");

	// Step 4: if we got a result from the search at all, and that first result matches exactly the word the user typed in, then they typed in a legitimate query, so...
	if (!!result && result.word == e.target.value.toLowerCase()) {
		// list out all of its definitions...
		ul.innerHTML = "<li>" + result.definitions.join("</li><li>") + "</li>";

		// fill the `filters` object with the applicable facet data...
		[
			"ultimate_syllable",
			"penultimate_syllable",
			"antepenultimate_syllable",
			"stress_from_end"
		].forEach(key => {
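			// fall back to an empty string in case the record lacks this key, so toString() can't throw
			filters[key] = (result[key] ?? "").toString();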
		});

		// refresh InstantSearch...
		search.refresh();

		// and make sure the results of the new search aren't hidden
		document.getElementById("hits").classList.remove("hidden");
	} else {
		// The user didn't enter a legitimate query, so just empty the definitions unordered list to avoid confusion.
		ul.innerHTML = "";
		document.getElementById("hits").classList.add("hidden");
	}
};

Now our app can take in user input, get data about the inputted word from Algolia, and decide what filters should be applied to our eventual search query so that its results all rhyme with the original query.
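The decision at the heart of that function (is the first hit an exact match, and if so, which filter values does it give us?) can also be expressed as a pure helper that is easy to check without a browser or a network call. This is only a sketch: FILTER_KEYS and filtersFromHits are hypothetical names, and treating a missing key as an empty string is an assumption.

```javascript
// Hypothetical pure helper mirroring the checks inside getSearchWordData.
const FILTER_KEYS = [
	"ultimate_syllable",
	"penultimate_syllable",
	"antepenultimate_syllable",
	"stress_from_end"
];

// Returns the filter values for an exact match, or null if the first hit
// is missing or is not exactly the word the user typed.
const filtersFromHits = (hits, query) => {
	const wanted = query.toLowerCase();
	const result = hits[0];
	if (!result || result.word !== wanted) return null;
	return Object.fromEntries(
		// Assumption: a key the record lacks becomes an empty string.
		FILTER_KEYS.map(key => [key, String(result[key] ?? "")])
	);
};

const sampleHits = [{
	word: "anagrama",
	ultimate_syllable: "ma",
	penultimate_syllable: "ɡɾɐ",
	antepenultimate_syllable: "na",
	stress_from_end: 2
}];

console.log(filtersFromHits(sampleHits, "Anagrama")); // all four values, as strings
console.log(filtersFromHits(sampleHits, "anagram")); // null
```

Keeping this logic separate from the DOM updates would also make it simpler to unit test.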

Setting the filters

Now that the filters object is filled with all of the information needed about the currently inputted word, we need to figure out how many of those filters should actually be applied to the search. If the user is fine with one-syllable rhyme matches, then we only need to apply the ultimate_syllable filter. If the user wants two- or three-syllable matches, we’ll need to apply the penultimate_syllable and antepenultimate_syllable filters too. Lastly, if the user wants the stress to fall on the same relative syllable in the result words as it does in the input word, we’ll need to apply the stress_from_end filter as well. Let’s let the user make these choices using a <select> and an <input type=checkbox> in our <main> element:

<div id="filters-row">
	<select id="syllable-match-count">
		<option value="ultimate_syllable">One syllable matches will do</option>
		<option value="penultimate_syllable">Always match at least the final two syllables</option>
		<option value="antepenultimate_syllable">Always match at least the final three syllables</option>
	</select>

	<label>
		<span>Match stress?</span>
		<input id="stress-checkbox" type="checkbox" />
	</label>
</div>

Then, we need to set up our facets. Here’s the thing though — in InstantSearch, typically these are handled by one of quite a few different types of refinement widgets, which all have slightly different abilities. To stay in line with this spirit of componentization, let’s just hijack the menu refinement component to handle this logic generically without actually displaying the typical menu choice GUI that would otherwise look like this:

[Image: refinement-component.jpg, the default menu refinement widget]

To do that, we need to create a function that “renders” our custom menu component. Of course, since we don’t want this component to actually render anything, this component will only add the filters that it’s told to and refresh the search results. Let’s call that function renderMenu, and then add four instances of the new custom menu component to our widgets array (which is currently empty):

const widgets = [
	instantsearch.connectors.connectMenu(renderMenu)({
		attribute: "ultimate_syllable",
		on: () => true, // we always want to match at least the final syllable
		eventListenerTargetQuerySelector: "#syllable-match-count",
		eventListenerType: "change"
	}),
	instantsearch.connectors.connectMenu(renderMenu)({
		attribute: "penultimate_syllable",
		on: () => document.getElementById("syllable-match-count").value != "ultimate_syllable",
		eventListenerTargetQuerySelector: "#syllable-match-count",
		eventListenerType: "change"
	}),
	instantsearch.connectors.connectMenu(renderMenu)({
		attribute: "antepenultimate_syllable",
		on: () => document.getElementById("syllable-match-count").value == "antepenultimate_syllable",
		eventListenerTargetQuerySelector: "#syllable-match-count",
		eventListenerType: "change"
	}),
	instantsearch.connectors.connectMenu(renderMenu)({
		attribute: "stress_from_end",
		on: () => !!document.getElementById("stress-checkbox").checked,
		eventListenerTargetQuerySelector: "#stress-checkbox"
	})
];

Each of these custom components is given a few input settings:

  • attribute: This is the attribute that we’re refining with this custom menu component. We’ll add one menu for each of the four attributes.
  • on: This is a function that returns a boolean representing whether that filter should be active at the time of execution. This is where our logic for our filter activation lives — the ultimate_syllable filter is always on, the penultimate_syllable filter is on only if the <select> we made earlier isn’t allowing matches of only the final syllable, the antepenultimate_syllable filter is only on if the <select> is definitely set to the most stringent filter level, and stress_from_end is on if the checkbox is selected.
  • eventListenerTargetQuerySelector: We’ll need to refresh the search when the user either changes the <select> or checks the checkbox, and we’ll do that with an event. This is a CSS query that’ll tell us which elements to set the event on.
  • eventListenerType: This is the type of event listener we’ll put on those elements. It’ll default to click, so the stress_from_end checkbox can just use the default, but the <select> will need to trigger the event on change.
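The four on predicates described above can also be read as one small decision table. The sketch below restates them as a pure function of the <select> value and the checkbox state; activeAttributes is a hypothetical name used only for illustration, and the real widgets still read those values from the DOM.

```javascript
// Hypothetical helper: which filters are active for a given UI state?
// `selectValue` is the value of #syllable-match-count,
// `stressChecked` is the checked state of #stress-checkbox.
const activeAttributes = (selectValue, stressChecked) => {
	// The final syllable always has to match.
	const active = ["ultimate_syllable"];
	// Anything stricter than "one syllable will do" also needs the second-to-last.
	if (selectValue !== "ultimate_syllable") active.push("penultimate_syllable");
	// Only the strictest setting needs the third-to-last syllable.
	if (selectValue === "antepenultimate_syllable") active.push("antepenultimate_syllable");
	if (stressChecked) active.push("stress_from_end");
	return active;
};

console.log(activeAttributes("penultimate_syllable", true));
// [ 'ultimate_syllable', 'penultimate_syllable', 'stress_from_end' ]
```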

Each of those settings objects gets fed into our renderMenu function as the widgetParams property of its first parameter. Here’s how we might define that rendering function to consume this data:

const renderMenu = (renderOptions, isFirstRender) => {
	// Step 1: break down our inputs
	const {
		items, // this is an array of filter options, with the activated ones at the top
		refine, // this is a function that, when called, toggles refinement on whatever filter option you pass it
		widgetParams // this is the object we passed into this in the widgets array
	} = renderOptions;

	// Step 2: if we're rendering this component for the first time and we have a query selector telling us where to stick our refresh event, stick the refresh event there
	if (
		isFirstRender
		&& !!widgetParams.eventListenerTargetQuerySelector
	) {
		document.querySelector(
			widgetParams.eventListenerTargetQuerySelector
		).addEventListener(
			widgetParams.eventListenerType ?? "click",
			e => {
				// as an exception, we won't refresh the search if we've put this event on a select and the user made that select not affect this particular filter. another instance of this component for that other filter will take care of that.
				if (
					widgetParams.eventListenerType == "change"
					&& e.target.value != widgetParams.attribute
				) return;
				search.refresh();
			}
		)
	}

	// Step 3: if the user just switched this filter off, then delete it from our record of currently applied filters and toggle the refinement off. if the filter was just switched on, and it is definitely switch-on-able, then add it to that record and toggle the refinement on
	if (
		!widgetParams.on()
		&& !!currentlyAppliedFilters[widgetParams.attribute]
	) {
		console.log(`Removing refinement on ${widgetParams.attribute}`);
		delete currentlyAppliedFilters[widgetParams.attribute];
		refine(filters[widgetParams.attribute]); // toggles refinement off here
	} else if (
		widgetParams.on()
		&& currentlyAppliedFilters[widgetParams.attribute] != filters[widgetParams.attribute]
		&& !!filters[widgetParams.attribute]
		&& items.filter(x => x.value == filters[widgetParams.attribute]).every(x => !x.isRefined)
	) {
		console.log(`Refining ${widgetParams.attribute} to ${filters[widgetParams.attribute]}`);
		refine(filters[widgetParams.attribute]); // toggles refinement on here
		currentlyAppliedFilters[widgetParams.attribute] = filters[widgetParams.attribute];
	}
};

Those if conditions look a little verbose, but they’re just handling a few edge cases I ran across in testing. The comments explain that block a lot better than reading the code itself does.
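If the conditions in Step 3 are hard to follow inline, one way to read them is as a function that returns one of three outcomes. The following sketch restates the same checks in isolation; decideRefinement and its argument names are hypothetical and exist only to make the branches testable on their own.

```javascript
// Hypothetical restatement of the Step 3 conditions in renderMenu.
//   isOn:    result of widgetParams.on()
//   applied: currentlyAppliedFilters[attribute] (undefined if none)
//   wanted:  filters[attribute] for the current word ("" if none)
//   items:   the menu items InstantSearch passes to the render function
const decideRefinement = ({ isOn, applied, wanted, items }) => {
	// Switched off while a refinement is still recorded: toggle it off.
	if (!isOn && applied) return "remove";

	// Is the wanted value already refined according to InstantSearch?
	const alreadyRefined = items.some(x => x.value === wanted && x.isRefined);

	// Switched on, with a new non-empty value that isn't refined yet: toggle it on.
	if (isOn && applied !== wanted && wanted && !alreadyRefined) return "apply";

	return "none";
};

console.log(decideRefinement({ isOn: true, applied: undefined, wanted: "ma", items: [] })); // apply
console.log(decideRefinement({ isOn: false, applied: "ma", wanted: "ma", items: [] })); // remove
```

Because refine toggles a value rather than setting it, the "none" branch matters: calling refine again on an already-refined value would switch it back off.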

Displaying the results (with the real searchbox)

The last step here is just to add our actual, non-filter-based searchbox, a place to put our results, and some pagination logic. Those three things are wrapped up in three neat Algolia InstantSearch components, which we can add with some HTML…

<div>
	<h2>Search among results</h2>
	<div id="searchbox"></div>
</div>

<div id="hits" class="hidden"></div>
<div id="pagination"></div>

… and a few more entries in our widgets array:

const hitClicked = word => {
	document.getElementById(`word-${word}`).classList.toggle("activated");
};

const widgets = [

	// ... the four connectMenu(renderMenu) entries from the previous section go here ...

	instantsearch.widgets.searchBox({
		container: '#searchbox'
	}),
	instantsearch.widgets.hits({
		container: '#hits',
		templates: {
			item: hit => `
				<div class="hit" onclick="hitClicked('${hit.word}')" id="word-${hit.word}">
					<span class="result-word">${hit.word}</span>
					<div class="result-definitions-container">
						<div class="result-definitions" onclick="event.stopPropagation()">
							<h3>${hit.word} (pronounced ${hit.pronunciation})</h3>
							<ul>${
								hit
									.definitions
									.map(definition => `<li>${definition}</li>`)
									.join("")
							}</ul>
						</div>
					</div>
				</div>
			`,
		}
	}),
	instantsearch.widgets.pagination({
		container: '#pagination'
	})
];

This includes the neat bit of functionality of making each result expandable. Clicking on it brings up a box with the result word’s definitions and pronunciation (remember, our intended userbase here is poets and songwriters). It’ll look something like this:

[Image: rhymes-with.jpg, the expanded definitions box for a result]
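One caveat about the hits template: it interpolates words and definitions straight into HTML. That works for the sample data, but a definition containing a character like < or & would be parsed as markup. As a hedged sketch of one way to guard against that, the hypothetical helpers escapeHtml and definitionsList below escape text before it goes into the template; they are not part of the original project.

```javascript
// Hypothetical helpers for illustration; not part of the original project.

// Escape the characters that have special meaning in HTML text and attributes.
// "&" is replaced first so the entities added afterwards aren't escaped twice.
const escapeHtml = text => String(text)
	.replace(/&/g, "&amp;")
	.replace(/</g, "&lt;")
	.replace(/>/g, "&gt;")
	.replace(/"/g, "&quot;")
	.replace(/'/g, "&#39;");

// Build the <li> items for a hit's definitions with every definition escaped.
const definitionsList = definitions =>
	definitions.map(definition => `<li>${escapeHtml(definition)}</li>`).join("");

console.log(definitionsList(["noun: anagrama (plural anagramas); anagram"]));
// <li>noun: anagrama (plural anagramas); anagram</li>
```

In the template, the definitions.map(...) expression could then be swapped for a call to definitionsList(hit.definitions).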

And that’s it! Let us know on Discord at Algolia if you get to improve this project further, or extend it to other languages!

