Improving Search on TheChocolateLife

We are in the process of configuring a new search engine for TheChocolateLife that will greatly improve search capabilities.

While I really like the new publishing platform for TheChocolateLife, it has no native search capability. Search (like comments) is supported by programs (many, but not all, open source) created and maintained by third-party developers.

For technical reasons having to do with browser memory limitations, the current search engine only indexes the titles of posts. If you’re looking for something and the word or phrase is not in the post’s title, it is as if the post does not exist.

The Single Most Important Improvement

When the new search engine (Meili Search) is completely configured and running (it has already been installed on the server), every word in every post will be indexed and searchable. That includes archived comments but, sadly, not new comments in Cove, because those are in a separate database.
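To make the difference concrete, here is a minimal sketch comparing a title-only search with a full-text search. It is not how the current or the new engine is implemented; the sample posts and the field names `title` and `body` are assumptions made up for illustration.

```python
# Hypothetical posts; the field names "title" and "body" are assumptions.
posts = [
    {"title": "Classifieds: equipment for sale",
     "body": "Selling a lightly used melanger."},
    {"title": "Choosing a melanger",
     "body": "Notes on stone grinders for small batches."},
]


def search(posts, term, fields):
    """Return titles of posts whose chosen fields contain the term (case-insensitive)."""
    term = term.lower()
    return [
        post["title"]
        for post in posts
        if any(term in post[field].lower() for field in fields)
    ]


title_only = search(posts, "melanger", fields=["title"])
full_text = search(posts, "melanger", fields=["title", "body"])

print(title_only)  # only the post with the word in its title
print(full_text)   # both posts
```

With title-only matching, the classified ad is invisible even though it is exactly what someone shopping for a melanger wants to find.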

Other Improvements in the Works

  1. I am hoping to deliver a real-time interface that updates results as you type. This should speed up searches tremendously.
  2. Spelling mistakes will be, to some extent, handled automatically.
  3. Organizing results by Tags is a high priority for me. This is handy when, for example, looking for a word like “melanger” or “melangeur”. It might appear in posts in Classifieds, Originals, AskTCL, or in the Archive. Knowing where the result is found will make it easy to figure out what you’re not interested in.
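As a rough illustration of organizing results by Tag, the sketch below groups a flat list of hits under the tag each one came from. The hit structure (a `title` and a single `tag` per hit) is an assumption for the example, not the format the search engine actually returns.

```python
from collections import defaultdict

# Hypothetical search hits; one tag per hit is a simplifying assumption.
hits = [
    {"title": "Melanger for sale", "tag": "Classifieds"},
    {"title": "How long should I run my melangeur?", "tag": "AskTCL"},
    {"title": "Melanger wanted", "tag": "Classifieds"},
]


def group_by_tag(hits):
    """Group hit titles under their tag, keeping the order in which tags first appear."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["tag"]].append(hit["title"])
    return dict(grouped)


grouped = group_by_tag(hits)
for tag, titles in grouped.items():
    print(tag, titles)
```

A reader who only cares about AskTCL answers can then skip the Classifieds group entirely.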

Embedding A Simple Semantic Network

While tedious, the process is very simple: it requires embedding a “dictionary” in a configuration file. This dictionary is formatted using the JSON (JavaScript Object Notation) syntax.

What is necessary is to create a matrix of related terms. In the following example, the goal is to be able to retrieve all of the posts that refer to cocoa, irrespective of how it is spelled. Searching for “cocoa” will also return posts that contain “cacao” or “cacau”. Results will be ordered so that posts containing “cocoa” rank higher than posts containing only “cacao” or “cacau”.

{
  "cocoa": ["cacao", "cacau"],
  "cacao": ["cacau", "cocoa"],
  "cacau": ["cocoa", "cacao"]
}

For cocoa, the part of the engine that handles misspellings might catch all of these instances, but by explicitly referencing them I know every version will be caught and I can, to some extent, control how the results will be displayed.
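The behavior described above can be sketched in a few lines. This is only an illustration of the idea (expand the query with its related terms, then list exact matches before related matches); it is not the search engine's actual ranking algorithm, and the sample post titles are invented.

```python
synonyms = {
    "cocoa": ["cacao", "cacau"],
    "cacao": ["cacau", "cocoa"],
    "cacau": ["cocoa", "cacao"],
}

# Invented post titles, used only to exercise the sketch.
posts = [
    "Sourcing cacau in Bahia",
    "Cocoa butter tempering tips",
    "Fermenting cacao at origin",
    "Cleaning a melanger",
]


def search(query, posts, synonyms):
    """Return matching posts: exact-term matches first, related-term matches after."""
    query = query.lower()
    exact, related = [], []
    for post in posts:
        words = post.lower().split()
        if query in words:
            exact.append(post)
        elif any(term in words for term in synonyms.get(query, [])):
            related.append(post)
    return exact + related


results = search("cocoa", posts, synonyms)
print(results)
```

Searching for “cocoa” returns all three cocoa-related posts, with the one that literally says “cocoa” at the top.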

The questions to be answered when creating this network can get very deep. For example, how far should I take the word chocolate?

The following list is just some of the spellings of chocolate in different languages, and only those languages that use a Latin alphabet! There are chocolate makers in countries that use these languages. Let me know in the comments what you think. Should I include all of the languages? Some of them? Which ones are important to keep? Which ones are less (or not) important to keep? I can argue that Latin is not all that important, but what do you think?

{
txokolatea (Basque)
čokoláda (Czech)
chokolade (Danish)
chocola (Dutch)
chocolate (English, Portuguese)
ĉokolado (Esperanto)
suklaa (Finnish)
chocolat (French)
schokolade (German)
chokola (Haitian creole)
kokoleka (Hawaiian)
csokoládé (Hungarian)
súkkulaði (Icelandic)
cokelat (Indonesian)
seacláid (Irish)
cioccolato (Italian)
scelerisque (Latin)
šokolado (Lithuanian)
coklat (Malay)
tiakarete (Maori)
sjokolade (Norwegian)
czekolada (Polish)
ciocolată (Romanian)
sukalati (Samoan)
choklad (Swedish)
}
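If a group of spellings like this is meant to be fully interchangeable, writing the dictionary by hand gets tedious fast: a group of n spellings needs n × (n − 1) entries in total. A small helper can generate the JSON instead. This is a sketch under the assumption that every spelling in the group should point to every other one; the function name is made up, and only four spellings are used to keep the output short.

```python
import json


def mutual_synonyms(spellings):
    """Map every spelling to all of the other spellings in the group."""
    # Lowercase and de-duplicate while keeping the original order.
    unique = list(dict.fromkeys(s.lower() for s in spellings))
    return {term: [other for other in unique if other != term] for term in unique}


group = mutual_synonyms(["chocolate", "chocolat", "schokolade", "cioccolato"])

# ensure_ascii=False keeps accented spellings readable in the output file.
print(json.dumps(group, ensure_ascii=False, indent=2))
```

The printed result has the same shape as the cocoa example, so it could be pasted into the dictionary after review.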

Two other pairs I think deserve to be on the list:

{
  "melanger": ["melangeur"],
  "melangeur": ["melanger"],
  "gianduja": ["gianduia"],
  "gianduia": ["gianduja"]
}

Again, while the misspelling engine may pick these up, naming them explicitly guarantees they will be found.
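Because the dictionary is edited by hand, a stray comma or a missing pair of brackets is easy to introduce. The following is a minimal sketch of a sanity check that could be run before the dictionary is loaded; the function name and the specific rules it checks are assumptions, not part of the search engine.

```python
import json


def check_dictionary(text):
    """Parse the dictionary text and return a list of problems (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for term, related in data.items():
        if not isinstance(related, list) or not all(isinstance(r, str) for r in related):
            problems.append(f"{term!r}: value must be a list of strings")
        elif term in related:
            problems.append(f"{term!r}: lists itself as a related term")
    return problems


good = '{"melanger": ["melangeur"], "melangeur": ["melanger"]}'
bad = '{"gianduja": "gianduia", "gianduia": ["gianduia"]}'

print(check_dictionary(good))
print(check_dictionary(bad))
```

The second example is flagged twice: once for a value that is a bare string instead of a list, and once for a term that lists itself.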

Examples of search terms that may be less useful to include are abbreviations. I am happy to add the ones that people think are helpful.

{
  "san francisco": ["sf"],
  "sf": ["san francisco"],
  "fcia": ["fine chocolate industry association"],
  "fcci": ["fine cacao and chocolate institute"]
}

Extending the Semantic Network

An area that deserves a lot of thought is extending the dictionary into a more extensive semantic network (a never-ending effort to create and maintain). Some of these entries could be one-way: a search for “piura” would also return “peru”, whereas a search for “peru” need not (necessarily) find “piura”. Entries can be associative (two-way or more) and need not be symmetrical.

{
  "piura": ["peru"],
  "chuncho": ["peru"],
  "maranon": ["peru"],
  "ucayali": ["peru"],
  "peru": ["chuncho", "maranon", "piura", "ucayali"],
  "tcga": ["belize"],
  "chuno": ["ingemann", "nicaragua"],
  "rugoso": ["ingemann", "nicaragua"],
  "porcelana": ["criollo", "white bean"],
  "criollo": ["porcelana"]
}

Let me know what you think. For example, should all geographic entries be associative (two-way)?
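To help think about that question, here is a small sketch that reports which links in the example network are currently one-way, and shows what the network would look like if every link were made two-way. The function names are made up for illustration; the data is the example dictionary above.

```python
def one_way_links(synonyms):
    """Return (term, related) pairs where 'related' does not point back to 'term'."""
    return [
        (term, rel)
        for term, related in synonyms.items()
        for rel in related
        if term not in synonyms.get(rel, [])
    ]


def make_two_way(synonyms):
    """Return a copy in which every link also exists in the reverse direction."""
    result = {term: list(related) for term, related in synonyms.items()}
    for term, related in synonyms.items():
        for rel in related:
            back = result.setdefault(rel, [])
            if term not in back:
                back.append(term)
    return result


network = {
    "piura": ["peru"],
    "chuncho": ["peru"],
    "maranon": ["peru"],
    "ucayali": ["peru"],
    "peru": ["chuncho", "maranon", "piura", "ucayali"],
    "tcga": ["belize"],
    "chuno": ["ingemann", "nicaragua"],
    "rugoso": ["ingemann", "nicaragua"],
    "porcelana": ["criollo", "white bean"],
    "criollo": ["porcelana"],
}

gaps = one_way_links(network)
two_way = make_two_way(network)

for term, rel in gaps:
    print(f"{term} -> {rel} has no link back")
```

In this example the Peru entries are already two-way, while “belize”, “ingemann”, “nicaragua”, and “white bean” are only ever pointed at. Making everything two-way would mean a search for “nicaragua” also returns posts about chuno and rugoso, which may or may not be what a reader wants.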

Future Development

Once Meili Search is configured and operating, we’ll be looking into adding new features that improve the quality (relevance) of the search results and add the ability to sort and filter results based on additional criteria.

How You Can Help

In the comments below, add common words, spellings, concepts, and topics in chocolate that you think would be a helpful addition to this dictionary as well as your ideas on how useful you think this effort is.


Featured image credit: Original by Joshua Hoehne on Unsplash
