Finding Patterns in Documents

 1 year ago
source link: http://heidloff.net/article/finding-patterns-in-documents/

Watson Discovery is an IBM offering to search and analyze information in various types of documents. This post describes how to find patterns in documents via a graphical experience provided by Watson Discovery.

There are several ways to find patterns in text. Regular expressions are very powerful, but they are not trivial to define. They are also a poor fit for identifying longer, repeating expressions that appear in different formats.
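To make the regular expression trade-off concrete, here is a minimal sketch in Python for a phrase such as 'revenue of' followed by an amount. The sample sentences, the pattern, and the function name `find_revenue_mentions` are all made up for illustration; they are not taken from the IBM report or from Watson Discovery.

```python
import re

# Hypothetical sample text, invented for illustration (not quoted from any report).
SAMPLE = (
    "The company reported revenue of $79.6 billion for the year. "
    "Cloud revenue of $19.2 billion was up 12 percent. "
    "Services delivered revenues of 4,300 million dollars. "
    "Segment revenue of USD 1.5bn declined."
)

# "revenue of", an optional currency marker, a number with optional
# thousands separators and decimals, and an optional magnitude word.
REVENUE_PATTERN = re.compile(
    r"revenue\s+of\s+"
    r"(?:\$|USD\s*)?"
    r"\d[\d,]*(?:\.\d+)?"
    r"(?:\s*(?:billion|million|bn|m)\b)?",
    re.IGNORECASE,
)


def find_revenue_mentions(text):
    """Return every substring of text that matches REVENUE_PATTERN."""
    return [match.group(0) for match in REVENUE_PATTERN.finditer(text)]


if __name__ == "__main__":
    for mention in find_revenue_mentions(SAMPLE):
        print(mention)
```

Even this fairly long expression finds only three of the four mentions: it misses the plural 'revenues of 4,300 million dollars'. Each new wording or number format means another change to the pattern, which is the kind of maintenance that selecting examples in a graphical tool is meant to avoid.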

Let’s look at an example. The data is IBM’s publicly available 2018 earnings report. The goal is to find all occurrences of ‘revenue of …’ with different amounts. In Watson Discovery you can simply select occurrences of this pattern.

wd-04-08.png

After you’ve selected multiple occurrences, Watson starts learning. To improve and validate patterns, Discovery provides a list of further suggestions, which you can confirm or reject.

wd-04-09.png

The next screenshot shows how well the pattern recognition works.

wd-04-10.png

When searching for documents or parts of documents, the found patterns are annotated.

wd-04-17.png

To find out more about this topic, check out the Watson Discovery documentation.

Niklas Heidloff

Hi, my name is Niklas Heidloff. I work for IBM as a Developer Advocate. I like learning, cloud technologies, Java, JavaScript and AI. I'm a proud father of five and love BBQ.

Disclaimer

The postings on this site are my own and don't necessarily represent my employer IBM's positions, strategies or opinions.
