Structuring semi-structured Documents

 1 year ago
source link: http://heidloff.net/article/structuring-semi-structured-documents/

Watson Discovery is an IBM offering to search and analyze information in various types of documents. This post describes how the graphical interface in Discovery can be used to structure information in documents to reduce noise.

The sample scenario uses data from the US Securities and Exchange Commission (SEC). The goal is to improve the quality of the result for the question “What is the purpose of Rule 15c3-5?”.
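For readers who want to ask the same question outside the graphical interface, here is a minimal sketch of a request body for a natural-language query. The field names (`natural_language_query`, `count`, `passages`, `find_answers`) are assumptions based on the Discovery v2 query API and should be checked against the documentation. The helper `build_query_body` is hypothetical, and the sketch only builds the payload without sending anything.

```python
import json


def build_query_body(question: str, max_passages: int = 3) -> dict:
    """Build a JSON body for a natural-language query.

    Assumption: the field names follow the Discovery v2 query API;
    verify them against the official documentation before use.
    """
    return {
        "natural_language_query": question,
        "count": 5,  # number of documents to return (arbitrary choice)
        "passages": {
            "enabled": True,         # ask for passages, not just documents
            "count": max_passages,   # upper bound on returned passages
            "find_answers": True,    # assumed flag for answer extraction
        },
    }


body = build_query_body("What is the purpose of Rule 15c3-5?")
print(json.dumps(body, indent=2))
```

The payload is kept separate from any HTTP or SDK call so that it can be inspected without credentials.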

By default, Watson Discovery returns the right answer from page 81.

Screenshot-2022-12-13-at-09.24.07.png

However, the result also includes ‘noise’ from the preceding sentence.

wd-03-06.png

To improve this, the document can be structured by identifying sections graphically. The following screenshot shows how footnotes, subtitles, headers, etc. can be defined by selecting text in the right column.

wd-03-10.png

As a result, Discovery understands the different sections in the document and returns only the sentence that actually answers the question.

wd-03-13.png
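The effect of labeling sections can be illustrated with a toy sketch. It is purely conceptual and does not reflect how Discovery works internally: the `Segment` type, the `answer_candidates` helper, and the sample strings are all hypothetical. The idea is that once each piece of text carries a label such as footnote, subtitle or header, everything that is not body text can be left out of the answer candidates.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str    # hypothetical label, e.g. "text", "footnote", "header"
    content: str


def answer_candidates(segments, keep_labels=("text",)):
    """Return the content of segments whose label is in keep_labels."""
    return [s.content for s in segments if s.label in keep_labels]


# Made-up segments standing in for one labeled page of a document.
segments = [
    Segment("header", "Hypothetical page header"),
    Segment("footnote", "Hypothetical footnote that would otherwise add noise."),
    Segment("text", "Hypothetical body sentence that answers the question."),
]

candidates = answer_candidates(segments)
print(candidates)
```

Without labels, all three strings would be treated as one block of text, which is roughly the situation that produces the noisy result shown earlier.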

To find out more about Watson Discovery, check out the documentation.

Niklas Heidloff

Hi, my name is Niklas Heidloff. I work for IBM as a Developer Advocate. I like learning, cloud technologies, Java, JavaScript and AI. I'm a proud father of five and love BBQ.

Disclaimer

The postings on this site are my own and don't necessarily represent my employer IBM's positions, strategies or opinions.
