Structuring semi-structured Documents
source link: http://heidloff.net/article/structuring-semi-structured-documents/
Watson Discovery is an IBM offering to search and analyze information in various types of documents. This post describes how the graphical interface in Discovery can be used to structure information in documents to reduce noise.
The sample scenario uses data from the US Securities and Exchange Commission. The goal is to improve the quality of the result for the question “What is the purpose of Rule 15c3-5?”.
By default, Watson Discovery returns the right answer from page 81, but the result also includes ‘noise’ from the previous sentence.
To improve this, the document can be structured by identifying sections graphically. The following screenshot shows how footnotes, subtitles, headers, etc. can be defined by selecting text in the right column.
As a result, Discovery understands the different sections in the document and returns only the actual sentence of the answer.
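To make the effect concrete, here is a toy sketch in Python of why labeled sections reduce noise. It is not how Watson Discovery works internally and it does not call any Discovery API. The Segment class, the label names, the word-overlap scoring and the sample strings are all assumptions made up for illustration; the strings are placeholders, not text from the SEC document.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str  # hypothetical section label: "text", "footnote", "header"
    text: str


def words(s):
    """Lowercase the words of s and strip surrounding punctuation."""
    return {w.strip(".,?!\"'").lower() for w in s.split()} - {""}


def overlap(question, passage):
    """Toy relevance score: number of words shared with the question."""
    return len(words(question) & words(passage))


def flat_passages(segments, size=2):
    """Without structure: ignore labels and cut the text into fixed chunks."""
    texts = [s.text for s in segments]
    return [" ".join(texts[i:i + size]) for i in range(0, len(texts), size)]


def structured_passages(segments, keep=("text",)):
    """With structure: one passage per segment, only for the kept labels."""
    return [s.text for s in segments if s.label in keep]


def best(question, passages):
    """Return the passage with the highest overlap score."""
    return max(passages, key=lambda p: overlap(question, p))


# Placeholder content (invented for this sketch).
segments = [
    Segment("text", "Placeholder sentence on another topic."),
    Segment("header", "Placeholder page header"),
    Segment("footnote", "Placeholder footnote carried over from the previous page."),
    Segment("text", "Placeholder sentence stating the purpose of the rule."),
]
question = "What is the purpose of the rule?"

flat_answer = best(question, flat_passages(segments))
structured_answer = best(question, structured_passages(segments))

print("Without structure:", flat_answer)
print("With structure:   ", structured_answer)
```

Without labels, the footnote ends up in the same chunk as the answer sentence and is returned with it. Once the footnote and header are labeled and excluded, only the answer sentence comes back.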
To find out more about Watson Discovery, check out the documentation.
Niklas Heidloff