A semantic tool for our chat logs
source link: https://www.tuicool.com/articles/hit/6bIjQbF
I recently met Sébastien, a salesperson at Semdee, a small company building a tool that extracts insights from raw text files.
semdee
The tool works a bit like Lucene. It uses contextual word similarities to extract facets (groups of words in the same field), which Semdee calls clusters. Semdee is language-agnostic and can take any format as input.
The user can navigate the data by clicking words in tag clouds.
Sébastien ran Semdee on the IRC logs generated by itsumi (the first bot of the chan, made by satou). We have around 6 months of log files in CSV format.
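I don't know how Semdee computes its similarities, so the following is only a rough sketch of what "contextual word similarity" can mean: two words are considered close when they appear surrounded by the same neighbours. The function names, the window size and the sample sentences are all made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample lines, not taken from our logs.
sample = [
    "i use irssi on irc",
    "i use weechat on irc",
    "we eat pizza at night",
]
vectors = cooccurrence_vectors(sample)
irc_clients = cosine(vectors["irssi"], vectors["weechat"])
unrelated = cosine(vectors["irssi"], vectors["pizza"])
```

Here "irssi" and "weechat" share exactly the same neighbours, so they end up with a high similarity, while "irssi" and "pizza" share none. Grouping words whose vectors are close to each other is one plausible way to obtain clusters like the ones described above.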
tag clouds
Here is the root view of the most frequent words in our logs:
If we select “IRC” only:
I often tell people an IRC client is “comfier”.
“lol” seems to be my favorite expression:
I’m also obsessed with “programming languages”.
lists
I imagine the tool computes a distance between words (like Levenshtein). We see the value in the “Proximity” column.
We can see the facets in action here for the query “Yeah, dpt is as fun as ever”. The tool understands people who disagree.
If you find that inspiring, you can join us on the chat.
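For reference, here is a small sketch of the Levenshtein distance mentioned above. It counts the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. Keep in mind that it compares spelling rather than meaning, and that its use for the “Proximity” column is only my guess, not something Semdee confirmed.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between strings a and b."""
    # Keep the shorter string as `b` so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # Distances from the empty prefix of `a` to every prefix of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.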