Readability and JSDOM
source link: https://willschenk.com/labnotes/2024/readability_and_jsdom/
I've been pulling a lot of data from web pages recently, and here's a simple way to get the text. It's not perfect, but it's easy.
package.json:
{
  "type": "module",
  "dependencies": {
    "@mozilla/readability": "^0.5.0",
    "jsdom": "^24.0.0"
  }
}
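import { Readability } from '@mozilla/readability';
import { JSDOM } from 'jsdom';

// Fetch the page, build a DOM from it, and let Readability pull out the article.
async function extractText(url) {
  // JSDOM.fromURL downloads the page and parses it into a DOM.
  const dom = await JSDOM.fromURL(url);
  const reader = new Readability(dom.window.document);
  // The parsed article carries title, textContent, and other fields.
  const article = reader.parse();
  return article;
}

// Top-level await works because package.json sets "type": "module".
const text = await extractText("https://willschenk.com/fragments/2024/discovering_idagio/");
console.log(text.title);
console.log(text.textContent);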
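One rough edge worth guarding against: `reader.parse()` can return `null` when Readability finds no main content, and `textContent` tends to come back with a lot of stray whitespace. Here is a minimal sketch of a cleanup step. The `toPlainText` helper and its return shape are my own assumptions, not part of Readability or jsdom. It only relies on the `title` and `textContent` fields used above.

```javascript
// Hypothetical helper (not part of Readability or jsdom).
// `article` is whatever reader.parse() returned: an object, or null
// when no main content block could be found.
function toPlainText(article) {
  if (!article || !article.textContent) {
    // Nothing usable was extracted; return an empty result instead of throwing.
    return { title: null, text: '' };
  }
  const text = article.textContent
    .split('\n')
    // Collapse runs of spaces and tabs inside each line, then trim the ends.
    .map((line) => line.replace(/\s+/g, ' ').trim())
    // Drop the blank lines left over from the page layout.
    .filter((line) => line.length > 0)
    .join('\n');
  return { title: article.title ?? null, text };
}

// Usage with a hand-written object standing in for a parsed article.
const sample = {
  title: 'Example',
  textContent: '  First   line \n\n\n Second line\t\n',
};
console.log(toPlainText(sample).text);
console.log(toPlainText(null).text === '');
```

In the script above you would pass in the value returned by `extractText(url)` and print the `title` and `text` of the result.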