Readability and JSDOM

source link: https://willschenk.com/labnotes/2024/readability_and_jsdom/
It took me a while to get this little bit of code working

Published April 11, 2024

I've been pulling a lot of data from web pages recently, and here's a simple way to get the text. It's not perfect, but it's easy.

package.json:

{
    "type": "module",
    "dependencies": {
        "@mozilla/readability": "^0.5.0",
        "jsdom": "^24.0.0"
    }
}

The script:

  import { Readability } from '@mozilla/readability';
  import { JSDOM } from 'jsdom';

  // Fetch the page, build a DOM from it, and let Readability pull out the article.
  async function extractText(url) {
      const dom = await JSDOM.fromURL(url);
      const reader = new Readability(dom.window.document);

      // parse() returns null when no article content can be found
      return reader.parse();
  }

  const article = await extractText("https://willschenk.com/fragments/2024/discovering_idagio/");

  if (article) {
      console.log(article.title);
      console.log(article.textContent);
  }
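
Since `reader.parse()` can return null and the extracted `textContent` tends to keep stray whitespace from the markup, a small guard can make the result easier to work with. The following is only a sketch: `summarizeArticle` is a hypothetical helper (not part of Readability or jsdom), and the `sample` object is a stand-in shaped like a parse result so the snippet runs without fetching anything.

```javascript
// Hypothetical helper, not part of Readability or jsdom.
// Takes whatever reader.parse() returned (an article object or null)
// and produces a small summary that is safe to use either way.
function summarizeArticle(article) {
    if (!article) {
        return { found: false, title: null, text: "", wordCount: 0 };
    }

    // Collapse runs of whitespace left over from the page markup
    const text = (article.textContent ?? "").replace(/\s+/g, " ").trim();
    const wordCount = text === "" ? 0 : text.split(" ").length;

    return { found: true, title: article.title ?? null, text, wordCount };
}

// Stand-in object shaped like a parse() result (assumed for illustration)
const sample = { title: "Example", textContent: "  Hello\n\n   world  " };

console.log(summarizeArticle(sample));
console.log(summarizeArticle(null));
```

In the script above, the object returned by `extractText(url)` could be passed straight to this helper instead of reading `title` and `textContent` directly.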
