
Page.REST - An HTTP API to extract OpenGraph, oEmbed or any other content from a...

 6 years ago
source link: https://page.rest


Page.REST

Page.REST is an HTTP API for unfurling and extracting content from any web page as JSON. You can get the title, description, Open Graph data, oEmbed content, or any other information available at a given public URL (see the examples below).

You can use this for building:

  • Chatbots
  • Rich text inputs (auto-linking)
  • Browser extensions
  • Site monitoring & validation tools
  • Sales & marketing tools
  • Data scrapers for research

You need an access token to use the Page.REST API. An access token costs $5 for 10,000 requests.
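At $5 per 10,000 requests, each request works out to $0.0005. The sketch below estimates how many access tokens, and how many dollars, a given request volume needs. It assumes that tokens are only sold in whole 10,000-request units, and `estimateTokenCost` is a hypothetical name used for illustration, not part of the API.

```javascript
// Pricing stated on this page: one access token = 10,000 requests for $5.
const REQUESTS_PER_TOKEN = 10000;
const PRICE_PER_TOKEN_USD = 5;

// Hypothetical helper. Assumption: tokens can only be bought whole,
// so any partial bundle rounds up to a full token.
function estimateTokenCost(requests) {
  if (!Number.isInteger(requests) || requests < 0) {
    throw new RangeError("requests must be a non-negative integer");
  }
  const tokens = Math.ceil(requests / REQUESTS_PER_TOKEN);
  return { tokens, costUsd: tokens * PRICE_PER_TOKEN_USD };
}

console.log(estimateTokenCost(25000)); // { tokens: 3, costUsd: 15 }
```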

Why use this?

You might be wondering why you should use the Page.REST API rather than coding it yourself.

Here are some reasons:

  • It handles the nitty-gritty edge cases (HTML parsing is 😤 😰 😩)
  • You save network bandwidth (only download what you need from a page)
  • It runs on Google Cloud Functions, so it is highly available
  • You want to hack something quickly!

How to use

Try the examples to see what the API returns. You can change the URL in each example to try different pages (alternatively, you can run the requests in Postman).

Basic

The default request grabs a site’s title, description, logo, favicons, canonical URL, status code, and Twitter handle.
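async () => {
  const token = "YOUR_TOKEN";
  const url = "https://www.stripe.com";
  // Encode the target URL so its own "&", "?" or "#" stays inside the `url` parameter
  const res = await fetch(
    `https://page.rest/fetch?token=${token}&url=${encodeURIComponent(url)}`);
  return await res.json();
}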
{
    "error": "Sorry, this service is discontinued."
}
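Building the query string by hand only works when every value is encoded: an unencoded `&`, `?` or `#` in the target URL would be read as part of the page.rest request itself. A more defensive sketch lets the standard `URL` and `URLSearchParams` classes do the encoding. The parameter names (`token`, `url`, `selector`, `header`, `prerender`, `embed`, `og`) are the ones used in the examples on this page; `buildPageRestUrl` and `unwrapPageRestResponse` are hypothetical helpers. The error check relies only on the `{"error": ...}` body shown above; other error formats the service may have used are unknown.

```javascript
// Hypothetical helper: builds a page.rest request URL from the parameters
// used in this page's examples. URLSearchParams percent-encodes every value.
function buildPageRestUrl(token, pageUrl, options = {}) {
  const endpoint = new URL("https://page.rest/fetch");
  endpoint.searchParams.set("token", token);
  endpoint.searchParams.set("url", pageUrl);
  // Repeated parameters: one `selector` / `header` entry per value.
  for (const selector of options.selectors || []) {
    endpoint.searchParams.append("selector", selector);
  }
  for (const header of options.headers || []) {
    endpoint.searchParams.append("header", header);
  }
  // Boolean flags are sent as `=1` only when enabled.
  for (const flag of ["prerender", "embed", "og"]) {
    if (options[flag]) endpoint.searchParams.set(flag, "1");
  }
  return endpoint.toString();
}

// Hypothetical helper: turns the `{"error": "..."}` body shown above into an exception.
function unwrapPageRestResponse(body) {
  if (body && typeof body.error === "string") {
    throw new Error(`page.rest error: ${body.error}`);
  }
  return body;
}

console.log(buildPageRestUrl("YOUR_TOKEN", "https://example.com/?a=1&b=2"));
// https://page.rest/fetch?token=YOUR_TOKEN&url=https%3A%2F%2Fexample.com%2F%3Fa%3D1%26b%3D2
```

In a request, you would pass the built URL to `fetch` and run the parsed JSON through `unwrapPageRestResponse` before reading any fields.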

Selector queries

This is probably the most useful feature. You can use CSS selectors to retrieve content from matching elements. In the example, we use selectors to retrieve the businesses and founders featured on IndieHackers. You can use up to 10 selector queries.
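async () => {
  const token = "YOUR_TOKEN";
  const url = "https://www.indiehackers.com/businesses";
  // Repeat the `selector` parameter once per CSS selector (up to 10)
  const selectors = [".interview-link__name", ".interview-link__founder-name"];
  const query = selectors.map((s) => `&selector=${encodeURIComponent(s)}`).join("");
  const res = await fetch(
    `https://page.rest/fetch?token=${token}&url=${encodeURIComponent(url)}${query}`);
  return await res.json();
}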
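Each CSS selector travels as its own repeated `selector` parameter, and the page states a limit of 10, so a client can check that limit before spending a request. A minimal sketch; `buildSelectorQuery` is a hypothetical helper, and rejecting an empty list is our own assumption rather than a documented rule.

```javascript
// Limit stated on this page: up to 10 selector queries.
const MAX_SELECTORS = 10;

// Hypothetical helper: returns the query-string fragment for a list of
// CSS selectors, rejecting empty lists and lists longer than the limit.
function buildSelectorQuery(selectors) {
  if (selectors.length === 0 || selectors.length > MAX_SELECTORS) {
    throw new RangeError(`expected 1 to ${MAX_SELECTORS} selectors, got ${selectors.length}`);
  }
  const params = new URLSearchParams();
  for (const selector of selectors) {
    params.append("selector", selector); // one parameter per selector
  }
  return params.toString();
}

console.log(buildSelectorQuery([".interview-link__name", ".interview-link__founder-name"]));
// selector=.interview-link__name&selector=.interview-link__founder-name
```

Encoding matters here too: an ID selector such as `#main a` becomes `selector=%23main+a`, whereas a raw `#` would cut the request short.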

Pre-render content

Append &prerender=1 to the request URL to extract content from pages that render on the client side using JavaScript. In the example, we extract the currently available engineering jobs from Tesla’s careers page, which is built with React. Without the prerender parameter, the selector returns nothing.
async () => {
  const token = "YOUR_TOKEN";
  // The "#" belongs to the target URL; encodeURIComponent sends it as %23
  const url = "https://www.tesla.com/careers/search#/?department=1";
  const res = await fetch(
    `https://page.rest/fetch?token=${token}&url=${encodeURIComponent(url)}&selector=.listing-title&prerender=1`);
  return await res.json();
}
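Tesla’s careers URL contains a `#`, so it has to reach page.rest percent-encoded (as `%23`). Left raw, everything from the `#` onward, including `prerender=1`, becomes the fragment of the page.rest request itself, and fragments are never sent to the server. The standard `URL` class makes the difference visible; the token `T` is a placeholder.

```javascript
const target = "https://www.tesla.com/careers/search#/?department=1";

// Naive interpolation: the "#" ends page.rest's query string.
const naive = new URL(`https://page.rest/fetch?token=T&url=${target}&prerender=1`);

// Encoded: the "#" travels as %23 inside the `url` value.
const encoded = new URL(
  `https://page.rest/fetch?token=T&url=${encodeURIComponent(target)}&prerender=1`);

console.log(naive.searchParams.get("url"));         // https://www.tesla.com/careers/search
console.log(naive.searchParams.has("prerender"));   // false (it ended up in the fragment)
console.log(encoded.searchParams.get("url"));       // https://www.tesla.com/careers/search#/?department=1
console.log(encoded.searchParams.get("prerender")); // 1
```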

Embed content

Append &embed=1 to the request URL to get the oEmbed content for the page as part of the response (only if available).
async () => {
  const token = "YOUR_TOKEN";
  const url = "https://youtu.be/dQw4w9WgXcQ";
  const res = await fetch(
    `https://page.rest/fetch?token=${token}&url=${encodeURIComponent(url)}&embed=1`);
  return await res.json();
}
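oEmbed content is only available when a page advertises an oEmbed endpoint. Under the oEmbed specification, a provider does this with a `<link rel="alternate" type="application/json+oembed" href="...">` tag in the page’s HTML. How Page.REST implements `embed=1` isn’t described here, so the sketch below only illustrates that discovery step. `findOEmbedUrl` is a hypothetical name, the provider URL in the sample is made up, and the deliberately simple regex scan stands in for a real HTML parser.

```javascript
// Simplified oEmbed discovery: find a <link> tag whose type is
// application/json+oembed and return its href. A regex scan like this
// misses unusual markup; a real implementation should use an HTML parser.
function findOEmbedUrl(html) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of linkTags) {
    if (!/\btype\s*=\s*["']application\/json\+oembed["']/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    // HTML attribute values escape "&" as "&amp;".
    if (href) return href[1].replace(/&amp;/g, "&");
  }
  return null; // no oEmbed endpoint advertised
}

const sample =
  '<head><link rel="alternate" type="application/json+oembed" ' +
  'href="https://provider.example/oembed?url=https%3A%2F%2Fprovider.example%2Fv%2F1&amp;format=json"></head>';
console.log(findOEmbedUrl(sample));
// https://provider.example/oembed?url=https%3A%2F%2Fprovider.example%2Fv%2F1&format=json
```

Fetching the discovered URL returns the provider’s oEmbed JSON. When no such tag exists, there is nothing to embed, which matches the "only if available" caveat above.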

Open Graph

Append &og=1 to the request URL to get the OpenGraph content for the page as part of the response (only if available).
async () => {
  const token = "YOUR_TOKEN";
  const url = "https://www.nytimes.com/2017/09/12/technology/apple-iphone-event.html";
  const res = await fetch(
    `https://page.rest/fetch?token=${token}&url=${encodeURIComponent(url)}&og=1`);
  return await res.json();
}
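OpenGraph data lives in `<meta property="og:..." content="...">` tags in a page’s `<head>`. As a rough illustration of what `og=1` collects (not Page.REST’s actual implementation, whose output format isn’t shown here), the hypothetical `extractOpenGraph` below gathers those tags into an object. It reads `property` and `content` independently, so their order inside the tag doesn’t matter, but like any regex scan of HTML it assumes simple, quoted attributes.

```javascript
// Rough sketch: collect og:* <meta> tags into an object keyed by property name.
// Attributes are matched separately, so their order within the tag doesn't matter.
function extractOpenGraph(html) {
  const result = {};
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const property = tag.match(/\bproperty\s*=\s*["'](og:[^"']+)["']/i);
    const content = tag.match(/\bcontent\s*=\s*["']([^"']*)["']/i);
    if (property && content && !(property[1] in result)) {
      result[property[1]] = content[1]; // keep the first value for each property
    }
  }
  return result;
}

const head =
  '<meta property="og:title" content="Example article">' +
  '<meta content="https://example.com/a.png" property="og:image">' +
  '<meta name="description" content="not Open Graph">';
console.log(JSON.stringify(extractOpenGraph(head)));
// {"og:title":"Example article","og:image":"https://example.com/a.png"}
```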

Response headers

Retrieve any HTTP header from the page’s response by repeating the header parameter. In the example, we check the security headers of github.com.
async () => {
  const token = "YOUR_TOKEN";
  const url = "https://github.com";
  // Repeat the `header` parameter once per response header to retrieve
  const res = await fetch(
    `https://page.rest/fetch?token=${token}&url=${encodeURIComponent(url)}` +
    `&header=X-Frame-Options&header=X-XSS-Protection&header=Content-Security-Policy`);
  return await res.json();
}
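Once the requested headers come back, checking them is ordinary client code. The exact shape of Page.REST’s header output isn’t shown on this page, so the hypothetical `missingHeaders` below simply takes a plain object of header names to values and reports which of the requested security headers are absent or empty. HTTP header names are case-insensitive, so the comparison lower-cases them.

```javascript
// The security headers checked in the github.com example above.
const SECURITY_HEADERS = ["X-Frame-Options", "X-XSS-Protection", "Content-Security-Policy"];

// Hypothetical helper: `headers` is a plain { name: value } object.
// HTTP header names are case-insensitive, so compare lower-cased names.
function missingHeaders(headers, wanted = SECURITY_HEADERS) {
  const present = new Set(
    Object.entries(headers)
      .filter(([, value]) => value !== undefined && value !== null && value !== "")
      .map(([name]) => name.toLowerCase()));
  return wanted.filter((name) => !present.has(name.toLowerCase()));
}

console.log(missingHeaders({ "x-frame-options": "deny", "content-security-policy": "default-src 'none'" }));
// [ 'X-XSS-Protection' ]
```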

Sorry, not accepting new customers for the service.

Created by Lakshan Perera. Need to take screenshots of lots of web sites? Try Screen.rip

