
Pre-rendering static websites with wget

 3 years ago
source link: https://apex.sh/blog/post/pre-render-wget/

In this short article, I’ll give you an example of how you can use any HTTP framework you like to create a website, and then pre-render it as a static site using the 23-year-old command-line program wget.

When I started to build the new website I had a few simple requirements in mind: the layout had to accommodate a wide variety of products while emphasizing content and typography, the site had to load as quickly as possible, and I wanted to avoid bloated solutions. That’s why I decided to skip the fancy solutions and go straight to good old HTML templates.

Building the website

As I mentioned, with the use of pre-rendering you can use any framework you want to create the website. It’s worth noting that the performance of your web server or framework is not very important here, since it’s only a compile-time cost, and pages can be copied in parallel.

To show you an example I’m going to use Go’s standard library here to create a small blog server which reads content and templates from disk. Its structure is this:

website
├── content
│   ├── post-a.md
│   ├── post-b.md
│   └── post-c.md
├── main.go
└── templates
    ├── post.html
    └── posts.html

I won’t go into details about this code since it’s not a Go article, but to illustrate that any web framework or language can be used for this, here’s the main.go implementation. It serves a listing of blog posts from /, and the contents of each post at /blog/<slug>.

package main

import (
  "html/template"
  "io/ioutil"
  "log"
  "net/http"
  "path/filepath"
  "strings"
)

type Post struct {
  Slug    string
  Content template.HTML
}

func main() {
  t, err := template.ParseGlob("templates/*.html")
  if err != nil {
    log.Fatalf("error parsing templates: %s\n", err)
  }

  http.HandleFunc("/", posts(t))
  http.Handle("/blog/", http.StripPrefix("/blog/", http.HandlerFunc(post(t))))
  log.Fatal(http.ListenAndServe(":3000", nil))
}

// posts renders a list of blog posts.
func posts(templates *template.Template) http.HandlerFunc {
  return func(w http.ResponseWriter, r *http.Request) {
    files, err := ioutil.ReadDir("content")
    if err != nil {
      http.Error(w, "Error listing files", http.StatusInternalServerError)
      return
    }

    var posts []Post

    for _, f := range files {
      b, err := ioutil.ReadFile(filepath.Join("content", f.Name()))
      if err != nil {
        http.Error(w, "Error reading file", http.StatusInternalServerError)
        return
      }

      posts = append(posts, Post{
        Slug:    strings.TrimSuffix(f.Name(), ".md"),
        Content: template.HTML(b),
      })
    }

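    // Log rendering failures so a broken page is noticed at build time.
    err = templates.ExecuteTemplate(w, "posts.html", struct {
      Posts []Post
    }{
      Posts: posts,
    })
    if err != nil {
      log.Printf("error rendering posts.html: %s\n", err)
    }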
  }
}

// post renders a blog post.
func post(templates *template.Template) http.HandlerFunc {
  return func(w http.ResponseWriter, r *http.Request) {
    slug := r.URL.Path

    b, err := ioutil.ReadFile(filepath.Join("content", slug+".md"))
    if err != nil {
      http.Error(w, "Error reading file", http.StatusInternalServerError)
      return
    }

    post := Post{
      Slug:    slug,
      Content: template.HTML(b),
    }

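    // Log rendering failures so a broken page is noticed at build time.
    err = templates.ExecuteTemplate(w, "post.html", struct {
      Post
    }{
      Post: post,
    })
    if err != nil {
      log.Printf("error rendering post.html: %s\n", err)
    }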
  }
}
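
The templates themselves aren’t shown here. What matters for pre-rendering is that the listing page links to every post, since wget only discovers pages by following links. As a rough sketch, assuming a hypothetical posts.html (the markup and the renderListing helper below are illustrations, not the actual templates), it might look like this:

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// Post mirrors the struct in main.go.
type Post struct {
	Slug    string
	Content template.HTML
}

// postsTemplate is a hypothetical stand-in for templates/posts.html.
// Each post gets an <a href> so that a crawler can find its page.
const postsTemplate = `<ul>
{{range .Posts}}<li><a href="/blog/{{.Slug}}">{{.Slug}}</a></li>
{{end}}</ul>
`

// renderListing executes the listing template and returns the HTML.
func renderListing(posts []Post) (string, error) {
	t, err := template.New("posts.html").Parse(postsTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.ExecuteTemplate(&buf, "posts.html", struct{ Posts []Post }{posts})
	if err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderListing([]Post{{Slug: "post-a"}, {Slug: "post-b"}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

A page that isn’t linked from anywhere will not be picked up by the crawl, so it is worth checking that every route is reachable from the root.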

CSS minification

Modern browsers allow you to import CSS files from within CSS, using the @import <url> rule. For the Apex website I have a single index.css which is included within each page; it in turn imports many other stylesheets, something like this:

@import "./Button.css";
@import "./Customers.css";
@import "./Documentation.css";
@import "./Features.css";
@import "./Help.css";
@import "./Highlights.css";
@import "./Image.css";
@import "./Logo.css";
@import "./Menu.css";
@import "./Navigation.css";
@import "./Pricing.css";
@import "./Question.css";
@import "./Section.css";
...

This is great during development since you don’t need anything but a browser. In production, however, there’s a cost to sending potentially dozens of requests to fetch each stylesheet, so you should typically bundle and minify them. To perform the production build I went with the popular PostCSS, using postcss-import to inline the imports and cssnano for minification, with this configuration:

const atImport = require('postcss-import')
const nano = require('cssnano')

module.exports = {
  plugins: [
    atImport(),
    nano({ preset: 'default' })
  ]
}

Now onto the pre-rendering!

Pre-rendering the static site

The pre-rendering step utilizes wget to crawl your dynamic site, outputting static HTML representations of each page. You may have seen people achieve this with a more complex headless Chrome-based solution, but for many sites, this will be perfectly fine!

The wget command is similar to curl, but it was designed for crawling and copying websites. Let’s take a look at the flags used in the command below: -P specifies the output directory (./build in this case), -nv enables “non-verbose” mode which outputs less information, -nH stops wget from creating a directory named after the host, -r crawls and downloads recursively, and finally -E adds the .html extension to the HTML files it saves.

Note that the web server must be running first so that wget has something to crawl. Let’s try it out:

$ go run main.go &
$ wget -P build -nv -nH -r -E localhost:3000

You should see some output which looks similar to the following, showing that wget has crawled the blog listing and post pages, creating their HTML file counterparts in the build directory:

2019-08-20 16:21:59 URL:http://localhost:3000/ [281/281] -> "build/index.html" [1]
2019-08-20 16:21:59 URL:http://localhost:3000/blog/post-a [22/22] -> "build/blog/post-a.html" [1]
2019-08-20 16:21:59 URL:http://localhost:3000/blog/post-b [22/22] -> "build/blog/post-b.html" [1]
2019-08-20 16:21:59 URL:http://localhost:3000/blog/post-c [22/22] -> "build/blog/post-c.html" [1]
FINISHED --2019-08-20 16:21:59--
Total wall clock time: 0.01s
Downloaded: 4 files, 628 in 0s (18.1 MB/s)

With any luck you’ll now see the output in your build directory — a complete copy of your website, now static.

build
├── blog
│   ├── post-a.html
│   ├── post-b.html
│   └── post-c.html
└── index.html
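
To make the mapping from URL to file explicit, here is a minimal sketch of how the flags above translate a crawled path into an output path. The outputPath function is hypothetical and simplified: it assumes every response is HTML, whereas wget itself checks the Content-Type before appending .html.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// outputPath approximates where `wget -P <dir> -nH -r -E` saves a page.
// Assumption: every response is HTML; real wget inspects the
// Content-Type header before deciding to append ".html".
func outputPath(dir, urlPath string) string {
	// A directory-style URL such as "/" is saved as index.html.
	if urlPath == "" || strings.HasSuffix(urlPath, "/") {
		return path.Join(dir, urlPath, "index.html")
	}
	// -E leaves names that already end in .html or .htm alone.
	lower := strings.ToLower(urlPath)
	if strings.HasSuffix(lower, ".html") || strings.HasSuffix(lower, ".htm") {
		return path.Join(dir, urlPath)
	}
	// Everything else gets the .html extension appended.
	return path.Join(dir, urlPath+".html")
}

func main() {
	for _, p := range []string{"/", "/blog/post-a", "/blog/post-b", "/blog/post-c"} {
		fmt.Printf("%s -> %s\n", p, outputPath("build", p))
	}
}
```

Note that the links inside the saved pages still point at /blog/post-a rather than /blog/post-a.html, so the static host needs to serve extensionless URLs from the matching .html file.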

Even the blog’s RSS feed is generated using the same technique. Your mileage may vary depending on your requirements, but I’m pretty happy with this simple solution!


I have some ideas for what could make static websites much more enjoyable to write. If you’re interested, consider sponsoring my work on GitHub.

