Pre-render Blazor WebAssembly at build time to optimize for search engines

source link: https://swimburger.net/blog/dotnet/pre-render-blazor-webassembly-at-build-time-to-optimize-for-search-engines

Niels Swimberghe - 1/3/2021 - .NET

The output of a published Blazor WebAssembly application consists of static files exclusively. Hence these applications can be hosted on static site hosts like Azure Static Web Apps, GitHub Pages, Firebase Hosting, and more. But just like other single page application (SPA) frameworks, Blazor WASM is not properly indexed by search engines. This makes it very hard to do search engine optimization (SEO).

The issues discussed in this article are not exclusive to SPA frameworks; they apply to any web application that requires JavaScript execution to render its content.

The SEO problems with SPAs

Search engines crawl your website using "crawlers", sometimes also called "spiders". These crawlers are essentially bots that visit every page they can find on your website. The content on those pages is then added to the search index. But what do crawlers see when they visit SPAs? For Blazor WASM applications, this is what crawlers see:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
    <title>BlazorWasmPrerender</title>
    <base href="/" />
    <link href="css/bootstrap/bootstrap.min.css" rel="stylesheet" />
    <link href="css/app.css" rel="stylesheet" />
    <link href="BlazorWasmPrerender.styles.css" rel="stylesheet" />
</head>
<body>
    <div id="app">Loading...</div>
    <div id="blazor-error-ui">
        An unhandled error has occurred.
        <a href="" class="reload">Reload</a>
        <a class="dismiss">🗙</a>
    </div>
    <script src="_framework/blazor.webassembly.js"></script>
</body>
</html>

The only thing crawlers see is the word "Loading…" or a blank screen.
You can optimize for search engines at the margins by adding some content above and below the app element, or by using the title and description meta tags in the head, but that won't get you far. Crawlers don't see the rest of your SPA because most search engine crawlers do not execute JavaScript.
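To get a rough idea of what such a crawler extracts, you can strip the tags from the raw HTML without running any JavaScript. The 'crawler_view' function below is a hypothetical helper written for illustration: a crude Bash sketch that uses sed rather than a real HTML parser and assumes one tag per line, as in the unminified index.html above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): print the text a crawler that does not
# execute JavaScript would find in the <body> of an HTML file.
# Crude sketch: strips tags with sed instead of parsing HTML and assumes the
# unminified, one-tag-per-line layout of the published index.html.
crawler_view() {
    local html_file="$1"
    # Keep only the <body>...</body> range, remove tags and leading whitespace,
    # then drop the lines that end up empty.
    sed -n '/<body>/,/<\/body>/p' "$html_file" \
        | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' \
        | grep -v '^$'
}
```

Running 'crawler_view' on the index.html of a freshly published Blazor WASM app should print little more than "Loading..." plus the text of the (normally hidden) error UI.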

The exception is Google, which, depending on the geography you are targeting, is the primary search engine. Google crawls websites just like other search engines, but for JavaScript applications it goes through an additional phase in which the pages are rendered with JavaScript execution.

This rendering phase also works for Blazor WASM applications. Below is a Google search query which is returning content generated by Blazor WASM, proving that Google’s crawlers are able to render and index Blazor WASM applications:

Screenshot of a Google Search query returning Blazor WASM content

Even though Google is able to render and index JavaScript applications, the gold standard is still server-side rendered HTML. Google takes longer to index JavaScript applications than plain HTML. Most importantly, there are many other search engines out there. If your content targets an Eastern European audience, Yandex is another search engine you need to optimize for. In China, Baidu is the most popular search engine and Google Search isn't even available. And of course, there are many smaller players all around the world you should consider optimizing for.

The SEO solutions for SPAs

Many SPA frameworks support rendering the same application on both the client and the server. Being able to render on both sides combines the best of both worlds: you get the speed and SEO benefits of server-rendered applications and the UX benefits of a SPA. Since .NET 5, Blazor WebAssembly also supports pre-rendering on the server.

Server-side pre-rendering is a great solution, but it requires your code to be executed on a server, which is not possible if you're using a static site host.
Static site hosting services do not allow server-side code execution and only serve static files. Because of this limitation, these services are incredibly performant and affordable, often with generous free tiers, which is why they are so popular.

So instead of pre-rendering on the server at request time, you can pre-render at build time. You can then host the HTML files generated at build time on a static site host, giving you the best performance, the best SEO, the best UX, and more affordable hosting.
There is no official support for doing this in Blazor WASM, but you can quickly hack something together with existing open-source pre-rendering tools.

Pre-render Blazor WASM at build time

This tutorial will walk you through creating a Blazor WASM project and adding PowerShell/Bash scripts to pre-render the application. Then the functionality will be integrated into GitHub Actions to deploy the result to GitHub Pages.

Prerequisites

You will need the following:

  • OS: Windows or Linux distro supported by .NET and NodeJS
  • .NET SDK (3.1 and up)
  • NodeJS
  • GitHub account

You can find the source code for the sample at this GitHub repository.

Set up the Blazor WASM project

Use the following commands to create a folder named Client and create a Blazor WASM project inside that folder:

mkdir Client
dotnet new blazorwasm -o Client -n BlazorWasmPrerender

Integrating a pre-render tool

There are many open-source tools that provide pre-rendering functionality. For this tutorial, you will use a popular NodeJS tool called 'react-snap'. React-snap pre-renders your application using a headless Chrome browser. Point the tool to your SPA's output files and it will render your application, follow any links it can find, and save the results to disk as HTML files. React-snap supports any SPA framework, even though the name may give the impression it's only for React applications.

You can install the tool globally using the following command:

npm install -g react-snap

React-snap requires some configuration, which has to be stored in a JSON file named package.json. When you run react-snap, it expects package.json to be in the current working directory (CWD). Using the following commands, create a folder named Prerender and an empty package.json file in that folder:

PowerShell:

mkdir Prerender
New-Item Prerender/package.json -ItemType File

Bash:

mkdir Prerender
touch Prerender/package.json

Now publish the Blazor WASM project into a new subfolder of Prerender named output:

dotnet publish Client/BlazorWasmPrerender.csproj -c Release -o Prerender/output

The publish command places all the static files in Prerender/output/wwwroot.
Now add the necessary react-snap configuration to Prerender/package.json:

{
    "reactSnap": {
        "source": "output/wwwroot",
        "minifyHtml": {
            "collapseWhitespace": true,
            "removeComments": true
        },
        "puppeteerArgs": ["--no-sandbox", "--disable-setuid-sandbox"]
    }
}

The most important setting is the "source" property, which tells react-snap where the static web app files to pre-render are located. Refer to react-snap's GitHub README for more information on the other properties.
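If you'd rather script this step too, a heredoc can replace both the empty-file command and the manual edit. This is a Bash sketch; the function name 'write_react_snap_config' is hypothetical, and the JSON is the same configuration as above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the Prerender folder and write the react-snap
# configuration in one step, instead of creating an empty package.json and
# editing it by hand. The folder defaults to "Prerender", as in this tutorial.
write_react_snap_config() {
    local prerender_dir="${1:-Prerender}"
    mkdir -p "$prerender_dir"
    # Quoted 'EOF' keeps the JSON literal (no variable expansion).
    cat > "$prerender_dir/package.json" <<'EOF'
{
    "reactSnap": {
        "source": "output/wwwroot",
        "minifyHtml": {
            "collapseWhitespace": true,
            "removeComments": true
        },
        "puppeteerArgs": ["--no-sandbox", "--disable-setuid-sandbox"]
    }
}
EOF
}
```

Running 'write_react_snap_config' from the repository root creates Prerender/package.json with the configuration shown above.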

With the configuration in place and the Blazor WASM project published, you can now run react-snap using the following commands:

cd Prerender
npx react-snap
# Output: 
#   💬  console.log at /: Streaming compilation failed. Falling back to ArrayBuffer instantiation.  JSHandle@error
#   💬  console.log at /: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
#   ✅  crawled 1 out of 4 (/)
#   💬  console.log at /counter: Streaming compilation failed. Falling back to ArrayBuffer instantiation.  JSHandle@error
#   💬  console.log at /fetchdata: Streaming compilation failed. Falling back to ArrayBuffer instantiation.  JSHandle@error
#   💬  console.log at /404.html: Streaming compilation failed. Falling back to ArrayBuffer instantiation.  JSHandle@error
#   💬  console.log at /counter: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
#   💬  console.log at /fetchdata: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
#   💬  console.log at /404.html: mono_wasm_runtime_ready fe00e07a-5519-4dfe-b35a-f867dbaf2e28
#   ✅  crawled 2 out of 4 (/counter)
#   ⚠️  warning: 404 page title does not contain "404" string
#   ✅  crawled 3 out of 4 (/404.html)
#   ✅  crawled 4 out of 4 (/fetchdata)

The output shows which pages were crawled and saved as HTML files. Let's check whether the pre-rendered project works as expected by running a static HTTP server for Prerender/output/wwwroot.
To serve the static files locally, you can use a .NET global tool called 'dotnet-serve'. Use the following commands to install the tool and serve the static files:

dotnet tool install --global dotnet-serve
# make sure your CWD is Prerender
dotnet serve -o -d output/wwwroot

The -o argument instructs the tool to open the browser automatically. The -d argument tells the tool which directory to serve.
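Besides checking in the browser, you can confirm from the command line that every generated HTML file contains more than the placeholder. The 'find_unrendered' function below is a hypothetical Bash helper; it assumes a page that was not pre-rendered still contains the literal <div id="app">Loading...</div> markup from the published index.html.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list HTML files under a directory that still contain the
# unrendered Blazor placeholder. Returns non-zero if any such file is found.
find_unrendered() {
    local root="${1:-output/wwwroot}"
    local found=0
    while IFS= read -r html_file; do
        # -F: match the placeholder as a fixed string, not a regex.
        if grep -qF '<div id="app">Loading...</div>' "$html_file"; then
            echo "not pre-rendered: $html_file"
            found=1
        fi
    done < <(find "$root" -name '*.html')
    return "$found"
}
```

From the Prerender folder, 'find_unrendered output/wwwroot' prints any file that still looks unrendered and returns a non-zero status. Files you intentionally keep unrendered will be reported too.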

In the browser, you can verify that the application has been pre-rendered successfully: navigate around and check that the page source contains all the rendered HTML content. However, the interactivity is broken. When you click the "Click me" button on the /counter page, nothing happens. The browser console gives a glimpse of what is going wrong:

Screenshot of non-interactive pre-rendered Blazor WASM application with errors in the console

Normally, the output of a Blazor WASM application will have the following JavaScript reference at the bottom of the body-tag:

<script src="_framework/blazor.webassembly.js"></script>

In the pre-rendered HTML files, there are additional script tags:

<script src="_framework/blazor.webassembly.js"></script>
<script type="text/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;</script>
<script src="_framework/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"></script>

In a normal Blazor WASM scenario, the two additional script tags are generated by blazor.webassembly.js when the browser loads and runs it. The problem is that the script already ran once during pre-rendering, so the tags it generated were saved into the HTML, and then it runs again when the browser loads the page. The script wasn't developed with this edge case in mind and cannot handle it.
When you remove the two extra script tags and refresh the browser, Blazor's interactivity is restored. To make this easier, you can automate the removal with some PowerShell or Bash scripting.

PowerShell:

Get-ChildItem ".\output\wwwroot\*.html" -Recurse | ForEach-Object { 
    $HtmlFileContent = (Get-Content -Path $_.FullName -Raw);
    $HtmlFileContent = $HtmlFileContent.Replace('[HTML YOU WANT TO REMOVE]','')
    Set-Content -Path $_.FullName -Value $HtmlFileContent
}

Bash:

find ./output/wwwroot -name "*.html" | while IFS= read -r htmlFile; do
    sed -i 's/[ESCAPED HTML YOU WANT TO REMOVE]//g' "$htmlFile"
done

It is likely that the HTML will differ depending on the version of .NET you are using. Here’s the same script with the HTML from .NET 5 copied in:

PowerShell:

Get-ChildItem ".\output\wwwroot\*.html" -Recurse | ForEach-Object {     
    $HtmlFileContent = (Get-Content -Path $_.FullName -Raw)
    $HtmlFileContent = $HtmlFileContent.Replace('<script type="text/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;</script><script src="_framework/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"></script>','')
    Set-Content -Path $_.FullName -Value $HtmlFileContent
}

Bash:

find ./output/wwwroot -name "*.html" | while IFS= read -r htmlFile; do
    sed -i 's/<script type="text\/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;<\/script><script src="_framework\/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc\/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"><\/script>//g' "$htmlFile"
done

This script searches for all HTML files recursively and removes the extra script tags from them. (React-snap minified the HTML and put everything on a single line, which is why the two tags can be matched as one contiguous string.)

The moment any of the HTML you’re trying to remove changes, the script won’t work as expected. If you’re feeling adventurous, you can update the scripts to be more robust by using commands/utilities that understand HTML. 
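A smaller step in that direction, still regex-based and not HTML-aware, is to loosen the pattern so it no longer depends on the exact dotnet.*.js version and integrity hash. This Bash sketch assumes the injected inline script keeps exactly the shape shown above and is immediately followed by the dotnet.*.js tag, as in the minified .NET 5 output; 'remove_duplicated_bootstrap' is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the duplicated Blazor bootstrap tags from one file.
# Instead of hard-coding "dotnet.5.0.1.js" and its integrity hash, match any
# "_framework/dotnet.<version>.js" script tag with any attributes.
# Assumption: the inline script keeps the .NET 5 shape and the two tags are
# adjacent (react-snap's minified output puts them on one line).
remove_duplicated_bootstrap() {
    local html_file="$1"
    # -E: extended regex, so literal parentheses and dots are escaped.
    # '#' is used as the delimiter so the '/' characters need no escaping.
    sed -i -E 's#<script type="text/javascript">var Module; window\.__wasmmodulecallback__\(\); delete window\.__wasmmodulecallback__;</script><script src="_framework/dotnet\.[^"]*\.js"[^>]*></script>##g' "$html_file"
}

# Usage (from the Prerender folder):
# find ./output/wwwroot -name '*.html' | while IFS= read -r f; do remove_duplicated_bootstrap "$f"; done
```

This uses GNU sed's -i and -E flags, which are available on Linux and on the ubuntu-latest GitHub Actions image.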

After fixing all the HTML-files, you can refresh the browser, and you now have a pre-rendered Blazor WASM application that is still interactive!

Here’s a script that puts together the publish, pre-rendering, and HTML fixing:

PowerShell:

If(Test-Path .\Prerender\output)
{
    Remove-Item -Path .\Prerender\output -Recurse
}
 
dotnet publish .\Client\BlazorWasmPrerender.csproj -c Release -o Prerender/output --nologo
Push-Location .\Prerender
npx react-snap
Get-ChildItem ".\output\wwwroot\*.html" -Recurse | ForEach-Object { 
    $HtmlFileContent = (Get-Content -Path $_.FullName -Raw);
    $HtmlFileContent = $HtmlFileContent.Replace('<script type="text/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;</script><script src="_framework/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"></script>','')
    Set-Content -Path $_.FullName -Value $HtmlFileContent
}
Pop-Location

Bash:

#!/bin/bash
rm -rf Prerender/output
dotnet publish Client/BlazorWasmPrerender.csproj -c Release -o Prerender/output --nologo
pushd Prerender
npx react-snap
find ./output -name "*.html" | while IFS= read -r htmlFile; do
    sed -i 's/<script type="text\/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;<\/script><script src="_framework\/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc\/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"><\/script>//g' "$htmlFile"
done
popd
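Because the replacement silently does nothing once the injected markup changes (for example after a .NET update), it can help to end the script with a check that fails loudly. The 'assert_bootstrap_removed' function below is a hypothetical Bash sketch; it assumes the string __wasmmodulecallback__ only appears in the injected inline script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if any HTML file under the given folder still
# contains the duplicated Blazor bootstrap script after the cleanup step.
assert_bootstrap_removed() {
    local root="${1:-Prerender/output/wwwroot}"
    # -r: recurse, -q: quiet, -F: fixed string, --include: only *.html files.
    if grep -rqF --include='*.html' '__wasmmodulecallback__' "$root"; then
        echo "error: duplicated Blazor bootstrap script still present under $root" >&2
        return 1
    fi
}
```

Calling 'assert_bootstrap_removed || exit 1' after popd stops the script, or a GitHub Actions step, when the cleanup didn't take effect.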

Build and publish to GitHub Pages using GitHub Actions

A prerequisite blog post shows in more detail how to build and deploy Blazor WASM to GitHub Pages. This tutorial focuses on integrating react-snap and the HTML-fixing script into GitHub Actions.

Push Blazor project to GitHub

To make sure you only track the relevant files, you can use the following command to create a .NET specific .gitignore file:

# add the gitignore file tailored for dotnet applications, this will ignore bin/obj and many other non-source code files
dotnet new gitignore
# ignore the Prerender/output directory
echo "" >> .gitignore
echo "Prerender/output" >> .gitignore
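Running those echo commands twice appends the entry twice. If you script the setup and might re-run it, a guarded append avoids duplicates. 'add_gitignore_entry' is a hypothetical helper in this Bash sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append an entry to .gitignore only if that exact line
# is not present yet, so re-running the setup doesn't add duplicates.
add_gitignore_entry() {
    local entry="$1" file="${2:-.gitignore}"
    touch "$file"
    # -x: whole-line match, -F: fixed string; the blank line mirrors 'echo ""'.
    grep -qxF "$entry" "$file" || printf '\n%s\n' "$entry" >> "$file"
}

# Usage: add_gitignore_entry "Prerender/output"
```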

You need to create a local Git repository and commit your source code to the repository using these commands:

# create the git repository
git init
# track all files that are not ignored by .gitignore
git add --all
# commit all changes to the repository
git commit -m "Initial commit"

Create a new GitHub repository (instructions) and copy the commands to "push an existing repository from the command line" from the empty GitHub repository page. Here's what they should look like, but with a different URL:

git remote add origin https://github.com/Swimburger/BlazorWasmPrerender.git
# rename the current branch to main (older Git versions default to master)
git branch -M main
git push -u origin main

Create GitHub Action

Create a new file at .github/workflows/ named main.yml:

PowerShell:

mkdir .github/workflows
New-Item .github/workflows/main.yml -ItemType File

Bash:

mkdir -p .github/workflows
touch .github/workflows/main.yml

Copy the following content to main.yml:

name: Deploy to GitHub Pages
 
# Run workflow on every push to the main branch
on:
  push:
    branches: [ main ]
 
jobs:
  deploy-to-github-pages:
    # use ubuntu-latest image to run steps on
    runs-on: ubuntu-latest
    steps:
    # uses GitHub's checkout action to check out code from the main branch
    - uses: actions/checkout@v2
    
    # sets up .NET Core SDK 5.0.101
    - name: Setup .NET Core SDK
      uses: actions/setup-dotnet@v1
      with:
        dotnet-version: 5.0.101
 
    # publishes Blazor project to the Prerender/output folder
    - name: Publish .NET Core Project
      run: dotnet publish Client/BlazorWasmPrerender.csproj -c Release -o Prerender/output --nologo
    
    # Use the NodeJS react-snap utility to prerender the static website
    - name: prerender Blazor client
      working-directory: Prerender
      run: npx react-snap
    
    # change base tag in all html files to include subfolder and remove the duplicated script tags
    - name: Change base tag
      working-directory: Prerender/output/wwwroot
      run: |
        find . -name "*.html" | while read htmlFile; do
            sed -i 's/<base href="\/"/<base href="\/BlazorWasmPrerender\/"/g' $htmlFile
            sed -i 's/<script type="text\/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;<\/script><script src="_framework\/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc\/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"><\/script>//g' $htmlFile
        done
    # add .nojekyll file to tell GitHub pages to not treat this as a Jekyll project. (Allow files and folders starting with an underscore)
    - name: Add .nojekyll file
      run: touch Prerender/output/wwwroot/.nojekyll
      
    - name: Commit wwwroot to GitHub Pages
      uses: JamesIves/github-pages-deploy-action@3.7.1
      with:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        BRANCH: gh-pages
        FOLDER: Prerender/output/wwwroot

Let’s dissect the adjustments made to integrate pre-rendering:

# publishes Blazor project to the Prerender/output folder
- name: Publish .NET Core Project
  run: dotnet publish Client/BlazorWasmPrerender.csproj -c Release -o Prerender/output --nologo

Just as you published the project locally, GitHub Actions will publish the Blazor WASM project to Prerender/output.

# Use the NodeJS react-snap utility to prerender the static website
- name: prerender Blazor client
  working-directory: Prerender
  run: npx react-snap

NodeJS is pre-installed on the 'ubuntu-latest' images, so you can simply run the npx react-snap command. The command will be run with 'Prerender' as the CWD.

# change base tag in all html files to include subfolder and remove the duplicated script tags
- name: Change base tag
  working-directory: Prerender/output/wwwroot
  run: |
    find . -name "*.html" | while read htmlFile; do
        sed -i 's/<base href="\/"/<base href="\/BlazorWasmPrerender\/"/g' $htmlFile
        sed -i 's/<script type="text\/javascript">var Module; window.__wasmmodulecallback__(); delete window.__wasmmodulecallback__;<\/script><script src="_framework\/dotnet.5.0.1.js" defer="" integrity="sha256-SWZOE2EsCqc\/7dPgJrcFqUvVvdeJ9cipeZ2NFMC9v2s=" crossorigin="anonymous"><\/script>//g' $htmlFile
    done

Just like the bash script locally, GitHub Actions will go through all HTML-files to fix the script-tag issues. In addition to that, the base-tag will be updated to reflect the GitHub Pages subdirectory it will be hosted in. For more details on the base-tag issue, refer to the prerequisite tutorial.
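The workflow hard-codes the BlazorWasmPrerender subfolder. As a sketch, you could derive it from the GITHUB_REPOSITORY environment variable ("owner/repo"), which GitHub Actions sets by default. The 'set_base_href' function is hypothetical and assumes a project site served from a subfolder named after the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point the <base> tag of every HTML file under a folder at
# the GitHub Pages project subfolder, derived from GITHUB_REPOSITORY.
# Assumption: the site is served from https://<owner>.github.io/<repo>/.
set_base_href() {
    local root="$1"
    if [ -z "${GITHUB_REPOSITORY:-}" ]; then
        echo "GITHUB_REPOSITORY is not set" >&2
        return 1
    fi
    # Strip everything up to the first '/' to keep only the repository name.
    local repo_name="${GITHUB_REPOSITORY#*/}"
    find "$root" -name '*.html' | while IFS= read -r html_file; do
        sed -i "s#<base href=\"/\"#<base href=\"/${repo_name}/\"#g" "$html_file"
    done
}
```

In the "Change base tag" step, you would define this function and call 'set_base_href .' in place of the first sed command, since that step already runs in Prerender/output/wwwroot.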

The rest of the workflow YAML file is exactly the same as in the prerequisite article. In addition to the Blazor project being pre-rendered for search engines, GitHub Pages no longer returns 404 status codes when requesting subpaths of the application, because each crawled route now has its own pre-rendered HTML file.
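To check this for the routes you care about, you can verify that each one has a file on disk. This Bash sketch assumes react-snap saves a route such as /counter as counter/index.html and a route ending in .html, such as /404.html, under its own name; 'missing_routes' is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report routes that have no pre-rendered file.
# Assumed layout: "/" -> index.html, "/counter" -> counter/index.html,
# "/404.html" -> 404.html. Returns non-zero if any route is missing.
missing_routes() {
    local root="$1"; shift
    local route file status=0
    for route in "$@"; do
        case "$route" in
            *.html) file="$root$route" ;;
            /)      file="$root/index.html" ;;
            *)      file="$root$route/index.html" ;;
        esac
        if [ ! -f "$file" ]; then
            echo "missing: $route ($file)"
            status=1
        fi
    done
    return "$status"
}

# Usage (from the Prerender folder):
# missing_routes output/wwwroot / /counter /fetchdata /404.html
```

If all four pages of this sample were saved in that layout, the usage line above should print nothing and return zero.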

Summary

SPA frameworks like Blazor require JavaScript execution to function. Most search engine crawlers don't execute JavaScript, which means JavaScript-generated content won't be indexed. Google's crawlers do execute JavaScript and can successfully index Blazor WASM apps. To optimize for other search engines, you can pre-render Blazor WASM on the server, but that requires server-side code execution.

For static site hosts like GitHub Pages and Azure Static Web Apps, server-side code execution is not available. Instead of pre-rendering in response to HTTP requests, you can move the pre-rendering to the build pipeline.
Using pre-rendering tools like react-snap, you can pre-render Blazor WASM, and you can integrate these tools into your continuous integration and continuous deployment pipelines. Using GitHub Actions, you can pre-render your Blazor application and deploy it to GitHub Pages. Deploying pre-rendered applications to GitHub Pages improves SEO and also resolves those pesky HTTP 404 errors.