

[Golang] Iterate over All DOM Elements in HTML
source link: http://siongui.github.io/2016/04/10/go-iterate-over-all-dom-elements-in-html/

Introduction
This post shows how to iterate over all DOM elements of an HTML document in Go. The golang.org/x/net/html package parses the document into a tree of nodes; the program walks that tree, searches for HTML links, and prints them in reStructuredText format.
Another example of iterating over all DOM elements can be found in [4].
Install net/html package
$ go get -u golang.org/x/net/html
Traverse DOM Tree
Traverse the DOM tree (Iterate over all elements in HTML):
html.go
package main

import (
	"flag"
	"fmt"
	"os"

	"golang.org/x/net/html"
)

// traverse visits n and all of its descendants in document order and
// prints every anchor element as a reStructuredText link.
func traverse(n *html.Node) {
	if isAnchorElement(n) {
		printRstLink(n)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		traverse(c)
	}
}

// parseCommandLineArguments returns the value of the -input flag and
// exits with an error message if the flag is empty.
func parseCommandLineArguments() string {
	pPath := flag.String("input", "", "Path of HTML file to be processed")
	flag.Parse()
	path := *pPath
	if path == "" {
		fmt.Fprintf(os.Stderr, "Error: empty path!\n")
		os.Exit(1)
	}
	return path
}

func main() {
	inputFile := parseCommandLineArguments()

	fin, err := os.Open(inputFile)
	if err != nil {
		panic("Fail to open " + inputFile)
	}
	defer fin.Close()

	// html.Parse returns the root node of the parsed document.
	doc, err := html.Parse(fin)
	if err != nil {
		panic("Fail to parse " + inputFile)
	}
	traverse(doc)
}
Find HTML links and print them in reStructuredText format:
handleHtmlLink.go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"

	"golang.org/x/net/html"
)

// isAnchorElement reports whether n is an <a> element.
func isAnchorElement(n *html.Node) bool {
	return n.Type == html.ElementNode && n.Data == "a"
}

func isTextNode(n *html.Node) bool {
	return n.Type == html.TextNode
}

// isHasOnlyOneChild reports whether n has exactly one child node.
func isHasOnlyOneChild(n *html.Node) bool {
	return n.FirstChild != nil && n.FirstChild == n.LastChild
}

// getAttribute returns the value of the attribute named key, or an
// error if n has no such attribute.
func getAttribute(n *html.Node, key string) (string, error) {
	for _, attr := range n.Attr {
		if attr.Key == key {
			return attr.Val, nil
		}
	}
	return "", errors.New(key + " does not exist in attributes!")
}

// printRstLink prints an anchor whose only child is a text node as
// `text <href>`__ and reports any other anchor on stderr.
func printRstLink(n *html.Node) {
	if !isHasOnlyOneChild(n) {
		fmt.Fprintf(os.Stderr, "Child number of anchor is not 1\n")
		return
	}
	if !isTextNode(n.FirstChild) {
		fmt.Fprintf(os.Stderr, "Child of anchor is not TextNode\n")
		return
	}

	text := strings.TrimSpace(n.FirstChild.Data)
	href, err := getAttribute(n, "href")
	if err != nil {
		// Print the error as a value, not as a format string.
		fmt.Fprintln(os.Stderr, err)
		return
	}

	rstLink := "`" + text + " <" + href + ">`__"
	fmt.Println(rstLink)
}
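To make the output format concrete, here is a small sketch that isolates the string-building step of printRstLink. The helper name formatRstLink and the sample text and URL are assumptions chosen for illustration; they do not appear in the original files.

```go
package main

import (
	"fmt"
	"strings"
)

// formatRstLink is a hypothetical helper that builds the same string as
// printRstLink: the trimmed link text, the target in angle brackets,
// wrapped in backquotes and followed by two underscores.
func formatRstLink(text, href string) string {
	return "`" + strings.TrimSpace(text) + " <" + href + ">`__"
}

func main() {
	fmt.Println(formatRstLink("  The Go Programming Language\n", "https://golang.org/"))
	// prints: `The Go Programming Language <https://golang.org/>`__
}
```

Keeping the formatting in a pure function like this makes it easy to test separately from the DOM traversal.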
Usage
Download any HTML file and pass its path to the Go program with the -input flag. For example, if index.html is in the current directory together with the two Go source files, run the program with the following command:
$ go run html.go handleHtmlLink.go -input=index.html
Tested on: Ubuntu Linux 15.10, Go 1.6.
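The standard flag package accepts several spellings of the same flag. The sketch below assumes a hypothetical variant of parseCommandLineArguments, here called parseInput, that parses an explicit argument slice with flag.NewFlagSet and returns an error instead of printing, so that the accepted forms can be tried side by side.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// parseInput is a hypothetical variant of parseCommandLineArguments.
// It parses args instead of os.Args and returns an error when the
// -input flag is missing or malformed.
func parseInput(args []string) (string, error) {
	fs := flag.NewFlagSet("html", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the flag package's own usage text
	path := fs.String("input", "", "Path of HTML file to be processed")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *path == "" {
		return "", errors.New("empty path")
	}
	return *path, nil
}

func main() {
	for _, args := range [][]string{
		{"-input=index.html"},
		{"-input", "index.html"},
		{"--input=index.html"},
		{}, // no flag given: parseInput returns an error
	} {
		path, err := parseInput(args)
		fmt.Printf("%q -> path=%q err=%v\n", args, path, err)
	}
}
```

The first three argument lists all yield index.html, so -input index.html works just as well as the -input=index.html form used in the command above.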
References: