
Golang: the HTML escaping problem in JSON

 5 years ago
source link: https://studygolang.com/articles/19721?amp%3Butm_medium=referral

1. Problem description

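json.Marshal serializes structured values such as slices, maps, and structs into a []byte, and json.Unmarshal parses a []byte back into a value of the specified type. When the data contains HTML markup, however, escaping becomes an issue. By default, Marshal escapes the characters '<', '>', and '&' as Unicode escape sequences (\u003c, \u003e, \u0026). It also coerces strings to valid UTF-8, replacing invalid bytes with the Unicode replacement character.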

From the Go documentation:

String values encode as JSON strings coerced to valid UTF-8, replacing invalid bytes with the Unicode replacement rune. The angle brackets “<” and “>” are escaped to “\u003c” and “\u003e” to keep some browsers from misinterpreting JSON output as HTML. Ampersand “&” is also escaped to “\u0026” for the same reason. This escaping can be disabled using an Encoder that had SetEscapeHTML(false) called on it.

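In other words:

When a string is encoded to JSON it is coerced to valid UTF-8, and any invalid bytes are replaced with the Unicode replacement character (U+FFFD). "<", ">", and "&" are written as "\u003c", "\u003e", and "\u0026" so that browsers do not mistake JSON output for HTML. The escaping can be turned off by calling SetEscapeHTML(false) on an Encoder.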
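
The behavior described in the documentation excerpt can be sketched with a small round trip. The point it illustrates is that the default escaping changes only the textual form of the JSON: Unmarshal restores the original characters. The helper name escapedRoundTrip and the sample string are assumptions made for illustration and do not come from the original article.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// escapedRoundTrip (hypothetical helper) marshals s with json.Marshal and
// then unmarshals the result. It returns the encoded JSON text and the
// decoded string so the two can be compared.
func escapedRoundTrip(s string) (string, string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", "", err
	}
	var decoded string
	if err := json.Unmarshal(b, &decoded); err != nil {
		return "", "", err
	}
	return string(b), decoded, nil
}

func main() {
	in := "<b>Tom & Jerry</b>"
	enc, dec, err := escapedRoundTrip(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(enc)       // "\u003cb\u003eTom \u0026 Jerry\u003c/b\u003e"
	fmt.Println(dec == in) // true
}
```

So the escaped output is still valid JSON that decodes to the same value; it matters mainly when the JSON text itself is read by people or compared byte for byte.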

The source of Marshal:

func Marshal(v interface{}) ([]byte, error) {
    e := newEncodeState()

    err := e.marshal(v, encOpts{escapeHTML: true}) 
    if err != nil {
        return nil, err
    }
    buf := append([]byte(nil), e.Bytes()...)

    e.Reset()
    encodeStatePool.Put(e)

    return buf, nil
}

Note the call e.marshal(v, encOpts{escapeHTML: true}): the hard-coded escapeHTML: true is what causes the HTML characters to be escaped.

2. Solutions

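There are two ways around this. The first is to replace the three escape sequences in the marshaled output; the second is to call SetEscapeHTML(false) on an Encoder: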

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "strings"
)

type Html struct {
    Title  string
    Body   string
    Footer string
}

func ParseHtml() {
    htmlJson := Html{
        Title:  "<title>北京欢迎你</title>",
        Body:   "<body>北京是中国的首都,有600多年的建都历史</body>",
        Footer: "<script>js:pop('123')</script>",
    }

    strJson, err := json.Marshal(htmlJson)
    if err != nil {
        fmt.Println("marshal failed:", err)
        return
    }
    // The JSON string as produced by Marshal, with HTML characters escaped
    fmt.Println("Original JSON string:", string(strJson))

    // Method 1: replace the escape sequences for '<', '>', and '&'
    content := strings.Replace(string(strJson), "\\u003c", "<", -1)
    content = strings.Replace(content, "\\u003e", ">", -1)
    content = strings.Replace(content, "\\u0026", "&", -1)
    fmt.Println("Method 1:", content)

    // Method 2: SetEscapeHTML(false) on an Encoder
    bf := bytes.NewBuffer([]byte{})
    jsonEncoder := json.NewEncoder(bf)
    jsonEncoder.SetEscapeHTML(false)
    if err := jsonEncoder.Encode(htmlJson); err != nil {
        fmt.Println("encode failed:", err)
        return
    }
    // Encode appends a newline after the value, so trim it before printing
    fmt.Println("Method 2:", strings.TrimSuffix(bf.String(), "\n"))
}

func main() {
    ParseHtml()
}

Output:

Original JSON string: {"Title":"\u003ctitle\u003e北京欢迎你\u003c/title\u003e","Body":"\u003cbody\u003e北京是中国的首都,有600多年的建都历史\u003c/body\u003e","Footer":"\u003cscript\u003ejs:pop('123')\u003c/script\u003e"}
Method 1: {"Title":"<title>北京欢迎你</title>","Body":"<body>北京是中国的首都,有600多年的建都历史</body>","Footer":"<script>js:pop('123')</script>"}
Method 2: {"Title":"<title>北京欢迎你</title>","Body":"<body>北京是中国的首都,有600多年的建都历史</body>","Footer":"<script>js:pop('123')</script>"}
