This project and its related projects sound like a good idea, but really aren't.
Using the stripTags function could be dangerous. From https://golang.org/pkg/html/template/#hdr-Security_Model:
"This package assumes that template authors are trusted"
stripTags resides within html/template and works according to those guarantees, which might mean that certain XSS attacks could go through undetected.
A fast, reliable and already battle-tested library for stripping HTML tags is bluemonday. It offers a bluemonday.StrictPolicy() mode:
bluemonday.StrictPolicy() is a mode which can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on its whitelist. An example usage scenario would be blog post titles where HTML tags are not expected at all and if they are then the elements and the content of the elements should be stripped. This is a very strict policy.
Example:
// Sanitize takes and returns a string; SanitizeBytes expects a []byte.
stripped := bluemonday.StrictPolicy().Sanitize(`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`)
// Output: Google
That is exactly what you want when stripping arbitrary HTML content: a library that understands XSS attacks and knows how to defuse them, even to the point of stripping all tags and leaving only plain text.
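To see why a library that actually parses HTML matters, here is a minimal Go sketch of the naive alternative: a regular expression that deletes everything between a '<' and the next '>'. The helper name naiveStrip is hypothetical and the code is shown only as a counterexample. An attribute value containing '>' is enough to make it leak markup into the "plain" text.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is a deliberately naive "tag" matcher: it treats everything
// from a '<' up to the next '>' as a tag and ignores attribute quoting.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// naiveStrip (hypothetical helper) deletes every match of tagPattern.
// It is a counterexample, not something to use on untrusted input.
func naiveStrip(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

func main() {
	// The '>' inside the quoted title ends the "tag" too early,
	// so part of the markup survives as text.
	fmt.Println(naiveStrip(`<div title="1>2">x</div>`)) // 2">x
}
```

A tokenizer-based stripper such as stripTags or bluemonday tracks whether it is inside a quoted attribute, so the same input yields just "x".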
This Go package strips HTML tags from strings. No heavy lifting is done in this package; the unexported stripTags function from html/template/html.go is much better suited for this task. All this package does is provide an exported function to access stripTags.
The stripTags function in html/template/html.go could be really useful; however, it is not exported.
Requests to export stripTags were made on GitHub without success.
Other projects copied the html/template files and put the content into one single file.
This package does not modify any of the html/template source files. Instead, it copies all html/template files from the Go source into this package and adds one export.go file, which adds a StripTags function (see Versioning for the whole workflow).
Import the library with:
import "github.com/denisbrodbeck/striphtmltags"
package main

import (
	"fmt"

	"github.com/denisbrodbeck/striphtmltags"
)

func main() {
	html := `<script>...</script> <b>¡Hi!</b>`
	got := striphtmltags.StripTags(html)
	fmt.Println(got)
	// Output: ¡Hi!
}
This package follows the Go release cycle.
On each new Go release we:
- copy all files from $GOSRC/src/html/template/ into html/template
- add an export.go file with an exported StripTags function, which calls stripTags
Build script:
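#!/usr/bin/env bash
# -e: exit on error
# -u: exit on uninitialized variables
# -r: enter restricted shell https://www.gnu.org/s/bash/manual/html_node/The-Restricted-Shell.html
# -o pipefail: a pipeline fails if any command in it fails
set -eru -o pipefail

URL='https://redirector.gvt1.com/edgedl/go/go1.9.2.src.tar.gz'

# Download and unpack the Go source release (--fail makes HTTP errors fatal).
curl -L --silent --fail "$URL" -o "go.tar.gz"
tar -zxf "go.tar.gz"

# Replace the vendored html/template package. The glob must stay unquoted,
# otherwise rm looks for a file literally named '*'.
mkdir -p "html/template"
rm -rf html/template/*
cp "go/LICENSE" "./"
cp "go/PATENTS" "./"
cp "go/VERSION" "./"
# Copy the directory contents ('/.'), not the directory itself, so the files
# do not end up in html/template/template/.
cp -a "go/src/html/template/." "html/template/"
cp "export.go.tpl" "html/template/export.go"

# Clean up.
rm -f "go.tar.gz"
rm -rf "./go/"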
This package uses the unexported stripTags function from html/template. That works for most normal use cases, when you want to completely strip HTML tags.
If you need to sanitize potentially unsafe user input while preserving some valid HTML tags, consider using an HTML sanitizer library such as bluemonday.
The original Go license applies. Please have a look at the LICENSE file for more details.