Beautiful Soup Alternatives for Go
Continuing the topic of extracting data from HTML.
If you’re looking for a Beautiful Soup equivalent in Go, several libraries offer similar HTML parsing and scraping functionality:

soup
- soup is a Go library explicitly designed as an analogue to Python’s Beautiful Soup. Its API is intentionally similar, with functions such as Find, FindAll, and HTMLParse, which makes the transition easy for developers who already know Beautiful Soup.
- It can fetch web pages (soup.Get), parse HTML, and traverse the DOM to extract data. Fetching is an extra: Beautiful Soup itself only parses and leaves HTTP to libraries such as requests.
- Example usage:
package main

import (
	"fmt"
	"os"

	"github.com/anaskhan96/soup"
)

func main() {
	// Fetch the page as a string, then parse it into a soup.Root.
	resp, err := soup.Get("https://xkcd.com")
	if err != nil {
		os.Exit(1)
	}
	doc := soup.HTMLParse(resp)
	links := doc.Find("div", "id", "comicLinks").FindAll("a")
	for _, link := range links {
		fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
	}
}
- Note: soup does not support CSS selectors or XPath; it relies on tag and attribute-based searching.
goquery
- goquery is another popular Go library for HTML parsing, offering a jQuery-like syntax for DOM traversal and manipulation.
- It supports CSS selectors, making it more flexible for complex queries compared to soup.
- Example usage:
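resp, err := http.Get("https://xkcd.com")
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()
doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
	log.Fatal(err)
}
doc.Find("div#comicLinks a").Each(func(i int, s *goquery.Selection) {
	fmt.Println(s.Text(), "| Link :", s.AttrOr("href", ""))
})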
htmlquery
htmlquery is a Go library designed for parsing and extracting data from HTML documents using XPath expressions. It provides a straightforward API for traversing and querying the HTML tree structure, making it especially useful for web scraping and data extraction tasks.
Key Features
- Allows querying HTML documents with XPath 1.0/2.0 expressions.
- Supports loading HTML from strings, files, or URLs.
- Offers functions to find single or multiple nodes, extract attributes, and evaluate XPath expressions.
- Includes query caching (LRU-based) to improve performance by avoiding repeated compilation of XPath expressions.
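- Built on golang.org/x/net/html (maintained by the Go project, but not part of the standard library), so the *html.Node trees it returns can be handed to other libraries built on the same package, such as goquery.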
Basic Usage Examples
Load HTML from a string:
doc, err := htmlquery.Parse(strings.NewReader("..."))
Load HTML from a URL:
doc, err := htmlquery.LoadURL("http://example.com/")
Find all `<a>` elements:
list := htmlquery.Find(doc, "//a")
Find all `<a>` elements with an href attribute:
list := htmlquery.Find(doc, "//a[@href]")
Extract the text of the first `<h1>` element:
h1 := htmlquery.FindOne(doc, "//h1")
if h1 != nil { // FindOne returns nil when nothing matches
	fmt.Println(htmlquery.InnerText(h1)) // prints the text inside <h1>
}
Extract all values of the href attribute from `<a>` elements:
list := htmlquery.Find(doc, "//a/@href")
for _, n := range list {
	// For attribute results, SelectAttr(n, "href") returns the attribute's value.
	fmt.Println(htmlquery.SelectAttr(n, "href"))
}
Typical Use Cases
- Web scraping where XPath provides more precise or complex querying than CSS selectors.
- Extracting structured data from HTML documents.
- Navigating and manipulating HTML trees programmatically.
Installation
go get github.com/antchfx/htmlquery
Node
- Node is a Go package inspired by Beautiful Soup, providing APIs for extracting data from HTML and XML documents.
Colly
Colly - A web scraping framework for Go, which uses goquery internally for HTML parsing.
https://github.com/gocolly/colly
To install, run go get github.com/gocolly/colly/v2 from inside your module, or add Colly to your go.mod file:
module github.com/x/y
go 1.14
require (
	github.com/gocolly/colly/v2 latest
)
Example:
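package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// AllowedDomains keeps the crawl on go-colly.org instead of following external links.
	c := colly.NewCollector(colly.AllowedDomains("go-colly.org"))
	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})
	c.Visit("http://go-colly.org/")
}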
Comparison Table
| Library | API Style | Selector Support | Inspiration | Notes |
|---|---|---|---|---|
| soup | Beautiful Soup-like | Tag & attribute only | Beautiful Soup | Simple, no CSS/XPath |
| goquery | jQuery-like | CSS selectors | jQuery | Flexible, popular |
| htmlquery | XPath | XPath | lxml/XPath | Advanced queries |
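| Node | Beautiful Soup-like | Tag & attribute | Beautiful Soup | Similar to soup |
| Colly | Callback-based framework | CSS selectors (OnHTML), XPath (OnXML) | n/a | Crawling and scraping; uses goquery |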
Summary
- For a direct Beautiful Soup analogue in Go, use soup.
- For CSS selector support, consider goquery.
- For XPath queries, use htmlquery.
- For another Beautiful Soup-inspired option, look at Node.
- For crawling whole sites rather than parsing single pages, use Colly.
soup, goquery, and htmlquery (and Colly, through goquery) all build on golang.org/x/net/html, an HTML5-compliant parser maintained by the Go project outside the standard library, so the main differences between them are API style and selector capabilities.