What is the Beautiful Soup analogue in Go?

Beautiful Soup analogues in Go include soup, goquery, htmlquery, and Node.

Beautiful Soup Alternatives for Go

Continuing the topic of extracting data from HTML

For a direct Beautiful Soup analogue in Go, use soup.
For CSS selector support, consider goquery.
For XPath queries, use htmlquery.
For another Beautiful Soup-inspired option, look at Node.

If you’re looking for a Beautiful Soup equivalent in Go, several libraries offer similar HTML parsing and scraping functionality:

soup

soup is a Go library explicitly designed as an analogue to Python’s Beautiful Soup. Its API is intentionally similar, featuring functions like Find, FindAll, and HTMLParse, making it easy for developers familiar with Beautiful Soup to transition to Go.
It allows you to fetch web pages, parse HTML, and traverse the DOM to extract data, much like Beautiful Soup.

Example usage:

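// Assumes imports: "fmt", "os", and "github.com/anaskhan96/soup".
resp, err := soup.Get("https://xkcd.com")
if err != nil {
    fmt.Fprintln(os.Stderr, "fetch failed:", err)
    os.Exit(1)
}
doc := soup.HTMLParse(resp)
// Find returns a Root whose Error field is set when nothing matches.
comicLinks := doc.Find("div", "id", "comicLinks")
if comicLinks.Error != nil {
    fmt.Fprintln(os.Stderr, "comicLinks not found:", comicLinks.Error)
    os.Exit(1)
}
for _, link := range comicLinks.FindAll("a") {
    fmt.Println(link.Text(), "| Link :", link.Attrs()["href"])
}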

Note: soup does not support CSS selectors or XPath; it relies on tag and attribute-based searching.
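To make "tag and attribute-based searching" concrete, here is a minimal standard-library sketch of the idea. It is not soup's implementation: the Node type, the Find and FindAll functions, and the sampleDoc tree are all hypothetical names invented for this illustration.

```go
package main

import "fmt"

// Node is a hypothetical, simplified element type used only in this sketch.
type Node struct {
	Tag      string
	Attrs    map[string]string
	Text     string
	Children []*Node
}

// matches reports whether n has the given tag and, when key is non-empty,
// carries an attribute named key whose value equals val.
func matches(n *Node, tag, key, val string) bool {
	if n.Tag != tag {
		return false
	}
	if key == "" {
		return true
	}
	v, ok := n.Attrs[key]
	return ok && v == val
}

// FindAll walks the descendants of n depth-first and collects every match.
func FindAll(n *Node, tag, key, val string) []*Node {
	var out []*Node
	for _, c := range n.Children {
		if matches(c, tag, key, val) {
			out = append(out, c)
		}
		out = append(out, FindAll(c, tag, key, val)...)
	}
	return out
}

// Find returns the first matching descendant in document order, or nil.
func Find(n *Node, tag, key, val string) *Node {
	for _, c := range n.Children {
		if matches(c, tag, key, val) {
			return c
		}
		if hit := Find(c, tag, key, val); hit != nil {
			return hit
		}
	}
	return nil
}

// sampleDoc builds a tiny hand-written tree standing in for parsed HTML.
func sampleDoc() *Node {
	return &Node{Tag: "html", Children: []*Node{
		{Tag: "div", Attrs: map[string]string{"id": "comicLinks"}, Children: []*Node{
			{Tag: "a", Attrs: map[string]string{"href": "/archive"}, Text: "Archive"},
			{Tag: "a", Attrs: map[string]string{"href": "/about"}, Text: "About"},
		}},
		{Tag: "a", Attrs: map[string]string{"href": "/other"}, Text: "Other"},
	}}
}

func main() {
	box := Find(sampleDoc(), "div", "id", "comicLinks")
	if box == nil {
		fmt.Println("no match")
		return
	}
	for _, link := range FindAll(box, "a", "", "") {
		fmt.Println(link.Text, "| Link :", link.Attrs["href"])
	}
}
```

A search like this can only express "this tag, optionally with this attribute value", which is why nested conditions usually end up as chained calls, whereas a CSS selector or XPath expression can state them in one string.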

goquery

goquery is another popular Go library for HTML parsing, offering a jQuery-like syntax for DOM traversal and manipulation.
It supports CSS selectors, making it more flexible for complex queries compared to soup.

Example usage:

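// resp is an *http.Response from http.Get; close resp.Body when done.
doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
    log.Fatal(err)
}
doc.Find("div#comicLinks a").Each(func(i int, s *goquery.Selection) {
    fmt.Println(s.Text(), "| Link :", s.AttrOr("href", ""))
})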

htmlquery

htmlquery is a Go library designed for parsing and extracting data from HTML documents using XPath expressions. It provides a straightforward API for traversing and querying the HTML tree structure, making it especially useful for web scraping and data extraction tasks.

Key Features

Allows querying HTML documents with XPath 1.0/2.0 expressions.
Supports loading HTML from strings, files, or URLs.
Offers functions to find single or multiple nodes, extract attributes, and evaluate XPath expressions.
Includes query caching (LRU-based) to improve performance by avoiding repeated compilation of XPath expressions.
Built on top of Go’s golang.org/x/net/html parser, so its nodes are compatible with other Go libraries that use the same package, such as goquery.
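The query-caching feature above can be pictured with a small LRU cache keyed by the expression text. This is only a sketch of the general technique using the standard library, not htmlquery's actual code: QueryCache, compiled, and compile are hypothetical names, and compile stands in for the real, more expensive XPath compilation step.

```go
package main

import (
	"container/list"
	"fmt"
	"strings"
)

// compiled stands in for a compiled XPath expression (hypothetical type).
type compiled struct{ src string }

// compile is a placeholder for the expensive compilation step.
func compile(expr string) *compiled {
	return &compiled{src: strings.TrimSpace(expr)}
}

// entry is what each list element stores: the key is kept so that the
// map entry can be deleted when the element is evicted.
type entry struct {
	key string
	val *compiled
}

// QueryCache is a fixed-capacity LRU cache. It is not safe for concurrent
// use; a real cache would guard these fields with a mutex.
type QueryCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
	Compiles int // number of cache misses, exposed for illustration
}

// NewQueryCache returns an empty cache holding at most capacity entries.
func NewQueryCache(capacity int) *QueryCache {
	return &QueryCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached expression for expr, compiling and storing it on
// a miss and evicting the least recently used entry when over capacity.
func (c *QueryCache) Get(expr string) *compiled {
	if el, ok := c.items[expr]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).val
	}
	c.Compiles++
	v := compile(expr)
	c.items[expr] = c.order.PushFront(&entry{key: expr, val: v})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return v
}

func main() {
	cache := NewQueryCache(2)
	for _, q := range []string{"//a", "//h1", "//a", "//p", "//h1"} {
		cache.Get(q)
	}
	fmt.Println("compiles:", cache.Compiles)
}
```

The benefit shows up in scrapers that run the same handful of expressions against thousands of pages: each expression is compiled once and then reused until it falls out of the cache.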

Basic Usage Examples

Load HTML from a string:

doc, err := htmlquery.Parse(strings.NewReader("..."))

Load HTML from a URL:

doc, err := htmlquery.LoadURL("http://example.com/")

Find all `<a>` elements:

list := htmlquery.Find(doc, "//a")

Find all `<a>` elements with an href attribute:

list := htmlquery.Find(doc, "//a[@href]")

Extract the text of the first `<h1>` element:

h1 := htmlquery.FindOne(doc, "//h1")
if h1 != nil { // FindOne returns nil when nothing matches
    fmt.Println(htmlquery.InnerText(h1)) // Outputs the text inside <h1>
}

Extract all values of the href attribute from `<a>` elements:

list := htmlquery.Find(doc, "//a/@href")
for _, n := range list {
    fmt.Println(htmlquery.SelectAttr(n, "href"))
}

Typical Use Cases

Web scraping where XPath provides more precise or complex querying than CSS selectors.
Extracting structured data from HTML documents.
Navigating and manipulating HTML trees programmatically.

Installation

go get github.com/antchfx/htmlquery

Node

Node is a Go package inspired by Beautiful Soup, providing APIs for extracting data from HTML and XML documents.

Colly

Colly is a web scraping framework for Go that uses goquery internally for HTML parsing.

https://github.com/gocolly/colly

To install, add colly to your go.mod file (or run go get github.com/gocolly/colly/v2):

module github.com/x/y

go 1.14

require (
        github.com/gocolly/colly/v2 latest
)

Example:

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}
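Following links means dealing with href values that are relative, fragment-only, or not HTTP at all. The sketch below shows one way to normalize them using only net/url. It is an illustration of the general step, not Colly's implementation, and resolveLinks is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveLinks turns raw href values found on pageURL into absolute URLs.
// It strips fragments, skips non-HTTP(S) schemes and malformed values,
// and removes duplicates while keeping first-seen order.
func resolveLinks(pageURL string, hrefs []string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var out []string
	for _, h := range hrefs {
		ref, err := url.Parse(h)
		if err != nil {
			continue // skip hrefs that do not parse
		}
		abs := base.ResolveReference(ref)
		abs.Fragment = "" // "#section" points at the same document
		if abs.Scheme != "http" && abs.Scheme != "https" {
			continue // e.g. mailto: or javascript: links
		}
		s := abs.String()
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out, nil
}

func main() {
	links, err := resolveLinks("http://go-colly.org/docs/", []string{
		"intro/", "/articles/", "#top", "mailto:hi@example.com", "intro/#install",
	})
	if err != nil {
		fmt.Println("bad page URL:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

The same helper is useful with soup, goquery, or htmlquery, since all of them hand back href attributes exactly as written in the page.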

Comparison Table

Library	API Style	Selector Support	Inspiration	Notes
soup	Beautiful Soup-like	Tag & attribute only	Beautiful Soup	Simple, no CSS/XPath
goquery	jQuery-like	CSS selectors	jQuery	Flexible, popular
htmlquery	XPath	XPath	lxml/XPath	Advanced queries
Node	Beautiful Soup-like	Tag & attribute	Beautiful Soup	Similar to soup

Summary

For a direct Beautiful Soup analogue in Go, use soup.
For CSS selector support, consider goquery.
For XPath queries, use htmlquery.
For another Beautiful Soup-inspired option, look at Node.

All these libraries build on Go’s golang.org/x/net/html parser, which is maintained by the Go team (though it is not part of the standard library) and is robust and HTML5-compliant, so the main difference is in API style and selector capabilities.