Question about parsing nested tables and finding outer elements #468

@atye

I want to parse only the outer tr and td elements of a table and ignore the inner tables. My approach is to select the outer tr elements first and then, from that selection, find the td elements. I expect to find 2 tr elements and 3 td elements.

In the example below, I don't understand why the last call, which passes .start > tbody > tr > td to Find on the row selection, returns the 3 outer td elements. Doesn't Find search only descendants? The element with the start class and the tbody element are ancestors of the rows in the selection, not descendants, right?

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

var data = `
<!DOCTYPE html>
<html>
<body>
    <table class="start">
        <tbody>
            <tr>
                <td>test1</td>
                <td>test2</td>
            </tr>
            <tr>
                <td>
                    <table>
                        <tbody>
                            <tr>
                                <td>test3</td>
                                <td>test4</td>
                            </tr>
                            <tr>
                                <td>test5</td>
                                <td>test6</td>
                            </tr>
                        </tbody>
                    </table>
                </td>
            </tr>
        </tbody>
    </table>
</body>
</html>
`

func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}

	// find outer tr
	rowSelection := doc.Find(".start > tbody > tr")
	fmt.Println(len(rowSelection.Nodes))

	// finds all td
	colSelection := rowSelection.Find("td")
	fmt.Println(len(colSelection.Nodes))

	// finds all td
	colSelection = rowSelection.Find("tr > td")
	fmt.Println(len(colSelection.Nodes))

	// finds no td
	colSelection = rowSelection.Find("> td")
	fmt.Println(len(colSelection.Nodes))

	// finds outer td
	colSelection = rowSelection.Find(".start > tbody > tr > td")
	fmt.Println(len(colSelection.Nodes))
}

Output:

2
7
7
0
3
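
One way to read these numbers, as far as I understand it: Find restricts the candidates to descendants of the selection, but the selector itself is matched against each candidate's position in the whole document, so ancestors outside the selection can still take part in the match. The sketch below is a toy model of that behavior using only the standard library. It is not goquery's implementation, and the node, step, matchChain, find, and buildDoc names are hypothetical helpers invented for illustration. It only models chains of child combinators, and it leaves out the "> td" case, which I assume returns nothing because a selector starting with a combinator is not accepted by the selector engine.

```go
package main

import "fmt"

// node is a hypothetical, minimal stand-in for an HTML element.
type node struct {
	tag, class string
	parent     *node
	children   []*node
}

// add appends a new child element and returns it.
func (n *node) add(tag, class string) *node {
	c := &node{tag: tag, class: class, parent: n}
	n.children = append(n.children, c)
	return c
}

// step is one simple selector in a chain joined by ">" (child combinators).
// An empty field means "don't care".
type step struct{ tag, class string }

func (s step) matches(n *node) bool {
	return n != nil && (s.tag == "" || s.tag == n.tag) && (s.class == "" || s.class == n.class)
}

// matchChain checks n against the last step, its parent against the step
// before, and so on. It walks parent pointers through the whole tree and
// never asks whether an ancestor lies inside the subtree being searched.
func matchChain(n *node, chain []step) bool {
	for i := len(chain) - 1; i >= 0; i-- {
		if !chain[i].matches(n) {
			return false
		}
		n = n.parent
	}
	return true
}

// find returns the descendants of roots that match chain. Only the candidate
// set is limited to descendants. There is no deduplication, which is fine
// here because the roots used below are never nested in each other.
func find(roots []*node, chain []step) []*node {
	var out []*node
	var walk func(n *node)
	walk = func(n *node) {
		for _, c := range n.children {
			if matchChain(c, chain) {
				out = append(out, c)
			}
			walk(c)
		}
	}
	for _, r := range roots {
		walk(r)
	}
	return out
}

// buildDoc mirrors the element structure of the HTML in the question.
func buildDoc() *node {
	doc := &node{tag: "body"}
	tbody := doc.add("table", "start").add("tbody", "")
	row1 := tbody.add("tr", "")
	row1.add("td", "") // test1
	row1.add("td", "") // test2
	row2 := tbody.add("tr", "")
	inner := row2.add("td", "").add("table", "").add("tbody", "")
	for i := 0; i < 2; i++ {
		tr := inner.add("tr", "")
		tr.add("td", "")
		tr.add("td", "")
	}
	return doc
}

func main() {
	doc := buildDoc()
	outer := []step{{class: "start"}, {tag: "tbody"}, {tag: "tr"}}
	rows := find([]*node{doc}, outer)
	fmt.Println(len(rows))                                           // 2
	fmt.Println(len(find(rows, []step{{tag: "td"}})))                // 7
	fmt.Println(len(find(rows, []step{{tag: "tr"}, {tag: "td"}})))   // 7
	fmt.Println(len(find(rows, append(outer[:3:3], step{tag: "td"})))) // 3
}
```

In the last call, each outer td is a descendant of a selected row, and walking up from it reaches tr, tbody, and the table with the start class, so it matches even though those ancestors are not inside the selection. The inner td elements fail because their table has no start class. If the goal is simply the direct td children of each selected row, goquery's Children or ChildrenFiltered("td") on the row selection is probably the more direct choice.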
