Class HtmlTableParser

Namespace
SunamoHtml.Html
Assembly
SunamoHtml.dll

Parses an HTML table into a two-dimensional string array, with colspan support and row/column indexing.

public sealed class HtmlTableParser
Inheritance
HtmlTableParser

Constructors

HtmlTableParser(HtmlNode, bool)

Initializes a new instance by parsing an HTML table node.

public HtmlTableParser(HtmlNode html, bool isIgnoreFirstRow)

Parameters

html HtmlNode

The HTML table node to parse.

isIgnoreFirstRow bool

Whether to ignore the first row (typically headers).

Properties

ColumnCount

Gets the number of columns in the parsed table.

public int ColumnCount { get; }

Property Value

int

Data

Gets or sets the parsed table data. A null element means that position was a colspan cell.

[SuppressMessage("Performance", "CA1819")]
public string[][] Data { get; set; }

Property Value

string[][]
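
The null convention for colspan cells can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the `ExpandRow` helper and the `(Text, ColSpan)` tuple input are hypothetical, and the sketch assumes the cell text is kept in the first spanned position while the remaining positions are filled with null. It is written as a C# top-level program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of SunamoHtml): expands one table row so that
// a cell with colspan N occupies N positions in the resulting array.
// Assumption: the text goes into the first position, the rest become null.
static string[] ExpandRow(IReadOnlyList<(string Text, int ColSpan)> cells)
{
    var row = new List<string>();
    foreach (var (text, colSpan) in cells)
    {
        row.Add(text);
        // Pad the remaining spanned positions with null placeholders.
        for (int i = 1; i < colSpan; i++)
        {
            row.Add(null);
        }
    }
    return row.ToArray();
}

// A row such as: <td colspan="2">Total</td><td>42</td>
var cells = new List<(string Text, int ColSpan)> { ("Total", 2), ("42", 1) };
string[] expanded = ExpandRow(cells);

// Prints: Total|<null>|42
Console.WriteLine(string.Join("|", Array.ConvertAll(expanded, v => v ?? "<null>")));
```

Padding this way keeps every row the same width, so a column index refers to the same visual column in each row; callers reading Data should be prepared to handle null elements.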

RowCount

Gets the number of rows in the parsed table.

public int RowCount { get; }

Property Value

int

Methods

ColumnValues(int, bool, bool, bool)

Gets all values from a specific column by index.

public IList<string> ColumnValues(int columnIndex, bool isNormalizeValuesInColumn, bool isRemoveAlsoInnerHtmlOfSubNodes, bool isSkipFirstRow)

Parameters

columnIndex int

The zero-based column index.

isNormalizeValuesInColumn bool

Whether to normalize values by removing HTML.

isRemoveAlsoInnerHtmlOfSubNodes bool

Whether to remove inner HTML of sub nodes.

isSkipFirstRow bool

Whether to skip the first row.

Returns

IList<string>

List of column values.

ColumnValues(string, bool, bool)

Gets all values from a column identified by its name, which is looked up in the first row.

public IList<string> ColumnValues(string columnName, bool isNormalizeValuesInColumn, bool isRemoveAlsoInnerHtmlOfSubNodes)

Parameters

columnName string

The column name to search for in the first row.

isNormalizeValuesInColumn bool

Whether to normalize values by removing HTML.

isRemoveAlsoInnerHtmlOfSubNodes bool

Whether to remove inner HTML of sub nodes.

Returns

IList<string>

List of column values.
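
Looking up a column by its header can be sketched over a plain jagged array shaped like Data. This is only an illustration under stated assumptions, not the library's code: the `ColumnByName` helper is hypothetical, the header row is assumed to be excluded from the result, and throwing on an unknown name is a choice made for this sketch (the documentation does not say what the real method does). Normalization is not modeled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of SunamoHtml): finds the header in row 0
// and collects the cells beneath it.
static IList<string> ColumnByName(string[][] data, string columnName)
{
    int columnIndex = Array.IndexOf(data[0], columnName);
    if (columnIndex < 0)
    {
        // Assumption of this sketch; the real method's behavior is not documented.
        throw new ArgumentException($"Column '{columnName}' not found in the first row.");
    }

    var values = new List<string>();
    for (int rowIndex = 1; rowIndex < data.Length; rowIndex++)
    {
        string[] row = data[rowIndex];
        // Rows shorter than the header yield null instead of throwing.
        values.Add(columnIndex < row.Length ? row[columnIndex] : null);
    }
    return values;
}

string[][] table =
{
    new[] { "Name", "Price" },
    new[] { "Tea", "3" },
    new[] { "Coffee", "5" },
};

IList<string> prices = ColumnByName(table, "Price");

// Prints: 3,5
Console.WriteLine(string.Join(",", prices));
```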

NormalizeValuesInColumn(IList<string>, bool)

Normalizes values in a column by removing HTML tags and decoding HTML entities.

public static void NormalizeValuesInColumn(IList<string> chars, bool isRemoveAlsoInnerHtmlOfSubNodes)

Parameters

chars IList<string>

List of column values to normalize.

isRemoveAlsoInnerHtmlOfSubNodes bool

Whether to remove inner HTML of sub nodes.
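
As a rough illustration of what removing HTML tags and decoding HTML entities means for a list of cell values, the sketch below modifies a list in place. It is a simplified stand-in, not the library's implementation: the `StripTagsAndDecode` helper is hypothetical, tags are removed with a naive regular expression rather than an HTML parser, and the isRemoveAlsoInnerHtmlOfSubNodes option is not modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of SunamoHtml): strips tags and decodes
// entities for every non-null entry, modifying the list in place.
static void StripTagsAndDecode(IList<string> values)
{
    for (int i = 0; i < values.Count; i++)
    {
        if (values[i] == null)
        {
            continue; // leave null placeholders (e.g. colspan cells) untouched
        }

        // Naive tag removal; real HTML may need a proper parser.
        string withoutTags = Regex.Replace(values[i], "<[^>]+>", string.Empty);
        values[i] = WebUtility.HtmlDecode(withoutTags);
    }
}

var column = new List<string> { "<b>Tea</b> &amp; milk", null, "<span>5 &lt; 6</span>" };
StripTagsAndDecode(column);

// Prints: Tea & milk
Console.WriteLine(column[0]);
// Prints: 5 < 6
Console.WriteLine(column[2]);
```

Tags are removed before entities are decoded so that escaped markup such as `&lt;b&gt;` ends up as literal text instead of being stripped as a tag.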