Class HtmlAssistant

Namespace
SunamoHtml.Html
Assembly
SunamoHtml.dll

Static helper class with HTML manipulation methods (parsing, attribute handling, HTML decoding, and more). Note: the class mixes unrelated HTML utilities; consider splitting it into more specific classes.

public static class HtmlAssistant
Inheritance
object
HtmlAssistant

Methods

AttrsValues(IList<HtmlNode>, string)

Gets the value of the specified attribute from each node in a list of HTML nodes.

public static IList<string> AttrsValues(IList<HtmlNode> anchors, string attributeName)

Parameters

anchors IList<HtmlNode>

List of HTML nodes.

attributeName string

The attribute name to get values for.

Returns

IList<string>

List of attribute values.
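A minimal sketch of what AttrsValues might do, assuming HtmlNode is HtmlAgilityPack's HtmlNode. It assumes that a node without the attribute contributes an empty string; the actual implementation may skip such nodes instead. The class name AttrsValuesSketch is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch, not the SunamoHtml source.
public static class AttrsValuesSketch
{
    // Returns one value per node, in input order.
    // Assumption: a node lacking the attribute yields "".
    public static IList<string> AttrsValues(IList<HtmlNode> anchors, string attributeName)
    {
        return anchors
            .Select(node => node.GetAttributeValue(attributeName, string.Empty))
            .ToList();
    }
}
```

HtmlAgilityPack's SelectNodes returns an HtmlNodeCollection, which implements IList<HtmlNode>, so its result can be passed straight in, for example to collect every href from doc.DocumentNode.SelectNodes("//a").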

GetAnyHeader(HtmlNode, bool, bool)

Finds heading elements (H1-H6) within the given node.

public static IList<HtmlNode> GetAnyHeader(HtmlNode node, bool isRecursive, bool isStopAfterFirst)

Parameters

node HtmlNode

The HTML node to search in.

isRecursive bool

Whether to search recursively.

isStopAfterFirst bool

Whether to stop after the first heading is found.

Returns

IList<HtmlNode>

List of the heading nodes found.
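The three parameters combine as sketched below, assuming HtmlAgilityPack's HtmlNode: isRecursive chooses between all descendants and direct children, and isStopAfterFirst truncates the result to at most one node. This is a hypothetical reconstruction (GetAnyHeaderSketch is not a real type), not the library's code.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch of GetAnyHeader's documented behavior.
public static class GetAnyHeaderSketch
{
    private static readonly HashSet<string> HeadingNames =
        new HashSet<string> { "h1", "h2", "h3", "h4", "h5", "h6" };

    public static IList<HtmlNode> GetAnyHeader(HtmlNode node, bool isRecursive, bool isStopAfterFirst)
    {
        // Recursive: every descendant in document order; otherwise direct children only.
        IEnumerable<HtmlNode> candidates = isRecursive ? node.Descendants() : node.ChildNodes;
        var headings = candidates.Where(n => HeadingNames.Contains(n.Name.ToLowerInvariant()));
        // Take(1) keeps only the first heading when requested.
        return (isStopAfterFirst ? headings.Take(1) : headings).ToList();
    }
}
```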

GetAttributesPairs(string)

Parses the HTML attributes in text into a dictionary. If the text contains no HTML tag, it is first wrapped in an img tag, so a bare attribute string such as src="a.png" alt="x" can also be parsed.

public static Dictionary<string, string> GetAttributesPairs(string text)

Parameters

text string

The HTML text or attributes string.

Returns

Dictionary<string, string>

Dictionary of attribute name-value pairs.
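A sketch of the wrap-then-parse approach, assuming HtmlAgilityPack is the parser. How the real method decides that the text "doesn't contain HTML tags" is not documented; checking for a '<' character is an assumption, as is the hypothetical class name.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch; the "contains a tag" test is an assumption.
public static class GetAttributesPairsSketch
{
    public static Dictionary<string, string> GetAttributesPairs(string text)
    {
        // A bare attribute string is wrapped in a dummy img tag so the parser sees an element.
        if (!text.Contains("<"))
        {
            text = "<img " + text + ">";
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(text);
        // Use the first element in the fragment.
        var element = doc.DocumentNode.Descendants()
            .FirstOrDefault(n => n.NodeType == HtmlNodeType.Element);

        var pairs = new Dictionary<string, string>();
        if (element == null)
        {
            return pairs;
        }
        foreach (var attribute in element.Attributes)
        {
            // A later duplicate attribute overwrites an earlier one.
            pairs[attribute.Name] = attribute.Value;
        }
        return pairs;
    }
}
```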

GetValueOfAttribute(string, HtmlNode, bool)

Gets the value of an HTML attribute from a node. Returns an empty string if the attribute is not found, and the literal string "(null)" if the attribute is present without a value (for example, readonly in <input readonly>).

public static string GetValueOfAttribute(string attributeName, HtmlNode node, bool isTrim = false)

Parameters

attributeName string

The name of the attribute to get.

node HtmlNode

The HTML node to get the attribute from.

isTrim bool

Whether to trim the attribute value.

Returns

string

The attribute value; an empty string if the attribute is not found; or "(null)" if the attribute is present without a value.

HtmlDecode(string)

Decodes HTML-encoded text.

public static string HtmlDecode(string text)

Parameters

text string

The HTML-encoded text.

Returns

string

Decoded text.
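Assuming HtmlDecode behaves like the .NET base class library decoder (an assumption; the source does not say which decoder it uses), System.Net.WebUtility.HtmlDecode is an equivalent. HtmlDecodeSketch is a hypothetical wrapper.

```csharp
using System.Net;

// Hypothetical stand-in: assumes HtmlAssistant.HtmlDecode matches the BCL decoder.
public static class HtmlDecodeSketch
{
    // Decodes named (&amp;) and numeric (&#39;) character references.
    public static string HtmlDecode(string text) => WebUtility.HtmlDecode(text);
}
```

Note that with this decoder &nbsp; becomes U+00A0 (non-breaking space), not an ordinary space, which matters if the decoded text is later compared or split on " ".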

InnerContentWithAttr(HtmlNode, bool, string, string, string, bool, bool)

Core method for getting the inner content (HTML or text) of a node with the given tag whose attribute matches the given criteria.

public static string InnerContentWithAttr(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isHtml, bool isContains = false)

Parameters

node HtmlNode

The HTML node to search in.

isRecursive bool

Whether to search recursively.

tag string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

isHtml bool

True to return InnerHtml, false to return InnerText.

isContains bool

If true, a node matches when its attribute value contains attributeValue; if false, the value must equal attributeValue.

Returns

string

The HTML-decoded, trimmed content, or an empty string if no matching node is found.
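The matching logic can be sketched as follows, assuming HtmlAgilityPack's HtmlNode. Returning the first matching node, comparing case-sensitively, and decoding with WebUtility.HtmlDecode are all assumptions; InnerContentSketch is a hypothetical name.

```csharp
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical sketch; first-match and case-sensitive comparison are assumptions.
public static class InnerContentSketch
{
    public static string InnerContentWithAttr(HtmlNode node, bool isRecursive, string tag,
        string attributeName, string attributeValue, bool isHtml, bool isContains = false)
    {
        // Descendants(tag) searches the whole subtree; Elements(tag) only direct children.
        var candidates = isRecursive ? node.Descendants(tag) : node.Elements(tag);
        var match = candidates.FirstOrDefault(n =>
        {
            var attribute = n.Attributes[attributeName];
            if (attribute == null) return false;
            return isContains
                ? attribute.Value.Contains(attributeValue)
                : attribute.Value == attributeValue;
        });
        if (match == null) return string.Empty;

        string content = isHtml ? match.InnerHtml : match.InnerText;
        return WebUtility.HtmlDecode(content).Trim();
    }
}
```

Under these assumptions InnerHtmlWithAttr and the six-parameter InnerText would correspond to isHtml set to true and false respectively.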

InnerHtml(HtmlNode, bool, string)

Gets the inner HTML of a child node with the specified tag.

public static string InnerHtml(HtmlNode node, bool isRecursive, string tag)

Parameters

node HtmlNode

The parent HTML node to search in.

isRecursive bool

Whether to search recursively.

tag string

The tag name to search for.

Returns

string

The inner HTML of the found node, or an empty string if no node is found.

InnerHtmlWithAttr(HtmlNode, bool, string, string, string, bool)

Gets the inner HTML of a node that matches the specified tag and attribute criteria.

public static string InnerHtmlWithAttr(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isContains = false)

Parameters

node HtmlNode

The HTML node to search in.

isRecursive bool

Whether to search recursively.

tag string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

isContains bool

If true, a node matches when its attribute value contains attributeValue; if false, the value must equal attributeValue.

Returns

string

The HTML-decoded, trimmed inner HTML, or an empty string if no matching node is found.

InnerText(HtmlNode, bool, string)

Gets the inner text of a child node with the specified tag.

public static string InnerText(HtmlNode node, bool isRecursive, string tag)

Parameters

node HtmlNode

The parent HTML node to search in.

isRecursive bool

Whether to search recursively.

tag string

The tag name to search for.

Returns

string

The inner text of the found node, or an empty string if no node is found.

InnerText(HtmlNode, bool, string, string, string, bool)

Gets the inner text of a node that matches the specified tag and attribute criteria.

public static string InnerText(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isContains = false)

Parameters

node HtmlNode

The HTML node to search in.

isRecursive bool

Whether to search recursively.

tag string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

isContains bool

If true, a node matches when its attribute value contains attributeValue; if false, the value must equal attributeValue.

Returns

string

The HTML-decoded, trimmed inner text, or an empty string if no matching node is found.

InnerTextDecodeTrim(HtmlNode)

Gets the decoded and trimmed inner text from an HTML node.

public static string InnerTextDecodeTrim(HtmlNode node)

Parameters

node HtmlNode

The HTML node.

Returns

string

Cleaned and decoded inner text.

InnerTextDecodeTrim(string)

Decodes HTML entities in inner text, replaces whitespace characters (such as line breaks and tabs) and double spaces with single spaces, and trims the result.

public static string InnerTextDecodeTrim(string result)

Parameters

result string

The inner text to process.

Returns

string

Cleaned and decoded text.
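A sketch of the decode, normalize, and trim pipeline, using WebUtility.HtmlDecode. Exactly which whitespace characters the real method replaces is not documented; treating \r, \n and \t as the set is an assumption, and the class name is hypothetical.

```csharp
using System.Net;

// Hypothetical sketch; the replaced character set is an assumption.
public static class InnerTextDecodeTrimSketch
{
    public static string InnerTextDecodeTrim(string result)
    {
        string text = WebUtility.HtmlDecode(result);
        // Turn line breaks and tabs into ordinary spaces.
        text = text.Replace('\r', ' ').Replace('\n', ' ').Replace('\t', ' ');
        // Collapse runs of spaces until no double space remains.
        while (text.Contains("  "))
        {
            text = text.Replace("  ", " ");
        }
        return text.Trim();
    }
}
```

For example, "  Tom &amp;\r\n\tJerry  " becomes "Tom & Jerry".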

ParseInnerTextOfEveryTd(HtmlNode)

Parses the inner text of every TD element in a table row.

public static IList<string> ParseInnerTextOfEveryTd(HtmlNode tr)

Parameters

tr HtmlNode

The table row (TR) HTML node.

Returns

IList<string>

List of trimmed inner text values from all TD elements.
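A one-line sketch, assuming HtmlAgilityPack's HtmlNode and that only direct td children of the row are read (cells of nested tables are skipped). Whether the real method also decodes entities is not stated, so this sketch only trims. ParseTdSketch is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch, not the SunamoHtml source.
public static class ParseTdSketch
{
    public static IList<string> ParseInnerTextOfEveryTd(HtmlNode tr)
    {
        // Elements("td") yields direct <td> children only, in document order.
        return tr.Elements("td").Select(td => td.InnerText.Trim()).ToList();
    }
}
```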

RemoveAllAttrs(HtmlNode)

Removes all attributes from an HTML node by replacing the node in its parent with a clean version that has no attributes.

public static HtmlNode RemoveAllAttrs(HtmlNode node)

Parameters

node HtmlNode

The HTML node to remove attributes from.

Returns

HtmlNode

The new clean node that replaced the original.

RemoveComments(HtmlNode)

Removes all HTML comment nodes from the given node and its children recursively.

public static void RemoveComments(HtmlNode node)

Parameters

node HtmlNode

The HTML node to remove comments from.
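A sketch of recursive comment removal, assuming HtmlAgilityPack's HtmlNode (RemoveCommentsSketch is hypothetical). The matches are collected into a list first, because removing nodes while enumerating Descendants() would invalidate the enumeration. Note that HtmlAgilityPack also represents a <!DOCTYPE> declaration as a comment node, so this sketch removes it as well.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch, not the SunamoHtml source.
public static class RemoveCommentsSketch
{
    public static void RemoveComments(HtmlNode node)
    {
        // Materialize the matches before mutating the tree.
        var comments = node.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Comment)
            .ToList();
        foreach (var comment in comments)
        {
            comment.Remove(); // detaches the node from its parent
        }
    }
}
```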

RemoveStyleTagsText(string)

Removes all style tags from HTML text.

public static string RemoveStyleTagsText(string html)

Parameters

html string

The HTML text to process.

Returns

string

HTML with all style tags removed.
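A regex-based sketch that operates on the string directly; the real method may use an HTML parser instead, and StyleSketch is a hypothetical name. The lazy .*? stops at the first closing tag, which matches HTML's rule that a style element's raw text ends at the first </style.

```csharp
using System.Text.RegularExpressions;

// Hypothetical regex sketch; removes each <style> element together with its content.
public static class StyleSketch
{
    private static readonly Regex StyleElement = new Regex(
        @"<style\b[^>]*>.*?</style\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline); // Singleline lets . cross newlines

    public static string RemoveStyleTagsText(string html) => StyleElement.Replace(html, string.Empty);
}
```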

SetAttribute(HtmlNode, string, string)

Sets an attribute on an HTML node, removing any existing attributes with the same name first.

public static void SetAttribute(HtmlNode node, string attributeName, string value)

Parameters

node HtmlNode

The HTML node to set the attribute on.

attributeName string

The name of the attribute.

value string

The value for the attribute.
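Markup can contain the same attribute more than once, which is why existing copies are removed before the new one is added. A sketch of that step, assuming HtmlAgilityPack's HtmlNode and case-insensitive name matching (both assumptions); SetAttributeSketch is hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sketch, not the SunamoHtml source.
public static class SetAttributeSketch
{
    public static void SetAttribute(HtmlNode node, string attributeName, string value)
    {
        // Collect every existing attribute with this name, including duplicates.
        var existing = node.Attributes
            .Where(a => string.Equals(a.Name, attributeName, StringComparison.OrdinalIgnoreCase))
            .ToList();
        foreach (var attribute in existing)
        {
            node.Attributes.Remove(attribute);
        }
        node.Attributes.Add(attributeName, value);
    }
}
```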

SplitByBr(string)

Splits HTML input by BR tags.

public static IList<string> SplitByBr(string html)

Parameters

html string

The HTML input to split.

Returns

IList<string>

List of HTML segments split by BR tags.

SplitByTag(string, string)

Splits HTML input by the specified tag. Void (non-paired) tags such as <br> are first converted to XML-valid self-closing form (<br />) before splitting.

public static IList<string> SplitByTag(string html, string tagName)

Parameters

html string

The HTML input to split.

tagName string

The tag name to split by.

Returns

IList<string>

List of HTML segments split by the specified tag.
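The normalize-then-split idea can be sketched with regular expressions. This is a hypothetical reconstruction for void tags such as br or hr. The real method's normalization step is not documented, and SplitByTagSketch is not a real type. Passing "br" gives the behavior described for SplitByBr.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch for void tags; not the SunamoHtml implementation.
public static class SplitByTagSketch
{
    public static IList<string> SplitByTag(string html, string tagName)
    {
        string name = Regex.Escape(tagName);
        string selfClosing = "<" + tagName + " />";
        // Step 1: normalize <br>, <BR/>, <br class="x"> and similar to the XML-valid form <br />.
        // \b stops "br" from matching longer names such as <brand>.
        string normalized = Regex.Replace(html, "<" + name + @"\b[^>]*?/?>", selfClosing,
            RegexOptions.IgnoreCase);
        // Step 2: split on the normalized tag.
        return normalized.Split(new[] { selfClosing }, StringSplitOptions.None).ToList();
    }
}
```

For example, "a<br>b<BR/>c" split by "br" yields the segments a, b and c.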

TrimInnerHtml(string)

Trims the inner HTML of every element in the given HTML string.

public static string TrimInnerHtml(string value)

Parameters

value string

The HTML string to process.

Returns

string

HTML with trimmed inner HTML for all elements.