Class HtmlAssistant
- Namespace
- SunamoHtml.Html
- Assembly
- SunamoHtml.dll
Helper class with various HTML manipulation methods (parsing, attribute handling, HTML decoding, etc.). Note: This is a mix of various HTML utilities - consider splitting into more specific classes.
public static class HtmlAssistant
- Inheritance
- object → HtmlAssistant
Methods
AttrsValues(IList<HtmlNode>, string)
Gets attribute values from a list of HTML nodes.
public static IList<string> AttrsValues(IList<HtmlNode> anchors, string attributeName)
Parameters
- anchors IList<HtmlNode>
List of HTML nodes.
- attributeName string
The attribute name to get values for.
Returns
- IList<string>
GetAnyHeader(HtmlNode, bool, bool)
Gets header elements (H1-H6) found in the given node.
public static IList<HtmlNode> GetAnyHeader(HtmlNode node, bool isRecursive, bool isStopAfterFirst)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- isStopAfterFirst bool
Whether to stop after finding the first header.
Returns
- IList<HtmlNode>
GetAttributesPairs(string)
Parses HTML attributes from text into a dictionary. If text doesn't contain HTML tags, wraps it in an img tag first.
public static Dictionary<string, string> GetAttributesPairs(string text)
Parameters
- text string
The HTML text or attributes string.
Returns
- Dictionary<string, string>
Dictionary of attribute name-value pairs.
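The name-value extraction described above can be approximated with a regular expression. This is only a sketch of the idea, not the library's implementation: `AttributePairsSketch`, `Parse`, and the pattern are hypothetical names, and the sketch assumes quoted values and skips the img-wrapping step.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttributePairsSketch
{
    // Matches name="value" or name='value'.
    // Unquoted and valueless attributes are ignored in this sketch.
    private static readonly Regex AttributePattern =
        new Regex(@"([A-Za-z_][\w\-:.]*)\s*=\s*(?:""([^""]*)""|'([^']*)')");

    public static Dictionary<string, string> Parse(string text)
    {
        // Assumption: attribute names are compared case-insensitively.
        var pairs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match match in AttributePattern.Matches(text))
        {
            string name = match.Groups[1].Value;
            string value = match.Groups[2].Success
                ? match.Groups[2].Value
                : match.Groups[3].Value;
            pairs[name] = value; // a later duplicate overwrites an earlier one
        }
        return pairs;
    }

    public static void Main()
    {
        var pairs = Parse("src=\"logo.png\" alt='Company logo'");
        Console.WriteLine(pairs["src"]);
        Console.WriteLine(pairs["alt"]);
    }
}
```

A regex is enough for well-formed attribute strings; a real HTML parser is the safer choice for arbitrary markup.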
GetValueOfAttribute(string, HtmlNode, bool)
Gets the value of an HTML attribute from a node. Returns empty string if attribute is not found. Returns "(null)" when attribute exists without a value (e.g., input readonly).
public static string GetValueOfAttribute(string attributeName, HtmlNode node, bool isTrim = false)
Parameters
- attributeName string
The name of the attribute to get.
- node HtmlNode
The HTML node to get the attribute from.
- isTrim bool
Whether to trim the attribute value.
Returns
- string
Attribute value, empty string if not found, or "(null)" if attribute exists without value.
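Because the return value encodes three cases in one string, callers may want to classify it before use. The following is a caller-side sketch under that reading of the contract; `AttributeState` and `Classify` are hypothetical helpers, not part of SunamoHtml.

```csharp
using System;

public enum AttributeState { Missing, Valueless, HasValue }

public static class AttributeResultSketch
{
    // Interprets a string returned by GetValueOfAttribute as documented:
    // "" means not found, "(null)" means present without a value.
    public static AttributeState Classify(string returned)
    {
        if (returned == string.Empty) return AttributeState.Missing;
        if (returned == "(null)") return AttributeState.Valueless;
        return AttributeState.HasValue;
    }

    public static void Main()
    {
        Console.WriteLine(Classify(""));        // Missing
        Console.WriteLine(Classify("(null)"));  // Valueless
        Console.WriteLine(Classify("submit"));  // HasValue
    }
}
```

The sketch cannot tell a missing attribute from one that is present with an empty value, since the documentation does not say how that case is reported.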
HtmlDecode(string)
Decodes HTML-encoded text.
public static string HtmlDecode(string text)
Parameters
- text string
The HTML-encoded text.
Returns
- string
Decoded text.
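For orientation, the effect of HTML decoding can be shown with the framework's own `System.Net.WebUtility.HtmlDecode`. Whether `HtmlAssistant.HtmlDecode` delegates to it is an assumption; `HtmlDecodeSketch` is a hypothetical wrapper used only for illustration.

```csharp
using System;
using System.Net;

public static class HtmlDecodeSketch
{
    // Converts entities such as &amp; and &lt; back into characters.
    public static string Decode(string text) => WebUtility.HtmlDecode(text);

    public static void Main()
    {
        Console.WriteLine(Decode("Tom &amp; Jerry &lt;3")); // Tom & Jerry <3
    }
}
```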
InnerContentWithAttr(HtmlNode, bool, string, string, string, bool, bool)
Core method for getting inner content (HTML or text) of a node matching attribute criteria.
public static string InnerContentWithAttr(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isHtml, bool isContains = false)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value to match.
- isHtml bool
True to return InnerHtml, false to return InnerText.
- isContains bool
Whether to use contains matching for attribute value.
Returns
- string
HTML-decoded and trimmed content, or empty string if not found.
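The difference between exact and contains matching of an attribute value can be sketched as a small predicate. This assumes ordinal, case-sensitive comparison, which the documentation does not confirm; `AttributeMatchSketch` and `Matches` are hypothetical names.

```csharp
using System;

public static class AttributeMatchSketch
{
    // isContains = false: the whole attribute value must equal the expected value.
    // isContains = true: the expected value only has to occur somewhere inside it.
    public static bool Matches(string actual, string expected, bool isContains) =>
        isContains
            ? actual.Contains(expected)
            : string.Equals(actual, expected, StringComparison.Ordinal);

    public static void Main()
    {
        string classValue = "btn btn-primary";
        Console.WriteLine(Matches(classValue, "btn-primary", false)); // False
        Console.WriteLine(Matches(classValue, "btn-primary", true));  // True
    }
}
```

Contains matching is mainly useful for multi-valued attributes such as class, where an exact comparison against a single class name would fail.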
InnerHtml(HtmlNode, bool, string)
Gets the inner HTML of a child node with specified tag.
public static string InnerHtml(HtmlNode node, bool isRecursive, string tag)
Parameters
- node HtmlNode
The parent HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
Returns
- string
Inner HTML of found node, or empty string if not found.
InnerHtmlWithAttr(HtmlNode, bool, string, string, string, bool)
Gets the inner HTML of a node that matches specified tag and attribute criteria.
public static string InnerHtmlWithAttr(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isContains = false)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value to match.
- isContains bool
Whether to use contains matching for attribute value.
Returns
- string
HTML-decoded and trimmed inner HTML, or empty string if not found.
InnerText(HtmlNode, bool, string)
Gets the inner text of a child node with specified tag.
public static string InnerText(HtmlNode node, bool isRecursive, string tag)
Parameters
- node HtmlNode
The parent HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
Returns
- string
Inner text of found node, or empty string if not found.
InnerText(HtmlNode, bool, string, string, string, bool)
Gets the inner text of a node that matches specified tag and attribute criteria.
public static string InnerText(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isContains = false)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value to match.
- isContains bool
Whether to use contains matching for attribute value.
Returns
- string
HTML-decoded and trimmed inner text, or empty string if not found.
InnerTextDecodeTrim(HtmlNode)
Gets the decoded and trimmed inner text from an HTML node.
public static string InnerTextDecodeTrim(HtmlNode node)
Parameters
- node HtmlNode
The HTML node.
Returns
- string
Cleaned and decoded inner text.
InnerTextDecodeTrim(string)
Decodes and trims inner text, replacing whitespace characters and double spaces.
public static string InnerTextDecodeTrim(string result)
Parameters
- result string
The inner text to process.
Returns
- string
Cleaned and decoded text.
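The decode-and-clean steps described above might look roughly like the sketch below. It assumes that "whitespace characters" means carriage returns, line feeds, and tabs, and that they are replaced by spaces; the library may handle a different set. `InnerTextCleanupSketch` and `Clean` are hypothetical names.

```csharp
using System;
using System.Net;

public static class InnerTextCleanupSketch
{
    public static string Clean(string innerText)
    {
        // 1. Decode entities such as &amp;.
        string result = WebUtility.HtmlDecode(innerText ?? string.Empty);

        // 2. Replace line breaks and tabs with plain spaces (assumed set).
        result = result.Replace('\r', ' ').Replace('\n', ' ').Replace('\t', ' ');

        // 3. Collapse runs of spaces until no double space remains.
        while (result.Contains("  "))
        {
            result = result.Replace("  ", " ");
        }

        // 4. Trim leading and trailing whitespace.
        return result.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Clean("  Hello\r\n\tworld &amp;  more ")); // Hello world & more
    }
}
```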
ParseInnerTextOfEveryTd(HtmlNode)
Parses the inner text of every TD element in a table row.
public static IList<string> ParseInnerTextOfEveryTd(HtmlNode tr)
Parameters
- tr HtmlNode
The table row (TR) HTML node.
Returns
- IList<string>
RemoveAllAttrs(HtmlNode)
Removes all attributes from an HTML node and replaces it with a clean version.
public static HtmlNode RemoveAllAttrs(HtmlNode node)
Parameters
- node HtmlNode
The HTML node to remove attributes from.
Returns
- HtmlNode
The new clean node that replaced the original.
RemoveComments(HtmlNode)
Removes all HTML comment nodes from the given node and its children recursively.
public static void RemoveComments(HtmlNode node)
Parameters
- node HtmlNode
The HTML node to remove comments from.
RemoveStyleTagsText(string)
Removes all style tags from HTML text.
public static string RemoveStyleTagsText(string html)
Parameters
- html string
The HTML text to process.
Returns
- string
HTML with all style tags removed.
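As an illustration of the intended result, style elements can be stripped from a string with a regular expression. This sketch assumes the element's CSS content is removed together with its tags, which the documentation does not state explicitly; `StyleRemovalSketch` and `RemoveStyleBlocks` are hypothetical names, not the library's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleRemovalSketch
{
    // Singleline lets '.' span line breaks inside the style block;
    // the lazy quantifier stops at the first closing tag.
    private static readonly Regex StyleBlock = new Regex(
        @"<style\b[^>]*>.*?</style\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string RemoveStyleBlocks(string html) =>
        StyleBlock.Replace(html, string.Empty);

    public static void Main()
    {
        string html = "<p>a</p><style type=\"text/css\">p { color: red; }</style><p>b</p>";
        Console.WriteLine(RemoveStyleBlocks(html)); // <p>a</p><p>b</p>
    }
}
```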
SetAttribute(HtmlNode, string, string)
Sets an attribute on an HTML node, removing any existing attributes with the same name first.
public static void SetAttribute(HtmlNode node, string attributeName, string value)
Parameters
- node HtmlNode
The HTML node to set the attribute on.
- attributeName string
The name of the attribute.
- value string
The value for the attribute.
SplitByBr(string)
Splits HTML input by BR tags.
public static IList<string> SplitByBr(string html)
Parameters
- html string
The HTML input to split.
Returns
- IList<string>
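Splitting on BR tags can be pictured with the sketch below. It assumes the common spellings `<br>`, `<br/>`, and `<br />` in any letter case, and it neither trims the parts nor drops empty ones; the library's behavior in those respects is not documented here. `SplitByBrSketch` and `Split` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitByBrSketch
{
    // Accepts <br>, <br/> and <br /> regardless of case.
    private static readonly Regex BrTag =
        new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    public static IList<string> Split(string html) => BrTag.Split(html);

    public static void Main()
    {
        foreach (string part in Split("line one<br>line two<br />line three"))
        {
            Console.WriteLine(part);
        }
    }
}
```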
SplitByTag(string, string)
Splits HTML input by specified tag. Converts non-pair tags to XML-valid format before splitting.
public static IList<string> SplitByTag(string html, string tagName)
Parameters
- html string
- tagName string
Returns
- IList<string>
TrimInnerHtml(string)
Trims the inner HTML of all elements in the HTML value.
public static string TrimInnerHtml(string value)
Parameters
- value string
The HTML string to process.
Returns
- string
HTML with trimmed inner HTML for all elements.