Class HtmlAgilityHelper
- Namespace
- SunamoHtml
- Assembly
- SunamoHtml.dll
HtmlHelperText - for methods that do NOT operate on HtmlAgilityHelper.
HtmlAgilityHelper - for getting new nodes.
HtmlAssistant - only for methods that operate on HtmlAgilityHelper.
public class HtmlAgilityHelper
- Inheritance
- object
- HtmlAgilityHelper
Fields
TextNode
Constant representing the text node type in HTML DOM.
public const string TextNode = "#text"
Field Value
- string
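As a rough illustration of where this constant is useful, the sketch below drops text nodes from a node's children by comparing each child's name with `TextNode`. It assumes that `HtmlNode` and `HtmlDocument` come from the HtmlAgilityPack package and that both HtmlAgilityPack and SunamoHtml are referenced. The class and method names (`TextNodeExample`, `ElementChildNames`) are hypothetical and are not part of this library.

```csharp
using System.Linq;
using HtmlAgilityPack;
using SunamoHtml;

public static class TextNodeExample
{
    // Hypothetical helper: returns the names of all top-level children
    // that are not text nodes.
    public static string[] ElementChildNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.ChildNodes
            // HtmlAgilityPack names text nodes "#text", the value documented for TextNode.
            .Where(child => child.Name != HtmlAgilityHelper.TextNode)
            .Select(child => child.Name)
            .ToArray();
    }
}
```

Comparing against the constant rather than a string literal keeps the check in one place if several callers need it.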
Methods
CreateHtmlDocument(CreateHtmlDocumentInitData?)
Creates an HTML document with specific initialization options.
public static HtmlDocument CreateHtmlDocument(CreateHtmlDocumentInitData? data = null)
Parameters
- data CreateHtmlDocumentInitData
Initialization data, or null for default settings.
Returns
- HtmlDocument
Configured HTML document instance.
CreateNode(string)
Creates an HTML node from the given HTML string, wrapping non-tag content with spaces.
public static HtmlNode CreateNode(string html)
Parameters
- html string
The HTML string to create a node from.
Returns
- HtmlNode
The created HTML node.
FindAncestorParentNode(HtmlNode, string)
Finds an ancestor parent node with the specified tag name.
public static HtmlNode? FindAncestorParentNode(HtmlNode node, string tagName)
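Parameters
- node HtmlNode
The HTML node whose ancestors are searched.
- tagName string
The tag name to look for.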
Returns
- HtmlNode
The ancestor node with matching tag name, or null if not found.
HasAncestorParentNode(HtmlNode, string)
Checks if the node has an ancestor with the specified tag name.
public static bool HasAncestorParentNode(HtmlNode node, string tagName)
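Parameters
- node HtmlNode
The HTML node whose ancestors are checked.
- tagName string
The tag name to look for.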
Returns
- bool
True if an ancestor with the tag name exists, false otherwise.
InsertGroup(HtmlNode, List<string>)
Inserts a group of strings as inner HTML of the specified node, wrapping each string with spaces.
public static void InsertGroup(HtmlNode insertAfter, List<string> list)
Parameters
- insertAfter HtmlNode
The HTML node to insert content into.
- list List<string>
List of strings to insert as inner HTML.
Node(HtmlNode, bool, string)
Finds the first HTML node matching the specified tag within the given node.
public static HtmlNode? Node(HtmlNode node, bool recursive, string tag)
Parameters
- node HtmlNode
The parent HTML node to search within.
- recursive bool
Whether to search recursively in child nodes.
- tag string
The HTML tag name to search for.
Returns
- HtmlNode
The first matching HTML node, or null if not found.
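A minimal usage sketch for `Node` follows. It assumes that `HtmlNode` and `HtmlDocument` are the HtmlAgilityPack types and that both packages are referenced. The wrapper class and method (`NodeLookupExample`, `FirstInnerText`) are hypothetical names introduced only for this illustration. The call itself follows the signature documented above.

```csharp
using HtmlAgilityPack;
using SunamoHtml;

public static class NodeLookupExample
{
    // Hypothetical helper: returns the inner text of the first <tag> element
    // found anywhere below the document root, or null when Node finds nothing.
    public static string? FirstInnerText(string html, string tag)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // recursive: true asks Node to descend into child nodes.
        HtmlNode? first = HtmlAgilityHelper.Node(doc.DocumentNode, true, tag);

        // Node is documented to return null when there is no match.
        return first?.InnerText;
    }
}
```

Because the return type is nullable, callers should check for null before dereferencing the result, as the `?.` operator does here.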
NodeWithAttr(HtmlNode, bool, string, string, string, bool)
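Finds the first HTML node with the specified tag whose attribute matches the given value. Returns null if not found.
public static HtmlNode? NodeWithAttr(HtmlNode node, bool recursive, string tag, string attr, string attrValue, bool contains = false)
Parameters
- node HtmlNode
The parent HTML node to search within.
- recursive bool
Whether to search recursively in child nodes.
- tag string
The HTML tag name to search for.
- attr string
The attribute name to check.
- attrValue string
The attribute value to match.
- contains bool
Whether to use contains matching.
Returns
- HtmlNode
The first matching HTML node, or null if not found.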
Nodes(HtmlNode, bool, string)
Gets all nodes with the specified tag name.
public static List<HtmlNode> Nodes(HtmlNode node, bool isRecursive, string tag)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
Returns
- List<HtmlNode>
All nodes with the specified tag name.
NodesWhichContainsInAttr(HtmlNode, bool, string, string, string, bool)
Finds all HTML nodes where the specified attribute contains the given value.
public static IList<HtmlNode> NodesWhichContainsInAttr(HtmlNode node, bool recursive, string tag, string attr, string attrValue, bool searchAsSingleString = true)
Parameters
- node HtmlNode
The parent HTML node to search within.
- recursive bool
Whether to search recursively in child nodes.
- tag string
The HTML tag name to search for.
- attr string
The attribute name to check.
- attrValue string
The value to search for within the attribute.
- searchAsSingleString bool
Whether to search the attribute value as a single string (true) or split by whitespace (false).
Returns
- IList<HtmlNode>
All nodes where the specified attribute contains the given value.
NodesWithAttr(HtmlNode, bool, string, string, string, bool)
Gets nodes with exact attribute match.
public static IList<HtmlNode> NodesWithAttr(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isContains = false)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value to match.
- isContains bool
Whether to use contains matching.
Returns
- IList<HtmlNode>
The nodes whose attribute matches.
NodesWithAttrWildCard(HtmlNode, bool, string, string, string, bool)
Gets nodes with attribute matching wildcard pattern.
public static IList<HtmlNode> NodesWithAttrWildCard(HtmlNode node, bool isRecursive, string tag, string attributeName, string attributeValue, bool isContains = false)
Parameters
- node HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tag string
The tag name to search for.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value pattern.
- isContains bool
Whether to use contains matching.
Returns
- IList<HtmlNode>
The nodes whose attribute matches the wildcard pattern.
PairsDdDt(HtmlNode, bool, Dictionary<string, string>)
Extracts key-value pairs from HTML definition list (DL) by pairing DT (term) and DD (definition) elements.
public static Dictionary<string, string> PairsDdDt(HtmlNode dl, bool recursive, Dictionary<string, string> replaceHtmLForText)
Parameters
- dl HtmlNode
The DL (definition list) HTML node to parse.
- recursive bool
Whether to search recursively in child nodes.
- replaceHtmLForText Dictionary<string, string>
Dictionary of HTML replacements to apply to extracted text.
Returns
- Dictionary<string, string>
Dictionary with DT text as keys and DD text as values.
RecursiveReturnTags(List<HtmlNode>, HtmlNode, bool, bool, string)
Recursively returns HTML tags matching the specified tag name. If isSingle is true, only the first match is returned (like Node vs Nodes). Pass "*" as tagName to match any tag.
public static void RecursiveReturnTags(List<HtmlNode> result, HtmlNode htmlNode, bool isRecursive, bool isSingle, string tagName)
Parameters
- result List<HtmlNode>
The list to add found nodes to.
- htmlNode HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- isSingle bool
Whether to stop after finding first match.
- tagName string
The tag name to search for, or "*" for any tag.
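To make the interplay of `isRecursive`, `isSingle`, and the `"*"` wildcard concrete, here is an independent sketch of a search with the same parameter shape, written directly against HtmlAgilityPack. It is not this library's implementation. The class `TagSearchSketch`, the method `Collect`, the case-insensitive name comparison, and the choice to skip non-element nodes (text, comments) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TagSearchSketch
{
    // Adds matching descendants of 'parent' to 'result' in document order.
    // Returns true when the search should stop early (isSingle and a match was found).
    public static bool Collect(List<HtmlNode> result, HtmlNode parent,
        bool isRecursive, bool isSingle, string tagName)
    {
        foreach (HtmlNode child in parent.ChildNodes)
        {
            // Assumption: only element nodes are candidates, even for "*".
            if (child.NodeType != HtmlNodeType.Element)
            {
                continue;
            }

            bool matches = tagName == "*" ||
                string.Equals(child.Name, tagName, StringComparison.OrdinalIgnoreCase);

            if (matches)
            {
                result.Add(child);
                if (isSingle)
                {
                    return true; // first match is enough
                }
            }

            // Descend only when a recursive search was requested.
            if (isRecursive && Collect(result, child, isRecursive, isSingle, tagName))
            {
                return true;
            }
        }

        return false;
    }
}
```

In this sketch the caller owns the result list, which mirrors the `void` signature above: results accumulate in the list that is passed in.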
RecursiveReturnTagsWithContainsAttr(List<HtmlNode>, HtmlNode, bool, string, string, string, bool, bool)
Recursively returns tags with attribute containing specified value.
public static void RecursiveReturnTagsWithContainsAttr(List<HtmlNode> result, HtmlNode htmlNode, bool isRecursive, string tagName, string attributeName, string attributeValue, bool isEnoughContainsAttribute, bool isSearchAsSingleString = true)
Parameters
- result List<HtmlNode>
The list to add found nodes to.
- htmlNode HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tagName string
The tag name to search for.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value to match.
- isEnoughContainsAttribute bool
Whether partial match is sufficient.
- isSearchAsSingleString bool
Whether to search as a single string.
RecursiveReturnTagsWithContainsAttr(List<HtmlNode>, HtmlNode, bool, string, string, string, bool, bool, bool)
Recursively returns tags with attribute containing specified value. Use "*" in tagName to return all tags.
public static void RecursiveReturnTagsWithContainsAttr(List<HtmlNode> result, HtmlNode htmlNode, bool isRecursive, string tagName, string attributeName, string attributeValue, bool isWildCard, bool isEnoughContainsAttribute, bool isSearchAsSingleString = true)
Parameters
- result List<HtmlNode>
The list to add found nodes to.
- htmlNode HtmlNode
The HTML node to search in.
- isRecursive bool
Whether to search recursively.
- tagName string
The tag name to search for, or "*" for all tags.
- attributeName string
The attribute name to match.
- attributeValue string
The attribute value to match.
- isWildCard bool
Whether to use wildcard matching.
- isEnoughContainsAttribute bool
Whether partial match is sufficient.
- isSearchAsSingleString bool
Whether to search as a single string.
ReplacePlainUriForAnchors(HtmlDocument, string)
Replaces plain URIs in text with HTML anchor tags using the provided HtmlDocument.
[SuppressMessage("Design", "CA1055:UriReturnValuesShouldNotBeStrings")]
public static string ReplacePlainUriForAnchors(HtmlDocument htmlDocument, string html)
Parameters
- htmlDocument HtmlDocument
The HtmlDocument to use for parsing.
- html string
The HTML string to process.
Returns
- string
HTML string with plain URIs converted to anchor tags.
ReplacePlainUriForAnchors(string)
Replaces plain URIs in text with HTML anchor tags.
[SuppressMessage("Design", "CA1055:UriReturnValuesShouldNotBeStrings")]
public static string ReplacePlainUriForAnchors(string html)
Parameters
- html string
The HTML string to process.
Returns
- string
HTML string with plain URIs converted to anchor tags.
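The general idea of turning plain URIs into anchors can be sketched with a regular expression over plain text. This is not how the library does it: the class `UriAnchorSketch`, the method `Linkify`, and the pattern are assumptions for illustration. The sketch only handles `http` and `https` URIs, does not skip URIs that already sit inside tags or attributes, and treats trailing punctuation as part of the URI.

```csharp
using System.Text.RegularExpressions;

public static class UriAnchorSketch
{
    // Matches http/https URIs up to the next whitespace, angle bracket, or quote.
    private static readonly Regex PlainUri =
        new Regex(@"https?://[^\s<>""]+", RegexOptions.Compiled);

    // Hypothetical helper: wraps every matched URI in an <a> element
    // that uses the URI as both href and link text.
    public static string Linkify(string text)
    {
        return PlainUri.Replace(
            text,
            match => "<a href=\"" + match.Value + "\">" + match.Value + "</a>");
    }
}
```

A parser-based approach, such as the overload that accepts an `HtmlDocument`, can restrict the replacement to text nodes, which a plain regular expression cannot do reliably.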
TrimComments(IList<HtmlNode>)
Removes comment nodes from a list of HTML nodes.
public static IList<HtmlNode> TrimComments(IList<HtmlNode> nodes)
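Parameters
- nodes IList<HtmlNode>
The list of HTML nodes to remove comment nodes from.
Returns
- IList<HtmlNode>
The nodes that remain after comment nodes are removed.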
TrimTexts(HtmlNodeCollection)
Removes text nodes from an HTML node collection, keeping everything else.
public static List<HtmlNode> TrimTexts(HtmlNodeCollection htmlNodeCollection)
Parameters
- htmlNodeCollection HtmlNodeCollection
The HTML node collection to trim.
Returns
- List<HtmlNode>
The nodes that remain after text nodes are removed.
TrimTexts(List<HtmlNode>)
Removes text nodes but not comment nodes from a list of HTML nodes.
public static List<HtmlNode> TrimTexts(List<HtmlNode> nodes)
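Parameters
- nodes List<HtmlNode>
The list of HTML nodes to trim.
Returns
- List<HtmlNode>
The nodes that remain after text nodes are removed. Comment nodes are kept.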
WrapIntoTagIfNot(string, string)
Wraps the input string in an HTML tag if it doesn't already start with a tag.
public static string WrapIntoTagIfNot(string html, string tag = "div")
Parameters
- html string
The string to potentially wrap.
- tag string
The HTML tag to use for wrapping (default is div).
Returns
- string
The wrapped HTML string.
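The behavior described above can be approximated in a few lines. The following is a sketch, not the library's code: the names `WrapSketch` and `WrapIfNoLeadingTag` are hypothetical, and the sketch assumes that "starts with a tag" means the first non-whitespace character is `<`.

```csharp
public static class WrapSketch
{
    // Hypothetical helper: wraps 'html' in <tag>...</tag> unless it already
    // begins with a tag. Leading whitespace is ignored for the check only.
    public static string WrapIfNoLeadingTag(string html, string tag = "div")
    {
        if (html.TrimStart().StartsWith("<"))
        {
            return html; // already starts with a tag, return unchanged
        }

        return "<" + tag + ">" + html + "</" + tag + ">";
    }
}
```

Under these assumptions, `WrapSketch.WrapIfNoLeadingTag("hello")` yields `<div>hello</div>`, while input that already starts with a tag is returned as-is.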