Class HtmlHelper

Namespace
SunamoHtml.Html
Assembly
SunamoHtml.dll

Shared HTML helper methods (a mix of various utilities; consider splitting them into more specific classes).

public static class HtmlHelper
Inheritance
HtmlHelper

Methods

ClearSpaces(string)

Removes all space characters (nbsp and regular spaces) from the text.

public static string ClearSpaces(string text)

Parameters

text string

The text to clear spaces from.

Returns

string

Text without spaces.

ConvertHtmlToText(string)

Converts HTML to plain text by decoding HTML entities, replacing BR tags with newlines, and stripping all tags.

public static string ConvertHtmlToText(string htmlContent)

Parameters

htmlContent string

The HTML content to convert.

Returns

string

Plain text without HTML tags.
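
The three steps named in the summary (decode entities, turn BR tags into newlines, strip the remaining tags) can be sketched as follows. This is a minimal illustration that assumes only the .NET base class library; the class name HtmlToTextSketch and the regular expressions are assumptions and not the library's actual implementation.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-in that illustrates the documented steps; not the SunamoHtml source.
public static class HtmlToTextSketch
{
    public static string ConvertHtmlToText(string htmlContent)
    {
        // 1. Decode HTML entities such as &amp; or &nbsp;.
        string decoded = WebUtility.HtmlDecode(htmlContent);

        // 2. Replace <br>, <br/>, <BR /> and similar variants with a newline.
        string withNewlines = Regex.Replace(decoded, @"<br\s*/?>", "\n", RegexOptions.IgnoreCase);

        // 3. Strip every remaining tag.
        return Regex.Replace(withNewlines, @"<[^>]+>", string.Empty);
    }
}
```

Because entities are decoded before tags are stripped in this sketch, escaped markup such as &lt;b&gt; would also be removed; whether the real method behaves the same way is not stated here.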

ConvertTextToHtml(string)

Converts plain text to HTML by replacing newlines with BR tags.

public static string ConvertTextToHtml(string text)

Parameters

text string

The text to convert.

Returns

string

HTML with BR tags instead of newlines.
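
The reverse direction can be sketched in the same spirit. The exact BR form the library emits is not documented here, so the "<br />" literal below is an assumption, as is the class name TextToHtmlSketch.

```csharp
// Hypothetical sketch; the emitted "<br />" form is an assumption.
public static class TextToHtmlSketch
{
    public static string ConvertTextToHtml(string text)
    {
        // Normalize Windows (\r\n) and old Mac (\r) line endings to \n first,
        // so that every line break produces exactly one BR tag.
        return text
            .Replace("\r\n", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "<br />");
    }
}
```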

DeleteAttributesFromAllNodes(IList<HtmlNode>)

Deletes all attributes from all HTML nodes in a list.

public static void DeleteAttributesFromAllNodes(IList<HtmlNode> nodes)

Parameters

nodes IList<HtmlNode>

The list of HTML nodes to remove attributes from.

GetTag(HtmlNode, string)

Returns the first child tag with the specified original name.

public static HtmlNode? GetTag(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The original tag name to search for.

Returns

HtmlNode

First matching child tag or null.

GetTagOfAtribute(HtmlNode, string, string, string)

Returns the first tag with the specified name and attribute value.

public static HtmlNode? GetTagOfAtribute(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

Returns

HtmlNode

First matching HTML node or null.

GetTagOfAtributeRek(HtmlNode, string, string, string)

Recursively searches for a tag with the specified name, attribute name, and attribute value.

public static HtmlNode? GetTagOfAtributeRek(HtmlNode htmlNode, string nameOfTag, string nameOfAttribute, string valueOfAttribute)

Parameters

htmlNode HtmlNode

The HTML node to search in.

nameOfTag string

The tag name to search for.

nameOfAttribute string

The attribute name to match.

valueOfAttribute string

The attribute value to match.

Returns

HtmlNode

Found HTML node or null.

GetTagsOfAtribute(HtmlNode, string, string, string)

Returns all immediate child tags with the specified name and attribute value.

public static IList<HtmlNode> GetTagsOfAtribute(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

Returns

IList<HtmlNode>

List of matching child HTML nodes.

GetValuesOfStyle(HtmlNode)

Parses the style attribute of an HTML node and returns it as a dictionary.

public static Dictionary<string, string> GetValuesOfStyle(HtmlNode htmlNode)

Parameters

htmlNode HtmlNode

The HTML node to get style values from.

Returns

Dictionary<string, string>

Dictionary with style property names as keys and values as values.
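
To illustrate the shape of the returned dictionary, here is a minimal sketch of parsing a style string. It works on the raw attribute text rather than on an HtmlNode so that it stays dependency-free; the name StyleParserSketch.ParseStyle and the case-insensitive key comparison are assumptions, not the library's documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch operating on the raw style text, e.g. "color: red; width:100px;".
public static class StyleParserSketch
{
    public static Dictionary<string, string> ParseStyle(string style)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Declarations are separated by ';', name and value by the first ':'.
        foreach (string declaration in style.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = declaration.IndexOf(':');
            if (colon <= 0)
            {
                continue; // Skip fragments without a property name.
            }

            string name = declaration.Substring(0, colon).Trim();
            string value = declaration.Substring(colon + 1).Trim();
            if (name.Length > 0)
            {
                result[name] = value; // A later declaration overrides an earlier one.
            }
        }

        return result;
    }
}
```

Splitting on ';' is a simplification: values that themselves contain semicolons (for example data URLs) would need a real CSS tokenizer.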

GetWithoutTextNodes(HtmlNode)

Gets all child nodes excluding text nodes.

public static IList<HtmlNode> GetWithoutTextNodes(HtmlNode htmlNode)

Parameters

htmlNode HtmlNode

The HTML node to get children from.

Returns

IList<HtmlNode>

List of non-text child nodes.

HasChildTag(HtmlNode, string)

Checks if an HTML node has a child tag with the specified tag name.

public static bool HasChildTag(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The HTML node to check.

tagName string

The tag name to search for in children.

Returns

bool

True if the node has a child tag with the specified name, false otherwise.

HasTagAttrContains(HtmlNode, string, string, string)

Checks if an HTML node has an attribute whose value, when split by delimiter, contains the specified value.

public static bool HasTagAttrContains(HtmlNode htmlNode, string delimiter, string attributeName, string value)

Parameters

htmlNode HtmlNode

The HTML node to check.

delimiter string

The delimiter to split the attribute value by.

attributeName string

The attribute name to check.

value string

The value to search for in the split parts.

Returns

bool

True if the attribute value contains the value after splitting, false otherwise.
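
The split-then-compare check differs from a plain substring test, which the following sketch tries to make concrete. It takes the attribute value as a string instead of an HtmlNode; AttrContainsSketch and its parameter order are hypothetical.

```csharp
using System;

// Hypothetical sketch: checks whether one delimited part equals the searched value.
public static class AttrContainsSketch
{
    public static bool Contains(string attributeValue, string delimiter, string value)
    {
        string[] parts = attributeValue.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries);

        // Whole-part comparison: "primary" does not match "btn-primary".
        return Array.IndexOf(parts, value) >= 0;
    }
}
```

With a class attribute of "btn btn-primary" and a space delimiter, "btn-primary" is found while "primary" is not, which is the usual reason to split before comparing.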

HighlightingWords(string, int, int, IList<string>)

Highlights the searched words in the text content with bold tags and returns sentence snippets. Before calling, convert all whitespace characters in the content to spaces.

public static string HighlightingWords(string entireContent, int maxLettersPerSentence, int sentenceCount, IList<string> searchedWords)

Parameters

entireContent string

The entire content to search in.

maxLettersPerSentence int

Maximum letters per sentence snippet.

sentenceCount int

Number of sentence snippets to return.

searchedWords IList<string>

List of words to search for and highlight.

Returns

string

HTML string with highlighted words in sentence snippets.

PrepareToAttribute(string)

Prepares text for use in an HTML attribute by replacing double quotes with single quotes.

public static string PrepareToAttribute(string text)

Parameters

text string

The text to prepare.

Returns

string

Text with double quotes replaced by single quotes.
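
The documented replacement is small enough to sketch directly. The class name AttributeTextSketch is an assumption; the body only mirrors the behavior described above.

```csharp
// Hypothetical sketch of the documented behavior.
public static class AttributeTextSketch
{
    public static string PrepareToAttribute(string text)
    {
        // A double quote would terminate a double-quoted attribute value early,
        // so each one is swapped for a single quote.
        return text.Replace('"', '\'');
    }
}
```

Note that this only handles quotes. If the text may also contain characters such as & or <, System.Net.WebUtility.HtmlEncode is the standard alternative, at the cost of producing entities instead of single quotes.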

RecursiveReturnTagsWithContainsAttr(IList<HtmlNode>, HtmlNode, string, string, string, bool, bool)

Recursively searches for tags whose attribute value matches the specified criteria. Supports the wildcard "*" as the tag name to match all tags.

public static void RecursiveReturnTagsWithContainsAttr(IList<HtmlNode> result, HtmlNode htmlNode, string tagName, string attributeName, string attributeValue, bool isContains, bool isRecursively)

Parameters

result IList<HtmlNode>

The result list to add found nodes to.

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for, or "*" for all tags.

attributeName string

The attribute name to check.

attributeValue string

The attribute value to search for.

isContains bool

Whether to use Contains instead of exact match.

isRecursively bool

Whether to search recursively.
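
How the wildcard and the isContains and isRecursively flags interact can be sketched on a tiny stand-in tree. SimpleNode is a hypothetical replacement for HtmlNode so that the example stays self-contained, and the traversal order shown is an assumption rather than the library's actual code.

```csharp
using System.Collections.Generic;

// Hypothetical minimal stand-in for HtmlNode.
public sealed class SimpleNode
{
    public string Name { get; }
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();
    public List<SimpleNode> Children { get; } = new List<SimpleNode>();

    public SimpleNode(string name)
    {
        Name = name;
    }
}

public static class AttrSearchSketch
{
    // Adds every matching descendant (or only direct children) of node to result.
    public static void Collect(IList<SimpleNode> result, SimpleNode node, string tagName,
        string attributeName, string attributeValue, bool isContains, bool isRecursively)
    {
        foreach (SimpleNode child in node.Children)
        {
            // "*" matches any tag name.
            bool tagMatches = tagName == "*" || child.Name == tagName;

            if (tagMatches && child.Attributes.TryGetValue(attributeName, out var actual))
            {
                bool valueMatches = isContains
                    ? actual.Contains(attributeValue)
                    : actual == attributeValue;

                if (valueMatches)
                {
                    result.Add(child);
                }
            }

            if (isRecursively)
            {
                Collect(result, child, tagName, attributeName, attributeValue, isContains, isRecursively);
            }
        }
    }
}
```

Passing the result list in, as the documented signature does, lets the recursion append to one shared list instead of merging a new list at every level.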

RemoveAllTags(string)

Removes all HTML tags from text by calling the StripAllTags method. Replaces every tag <*> with a period. Inner non-XML content is left as is.

public static string RemoveAllTags(string text)

Parameters

text string

The text to remove tags from.

Returns

string

Text without HTML tags.

ReplaceAllFontCase(string)

Replaces all case variations of the BR tag with the standard lowercase BR tag.

public static string ReplaceAllFontCase(string html)

Parameters

html string

The HTML string to process.

Returns

string

HTML with standardized BR tags.

ReplaceChildNodeByOuterHtml(HtmlNode, string, HtmlNode)

Replaces a child node, identified by matching its OuterHtml, with a new node.

public static void ReplaceChildNodeByOuterHtml(HtmlNode htmlNode, string oldOuterHtml, HtmlNode newNode)

Parameters

htmlNode HtmlNode

The parent node containing the child to replace.

oldOuterHtml string

The OuterHtml of the child node to replace.

newNode HtmlNode

The new node to replace with.

ReplaceHtmlNonPairTagsWithXmlValid(string)

Replaces non-pair HTML tags with XML-valid equivalents by adding the self-closing slash. Problematic with auto translate.

public static string ReplaceHtmlNonPairTagsWithXmlValid(string html)

Parameters

html string

The HTML input string.

Returns

string

HTML with XML-valid non-pair tags.
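
A minimal sketch of adding the self-closing slash is shown below. The set of tags treated as non-pair (br, hr, img, input, meta, link) and the regular expression are assumptions chosen for illustration; the library's actual tag list is not documented here.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; the tag list is an assumption.
public static class NonPairTagSketch
{
    private static readonly Regex NonPairTag = new Regex(
        @"<(br|hr|img|input|meta|link)\b([^>]*?)\s*/?>",
        RegexOptions.IgnoreCase);

    public static string MakeXmlValid(string html)
    {
        // $1 is the tag name, $2 the attributes; any existing slash is normalized to " />".
        return NonPairTag.Replace(html, "<$1$2 />");
    }
}
```

Tags that are already self-closed pass through in normalized form, so applying the method twice gives the same result as applying it once.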

ReturnAllTags(HtmlNode, params string[])

Returns all child tags matching the specified tag names.

public static IList<HtmlNode> ReturnAllTags(HtmlNode htmlNode, params string[] tagNames)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagNames string[]

Tag names to search for.

Returns

IList<HtmlNode>

List of matching HTML nodes.

ReturnAllTagsImg(HtmlNode, string)

Returns all immediate child tags matching the specified tag name (non-recursive). If a tag has the specified name, recursion is not applied to it.

public static IList<HtmlNode> ReturnAllTagsImg(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for (e.g., img).

Returns

IList<HtmlNode>

List of matching child tags.

ReturnApplyToAllTags(string, string, EditHtmlWidthHandler, string)

Returns HTML with all tags of the specified type modified by the handler. Not suitable for returning the content of an entire page.

public static string ReturnApplyToAllTags(string text, string tagName, EditHtmlWidthHandler handler, string value)

Parameters

text string

The source code of the entire page.

tagName string

The tag name to search for (div, a, etc.).

handler EditHtmlWidthHandler

The handler method to apply to each tag.

value string

Optional parameter passed to the handler.

Returns

string

Modified HTML content.

ReturnTag(HtmlNode, string)

Returns the first child tag matching the specified tag name. Returns null if the tag is not found.

public static HtmlNode? ReturnTag(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The parent HTML node to search in.

tagName string

The tag name to search for.

Returns

HtmlNode

First matching HTML node or null.

ReturnTagRek(HtmlNode, object)

Recursively returns the first tag matching the specified tag name.

public static HtmlNode ReturnTagRek(HtmlNode htmlNode, object tagName)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName object

The tag name to search for.

Returns

HtmlNode

First matching tag or null.

ReturnTagRek(HtmlNode, string)

Recursively returns the first tag matching the specified tag name.

public static HtmlNode? ReturnTagRek(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for.

Returns

HtmlNode

First matching tag or null.

ReturnTagWithAttr(HtmlNode, string, string, string)

Returns the first tag with the specified attribute name and value. Returns null if not found.

public static HtmlNode? ReturnTagWithAttr(HtmlNode htmlNode, string tag, string attributeName, string value)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tag string

The tag name to search for.

attributeName string

The attribute name to match.

value string

The attribute value to match.

Returns

HtmlNode

First matching HTML node or null.

ReturnTagWithAttrRek(HtmlNode, string, string, string)

Returns the first tag with the specified name and attribute value, recursively searching the node tree. Returns null if the tag is not found.

public static HtmlNode? ReturnTagWithAttrRek(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

Returns

HtmlNode

First matching HTML node or null.

ReturnTags(HtmlNode, string)

Returns all immediate child tags matching the specified tag name (non-recursive). The wildcard "*" can be passed, but it would not make much sense.

public static IList<HtmlNode> ReturnTags(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The parent HTML node to search in.

tagName string

The tag name to search for.

Returns

IList<HtmlNode>

List of matching child tags.

ReturnTagsRek(HtmlNode, string)

Returns all tags matching the specified tag name, recursively searching the node tree. Supports the wildcard "*" to match all tags.

public static IList<HtmlNode> ReturnTagsRek(HtmlNode htmlNode, string tagName)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for, or "*" for all tags.

Returns

IList<HtmlNode>

List of matching HTML nodes with trimmed text.

ReturnTagsWithAttrRek(HtmlNode, string, string, string)

Returns all tags matching the specified name and attribute value, recursively searching the node tree. Supports the wildcard "*" as the tag name to match all tags, and the wildcard "*" as the attribute value to match any value.

public static IList<HtmlNode> ReturnTagsWithAttrRek(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for, or "*" for all tags.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match, or "*" for any value.

Returns

IList<HtmlNode>

List of matching HTML nodes.

ReturnTagsWithAttrRek2(HtmlNode, string, string, string)

Returns all tags with the specified name and attribute value, recursively searching the node tree. Originally from HtmlDocument.

public static IList<HtmlNode> ReturnTagsWithAttrRek2(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for.

attributeName string

The attribute name to match.

attributeValue string

The attribute value to match.

Returns

IList<HtmlNode>

List of matching HTML nodes.

ReturnTagsWithContainsAttrRek(HtmlNode, string, string, string)

Returns all tags whose attribute value contains the specified text, recursively searching the node tree. Supports the wildcard "*" as the tag name to match all tags.

public static IList<HtmlNode> ReturnTagsWithContainsAttrRek(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for, or "*" for all tags.

attributeName string

The attribute name to check.

attributeValue string

The attribute value to search for.

Returns

IList<HtmlNode>

List of matching HTML nodes.

ReturnTagsWithContainsAttrRek(HtmlNode, string, string, string, bool, bool)

Returns all tags whose attribute value matches the specified criteria, recursively searching the node tree.

public static IList<HtmlNode> ReturnTagsWithContainsAttrRek(HtmlNode htmlNode, string tagName, string attributeName, string attributeValue, bool isContains, bool isRecursively)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for.

attributeName string

The attribute name to check.

attributeValue string

The attribute value to search for.

isContains bool

Whether to use Contains instead of exact match.

isRecursively bool

Whether to search recursively.

Returns

IList<HtmlNode>

List of matching HTML nodes.

ReturnTagsWithContainsClassRek(HtmlNode, string, string)

Returns all tags whose class attribute contains the specified class name, recursively searching the node tree. Supports the wildcard "*" as the tag name to match all tags.

public static IList<HtmlNode> ReturnTagsWithContainsClassRek(HtmlNode htmlNode, string tagName, string className)

Parameters

htmlNode HtmlNode

The HTML node to search in.

tagName string

The tag name to search for, or "*" for all tags.

className string

The class name to search for.

Returns

IList<HtmlNode>

List of matching HTML nodes.

StripAllTags(string)

Strips all HTML tags from text, replacing each with a single space.

public static string StripAllTags(string text)

Parameters

text string

The text to strip tags from.

Returns

string

Text without HTML tags.

StripAllTags(string, string)

Strips all HTML tags from text, replacing each with the specified replacement string.

public static string StripAllTags(string text, string replacement)

Parameters

text string

The text to strip tags from.

replacement string

The string to replace tags with.

Returns

string

Text without HTML tags.
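
Tag stripping with a configurable replacement can be sketched with a single regular expression. The pattern and the class name StripTagsSketch are assumptions; the sketch only illustrates the documented effect of replacing each tag with the given string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch of the documented behavior.
public static class StripTagsSketch
{
    public static string StripAllTags(string text, string replacement)
    {
        // A MatchEvaluator is used so that "$" in the replacement is taken literally
        // rather than as a regex substitution pattern.
        return Regex.Replace(text, @"<[^>]+>", _ => replacement);
    }
}
```

Replacing with a space rather than an empty string keeps words from adjacent elements apart, for example in "<li>one</li><li>two</li>".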

StripAllTagsList(string)

Strips all HTML tags from text and returns the individual words as a list. Use RemoveAllNodes when the inner HTML must be removed as well.

public static IList<string> StripAllTagsList(string text)

Parameters

text string

The HTML text to process.

Returns

IList<string>

List of words without HTML tags.

StripAllTagsSpace(string)

Strips all HTML tags from text, replacing every tag <*> with a space. Inner non-XML content is left as is.

public static string StripAllTagsSpace(string text)

Parameters

text string

The text to strip tags from.

Returns

string

Text without HTML tags.

ToXml(string)

Converts HTML to XML format and removes the XML declaration. Already calls RemoveXmlDeclaration and ReplaceHtmlNonPairTagsWithXmlValid.

public static string ToXml(string xml)

Parameters

xml string

The HTML content to convert.

Returns

string

XML-formatted content without declaration.

ToXml(string, bool)

Converts HTML to XML format, optionally removing the XML declaration. Already calls ReplaceHtmlNonPairTagsWithXmlValid.

public static string ToXml(string xml, bool isRemoveXmlDeclaration)

Parameters

xml string

The HTML content to convert.

isRemoveXmlDeclaration bool

Whether to remove the XML declaration.

Returns

string

XML-formatted content.

ToXmlFinal(string)

Converts HTML to the final XML format by replacing non-pair tags with XML-valid versions and removing XML declarations.

public static string ToXmlFinal(string xml)

Parameters

xml string

The HTML/XML content to convert.

Returns

string

XML with UTF-8 declaration and valid non-pair tags.

TrimNode(HtmlNode)

Trims whitespace from an HTML node's inner content.

public static HtmlNode TrimNode(HtmlNode htmlNode)

Parameters

htmlNode HtmlNode

The HTML node to trim.

Returns

HtmlNode

The trimmed HTML node.

TrimOpenAndEndTags(string, string)

Removes the opening and closing tags of the specified name from an HTML string.

public static string TrimOpenAndEndTags(string html, string nameOfTag)

Parameters

html string

The HTML string.

nameOfTag string

The tag name to remove.

Returns

string

HTML without specified opening and closing tags.

TrimTexts(HtmlNodeCollection)

Trims whitespace from all HTML nodes in a collection.

public static IList<HtmlNode> TrimTexts(HtmlNodeCollection htmlNodeCollection)

Parameters

htmlNodeCollection HtmlNodeCollection

The HTML node collection to trim.

Returns

IList<HtmlNode>

List of trimmed HTML nodes.

TrimTexts(IList<HtmlNode>)

Trims whitespace from all HTML nodes in a list, removing text nodes.

public static IList<HtmlNode> TrimTexts(IList<HtmlNode> nodes)

Parameters

nodes IList<HtmlNode>

The list of HTML nodes to trim.

Returns

IList<HtmlNode>

List of trimmed HTML nodes without text nodes.

TrimTexts(IList<HtmlNode>, bool, bool)

Trims whitespace from all HTML nodes in a list, optionally removing text nodes and comments.

public static IList<HtmlNode> TrimTexts(IList<HtmlNode> nodes, bool isRemoveTextNodes, bool isRemoveComments = false)

Parameters

nodes IList<HtmlNode>

The list of HTML nodes to trim.

isRemoveTextNodes bool

Whether to remove text nodes.

isRemoveComments bool

Whether to remove comments.

Returns

IList<HtmlNode>

List of trimmed HTML nodes.