I have snippets of HTML stored in a table. Not entire pages, no tags or the like, just basic formatting.
I would like to be able to display that HTML as text only, no formatting, on a given page (actually just the first 30-50 characters, but that's the easy bit).
How do I place the "text" within that HTML into a string as straight text?
So this piece of code:
<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>
would become:
Hello World. Is there anyone out there?
If you are talking about tag stripping, it is relatively straightforward if you don't have to worry about things like <script> tags. If all you need to do is display the text without the tags, you can accomplish that with a regular expression.
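To make the regular-expression route concrete, here is a minimal sketch. It is written in Python only for illustration; the pattern itself should carry over to most regex engines. The helper name strip_tags is hypothetical, and the choice to replace each tag with a space (so that text on either side of a <br/> does not fuse) is an assumption about the desired output, not something the question specifies.

```python
import html
import re

# Anything that looks like a tag: "<", a run of characters other than ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(fragment: str) -> str:
    """Return the text of an HTML fragment with tags removed (hypothetical helper)."""
    # Swap every tag for a space so words separated only by markup stay apart.
    # Assumption: this also splits a word that has a tag in the middle of it.
    text = TAG_PATTERN.sub(" ", fragment)
    # Decode entities such as &amp; and collapse the leftover whitespace.
    return " ".join(html.unescape(text).split())


snippet = "<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>"
print(strip_tags(snippet))       # full text
print(strip_tags(snippet)[:30])  # preview of the first 30 characters
```

Slicing the result, as in the last line, covers the "first 30-50 characters" part of the question.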
If you do have to worry about <script> tags and the like, then you'll need something a bit more powerful than regular expressions because you need to track state, something more like a context-free grammar (CFG). Although you might be able to accomplish it with 'Left To Right' or non-greedy matching.
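As a rough illustration of what "tracking state" can look like, the sketch below uses the HTMLParser class from Python's standard library rather than a hand-written grammar. That is a swapped-in technique, not the approach this answer has in mind, and the names TextExtractor and html_to_text are hypothetical. The parser keeps a counter that is non-zero while it is inside a <script> or <style> element and ignores any text it sees in that state.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text, skipping the contents of <script> and <style> (hypothetical helper)."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse repeated whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()


print(html_to_text("<p>Hi</p><script>var x = '<b>not text</b>';</script><p>there</p>"))
```

A plain tag-stripping pattern would leave the JavaScript source in the output here, which is the case the state counter is meant to handle.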
If you can use regular expressions, there are many web pages out there with good info.
If you need the more complex behaviour of a CFG, I would suggest using a third-party tool. Unfortunately, I don't know of a good one to recommend.
var plainText = HtmlUtilities.ConvertToPlainText(html); // html is the HTML string to convert
Feed it an HTML string and you'll get a plain text result back.