Does anyone know how to strip Word or other word processing formatting from an HTML text area? I can use the Replace function for each ASCII character but there has to be an easier way.
Special Character Formatting
-
Whenever I copy text from an HTML page or similar, I usually paste it into Notepad and then copy it to the clipboard again from there.
This strips the unwanted formatting for me.
Hope this helps.
-
I actually do the same thing JoeP does... seems the quickest way to me without stripping stuff you don't want.
Unless you need to replace the stuff when retrieving it from a file dynamically - in which case you will want to use the Replace() function. If the latter is the case, I agree with Dave - have any examples?
Here's the basic idea though:
Code:
myString = Replace(Replace(Replace(Replace(myString,"<",""),">",""),Chr(34),""),Chr(39),"")
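For anyone who needs to extend that list, the same chain of Replace() calls could be wrapped in a small function, roughly like the sketch below. The function name StripMarkupChars and the array of unwanted characters are just illustrative choices, not anything built in:

```vbscript
' Sketch only: StripMarkupChars is a made-up helper name.
' Removes angle brackets, double quotes (Chr(34)) and single quotes (Chr(39)).
Function StripMarkupChars(ByVal s)
    Dim unwanted, i
    unwanted = Array("<", ">", Chr(34), Chr(39))
    For i = 0 To UBound(unwanted)
        ' Replace() swaps every occurrence of the character with an empty string.
        s = Replace(s, unwanted(i), "")
    Next
    StripMarkupChars = s
End Function
```

Adding another character to strip is then a matter of appending it to the array rather than nesting one more Replace().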
Last edited by whammy; Jun 17, 2002, 07:24 PM.
Former ASP Forum Moderator - I'm back!
If you can teach yourself how to learn, you can learn anything. ;)
-
Special Characters
Trying to strip Word formatting from a cut and paste before it gets to the database. The formatting could be tabs, symbols, international alphabet characters, etc. Anything that could be cut and pasted into a text area from a word processor.
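If the aim is to keep the text readable rather than just delete things, one possible approach is to swap the most common Word characters for plain equivalents before saving. The sketch below is untested against a real database and assumes the posted text reaches the script as Unicode, which depends on how the page's code page is set up; the name NormalizeWordChars and the particular list of characters are my own picks:

```vbscript
' Sketch only: NormalizeWordChars is a made-up helper name.
' Assumes the text arrives as Unicode, so ChrW() code points match.
Function NormalizeWordChars(ByVal s)
    s = Replace(s, ChrW(8216), "'")      ' left single curly quote
    s = Replace(s, ChrW(8217), "'")      ' right single curly quote / apostrophe
    s = Replace(s, ChrW(8220), Chr(34))  ' left double curly quote
    s = Replace(s, ChrW(8221), Chr(34))  ' right double curly quote
    s = Replace(s, ChrW(8211), "-")      ' en dash
    s = Replace(s, ChrW(8212), "-")      ' em dash
    s = Replace(s, ChrW(8230), "...")    ' ellipsis character
    s = Replace(s, vbTab, " ")           ' tabs become single spaces
    NormalizeWordChars = s
End Function
```

Anything not in the list passes through unchanged, so international alphabet characters would still need a separate decision about whether to keep or reject them.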
-
Instead of trying to identify all the unwanted characters, as in the Replace() example above, identify the ones you do want. It's much easier to define what you want than to define every possible character that you don't want.
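As a rough illustration of that idea, a regular expression with a negated character class can delete everything that is not on the wanted list. This is only a sketch: the name KeepAllowedChars is made up, and the allowed set here (letters, digits, whitespace, period and comma) is an assumption you would adjust to your own data:

```vbscript
' Sketch only: KeepAllowedChars is a made-up helper name.
' Deletes every character that is NOT a letter, digit, whitespace, period or comma.
Function KeepAllowedChars(ByVal s)
    Dim re
    Set re = New RegExp
    ' The leading ^ inside the brackets negates the class.
    re.Pattern = "[^A-Za-z0-9\s.,]"
    re.Global = True   ' replace all matches, not just the first
    KeepAllowedChars = re.Replace(s, "")
End Function
```

Widening what is accepted means adding characters inside the brackets, so the list of wanted characters lives in one place.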
-
Yeah... maybe instead of using Replace(), you could also just use a regular expression that contains the characters that are acceptable to you, and match the whole string against that, like:
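Set myRegExp = New RegExp
With myRegExp
    ' Anchored so the whole string must consist of word characters and whitespace;
    ' add any other acceptable characters inside the brackets.
    .Pattern = "^[\w\s]*$"
    .IgnoreCase = True
    .Global = True
End With
If myRegExp.Test(MyString) = False Then
    myStringError = True
End If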
I haven't tested that...
Or, using another method (not NEARLY as elegant), you could make a string of characters that are acceptable, like:
myAcceptableCharacters = ".|,|A|B|C|D|"
etc...
And loop through the string you're checking to see if the current character is in the string (say using a variable like CurrentCharacter), like:
If InStr(myAcceptableCharacters, CurrentCharacter) = 0 Then MyError = True
Last edited by whammy; Jun 21, 2002, 07:32 PM.
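Spelled out in full, the loop could look something like the sketch below. The function name HasOnlyAllowedChars and its two parameters are made up for illustration; it walks the string with Mid() and relies on InStr() returning 0 when a character is not found:

```vbscript
' Sketch only: HasOnlyAllowedChars is a made-up helper name.
' Returns True when every character of s appears somewhere in allowed.
Function HasOnlyAllowedChars(ByVal s, ByVal allowed)
    Dim i
    HasOnlyAllowedChars = True
    For i = 1 To Len(s)
        ' InStr returns 0 when the character is not found.
        If InStr(1, allowed, Mid(s, i, 1), vbBinaryCompare) = 0 Then
            HasOnlyAllowedChars = False
            Exit Function
        End If
    Next
End Function
```

Called as HasOnlyAllowedChars(myString, ".,ABCD"), it needs no "|" separators in the allowed string, because each check looks for a single character. With separators, "|" itself would count as acceptable.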
-
Heh... that's definitely typical "Word" HTML formatting. YECH.
HTML TIDY (or the plugin HTML TIDY that comes with HTML KIT) claims to strip all of the "Word" formatting from a WORD-->HTML page, but from what I've seen it strips almost everything, lol.
I'm not sure how to overcome the obstacle of someone potentially pasting "Word" characters in a textarea, without using a regular expression or function of some sort.
If you're not comfortable using regular expressions, you might be better off letting them know they need to paste from Notepad?