I'm extracting text from HTML (more on how here), and the extracted text has two problems:
- Lots of $c(10) control characters
- Runs of multiple consecutive whitespace characters
Here's an example of the text extracted from an HTML page:
InterSystems ObjectScript is a scripting language to operate with data using any data model of InterSystems Data Platform (Objects, Relational, Key-Value, Document, Globals) and to develop business logic for serverside applications on InterSystems Data Platform.
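To make the desired result concrete, here is a rough sketch of the cleanup in Python, not in ObjectScript. It assumes that $c(10) is the line feed character (code point 10), that every ASCII control character can safely become a space, and that a single space is the right replacement for any whitespace run. The function name `clean_text` and the sample string are made up for illustration.

```python
import re


def clean_text(text: str) -> str:
    """Replace control characters with spaces, then collapse whitespace runs."""
    # ASCII control characters are code points 0-31 and 127.
    # This range includes line feed, chr(10), and tab, chr(9).
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    # Collapse every run of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample that mimics the problems described above.
sample = "InterSystems ObjectScript\n\n   is a scripting \n language"
print(clean_text(sample))
```

Replacing control characters with a space before collapsing, rather than deleting them outright, keeps words from fusing together when a line feed was the only separator between them.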