Article
· Feb 24 2m read

The bewitched line terminator

I want to address the nasty problems you can hit when reading flat text files in ASCII or UTF*,
explicitly excluding HTML, EBCDIC, and other encodings.
According to Wikipedia, there are at least 8 variations of line-terminating control characters. The three most common are:

  • CR+LF is typical for Windows
  • LF is typical for the Linux/UNIX world
  • CR was the favorite of classic Mac OS (up to version 9); today's macOS uses LF

As you can deduce from the names (Carriage Return, Line Feed), the inspiration comes from mechanical typewriters.

In IRIS*, just as in Caché or Ensemble or ..., the classes %Stream* and %File* offer
the same property with the same default:

• property LineTerminator as %String(MAXLEN=10) [ InitialExpression = $get(^%SYS("Stream","LineTerminator"),$select($$$isUNIX:$char(10),1:$char(13,10))),Transient ];

This is quite comfortable as long as your file comes from the same OS you work on.
It becomes trickier when the file comes from a different OS and you don't set the LineTerminator to match.

  • reading text from Linux/Unix (LF) in Windows (CRLF) is no issue
  • but Windows (CRLF) to Linux/Unix (LF) leaves the final CR at the end of your line
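
The leftover (CR) is easy to reproduce outside IRIS. The following is a minimal shell sketch, not IRIS code: it splits a CRLF-terminated sample at (LF), as a Linux reader would, and compares each raw line with a cleaned copy. The helper name `strip_cr` and the sample text are assumptions made up for illustration.

```shell
# Hold a single CR in a variable (command substitution only strips trailing LF).
CR=$(printf '\r')

# Hypothetical helper: remove one trailing CR from a line already split at LF.
strip_cr() {
  printf '%s' "${1%"$CR"}"
}

# Split a CRLF-terminated sample at LF and compare raw vs. cleaned length.
printf 'first\r\nsecond\r\n' | while IFS= read -r line; do
  clean=$(strip_cr "$line")
  printf '%s: raw=%s clean=%s\n' "$clean" "${#line}" "${#clean}"
done
```

Each raw line comes out one character longer than its cleaned copy. That extra character is the invisible (CR), and it is what distorts comparisons or text analysis later on.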

How can this happen?

  • If you receive your file over the network from somewhere
  • If you work with Docker Desktop on Windows and use a mapped directory
    • the default outside the container is (CRLF), while inside it is (LF) only
    • I ran into this issue recently, and it resulted in wrong text analysis.

A different hidden hurdle with Docker containers in this constellation is bash.

Inside your container, you are on Ubuntu and the command shell is bash.
Using volume mapping, you have full access to an (outside) directory in Windows.
A shell script stored there might use (CRLF) as its line terminator.

bash doesn't like this at all.
It expects plain (LF), and the remaining (CR) is treated as part of the command line, which causes all kinds of nonsense.

Example:
cp -v /home/irisowner/dev/somefile.yml /usr/irissys/mgr/

  • works as expected in a Linux-terminated shell script
  • in a script with Windows terminators, the trailing (CR) becomes part of the target, so you end up with a bogus file name displayed as '$'\r'
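
Before running a script from a mapped Windows directory, you can check it for (CR) and write a cleaned copy. This is a minimal sketch under assumptions: the helper names `has_cr` and `to_lf` and the temporary demo files are made up for illustration. Note that `tr -d '\r'` removes every (CR) in the file, not only those at line ends, which is normally what you want for a shell script.

```shell
# Hypothetical helper: succeed if the file contains at least one CR.
has_cr() {
  grep -q "$(printf '\r')" "$1"
}

# Hypothetical helper: write a copy of file $1 to $2 with every CR removed.
to_lf() {
  tr -d '\r' < "$1" > "$2"
}

# Demo with a throwaway script that uses Windows line terminators.
demo_dir=$(mktemp -d)
printf 'echo hello\r\n' > "$demo_dir/win.sh"

if has_cr "$demo_dir/win.sh"; then
  to_lf "$demo_dir/win.sh" "$demo_dir/unix.sh"
fi

# The cleaned copy runs without the stray CR.
sh "$demo_dir/unix.sh"
```

Writing to a separate file keeps the original untouched, so the Windows side of the mapped directory still sees the file the way its editor saved it.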

How do I know which line terminator I got? - Tutorial

  1. I open my file to Read with an Undefined record length
    • set file="/home/irisowner/dev/iknow.yaml"
    • open file:"RU":0 if '$TEST quit  ;open failed
  2. Then I make the file the current device and fetch a chunk most likely longer than a line
    • use file read line#5000
    • close file
  3. Next, we count (CR) and (LF); $Length with a delimiter returns the number of pieces, so we subtract 1
    • set cr=$Length(line,$char(13))-1
    • set lf=$Length(line,$char(10))-1
  4. Compare the two counts
    • the same number of (CR) as (LF) indicates a Windows text
    • more (LF) than (CR) indicates Linux/Unix-formatted text
    • more (CR) than (LF) uncovers a classic Mac-formatted text
  5. This is just a first check and doesn't protect you from exotic structures, such as mixed terminators or a chunk that ends between a (CR) and its (LF)
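
The same rule of thumb can be cross-checked from the shell inside the container. The sketch below is an assumption-laden illustration, not part of the ObjectScript tutorial: the helper names `count_byte` and `guess_eol` are hypothetical, and unlike the `read line#5000` approach it counts over the whole file.

```shell
# Hypothetical helper: count occurrences of one character (a tr escape) in a file.
count_byte() {
  tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

# Hypothetical helper: rough classification following the rule of thumb in step 4.
guess_eol() {
  cr=$(count_byte '\r' "$1")
  lf=$(count_byte '\n' "$1")
  if [ "$cr" -eq 0 ] && [ "$lf" -eq 0 ]; then
    echo "none"
  elif [ "$cr" -eq "$lf" ]; then
    echo "CRLF (Windows)"
  elif [ "$lf" -gt "$cr" ]; then
    echo "LF (Linux/Unix)"
  else
    echo "CR (classic Mac)"
  fi
}

# Demo on a throwaway file with Windows terminators.
sample=$(mktemp)
printf 'a\r\nb\r\n' > "$sample"
guess_eol "$sample"
```

Like the tutorial, this is only a first check: a file with mixed terminators is simply reported under whichever count dominates.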

I hope this helps you in some way.
I wasn't aware of this before, and the sensitivity of bash kept me busy for quite a while until I understood it.
