Text Editors

General Info

This material is not required for any particular course, but is posted in case of questions about text editors. In other words, students can safely ignore it - but will potentially benefit from it long term.

Students in STEM (1) disciplines need to use text editors routinely. These notes are intended to help understand what kinds of editors are available, which ones to use, and what to learn about each.

Text editors edit plain text files, usually in ASCII or UTF-8 character sets. They handle common editing tasks such as cut, copy and paste. Font choices and display formatting are set by the console or window being used, and cannot be changed within the document (although syntax highlighting is commonly provided to help see structure when the document is a computer program or has other amenable formatting, such as HTML). Multi-platform examples include emacs, vim (2), nano, Sublime Text, and gedit. Microsoft's Notepad is a free text editor provided on Windows systems, but is artificially limited in capability. From MS:

    Text files used by Notepad should be no larger than 45K. Notepad 
    cannot open a file that exceeds 54 kilobytes (K) in size and does 
    not allow you to continue editing a file if the file size reaches
    between 45K and 54K. 
Translation: Microsoft wants you to buy something to edit text, and expends effort to make Notepad awkward and unusable for all but the simplest of cases. Actually, they want you to buy something from Microsoft, preferably something that requires paying up every year. MS's VS Code is slightly better in that it is multi-platform, but it is targeted at software development rather than more general text. VS Code also relies on Node.js, which can be a bloated system itself. More deadly, VS Code collects usage data and sends it to Microsoft by default. According to MS, "The data is shared among Microsoft-controlled affiliates and subsidiaries and with law enforcement", which should send a shiver down your spine (6).


Streaming text editors

Stream editors allow non-interactive editing, and are particularly useful for files larger than can be handled by user-interaction oriented text editors. sed, awk, tr, and other gag-reflex sounding tools are stream editors. If you need to change a single line of text in the middle of a 150 Gbyte ASCII data file, a stream editor is the only practical tool to use.
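
To make the idea concrete, here is a minimal Python sketch of what a stream editor does: it reads one line at a time, changes what needs changing, and writes the result out, so the whole file never has to fit in memory. The function name replace_line and its arguments are my own invention for illustration, not part of sed or any other tool.

```python
def replace_line(src_path, dst_path, line_number, new_text):
    """Copy src_path to dst_path, replacing one line (counted from 1)."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        for number, line in enumerate(src, start=1):
            if number == line_number:
                line = new_text + "\n"
            dst.write(line)  # only one line is held in memory at a time
```

Writing to a second file leaves the original untouched, at the cost of needing enough disk space for both copies.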

An example of a sed command is

    sed 's/[ \t]*$//'
which removes all trailing whitespace (spaces, tabs) from the end of each line in a file. Or a more arcane example is:
    sed -e :a -e 's/^.\{1,79\}$/ &/;ta'
which will right-justify each line in a file on column number 80. Obviously, that sed command is so clear that it requires no explanation. Less sarcastically, although they are powerful tools, only wizards and fools use sed or awk on a file without first making a backup copy in case something goes wrong.
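
For those who are neither wizards nor fools, the two sed commands above can be spelled out in Python. This is only a sketch of their effect on a single line (without its newline), and the function names are mine. The second sed command loops, adding one leading space at a time to any line of 1 to 79 characters until it is 80 characters long.

```python
import re


def strip_trailing_whitespace(line):
    # Same pattern as the first sed command: spaces and tabs before end of line.
    return re.sub(r"[ \t]*$", "", line)


def right_justify(line, width=80):
    # The sed pattern needs at least one character, so empty lines stay empty.
    if not line:
        return line
    # Lines already at or beyond `width` are returned unchanged, as in sed.
    return line.rjust(width)
```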

Binary editors

A binary or hex editor is for handling binary files which cannot be interpreted as text files (and so are not really text editors). They can be used to change executable files or binary formatted ones like image files or compiled objects. They are complicated, cryptic, powerful, unforgiving, and dangerous (4), which means computer scientists love them. For simply examining binary files without making changes, the Unix programs od, strings, and nm often suffice, e.g., to find in which library file a symbol is defined. Binary/hex editors include hexedit, beav, bvi, and bless. Those allow manipulating arbitrary binary files; specialized formats like NetCDF (binary) or JSON (structured text) have tools like nco and jq which are faster and allow better interpretation of the data in the files.

od (for octal dump) will let you examine binary files as octets. Using od -c filename will try to interpret the octets as characters. The command od --strings=2 filename goes further, and only prints out things that have at least 2 consecutive bytes that can be interpreted as ASCII. Try it on an object file from a small function to see what it generates.
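
The idea behind pulling strings out of a binary file can be sketched in a few lines of Python. This is an illustration of the technique only, not a reproduction of od's exact output; the function name printable_strings is made up for the example.

```python
import re


def printable_strings(data, min_length=2):
    """Return runs of at least min_length printable ASCII bytes found in data."""
    # Bytes 0x20 (space) through 0x7e (~) are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [run.decode("ascii") for run in pattern.findall(data)]
```

To try it on a real file, read the file in binary mode, e.g. open(filename, "rb").read(), and pass the result in.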

Some files are neither fish nor fowl; they might have an ASCII text header followed by binary data. FITS is an example of this, where the usual instructions tell you to create a terminal window exactly 80 characters wide and then view only the first x bytes of the file to read its header. For that, you can use od -c -N x to only display the first x bytes.
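
If resizing a terminal window feels a little too artisanal, the same trick can be sketched in Python: read only the first x bytes and chop them into fixed-width records. This assumes, as the 80-column advice suggests, that the header is laid out in 80-character records; the function name header_cards is hypothetical.

```python
def header_cards(path, num_bytes, card_width=80):
    """Read the first num_bytes bytes of a file and split them into records."""
    with open(path, "rb") as handle:
        head = handle.read(num_bytes)  # never touches the binary data beyond
    # Any non-ASCII byte becomes a single replacement character, so widths hold.
    text = head.decode("ascii", errors="replace")
    return [text[i:i + card_width] for i in range(0, len(text), card_width)]
```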


Word processors

Word processors edit text but include formatting, font and color specification, and layout tools. Almost always they are WYSIWYG tools. Because they include hidden formatting instructions and non-standard characters, they are less useful for programming. Tools like spell and grammar checking are of little use in programming. While auto-correct errors are amusing in text messages, they are deadly in computer codes.


Text editors for coding

For writing code, CS people should always learn and use a programmable text editor. The popularity of emacs is driven in part by the fact that it can be programmed using LISP. In any case, minimal requirements for a coder's text editor include regular expressions, search-and-replace, syntax highlighting, and automatic completion.

Regular expressions should be rich enough to allow common tasks like removing all trailing whitespace from lines of code, replacing tabs with spaces, or changing the case of some words. Consider the task of substituting vi with VI everywhere in this page, but without changing the word navigating into naVIgating at the same time (7). Regular expressions combined with search-and-replace allow you to specify that in just a few keystrokes. "Regular expressions" occur in at least two major varieties and many sub-species, so beware that one which works in a Perl script may bomb disastrously in a Vim script.
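
As a sketch of the vi-to-VI task, here it is in Python's flavor of regular expressions, where \b marks a word boundary. The sample sentence is made up for the example.

```python
import re

text = "Use vi for navigating; vi users like vim."

# \b matches only at a word boundary, so the "vi" inside "navigating"
# and at the start of "vim" is left alone.
result = re.sub(r"\bvi\b", "VI", text)
print(result)  # Use VI for navigating; VI users like vim.
```

The sub-species problem shows up immediately: in Vim the same substitution is written :%s/\<vi\>/VI/g, because Vim spells its word boundaries \< and \> rather than \b.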

Syntax highlighting is the coder's equivalent of a spell check, making it easy to spot common typos and misuse of a language's keywords. At the same time, spell checking is of limited use in coding and most people turn it off if the editor provides it. Automatic completion saves typing, and an adroit choice of variable and function naming can allow you to type just two characters and a tab to spell out a long variable name.

Desirable features include

Some programming languages like Java and Matlab have Integrated Development Environments (IDEs) which include a text editor with all those features. That is fine if all of your work is in that one language, but means learning yet another programming text editor otherwise. A flexible, programmable text editor will provide almost everything an IDE does, but without having to learn a different tool for every language.

An important feature to use with any programming text editor is snippets, which sounds like what a hyperactive foul-tempered Chihuahua does, but instead refers to short stubs for common idioms in a given programming language (or in HTML, or a romance novel). These save typing and can help force discipline where needed. As an example: in Matlab, a plot is usually opened in a new figure and should always be matched by a title, an xlabel, and a ylabel. In my editor of choice, typing the word "plot" followed by a backquote will insert the text

    figure
    plot()
    xlabel('')
    ylabel('')
    title('')
and position the insertion point between the parentheses on the line with plot(). So typing 5 characters inserts 46 characters, and it forces me to not forget the required annotations that any plot should have.

Snippet capabilities can be more sophisticated, e.g., the Matlab plot() function commonly takes a triplet: the x-coordinates, the y-coordinates, and a string that specifies line and/or marker types and colors. So a snippet might insert

    plot( , , )
with the cursor positioned just before the first comma. Typing in that insertion point and hitting tab moves you to the next insertion point, just before the second comma. In this way all of the fields can be filled, just using tab to move to each one in turn.
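
How an editor might keep track of those insertion points can be sketched in Python. The $1, $2, $3 marker syntax and the function name expand_snippet are hypothetical, chosen only for this illustration; real editors each have their own snippet format.

```python
import re

# Hypothetical marker syntax: $1, $2, ... mark the tab stops in a template.
TAB_STOP = re.compile(r"\$(\d+)")


def expand_snippet(template):
    """Return the text to insert and the cursor offset of each tab stop."""
    parts = []
    stops = {}
    position = 0   # length of the output text built so far
    last_end = 0   # where the previous marker ended in the template
    for match in TAB_STOP.finditer(template):
        literal = template[last_end:match.start()]
        parts.append(literal)
        position += len(literal)
        stops[int(match.group(1))] = position  # cursor lands here
        last_end = match.end()
    parts.append(template[last_end:])
    return "".join(parts), [stops[n] for n in sorted(stops)]


text, stops = expand_snippet("plot( $1, $2, $3)")
print(text)   # plot( , , )
print(stops)  # [6, 8, 10]
```

The first two offsets sit just before the two commas, and the third just before the closing parenthesis, so each press of tab only has to move the cursor to the next offset in the list.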


Recommended types of editors

The overwhelming majority of time spent coding goes to

  • navigating an existing document looking for a particular part, e.g., where the results of a computation are printed out to a file
  • copying lines or even blocks of code to paste into another location
  • commenting out or uncommenting parts of code, typically by inserting or removing a comment marker in front of the relevant lines
  • making minor modifications to (parts of) lines

Almost no one ever sits down and types in a new code from scratch; instead you start with some existing code and modify it, or start from a template and then you just fill in what you need. That is one reason snippets are so powerful in coding. Your text editor should allow you to do those common tasks rapidly and easily. If you find yourself hitting a down-arrow key repeatedly to get to someplace in a code, you are either using the wrong text editor, or have not learned how to use the one you have effectively.

Computer scientists should

  1. master at least one high-capability, programmable editor
  2. know a lightweight console editor (nano, jed, ed, ...)
  3. be able to use at least one stream editor
  4. be able to use at least one hex/binary editor

The intent is to be able to handle common situations rapidly and efficiently, and to be able to get by on low-end systems or in unusual circumstances. For example, getting a Unix OS installed and running on a low-end system generally means performing many tasks before your favorite high-level editor like xemacs can be installed.

In addition, in most sciences students benefit from knowing

  5. LaTeX for mathematical and technical documents
  6. MS Word or OpenOffice or LibreOffice Writer for other purposes such as presentations. However, beware that those word-processor based systems will require symbols and fonts that may not be installed everywhere or by default. So you might end up submitting a grant proposal in Egyptian hieroglyphics when it is printed, or with a presentation that looks like Sumerian graffiti (3).

STEM students who are not in computer science should learn 1, 2, 6, and if in an area that involves equations or math symbols, 5. Although some text processors have improved greatly in the past few years, no system can match TeX or LaTeX for displaying math equations and symbols. LaTeX also does an outstanding job even for non-mathematical text, but can take a long time to master.


Recommended high-end programming editors

Although I personally use vim, that's mostly because I'm older than dirt. CS and other STEM students probably should learn emacs; it's the editor that other people under the age of 60 are most likely to know and be able to help you with. Used over a period of years, vim is more efficient if you know touch-typing. It is particularly good at the tasks of navigating rapidly and easily, and intelligently cutting and pasting code. But vim has an infamously steep learning curve and its programmable extensions are not in a standard programming language, the way emacs uses LISP. Most IDEs have "emacs bindings", while fewer have vi bindings. vim and emacs essentially have the same capabilities; when one develops a new capability, it is quickly added to the other.

Over half of my Ph.D. students switched over to use vim after seeing me use it over time, but that's partly from seeing the speed of someone who has been a touch typist for 50 years and a vi user for 40 years. vim takes many fewer keystrokes in general for navigating documents (paging up/down, searching text, finding features), but it comes with the extra complexity of multi-modal editing (vi has normal, insert, command-line modes, and vim adds on visual, select, and recording modes).

The TextMate editor on Macs is an excellent high-capability editor, and Sublime Text seems to be a cross-platform counterpart. One problem is that TextMate is a GUI-based editor, which means either using the mouse extensively, or mastering keyboard shortcuts to achieve the same. Sublime Text is not open-source software, which can mean you have professional support, but can also mean that after x years of practice and use the providers go out of business and you have to learn another editor. This is important, because ...


A stern admonishment

No matter which programming (or other) editor you use, learn it thoroughly. Over the course of a 4-year undergrad degree it will save you weeks of work. Over a career of years it will save you months of time and effort. Trudging along typing in every character of even a small program will distract and delay you from the real work, which is thinking and creating. Watch an experienced user working with an editor to get an idea of just how much difference it can make. As a student, force yourself to search out and find a new facility in your chosen high-end editor (emacs/vim/TextMate/whatever) every weekend, then use it until it becomes second nature.

As a grad student I always spent Christmas Day (5) reading the manual on vi, and it always resulted in discovering a faster way of doing some task that I had wasted hours on during the previous year. Now go away, and read the manual on whatever text editor you are currently using.


Footnotes:

(1) STEM students = students in science, technology, engineering, and mathematics. It's considered a slightly less insulting term than "geek", which originally meant the old winos that circuses hired to eat live snakes, bite the heads off chickens and spray the crowd with their blood, and generally horrify and titillate gawping ignorant audiences. In other words, geeks filled a niche that Internet comment sections now amply occupy.

(2) vim is "vi improved", and is the implementation that everyone uses. I use the terms "vi" and "vim" interchangeably. vi is provided on all UNIX platforms. vim can be run in "compatible mode", which means it behaves like vi did in 1978. But why would anyone want to do that?

(3) The technical term for those arcane rectangles that appear when a glyph is not found is .notdef, for "not defined". More commonly, they are called tofu since they resemble the shape of the spackling paste often mistakenly sold as a foodstuff. Google has a project called "Noto" for "no tofu", to provide a glyph for all possible Unicode scripts. You still get something incomprehensible unless you know all 93 scripts Noto supports, but at least they don't all look like white domino pieces.

(4) Typical criterion for such editors: how much damage can you do to your computer hardware and software by just typing your name in the editor? vi almost always won this contest.

(5) Otherwise I would spend Christmas Day shouting "Bah! Humbug!" at everyone.

(6) A shiver not induced by the part about cops, but instead the part about "affiliates", which doesn't specify whether or not they have cloven hooves or forked tails. Nor does MS specify what that "usage" data consists of, which might be everything you write with the editor.

(7) From a story related by Steve Oualline: A church had just bought their first computer and were learning how to use it. The church secretary decided to set up a form letter to be used in a funeral service. Where the person's name was to be she put in the word "<name>". When a funeral occurred she would change this word to the actual name of the departed. One day, there were two funerals, first for a lady named Mary, then later one for someone named Edna. So the secretary used global replace to change "<name>" to "Mary." So far so good. Next she generated the service for the second funeral by changing the word "Mary" to "Edna." That was a mistake. Imagine the Minister's surprise when he started reading the part containing the Apostles' Creed and saw, "Born of the Virgin Edna."


Edits to this page:

  • Started: Fri 22 Aug 2014, 11:04 AM
  • Modified: Mon 09 Jan 2017, 11:07 AM correcting typos
  • Modified: Fri 21 Jul 2017, 06:03 PM to make it course-agnostic
  • Last Modified: Mon 11 Nov 2019, 03:55 PM