
🚧 Remove non-printable characters from a string in Go


When working with external files or user input, it is often a good idea to remove invisible characters that can cause problems. These characters are “non-printable”: they do not render as a visible glyph, and they fall under the Other (C) or Separator (Z) categories of the Unicode standard. Examples of non-printable characters are:

  • Whitespace characters (except the ASCII space character)
  • Tabs
  • Line breaks
  • Carriage returns
  • Control characters

To remove non-printable characters from a string in Go, iterate over the string rune by rune and check whether each rune is printable using the unicode.IsPrint() function. If the rune is printable, append it to the new string; otherwise, skip it.
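The manual loop described above could be sketched roughly as follows. The helper name removeNonPrintable is hypothetical, and using strings.Builder to assemble the result is one reasonable choice, not the only one.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonPrintable is a hypothetical helper (not part of the standard
// library) that keeps only the runes for which unicode.IsPrint is true.
func removeNonPrintable(s string) string {
	var b strings.Builder
	b.Grow(len(s)) // the result is never longer than the input

	// Ranging over a string yields runes, not bytes.
	for _, r := range s {
		if unicode.IsPrint(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(removeNonPrintable("b\u00a0e\u200bhind\n")) // prints "behind"
}
```

Ranging over the string instead of indexing it matters here, because indexing returns single bytes and would split multi-byte characters such as U+00A0.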

Instead of iterating and manually building a new string in a for loop, you can use strings.Map(), which returns a copy of the string with every character replaced according to a mapping function. The best part is that a character is dropped if the mapping function returns a negative value for its rune. So, we can return -1 for a non-printable character and the unmodified rune when unicode.IsPrint() returns true. See the following example:

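package main

import (
    "fmt"
    "strings"
    "unicode"
)

func main() {
    // "behind" with a no-break space (U+00A0), a zero-width space (U+200B),
    // and a trailing line feed mixed in.
    text := "b\u00a0e\u200bhind\n"

    fmt.Println(text)
    fmt.Println(len(text)) // length in bytes, not runes
    fmt.Println("---")

    // strings.Map drops every rune for which the mapping function
    // returns a negative value.
    text = strings.Map(func(r rune) rune {
        if unicode.IsPrint(r) {
            return r
        }
        return -1
    }, text)

    fmt.Println(text)
    fmt.Println(len(text))
}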

Output

b e​hind

12
---
behind
6

The unicode.IsPrint() function returns true for:

  • letters
  • marks
  • numbers
  • punctuation
  • symbols
  • the ASCII space character
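To make the list above concrete, here is a small sketch that classifies a few sample runes with unicode.IsPrint(). The classify helper and the choice of sample runes are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify is a hypothetical helper that labels a rune according to
// the result of unicode.IsPrint.
func classify(r rune) string {
	if unicode.IsPrint(r) {
		return "printable"
	}
	return "non-printable"
}

func main() {
	// A letter, a number, punctuation, a symbol, and the ASCII space,
	// followed by a tab, a line feed, a no-break space, and a
	// zero-width space.
	samples := []rune{'a', '7', '!', '€', ' ', '\t', '\n', '\u00a0', '\u200b'}
	for _, r := range samples {
		fmt.Printf("%U: %s\n", r, classify(r))
	}
}
```

The first five runes are reported as printable and the last four as non-printable.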

There is also the unicode.IsGraphic() function, which works almost the same way, except that it also returns true for every space character in the Zs category of the Unicode standard, such as the no-break space (U+00A0).
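The difference between the two functions can be seen by filtering the same input with each of them. This is a minimal sketch; the helper names keepOnly, stripNonPrint, and stripNonGraphic are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepOnly is a hypothetical helper that drops every rune for which
// keep returns false.
func keepOnly(keep func(rune) bool, s string) string {
	return strings.Map(func(r rune) rune {
		if keep(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

// stripNonPrint keeps only runes accepted by unicode.IsPrint.
func stripNonPrint(s string) string { return keepOnly(unicode.IsPrint, s) }

// stripNonGraphic keeps only runes accepted by unicode.IsGraphic.
func stripNonGraphic(s string) string { return keepOnly(unicode.IsGraphic, s) }

func main() {
	text := "b\u00a0e\u200bhind\n"

	// IsPrint drops the no-break space, the zero-width space, and the newline.
	fmt.Println(len(stripNonPrint(text)))

	// IsGraphic keeps the no-break space (category Zs) but still drops the
	// zero-width space (category Cf) and the newline (category Cc).
	fmt.Println(len(stripNonGraphic(text)))
}
```

Note that the zero-width space is removed in both cases, because it belongs to the Cf (format) category rather than Zs.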