Introduction
When working in the software industry, whether you are a software engineer, a data scientist, a support engineer, or in any other role, you probably need some basic Bash skills to improve your productivity. They are useful for automating complex tasks in your terminal, generating configuration or documentation files, sharing commands with your teammates, etc. In this post, we are going to explore some frequently used techniques for string operations.
This article was written on macOS 11.6 with the default Bash environment (more details are shown in the command output below):
$ bash -version && sw_vers
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin20)
Copyright (C) 2007 Free Software Foundation, Inc.
ProductName: macOS
ProductVersion: 11.6
BuildVersion: 20G165
After reading this article, you will understand:
- How to declare a variable
- How to remove a substring using shell parameter expansion
- How to replace a substring using shell parameter expansion
- How to test a string value using regular expressions in an if-statement
- How to manipulate streams using different commands
- How to go further from here
Now, let’s get started!
Declaring Variables
Declare a single-line variable. To declare a single-line variable, write the variable name, followed by an equal sign (=) for the assignment with no spaces around it, and end with the value of the variable, either a string written directly or the output of a command captured with command substitution ($(...)):
content="Hello Bash"
creation_date=$(date +"%Y-%m-%d") # e.g. 2021-12-04
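As a minimal sketch of the rules above, the snippet below shows how quoting affects the assigned value. The variable names (greeting, name, message, literal, length, upper) are made up for illustration, and ${#greeting} is the standard parameter expansion that returns the length of a string.

```shell
# Double quotes keep spaces and still expand variables.
greeting="Hello Bash"
name=World
message="$greeting, $name"

# Single quotes keep the text literally, without any expansion.
literal='$greeting, $name'

# ${#var} expands to the number of characters in the value.
length=${#greeting}

# Command substitution captures the output of a command.
upper=$(echo "$greeting" | tr '[:lower:]' '[:upper:]')

echo "$message"  # Hello Bash, World
echo "$literal"  # $greeting, $name
echo "$length"   # 10
echo "$upper"    # HELLO BASH
```

Quoting the variable when reading it ("$message") keeps the value as a single word, which matters as soon as it contains spaces.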
Declare a multi-line text. A heredoc is a special-purpose code block that tells the shell to read input from the current source until it encounters a line containing a delimiter. EOF (end of file) is a commonly used delimiter but it’s not mandatory. You can use JSON, YAML, TEXT, or any other delimiter that you think is relevant to your situation. The syntax for Heredoc in Bash is:
COMMAND << DELIMITER
Here is the long description
...
DELIMITER
Here is an example that generates YAML content: cat prints the heredoc inside a command substitution, variables and commands in the body are expanded, and the result is assigned to a variable:
content=$(cat << YAML
cluster:
type: $TYPE
name: elasticsearch-$TYPE
date: $(date +"%Y-%m-%d")
YAML
)
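The following sketch contrasts an unquoted delimiter, which expands variables in the body, with a quoted delimiter, which keeps the body literal. The value of TYPE is a hypothetical placeholder, and the two-space indentation in the body is only an assumption about how the YAML would be laid out.

```shell
TYPE="logs"  # hypothetical value, chosen only for this illustration

# Unquoted delimiter: $TYPE is expanded while the heredoc is read.
content=$(cat << YAML
cluster:
  type: $TYPE
  name: elasticsearch-$TYPE
YAML
)

# Quoted delimiter: the body is kept as-is, without expansion.
raw=$(cat << 'YAML'
type: $TYPE
YAML
)

# Quote the variable when printing so that line breaks are preserved.
echo "$content"
# cluster:
#   type: logs
#   name: elasticsearch-logs
echo "$raw"
# type: $TYPE
```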
Substring Removal
Removing a substring can be done inside a shell parameter expansion ${param}. Using the character # deletes a prefix of the string and using the character % deletes a suffix of the string. Using the same character once or twice deletes the shortest or the longest match respectively. To better remember this, look at the number row of a US-layout keyboard: # 3, $ 4, % 5. Keys 3/4/5 carry #/$/%. Think of $ as the variable itself: since # is before $ and % is after $, deletion with # happens from the front of the string, and deletion with % happens from the end of the string. Here are some concrete examples:
Deleting the shortest match from the front of a string:
d=2021-12-04
echo ${d#*-} # 12-04
Deleting the longest match from the front of a string:
d=2021-12-04
echo ${d##*-} # 04
Deleting the shortest match from the back of a string:
d=2021-12-04
echo ${d%-*} # 2021-12
Deleting the longest match from the back of a string:
d=2021-12-04
echo ${d%%-*} # 2021
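A common use of these four operators is splitting a file path. The sketch below applies them to a hypothetical path; the path and the variable names are made up for illustration.

```shell
path="/var/log/app/server.2021-12-04.log.gz"  # hypothetical path

# Longest match of "*/" from the front: everything up to the last slash.
file=${path##*/}
# Shortest match of "/*" from the back: the last slash and the file name.
dir=${path%/*}
# Longest match of "*." from the front: everything up to the last dot.
ext=${file##*.}
# Longest match of ".*" from the back: everything from the first dot.
stem=${file%%.*}
# Shortest match of ".*" from the back: only the last extension.
without_ext=${file%.*}

echo "$file"         # server.2021-12-04.log.gz
echo "$dir"          # /var/log/app
echo "$ext"          # gz
echo "$stem"         # server
echo "$without_ext"  # server.2021-12-04.log
```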
Substring Replacement
Replace the first match of $substring with $replacement:
${string/substring/replacement}
Replace all matches of $substring with $replacement:
${string//substring/replacement}
Examples:
d=2021-12-04
echo ${d/-/_} # 2021_12-04
d=2021-12-04
echo ${d//-/_} # 2021_12_04
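The sketch below shows a few more variations on the same syntax: replacing spaces, deleting every match by leaving the replacement empty, and taking the substring and replacement from variables. All variable names are illustrative.

```shell
title="Hello Bash World"
first=${title/ /-}   # only the first space is replaced
all=${title// /-}    # every space is replaced

d=2021-12-04
compact=${d//-/}     # empty replacement deletes every match

# The substring and the replacement can also come from variables.
old="-"
new="."
dotted=${d//$old/$new}

echo "$first"    # Hello-Bash World
echo "$all"      # Hello-Bash-World
echo "$compact"  # 20211204
echo "$dotted"   # 2021.12.04
```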
If Statement
Match a regular expression:
if [[ "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
Negate a regular expression match:
# negation inside the brackets
if [[ ! "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
# negation outside the brackets
if ! [[ "$date" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]
Check whether a string contains a keyword using == with a glob pattern, e.g. the word “2021”:
if [[ "$date" == *"2021"* ]]
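The conditions above are fragments; a complete statement also needs then and fi. Here is a minimal sketch of a full if-statement, assuming we also want the year, month, and day from the match. It stores the pattern in a variable (re is an illustrative name) and reads the capture groups from the built-in BASH_REMATCH array.

```shell
date="2021-12-04"

# Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'

if [[ "$date" =~ $re ]]; then
  # BASH_REMATCH[0] is the whole match, [1] and up are the capture groups.
  year=${BASH_REMATCH[1]}
  month=${BASH_REMATCH[2]}
  day=${BASH_REMATCH[3]}
  echo "valid: year=$year month=$month day=$day"
else
  echo "invalid date: $date"
fi
# valid: year=2021 month=12 day=04
```

Note that $re is left unquoted on the right-hand side of =~; quoting it would make Bash treat the pattern as a literal string.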
Stream
To manipulate streams (stdin, stdout, stderr) in Bash, you can use a pipe (|) followed by a command after your stream to filter, update, or collect information. This can be achieved using grep, sed, cut, xargs, or other commands. Here is the syntax:
my_stream | <command>
Here are some examples using “users.csv”:
$ cat users.csv
"User A","10","Paris"
"User B","20","London"
"User C","30","New York"
"User d","40","Toulouse"
Find users in Paris:
$ cat users.csv | grep Paris
"User A","10","Paris"
Remove quotes ("):
$ cat users.csv | sed 's/"//g'
User A,10,Paris
User B,20,London
User C,30,New York
User d,40,Toulouse
Remove users with an incorrect format, e.g. an ID written in lower case:
$ cat users.csv | sed '/User [[:lower:]]/d'
"User A","10","Paris"
"User B","20","London"
"User C","30","New York"
Cut column 2 and only keep columns 1 and 3:
$ cat users.csv | cut -f 1,3 -d ,
"User A","Paris"
"User B","London"
"User C","New York"
"User d","Toulouse"
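These commands can also be chained in a single pipeline. The sketch below recreates the sample file in a temporary directory (workdir is an illustrative name), combines the sed and cut steps shown above, and then uses xargs, which has not been demonstrated yet, to run a command once per line.

```shell
# Work in a throwaway directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cat > "$workdir/users.csv" << 'CSV'
"User A","10","Paris"
"User B","20","London"
"User C","30","New York"
"User d","40","Toulouse"
CSV

# Drop malformed rows, strip the quotes, then keep columns 1 and 3.
cleaned=$(sed '/User [[:lower:]]/d' "$workdir/users.csv" | sed 's/"//g' | cut -d , -f 1,3)
echo "$cleaned"
# User A,Paris
# User B,London
# User C,New York

# xargs -I {} runs the command once per input line, replacing {} with the line.
cities=$(echo "$cleaned" | cut -d , -f 2 | xargs -I {} echo "City: {}")
echo "$cities"
# City: Paris
# City: London
# City: New York

rm -r "$workdir"
```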
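To learn more about the syntax of your target command, use the man command or the command's help option, commonly -h (the exact option varies by command, and for some commands, such as grep, -h has a different meaning):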
# man cut
# man grep
# man sed
# ...
man <command>
# cut -h
# grep -h
# sed -h
<command> -h
Going Further
- To learn more techniques for manipulating strings, visit Advanced Bash-Scripting Guide: Chapter 10. Manipulating Variables in The Linux Documentation Project (TLDP).
Conclusion
In this article, we discussed different string operations in Bash, including variable declaration, substring removal, substring replacement, built-in regular expressions in if-statements, and stream manipulation (filter, update, collect). The source code is also available on GitHub. Interested in knowing more? You can subscribe to the feed of my blog, or follow me on Twitter or GitHub. I hope you enjoyed this article, see you next time!
References
- Kewei Shang & Mincong Huang, “Bash | Tech Resources”, GitHub, 2020.
- Vivek Gite, “How to find out macOS version information from Terminal command prompt”, CyberCiti, 2021.
- “3.5.3 Shell Parameter Expansion”, GNU, 2021.
- “Manipulating Strings”, The Linux Documentation Project (TLDP), 2021.