The Command Line is Your Friend: A Quick Introduction

The command line can be a scary place for people who are traditionally accustomed to using point-and-click mechanisms for executing tasks on their computer. While the idea of interacting with files and software via text may seem like a terrifying concept, the terminal is a powerful tool that can boost productivity and provide users with greater control of their system. For data analysts, the command line provides tools to perform a wide array of tasks, including file explanation and exploratory data analysis. Getting accustomed with these capabilities will enable users to become more competent in their interactions with the computer.

Working Directory:

The working directory refers to the folder or files that are currently being utilized. This is usually expressed as a hierarchical path and can be found using the pwd (‘print working directory’) command. The working directory can be changed from the command line using the cd (‘change directory’) command. Once a working directory has been set, use ls to list the contents of the current directory.

$ pwd
/Users/abraham.mathew
$ cd /Users/abraham.mathew/Movies/
$ ls
DDC - Model Visits.xlsx                    ILM Leads.xlsx
DDC - Page Type Views.xlsx               OBI Velocity-Day Supply.xlsx
...

Files and Folders:

The command line offers numerous tools for interacting with files and folders. For example, the mkdir (‘make directory’) command can be used to create an empty directory. Commands like mv and cp can then be used to rename files or copy the file into a new location. One can use the rm command to delete a file and rmdir to delete a directory.

$ mkdir Test_Dir_One
$ mkdir Test_Dir_Two
$ cp history.txt history_new.txt
cp: history.txt: No such file or directory
$ history > history.txt
$ cp history.txt history_new.txt
$ ls
...
$ cp history.txt /Users/abraham.mathew/movies/history_new_two.txt
$ pwd
/Users/abraham.mathew/Movies
$ rm history_new.txt
$ rmdir Test_Dir_Two

Interacting with Files:

The head and tail commands can be used to print the beginning and ending contents of a text or csv file. Furthermore, use the wc (‘word count’) command to find the numbers of lines, words, and characters in a file. The grep command can be used to find certain elements within a file using regular expressions. To combine files side by side, one can use the paste command. Cat, which is typically used to print out the contents of a file, can also be used to concatenate a number of files together.

$ head -n 5 Iris_Data.csv
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
$ head -n 5 Iris_Data.csv > Iris_Subset_One.txt
$ tail -n 5 Iris_Data.csv > Iris_Subset_two.txt
$ wc Iris_Data.csv
     151     151    4209 Iris_Data.csv
$ wc -l Iris_Data.csv
     151 Iris_Data.csv
$ grep "setosa" Iris_Data.csv | wc -l
      50
$ ls -l | grep "Iris"
-rw-r--r--   1 abraham.mathew  1892468438     4209 Nov  3 15:23 Iris_Data.csv
-rw-r--r--   1 abraham.mathew  1892468438      784 Nov  3 15:48 Iris_Subset.csv
-rw-r--r--   1 abraham.mathew  1892468438      157 Nov  3 21:37 Iris_Subset_One.txt
-rw-r--r--   1 abraham.mathew  1892468438      140 Nov  3 21:37 Iris_Subset_two.txt
$ paste Iris_Subset_One.txt Iris_Subset_Two.txt
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species     146,6.7,3,5.2,2.3,virginica
1,5.1,3.5,1.4,0.2,setosa     147,6.3,2.5,5,1.9,virginica
2,4.9,3,1.4,0.2,setosa     148,6.5,3,5.2,2,virginica
3,4.7,3.2,1.3,0.2,setosa     149,6.2,3.4,5.4,2.3,virginica
4,4.6,3.1,1.5,0.2,setosa     150,5.9,3,5.1,1.8,virginica
$ cat Iris_Subset_One.txt Iris_Subset_Two.txt > Iris_New.txt

Other Tools:

In many cases, the user will need to compute multiple commands in one line. This can be done with the semicolon, which acts as a separator between Unix commands. Another important tool is the pipe operator, which takes the output of one command and utilizes it with another command. For example, if a user were looking for all files within a directory that contained a particular string, they could pipe together the ls and grep commands in order to get the desired output. Redirection tasks are performed using the greater than sign, which is used to send the output of a command to a new file.

$ head -n 3 Iris_New.txt ; wc Iris_New.txt
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
      10      10     297 Iris_New.txt
$ ls -l | grep "Iris"
-rw-r--r--   1 abraham.mathew  1892468438     4209 Nov  3 15:23 Iris_Data.csv
-rw-r--r--   1 abraham.mathew  1892468438      297 Nov  3 21:45 Iris_New.txt
-rw-r--r--   1 abraham.mathew  1892468438      784 Nov  3 15:48 Iris_Subset.csv
-rw-r--r--   1 abraham.mathew  1892468438      157 Nov  3 21:37 Iris_Subset_One.txt
-rw-r--r--   1 abraham.mathew  1892468438      140 Nov  3 21:37 Iris_Subset_two.txt
$ head -n 10 Iris_Data.csv > Iris_Redirection.txt
$ head -n 10 Iris_Redirection.txt
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
5,5,3.6,1.4,0.2,setosa
6,5.4,3.9,1.7,0.4,setosa
7,4.6,3.4,1.4,0.3,setosa
8,5,3.4,1.5,0.2,setosa
9,4.4,2.9,1.4,0.2,setosa

There you have it, the basics for getting acquainted with the command line. While there are many other important command line tools, including curl, sed, awk, and wget, the procedures mentioned in this post will provide users with the essential building blocks. There is a steep learning curve, but the long term benefits of using the command line are well worth the short term costs.

Related

Leave a comment