The command line can be an intimidating place for people accustomed to point-and-click interfaces. While interacting with files and software through text may seem daunting, the terminal is a powerful tool that can boost productivity and give users greater control over their system. For data analysts, the command line provides tools for a wide range of tasks, including file exploration and exploratory data analysis. Becoming familiar with these capabilities makes users more competent in their interactions with the computer.
Working Directory:
The working directory is the folder in which the shell is currently operating; relative paths given to commands are resolved against it. It is expressed as a hierarchical path and can be displayed with the pwd (‘print working directory’) command. The working directory can be changed with the cd (‘change directory’) command. Once a working directory has been set, use ls to list its contents.
$ pwd
/Users/abraham.mathew
$ cd /Users/abraham.mathew/Movies/
$ ls
DDC - Model Visits.xlsx         ILM Leads.xlsx
DDC - Page Type Views.xlsx      OBI Velocity-Day Supply.xlsx
...
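To make the difference between absolute and relative paths concrete, here is a minimal sketch that can be pasted into a terminal. It works inside a temporary directory, and the names demo_root, reports, and 2024 are made up for illustration. It also borrows mkdir, which is covered in the next section.

```shell
# Work inside a throwaway directory so nothing real is touched.
# demo_root and the folder names are hypothetical.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/reports/2024"

cd "$demo_root/reports"   # absolute path: works from anywhere
pwd                       # prints the full path of the reports folder
ls                        # lists its contents: 2024

cd 2024                   # relative path: resolved against the working directory
cd ..                     # ".." is the parent directory, so this returns to reports
here=$(basename "$PWD")   # last component of the working directory
echo "$here"              # reports
```

The shell keeps the working directory in the PWD variable, which is why basename "$PWD" gives the name of the current folder.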
Files and Folders:
The command line offers numerous tools for working with files and folders. For example, the mkdir (‘make directory’) command creates an empty directory. The mv command moves or renames a file, and cp copies a file to a new name or location. Use rm to delete a file and rmdir to delete an empty directory.
$ mkdir Test_Dir_One
$ mkdir Test_Dir_Two
$ cp history.txt history_new.txt
cp: history.txt: No such file or directory
$ history > history.txt
$ cp history.txt history_new.txt
$ ls
...
$ cp history.txt /Users/abraham.mathew/movies/history_new_two.txt
$ pwd
/Users/abraham.mathew/Movies
$ rm history_new.txt
$ rmdir Test_Dir_Two
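The session above does not show mv in action, so here is a small sketch of renaming versus moving. It runs in a temporary directory, and every file and folder name in it is hypothetical.

```shell
# Throwaway directory; all file and folder names here are hypothetical.
cd "$(mktemp -d)"
mkdir archive
echo "first draft" > notes.txt

mv notes.txt notes_v1.txt        # same directory, new name: a rename
cp notes_v1.txt notes_copy.txt   # copy: the original stays in place
mv notes_copy.txt archive/       # target is an existing directory: the file moves into it

ls                               # archive  notes_v1.txt
ls archive                       # notes_copy.txt

rm notes_v1.txt archive/notes_copy.txt   # delete the files first...
rmdir archive                            # ...because rmdir only removes empty directories
```

Note that rm deletes immediately and does not use a trash folder, so it is worth double-checking the file name before pressing Enter.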
Interacting with Files:
The head and tail commands print the first and last lines of a text or CSV file. The wc (‘word count’) command reports the number of lines, words, and bytes in a file. The grep command finds the lines in a file that match a regular expression. To combine files side by side, use the paste command. The cat command, which is typically used to print the contents of a file, can also concatenate several files.
$ head -n 5 Iris_Data.csv
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
$ head -n 5 Iris_Data.csv > Iris_Subset_One.txt
$ tail -n 5 Iris_Data.csv > Iris_Subset_two.txt
$ wc Iris_Data.csv
151 151 4209 Iris_Data.csv
$ wc -l Iris_Data.csv
151 Iris_Data.csv
$ grep "setosa" Iris_Data.csv | wc -l
50
$ ls -l | grep "Iris"
-rw-r--r-- 1 abraham.mathew 1892468438 4209 Nov 3 15:23 Iris_Data.csv
-rw-r--r-- 1 abraham.mathew 1892468438 784 Nov 3 15:48 Iris_Subset.csv
-rw-r--r-- 1 abraham.mathew 1892468438 157 Nov 3 21:37 Iris_Subset_One.txt
-rw-r--r-- 1 abraham.mathew 1892468438 140 Nov 3 21:37 Iris_Subset_two.txt
$ paste Iris_Subset_One.txt Iris_Subset_two.txt
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species	146,6.7,3,5.2,2.3,virginica
1,5.1,3.5,1.4,0.2,setosa	147,6.3,2.5,5,1.9,virginica
2,4.9,3,1.4,0.2,setosa	148,6.5,3,5.2,2,virginica
3,4.7,3.2,1.3,0.2,setosa	149,6.2,3.4,5.4,2.3,virginica
4,4.6,3.1,1.5,0.2,setosa	150,5.9,3,5.1,1.8,virginica
$ cat Iris_Subset_One.txt Iris_Subset_two.txt > Iris_New.txt
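The grep example above searches for a fixed string. As a rough sketch of what regular expressions add, the snippet below builds a tiny CSV (made-up rows in a hypothetical file called flowers.csv, not the real iris data) and matches against the start and end of each line.

```shell
# Build a tiny CSV in a throwaway directory (hypothetical data, not the iris file).
cd "$(mktemp -d)"
printf '%s\n' \
  'id,length,species' \
  '1,5.1,setosa' \
  '2,7.0,versicolor' \
  '3,6.3,virginica' \
  '4,4.9,setosa' > flowers.csv

grep -c 'setosa$' flowers.csv        # count lines ending in "setosa": 2
grep -E ',v[a-z]+$' flowers.csv      # rows whose last field starts with "v"
grep -v '^id' flowers.csv | wc -l    # drop the header line, then count the data rows: 4
tail -n +2 flowers.csv | head -n 2   # skip the header, then show the first two data rows
```

Here $ anchors a pattern to the end of a line and ^ to the start, -c counts matching lines instead of printing them, and -v inverts the match. On some systems wc pads its output with leading spaces.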
Other Tools:
In many cases, the user will need to run multiple commands on one line. This can be done with the semicolon, which separates commands so that they run one after another. Another important tool is the pipe operator (|), which passes the output of one command to another command as its input. For example, to find all files in a directory whose names contain a particular string, pipe the output of ls into grep. Redirection is performed with the greater-than sign (>), which sends the output of a command to a file.
$ head -n 3 Iris_New.txt ; wc Iris_New.txt
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
10 10 297 Iris_New.txt
$ ls -l | grep "Iris"
-rw-r--r-- 1 abraham.mathew 1892468438 4209 Nov 3 15:23 Iris_Data.csv
-rw-r--r-- 1 abraham.mathew 1892468438 297 Nov 3 21:45 Iris_New.txt
-rw-r--r-- 1 abraham.mathew 1892468438 784 Nov 3 15:48 Iris_Subset.csv
-rw-r--r-- 1 abraham.mathew 1892468438 157 Nov 3 21:37 Iris_Subset_One.txt
-rw-r--r-- 1 abraham.mathew 1892468438 140 Nov 3 21:37 Iris_Subset_two.txt
$ head -n 10 Iris_Data.csv > Iris_Redirection.txt
$ head -n 10 Iris_Redirection.txt
,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
5,5,3.6,1.4,0.2,setosa
6,5.4,3.9,1.7,0.4,setosa
7,4.6,3.4,1.4,0.3,setosa
8,5,3.4,1.5,0.2,setosa
9,4.4,2.9,1.4,0.2,setosa
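The three operators can also be chained on a single line. The sketch below shows one way to do that in a temporary directory; the file names are hypothetical, and touch (not covered in this post) is used only to create empty files to search through.

```shell
# Throwaway directory with a few empty, hypothetically named files.
cd "$(mktemp -d)"
touch sales_jan.csv sales_feb.csv notes.txt   # touch creates empty files

ls | grep "sales"                  # pipe: keep only the names containing "sales"
ls | grep "sales" | wc -l          # pipes chain: count those names (2)
ls | grep "sales" > matches.txt    # redirect: save the list instead of printing it
echo "saved:" ; cat matches.txt    # semicolon: two commands on one line
```

Two details are worth knowing. The > operator replaces the target file if it already exists. The output file is also deliberately named matches.txt rather than something containing "sales", because the shell may create it before ls runs, in which case it would show up in its own listing.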
There you have it: the basics for getting acquainted with the command line. While there are many other important command-line tools, including curl, sed, awk, and wget, the commands covered in this post provide the essential building blocks. The learning curve is steep, but the long-term benefits of using the command line are well worth the short-term costs.
