Managing Files
Part 1: Working with the Contents of Files
Let's consider the content of the file /tmp/matrix.c. You may paste the contents of the file into /tmp/matrix.c using your favorite text editor.
head: reads the first few lines of its input
The first three lines of /tmp/matrix.c will be redirected into a new file named /tmp/matrix_head.txt.
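The command for this step is not shown here. Assuming it used head's -n option together with output redirection, a minimal sketch follows. Because the contents of /tmp/matrix.c are not reproduced in this section, the sketch builds a hypothetical stand-in file in a scratch directory; with the real file, the equivalent would be head -n 3 /tmp/matrix.c > /tmp/matrix_head.txt.

```shell
# Scratch directory, so the real /tmp/matrix.c is never overwritten
work=$(mktemp -d)

# Hypothetical stand-in for /tmp/matrix.c: twelve lines, "line 1" to "line 12"
seq 1 12 | sed 's/^/line /' > "$work/matrix.c"

# Redirect the first three lines into a new file
head -n 3 "$work/matrix.c" > "$work/matrix_head.txt"
cat "$work/matrix_head.txt"   # line 1, line 2, line 3

# With no -n option, head prints the first ten lines
head "$work/matrix.c"
```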
If you run only head /tmp/matrix.c, the first ten lines are displayed.
tail: reads the last few lines of its input
The last six lines of the file /tmp/matrix.c will be written to /tmp/matrix_tail.txt.
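The command for this step is not shown. A minimal sketch, assuming tail -n 6 with output redirection, on a hypothetical stand-in file in a scratch directory (with the real file, the equivalent would be tail -n 6 /tmp/matrix.c > /tmp/matrix_tail.txt):

```shell
work=$(mktemp -d)

# Hypothetical stand-in for /tmp/matrix.c: twelve lines, "line 1" to "line 12"
seq 1 12 | sed 's/^/line /' > "$work/matrix.c"

# Redirect the last six lines (line 7 to line 12) into a new file
tail -n 6 "$work/matrix.c" > "$work/matrix_tail.txt"
cat "$work/matrix_tail.txt"

# With no -n option, tail prints the last ten lines
tail "$work/matrix.c"
```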
more: lets you view text files or other output one screenful at a time. Press <Space> to advance to the next screen.
All of the lines of the file /tmp/matrix.c will be displayed.
less: a program similar to more that allows backward as well as forward movement in the file. You can also search for patterns with less.
The content of the file /tmp/matrix.c will be displayed on the terminal, and you can navigate by pressing the up and down arrow keys.
wc: word count; prints the number of lines, words, and bytes in its input
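Here, the first column (21) is the number of lines, the second (64) the number of words, and the third (367) the number of bytes in the file /tmp/matrix.c. For a plain ASCII file, the byte count equals the character count.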
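The real /tmp/matrix.c is not reproduced here, so this sketch uses a small hypothetical file to show the same output format; the numbers therefore differ from 21, 64, and 367. It also shows the -l, -w, and -c options, which print one count each, and the total line that wc adds when it is given several files:

```shell
work=$(mktemp -d)

# Hypothetical file: 2 lines, 5 words, 24 bytes
printf 'hello world\nfoo bar baz\n' > "$work/sample.txt"

# Default output: line, word, and byte counts, followed by the file name
wc "$work/sample.txt"

# One count each; reading from standard input (<) leaves out the file name
wc -l < "$work/sample.txt"
wc -w < "$work/sample.txt"
wc -c < "$work/sample.txt"

# With several files, wc adds a final "total" line
printf 'one more line\n' > "$work/other.txt"
wc -l "$work/sample.txt" "$work/other.txt"
```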
It is possible to get counts for multiple files by listing their names separated by spaces, for example: wc file1 file2 file3. When given more than one file, wc also prints a final total line.
To see the size in bytes of each JPEG image in the current directory, as well as the total for all of them, use the -c option: wc -c *.jpg.
Part 2: Data Manipulation
Let's apply commands to filter, sort, group, match, and replace data in the file /tmp/data.txt. You may paste the contents of the file into /tmp/data.txt using your favorite text editor.
NAME     START LOCATION  END LOCATION  cM     SNPs  Comments
Wendi    72017           5827331       12.43  1686  Match to Mom
Sheila   6514775         1500362       6.65   1089  Match to Mom
Michael  3793615         12596858      17.25  2785  Match to Dad or IBS
Robert   4090545         5115145       2.68   500   Mom but not me
Sheila   2514775         5600362       8.65   1189  Match to Mom
sed: a stream editor for filtering and transforming text, mostly used for substitutions
This replaces all occurrences of Sheila with Linda in the file /tmp/data.txt and sends the output to data.txt.bak. Redirecting the changes into another file is important: the original file stays untouched, so you can review the result or compare it with the original.
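The sed command itself is not shown. A minimal sketch of the step described above, assuming the usual s/old/new/g substitution with output redirection. It works on a scratch copy of two rows of the table, written with tab-separated columns (an assumption about the file's layout), and writes data.txt.bak into the same scratch directory:

```shell
work=$(mktemp -d)

# Two rows of the table above; tab-separated columns are an assumption
printf 'Wendi\t72017\t5827331\t12.43\t1686\tMatch to Mom\n' > "$work/data.txt"
printf 'Sheila\t6514775\t1500362\t6.65\t1089\tMatch to Mom\n' >> "$work/data.txt"

# s/Sheila/Linda/g replaces every occurrence on each line (g = global).
# sed writes to standard output, which is redirected into a new file,
# so data.txt itself is left unchanged.
sed 's/Sheila/Linda/g' "$work/data.txt" > "$work/data.txt.bak"
cat "$work/data.txt.bak"
```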
grep: searches the input file(s) for lines that match a given pattern, which can be a regular expression.
The file /tmp/data.txt.2 will contain the lines of /tmp/data.txt that include "Match to Mom".
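A minimal sketch of this step, assuming grep's output was redirected into the new file. It uses a scratch copy of the table above with tab-separated columns (an assumption about the layout). The quotes around the pattern matter: without them, the shell would pass Match, to, and Mom as separate arguments, and grep would treat to and Mom as file names.

```shell
work=$(mktemp -d)

# Scratch copy of the table above; tab-separated columns are an assumption
{
  printf 'NAME\tSTART LOCATION\tEND LOCATION\tcM\tSNPs\tComments\n'
  printf 'Wendi\t72017\t5827331\t12.43\t1686\tMatch to Mom\n'
  printf 'Sheila\t6514775\t1500362\t6.65\t1089\tMatch to Mom\n'
  printf 'Michael\t3793615\t12596858\t17.25\t2785\tMatch to Dad or IBS\n'
  printf 'Robert\t4090545\t5115145\t2.68\t500\tMom but not me\n'
  printf 'Sheila\t2514775\t5600362\t8.65\t1189\tMatch to Mom\n'
} > "$work/data.txt"

# Keep only the lines containing "Match to Mom" (Wendi and both Sheila rows)
grep "Match to Mom" "$work/data.txt" > "$work/data.txt.2"
cat "$work/data.txt.2"

# -c counts the matching lines instead of printing them
grep -c "Match to Mom" "$work/data.txt"
```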
awk: parses and manipulates tabular data. It works line by line, reading through the entire file.
The NAME (column 1) and Comments (column 6) columns will be written to the file /tmp/data_tab.txt.
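The awk command for this step is not shown. A minimal sketch, assuming the columns of data.txt are separated by tabs: with -F '\t', a field may contain spaces, so column 6 holds the whole comment. With awk's default splitting on any run of spaces or tabs, $6 would hold only the first word of the comment (for example Match). The file here is a scratch copy of the table above.

```shell
work=$(mktemp -d)

# Scratch copy of the table above; tab-separated columns are an assumption
{
  printf 'NAME\tSTART LOCATION\tEND LOCATION\tcM\tSNPs\tComments\n'
  printf 'Wendi\t72017\t5827331\t12.43\t1686\tMatch to Mom\n'
  printf 'Sheila\t6514775\t1500362\t6.65\t1089\tMatch to Mom\n'
  printf 'Michael\t3793615\t12596858\t17.25\t2785\tMatch to Dad or IBS\n'
  printf 'Robert\t4090545\t5115145\t2.68\t500\tMom but not me\n'
  printf 'Sheila\t2514775\t5600362\t8.65\t1189\tMatch to Mom\n'
} > "$work/data.txt"

# -F '\t': split fields on tabs only, so "Match to Mom" stays in one field
# -v OFS='\t': separate the printed fields with a tab as well
awk -F '\t' -v OFS='\t' '{ print $1, $6 }' "$work/data.txt" > "$work/data_tab.txt"
cat "$work/data_tab.txt"
```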
To list only the rows whose cM value (column 4) is greater than 10, run
awk '$4 > 10' /tmp/data.txt
To list the rows that contain "Match to Mom", type
awk '/Match to Mom/' /tmp/data.txt
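Both filters can be tried on a scratch copy of the table (again assumed to be tab-separated). One difference from the commands above: the header line's 4th field is text, which awk compares with 10 as a string, so '$4 > 10' on its own also prints the header. The sketch adds NR > 1 to skip the first line.

```shell
work=$(mktemp -d)

# Scratch copy of the table above; tab-separated columns are an assumption
{
  printf 'NAME\tSTART LOCATION\tEND LOCATION\tcM\tSNPs\tComments\n'
  printf 'Wendi\t72017\t5827331\t12.43\t1686\tMatch to Mom\n'
  printf 'Sheila\t6514775\t1500362\t6.65\t1089\tMatch to Mom\n'
  printf 'Michael\t3793615\t12596858\t17.25\t2785\tMatch to Dad or IBS\n'
  printf 'Robert\t4090545\t5115145\t2.68\t500\tMom but not me\n'
  printf 'Sheila\t2514775\t5600362\t8.65\t1189\tMatch to Mom\n'
} > "$work/data.txt"

# Rows whose 4th field (cM) is greater than 10: Wendi (12.43) and Michael (17.25).
# NR > 1 skips line 1, the header, whose 4th field is text.
awk 'NR > 1 && $4 > 10' "$work/data.txt"

# Rows containing "Match to Mom" anywhere on the line; a bare /regex/
# pattern is tested against the whole line ($0), not a single column
awk '/Match to Mom/' "$work/data.txt"
```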
sort: sorts the lines of a file, arranging them in a particular order. By default, sort compares whole lines as text (in ASCII order when the C locale is in effect).
In this case, the data is sorted by SNPs because the key option -k 5n is set: 5 selects the 5th column, and n compares its values as numbers rather than as text.
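A minimal sketch of this step on a scratch copy of the table. Tab-separated columns are an assumption here, but sort's default field splitting treats tabs and spaces alike, so the order would be the same with spaces. The header's 5th field is not a number, so -n treats it as zero and the header stays on the first line.

```shell
work=$(mktemp -d)

# Scratch copy of the table above; tab-separated columns are an assumption
{
  printf 'NAME\tSTART LOCATION\tEND LOCATION\tcM\tSNPs\tComments\n'
  printf 'Wendi\t72017\t5827331\t12.43\t1686\tMatch to Mom\n'
  printf 'Sheila\t6514775\t1500362\t6.65\t1089\tMatch to Mom\n'
  printf 'Michael\t3793615\t12596858\t17.25\t2785\tMatch to Dad or IBS\n'
  printf 'Robert\t4090545\t5115145\t2.68\t500\tMom but not me\n'
  printf 'Sheila\t2514775\t5600362\t8.65\t1189\tMatch to Mom\n'
} > "$work/data.txt"

# Ascending by SNPs: header (treated as 0), then 500, 1089, 1189, 1686, 2785.
# The key -k 5n runs from field 5 to the end of the line, but n reads only
# the leading number; -k 5,5n would limit the key to field 5 alone.
sort -k 5n "$work/data.txt"
```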
To sort in descending order, add the -r (reverse) option, like this: sort -k 5n -r /tmp/data.txt.
To sort and remove duplicate lines, use the -u option, like this: sort -u /tmp/data.txt.
To sort a list by month name, use the -M option, like this: sort -M /your/file.
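The effect of -u and -r is easiest to see on a tiny hypothetical list with a repeated entry; the last command shows -r combined with -n for numbers:

```shell
work=$(mktemp -d)

# Hypothetical list with a repeated line
printf 'pear\napple\npear\nfig\n' > "$work/fruit.txt"

# -u: sorted, with one copy of each repeated line (apple, fig, pear)
sort -u "$work/fruit.txt"

# -r: reverse, i.e. descending, order (pear, pear, fig, apple)
sort -r "$work/fruit.txt"

# -n -r: largest number first (100, 10, 9)
printf '9\n100\n10\n' | sort -n -r
```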
Part 3: Working with a Collection of Files
Let's work with the files that are located inside /tmp/test_files. Here are the instructions to create them.
find: searches for files based on criteria such as permissions, user ownership, modification date/time, size, etc.
In this case, the search starts in the current directory (.) and looks for files that have the number 6 in their name and are owned by the user named user.
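The find command for this step is not shown. A minimal sketch that matches the description, run in a scratch directory with hypothetical file names. Since a user literally named user may not exist on your system, it passes the current user's name (from id -un) to -user instead.

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical test files
touch test1.txt test6.txt test66.txt notes.txt

# Search the current directory (.) for files whose names contain 6
# and that belong to the current user (the text uses a user named "user")
find . -name "*6*" -user "$(id -un)"
```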
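If you want to search for files whose names contain the characters "conf" and that were modified 7 days ago, type:
find / -name "*conf*" -mtime 7
Because find rounds a file's age down to whole days, -mtime 7 matches files last modified between 7 and 8 days ago; use -mtime -7 for files modified less than 7 days ago.
If you want to find a file without descending into other mounted filesystems (such as network mounts), run:
find / -xdev -name foo.bar -print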
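The day rounding can be checked without searching the whole filesystem. A minimal sketch on hypothetical files in a scratch directory, backdated with touch -d and a relative date such as '72 hours ago' (a GNU coreutils feature; other touch implementations may not accept it):

```shell
work=$(mktemp -d)

# Hypothetical files, backdated with GNU touch's relative-date syntax
touch -d '72 hours ago'  "$work/recent.conf"   # 3 days old
touch -d '180 hours ago' "$work/week.conf"     # 7.5 days old
touch -d '240 hours ago' "$work/old.conf"      # 10 days old

# Ages are rounded down to whole days before comparing
find "$work" -name "*conf*" -mtime 7    # week.conf: 7 to 8 days old
find "$work" -name "*conf*" -mtime -7   # recent.conf: less than 7 days old
find "$work" -name "*conf*" -mtime +7   # old.conf: 8 days old or more
```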
Part 4: Comparing Differences Between the Contents of Files
For this section, we will use the files /tmp/test_file/test6.txt and /tmp/test_file/test66.txt. You may paste the contents of the two files into /tmp/test_file/ using your favorite text editor.
diff: shows the differences between the contents of two files
In this case, the differences between the two files test6.txt and test66.txt are in lines 5 and 6.
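The two files are not reproduced here, so this sketch uses hypothetical stand-ins that differ only in lines 5 and 6, to show how diff reports such a change:

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-ins that differ only in lines 5 and 6
printf 'a\nb\nc\nd\ne\nf\n' > test6.txt
printf 'a\nb\nc\nd\nE\nF\n' > test66.txt

# "5,6c5,6" means: change lines 5-6 of the first file into lines 5-6 of the
# second; "<" marks lines from test6.txt and ">" lines from test66.txt.
# diff exits with status 1 when the files differ and 0 when they are identical.
diff test6.txt test66.txt || echo "diff exit status: $?"
```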
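If you want to restrict the number of columns, use side-by-side output (-y), where --width sets the maximum width of each output line:
diff -y --width=5 test6.txt test66.txt
If you only want to know whether the files differ, without seeing which lines are different, run:
diff -q test6.txt test66.txt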
Part 5: Compressing and Extracting Files
To create files with extensions such as .tar, .tar.gz, .tgz, .gz, or .bz2, use the commands tar (also useful for extracting files), gzip, or bzip2.
gzip: compresses the given files. Whenever possible, each file is replaced by one with the extension .gz.
Running gzip on test66.txt compresses the file and replaces it with test66.txt.gz.
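A minimal sketch of the compress and decompress round trip, on a hypothetical stand-in for test66.txt in a scratch directory:

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for test66.txt
seq 1 1000 > test66.txt

# Compress: test66.txt is replaced by test66.txt.gz
gzip test66.txt
ls -l

# Decompress: test66.txt.gz is replaced by test66.txt again
gzip -d test66.txt.gz
```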
bzip2: usually produces smaller compressed files than gzip, but compresses and decompresses more slowly and uses more memory.
The file test66.txt is compressed while the uncompressed version is kept, so you end up with both test66.txt and test66.txt.bz2.
To decompress the file and remove the .bz2 extension, run bzip2 -d test66.txt.bz2. Note that bzip2 will not overwrite an existing test66.txt unless you add -f.
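A minimal sketch, assuming the step above used bzip2's -k (keep) option, since that is what leaves the uncompressed file in place. It runs on a hypothetical stand-in file in a scratch directory:

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for test66.txt
seq 1 1000 > test66.txt

# -k keeps the original, so test66.txt and test66.txt.bz2 both exist
bzip2 -k test66.txt
ls

# bzip2 -d will not overwrite the kept test66.txt, so remove it first
# (bzip2 -d -f test66.txt.bz2 would overwrite it instead)
rm test66.txt
bzip2 -d test66.txt.bz2
```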
zip: packages and compresses files into a single .zip archive; the original files are left in place.
The files test66.txt and test6.txt will be compressed into an archive file called test.zip.
To compress a directory, run zip -r squash.zip dir1. This zips the whole directory dir1 into squash.zip.
To decompress, use unzip squash.zip; this extracts the files into your current working directory.
tar: bundles many files together into a single file, originally on a single tape or disk. If you have several files, tar is recommended instead of gzip or bzip2 alone, because those commands compress each file separately rather than combining them.
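This bundles the /dirname directory into a single file, called a "tarball", named output.tar. A plain .tar file is not compressed; options such as -z (gzip) or -j (bzip2) compress it as it is created.
If tar is not installed, install it with yum install tar or apt-get install tar (as root or with sudo). To extract the contents of output.tar, run tar -xvf output.tar.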
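A minimal sketch of the create, list, and extract cycle, using a hypothetical dirname directory inside a scratch directory (the text above archives /dirname at the filesystem root). The last command shows the -z variant, which produces a gzip-compressed .tar.gz file:

```shell
work=$(mktemp -d)
cd "$work"

# Hypothetical directory to archive
mkdir dirname
echo "first" > dirname/a.txt
echo "second" > dirname/b.txt

# -c create an archive, -v list each file as it is added, -f archive file name
tar -cvf output.tar dirname

# -t lists the archive's contents without extracting anything
tar -tf output.tar

# -x extracts; -C restored puts the extracted files under restored/
mkdir restored
tar -xvf output.tar -C restored

# -z compresses the archive with gzip as it is created
tar -czvf output.tar.gz dirname
```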