Welcome to the World of Free Computer Tutorial And Solution

Friday, 6 April 2012

Sikkim Manipal University - SMU - BCA 5th sem UNIX book



Unit-02-Unix File System
Structure
2.1 Introduction
Objectives
2.2 The UNIX Filesystem
2.3 Typical UNIX directory structure
2.3.1 Pathnames
2.3.2 Home directory
2.3.3 Current directory
2.3.4 Parent directory
2.4 Directory and File Handling Commands
2.5 Making Hard and Soft (Symbolic) Links
2.6 Specifying multiple filenames
2.7 Use of Quotes
2.8 File and Directory Permissions
2.1 Introduction
This unit introduces the filesystem of the UNIX operating system. It covers the types of files
and the commands for navigating the filesystem and for creating and deleting files.
It also describes in detail the commands and options for handling the
permissions assigned to files and directories.
Objectives :
After studying this unit, you should be able to explain:
The UNIX filesystem and directory structure.
File and directory handling commands.
How to make symbolic and hard links.
How wildcard filename expansion works.
What argument quoting is and when it should be used.
File and directory permissions in more detail and how these can be changed.
2.2 The UNIX Filesystem
The UNIX operating system is built around the concept of a filesystem which is
 used to store all the information that constitutes the long-term state of the system. 
This state includes the operating system kernel itself, the executable files for the 
commands supported by the operating system, configuration information, 
temporary workfiles, user data, and various special files that are used to
 give controlled access to system hardware and operating system functions.
Every item stored in a UNIX filesystem belongs to one of four types:
1. Ordinary files
Ordinary files can contain text, data, or program information. Files cannot contain
other files or directories. Unlike other operating systems, UNIX filenames are not
broken into a name part and an extension part (although extensions are still
frequently used as a means to classify files). Instead they can contain any
keyboard character except for '/' and be up to 255 characters long
(note, however, that characters such as *, ?, # and & have special meaning
in most shells and should therefore not be used in filenames).
Putting spaces in filenames also makes them difficult to manipulate; use
the underscore '_' instead.
2. Directories
Directories are containers or folders that hold files, and other directories.
3. Devices
To provide applications with easy access to hardware devices,
UNIX allows them to be used in much the same way as ordinary files.
There are two types of devices in UNIX: block-oriented devices, which
transfer data in blocks (e.g. hard disks), and character-oriented devices, which
transfer data on a byte-by-byte basis (e.g. modems and dumb terminals).
4. Links
A link is a pointer to another file. There are two types of links. A hard link to
a file is indistinguishable from the file itself. A soft link (or symbolic link)
provides an indirect pointer or shortcut to a file. A soft link is implemented as
a directory file entry containing a pathname.
2.3 Typical UNIX Directory Structure
The UNIX filesystem is laid out as a hierarchical tree structure which is anchored 
at a special top-level directory known as the root (designated by a slash '/').
 Because of the tree structure, a directory can have many child directories,
but only one parent directory. Fig. 2.1 illustrates this layout.

Fig. 2.1: Part of a typical UNIX filesystem tree
The top-level directory is known as the root. Beneath the root are
several system directories. Directories are identified by the / character. For example:
/bin, /etc, /usr, etc.
Historically, user directories were often kept in the directory /usr.
However, it is often desirable to organise user directories in a different manner.
Directory         Typical Contents
/                 The "root" directory
/bin              Essential low-level system utilities
/usr/bin          Higher-level system utilities and application programs
/sbin             Superuser system utilities (for performing system
                  administration tasks)
/lib              Program libraries (collections of system calls that can
                  be included in programs by a compiler) for low-level
                  system utilities
/usr/lib          Program libraries for higher-level user programs
/tmp              Temporary file storage space (can be used by any user)
/home or /homes   User home directories containing personal file space for
                  each user. Each directory is named after the login of
                  the user.
/etc              UNIX system configuration and information files
/dev              Hardware devices
/proc             A pseudo-filesystem which is used as an interface to the
                  kernel. Includes a sub-directory for each active program
                  (or process).
Fig. 2.2: Typical UNIX directories
Fig. 2.2 shows some typical directories of a UNIX system and briefly describes their contents.
2.3.1 Pathnames
Files and directories may be referred to by their absolute pathname. For example
(refer to Fig. 2.1):
/home/system/username/f1.c
The initial / signifies that you are starting at the root and hence indicates an absolute pathname.
Files and directories may also be referred to by a relative pathname. For example (refer to Fig. 2.1):
username/f1.c
Here the file location is relative to the current directory.
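The difference between the two forms can be sketched in a short session. This is only an illustration: the scratch directory created with mktemp and the names system, username and f1.c are assumptions borrowed from the example above, not paths that exist on a real system.

```shell
# Build a small scratch tree: $base/system/username/f1.c (hypothetical names)
base=$(mktemp -d)
mkdir -p "$base/system/username"
echo 'int main(void) { return 0; }' > "$base/system/username/f1.c"

# Absolute pathname: starts with / and works from any current directory
abs=$(cat "$base/system/username/f1.c")

# Relative pathname: resolved against the current directory
cd "$base/system"
rel=$(cat username/f1.c)

# Both names refer to the same file, so the contents are identical
[ "$abs" = "$rel" ] && echo "same file"
```

The relative form only works while the current directory is the one it was written for; the absolute form works from anywhere.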
2.3.2 The Home Directory
UNIX provides each user with his or her own directory, known as the home directory. Within it users can
store their own files and create subdirectories. This directory and its contents belong exclusively to the
respective user. (Refer to Fig. 2.1.)
Example: $ cd
(The cd command without any arguments changes to the user's home directory.)
2.3.3 The Current Directory
The current directory can be referred to by the character '.' (a full stop). This refers to your
actual location in the filestore hierarchy. When you log in, the current directory is set to
your home directory. (Refer to Fig. 2.1.)
Example: $ cd .
(The cd command with '.' stays in the current working directory.)
2.3.4 The Parent Directory
The parent directory is the directory immediately above the current directory.
The parent directory can be referred to by the characters '..' (two full stops).
For example, to refer to the file test in the parent directory:
../test
Relative pathnames may also be constructed by progressively stepping back through
parent directories using the '../..' construct.
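A short sketch of how '.', '..' and repeated '..' steps resolve. The directory names parent and child and the scratch location are hypothetical, chosen only for this illustration.

```shell
# Scratch tree: $base/parent/child, with a file called test in parent
base=$(mktemp -d)
mkdir -p "$base/parent/child"
echo hello > "$base/parent/test"
cd "$base/parent/child"

here=$(pwd)               # where we are now
same=$(cd . && pwd)       # '.' is the current directory, so nothing changes
up=$(cd .. && pwd)        # '..' is the parent directory
top=$(cd ../.. && pwd)    # stepping back twice reaches the grandparent

# The file test in the parent directory, reached by two relative routes
content=$(cat ../test)
via_two=$(cat ../../parent/test)
echo "$here -> $up -> $top"
```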
Self Assessment Questions I
1) All utilities, applications and data in UNIX are stored as files. (True/False)
2) The four different types of files in UNIX are ____, ____, ____ & _____.
3) A link is a pointer to another file. (True/False)
4) A ______ link is a shortcut to a file.
5) If the pathname starts with / it indicates a ________ pathname.
6) If the pathname starts from the current working directory it indicates a ________ pathname.
2.4 Directory and File Handling Commands
This section describes some of the more important directory and file handling commands.
pwd (print [current] working directory)
pwd displays the full absolute path to your current location in the filesystem. So
    $ pwd
    /usr/bin
implies that /usr/bin is the current working directory.
ls (list directory)
ls lists the contents of a directory. If no target directory is given, then the contents of the
 current working directory are displayed. So, if the current working directory is /,
    $ ls
    bin   dev  home  mnt   share  usr  var
    boot  etc  lib   proc  sbin   tmp  vol
Actually, ls doesn't show you all the entries in a directory - files and directories that begin
with a dot (.) are hidden (this includes the directories '.' and '..' which are always present).
The reason for this is that files that begin with a '.' usually contain important configuration
information and should not be changed under normal circumstances. If you want to see all
files, ls supports the -a option:
    $ ls -a
Even this listing is not that helpful - there are no hints to properties such as the size, type and
 ownership of files, just their names. To see more detailed information, use the -l option 
 (long listing), which can be combined with the -a option as follows:
    $ ls -a -l
      (or, equivalently,)
    $ ls -al
Each line of the output looks like this:
    type permissions  links  owner  group  size  date  name
where:
type is a single character which is either 'd' (directory), '-' (ordinary file), 'l' (symbolic link),
'b' (block-oriented device) or 'c' (character-oriented device).
permissions is a set of characters describing access rights. There are 9 permission characters,
describing 3 access types given to 3 user categories. The three access types are read ('r'), write
('w') and execute ('x'), and the three user categories are the user who owns the file, users in the
group that the file belongs to and other users (the general public). An 'r', 'w' or 'x' character means
the corresponding permission is present; a '-' means it is absent.
links refers to the number of filesystem links pointing to the file/directory (see the discussion
 on hard/soft links in the next section).
owner is usually the user who created the file or directory.
group denotes a collection of users who are allowed to access the file according to the group
 access rights specified in the permissions field.
size is the length of a file, or the number of bytes used by the operating system to store the
 list of files in a directory.
date is the date when the file or directory was last modified (written to).
The -u option displays the time when the file was last accessed (read) instead.
name is the name of the file or directory.
ls supports more options. To find out what they are, type:
    $ man ls
man is the online UNIX user manual, and it can be used to get help with commands and find
out which options are supported. It has quite a terse style which is often not that helpful,
so some users prefer to use the (non-standard) info utility if it is installed:
    $ info ls
cd (change [current working] directory)
    $ cd path
changes your current working directory to path (which can be an absolute or a relative path). 
One of the most common relative paths to use is '..' (i.e. the parent directory of the current 
directory).
Used without any target directory
    $ cd
resets your current working directory to your home directory (useful if you get lost). 
If you change into a directory and you subsequently want to return to your original directory,
 use
   $ cd -
mkdir (make directory)
  $ mkdir directory
creates a subdirectory called directory in the current working directory.
You can only create subdirectories in a directory if you have write permission on that directory.
rmdir (remove directory)
$ rmdir directory
removes the subdirectory directory from the current working directory. 
You can only remove subdirectories if they are completely empty (i.e. of all entries besides the
'.' and '..' directories).
cp (copy)
cp is used to make copies of files or entire directories. To copy files, use:
    $ cp source-file(s) destination
where source-file(s) and destination specify the source and destination of the copy respectively.
 The behaviour of cp depends on whether the destination is a file or a directory. 
If the destination is a file, only one source file is allowed and cp makes a new file 
called destination that has the same contents as the source file. If the destination is a directory,
 many source files can be specified, each of which will be copied into the destination directory. 
 Section 2.6 will discuss efficient specification of source files using wildcard characters.
To copy entire directories (including their contents), use a recursive copy:
$ cp -rd source-directories destination-directory
mv (move/rename)
mv is used to rename files/directories and/or move them from one directory into another. 
Exactly one source and one destination must be specified:
    $ mv source destination
If destination is an existing directory, the new name for source (whether it be a file or a directory)
will be destination/source. If source and destination are files, source is renamed destination.
If destination is an existing file it will be destroyed and overwritten by source
(you can use the -i option if you would like to be asked for confirmation
before a file is overwritten in this way).
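The way cp and mv treat a file destination differently from a directory destination can be sketched as follows. All file and directory names here are made up for the illustration, and -r is used alone for the recursive copy (the -d flag shown earlier is not assumed to be available everywhere).

```shell
cd "$(mktemp -d)"          # work in a throwaway directory
mkdir dir
echo data > a.txt

cp a.txt b.txt             # destination is a file name: b.txt is a new copy of a.txt
cp a.txt b.txt dir         # destination is a directory: both files are copied into it
mv b.txt c.txt             # both are file names: b.txt is renamed to c.txt
mv c.txt dir               # destination is an existing directory: file becomes dir/c.txt
cp -r dir dir2             # recursive copy of a directory and its contents

ls dir2                    # lists the three files copied along with dir
```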
rm (remove/delete)
    $ rm target-file(s)
Example:
$ rm file1.txt
removes the specified files. Unlike other operating systems, it is almost impossible to recover a deleted file unless you have a backup (there is no recycle bin!) so use this command with care.
 If you would like to be asked before files are deleted, use the -i option:
    $ rm -i myfile
    rm: remove 'myfile'?
rm can also be used to delete directories (along with all of their contents, including any
subdirectories they contain). To do this, use the -r option. To stop rm from asking any
questions or giving errors (e.g. if the file doesn't exist), use the -f (force) option.
Extreme care needs to be taken when using this option - consider what would happen
if a system administrator was trying to delete user will's home directory and accidentally typed:
$ rm -rf / home/will
(instead of rm -rf /home/will).
cat (catenate/type)
    $ cat target-file(s)
displays the contents of target-file(s) on the screen, one after the other. You can also use it to
 create files from keyboard input as follows (> is the output redirection operator, which will be 
discussed in the next chapter):
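A minimal sketch of the idea, assuming a hypothetical file name notes.txt. In an interactive session you would type the lines at the keyboard and finish with Ctrl-D; here a here-document supplies the input instead so that the example can run unattended.

```shell
cd "$(mktemp -d)"          # work in a throwaway directory

# cat copies its standard input into notes.txt via the > redirection
cat > notes.txt <<'EOF'
first line
second line
EOF

# Display the new file, then count its lines
cat notes.txt
lines=$(wc -l < notes.txt | tr -d ' ')
echo "notes.txt has $lines lines"
```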
more and less (concatenate with pause)
    $ more target-file(s)
displays the contents of target-file(s) on the screen, pausing at the end of each screenful and
 asking the user to press a key (useful for long files). It also incorporates a searching facility
 (press '/' and then type a phrase that you want to look for).
You can also use more to break up the output of commands that produce more than one 
screenful of output as follows (| is the pipe operator, which will be discussed in the next chapter):
    $ ls -l | more
less is just like more, except that it has a few extra features (such as allowing users to scroll
backwards and forwards through the displayed file). less is not a standard utility, and may
not be present on all UNIX systems.
2.5 Making Hard and Soft (Symbolic) Links
Direct (hard) and indirect (soft or symbolic) links from one file or directory to another can be
 created using the ln command.
    $ ln filename linkname
creates another directory entry for filename called linkname (i.e. linkname is a hard link). 
Both directory entries appear identical (and both now have a link count of 2). 
If either filename or linkname is modified, the change will be reflected in the other
 file (since they are in fact just two different directory entries pointing to the same file).
    $ ln -s filename linkname
creates a shortcut called linkname (i.e. linkname is a soft link). The shortcut appears as an
 entry with a special type ('l'):
    $ ln -s hello.txt bye.txt
    $ ls -l bye.txt
lrwxrwxrwx   1 will finance 13 bye.txt -> hello.txt
The link count of the source file remains unaffected. Notice that the permission bits on a symbolic
 link are not used (always appearing as rwxrwxrwx). Instead the permissions on the link are
 determined by the permissions on the target (hello.txt in this case).
Note that you can create a symbolic link to a file that doesn't exist, but not a hard link. Another 
difference between the two is that you can create symbolic links across different physical disk 
devices or partitions, but hard links are restricted to the same disk partition.
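The differences described above can be observed in a small experiment. This is a sketch under assumed file names (hello.txt, hard.txt and soft.txt are hypothetical), run in a scratch directory.

```shell
cd "$(mktemp -d)"
echo "hello" > hello.txt
ln hello.txt hard.txt        # hard link: a second directory entry for the same file
ln -s hello.txt soft.txt     # soft link: an entry that stores the pathname hello.txt

# The link count (second field of ls -l) now covers both hard-linked names
count=$(ls -l hello.txt | awk '{print $2}')

echo "more" >> hard.txt      # a change made through one name...
via_name=$(cat hello.txt)    # ...is visible through the other name
via_soft=$(cat soft.txt)     # ...and through the soft link

rm hello.txt                 # remove the original name
survives=$(cat hard.txt)     # the data is still reachable via the hard link
if cat soft.txt 2>/dev/null; then dangling=no; else dangling=yes; fi
echo "link count was $count, soft link dangling: $dangling"
```

After the original name is removed, the hard link still reaches the data, while the soft link points at a pathname that no longer exists.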
Self Assessment Questions II
1) _____ displays the full absolute path to your current location in the filesystem.
2) _____ changes your current working directory to path.
3) mkdir creates a subdirectory called  directory in the root.(True/False).
4) _______ is used to rename files/directories.
5) To display the contents of target-file(s) on the screen the ______ command is used.
2.6 Specifying multiple filenames
Multiple filenames can be specified using special pattern-matching characters. The rules are:
'?' matches any single character in that position in the filename.
'*' matches zero or more characters in the filename. A '*' on its own will match all files. '*.*'
matches all files containing a '.'.
Characters enclosed in square brackets ('[' and ']') will match any filename that has one
 of those characters in that position.
A list of comma separated strings enclosed in curly braces ("{" and "}") will be expanded as
a Cartesian product with the surrounding characters.
For example:
1. ??? matches all three-character filenames.
2. ?ell? matches any five-character filenames with 'ell' in the middle.
3. he* matches any filename beginning with 'he'.
4. [m-z]*[a-l] matches any filename that begins with a letter from 'm' to 'z' and ends in a
 letter from 'a' to 'l'.
5. {/usr,}{/bin,/lib}/file expands to /usr/bin/file /usr/lib/file /bin/file and /lib/file.
Note that the UNIX shell performs these expansions (including any filename matching) 
on a command's arguments before the command is executed.
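Because the shell performs the expansion, a command such as echo can be used to see exactly what a pattern turns into. The sketch below uses made-up file names in a scratch directory and sticks to the '?' and '*' wildcards.

```shell
cd "$(mktemp -d)"
touch hello help hells bell.txt me.txt abc   # hypothetical file names

# echo never sees the wildcards; it receives the already-expanded names
three=$(echo ???)      # three-character names
ell=$(echo ?ell?)      # five-character names with 'ell' in the middle
he=$(echo he*)         # names beginning with 'he'
dot=$(echo *.*)        # names containing a '.'

echo "??? -> $three"
echo "?ell? -> $ell"
echo "he* -> $he"
echo "*.* -> $dot"
```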
2.7 Use of Quotes
Certain characters (e.g. '*', '-','{' etc.) are interpreted in a special way by the shell. In order to pass arguments that use these characters to commands directly (i.e. without filename expansion etc.), we need to use special quoting characters. There are three levels of quoting that you can try:
1. Try inserting a '\' in front of the special character.
2. Use double quotes (") around arguments to prevent most expansions.
3. Use single forward quotes (') around arguments to prevent all expansions.
There is a fourth type of quoting in UNIX. Single backward quotes (`) are used to pass the output of some command as an input argument to another. For example:
1. $ hostname
rose
2. $ echo this machine is called `hostname`
this machine is called rose
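The effect of each quoting level can be compared side by side. This is a sketch with assumed names: the files a.txt and b.txt and the variable name are hypothetical, and ls piped into wc stands in for hostname so that the result does not depend on the machine.

```shell
cd "$(mktemp -d)"
touch a.txt b.txt
name=world

unquoted=$(echo *.txt)          # wildcard expanded by the shell
escaped=$(echo \*.txt)          # backslash protects the single character *
double=$(echo "*.txt $name")    # double quotes: no filename expansion, but $name is expanded
single=$(echo '*.txt $name')    # single quotes: nothing is expanded

# Backquotes: the output of the enclosed command becomes part of the argument
msg="there are `ls | wc -l | tr -d ' '` files"

echo "$unquoted"; echo "$escaped"; echo "$double"; echo "$single"; echo "$msg"
```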
2.8 File and Directory Permissions
Permission  File                           Directory
Read        User can look at the           User can list the files in the
            contents of the file           directory
Write       User can modify the            User can create new files and
            contents of the file           remove existing files in the
                                           directory
Execute     User can use the filename      User can change into the
            as a UNIX command              directory, but cannot list the
                                           files unless (s)he has read
                                           permission. User can read files
                                           if (s)he has read permission on
                                           them.
Fig 2.3 : Interpretation of permissions for files and directories
Every file or directory on a UNIX system has three types of permissions, describing what operations
 can be performed on it by various categories of users. The permissions are read (r), write (w) 
 and execute (x), and the three categories of users are user/owner (u), group (g) and others (o). 
Because files and directories are different entities, the interpretation of the permissions 
assigned to each differs slightly, as shown in Fig 2.3.
File and directory permissions can only be modified by their owners, or by the superuser (root),
 by using the chmod system utility.
chmod (change [file or directory] mode)
$ chmod options files
chmod accepts options in two forms. Firstly, permissions may be specified as a sequence of 3
 octal digits (octal is like decimal except that the digit range is 0 to 7 instead of 0 to 9). Each octal
 digit represents the access permissions for the user/owner, group and others respectively. 
The mappings of permissions onto their corresponding octal digits is as follows:
Permissions  Octal digit
---          0
--x          1
-w-          2
-wx          3
r--          4
r-x          5
rw-          6
rwx          7
For example the command:
    $ chmod 600 private.txt
sets the permissions on private.txt to rw------- (i.e. only the owner can read and write to the file).
Permissions may also be specified symbolically, using the symbols u (user), g (group), o (other), a (all),
r (read), w (write), x (execute), + (add permission), - (take away permission) and =
(assign permission). For example, the command:
    $ chmod ug=rw,o-rw,a-x *.txt
sets the permissions on all files ending in .txt to rw-rw---- (i.e. the owner and users in the file's
group can read and write to the file, while the general public do not have any sort of access).
chmod also supports a -R option which can be used to recursively modify file permissions, e.g.
$ chmod -R go+r play
will grant group and other read rights to the directory play and all of the files and directories
within play.
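The octal and symbolic forms can be checked against the first ten characters of an ls -l listing (the type character followed by the nine permission characters). This is a sketch in a scratch directory; notes.txt and the contents of play are hypothetical names added for the illustration.

```shell
cd "$(mktemp -d)"
touch private.txt notes.txt

# Octal form: 6 = rw- for the owner, 0 = --- for group and others
chmod 600 private.txt
octal=$(ls -l private.txt | cut -c1-10)

# Symbolic form, starting from rw-rw-rw- so that every clause has an effect
chmod 666 notes.txt
chmod ug=rw,o-rw,a-x notes.txt
symbolic=$(ls -l notes.txt | cut -c1-10)

# Recursive form: add read permission for group and others below play
mkdir play
touch play/game
chmod 700 play
chmod 600 play/game
chmod -R go+r play
dir_mode=$(ls -ld play | cut -c1-10)
file_mode=$(ls -l play/game | cut -c1-10)

echo "$octal $symbolic $dir_mode $file_mode"
```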
chgrp (change group)
$ chgrp group files
can be used to change the group that a file or directory belongs to. It also supports a -R option.
2.9 Summary
Everything is a file in UNIX. There are various types of file: regular, directory, link and device.
A filename is restricted to 255 characters. The filesystem is a hierarchical structure and the
topmost directory is called the root.
The commands ln and cat can be used to create files. The ls command can be used with various options
(-l, -a, -R) to list files in different ways.
Commands like pwd, mkdir, cd, rm, cp and cat are a few of the very powerful commands
widely used to work on the filesystem.
The shell matches filenames against wildcards such as * and ?, which are expanded before the
command is executed. A wildcard is escaped with a \ to be treated literally,
and if there are a number of them, they should be placed within quotes.
More details of the commands explained here, and of other commands for working on the filesystem,
can be learnt from the UNIX manual pages (man pages).
2.10 Terminal Questions
1) Explain the different types of files of a Unix Filesystem.
2) Mention the difference between hard and soft link.
3) Write the command to create direct and indirect links.
4) Explain the use of the option used with ls command:
-l, -al,-a
5) With proper syntax explain the use of chmod command.
6) What is the significance of cat command?
7) What are the types of permission for a file or directory allowed in unix?
8) What is the use of the chgrp command?
9) Mention the use of * with a file name.
10) Explain the use of \ and ` (backquotes) in a UNIX command.
2.11 Answers to SAQs and TQs
SAQ I
1) True
2) Ordinary, directories, device and links
3) True
4) Soft link
5) Absolute
6) Relative
SAQ II
1) pwd
2) cd
3) False
4) mv
5) cat
TQs
1. section 2.2
2. section 2.2
3. section 2.5
4. section 2.4
5. section 2.8
6. section 2.4
7. section 2.8
8. section 2.8
9. section 2.6
10. section 2.7
Unit-03-Unix File Handling Utilities
Structure
3.1 Introduction
3.2 Inspecting File Content
3.3 Finding Files
3.4 Finding Text in Files
3.5 Sorting files
3.6 Splitting of files
3.7 Advanced Text File Processing
3.8 File Compression and Backup
3.9 Handling Removable Media (e.g. floppy disks)
3.1 Introduction
UNIX has an elaborate list of commands for working with files and directories.
This unit gives a brief introduction to the most important commands used to work with UNIX files.
UNIX offers considerably more than the commands mentioned in this unit.
For further help on the usage of a command while working on UNIX, consult the UNIX
documentation manual.
Objectives: 
After studying this unit, you should be able to explain:
Ways to examine the contents of files.
How to find files when you don't know their exact location.
Ways of searching files for text patterns.
How to sort files.
Tools for compressing files and making backups.
Accessing floppy disks and other removable media.
3.2 Inspecting File Content
The other useful utilities apart from cat available in Unix for investigating the contents of 
files are explained below:
file filename(s)
file analyzes a file's contents for you and reports a high-level description of what type of file it
 appears to be:
usage of file command
$ file myprog.c letter.txt webpage.html
  myprog.c :  C program text
  letter.txt :  English text
  webpage.html:  HTML document text
head filename, tail filename
head and tail display the first and last few lines in a file respectively. You can specify the number
of lines as an option, e.g.
$ tail -20 messages.txt
$ head -5 messages.txt
tail includes a useful -f option that can be used to continuously monitor the last few lines of a
 (possibly changing) file. This can be used to monitor log files, for example:
$ tail -f /var/log/messages
continuously outputs the latest additions to the system log file.
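A small sketch of head and tail on a generated file. The file name messages.txt and its contents are made up for the illustration, and the line count is given with -n, which is the standard spelling of the numeric option used above.

```shell
cd "$(mktemp -d)"

# Generate a 30-line file: "message 1" ... "message 30"
i=1
while [ "$i" -le 30 ]; do
    echo "message $i"
    i=$((i + 1))
done > messages.txt

first=$(head -n 5 messages.txt)     # the first 5 lines
last=$(tail -n 20 messages.txt)     # the last 20 lines

first_count=$(echo "$first" | wc -l | tr -d ' ')
last_start=$(echo "$last" | head -n 1)
echo "head gave $first_count lines; tail starts at: $last_start"
```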
objdump options binaryfile
objdump can be used to disassemble binary files - that is it can show the machine language
 instructions which make up compiled application programs and system utilities.
od options filename (octal dump)
od can be used to display the contents of a binary or text file in a variety of formats, e.g.
$ cat hello.txt
  hello world
$ od -c hello.txt

    0000000  h  e  l  l  o     w  o  r  l  d \n
    0000014
$ od -x hello.txt

    0000000 6865 6c6c 6f20 776f 726c 640a
    0000014
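The left-hand column of od output is a byte offset written in octal, so the final 0000014 is simply the size of the file. A brief sketch that checks this, assuming hello.txt holds the text shown above followed by a newline:

```shell
cd "$(mktemp -d)"
printf 'hello world\n' > hello.txt

size=$(wc -c < hello.txt | tr -d ' ')   # size of the file in bytes (decimal)
offset=$((014))                         # a leading 0 makes the constant octal

dump=$(od -c hello.txt)                 # character dump; the newline appears as \n
echo "$dump"
echo "size=$size, octal 14 = $offset"
```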
3.3 Finding Files
There are a lot of places to look for things in UNIX, and that is where the find command can help.
The general format for this command is to specify the starting point for a search through the
file system, followed by any options and actions desired.
Option            Meaning
-atime n          True if the file was accessed n days ago
-ctime n          True if the file's status was changed n days ago
-exec command     Execute command
-mtime n          True if the file was modified n days ago
-name pattern     True if the filename matches pattern
-print            Print the names of the files found
-type c           True if the file is of type c, where c is one of:
                      d  directory
                      f  file
                      l  link
-user name        True if the file is owned by user name
The find command checks the specified options, going from left to right, once for each file or
 directory encountered.
Let us see a few examples.
To list all files and directories below the current directory:
$ find . -print
To display all .cpp files:
$ find . -name "*.cpp" -print
To find a list of the directories:
$ find . -type d -print
To find just those .cpp files that have been modified in the last three days:
$ find . -mtime -3 -name "*.cpp" -print
To print all files named test with a single-character extension:
$ find . -name "test.?" -print
find can in fact do a lot more than just find files by name. It can find files by type (e.g. -type f
for files, -type d for directories), by permissions (e.g. -perm -o=r for all files and directories that
can be read by others), by size (-size) etc. You can also execute commands on the files you find.
For example,
$ find . -name "*.txt" -exec wc -l '{}' ';'
counts the number of lines in every text file in and below the current directory. The '{}' is
replaced by the name of each file found and the ';' ends the -exec clause.
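The find expressions discussed above can be tried out on a small scratch tree. The directory layout and file names below are hypothetical, created only for this sketch.

```shell
base=$(mktemp -d)
cd "$base"
mkdir -p src/lib docs
printf 'one\ntwo\n' > docs/a.txt
printf 'x\n' > src/b.txt
touch src/main.cpp src/lib/util.cpp

# Files matched by name (the pattern is quoted so that find, not the shell, expands it)
cpp=$(find . -name "*.cpp" -print | sort)

# Directories only; the starting point '.' is itself included
dirs=$(find . -type d -print)
dir_count=$(echo "$dirs" | wc -l | tr -d ' ')

# Tests are combined left to right: recently modified AND named *.cpp
recent=$(find . -mtime -3 -name "*.cpp" -print | wc -l | tr -d ' ')

# Run wc -l once for every .txt file found
find . -name "*.txt" -exec wc -l '{}' ';'
```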
For more information about find and its abilities, use the command
man find
or
info find.
which (sometimes also called whence) command
which is used to find out where an application program or system utility is stored on disk. For example:
$ which ls
Output we get:     /bin/ls
locate string
find can take a long time to execute if you are searching a large filespace (e.g. searching from / downwards). The locate command provides a much faster way of locating all files whose names match a particular search string. For example:
$ locate ".txt"
will find all filenames in the filesystem that contain ".txt" anywhere in their full paths.
One disadvantage of locate is it stores all filenames on the system in an index that is usually updated only once a day. This means locate will not find files that have been created very recently. It may also report filenames as being present even though the file has just been deleted. Unlike find, locate cannot track down files on the basis of their permissions, size and so on.
3.4 Finding Text in Files
grep (General Regular Expression Print)
$ grep options pattern files
grep searches the named files (or standard input if no files are named) for lines that match a given pattern. The default behaviour of grep is to print out the matching lines. For example:
$ grep hello *.txt
searches all text files in the current directory for lines containing "hello". Some of the more useful options that grep provides are:
-c (print a count of the number of lines that match)
-i (ignore case)
-v (print out the lines that don't match the pattern) and
-n (print out the line number before printing the matching line).
-l (Displays list of filenames only)
-x (Matches pattern with entire line)
-f File (Takes patterns from file, one per line)
$ grep -vi hello *.txt
searches all text files in the current directory for lines that do not contain any form of the word hello
(e.g. Hello, HELLO, or hELlO).
To search all files in an entire directory tree for a particular pattern, combine grep with find
using backward single quotes to pass the output from find into grep. So
$ grep hello `find . -name "*.txt" -print`
will search all text files in the directory tree rooted at the current directory for lines
containing the word "hello".
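A sketch of the options listed above, run on two small hypothetical files (a.txt and sub/b.txt) in a scratch directory:

```shell
cd "$(mktemp -d)"
mkdir sub
printf 'Hello there\nhello again\ngoodbye\n' > a.txt
printf 'nothing here\nHELLO\n' > sub/b.txt

count=$(grep -c hello a.txt)        # number of lines containing lowercase "hello"
nocase=$(grep -ci hello a.txt)      # the same count, ignoring case
inverse=$(grep -vi hello a.txt)     # lines that do NOT contain any form of "hello"
numbered=$(grep -n goodbye a.txt)   # matching line, prefixed with its line number

# Backquotes pass the file names printed by find to grep as arguments;
# -l prints only the names of the files that contain a match
tree=$(grep -il hello `find . -name "*.txt" -print` | sort)

echo "$count $nocase"; echo "$inverse"; echo "$numbered"; echo "$tree"
```

Note that the backquote approach splits the output of find on whitespace, so it assumes file names without spaces.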
The patterns that grep uses are actually a special type of pattern known as regular expressions.
Just like arithmetic expressions, regular expressions are made up of basic subexpressions
combined by operators.
The most fundamental expression is a regular expression that matches a single character.
Most characters, including all letters and digits, are regular expressions that match themselves.
Any other character with special meaning may be quoted by preceding it with a backslash (\).
A list of characters enclosed by '[' and ']' matches any single character in that list;
if the first character of the list is the caret '^', then it matches any character not in the list.
A range of characters can be specified using a dash (-) between the first and last items in the
list.
So [0-9] matches any digit and [^a-z] matches any character that is not a lowercase letter.
The caret '^' and the dollar sign '$' are special characters that
match the beginning and end of a line respectively. The dot '.' matches any character. So
$ grep '^..[l-z]$' hello.txt
matches any line in hello.txt that consists of exactly three characters, the last of which is a
lowercase letter from l to z.
egrep (extended grep) is a variant of grep that supports more sophisticated regular
expressions.
Here two regular expressions may be joined by the operator '|'; the resulting regular
expression matches any string matching either subexpression. Brackets '(' and ')' may be used for
grouping regular expressions. In addition, a regular expression may be followed by one of several
repetition operators:
'?' means the preceding item is optional (matched at most once).
'*' means the preceding item will be matched zero or more times.
'+' means the preceding item will be matched one or more times.
'{N}' means the preceding item is matched exactly N times.
'{N,}' means the preceding item is matched N or more times.
'{N,M}' means the preceding item is matched at least N times, but not more than M times.
For example, if egrep was given the regular expression
    '(^[0-9]{1,5}[a-zA-Z ]+$)|none'
it would match any line that either:
1. consists of a number up to five digits long, followed by a sequence of one or more letters or
spaces, or
2. contains the word none.
Note that UNIX systems also usually support another grep variant called fgrep (fixed grep) which
 simply looks for a fixed string inside a file (but this facility is largely redundant).
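Both regular expressions discussed in this section can be tried on a small sample file. This is a sketch with made-up input lines; it uses grep -E, the standard spelling that accepts the same extended expressions as egrep.

```shell
cd "$(mktemp -d)"
cat > lines.txt <<'EOF'
12345 Main Street
123456 too many digits
42
there is none left
abc
cat
EOF

# Extended expression: 1-5 digits then only letters/spaces to end of line, OR the word none
matches=$(grep -E '(^[0-9]{1,5}[a-zA-Z ]+$)|none' lines.txt)

# Basic expression: exactly three characters, the last one in the range l to z
three=$(grep '^..[l-z]$' lines.txt)

echo "$matches"
echo "$three"
```

The line starting with six digits is rejected because at most five digits may precede the letters, and "42" is rejected because at least one letter or space must follow the digits.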
Self Assessment Questions I
1) ______ analyzes a file's contents for you and reports a high-level description of what type of file
 it appears to be.
2) _____and_____ display the first and last few lines in a file
3) ______ can be used to display the contents of a binary or text file.
4) ______ is used to find out where an application program or system utility is stored on disk.
5) grep stands for ________________.
6) grep command with _____ option prints out the lines that don't match the pattern.
7) _________ and _________are the variants of grep
3.5 Sorting files
The sort command is used to sort file contents either alphabetically or numerically.
sort filenames
sort sorts the lines contained in a group of files alphabetically or (if the -n option is specified) numerically. The sorted output is displayed on the screen, and may be stored in another file by redirecting the output. So
$ sort input1.txt input2.txt > output.txt
outputs the sorted concatenation of files input1.txt and input2.txt to the file output.txt.
uniq filename
uniq removes duplicate adjacent lines from a file. This facility is most useful when combined 
with sort:
$ sort input.txt | uniq > output.txt
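As a minimal sketch of why sort and uniq are combined, and of what the -n option changes, consider the following. The file names and contents are made up for illustration and are created in a temporary directory.

```shell
workdir=$(mktemp -d)

# The duplicates here are not adjacent, so uniq alone would keep them.
printf '%s\n' pear apple pear banana apple > "$workdir/input.txt"
deduped=$(sort "$workdir/input.txt" | uniq)
printf '%s\n' "$deduped"

# The same numbers sorted as text and then numerically with -n.
printf '%s\n' 10 9 100 > "$workdir/numbers.txt"
alpha=$(sort "$workdir/numbers.txt")
numeric=$(sort -n "$workdir/numbers.txt")
printf '%s\n' "$alpha" "$numeric"

rm -r "$workdir"
```

Sorting first brings identical lines together so that uniq can remove them. Without -n, 100 sorts before 9 because the comparison is made character by character.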
cut
It is a filter used to cut or pick up a given number of characters or fields from the specified file.
File emp_info:
Emp_id Name Qualification Ph_no Address age designation
1 bob BCA 99999 Mumbai 25 System analyst
2 jai MCA 88888 Mumbai 25 system analyst
Example:
To view only a few selected fields, for instance the name and designation fields, from the file emp_info
 given above:
$ cut -f 2,7 emp_info
Output:
bob System analyst
jai system analyst
To view fields 2 through 7:
$ cut -f 2-7 emp_info
Output:
bob BCA 99999 Mumbai 25 System analyst
jai MCA 88888 Mumbai 25 system analyst
The cut command assumes that the fields are separated by a tab character. If the fields are delimited 
by some other character, cut supports the -d option, which sets the delimiter. Suppose emp_info has
 the information stored in the following format.
File emp_info:
Emp_id : Name : Qualification : Ph_no : Address : age : designation
1 : bob : BCA : 99999 : Mumbai : 25 : System analyst
2 : jai : MCA : 88888 : Mumbai : 25 : system analyst
The command for listing the Name and designation fields is as follows:
$ cut -f 2,7 -d : emp_info
The cut command can also cut a specified range of character columns from a given file and display the
 result on the standard output. The option used for this is -c. For example,
$ cut -c 1-15 emp_info
The above command displays the first 15 columns from each line in the file emp_info.
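The -d, -f and -c options can be tried out with a short sketch. It assumes a colon-delimited copy of emp_info with no spaces around the delimiter, which is made up here and written to a temporary directory.

```shell
workdir=$(mktemp -d)

# Colon-delimited sample records modelled on emp_info.
printf '%s\n' \
    '1:bob:BCA:99999:Mumbai:25:System analyst' \
    '2:jai:MCA:88888:Mumbai:25:system analyst' > "$workdir/emp_info"

# Fields 2 and 7, using ":" as the delimiter.
fields=$(cut -d : -f 2,7 "$workdir/emp_info")

# The first five character columns of each line.
columns=$(cut -c 1-5 "$workdir/emp_info")

printf '%s\n' "$fields" "$columns"
rm -r "$workdir"
```

The selected fields are printed with the same delimiter between them, so the first command gives bob:System analyst and jai:system analyst.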
3.6 Splitting a file
It may happen that the file you are handling is huge and takes too much time to edit. In such a 
case you might feel that the file should be split into smaller files. The "split" utility performs
 this task. Having split a file into smaller pieces, the pieces can be edited singly and then 
concatenated into one whole file again with the cat command.
To split a file test, which contains 100 lines, into pieces of 25 lines each:
$ split -l 25 test
This creates four files in the current directory:
xaa
xab
xac
xad
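The split-and-rejoin cycle can be sketched as follows. The 100-line file is generated for illustration, and the optional second argument to split (an output name prefix) is used only so that the pieces land in a temporary directory.

```shell
workdir=$(mktemp -d)

# Generate a 100-line sample file named test.
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$workdir/test"

# Split into 25-line pieces named xaa, xab, xac, xad.
split -l 25 "$workdir/test" "$workdir/x"
piece_count=$(ls "$workdir"/x?? | wc -l | tr -d ' ')
first_piece_lines=$(wc -l < "$workdir/xaa" | tr -d ' ')

# Concatenate the pieces in name order and compare with the original.
cat "$workdir"/x?? > "$workdir/rejoined"
if cmp -s "$workdir/test" "$workdir/rejoined"; then identical=yes; else identical=no; fi

echo "$piece_count pieces, $first_piece_lines lines in xaa, identical: $identical"
rm -r "$workdir"
```

Because the piece names sort alphabetically, a shell wildcard passes them to cat in the right order.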
3.7 Advanced Text File Processing
sed (stream editor)
sed performs basic text transformations on an input stream (i.e. a file or input from a pipeline),
 for example deleting lines that contain a particular string of text, or substituting one pattern for
 another wherever it occurs in a file. sed is probably at its most useful when used directly from
 the command line with simple parameters:
$ sed "s/pattern1/pattern2/" inputfile > outputfile
(substitutes pattern2 for pattern1 once per line)
$ sed "s/pattern1/pattern2/g" inputfile > outputfile
(substitutes pattern2 for pattern1 for every pattern1 per line)
$ sed "/pattern1/d" inputfile > outputfile
(deletes all lines containing pattern1)
$ sed "y/string1/string2/" inputfile > outputfile
(replaces each character of string1 with the character at the same position in string2)
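The four sed forms above can be compared side by side in a small sketch. The input text is made up for illustration and is piped in rather than read from a file.

```shell
line='cat sat on the cat mat'

# s without g: only the first match on each line is replaced.
first=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# s with g: every match on each line is replaced.
every=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

# d: lines containing the pattern are dropped.
kept=$(printf '%s\n' 'keep me' 'drop cat' 'keep too' | sed '/cat/d')

# y: character-by-character mapping (a->x, b->y, c->z).
mapped=$(printf '%s\n' 'abc cab' | sed 'y/abc/xyz/')

printf '%s\n' "$first" "$every" "$kept" "$mapped"
```

The y command needs both strings to be the same length, since each character is mapped by position.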
awk (Aho, Weinberger and Kernighan)
awk is useful for manipulating files that contain columns of data on a line by line basis. Like sed,
 awk statements can be passed directly on the command line, or a script file can be written and 
awk told to read its commands from that script.
The general syntax is
awk [-F re] [parameter] ['prog'] [-f progfile] [in-file]
in-file specifies the list of files to be processed. If no files are given, awk processes 
the standard input.
The prog argument is a string, single quoted.
The parameter options let you assign values to various variables.
The awk utility provides the ability to isolate and process separate fields of the lines being scanned.
Let us consider a simple example
$ cat stu
Name age sex
Rama 20 m
Krishna 25 m
Sita 21 f
$
The name field can be referenced as $1, age as $2 and sex as $3. The symbol pair $n is called a
 field variable, where n is the field number. A built-in variable, NF, holds the number of fields, 
so $NF means the last field. NF and several other built-in variables can be accessed in your awk
 programs, rather like the predefined shell variables. By default, fields are defined as any 
contiguous set of non-whitespace characters separated by white space.
The -F option lets you change the built-in variable FS (field separator) from spaces/tabs to any 
single nonspace character.
awk processes one record (line) at a time. Let us process the records stored in the stu file.
$ awk '{ print $0 }' stu
Name age sex
Rama 20 m
Krishna 25 m
Sita 21 f
$ awk '{ print $2 }' stu
age
20
25
21
$ awk '{ print $2, $1, $3 }' stu
age Name sex
20 Rama m
25 Krishna m
21 Sita f
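The field separator option and the NF variable described earlier can be seen together in a short sketch. It assumes a colon-separated version of the stu records, made up here and piped in directly.

```shell
# Colon-separated sample records (hypothetical variant of the stu file).
records=$(printf '%s\n' 'Rama:20:m' 'Krishna:25:m' 'Sita:21:f')

# -F : sets the field separator; NF is the field count and $NF the last field.
summary=$(printf '%s\n' "$records" | awk -F : '{ print $1, NF, $NF }')

printf '%s\n' "$summary"
```

Each output line shows the name, the number of fields (3) and the last field, separated by single spaces because of the commas in the print statement.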
awk Relational Operators:
The following relational operators are used:
Relational operator    Meaning
x == y                 x equals y?
x != y                 x not equal to y?
x > y                  x greater than y?
x >= y                 x greater than or equal to y?
x < y                  x less than y?
x <= y                 x less than or equal to y?
x ~ re                 x matches the regular expression re?
x !~ re                x does not match the regular expression re?
The exact meaning of a relational operator depends on the data type of the two variables or 
expressions being compared.
For example

$ awk 'NR == 2, NR == 3' stu
The above command prints lines 2 through 3, both inclusive.
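A few of the relational operators can be tried against the stu file in a small sketch. The file is recreated in a temporary directory, and the particular conditions are chosen only for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Name age sex' 'Rama 20 m' 'Krishna 25 m' 'Sita 21 f' > "$workdir/stu"

# Numeric comparison: skip the header (NR > 1) and keep ages above 20.
older=$(awk 'NR > 1 && $2 > 20 { print $1 }' "$workdir/stu")

# Regular expression match: first field starting with a capital S.
starts_with_s=$(awk '$1 ~ /^S/ { print $1 }' "$workdir/stu")

# Range pattern: from record 2 through record 3.
range=$(awk 'NR == 2, NR == 3' "$workdir/stu")

printf '%s\n' "$older" "$starts_with_s" "$range"
rm -r "$workdir"
```

When a pattern has no action, as in the range example, awk prints the whole matching line.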
Example:
Write a script cricket.awk to calculate the team's batting average and to check if Mike Atherton
 got another duck:
$ cat > cricket.awk
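BEGIN { players = 0; runs = 0 }
# main clause: count every player and add up the runs in field 3
{
players++; runs += $3
}
# the opening brace must be on the same line as the pattern
/atherton/ {
if ($3 == 0) print "atherton duck!"
}
END {
print "the batting average is " runs/players
}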
press (ctrl-d)
$ awk -f cricket.awk cricket.dat

atherton duck!
the batting average is 21.2
$
The BEGIN clause is executed once at the start of the script, the main clause once for every
 line, the /atherton/ clause only if the word atherton occurs in the line and the END clause once at
 the end of the script.
The tr command
The tr (translate) command is a simple but useful filter designed to replace one or more characters
 in your files with one or more other characters.
The syntax is
tr [-cds] [in-string] [out-string]
The -s option squeezes each run of a repeated character from the specified set into a single 
occurrence of that character and displays the result.
Let us consider a simple example
$ cat display
Babu
Banu
Bima
Bindu
To substitute a "C" for every "B", enter the command
$ tr B C < display
Cabu
Canu
Cima
Cindu
The -d option deletes the characters you specify from your file and displays the result. So, to delete all of the "B"s:
$ tr -d B < display
abu
anu
ima
indu
The -c option tells tr to work on the complement of the specified characters, that is, on every 
character not listed. It is used with the -s and -d options to modify the way that those options 
operate. For example, to delete all of the characters except "B":
$ tr -cd B < display
BBBB
Because the newline characters are deleted too, the four remaining "B"s appear on one line.
Without any option, tr does a straight substitution. For example, let us substitute u with a blank:
$ tr u ' ' < display
Bab
Ban
Bima
Bind
$
You can also use ranges of characters in your arguments for tr. For example, to change all of
 the lowercase characters to uppercase, you could type
$ cat old
rama
krishna
sita
$ tr '[a-z]' '[A-Z]' < old
RAMA
KRISHNA
SITA
$
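The -s option has no example above, so here is a minimal sketch of it alongside a range translation. The input strings are made up for illustration.

```shell
# -s squeezes each run of repeated spaces down to a single space.
squeezed=$(printf '%s\n' 'too    many   spaces' | tr -s ' ')

# Range translation from lowercase to uppercase, written without brackets.
upper=$(printf '%s\n' 'rama' | tr 'a-z' 'A-Z')

printf '%s\n' "$squeezed" "$upper"
```

This form of -s is handy for tidying up input before it is passed to a command such as cut, which treats every delimiter as significant.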
Self Assessment Questions II
1) sort sorts lines contained in a group of files only alphabetically. (True/False)
2) _________ removes duplicate adjacent lines from a file.
3) _________ is a filter used to cut or pick up a given number of characters or fields from the specified 
file.
4) sed is not a stream editor. (True/False)
5) Given the file cricket.dat
1 atherton     0   bowled
2 hussain     20   caught
3 stewart     47   stumped
4 thorpe      33   lbw
5 gough        6   run-out
What will be the output for the following command?
awk '{ print $0 }' cricket.dat
6) tr stands for _________________
7) To break a big file into smaller units ___________ command is used.
3.8 File Compression and Backup
UNIX systems usually support a number of utilities for backing up and compressing files. 
The most useful are:
tar (tape archiver)
tar backs up entire directories and files onto a tape device or (more commonly) into a single 
disk file known as an archive. An archive is a file that contains other files plus information 
about them, such as their filename, owner, timestamps, and access permissions. tar does not perform
 any compression by default.
To create a disk file tar archive, use
    $ tar -cvf archivename filenames
where archivename will usually have a .tar extension. Here the c option means create, v
 means verbose (output filenames as they are archived), and f means file.
To list the contents of a tar archive, use
    $ tar -tvf archivename
To restore files from a tar archive, use
    $ tar -xvf archivename
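A complete create, list and restore cycle might look like the sketch below. The directory and file names are made up for illustration, the v option is left out to keep the output short, and subshells are used so that the archive stores relative path names.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/project" "$workdir/restore"
echo 'hello' > "$workdir/project/notes.txt"

# Create the archive from inside workdir so names are stored as project/...
(cd "$workdir" && tar -cf project.tar project)

# List the archive contents.
listing=$(tar -tf "$workdir/project.tar")
printf '%s\n' "$listing"

# Restore into a separate directory and read the file back.
(cd "$workdir/restore" && tar -xf ../project.tar)
restored=$(cat "$workdir/restore/project/notes.txt")
echo "$restored"

rm -r "$workdir"
```

Extracting into a separate directory avoids overwriting the original files while checking that the archive is usable.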
cpio
cpio is another facility for creating and reading archives. Unlike tar, cpio doesn't automatically
 archive the contents of directories, so it's common to combine cpio with find when creating an
 archive:
$ find . -print -depth | cpio -ov -Htar > archivename
This will take all the files in the current directory and the
directories below and place them in an archive called archivename. The -depth option controls
 the order in which the filenames are produced and is recommended to prevent problems with
 directory permissions when doing a restore. The -o option creates the archive, the -v option prints
 the names of the files archived as they are added and the -H option specifies an archive format 
type (in this case it creates a tar archive).
Another common archive type is crc, a portable format with a checksum for error control.
To list the contents of a cpio archive, use
    $ cpio -tv < archivename
To restore files, use:
    $ cpio -idv < archivename
Here the -d option will create directories as necessary. 
To force cpio to extract files on top of files of the same name that already exist (and have the same or 
later modification time), use the -u option.
compress, gzip
compress and gzip are utilities for compressing and decompressing individual files (which may 
or may not be archive files). To compress files, use:
$ compress filename
or
$ gzip filename
In each case, filename will be deleted and replaced by a compressed file called filename.Z or
 filename.gz. To reverse the compression process, use:
$ compress -d filename
or
$ gzip -d filename
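The replacement of the original file by its compressed version can be observed with a short sketch. Only gzip is shown, on the assumption that it is installed, and the file name and content are made up for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'some text' > "$workdir/notes.txt"

# After gzip, only notes.txt.gz remains in the directory.
gzip "$workdir/notes.txt"
after_compress=$(ls "$workdir")

# After gzip -d, the original file is back and the .gz file is gone.
gzip -d "$workdir/notes.txt.gz"
after_restore=$(ls "$workdir")
content=$(cat "$workdir/notes.txt")

printf '%s\n' "$after_compress" "$after_restore" "$content"
rm -r "$workdir"
```

The content of the restored file is identical to the original, since the compression is lossless.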
3.9 Handling Removable Media (e.g. floppy disks)
UNIX supports tools for accessing removable media such as CDROMs and floppy disks.
mount, umount
The mount command serves to attach the filesystem found on some device to the filesystem tree. 
Conversely, the umount command will detach it again (it is very important to remember to do this
 when removing the floppy or CDROM). The file /etc/fstab contains a list of devices and the points
 at which they will be attached to the main filesystem:
$ cat /etc/fstab
/dev/fd0   /mnt/floppy  auto    rw,user,noauto  0 0
/dev/hdc   /mnt/cdrom   iso9660 ro,user,noauto  0 0
In this case, the mount point for the floppy drive is /mnt/floppy and the mount point for the
 CDROM is /mnt/cdrom. To access a floppy we can use:
$ mount /mnt/floppy
$ cd /mnt/floppy
$ ls (etc...)
To force all changed data to be written back to the floppy and to detach the floppy disk
 from the filesystem, we use:
$ umount /mnt/floppy
mtools
If they are installed, the (non-standard) mtools utilities provide a convenient way of accessing 
 DOS-formatted floppies without having to mount and unmount filesystems. Use DOS-type
 commands like "mdir a:", "mcopy a:*.* .", "mformat a:", etc.
3.10 Summary
The commands find and file help you locate files and identify their types. To work on the 
content of a file, for example sorting, pattern matching, translating (translating characters, 
changing the case of letters, squeezing consecutive occurrences with -s, and deleting specific 
characters), text transformation and manipulation, UNIX provides utilities such as sort, grep, tr, 
awk and sed. tr is the one utility among these that works only with standard input.
To extract the required amount of information from files, commands/utilities like head, tail, 
cut and split can be used.
UNIX systems usually support a number of utilities for backing up and compressing files, 
such as tar, cpio and gzip.
3.11 Terminal Questions
1) What are mtools?
2) How can you access a floppy disk? Explain.
3) Describe the various utilities used for taking back-up and compression activities.
4) Explain the use of sed and awk utilities.
5) Bring out the usage of sort and uniq command with an example.
6) Explain the ways in which one can identify the file exact location.
7) Describe the various options used with the cut command.
8) Explain the significance of the options -i and -v with the grep command.
3.12 Answers to TQs and SAQs
SAQ I
1) file
2) head and tail
3) od
4) which
5) General Regular Expression Print
6) -v
7) egrep and fgrep
SAQ II
1) False
2) uniq
3) cut
4) False
5) The output is
1 atherton     0   bowled
2 hussain     20   caught
3 stewart     47   stumped
4 thorpe      33   lbw
5 gough        6   run-out
6) translate
7) split
TQs
1) section 3.9
2) section 3.9
3) section 3.8
4) section 3.7
5) section 3.5
6) section 3.3
7) section 3.5
8) section 3.4
