OPS102 - Filesystem Basics
Hierarchical File Systems
Most modern computers are equipped with one or more random-access storage devices: either a mechanical hard disk drive (HDD) or a fully electronic solid-state drive (SSD). Both provide a numbered set of blocks or sectors, each of which stores a fixed amount of data (typically 512 or 4096 bytes).
In order to conveniently use this storage, it is arranged into files, which are named collections of bytes of arbitrary length. The organization of blocks/sectors into files is handled by a filesystem, which is a scheme for structuring data, along with the corresponding software to implement this scheme. Most filesystems track certain metadata about a file in addition to the file name (or "filename"), such as the date/time of creation, the date/time of last modification, the owner or original creator of the file, and the permissions applicable to the file (for example, who is permitted to read and to change the file contents).
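As a rough illustration of this metadata, the following sketch creates a small file and inspects it with standard commands. The temporary directory and the filename notes.txt are made up for the example, and the exact layout of the ls -l output varies between systems.

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
file="$dir/notes.txt"          # hypothetical filename

printf 'hello\n' > "$file"     # 6 bytes: five letters plus a newline
chmod 640 "$file"              # owner: read/write, group: read, others: none

# ls -l prints permissions, owner, group, size, and last-modified time.
ls -l "$file"

# Pull out two individual pieces of metadata.
size=$(( $(wc -c < "$file") ))          # file length in bytes
perms=$(ls -l "$file" | cut -c1-10)     # first 10 characters: type and permissions

echo "size=$size perms=$perms"
rm -rf "$dir"
```

The filename itself is not shown as a separate field because it is simply the last column of the ls -l line.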
A hierarchical filesystem introduces the concept of directories, which are special files that contain zero or more other files. The files in a directory may themselves be directories, enabling files to be nested into an arbitrary hierarchy. The top-level directory, which contains everything else, is called the root directory.
When graphical user interfaces were developed, the metaphor of a traditional paper-based office was introduced, and directories were called folders in this metaphor (a folder in a traditional office is a piece of card stock folded in half to group together related papers). Therefore, the terms directory and folder are synonyms.
Filenames
The rules for valid filenames vary with the filesystem, but generally, filenames may include letters, numbers, dashes, underscores, and periods. Other punctuation marks may be acceptable in some filesystems but not in others, and are therefore best avoided, especially if files may be transferred between different types of filesystems or between computers.
Spaces may be included in filenames, but such filenames usually need to be quoted when used on the command line, so that the shell does not interpret the filename as two or more separate filenames. For example, the filename "red leaf" may be interpreted as two separate filenames if written without quoting:
ls red leaf
When quotes are added, the ambiguity is removed, and the shell will correctly interpret the filename as a single name:
ls "red leaf"
For this reason, it is good practice to avoid using spaces in filenames (underscores are a good alternative).
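The effect of quoting can be observed directly. The sketch below is only an illustration: count_args is a helper invented for this example that reports how many arguments it received, and the file is created in a temporary directory.

```shell
# Helper (invented for this example): print how many arguments it received.
count_args() {
    echo "$#"
}

dir=$(mktemp -d)
touch "$dir/red leaf"              # one file whose name contains a space

name="red leaf"
unquoted=$(count_args $name)       # split at the space: "red" and "leaf"
quoted=$(count_args "$name")       # kept together as one argument

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"

# Quoted, the name reaches ls as a single filename and the file is found.
listing=$(cd "$dir" && ls "red leaf")
echo "$listing"
rm -rf "$dir"
```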
Extensions
Many operating systems use an extension at the end of a filename to denote the type of data stored in the file. An extension consists of a period followed by one or more characters. For example, in the filename
ops102_project.pdf
the extension is "pdf", denoting a file in Portable Document Format.
It is unusual to use multiple extensions on a Windows system, but not uncommon on a Linux (or other Unix-like) system.
For example, on a Linux system, the filename
backup.tar.gz
has two extensions: "tar", indicating an archive created with the tar command, and "gz", indicating that the file was compressed with the gzip command.
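On the command line, extensions can be separated from the rest of a filename using the shell's parameter expansion. This is a minimal sketch; the variable names are chosen only for this example.

```shell
name="backup.tar.gz"

last_ext=${name##*.}    # remove everything up to the last period
all_ext=${name#*.}      # remove everything up to the first period
base=${name%%.*}        # remove everything from the first period onward

echo "$last_ext"
echo "$all_ext"
echo "$base"
```

Here last_ext holds "gz", all_ext holds "tar.gz", and base holds "backup".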
Case Sensitivity
Some filesystems are case-sensitive, and UPPER- and lower-case letters are considered to be different. For example, the filenames MILK.PDF, Milk.pdf, and milk.pdf refer to three different files. This is the case with most Linux (and Unix-like) filesystems.
Most Windows filesystems are not case-sensitive, so the filenames MILK.PDF, Milk.pdf, and milk.pdf would refer to the same file.
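One way to find out how a particular filesystem behaves is to try creating all three names and count the files that result. This is a sketch only: it tests whichever filesystem holds the temporary directory, which may not be the one holding your own files.

```shell
dir=$(mktemp -d)

# Try to create three files whose names differ only in case.
touch "$dir/MILK.PDF" "$dir/Milk.pdf" "$dir/milk.pdf"

count=$(( $(ls "$dir" | wc -l) ))
if [ "$count" -eq 3 ]; then
    verdict="case-sensitive"
else
    verdict="not case-sensitive"
fi
echo "$count file(s) created: this filesystem is $verdict"
rm -rf "$dir"
```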
Current Directory or Working Directory
Most operating systems have the concept of a "current directory" or "working directory", which allows a directory to be temporarily designated as the current working location. The working directory may be changed at any time.
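On Linux, the pwd command prints the working directory and the cd command changes it. The sketch below uses a temporary directory as the destination, so it can be run without affecting any real files.

```shell
start=$(pwd)                     # remember the current working directory
dir=$(cd "$(mktemp -d)" && pwd)  # throwaway directory, as the shell reports it

cd "$dir"                        # change the working directory
now=$(pwd)
echo "working directory is now $now"

cd "$start"                      # change back to where we started
back=$(pwd)
rm -rf "$dir"
```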
Pathnames
A pathname is a filename that includes information about the directory in which the file is stored. (Sometimes pathnames are simply called filenames.)
There are three types of pathnames:
Absolute Pathname
An absolute pathname starts with a slash (on Unix-like operating systems, such as Linux) or a backslash (on Windows), indicating the root directory. It contains the names of all of the directories from the root directory to the specified file, separated by slash/backslash characters.
For example, on a Linux system, the pathname
/home/kim/ops102/presentation.pdf
indicates that the file presentation.pdf can be found by starting at the root directory, then traversing to the directory "home", which contains the directory "kim", which contains the directory "ops102", which contains the file "presentation.pdf".
Similarly, the Windows pathname
\Users\kim\ops102\presentation.pdf
indicates that the file presentation.pdf can be found by starting at the root directory, then traversing to the directory "Users", which contains the directory "kim", which contains the directory "ops102", which contains the file "presentation.pdf".
Absolute pathnames can be readily identified by the fact that they start with the slash/backslash character. They are often the longest form of the pathname, but they are unambiguous.
Relative-to-Home Pathnames
On Linux (and other Unix-like operating systems), pathnames may be specified starting with the tilde ("~") character followed by a slash, which represents the current user's home directory. This is a directory assigned by the system administrator which contains all of the user's personal files. The home directory is usually (but not always) /home/username.
For example, if the current user's home directory is /home/kim, then the filename
~/ops102/presentation.pdf
corresponds to the absolute pathname
/home/kim/ops102/presentation.pdf
For any file in the user's home directory, a relative-to-home pathname is generally shorter than an absolute pathname. However, a relative-to-home pathname will have a different meaning for other users, since each user has a unique home directory.
You can also specify the user whose home directory is to be used as the starting point by placing a user ID between the tilde and slash characters. Thus the pathname
~kim/ops102/presentation.pdf
is relative to the home directory of the user "kim", regardless of which user is currently logged in, while the pathname
~sam/ops102/presentation.pdf
is relative to the home directory of the user "sam".
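The shell performs this substitution before the command runs, using the HOME variable for a plain tilde. In the sketch below, HOME is set by hand to the example value /home/kim purely so that the result is predictable; normally the system sets it at login. Note that a quoted tilde is not expanded.

```shell
# Set HOME to the example value from the text so the result is predictable;
# normally the system sets this variable at login.
HOME=/home/kim

expanded=~/ops102/presentation.pdf      # unquoted: the shell expands the tilde
literal="~/ops102/presentation.pdf"     # quoted: the tilde is left as-is

echo "$expanded"
echo "$literal"
```

The first line printed is /home/kim/ops102/presentation.pdf, while the second is still ~/ops102/presentation.pdf.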
Relative Pathnames
Any pathname that does not start with a slash/backslash or a tilde character is a relative pathname, which is interpreted as starting at the current directory.
If the current directory is /home/kim/ops102, then the Linux pathname
presentation.pdf
is interpreted as
/home/kim/ops102/presentation.pdf
The symbol ".." means the parent directory. Assuming the same current directory as above (/home/kim/ops102), the Linux pathname
../Downloads/example.txt
is interpreted as
/home/kim/Downloads/example.txt
In the same way, the symbol "." is interpreted as referring to the current directory, so
./test.odt
is the same as
test.odt
and both refer to
/home/kim/ops102/test.odt
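The way "." and ".." are resolved can be checked with cd and pwd. The sketch below builds a throwaway copy of the example directory tree under a temporary directory, so the variable root stands in for the real root directory; that substitution is an assumption made only to keep the example safe to run.

```shell
# Build a throwaway copy of the example tree; "$root" stands in for "/".
root=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$root/home/kim/ops102" "$root/home/kim/Downloads"

# Each $( ... ) runs in a subshell, so the cd does not affect later commands.
current=$(cd "$root/home/kim/ops102" && cd . && pwd)             # "." : same directory
parent=$(cd "$root/home/kim/ops102" && cd .. && pwd)             # "..": parent directory
sibling=$(cd "$root/home/kim/ops102" && cd ../Downloads && pwd)

# Strip the temporary prefix so the output matches the pathnames in the text.
echo "${current#"$root"}"
echo "${parent#"$root"}"
echo "${sibling#"$root"}"
rm -rf "$root"
```

The three lines printed are /home/kim/ops102, /home/kim, and /home/kim/Downloads.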
Likewise, if the current directory on a Windows system is \Users\kim, then the pathname
ops102\presentation.pdf
refers to the absolute pathname
\Users\kim\ops102\presentation.pdf
And if the current directory is instead \Users\kim\ops102, the relative pathname
..\Downloads\example.txt
refers to the absolute pathname
\Users\kim\Downloads\example.txt
Relative pathnames are often the shortest form of pathname, but the meaning of a relative pathname changes based on the current working directory.
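The three pathname types described in this section can be told apart by their first character. The following sketch applies that rule to Linux-style pathnames only; classify_path is a name invented for this example, not a standard command.

```shell
# classify_path is a name invented for this example.
classify_path() {
    case $1 in
        /*)   echo "absolute" ;;
        "~"*) echo "relative-to-home" ;;
        *)    echo "relative" ;;
    esac
}

classify_path "/home/kim/ops102/presentation.pdf"
classify_path "~/ops102/presentation.pdf"      # quoted so the shell leaves the tilde alone
classify_path "../Downloads/example.txt"
```

The three calls print "absolute", "relative-to-home", and "relative" respectively.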