Industrial Training

File Management

In this section the following notions are discussed:
1. Purpose of a File Management
2. File names, naming conventions
3. File allocation on storage media
4. Compaction

Purpose of a File Management

The file manager handles all files on secondary storage media. To perform these tasks, file management must:

1.0be able to identify the numerous files by giving unique names to them
2.0maintain a list telling where exactly each file is stored, how many sectors on the medium it occupies, and in which order those sectors make up the file
3. provide simple and fast algorithms to read and write files in cooperation with the device manager
4. give and deny access rights on files to users and programs
5. allocate and deallocate files to processes in cooperation with the process manager
6. provide users and programs with simple commands for file handling

File names, naming conventions

In order to make users, programs and the file manager itself able to identify the different files they must be given a unique file name.

The relative file name is what a user normally recognises as file name; it consists of a name and an extension, for instance problem.txt or forloop.cpp. Apart from some exceptions, relative file names look the same in all operating systems.

The name is normally given by the user, whereas the extension (which is separated from the name by a dot) generally indicates what kind of file it is.

common file extensions

The absolute file name is normally much longer than the user thinks it is. Here, the relative file name is preceeded by the place on disk it is stored, that is: the drive name and the directory names in which to find that file.

1. So the absolute file name consists of:
2. drive name
3. directory name(s)
4. file name
5. extension

File name and extension are separated by a dot. The directories are separated by slashes (UNIX) or back slashes (Windows, DOS). Because drive names and file organization differ from OS to OS, absolute file names look different depending on what operating system is used.

For instance, a file with the relative name syllabus.doc, saved by the user Peter in the directory data would look like that

in DOS: c:\data\syllabus.doc

in LINUX: /usr/home/Peter/data/syllabus.doc

Note that the absolute file name changes when the location is different. The relative file name, however, stays the same. So, after saving that file on a floppy disk, the absolute file name of the backup would be

in DOS: a:\syllabus.doc

in LINUX: /mnt/fdd0/syllabus.doc

A relative file name is restricted in length. How this restriction exactly looks like again depends on the OS. DOS has the hardest restictions, allowing the file name and also all directory names only to be 8 characters long, and the extension 3. This is properly known as "8.3"-restiction (speak: eight-dot-three). All other OS's allow the relative file name to be at least 14, but most often up to 255 characters long.

File allocation on storage media

On the storage medium a file is saven in blocks (sectors) of equal size. To access these files, device manager and file manager work together: The device manager "knows" where to find each sector on disk, but only the file manager has a list telling in what sectors either file is stored. This list is the File Allocation Table (FAT)

There are different ways of allocating files. The main concern is to provide a strategy that lets the FAT not grow too large, that makes it possible to retrieve a special sector of a file, and that wastes not too much storage space.

1. contiguous file allocation
2. non-contiguous file allocation (FAT)
3. chained allocation
4. indexed allocation

Contiguous file allocation
With contiguous file allocation a single set of blocks is allocated to a file at the time of file creation. Each file is stored contiguously, one sector after another.

The advantage is that the FAT only has to have a single entry for each file, indicating the name, the start sector, and the length. Moreover, it is easy to get a single block because its address can simply be calculated: If a file starts at sector c, and the nth block is wanted, the location on secondary storage is simply c+n.

The disadvantage is that it may be difficult (if not impossible) to find a sufficiently large set of contiguous blocks. From time to time it will be neccessary to perform compaction.

Contiguous file allocation is nowadays only used for tapes and recordable CDs. One does not make use of compaction algorithms, though, because data there is not supposed to be changed. It is rather overwritten/thrown away if no longer needed.

Non-contiguous file allocation (FAT)
With non-contiguous file allocation all blocks of a file can be distributed all over the storage medium. The File Allocation Table (FAT) lists not only all files, but has an entry for each sector the file occupies. Because all information is stored in the FAT, and no assumption on the distribution of the file is taken, this method of allocation is sometimes simply called FAT.

The advantage is that it is very easy to get a single block, because each block has its entry in the FAT. Additionally, it is a very simple allocation method where not much overhead is produced and no sophisticated search method for free blocks is needed.

The disadvantage is that the FAT can and will grow to an enormous size (imagine that a file of 1MB size must have 2000 entries in the FAT if each sector stores 512 Bytes of data). That slows the system down. Compaction will be needed from time to time.

FAT has been in use under DOS for a long time, and some alternations of it are still used by Win95 and Win98.

Chained allocation
With chained file allocation only the first blocks of either file gets an entry in the FAT, and this first sector has got a pointer at its end that points to the next sector of it (or indicates that it was the last).

The advantage is again that the FAT only has to have a single entry for each file, indicating file name and position of the first sector. The files do not have to be stored contiguously.

The disadvantage is it takes very long to retrieve a single block because that information is neither stored nor can it be calculated. If a special sector is needed, all preceeding sectors have to be read, all the time in order to get information about where the next block is located.

Indexed allocation With indexed file allocation also only the first blocks of either file get an entry in the FAT. In this first sector, however, no data is stored but only pointers to where the file is on storage medium. That is why the first block is called the index block.

Here as well the FAT only has to have a single entry for each file, indicating file name and position of the first sector. Additionally, it is easy to retrieve a single block because the information about where it is stored is saved in the first block.

The disadvantage is that for each file an additional sector is needed. Even a very small file always occupies at least two blocks, where the data would easy fit in one. So some of the secondary storage space is wasted.

Indexed allocation is (in minor variations) implemented in all UNIXes. It is fast and reliable, and nowadays the waste of storage space does not matter so much anymore.

Hi I am Pluto.