How data is stored in the repository

For most purposes it isn't important how cvsnt stores information in the repository. In fact, the format has changed in the past, and is likely to change in the future. Since in almost all cases one accesses the repository via cvsnt commands, such changes need not be disruptive.

However, in some cases it may be necessary to understand how cvsnt stores data in the repository. For example, you might need to track down cvsnt locks (the section called “Several developers simultaneously attempting to run CVS”) or you might need to deal with the file permissions appropriate for the repository.

Where files are stored within the repository

The overall structure of the repository is a directory tree corresponding to the directories in the working directory. For example, supposing the repository is in

/usr/local/cvsroot

here is a possible directory tree (showing only the directories):

/usr
 |
 +--local
 |   |
 |   +--cvsroot
 |   |    |
 |   |    +--CVSROOT
          |      (administrative files)
          |
          +--gnu
          |   |
          |   +--diff
          |   |   (source code to gnu diff)
          |   |
          |   +--rcs
          |   |   (source code to rcs)
          |   |
          |   +--cvsnt
          |       (source code to cvsnt)
          |
          +--yoyodyne
              |
              +--tc
              |    |
              |    +--man
              |    |
              |    +--testing
              |
              +--(other Yoyodyne software)

With the directories are history files for each file under version control. The name of the history file is the name of the corresponding file with ,v appended to the end. Here is what the repository for the yoyodyne/tc directory might look like:

  $CVSROOT
    |
    +--yoyodyne
    |   |
    |   +--tc
    |   |   |
            +--Makefile,v
            +--backend.c,v
            +--driver.c,v
            +--frontend.c,v
            +--parser.c,v
            +--man
            |    |
            |    +--tc.1,v
            |
            +--testing
                 |
                 +--testpgm.t,v
                 +--test2.t,v
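The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name history_file_path is hypothetical and not part of cvsnt, and the sketch ignores the Attic directory that older repositories may use.

```python
from pathlib import PurePosixPath


def history_file_path(cvsroot: str, relative_file: str) -> str:
    """Return the repository path of the history file for a versioned file.

    The repository mirrors the directory layout of the working directory,
    and the history file is the file name with ",v" appended.
    """
    rel = PurePosixPath(relative_file)
    return str(PurePosixPath(cvsroot) / rel.parent / (rel.name + ",v"))


if __name__ == "__main__":
    # Prints /usr/local/cvsroot/yoyodyne/tc/man/tc.1,v
    print(history_file_path("/usr/local/cvsroot", "yoyodyne/tc/man/tc.1"))
```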

The history files contain, among other things, enough information to recreate any revision of the file, a log of all commit messages and the user-name of the person who committed the revision. The history files are known as rcs files, because the first program to store files in that format was a version control system known as rcs. For a full description of the file format, see the man page [rcsfile(5)], distributed with rcs, or the file doc/rcsfile in the cvsnt source distribution. This file format has become very common--many systems other than cvsnt or rcs can at least import history files in this format.

The rcs files used in cvs and cvsnt differ in a few ways from the standard format. The biggest difference in cvs is magic branches; for more information see the section called “Magic branch numbers”. Also in cvsnt the valid tag names are a subset of what rcs accepts; for cvsnt's rules see the section called “Tags-Symbolic revisions”. cvsnt also adds binary diffs and mergepoints. Future versions of cvsnt may introduce still further changes, so it is unwise to try to read (or write to) the repository with rcs. cvsnt provides some rcs 'lookalike' commands for accessing the repository files.

File permissions

All ,v files are created read-only, and you should not change the permission of those files. The directories inside the repository should be writable by the persons that have permission to modify the files in each directory. On Unix, this normally means that you must create a group (see group(5)) consisting of the persons that are to edit the files in a project, and set up the repository so that it is that group that owns the directory. On Windows, you must allow write access to the files for each user or group that is accessing the repository. If impersonation is not enabled, then the repository is always accessed as SYSTEM.

This means that you can only control access to files on a per-directory basis using the operating system (however, see the chacl and lsacl commands for a way to do this within cvsnt itself).

Note that users must also have write access to check out files, because cvsnt needs to create lock files (the section called “Several developers simultaneously attempting to run CVS”).

Also note that users must have write access to the CVSROOT/val-tags file. cvsnt uses it to keep track of what tags are valid tag names (it is sometimes updated when tags are used, as well as when they are created).

Normally each rcs file will be owned by the user who last checked it in. This has little significance; what really matters is who owns the directories. See also the section called “Running CVSNT as a nonprivileged user”.

cvsnt tries to set up reasonable file permissions for new directories that are added inside the tree, but you must fix the permissions manually when a new directory should have different permissions than its parent directory. If you set the CVSUMASK environment variable, it controls the file permissions which cvsnt uses when creating directories and/or files in the repository. CVSUMASK does not affect the file permissions in the working directory; such files have the permissions which are typical for newly created files, except that sometimes cvsnt creates them read-only (see the sections on watches, the section called “Setting up cooperative edits”; -r, the section called “Global options”; or CVSREAD, Appendix C, All environment variables which affect CVS).
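As a worked example of the arithmetic, the following Python sketch assumes that CVSUMASK is interpreted like an ordinary Unix umask, so that each bit set in the mask is cleared from the requested mode. The mask value 007 and the requested modes (777 for a directory, 444 for a read-only ,v file) are illustrative assumptions, not values taken from cvsnt.

```python
import stat


def apply_umask(requested_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested_mode & ~umask & 0o777


cvsumask = 0o007  # hypothetical CVSUMASK value: no access for "other"

dir_mode = apply_umask(0o777, cvsumask)   # assumed request for a directory
file_mode = apply_umask(0o444, cvsumask)  # assumed request for a ,v file

print(stat.filemode(stat.S_IFDIR | dir_mode))   # drwxrwx---
print(stat.filemode(stat.S_IFREG | file_mode))  # -r--r-----
```

With this mask, members of the owning group keep full access to repository directories, while users outside the group get none.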

Note that when using client/server cvsnt (the section called “Remote repositories”), there is no good way to set CVSUMASK; the setting on the client machine has no effect. If you are connecting with ssh, you can set CVSUMASK in .bashrc or .cshrc, as described in the documentation for your operating system. This behavior might change in future versions of cvsnt; do not rely on the setting of CVSUMASK on the client having no effect.

Under Windows NT, because of the way directory permissions work on that platform, setting CVSUMASK will have no effect.

When using remote repositories, you will generally need stricter permissions on the cvsroot directory and directories above it in the tree; see the section called “Security considerations with password authentication”.

Some operating systems have features which allow a particular program to run with the ability to perform operations which the caller of the program could not, for example the set user ID (setuid) or set group ID (setgid) features of unix or the installed image feature of VMS. cvsnt was not written to use such features, and therefore attempting to install cvsnt in this fashion will provide protection against only accidental lapses; anyone who is trying to circumvent the measure will be able to do so, and depending on how you have set it up may gain access to more than just cvsnt. You may wish to instead consider pserver or sserver. They share some of the same attributes, in terms of possibly providing a false sense of security or opening security holes wider than the ones you are trying to fix, so read the documentation on pserver security carefully if you are considering this option (the section called “Security considerations with password authentication”).

The attic

The attic was used in older versions of cvs to store files in the branches. Its use has been deprecated since cvsnt 2.0.15, and cvsnt no longer stores files in the Attic. It will, however, read files that have been stored in the Attic by previous versions of cvs.

It should not matter from a user point of view whether a file is in the attic; cvsnt keeps track of this and looks in the attic when it needs to. But in case you want to know, the rule was that the rcs file is stored in the attic if and only if the head revision on the trunk has state dead. A dead state means that the file has been removed, or never added, for that revision. For example, if you add a file on a branch, it will have a trunk revision in dead state, and a branch revision in a non-dead state.
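The old placement rule, and the lookup a reading tool needs as a consequence, can be sketched as follows. The function names are hypothetical, and the caller is assumed to know already whether the trunk head revision is dead; parsing the rcs file to find that out is not shown.

```python
import os


def expected_rcs_path(repo_dir, filename, trunk_head_is_dead):
    """Apply the old rule: in the Attic if and only if the trunk head is dead."""
    history_name = filename + ",v"
    if trunk_head_is_dead:
        return os.path.join(repo_dir, "Attic", history_name)
    return os.path.join(repo_dir, history_name)


def find_rcs_file(repo_dir, filename):
    """Look in the directory itself first, then fall back to its Attic.

    Returns the path of the history file, or None if neither exists.
    """
    for trunk_head_is_dead in (False, True):
        candidate = expected_rcs_path(repo_dir, filename, trunk_head_is_dead)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A tool that only reads the repository can call find_rcs_file and does not need to care which of the two locations was used.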

The CVS directory in the repository

The CVS directory in each repository directory contains information such as file attributes (in a file called CVS/fileattr.xml). In the future additional files may be added to this directory, so implementations should silently ignore additional files.

The format of the fileattr.xml file is a series of XML entries describing the edit state of each file, and any access permissions that are current.

CVS locks in the repository

For an introduction to cvsnt locks focusing on user-visible behavior, see the section called “Several developers simultaneously attempting to run CVS”. The following section is aimed at people who are writing tools which want to access a cvsnt repository without interfering with other tools accessing the same repository. If you find yourself confused by concepts described here, like read lock, write lock, and deadlock, you might consult the literature on operating systems or databases.

cvsnt now uses the LockServer to handle lock concurrency in a dynamic way (see the section called “The CVSNT lockserver”). The following section refers to the obsolete filesystem lock method, which may still be in use on some sites.

Any file in the repository with a name starting with #cvs.rfl. is a read lock. Any file in the repository with a name starting with #cvs.wfl is a write lock. Old versions of cvsnt (before cvsnt 1.5) also created files with names starting with #cvs.tfl, but they are not discussed here. The directory #cvs.lock serves as a master lock. That is, one must obtain this lock first before creating any of the other locks.

To obtain a readlock, first create the #cvs.lock directory. This operation must be atomic (which should be true for creating a directory under most operating systems). If it fails because the directory already existed, wait for a while and try again. After obtaining the #cvs.lock lock, create a file whose name is #cvs.rfl. followed by information of your choice (for example, hostname and process identification number). Then remove the #cvs.lock directory to release the master lock. Then proceed with reading the repository. When you are done, remove the #cvs.rfl file to release the read lock.
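The readlock steps above can be sketched in Python as follows. This is a minimal illustration, not cvsnt's own code: the function names, the retry parameters, and the hostname.pid suffix are assumptions made for the example, and it relies on os.mkdir being atomic on the platform in use.

```python
import os
import socket
import time

MASTER_LOCK = "#cvs.lock"


def acquire_master_lock(directory, retry_delay=1.0, max_tries=30):
    """Create the #cvs.lock directory, waiting and retrying while it exists."""
    lock_dir = os.path.join(directory, MASTER_LOCK)
    for _ in range(max_tries):
        try:
            os.mkdir(lock_dir)  # fails if another process holds the lock
            return lock_dir
        except FileExistsError:
            time.sleep(retry_delay)
    raise TimeoutError("could not obtain %s" % lock_dir)


def acquire_read_lock(directory, retry_delay=1.0, max_tries=30):
    """Return the path of a newly created #cvs.rfl. file in directory."""
    master = acquire_master_lock(directory, retry_delay, max_tries)
    try:
        # "Information of your choice": hostname and process id here.
        name = "#cvs.rfl.%s.%d" % (socket.gethostname(), os.getpid())
        read_lock = os.path.join(directory, name)
        with open(read_lock, "x"):
            pass
    finally:
        os.rmdir(master)  # release the master lock before reading
    return read_lock


def release_read_lock(read_lock):
    """Remove the #cvs.rfl. file once reading is finished."""
    os.remove(read_lock)
```

A reader would call acquire_read_lock on the repository directory, read the files it needs, and then pass the returned path to release_read_lock.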

To obtain a writelock, first create the #cvs.lock directory, as with a readlock. Then check that there are no files whose names start with #cvs.rfl.. If there are, remove #cvs.lock, wait for a while, and try again. If there are no readers, then create a file whose name is #cvs.wfl followed by information of your choice (for example, hostname and process identification number). Hang on to the #cvs.lock lock. Proceed with writing the repository. When you are done, first remove the #cvs.wfl file and then the #cvs.lock directory. Note that unlike the #cvs.rfl file, the #cvs.wfl file is just informational; it has no effect on the locking operation beyond what is provided by holding on to the #cvs.lock lock itself.
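A corresponding sketch for the writelock follows. As before, the function names, retry parameters, and the hostname.pid suffix are assumptions for illustration only. Note that the #cvs.lock directory is kept until the write is finished.

```python
import os
import socket
import time

MASTER_LOCK = "#cvs.lock"


def acquire_write_lock(directory, retry_delay=1.0, max_tries=30):
    """Take #cvs.lock, make sure there are no readers, create #cvs.wfl."""
    lock_dir = os.path.join(directory, MASTER_LOCK)
    for _ in range(max_tries):
        try:
            os.mkdir(lock_dir)
        except FileExistsError:
            time.sleep(retry_delay)  # someone else holds the master lock
            continue
        if any(n.startswith("#cvs.rfl.") for n in os.listdir(directory)):
            os.rmdir(lock_dir)       # readers present: back off and retry
            time.sleep(retry_delay)
            continue
        name = "#cvs.wfl.%s.%d" % (socket.gethostname(), os.getpid())
        write_lock = os.path.join(directory, name)
        try:
            with open(write_lock, "x"):
                pass                 # informational file only
        except OSError:
            os.rmdir(lock_dir)       # do not leave the master lock behind
            raise
        return write_lock            # #cvs.lock is still held here
    raise TimeoutError("could not write-lock %s" % directory)


def release_write_lock(directory, write_lock):
    """Remove the #cvs.wfl file first, then the #cvs.lock directory."""
    os.remove(write_lock)
    os.rmdir(os.path.join(directory, MASTER_LOCK))
```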

Note that each lock (writelock or readlock) only locks a single directory in the repository, including Attic and CVS but not including subdirectories which represent other directories under version control. To lock an entire tree, you need to lock each directory (note that if you fail to obtain any lock you need, you must release the whole tree before waiting and trying again, to avoid deadlocks).
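The all-or-nothing rule for locking a tree can be sketched like this. The function names and retry parameters are hypothetical, and for brevity the sketch takes only the #cvs.lock master lock in each directory (after checking for readers) and omits the informational #cvs.wfl file.

```python
import os
import time

MASTER_LOCK = "#cvs.lock"
# Attic and CVS are covered by the lock on their parent directory.
NOT_LOCKED_SEPARATELY = {"Attic", "CVS", MASTER_LOCK}


def directories_to_lock(root):
    """List root and every subdirectory that needs its own lock."""
    result = []
    for dirpath, dirnames, _ in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames
                             if d not in NOT_LOCKED_SEPARATELY)
        result.append(dirpath)
    return result


def try_write_lock(directory):
    """Make one attempt to lock a directory; never wait while holding locks."""
    lock_dir = os.path.join(directory, MASTER_LOCK)
    try:
        os.mkdir(lock_dir)
    except FileExistsError:
        return False
    if any(n.startswith("#cvs.rfl.") for n in os.listdir(directory)):
        os.rmdir(lock_dir)
        return False
    return True


def unlock_tree(held):
    """Release the locks in the reverse order of acquisition."""
    for directory in reversed(held):
        os.rmdir(os.path.join(directory, MASTER_LOCK))


def lock_tree(root, retry_delay=1.0, max_tries=30):
    """Lock every directory, or release everything and try again."""
    directories = directories_to_lock(root)
    for _ in range(max_tries):
        held = []
        for directory in directories:
            if not try_write_lock(directory):
                break
            held.append(directory)
        else:
            return held              # every directory is locked
        unlock_tree(held)            # release the whole tree before waiting
        time.sleep(retry_delay)
    raise TimeoutError("could not lock tree %s" % root)
```

Releasing every lock before sleeping is what prevents two processes, each holding part of the tree, from waiting on each other indefinitely.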

Note also that cvsnt expects writelocks to control access to individual foo,v files. rcs has a scheme where the ,foo, file serves as a lock, but cvsnt does not implement it and so taking out a cvsnt writelock is recommended. See the comments at rcs_internal_lockfile in the cvsnt source code for further discussion/rationale.

How files are stored in the CVSROOT directory

The $CVSROOT/CVSROOT directory contains the various administrative files. In some ways this directory is just like any other directory in the repository; it contains rcs files whose names end in ,v, and many of the cvsnt commands operate on it the same way. However, there are a few differences.

For each administrative file, in addition to the rcs file, there is also a checked out copy of the file. For example, there is an rcs file loginfo,v and a file loginfo which contains the latest revision contained in loginfo,v. When you check in an administrative file, cvsnt should print

cvs commit: Rebuilding administrative file database

and update the checked out copy in $CVSROOT/CVSROOT. If it does not, there is something wrong (Appendix G, Dealing with bugs or getting help). To add your own files to the files to be updated in this fashion, you can add them to the checkoutlist administrative file (the section called “The checkoutlist file”).

By default, the modules file behaves as described above. If the modules file is very large, storing it as a flat text file may make looking up modules slow (I'm not sure whether this is as much of a concern now as when cvsnt first evolved this feature; I haven't seen benchmarks). Therefore, by making appropriate edits to the cvsnt source code one can store the modules file in a database which implements the ndbm interface, such as Berkeley db or GDBM. If this option is in use, then the modules database will be stored in the files modules.db, modules.pag, and/or modules.dir.

For information on the meaning of the various administrative files, see Appendix B, Reference manual for Administrative files.