PathName

Overview

In Common Lisp a pathname is an object that represents a file or directory in the file system. It has several components (discussed in more detail below) that separately hold certain information such as a file's name, its path in the hierarchical file system, etc. In Main.CLforJava, the pathname type has been expanded to handle not merely entities in a file system but any resource that can be identified by a Uniform Resource Identifier (URI). In conjunction with this expansion of a pathname's functionality, Main.CLforJava has also expanded the Common Lisp concept of streams to support opening connections to remote resources identified by a URI.

File System Concepts

  • Only physical pathnames are considered here; logical pathnames are ignored for the time being.

  • A namestring is a string that represents a filename. In general, the syntax of namestrings involves the use of implementation-defined conventions, usually those customary for the file system in which the named file resides.

  • Pathnames are structured objects that can represent, in an implementation-independent way, the filenames that are used natively by an underlying file system. In addition, pathnames can also represent certain partially composed filenames for which an underlying file system might not have a specific namestring representation.

  • Many file systems permit more than one filename to designate a particular file. Even where multiple names are possible, most file systems have a convention for generating a canonical filename in such situations. Such a canonical filename (or the pathname representing such a filename) is called a truename.

  • The truename of a file may differ from other filenames for the file because of symbolic links, version numbers, logical device translations in the file system, logical pathname translations within Common Lisp, or other artifacts of the file system.

  • The truename for a file is often, but not necessarily, unique for each file. For instance, a Unix file with multiple hard links could have several truenames. (See here for some specific examples of truenames.)

Glossary of Relevant Lisp terms

*default-pathname-defaults*
A global variable holding a default pathname whose components are filled in a system-dependent manner, usually from the current working directory. Several pathname operations use it to fill in missing components of a pathname. See Specs for more details.

file
n. a named entry in a file system, having an implementation-defined nature.

file stream
n. an object of type file-stream. see FileStream

file system
n. a facility which permits aggregations of data to be stored in named files on some medium that is external to the Lisp image and that therefore persists from session to session. see FileSystem

filename
n. a handle, not necessarily ever directly represented as an object, that can be used to refer to a file in a file system. Pathnames and namestrings are two kinds of objects that substitute for filenames in Common Lisp.

namestring
n. a string that represents a filename using either the standardized notation for naming logical pathnames described in Section 19.3.1 (Syntax of Logical Pathname Namestrings), or some implementation-defined notation for naming a physical pathname.

pathname
n. an object of type pathname, which is a structured representation of the name of a file. A pathname has six components: a host, a device, a directory, a name, a type, and a version.

pathname designator
n. a designator for a pathname; that is, an object that denotes a pathname and that is one of: a pathname namestring (denoting the corresponding pathname), a stream associated with a file (denoting the pathname used to open the file; this may be, but is not required to be, the actual name of the file), or a pathname (denoting itself). See Section 21.1.1.1.2 (Open and Closed Streams).

truename
  1. n. the canonical filename of a file in the file system. See Section 20.1.3 (Truenames).
  2. a pathname representing a truename.

wild
  1. adj. (of a namestring) using an implementation-defined syntax for naming files, which might "match" any of possibly several possible filenames, and which can therefore be used to refer to the aggregate of the files named by those filenames.
  2. (of a pathname) a structured representation of a name which might "match" any of possibly several pathnames, and which can therefore be used to refer to the aggregate of the files named by those pathnames. The set of wild pathnames includes, but is not restricted to, pathnames which have a component which is :wild, or which have a directory component which contains :wild or :wild-inferiors.
  • See the function wild-pathname-p.

References

Main.CLforJava Forum

HyperSpec

Java Classes

Core Java Classes Javadoc links
java.io.File Javadoc link
java.net.URI Javadoc link
Apache FilenameUtils Javadoc link

Abstract Factory Patterns

Uniform Resource Identifier (URI) Syntax

Implementation

For the Lisp Programmer

The Main.CLforJava implementation of pathnames differs somewhat from that of Common Lisp. Main.CLforJava pathnames handle not only traditional file system pathnames, but also Uniform Resource Identifiers (URIs). Where the Lisp programmer finds pathname related functionality to differ from that of Common Lisp, it is usually a result of accommodating URIs.

For the most part, standard file based pathnames can be created just as in other Lisp implementations. The two functions for doing so are PATHNAME and MAKE-PATHNAME. PATHNAME takes a string or stream argument. MAKE-PATHNAME allows the caller to specify individual components of a pathname as keyword-value pairs. The general differences found in Main.CLforJava pathnames are discussed in this section. Please be sure to read the sections that follow for more specific information regarding pathnames, their components, and the functions that work with them.

General Concepts

The version component of a pathname will always be the keyword :UNSPECIFIC. The device component will hold a keyword indicating a URI scheme. Even where a standard file based pathname is created, the device component will hold the value :FILE. Where something like a drive letter would be expected in the device component, it is moved to the directory component. The host component will typically be NIL for a file based pathname, but read on for information about URIs.

There is no way to distinguish between a relative URI and a relative file-based pathname. Therefore, all relative pathnames are considered to be file-based, that is, their device component is :FILE. This poses no problems, since a relative URI cannot really be used for anything until it has been merged with an absolute URI (which specifies the scheme and perhaps other necessary information). Therefore, a file-based relative pathname can always be merged with a non-opaque, absolute URI using MERGE-PATHNAMES.
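The effect of merging a relative pathname with an absolute URI can be approximated with plain java.net.URI. The following is only a sketch: the class and method names are made up for illustration, and it is an assumption that MERGE-PATHNAMES behaves like URI resolution here.

```java
import java.net.URI;

// Illustrative sketch only; MergeRelativeSketch and merge are hypothetical names,
// and java.net.URI stands in for a Main.CLforJava pathname.
class MergeRelativeSketch {
    // Resolve a relative reference against an absolute base URI.
    static String merge(String relative, String base) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Non-opaque base: the relative part is appended to the base path.
        System.out.println(merge("bar/file.htm", "http://somehost.com/foo/"));
        // Opaque base: java.net.URI cannot resolve against it and
        // simply returns the relative reference unchanged.
        System.out.println(merge("bar/file.htm", "mailto:user@cofc.edu"));
    }
}
```

The second call shows why the base must be non-opaque: an opaque URI has no path to resolve against.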

Main.CLforJava implements some additional functions for dealing with URIs (see Extensions to Lisp Standard below). Non-opaque URIs have an "authority" which contains, in one string, the user information, host, and port number. This entire string is placed in the pathname's host component; if the URI is opaque, the entire "scheme specific part" is put in the host component instead. The standard PATHNAME-HOST will retrieve this information. PATHNAME-AUTHORITY-HOST, PATHNAME-USER-INFO, and PATHNAME-PORT will retrieve the individual parts of the authority. Also, some URIs (specifically URLs) may have a query or fragment section. These become part of the directory component of a pathname. To retrieve them separately there are PATHNAME-QUERY and PATHNAME-FRAGMENT. Finally, there is URI-OPAQUE-P, which returns T or NIL indicating whether or not the URI represented by a pathname is opaque.

Using the PATHNAME function

Windows users need to remember to double each backslash in a string, since the backslash is also the escape character.

Under the hood, an attempt is made first to make a valid, absolute URI from the string argument. If this cannot be done, it is assumed to be a standard file based pathname. Since there can never be a backslash in a URI, a Windows style pathname will never get parsed as a URI. But ambiguity may occur with Linux-like file systems. For example, consider "C:/foo/file.txt". The "C:" is not a drive letter (a Windows concept), but a directory named "C:". This would be parsed as a URI with a scheme of "C". The issue here is the colon which indicates a URI scheme. A possible workaround for this would be to use MAKE-PATHNAME to create a pathname with the device explicitly set to :FILE.

Example 1

  • (setq p1 (pathname "C:\\foo\\bar\\file.txt")) => C:\foo\bar\file.txt
  • (pathname-device p1) => :FILE
  • (pathname-directory p1) => (:ABSOLUTE "C:" "foo" "bar")

Example 2

  • (setq p1 (pathname "http://somehost.com/foo/file.htm")) => http://somehost.com/foo/file.htm
  • (pathname-host p1) => "somehost.com"
  • (pathname-device p1) => :HTTP
  • (pathname-directory p1) => (:ABSOLUTE "foo")
  • (pathname-name p1) => "file"
  • (pathname-type p1) => "htm"
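The "URI first, file otherwise" decision described above can be sketched with java.net.URI. This is an assumption-laden illustration, not the Main.CLforJava source: the class name, the method name, and the returned strings are invented for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Illustrative sketch only; UriFirstSketch and deviceFor are hypothetical names.
class UriFirstSketch {
    // Returns the lower-cased scheme when the string parses as an
    // absolute URI, and "file" in every other case.
    static String deviceFor(String namestring) {
        try {
            URI uri = new URI(namestring);
            if (uri.isAbsolute()) {
                return uri.getScheme().toLowerCase(Locale.ROOT);
            }
        } catch (URISyntaxException e) {
            // Not a URI at all, e.g. because it contains a backslash.
        }
        return "file";
    }

    public static void main(String[] args) {
        System.out.println(deviceFor("C:\\foo\\bar\\file.txt"));             // file
        System.out.println(deviceFor("http://somehost.com/foo/file.htm"));   // http
        System.out.println(deviceFor("foo/file.txt"));                       // file
        // The ambiguous case: "C" is accepted as a URI scheme.
        System.out.println(deviceFor("C:/foo/file.txt"));                    // c
    }
}
```

The last call reproduces the ambiguity discussed above: with forward slashes, "C:" is indistinguishable from a URI scheme.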

Using the MAKE-PATHNAME function

This function works mostly as it does in other implementations. Remember, however, that the device component is always a keyword. MAKE-PATHNAME will accept a string as the value of this component, in which case the result is treated as a standard file based pathname: the device value is set to :FILE, and the string that was supplied as the device is moved to the directory component, just as the PATHNAME function does with a drive letter. If a URI is intended, a keyword representing the intended scheme must be supplied as the device component's value. For the programmer's convenience, several popular URI schemes are already interned as keywords for use in the device component. These include :HTTP, :HTTPS, :NEWS, :FILE, and :MAILTO. Otherwise, the programmer must first make a keyword from the intended scheme string (e.g. by interning it in the KEYWORD package with INTERN; note that MAKE-SYMBOL creates an uninterned symbol, not a keyword). This keyword may then be supplied as the device component's value.

One more subtle difference involves the optional :DEFAULTS keyword argument to MAKE-PATHNAME. In Main.CLforJava, no merging is done if this argument is absent. If it is specified, it must be explicitly given a pathname as its value; there is no default pathname for this purpose. The pathname created from the other arguments to MAKE-PATHNAME will be merged (using the MERGE-PATHNAMES function) with the pathname value of :DEFAULTS, and the new pathname will be returned.

Example 1

  • (setq p1 (make-pathname :device :http :host "clforjava.cofc.edu" :directory '(:absolute "foo" "bar"))) => http://clforjava.cofc.edu/foo/bar/
  • (pathname-device p1) => :HTTP
  • (pathname-host p1) => "clforjava.cofc.edu"
  • (pathname-name p1) => NIL
  • (pathname-type p1) => NIL
  • (pathname-version p1) => :UNSPECIFIC

Example 2

  • (setq p1 (make-pathname :device "C:" :directory '(:absolute "foo" "bar") :name "file" :type "txt")) => C:\foo\bar\file.txt
  • (pathname-device p1) => :FILE
  • (pathname-directory p1) => (:ABSOLUTE "C:" "foo" "bar")
  • (pathname-name p1) => "file"
  • (pathname-type p1) => "txt"
  • (pathname-version p1) => :UNSPECIFIC

Example 3

  • (setq p1 (make-pathname :name "file" :version 3)) => file
  • (pathname-version p1) => :UNSPECIFIC

For the Java Programmer

Pathnames will be implemented as immutable types. The lisp.common.type.Pathname interface contains a nested abstract factory class, which contains the logic to select an appropriate concrete factory based on whether the pathname to be created is file system based (a traditional Lisp pathname) or URI based. The abstract factory returns an instance of a concrete factory, and this instance is used to call a newInstance method. The logic which decides whether something is a URI or a file based pathname works well with Windows style pathnames (where the backslash precludes any possibility of it being a URI) but can lead to some ambiguous situations with Linux style pathnames. See For the Lisp Programmer for more information.

Creating a pathname within Java looks something like this:

  • // The abstract factory's getFactory parses arg1 to decide which
  • // concrete factory should be returned.
  • Pathname.Factory fact = Pathname.Factory.getFactory((String) arg1);
  • // The concrete factory's newInstance method creates the new pathname.
  • Pathname filePath = fact.newInstance((String) arg1);

Each newInstance method has one of the following signatures:

  • newInstance(lisp.common.type.String pathNamestring)
  • newInstance(java.lang.String pathNamestring)
  • newInstance(lisp.common.type.FileStream fStream)
  • newInstance(T host, T device, List directory, T name, T type, T version)
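The factory arrangement described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: SketchPathname, SimplePathname, FileFactory and UriFactory are stand-ins invented for this example, only the String form of newInstance is shown, and the real lisp.common.type.Pathname interface may differ in every detail.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Illustrative sketch only; all type names below are hypothetical.
class FactoryDemo {
    public static void main(String[] args) {
        String[] inputs = {"http://somehost.com/foo/file.htm", "C:\\foo\\bar\\file.txt"};
        for (String s : inputs) {
            SketchPathname p = SketchPathname.Factory.getFactory(s).newInstance(s);
            System.out.println(p.device() + " " + p.namestring());
        }
    }
}

// Stand-in for the pathname interface with its nested abstract factory.
interface SketchPathname {
    String device();
    String namestring();

    abstract class Factory {
        public abstract SketchPathname newInstance(String namestring);

        // Chooses a concrete factory: URI based if the string parses as
        // an absolute URI, file based otherwise.
        public static Factory getFactory(String namestring) {
            try {
                if (new URI(namestring).isAbsolute()) {
                    return new UriFactory();
                }
            } catch (URISyntaxException e) {
                // Not a URI; fall through to the file factory.
            }
            return new FileFactory();
        }
    }
}

// Immutable value holder shared by both concrete factories.
final class SimplePathname implements SketchPathname {
    private final String device;
    private final String namestring;

    SimplePathname(String device, String namestring) {
        this.device = device;
        this.namestring = namestring;
    }

    @Override public String device() { return device; }
    @Override public String namestring() { return namestring; }
}

final class FileFactory extends SketchPathname.Factory {
    @Override public SketchPathname newInstance(String namestring) {
        return new SimplePathname("FILE", namestring);
    }
}

final class UriFactory extends SketchPathname.Factory {
    @Override public SketchPathname newInstance(String namestring) {
        String scheme = URI.create(namestring).getScheme();
        return new SimplePathname(scheme.toUpperCase(Locale.ROOT), namestring);
    }
}
```

Callers only ever see the interface and the abstract factory, so further pathname kinds could be added without changing calling code.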

Lisp File System Features in Main.CLforJava

Types

| Lisp type | Java Implementation | Description | Status |
| pathname | Pathname.java | The Java interface for Lisp type pathname under package lisp.common.type. This will contain the abstract factory as discussed above as well as the required attributes and methods. All pathname implementations will implement this interface. | Implemented |
| pathname | PathnameImpl.java | An abstract class which defines the six standard pathname components (discussed in more detail below) and implements the Pathname interface. All specific implementations of pathnames will inherit from this. | Implemented |
| pathname | PathnameFileImpl.java | A concrete class which implements the standard file based pathname. | Implemented |
| pathname | PathnameURI.java | A concrete class which implements URI based pathnames. | Implemented |

Errors

| Lisp Error | Java Exception | Description | Status |
| file-error | FileErrorException | Thrown when file operations fail. | Implemented |
| type-error | IllegalArgumentException | Thrown when an argument to a function is not of the type expected. | Implemented |
Note: All thrown exceptions are to be wrapped in FunctionException

Global Variables

| Lisp variable | Description | Status |
| *default-pathname-defaults* | Holds some default pathname. At Main.CLforJava startup this gets set to the working directory (typically the directory in which Main.CLforJava was launched). | Implemented |

Pathname Creation

These next two functions are involved in creating pathnames in Lisp. The make-pathname function is straightforward as the caller specifies what the values of each component are in keyword-value pairs (see keywords below). However, the pathname function requires parsing the function argument into its separate components. This is further discussed in the section on pathname components below. The pathname function accepts either a string (both Java and Lisp) or a stream argument. In Common Lisp only a file stream is applicable, but CLforJava will eventually be able to handle URI connection streams as well.
| Lisp Functions | Java Class | Description | Status |
| pathname | Pathname | Returns a pathname denoted by its single argument. The argument can be a string or a stream. See also Using the PATHNAME function. | Implemented |
| make-pathname | MakePathname | Constructs and returns a pathname based on the component values passed. In addition to taking up to six keyword-value pairs, this may take the :case keyword, which affects the case in which the pathname is stored. It can also take a :defaults keyword with another pathname as its value. This value is merged with the components specified by the caller. See also Using the MAKE-PATHNAME function. | Implemented |

Convenience Functions For The Java Programmer

These two functions are not part of our Lisp implementation. They exist just as a convenience for the Java programmer and are methods which can be called on any instance of a pathname object. Their existence in any future pathname implementations is enforced by the Java pathname interface.
| Lisp Functions | Java Method | Description | Status |
| N/A | asFile | Returns an instance of java.io.File representing the pathname, or null if the pathname cannot be represented as a File instance. | Implemented |
| N/A | asURI | Returns an instance of java.net.URI representing the pathname. Since file names can always be represented as a URI with the file scheme, this should never return null. | Implemented |
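The conversions behind asFile and asURI can be approximated with the JDK alone. The sketch below is an assumption about how such conversions might behave, written as static helpers on a made-up class rather than as methods of the actual pathname classes.

```java
import java.io.File;
import java.net.URI;

// Illustrative sketch only; FileUriSketch is a hypothetical class.
class FileUriSketch {
    // Returns a File for file-scheme URIs, or null when the URI
    // cannot be represented as a File (e.g. an http URI).
    static File asFile(URI uri) {
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            return null;
        }
        try {
            return new File(uri);
        } catch (IllegalArgumentException e) {
            // java.io.File rejects file URIs with an authority, query or fragment.
            return null;
        }
    }

    // Every File can be expressed as a URI with the file scheme.
    static URI asURI(File file) {
        return file.toURI();
    }

    public static void main(String[] args) {
        URI fromFile = asURI(new File("file.txt"));
        System.out.println(fromFile.getScheme());
        System.out.println(asFile(fromFile));
        System.out.println(asFile(URI.create("http://somehost.com/foo/file.htm")));
    }
}
```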

Pathname Components

| Component | Description |
| host | For file based pathnames this will always be NIL unless explicitly supplied in a MAKE-PATHNAME call. In the event of a Windows-style share, like "\\computer\foo\file.txt", the "computer" part will be an element of the directory (the second element, right after the keyword that specifies if the path is relative or absolute). For URI based pathnames this will hold the URI's authority information, or the "scheme specific part" if the URI is opaque. This may lead to some unexpected results, but it is due to the parsing rules for URIs. If a URI is opaque, the scheme specific part will not be further parsed. So for "mailto:user@cofc.edu", which is opaque, the pathname's host component will hold "user@cofc.edu". That is, the username is not parsed separately from the network host. Again, this is dictated by the rules for URI parsing. Be careful not to confuse the URI's host component with the pathname's host component. The latter holds the entire authority, of which a URI host is only one part. |
| device | This will hold a URI scheme keyword such as :file, :mailto, :http, etc. These keywords are not part of the Lisp standard but are part of our effort to support URI based pathnames. The standard Lisp pathname will always have :file in this slot. |
| directory | This will be a list of the directory path indicated in the pathname, not including the file name, if any. The first item in the list must be a Symbol (either :absolute or :relative) and each subsequent list member, if any, will be an element in the directory path hierarchy. Basically, these items will be all the tokens that are not otherwise assigned to host, device, name, or type. Alternately, if a given pathname is URI based, the directory will contain either :absolute or :relative followed by the URI path, query, and fragment, if they exist. |
| name | If the pathname ends with a file separator character (one of the slashes) then there is no name; it gets NIL. Else, the name will be the last element of the pathname, excluding the type, if any (see the type component description below). This will be NIL (unless explicitly supplied in a make-pathname call) if the pathname is URI based. |
| type | If there is no name, there will also be no type; this gets NIL. Else, the name will be searched for a dot. If the dot is the first character in the name, it will be assumed to be part of the name (as in a typical hidden file). Otherwise, the right-most dot will be located (in case there is more than one dot in the name) and the substring to the right of this right-most dot will be designated as the type. This will be NIL (unless explicitly supplied in a make-pathname call) if the pathname is URI based. |
| version | The version concept is not meaningfully supported in today's major file systems. Therefore, this will always be :unspecific. |
Note: The mapping of these six components to specific file system concepts is implementation-defined.
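The name and type rules in the table above can be expressed as a short helper. This is a sketch of those rules as worded here, not the actual parser; the class and method names are invented, and the handling of a name that both starts with a dot and contains a later dot is an assumption.

```java
// Illustrative sketch only; NameTypeSketch and split are hypothetical names.
class NameTypeSketch {
    // Splits the last element of a pathname into {name, type}.
    // A null entry plays the role of NIL.
    static String[] split(String lastElement) {
        if (lastElement == null || lastElement.isEmpty()) {
            return new String[] {null, null};   // no name, hence no type
        }
        int dot = lastElement.lastIndexOf('.');
        if (dot <= 0) {
            // No dot at all, or the only dot is the leading one (hidden file).
            return new String[] {lastElement, null};
        }
        // Assumption: a later dot still separates the type even when
        // the name also begins with a dot.
        return new String[] {lastElement.substring(0, dot), lastElement.substring(dot + 1)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"file.txt", ".bashrc", "archive.tar.gz", "README"}) {
            String[] parts = split(s);
            System.out.println(s + " -> name=" + parts[0] + ", type=" + parts[1]);
        }
    }
}
```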

Pathname Component Returning Functions

These next six functions take a pathname argument and return the indicated component. See the section on pathname components above for more information. Each of these should throw a type-error if the first argument is not a pathname. All except pathname-version take the :case argument (see the Keyword Symbols section for more details).
| Lisp Functions | Java Class | Description | Status |
| pathname-host | PathnameHost | Returns the host or NIL. For URI based pathnames this will return the URI's authority component or "scheme specific part" for opaque URIs. It will be NIL for file based pathnames unless explicitly set using MAKE-PATHNAME. Be careful not to confuse the URI's host component with the pathname's host component. The latter holds the entire authority, of which a URI host is only one part. | Implemented |
| pathname-device | PathnameDevice | Returns the device or NIL. | Implemented |
| pathname-directory | PathnameDirectory | Returns a directory list or NIL. | Implemented |
| pathname-name | PathnameName | Returns the file name or NIL. | Implemented |
| pathname-type | PathnameType | Returns the file type or NIL. | Implemented |
| pathname-version | PathnameVersion | This will always return :unspecific. | Implemented |

Pathname to Namestring Functions

These next five functions convert pathnames to namestrings using the system-dependent form. Every pathname implementation (file, URI, etc.) has a toString() method which takes care of ensuring that the output is formatted properly (i.e. a URI has a different display form than a file).
| Lisp Functions | Java Class | Description | Status |
| namestring | Namestring | Returns a string representation of the pathname. | Implemented |
| file-namestring | FileNamestring | Calls the name getter and the type getter and concatenates the two for the return string. | Implemented |
| directory-namestring | DirectoryNamestring | Returns the directory list in a string format, complete with the proper separator characters. | Implemented |
| host-namestring | HostNamestring | Returns the host name component as a string. | Implemented |
| enough-namestring | EnoughNamestring | Returns an abbreviated namestring that is just sufficient to identify the file named by the submitted pathname, string, or file stream when considered relative to the supplied defaults. If no defaults are supplied, *default-pathname-defaults* is used. | Implemented |
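The idea behind enough-namestring, producing just the part of a name that the defaults do not already supply, resembles URI relativization. The sketch below uses java.net.URI.relativize as a stand-in; it is an analogy under that assumption, and the class and method names are made up.

```java
import java.net.URI;

// Illustrative sketch only; EnoughNamestringSketch and enough are hypothetical names.
class EnoughNamestringSketch {
    // Returns the shortest reference that still identifies `pathname`
    // when interpreted relative to `defaults`.
    static String enough(String pathname, String defaults) {
        return URI.create(defaults).relativize(URI.create(pathname)).toString();
    }

    public static void main(String[] args) {
        // Under the defaults: only the remainder is needed.
        System.out.println(enough("http://somehost.com/foo/bar/file.htm", "http://somehost.com/foo/"));
        // Not under the defaults: nothing can be abbreviated.
        System.out.println(enough("file:///etc/hosts", "file:///home/user/"));
    }
}
```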

File Operations

| Lisp Functions | Java Class | Description | Status |
| delete-file | DeleteFile | Deletes the file passed in. | Implemented |
| directory | Directory | Returns all files on the file system that match the pathspec argument. Can take wildcards. | Implemented |
| file-author | FileAuthor | Following many other Lisp implementations, this function is not meaningfully implemented. It always returns NIL. | Implemented |
| file-length | FileLength | Returns the length of a file. The unit of length is the byte. | Implemented |
| file-position | FilePosition | With a single argument (a stream), returns the current position in the stream. With two arguments, changes the position. Implementation details to be determined. | Implemented |
| file-write-date | FileWriteDate | Returns the date the file was created or last written to. | Implemented |
| load | Load | Loads the specified file into memory from secondary storage. | Implemented |
| open | Open | Creates, opens and returns a file stream. | Implemented partially |
| probe-file | ProbeFile | Returns a physical pathname if a file exists or NIL if it does not. | Implemented |
| rename-file | RenameFile | Renames a file and returns the new file name as a pathname if successful. | Implemented |
| with-open-file | WithOpenFile | This macro uses open to open a stream and perform an arbitrary number of actions on the stream as indicated in the body of the macro call. Implementation details to be determined. | Unimplemented |
| ensure-directories-exist | EnsureDirectoriesExist | Determines if the specified directory exists, and if not tries to create it. | Implemented |
| truename | Truename | Returns the canonical filename (physical pathname) indicated by its argument (a pathname, string, or file stream). | Implemented |
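The behaviour of probe-file and truename can be pictured with java.nio.file. The following is a sketch under the assumption that the canonical name corresponds to the real path with symbolic links resolved; the class and method names are invented and null plays the role of NIL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only; ProbeFileSketch and probeFile are hypothetical names.
class ProbeFileSketch {
    // Returns the canonical (real) path of an existing file or directory,
    // or null when nothing exists under that name.
    static String probeFile(String name) {
        Path path = Paths.get(name);
        if (!Files.exists(path)) {
            return null;
        }
        try {
            // toRealPath resolves symbolic links and relative segments.
            return path.toRealPath().toString();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(probeFile("."));
        System.out.println(probeFile("no-such-file-" + System.nanoTime() + ".tmp"));
    }
}
```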

Other Functions

| Lisp Functions | Java Class | Description | Status |
| pathnamep | PathnameP | Returns true if the object passed in is of type pathname, else false. This simply checks whether the argument is an instance of Pathname. | Implemented |
| parse-namestring | ParseNamestring | Takes a string, pathname, or file stream as its argument. If the argument is a pathname, the pathname is returned along with 0. If the argument is a file stream, the pathname associated with the file stream is returned along with 0. If the argument is a string, it is parsed from the given start position to the given end position, and the pathname resulting from that substring is returned along with the ending position in the string. | Implemented |
| wild-pathname-p | WildPathnameP | Tests a pathname for the presence of the symbol :wild or the * wild card character. | Implemented |
| pathname-match-p | PathnameMatchP | Returns true if the pathname matches a wild pathname passed in as an argument, otherwise NIL. | Implemented |
| translate-pathname | TranslatePathname | Translates a pathname designator that matches a supplied from-wildcard into a corresponding pathname that matches a supplied to-wildcard, and returns that pathname. See the spec for more details. | Implemented |
| merge-pathnames | MergePathnames | Merges two pathnames (the second one can be left out, in which case it defaults to the value of *default-pathname-defaults*) into one pathname. | Implemented |
| user-homedir-pathname | UserHomedirPathname | Returns a pathname object naming the user's home directory. | Implemented |
| file-error-pathname | FileErrorPathname | Takes a file-error as an argument and returns the offending pathname. Will have to be implemented when the error system for Lisp is constructed. | Unimplemented |
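Matching against "*" as the only wild card character, as wild-pathname-p and pathname-match-p need to do for string components, might look roughly like this. The sketch handles a single string element only, and its class and method names are made up for the example.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; WildSketch, wildP and matches are hypothetical names.
class WildSketch {
    // True if the element contains the wild card character.
    static boolean wildP(String element) {
        return element.contains("*");
    }

    // True if `candidate` matches `wild`, where "*" matches any run of
    // characters and everything else must match literally.
    static boolean matches(String wild, String candidate) {
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        for (String literal : wild.split("\\*", -1)) {
            if (!first) {
                regex.append(".*");            // one ".*" per "*" in the pattern
            }
            regex.append(Pattern.quote(literal));
            first = false;
        }
        return Pattern.matches(regex.toString(), candidate);
    }

    public static void main(String[] args) {
        System.out.println(wildP("*.txt"));
        System.out.println(matches("*.txt", "file.txt"));
        System.out.println(matches("foo*", "bar"));
    }
}
```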

Extensions to Lisp Standard

NOTE: All of the Java classes are in the Java package lisp.extensions.function.
| Lisp Functions | Java Class | Description | Status |
| current-directory | CurrentDirectory | Returns the current directory as a pathname. | Implemented |
| pathname-user-info | PathnameUserInfo | Parses the URI user information from a URI based pathname's host component. Returns a Lisp string or NIL if there is no user info defined. Note that per URI specifications (see References) this will not return a user name from an opaque URI. The URI "mailto:someuser@cofc.edu" is opaque, and the entire string "someuser@cofc.edu" is atomic per URI parsing rules and would be in the host component of the pathname as is. This function would return NIL if applied to that URI. However, in a non-opaque URI like "http://someuser:somepw@www.cofc.com/foo/index.htm" this function would return the string "someuser:somepw". | Implemented |
| pathname-port | PathnameURIPort | Parses the URI port information from a URI based pathname's host component. Returns an integer. A valid port will be a nonnegative number. If there is no defined port then -1 is returned. | Implemented |
| pathname-authority-host | PathnameAuthorityHost | Returns the URI's host component, which is a subpart of the authority component. Be careful not to confuse the URI's host component with the pathname's host component. The latter holds the entire authority, of which a URI host is only one part. Returns a Lisp string or NIL if there is no URI host defined. | Implemented |
| pathname-query | PathnameURIQuery | Parses the URI query information from a URI based pathname's path component. Returns a Lisp string or NIL if there is no query defined. | Implemented |
| pathname-fragment | PathnameURIFragment | Parses the URI fragment information from a URI based pathname's path component. Returns a Lisp string or NIL if there is no fragment defined. | Implemented |
| uri-opaque-p | URIOpaqueP | Determines if a given URI is opaque (which means it cannot be further parsed). Returns Boolean T or NIL. | Implemented |
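The values these extension functions are described as returning line up closely with the getters of java.net.URI. The sketch below only demonstrates those JDK getters; the class name, the hostComponent helper, and the port, query and fragment added to the example URI are assumptions made for illustration.

```java
import java.net.URI;

// Illustrative sketch only; UriPartsSketch and hostComponent are hypothetical names.
class UriPartsSketch {
    // What the pathname host component is described as holding:
    // the authority, or the scheme specific part for an opaque URI.
    static String hostComponent(String uri) {
        URI u = URI.create(uri);
        return u.isOpaque() ? u.getSchemeSpecificPart() : u.getAuthority();
    }

    public static void main(String[] args) {
        // Port, query and fragment were added to the example URI for this sketch.
        URI http = URI.create("http://someuser:somepw@www.cofc.com:8080/foo/index.htm?x=1#top");
        System.out.println(http.getAuthority());  // someuser:somepw@www.cofc.com:8080
        System.out.println(http.getUserInfo());   // someuser:somepw
        System.out.println(http.getHost());       // www.cofc.com
        System.out.println(http.getPort());       // 8080
        System.out.println(http.getQuery());      // x=1
        System.out.println(http.getFragment());   // top

        URI mail = URI.create("mailto:someuser@cofc.edu");
        System.out.println(mail.isOpaque());      // true
        System.out.println(mail.getUserInfo());   // null: opaque URIs are not parsed further
        System.out.println(mail.getPort());       // -1
        System.out.println(hostComponent("mailto:someuser@cofc.edu"));
    }
}
```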

Keyword Symbols

In this section we will discuss implementation decisions regarding several Lisp keywords that are relevant to pathnames. Keywords in Main.CLforJava are defined in lisp.common.type.Keyword.java.

Dealing with :case

:case has two possible values: :common or :local. These keywords function the same under both Linux and Windows. If the value is :local then the case of the pathname is not altered from the arguments given to pathname or make-pathname. If the value is :common then one of three things may happen to the case. If the case was originally mixed, then it will be unaltered. If the case was originally all upper or all lower, then it will be altered to be all the opposite case. These possible transformations are performed separately for each element of the pathname (e.g. if :case :common is used on the pathname "/foo/BaR/PIG" the result would be "/FOO/BaR/pig").
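The :common transformation can be written down in a few lines. This sketch follows the rule as stated above and reproduces its example; the class and method names are hypothetical, and splitting on "/" is a simplification for the demonstration.

```java
import java.util.Locale;

// Illustrative sketch only; CommonCaseSketch and its methods are hypothetical names.
class CommonCaseSketch {
    // Applies the :common rule to one pathname element.
    static String commonCase(String element) {
        String upper = element.toUpperCase(Locale.ROOT);
        String lower = element.toLowerCase(Locale.ROOT);
        if (upper.equals(lower)) {
            return element;              // no letters, nothing to flip
        }
        if (element.equals(upper)) {
            return lower;                // all upper case becomes all lower case
        }
        if (element.equals(lower)) {
            return upper;                // all lower case becomes all upper case
        }
        return element;                  // mixed case is left alone
    }

    // Applies the rule to each "/"-separated element independently.
    static String commonCasePath(String path) {
        String[] parts = path.split("/", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = commonCase(parts[i]);
        }
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(commonCasePath("/foo/BaR/PIG"));  // /FOO/BaR/pig
    }
}
```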

Dealing with :wild and wild cards

The symbol :wild simply matches anything, like the common usage of the "*" as a wild card character. Our implementation will define "*" as the only wild card character.

Using :unspecific symbol for pathname components

It is possible for a pathname component to have the value of the symbol :unspecific, which is different from NIL in a subtle way. NIL means the component is unfilled. :unspecific means the component is unfilled and should never be filled (as it might otherwise be during a merge-pathnames call). We will always fill the version component slot with :unspecific.
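The difference between NIL and :unspecific during a merge can be shown with a per-component rule. In this sketch null stands for NIL and a sentinel object stands for :unspecific; both choices, along with the class and method names, are assumptions made for illustration.

```java
// Illustrative sketch only; UnspecificSketch and mergeComponent are hypothetical names.
class UnspecificSketch {
    // Sentinel standing in for the keyword :unspecific.
    static final Object UNSPECIFIC = new Object() {
        @Override public String toString() {
            return ":UNSPECIFIC";
        }
    };

    // Merge rule for a single component: only an unfilled (null)
    // component takes its value from the defaults.
    static Object mergeComponent(Object value, Object defaultValue) {
        return value == null ? defaultValue : value;
    }

    public static void main(String[] args) {
        System.out.println(mergeComponent(null, "txt"));     // filled from the defaults
        System.out.println(mergeComponent("lisp", "txt"));   // caller's value wins
        System.out.println(mergeComponent(UNSPECIFIC, 3));   // never filled
    }
}
```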

To Do List

  • Finish the functions (see status above for what still needs to be done)
  • Develop unit tests.
  • Run unit tests and fix problems
  • Ensure that ".." is properly handled in pathnames
  • Ensure that relative pathnames can be merged with absolute URIs where appropriate
  • Thoroughly document our changes to streams and the open function over in the streams section of TWiki and link to it from here where appropriate.

Topic revision: r75 - 2009-03-11 - 20:26:22 - MadelineWilliams
 