Comparing Linux/UNIX Binary Package Formats
This is a comparison of the deb, rpm, tgz, slp, and pkg package formats,
as used in the Debian, Red Hat, Slackware, and Stampede Linux
distributions respectively (pkg is the SVr4 package format, used in
Solaris). I've had some experience with each of the package formats,
both in building packages and, later, in my work on the Alien package
conversion program.
I've tried to keep this comparison unbiased; for the record, however, I'm a
fan of the deb format, and a Debian developer. If you discover any bias or
inaccuracy in this comparison, or any important features of a package format
that I have left out, please mail me so I can correct it. Several people have
already done so. I'm also looking for data to fill in the places marked by `?'.
This comparison deals only with the package formats, not with the various
tools (dpkg, rpm, etc.) that are used to handle and install the packages.
It also does not deal with source packages, only binary packages.
Package format comparison table.
What is compared.
This section deals with ensuring that you know who created the package, and
that you can check the package installed on your system to see if the files
in it have been modified since you installed it.
- signed packages
Does the package format contain internal support for a GPG or PGP 
signature that can be used to verify who created it?
- checksums
Are checksums available for all the files in the package?
- permissions, owners, etc
Is information on the files in the package, their proper permissions, sizes,
owners, groups, major and minor number (for devices), etc, available?
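As a rough illustration of the checksum question, the sketch below compares files on disk against a manifest of expected digests and reports the ones that have changed. The manifest layout (one digest and relative path per line, as written by md5sum) and the function names are assumptions made for this example, not something any of the formats mandates.

```python
import hashlib
import os


def file_md5(path):
    """Return the hex MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_text, root):
    """Return the relative paths that are missing or no longer match.

    manifest_text is assumed to hold one "<hex digest>  <relative path>"
    entry per line; root is the directory the paths are relative to.
    """
    modified = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path) or file_md5(full_path) != expected:
            modified.append(rel_path)
    return modified
```

A format that ships such a manifest for every file lets a tool like this answer "has anything been modified since install?" without consulting the original package.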
Recognising that it's sometimes important to be able to peer inside packages
without using their package managers, this section compares how the various
packages can be processed with tools available on any Linux system [3].
- recognizable by file
Can the file utility recognize the package format?
- data unpackable by standard tools
Can an experienced user, when presented with a package in this
format, extract its payload using only tools that will be on any Linux
system? They can remember a few facts to help them deal with the format,
but remembering file offsets and the like is too much to ask.
- metadata accessible by standard tools
If the package has some sort of metadata (i.e., package name, description,
version) contained in it, can this data be accessed by standard tools,
without too much difficulty?
- creatable by standard tools
Can a package be created using standard tools, without too much difficulty? 
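To make the "few facts to remember" idea concrete, here is a small sketch that lists the members of an ar archive, which is the container footnote 4 says a deb uses. It relies only on the common ar layout (an 8-byte magic string followed by 60-byte member headers); the function name is made up for this example, and the stream is assumed to be seekable. In practice `ar t package.deb` does the same job.

```python
import io

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def list_ar_members(stream):
    """Yield (name, size) for each member of an ar archive read from stream."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            return
        if len(header) < HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # The name occupies the first 16 bytes, space padded; some ar
        # implementations also terminate it with "/".
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        # The size is a decimal string in bytes 48..57 of the header.
        size = int(header[48:58].decode("ascii"))
        yield name, size
        # Member data is padded to an even length; skip over it.
        stream.seek(size + (size % 2), io.SEEK_CUR)
```

Once the members are located, the tarballs inside can be handed to tar as usual, which is what makes the format approachable without its own package manager.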
Metadata is my term for the information about a package contained in the
package. This includes things like the package name, description, and
version number.
- name
Does the package have a name in the metadata?
- version
Does the package have a version number in the metadata?
- description
Is there a place in the metadata for a description of the package?
- dependencies
A dependency says a package needs another package to be installed for the
first package to work properly.
- recommendations
A recommendation says a package will almost always need to have another
package installed.
- suggestions
A suggestion says a package may sometimes work better if another package is
installed. The user can just be informed of this as an FYI.
- conflicts
A conflict is a package that cannot be installed when this package is
installed. One common reason is if the two packages both contain the same
files.
- virtual packages and provides
This means that there are so-called "virtual packages", such as a web
browser, or a mail delivery system, and packages can say they provide those
virtual packages, while other packages can depend on the virtual packages. 
- versioned dependencies and conflicts
A package can depend on or conflict with (or recommend, etc.) a specific
version of a package, or all versions greater than or less than a given
version.
- boolean package relationships
This means that a package can depend on, conflict with, etc. a package AND
(another package OR a third package). Any boolean expression must be
representable, no matter how complex. [11]
- file dependencies
This means a package can require that some other package (any other
package) be installed that contains a given file (like /bin/sh) [13].
- copyright info
The package's metadata contains basic copyright information. This is useful
for automatic copyright sorting, etc.
- grouping
The package can be assigned to a group (e.g., web browsers, libraries), which
might be used to group the packages when viewing a list of available
packages, etc. This makes it easier to deal with large groups of packages.
- priority
The package can be assigned a priority, which says how important this
package is to the system. For example, packages with high priority should be
looked at carefully when you are setting up a system, but you can skip
installing all the packages with low priority and still know you'll get
a functional Unix system.
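The versioned and boolean relationships described above can be sketched as a small evaluator. It borrows the Debian-style notation that appears in footnote 12 (`|` for OR, operators such as `<<` and `>>`) and assumes commas separate the groups that are ANDed together. The version comparison is deliberately simplified to dotted integers, which real package managers do not limit themselves to, and all function names are invented for this example.

```python
import operator
import re

OPERATORS = {
    "<<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">>": operator.gt,
}

# One term: a package name, optionally followed by "(op version)".
TERM = re.compile(
    r"^\s*(?P<name>[^\s()]+)\s*"
    r"(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^\s()]+)\s*\))?\s*$"
)


def parse_version(text):
    """Simplified assumption: dotted integers only, e.g. "2.0" -> (2, 0)."""
    return tuple(int(part) for part in text.split("."))


def term_satisfied(term, installed):
    """Check one term against installed, a dict of package name -> version."""
    match = TERM.match(term)
    if match is None:
        raise ValueError("cannot parse relationship term: %r" % term)
    have = installed.get(match.group("name"))
    if have is None:
        return False
    if match.group("op") is None:
        return True
    compare = OPERATORS[match.group("op")]
    return compare(parse_version(have), parse_version(match.group("version")))


def depends_satisfied(field, installed):
    """True if every comma-separated group has a satisfied alternative."""
    return all(
        any(term_satisfied(term, installed) for term in group.split("|"))
        for group in field.split(",")
    )
```

With this, `depends_satisfied("foo (<< 1.1) | foo (>> 2.0)", {"foo": "1.0"})` holds while the same call with `{"foo": "1.5"}` does not. An AND of OR groups is also why an arbitrary boolean expression may first need some factoring before it can be written down.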
The ability to categorize files depending on what they are used for, so they
can be dealt with in special ways.
- config files
Are config files supported? These are files that the user will typically
want to edit, so when a new version of a package is installed, the package
manager should be able to know to leave them alone, or do something smart like
prompt the user for what to do if they have modified the files, or at least
make backups of the user's changes before overwriting them. (Maybe I need more
granularity here?)
- documentation files
Can documentation files be specially marked? This could be useful to help a
user find documentation. 
- ghost files
Ghost files are files that are not actually present in the package, but are
listed as being a part of it once the package is installed. This is useful
for log files.
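One way a package manager could "do something smart" with config files is to compare three checksums on upgrade: the file as shipped by the installed version, the file as shipped by the new version, and the file currently on disk. The sketch below shows one possible policy; the function name and the returned action strings are assumptions for illustration, not the behaviour of any particular tool.

```python
def conffile_action(old_packaged, new_packaged, on_disk):
    """Decide what to do with one config file during an upgrade.

    Each argument is a checksum string: the file as shipped by the
    installed version, as shipped by the new version, and as it currently
    exists on disk. Returns "keep", "replace", or "prompt".
    """
    user_changed = on_disk != old_packaged
    package_changed = new_packaged != old_packaged
    if on_disk == new_packaged:
        # The file on disk already matches what the new version ships.
        return "keep"
    if not user_changed:
        # Untouched by the user, so the new packaged version can go in.
        return "replace"
    if not package_changed:
        # Only the user changed it; leave their edits alone.
        return "keep"
    # Both sides changed: ask, or at least back up before overwriting.
    return "prompt"
```

None of this is possible unless the format marks which files are config files in the first place, which is what the entry above asks about.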
These are programs that are contained in the package, to be run by the
package manager when the package is installed, or uninstalled, or at other
times.
- binary programs allowed
Must these programs be scripts, or can compiled binaries be used as
well?
- pre-install program
A program to be run by the package manager before the package is installed
on the system.
- post-install program
A program to be run by the package manager after the package is installed on
the system.
- pre-remove program
A program to be run by the package manager before the package is removed.
- post-remove program
A program to be run by the package manager after the package is removed.
- verify program
A program to be run by the package manager when the state of the installed
package is being verified.
- triggers
This is a whole set of programs that are run not when this
package changes state, but when another package changes state.
Design and capabilities vary widely.
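The difference between a package's own programs and triggers can be shown with a toy model. Here Python callables stand in for the programs, and a package is just a dict; the class, its method names, and the point at which triggers fire are all assumptions for this sketch, since real designs vary widely.

```python
class ToyPackageManager:
    """Runs per-package hooks and cross-package triggers; installs nothing."""

    def __init__(self):
        self.events = []    # ordered record of what ran
        self.watchers = {}  # package name -> callbacks from other packages

    def add_trigger(self, watched_name, callback):
        """Ask for callback(name, state) when watched_name changes state."""
        self.watchers.setdefault(watched_name, []).append(callback)

    def _run_hook(self, package, hook_name):
        # Hooks belong to the package that is itself changing state.
        hook = package.get("hooks", {}).get(hook_name)
        if hook is not None:
            self.events.append((hook_name, package["name"]))
            hook()

    def _change_state(self, package, new_state):
        self.events.append((new_state, package["name"]))
        # Triggers belong to other packages that are watching this one.
        for callback in self.watchers.get(package["name"], []):
            callback(package["name"], new_state)

    def install(self, package):
        self._run_hook(package, "pre-install")
        self._change_state(package, "installed")
        self._run_hook(package, "post-install")

    def remove(self, package):
        self._run_hook(package, "pre-remove")
        self._change_state(package, "removed")
        self._run_hook(package, "post-remove")
```

A font cache package, for example, could call `add_trigger("fonts", rebuild_cache)` (both names hypothetical) so that its program runs whenever the fonts package is installed or removed.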
How well the package format is able to grow to meet future needs. This is of
great importance. Many of the comparisons above have little value in the
face of this section, because new package programs, new metadata fields, etc
can all be added to a scalable package format with little difficulty.
- no hard-coded limits
Are there no limits hard-coded into the package format that might prevent
it from expanding to meet future needs? For example, are package names or
versions of unlimited size? 
- new metadata
Can new information (text, binary data, whatever) be added to the metadata
easily, without changing the package format?
- new section
Can whole new sections be added to the packages, without changing the
package format? For example, could the package format be expanded to have a
PGP signature attached at the end, or to have a second set of data files,
compiled for a different architecture or with different optimizations,
attached at the end? This is the ultimate test of how flexible the format is;
I'm basically asking whether it was designed to cope with unforeseen new
requirements.
- format version data
Is there some way to look at a package and tell which version of the package
format it is using? In extreme cases, this means the whole package format
can be thrown out and redesigned but old tools will still be able to read
enough of the packages to know they can't deal with them.
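The "old tools can tell they can't cope" behaviour might look like the sketch below. It assumes the format stores its version as a short "major.minor" text string near the start of the package (a deb, for instance, carries such a string in its first archive member). The policy of accepting any minor version but refusing an unknown major version, and every name in the code, is an assumption for illustration.

```python
class UnsupportedFormatError(Exception):
    """Raised when a package uses a format version this tool cannot read."""


def check_format_version(version_text, supported_major=2):
    """Parse "major.minor" and return (major, minor) if it can be handled.

    Minor revisions are assumed to be backward compatible; a different
    major version is assumed to mean a redesigned format.
    """
    major_text, _, minor_text = version_text.strip().partition(".")
    major = int(major_text)
    minor = int(minor_text) if minor_text else 0
    if major != supported_major:
        raise UnsupportedFormatError(
            "package format %d.%d is not supported (this tool reads %d.x)"
            % (major, minor, supported_major)
        )
    return major, minor
```

The point is that the version marker itself must stay readable across redesigns, so that an old tool fails with a clear message instead of misreading the package.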
Todo.
- relocatable packages
- support for arch name in metadata, arch indep packages
- multiple versions of the same package can be installed simultaneously
(is this really a package format issue?)
- info available to package programs -- The programs may find various
information useful to make decisions while they are running. Of course, all
of them can look at what's currently on the filesystem, run other programs and
look at the output, etc. This lists other information that may be useful.
(old package version, etc)
Footnotes.
1. Not yet widely used though.
2. md5sums file available in control data, but not explicitly part of the
packaging format; some packages omit it.
3. Why standard Linux tools, not Unix tools in general? It's been pointed
out that, e.g., gzip is not at all standard on all the Unix systems out there.
4. The admin would only have to remember that a deb is an ar archive,
containing some tarballs.
5. rpm2cpio can do it, but it's not a standard tool, except on rpm-based
systems. Some fairly short programs can do it, but none of them are
something you'd want to memorize.
6. Assuming that bunzip2 is a standard Linux tool, or that the package uses
gzip compression instead. You need only remember that the package starts
with its payload; the metadata is tacked on the end and will be ignored.
7. Most repositories use a specific "datastream" format, while some
others simply use tarballs. In the case of tarballs, yes. For the
datastream format, a pkgtrans program is available on systems using the pkg
format, but it is not quite standard enough for the purposes of this question.
8. Most repositories use a specific "datastream" format, while some
others simply use tarballs. In the case of tarballs, yes.
9. Although apt currently has a bug (#222701) with debs created with ar.
10. There's an install/description file for this information in at
least some Slackware tgz files.
11. Though you might have to do some factoring.
12. An rpm may depend on a list of packages, but boolean OR is not supported.
You can often get the same effect using virtual packages and
provides. This isn't quite the same, since it does require more coordination
between packagers, and the following relationship cannot be expressed with
provides:
    foo (<< 1.1) | foo (>> 2.0)
13. Some people consider file dependencies a gross misfeature.
14. Copyright info is included in Debian packages, but not in an easily
extractable format.
15. Fields exist, but there is no standard way to use them.
16. Supported by a version of this package format used at one time by SuSE
Linux.
17. Technically, the rpm "lead" contains hard-coded limits on the package
name, but the lead is no longer really used by anything except file.
18. To be useful, you need to get a tag number assigned to your new piece of
metadata, which implies modifying the rpm program.
Copyright 1998-2003 by Joey Hess under the terms of the GNU GPL, either
version 2 or at your option, any later version.
Last modified at Sat Apr 26 05:20:03 2008;
generated from this source XML by 
this program.