Revolutionizing CNR with OPIUM
by Chris Tucker
May 30th, 2007
Most
Linspire Letters address high-level topics about desktop
Linux and the state of the software industry as a whole.
Today, by contrast, we’re going to delve deeply into
the technology that makes CNR tick: the package management
system. As many of you are aware, Linspire announced a new
initiative a few months back that would bring CNR
Technology to the most popular desktop Linux distributions.
Digging into the depths of CNR and talking about the package
management system may not sound like the most scintillating
of topics, but bear with me as research started while I
was an employee at Linspire has since led to what I, and
hopefully others, consider a breakthrough in how we go about
installing software on the Linux platform. In particular,
I’m going to describe a tool we presented research
on last week at the 29th
International Conference on Software Engineering (ICSE ’07)
in Minneapolis, Minnesota, that significantly improves on
the current standard, APT, and that we hope to integrate
into both Debian and RPM based desktops in the near future.
As
with most stories, perhaps the best place to start is at
the beginning. Since the inception of Linspire we've been
making extensive use of the Debian system for packaging,
distributing, and installing software. A large part of this
system is the APT suite of tools, and in particular the
apt-get program. Behind the scenes CNR uses apt-get to figure
out just what exactly needs to be downloaded, and uses it
again later to perform the actual installations. On top
of this we’ve built an extensive system to facilitate
our work in all three areas: packaging, distribution, and
installation. As time has passed, we’ve become quite
familiar with the ups and downs of life in the Debian world,
and while for the most part the experience has been positive
there are some places where the sailing has not been so
smooth.
One
particular point of pain has always been the installation
part of the process. While APT typically does a pretty good
job of figuring out what needs to be done to install a piece
of software it does on occasion get things quite wrong.
In particular, we found over time that APT would sometimes
report that it could not install a piece of software even
though we knew that it was installable. In mathematical
terms we would say that APT is incomplete: if it doesn’t
find an installation solution it doesn’t necessarily
mean one doesn’t exist, just that APT was unable to
figure it out. This has led us to tweak CNR in several ways
to try to fool APT into finding a solution, but unfortunately
it’s still quite possible for a machine to get into
an awkward state that prevents an installation from proceeding.
To further complicate matters the problem is exceptionally
hard to test for, because it’s dependent on the packages
that are installed on the machine running the CNR client;
given there are around 20,000 packages available in CNR
(often only visible as dependencies of other packages),
the number of possible combinations of installed packages
is huge.
Runtime of
Apt-get vs. Opium
|
Another
place we’ve seen issues is in the amount of data that
has to be downloaded to install any particular application.
Sometimes our users would report being made to download
surprisingly large packages to install something, and we’d
see that there might well have been cheaper ways to deliver
what they wanted. And yet other times we’d see packages
being removed from our systems when APT would find a conflict
(a case where one package can’t be installed at the
same time as another) that could have been saved if APT
had used a different solution. All of this brought us to
three major questions, each of which we address in our ICSE
’07 paper. In this work, co-authored by myself and
another former Linspire employee, David Shuffelton, and
two professors of Computer Science at the University of
California, San Diego, Sorin Lerner and Ranjit Jhala, we
frame those questions in terms of the three problems:
1.
The Install Problem: Given a package to install,
can we figure out any way to install it on a user’s
system?
2. The Optimal Install Problem: If a package
is installable, can we find the "best" way to
install it according to some rules given by the user (e.g.
favoring the smallest total download size, or preferring
the highest-rated packages)?
3. The Optimal Uninstall Problem: If we
have to remove some package to install the requested package,
what are the "best" packages to remove according
to some rules given by the user (e.g. favoring packages
that were only installed as dependencies, rather than those
explicitly selected by the user, or simply removing as few
packages as possible)?
The
results of our research were both surprising and exciting.
We found that by treating the problem in an entirely different
way to APT (the details of which I’ll leave to the
paper, linked at the bottom of this letter) we were able
to provide solutions to all of these problems. Perhaps even
more valuably, we were able to quantify the benefits of
fixing these problems and get a concrete idea of how often
APT is wrong and what the cost is when it is.
The
paper describing our research, “OPIUM:
Optimal Package Install/Uninstall Manager” appears
in the proceedings of the ICSE ’07 conference and
is also available on my website.
In it we describe a system that treats figuring out what
to install as a system of logic, rather than as a process
of jumping from package dependency to package dependency
(in what we’d call a graph algorithm in Computer Science).
By doing this we are able to make use of extremely high-performance
tools for solving logical formulae, capable not only of
guaranteeing that they find a solution if one exists but
also of proving that the solution found is the best one
possible. Even in the early prototype stages we’ve
developed so far the runtime of OPIUM is reasonable for
use on a user machine and since the paper we’ve implemented performance
improvements that nearly eliminate the additional cost we
originally found in calculating package removals. Combine
this with our findings that APT sometimes downloads well
in excess of 100mb more than necessary on a single package
install, and that as many as one quarter of all users of
APT are at some point affected by its failure to find solutions,
and we think we have a remarkable tool on our hands. Now
that the research is published we’re integrating our
findings into the new CNR and hoping to find ways to do
the same with APT, and look forward to exploring what new
doors this may open up. If you’re scientifically minded
please read the paper, and any comments on it or this letter
are welcome to my newly configured OPIUM email address,
opium@cjtucker.com.
Keep your eye on the CNR release notes: you may be using
this technology sooner than you think!
-
Chris
To discuss this topic with others, click here!
The Linspire Letter Meter
View the Linspire Letter Meter Report
|