gmt at malth
Aug 18, 2012, 1:50 PM
Post #103 of 103
Re: Re: Questions about SystemD and OpenRC
On 8/16/2012 6:26 PM, Rich Freeman wrote:
> On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol [at] gmail> wrote:
>> The limited-visibility build feature discussed a week or so ago would
>> go a long way in detecting unexpressed build dependencies.
> If portage has the
> dependency tree in RAM then you just need to dump all the edb listings
> for those packages plus @system and feed those into sandbox.
> That just requires reading a bunch of text files and no searching, so it
> should be pretty quick.
Portage could hypothetically compile such a list while it crawls the
package dependency tree, but I suspect the cost will not be small as you
> As far as I can tell the relevant calls to
> check for read access are already being made in sandbox already, and
> obviously they aren't taking forever. We just have to see if the
> search gets slow if the access list has tens of thousands of entries
> (if it does, that is just a simple matter of optimization, but being
> in-RAM I can't see how tens of thousands of entries is going to slow
> down a modern CPU even if it is just an unsorted list).
I appreciate your optimism but I think you're underestimating the cost.
Can't speak for others, but my portage dbs churn too much for comfort
as is. Once we start multiplying per-package-dependency iteration by
the files-per-package iteration, that's going to be O(a-shit-load).
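To put a rough shape on that multiplication, here is a minimal Python sketch. It is not portage code: `files_of` is a hypothetical mapping from an installed atom to the files it owns, and `build_access_set` is a name made up for illustration. The union visits every file of every package in the tree once, so the work grows with the total file count. Keeping the result in a set rather than an unsorted list would at least make each later lookup a hash probe.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_access_set(
    atoms: Iterable[str], files_of: Dict[str, List[str]]
) -> Tuple[Set[str], int]:
    """Union the file lists of every atom; also count the paths visited."""
    allowed: Set[str] = set()
    visited = 0
    for atom in atoms:                       # per-package-dependency iteration
        for path in files_of.get(atom, []):  # files-per-package iteration
            allowed.add(path)
            visited += 1
    return allowed, visited


if __name__ == "__main__":
    # Hypothetical data; a real tree has hundreds of atoms and far more files.
    files_of = {
        "sys-libs/zlib-1.2.7": ["/usr/lib/libz.so", "/usr/include/zlib.h"],
        "app-arch/bzip2-1.0.6": ["/usr/lib/libbz2.so", "/usr/include/bzlib.h"],
    }
    allowed, visited = build_access_set(files_of, files_of)
    print(visited, "/usr/include/zlib.h" in allowed)
```

Atoms missing from `files_of` contribute nothing, which mirrors how non-installed atoms would simply have no file list to add.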
Of course, where there's a will there's a way. I'd be surprised if some
kind of delayed-evaluation + caching scheme wouldn't suffice, or,
barring that, perhaps it's time to create an indexed-database-based
drop-in replacement for the current portage db code.
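One way a delayed-evaluation + caching scheme might look is sketched below in Python. This is an illustration under assumptions, not how portage does it: `ContentsCache` and `parse_contents` are hypothetical names, the per-package `CONTENTS` file under a vdb root (usually /var/db/pkg) is assumed to hold `dir`, `obj` and `sym` lines in the layout noted in the comments, and the file's mtime is assumed to be a good enough invalidation key.

```python
import os
from typing import Dict, List, Tuple


def parse_contents(text: str) -> List[str]:
    """Extract owned paths from CONTENTS-style text (line layout assumed)."""
    paths = []
    for line in text.splitlines():
        kind, _, rest = line.partition(" ")
        if kind == "dir":
            paths.append(rest)
        elif kind == "obj":
            # assumed layout: obj <path> <md5> <mtime>
            paths.append(rest.rsplit(" ", 2)[0])
        elif kind == "sym":
            # assumed layout: sym <path> -> <target> <mtime>
            paths.append(rest.split(" -> ", 1)[0])
    return paths


class ContentsCache:
    """Parse a package's CONTENTS on first use; reuse it until the file changes."""

    def __init__(self, vdb_root: str) -> None:
        self.vdb_root = vdb_root
        self._cache: Dict[str, Tuple[int, List[str]]] = {}
        self.parses = 0  # number of times a CONTENTS file was actually read

    def files(self, cpv: str) -> List[str]:
        contents = os.path.join(self.vdb_root, cpv, "CONTENTS")
        try:
            stamp = os.stat(contents).st_mtime_ns
        except FileNotFoundError:
            return []  # not installed: silently dropped
        hit = self._cache.get(cpv)
        if hit is not None and hit[0] == stamp:
            return hit[1]  # cached and unchanged, no re-parse
        with open(contents, encoding="utf-8", errors="replace") as fh:
            paths = parse_contents(fh.read())
        self.parses += 1
        self._cache[cpv] = (stamp, paths)
        return paths
```

This only avoids repeated parsing within one process; persisting the parsed lists between runs would be the step toward the indexed-database idea.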
I've enclosed some scripts you may find helpful in looking at the
numbers. They are kind-of kludgey (originally intended for
in-house-only use and modified for present purposes) but may help shed
some light, if they aren't too buggy, that is...
"dumpworld" slices and dices "emerge -ep" output to provide a list of
atoms in the complete dependency tree of a given list of atoms (add
'@system' to get the complete tree, dumpworld won't do so).
"dumpfiles" operates only on packages installed in the local system
(non-installed atoms are silently dropped), and requires/assumes that
'emerge -ep world' would not change anything if it is to give accurate
information. It takes a list of atoms, transforms them into the
complete lists of atoms in their dependency tree via dumpworld, merges
the lists together, and finds the number of files associated with each
atom in portage. Any collisions will be counted twice, since it doesn't
keep track. It also doesn't add '@system' unless you do. By default it
reports:
o A list of package atoms and the files owned by each atom (stderr)
o Total atoms and files
o Average filename length
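The summary figures listed above can be sketched in a few lines of Python. This is not the enclosed script, just an illustration: `summarize` and its input mapping are hypothetical, the average is assumed to be an integer (as in the TOTAL line format), and collisions are counted twice because nothing is de-duplicated.

```python
from typing import Dict, List, Tuple


def summarize(files_by_atom: Dict[str, List[str]]) -> Tuple[int, int, int]:
    """Return (atoms, files, average path length), counting collisions twice."""
    n_atoms = len(files_by_atom)
    lengths = [len(p) for paths in files_by_atom.values() for p in paths]
    n_files = len(lengths)
    # Integer average is an assumption; empty input avoids division by zero.
    average = sum(lengths) // n_files if n_files else 0
    return n_atoms, n_files, average


if __name__ == "__main__":
    # Hypothetical atoms and paths, only to exercise the function.
    sample = {
        "cat/pkg-a-1": ["/usr/bin/x", "/etc/y"],
        "cat/pkg-b-2": ["/usr/bin/x"],  # collision: counted again
    }
    atoms, files, avg = summarize(sample)
    print(f"TOTAL: {files} files (in {atoms} ebuilds, average path length: {avg})")
```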
What is, perhaps, more discouraging than the numbers it reports is how
long it takes to run. (Note: although I suspect an optimized Python
implementation could be made to do this faster by a moderate constant
factor, I'm not sure the big-O performance characteristics can be
significantly improved without database structure changes like the ones
suggested above.)
My disturbingly bloated and slow workstation gives these answers (note:
here it's even slower because it's running in an emulator):
greg [at] fedora64vm ~ $ time bash -c 'dumpfiles @system 2>/dev/null'
TOTAL: 402967 files (in 816 ebuilds, average path length: 66)
greg [at] fedora64vm ~ $ time bash -c 'dumpfiles chromium 2>/dev/null'
TOTAL: 401300 files (in 807 ebuilds, average path length: 66)
My workstation is surely an "outlier" as I have a lot of dependencies
and files due to multilib, split-debug, and USE+=$( a lot ). It also
has slow hardware RAID6, and the emulator only gives it 2G of RAM to work
with. But I'm a real portage user; I'm sure there are others out
there, if not many, with similar constraints.