brion at pobox
Oct 5, 2011, 2:30 PM
Post #9 of 32
Re: Preliminary git module splitup notes (MediaWiki core & extensions)
On Wed, Oct 5, 2011 at 1:59 PM, Brion Vibber <brion [at] pobox> wrote:
> On Oct 5, 2011 1:03 PM, "Platonides" <Platonides [at] gmail> wrote:
> > I know about ControlMaster (from which only those of us with ssh+git
> > can benefit), but just launching a new process and waiting to see if
> > there's something new will slow things down. OTOH git skips the
> > "recurse everything
> > locking all subfolders" step, so it may be equivalent.
> > Maybe there's some way for fetching updates from aggregate repositories
> > at once and I am just when everything is solved, though.
> Submodules may actually work well for this, as long as something propagates
> the ext updates to the composite repo. The checked-out commit id of each
> submodule is stored in the tree, so if no changes were seen from that one
> containing repo it shouldn't have to pull anything from the submodule's
> repo.
> (Not yet tested)
Ok, did some quick tests fetching updates for 16 repos sitting on my
Gitorious account.
Ping round-trip from my office desktop to Gitorious's server is 173ms, so
the theoretical *absolute best* possible time, at one round-trip per repo,
is 2-3 seconds (16 x 173ms is about 2.8s).
Running a simple loop of 'git fetch' over each repo (auth'ing with my ssh
key, passphrase already provided) takes 53 seconds (about 3 seconds per
repo). This does a separate ssh setup & poke into git for each repo.
Clearly unacceptable for 600+ extensions. :)
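
For anyone who wants to repeat the measurement, here is a rough Python
sketch of that kind of loop. It is not the script I actually ran; the
function name fetch_all and the injectable run parameter are made up for
illustration, and it assumes each directory is already a clone with its
remote configured and ssh auth sorted out.

```python
import subprocess
import time


def fetch_all(repo_dirs, run=subprocess.run):
    """Run 'git fetch' once per repo and return {repo_dir: seconds_taken}.

    Every iteration starts a fresh git process (and, over ssh, a fresh ssh
    session unless connection sharing is enabled), which is where the
    per-repo overhead comes from.
    """
    timings = {}
    for repo in repo_dirs:
        start = time.monotonic()
        # check=True makes a failed fetch raise instead of being silently skipped.
        run(["git", "fetch"], cwd=str(repo), check=True)
        timings[str(repo)] = time.monotonic() - start
    return timings


def summarize(timings):
    """Return (repo_count, total_seconds, average_seconds_per_repo)."""
    total = sum(timings.values())
    count = len(timings)
    return count, total, (total / count if count else 0.0)
```

Calling summarize(fetch_all(list_of_clone_dirs)) gives the total and the
per-repo average to compare against the figures above.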
Turning on ControlMaster and starting a long-running git clone in the
background, then running the same 'git fetch' loop took the time down to
about 10 seconds (<1s per repo). ControlMaster lets those looped 'git
fetch's piggyback on the existing SSH connection, but still has to start up
git and run several round-trips.
Better, but still doesn't scale to hundreds of extensions: several minutes
for a null update is too frustrating!
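
To put numbers on the scaling, here is the back-of-envelope arithmetic as a
small Python sketch. It assumes the per-repo cost stays constant as the repo
count grows (plain linear extrapolation, which is my assumption and not
something I measured), and the constant and function names are just
illustrative.

```python
REPOS_TESTED = 16      # repos in the test above
EXTENSIONS = 600       # rough number of extensions to plan for
RTT_SECONDS = 0.173    # measured ping round-trip to the server


def projected_minutes(total_seconds, repos=REPOS_TESTED, target=EXTENSIONS):
    """Linearly extrapolate a measured total for `repos` repos to `target` repos."""
    per_repo = total_seconds / repos
    return per_repo * target / 60


# Best case: exactly one network round-trip per repo and nothing else.
lower_bound_seconds = REPOS_TESTED * RTT_SECONDS

plain_loop_minutes = projected_minutes(53)       # separate ssh setup per repo
control_master_minutes = projected_minutes(10)   # shared ssh connection

print(f"lower bound for {REPOS_TESTED} repos: {lower_bound_seconds:.2f}s")
print(f"plain loop, {EXTENSIONS} repos: {plain_loop_minutes:.2f} min")
print(f"ControlMaster, {EXTENSIONS} repos: {control_master_minutes:.2f} min")
```

Under that assumption the plain loop lands at roughly half an hour for 600
repos and the ControlMaster variant at around six minutes, both for an
update that pulls nothing.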
Checking them out as submodules via 'git submodule add' and then issuing a
single 'git submodule update' command takes... 0.15 seconds. Nice!
Looks like it does indeed see that there are no changes, so nothing has to
be pulled from the upstream repos. Good!
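
A quick way to see what the containing repo thinks of each submodule is
'git submodule status', which prints one line per submodule with a leading
flag character. Below is a hedged Python sketch that parses that output; the
helper names are hypothetical, and it assumes submodule paths contain no
spaces.

```python
import subprocess

# Leading flag characters documented for 'git submodule status'.
STATES = {
    " ": "in sync",          # checked-out commit matches the recorded one
    "+": "differs",          # checked-out commit differs from the recorded one
    "-": "not initialized",  # submodule has not been initialized
    "U": "conflict",         # submodule has merge conflicts
}


def parse_submodule_status(output):
    """Parse 'git submodule status' text into a list of dicts.

    The output must not be stripped first: the flag for an in-sync
    submodule is a leading space.
    """
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        fields = rest.split(maxsplit=2)  # sha, path, optional "(describe)"
        entries.append({
            "sha": fields[0],
            "path": fields[1],
            "state": STATES.get(flag, "unknown"),
        })
    return entries


def out_of_sync(entries):
    """Return the paths of submodules that are not at the recorded commit."""
    return [e["path"] for e in entries if e["state"] != "in sync"]


def read_submodule_status(repo_dir):
    """Run 'git submodule status' in repo_dir and return its raw output."""
    result = subprocess.run(["git", "submodule", "status"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

When out_of_sync(parse_submodule_status(read_submodule_status("."))) comes
back empty, the submodules are already at the commits recorded in the
containing repo.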
The downside is that maintaining submodules means constantly pushing commits
to the containing repo so it knows there are updates. :(
Probably the most user-friendly way to handle this is with a wrapper script
that does a single query to fetch the current branch head positions of a
bunch of repos, then runs fetch/pull only on the ones that have changed.
This could still end up pulling from 600+ repos -- if there are actually
changes in them all! -- but should make typical cases a *lot* faster.
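
Very roughly, the client side of such a wrapper might look like the Python
sketch below. Everything here is hypothetical: there is no existing service
that returns all branch heads in one query, so remote_heads stands in for
whatever that single query would return (a mapping of repo name to commit
id), and the function names and the default ref are my own placeholders.

```python
import subprocess


def local_head(repo_dir, ref="refs/remotes/origin/master", run=subprocess.run):
    """Return the commit id the local clone last saw for `ref`."""
    result = run(["git", "rev-parse", ref], cwd=repo_dir,
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def changed_repos(local_heads, remote_heads):
    """Names of local repos whose reported remote head differs from the local one.

    Repos the server did not report on are left alone.
    """
    return sorted(
        name for name, sha in local_heads.items()
        if remote_heads.get(name) not in (None, sha)
    )


def update_changed(repo_dirs, remote_heads, run=subprocess.run):
    """Fetch only repos whose head moved; return the names that were fetched.

    repo_dirs maps repo name -> local clone directory.
    remote_heads maps repo name -> commit id (result of the assumed single query).
    """
    local_heads = {name: local_head(path, run=run)
                   for name, path in repo_dirs.items()}
    stale = changed_repos(local_heads, remote_heads)
    for name in stale:
        run(["git", "fetch"], cwd=repo_dirs[name], check=True)
    return stale
```

The local side still runs one 'git rev-parse' per repo, but that needs no
network at all; the only per-repo network traffic is for repos that really
changed.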
We should check in a little more detail how Android & other big projects
using multiple git repos are doing their helper tools to see if we can just
use something that already does this or if we have to build it ourselves. :)
Wikitech-l mailing list
Wikitech-l [at] lists