
Mailing List Archive: Wikipedia: Wikitech

request for Git statistics (or, "don't stand back, I don't know regular expressions")

 

 



sumanah at wikimedia

Jul 4, 2012, 7:52 PM

Post #1 of 7 (1034 views)
request for Git statistics (or, "don't stand back, I don't know regular expressions")

For use in our monthly report, due to come out tomorrow

https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/June

I'd like to know how many unique contributors ("owners") had commits
merged into the mediawiki & mediawiki/* Gerrit projects between June
1-30 inclusive. I've had luck in using "age:4d -age:34d status:merged
project:^mediawiki.* -owner:L10n-bot" as a search on
https://gerrit.wikimedia.org to get a big paginated table of all the
commits (and then I figure I'd look for all the unique owner names and
count them), but when I try that on the command line as

ssh -p 29418 gerrit.wikimedia.org gerrit query 'age:4d -age:34d
status:merged project:^mediawiki.* -owner:L10n-bot'

I get the error "fatal: "-age:34d" is not a valid option".

I'll accept either help in running this query correctly so I get the
giant table on the command line so I can gin up the status myself, or I
will simply accept a number if you want to do my homework for me. :-)
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


marktraceur at riseup

Jul 4, 2012, 8:01 PM

Post #2 of 7 (966 views)
Re: request for Git statistics (or, "don't stand back, I don't know regular expressions")

> I'll accept either help in running this query correctly so I get the
> giant table on the command line so I can gin up the status myself, or I
> will simply accept a number if you want to do my homework for me. :-)

It's because the single-quoted string reaches the SSH command as one
argument (which was good), but ssh joins its arguments back into a single
line that the remote server then splits into multiple arguments, so
"-age:34d" is read as an option. This works for me, adding double quotes
around the remote command:

ssh -p 29418 gerrit.wikimedia.org "gerrit query 'age:4d -age:34d
status:merged project:^mediawiki.* -owner:L10n-bot'"
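
To illustrate the quoting issue, here is a minimal Python sketch. It is a simulation only: it assumes the remote side splits words roughly the way a POSIX shell does, and it uses shlex as a stand-in for both the local shell and the remote parser. The helper name remote_words is made up for this example.

```python
import shlex

QUERY = "age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot"

def remote_words(local_command_line):
    """Simulate ssh: the local shell splits the line, ssh joins the
    remote-command arguments with spaces, and the remote side splits again."""
    local_argv = shlex.split(local_command_line)
    # local_argv[0:4] is "ssh -p 29418 <host>"; the rest is the remote command.
    remote_command = " ".join(local_argv[4:])
    return shlex.split(remote_command)

# Single quotes only: the local shell consumes them, so the remote side
# sees the query broken into separate words.
unquoted = remote_words(
    f"ssh -p 29418 gerrit.wikimedia.org gerrit query '{QUERY}'"
)

# Double quotes around the whole remote command: the inner single quotes
# survive the local split and protect the query on the remote side.
quoted = remote_words(
    f"ssh -p 29418 gerrit.wikimedia.org \"gerrit query '{QUERY}'\""
)

print(unquoted)
print(quoted)
```

In the first case "-age:34d" arrives as its own word, which matches the "is not a valid option" error; in the second the query arrives as a single argument.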


--
Mark Holmquist
Contractor, Wikimedia Foundation
mtraceur [at] member
http://marktraceur.info





sumanah at wikimedia

Jul 5, 2012, 11:10 AM

Post #3 of 7 (958 views)
Re: request for Git statistics (or, "don't stand back, I don't know regular expressions")

On 07/04/2012 10:52 PM, Sumana Harihareswara wrote:
> For use in our monthly report, due to come out tomorrow
>
> https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/June
>
> I'd like to know how many unique contributors ("owners") had commits
> merged into the mediawiki & mediawiki/* Gerrit projects between June
> 1-30 inclusive. I've had luck in using "age:4d -age:34d status:merged
> project:^mediawiki.* -owner:L10n-bot" as a search on
> https://gerrit.wikimedia.org to get a big paginated table of all the
> commits (and then I figure I'd look for all the unique owner names and
> count them), but when I try that on the command line as
>
> ssh -p 29418 gerrit.wikimedia.org gerrit query 'age:4d -age:34d
> status:merged project:^mediawiki.* -owner:L10n-bot'
>
> I get the error "fatal: "-age:34d" is not a valid option".
>
> I'll accept either help in running this query correctly so I get the
> giant table on the command line so I can gin up the status myself, or I
> will simply accept a number if you want to do my homework for me. :-)
>

Got help from Mark Holmquist and advice from Giovanni Luca Ciampaglia --
needed to use double quote marks. Mark wrote:


> It's because you've passed in that string (which was good) as one argument to the SSH command, which is then read as multiple arguments on the remote server. This works for me, adding double quotes around the remote command:
>
> ssh -p 29418 gerrit.wikimedia.org "gerrit query 'age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot'"

Mark then also wrote:

> Hm, the SSH interface only returns 500 at a time, and won't accept limit: keywords to the contrary. I've hacked together some results, but I wouldn't recommend reproducing it by hand. I could write up a script with minimal effort, I think.

It took 3 queries but he got all 1401 results. And with that:

$ grep ' name:' wmf-results.txt | sort -u | wc -l
92

So, 92 unique committers in June. Which is way better than Ohloh has
been saying, yay.
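
For anyone who prefers Python to the grep pipeline, a rough equivalent might look like the sketch below. It assumes the saved query output contains one indented "name:" line per change owner; the sample text is made up to show that layout and is not real query output. The function name unique_owner_names is hypothetical.

```python
def unique_owner_names(lines):
    """Collect the distinct values found on lines that contain ' name:'."""
    names = set()
    for line in lines:
        if " name:" in line:
            # Keep only the text after "name:" so indentation does not matter.
            names.add(line.split("name:", 1)[1].strip())
    return names

# Made-up sample in the assumed layout; not real query output.
SAMPLE = """\
change I0001
  project: mediawiki/core
  owner:
    name: Example Author One
  status: MERGED
change I0002
  project: mediawiki/extensions/Example
  owner:
    name: Example Author Two
  status: MERGED
change I0003
  project: mediawiki/core
  owner:
    name: Example Author One
  status: MERGED
"""

owners = unique_owner_names(SAMPLE.splitlines())
print(len(owners))
```

To run it on a real results file, pass an open file object instead of the sample, since the function accepts any iterable of lines.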

--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation



marktraceur at riseup

Jul 5, 2012, 4:40 PM

Post #4 of 7 (964 views)
Re: request for Git statistics (or, "don't stand back, I don't know regular expressions")

> So, 92 unique committers in June. Which is way better than Ohloh has
> been saying, yay.

I'd like to confirm that number: 92 unique contributors in June is
absolutely correct. I've also scriptified my method, so now I can do
multiple months.

Month       | Unique contributors
------------+--------------------
July so far | 60
June        | 92
May         | 77
April       | 67
March       | 34
February    | 2

A note or two: July is already high because we have a lot of regular
committers, I suppose. You could confirm that by graphing the cumulative
number of unique contributors as you add one day at a time. My guess is
you'll get a curve that climbs steeply at first and flattens out, with
nearly 0 new contributors per day by the end of 30 days. Also, I'm sure
the earlier months have inadequate sample sizes to be relevant, since the
extensions had to take some time to transfer over, and apparently
February was just for testing.
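
The day-by-day check suggested above could be sketched like this in Python. This is only an illustration: it assumes you already have (merge date, owner) pairs extracted from the query results, and the sample data and the name cumulative_unique are invented for the example.

```python
from datetime import date, timedelta

def cumulative_unique(changes, start, days):
    """Return, for each of `days` consecutive days starting at `start`,
    how many distinct owners have been seen up to and including that day."""
    by_day = {}
    for day, owner in changes:
        by_day.setdefault(day, set()).add(owner)

    seen = set()
    counts = []
    for offset in range(days):
        seen |= by_day.get(start + timedelta(days=offset), set())
        counts.append(len(seen))
    return counts

# Invented sample data: (merge date, owner name).
sample = [
    (date(2012, 6, 1), "a"),
    (date(2012, 6, 1), "b"),
    (date(2012, 6, 2), "a"),
    (date(2012, 6, 3), "c"),
]

curve = cumulative_unique(sample, date(2012, 6, 1), 4)
print(curve)
```

Plotting the returned list against the day number gives the curve; if regular committers dominate, most of the growth should happen in the first few days.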

Of course, the coolest thing is that each month so far has seen at least
10 additional contributors! :)

The script I used to generate it is attached (since it's only 1.1 KB).
If you have a sane SSH setup already, you should be able to make it
executable and run

$ ./gerunique "ssh -p 29418 gerrit.wikimedia.org" 0d 30d

to get the number of contributors for the past 30 days. It will also
print some friendly notifications, though they're largely for debugging.

The first option can be described as "how you would ssh into gerrit if
you had to", and it's provided for the convenience of those people (like
me) whose local username doesn't match their remote username.
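
The attachment itself is not reproduced here, so the following Python sketch is only a guess at how such arguments could be turned into a command. It assumes the two age arguments map onto the "age:" and "-age:" terms of the query used earlier in this thread; build_remote_command and build_argv are hypothetical names, not functions from the attached script.

```python
import shlex

def build_remote_command(newest, oldest,
                         project="^mediawiki.*", exclude_owner="L10n-bot"):
    """Build the remote command, keeping the query inside single quotes."""
    query = (f"age:{newest} -age:{oldest} status:merged "
             f"project:{project} -owner:{exclude_owner}")
    return f"gerrit query '{query}'"

def build_argv(ssh_prefix, newest, oldest):
    """Split the user's ssh prefix into words and append the remote command
    as ONE argument, which has the same effect as the double-quoted form."""
    return shlex.split(ssh_prefix) + [build_remote_command(newest, oldest)]

argv = build_argv("ssh -p 29418 gerrit.wikimedia.org", "0d", "30d")
print(argv)
```

The resulting list could be handed to subprocess.run without going through a local shell, so no extra layer of local quoting is needed.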

Cheers,

--
Mark Holmquist
Contractor, Wikimedia Foundation
mtraceur [at] member
http://marktraceur.info
Attachments: gerunique (1.06 KB)


sumanah at wikimedia

Jul 5, 2012, 6:15 PM

Post #5 of 7 (961 views)
Re: request for Git statistics (or, "don't stand back, I don't know regular expressions")

On 07/05/2012 07:40 PM, Mark Holmquist wrote:
>> So, 92 unique committers in June. Which is way better than Ohloh has
>> been saying, yay.
>
> I'd like to confirm that number, 92 unique contributors in June is
> absolutely correct. I've also scriptified my method, so now I can do
> multiple months.
>
> Month       | Unique contributors
> ------------+--------------------
> July so far | 60
> June        | 92
> May         | 77
> April       | 67
> March       | 34
> February    | 2

Thanks for that, Mark! Yeah, that's also way better than Ohloh thinks
<https://www.ohloh.net/p/mediawiki>. I've gone back and updated the
April and May months of the engineering report on mediawiki.org, e.g.,
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/May .

> A note or two: July is high because we have a lot of regular committers,
> I suppose. You could confirm that by graphing how many contributors are
> added by adding on one day at a time. My guess is you'll get a nice
> steep line at first that tapers out to nearly 0 at the end of 30 days.
> Also, I'm sure the earlier months have inadequate sample sizes to be
> relevant, since the extensions had to take some time to transfer over,
> and apparently February was just for testing.
>
> Of course, the coolest thing is that each month so far has seen at least
> 10 additional contributors! :)

That is indeed a great thing to see! And, based on the monthly reports,
I believe the most unique committers we ever had in a month was 100, in
January 2012, and that included people making localisation commits. So I
infer that we are recovering nicely from the transition cost of the Git
move.

(Also, this isn't counting people who contribute to the mobile projects
on GitHub, and really the final monthly report stat ought to. I don't
quickly see a way to ask "how many unique contributors submitted unique
pull requests to a https://github.com/wikimedia/ repo in June?" on
GitHub, though, so I'll put that off till next month.)

--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation



rlane32 at gmail

Jul 5, 2012, 6:35 PM

Post #6 of 7 (958 views)
Re: request for Git statistics (or, "don't stand back, I don't know regular expressions")

> (Also, this isn't counting people who contribute to the mobile projects
> on GitHub, and really the final monthly report stat ought to. I don't
> quickly see a way to ask "how many unique contributors submitted unique
> pull requests to a https://github.com/wikimedia/ repo in June?" on
> GitHub, though, so I'll put that off till next month.)
>

It also doesn't count people who are making operations changes, or
Wikimedia site configuration changes, or are packaging debs, etc.
It would be awesome to see stats for those as well. I have a feeling
that we have more contributors than the record ;).

- Ryan



marktraceur at riseup

Jul 5, 2012, 8:10 PM

Post #7 of 7 (963 views)
Re: request for Git statistics (or, "don't stand back, I don't know regular expressions")

> (Also, this isn't counting people who contribute to the mobile projects
> on GitHub, and really the final monthly report stat ought to. I don't
> quickly see a way to ask "how many unique contributors submitted unique
> pull requests to a https://github.com/wikimedia/ repo in June?" on
> GitHub, though, so I'll put that off till next month.)

I was bored, so I made you a Python script this time :)

It's attached. It takes a year and month as its arguments, fetches all
the repos at github/wikimedia, then fetches their pull requests, and
finally checks which pull requests match the month you specified.
Something like

$ ./githubunique 2012 06 # should give "3 unique contributors"

And yes, most months only have very few contributors, but anything we
can do to increase the count :)
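
The month-matching step could look roughly like the Python sketch below. It is not the attached script: it assumes the pull requests have already been fetched as dictionaries with "created_at" (an ISO 8601 timestamp) and "user" / "login" fields, that the creation date is what decides the month, and that the sample entries (which are invented) stand in for real API data. The name unique_pr_authors is hypothetical.

```python
def unique_pr_authors(pull_requests, year, month):
    """Return the set of logins that opened a pull request in the given month."""
    # ISO 8601 timestamps start with "YYYY-MM-", so a prefix match is enough.
    prefix = f"{year:04d}-{month:02d}-"
    return {
        pr["user"]["login"]
        for pr in pull_requests
        if pr["created_at"].startswith(prefix)
    }

# Invented sample entries in the assumed shape.
SAMPLE_PULLS = [
    {"created_at": "2012-05-30T09:00:00Z", "user": {"login": "alice"}},
    {"created_at": "2012-06-02T10:15:00Z", "user": {"login": "bob"}},
    {"created_at": "2012-06-20T18:30:00Z", "user": {"login": "bob"}},
    {"created_at": "2012-06-21T08:00:00Z", "user": {"login": "carol"}},
]

authors = unique_pr_authors(SAMPLE_PULLS, 2012, 6)
print(f"{len(authors)} unique contributors")
```

Returning a set rather than a count makes it easy to take the union with another set of names, for instance the Gerrit owners, before counting.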

> It also doesn't count people who are making operations changes, or
> Wikimedia site configuration changes, or are packaging debs, etc, etc.
> It would be awesome to see stats for those as well. I have a feeling
> that we have more contributors than the record ;).

This should be as simple as removing the "project:^mediawiki.*" bit from
the previous bash script. I'm not sure if there are other bots to
exclude in that case, though, so I'll leave it up to someone better
versed in the rest of Gerrit (Ryan?).

If there are other github repositories *not* in the wikimedia github
account, it shouldn't be hard to add those to the consideration in this
script.

P.S., a word to the wise: don't try to parse GitHub's API responses with
bash; it's just not worth it.

P.P.S., for those who like unified counts, adding this python script to
the end of the previous bash script should be easy enough, so you could
get all of the contributors (95!) in one command if you wanted.

--
Mark Holmquist
Contractor, Wikimedia Foundation
mtraceur [at] member
http://marktraceur.info
Attachments: githubunique (0.90 KB)
