Who authors the most popular crates on crates.io?
I had a question this morning: who authors the most popular crates on crates.io?
First, we have to figure out what we mean by “most popular.” My first guess was “top 100 by recent downloads,” so I looked at crates.io. Once I got to 100, though, even the next few crates were ones I had heard of and would expect to be widely used, so I kept going until the results felt more tenuous. This is obviously pretty subjective, but I noticed that the data seemed to get noisier around the 100k-recent-downloads mark. Cutting the list off there and removing a few outliers (the rustc-ap-* crates don’t count, IMHO), I ended up with a list of 264 crates in a text file.
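I did this step by hand, but it boils down to a download threshold plus an exclusion list. Here is a minimal sketch of that filter, assuming the listing has already been scraped into (name, recent downloads) rows; the `CrateStats` type, the `select_popular` function, and all the numbers are hypothetical, not real crates.io data.

```rust
/// One row of a crates.io listing. The type and the numbers used below are
/// hypothetical; they only exist to exercise the filter.
struct CrateStats {
    name: &'static str,
    recent_downloads: u64,
}

/// Keep crates with at least `threshold` recent downloads, skipping any whose
/// name starts with one of `excluded_prefixes` (for example "rustc-ap-").
fn select_popular(
    crates: &[CrateStats],
    threshold: u64,
    excluded_prefixes: &[&str],
) -> Vec<&'static str> {
    crates
        .iter()
        .filter(|c| c.recent_downloads >= threshold)
        .filter(|c| !excluded_prefixes.iter().any(|p| c.name.starts_with(*p)))
        .map(|c| c.name)
        .collect()
}

fn main() {
    // Made-up download numbers, not real crates.io data.
    let crates = [
        CrateStats { name: "popular-crate", recent_downloads: 2_000_000 },
        CrateStats { name: "rustc-ap-syntax", recent_downloads: 150_000 },
        CrateStats { name: "small-crate", recent_downloads: 40_000 },
    ];
    let popular = select_popular(&crates, 100_000, &["rustc-ap-"]);
    println!("{:?}", popular); // prints ["popular-crate"]
}
```

The threshold is inclusive here; whether a crate sitting exactly at 100k counts is a judgment call either way.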
Furthermore, how do I determine ‘crate authorship’? Many crates, especially popular ones, are worked on by more than one person. I’m only after some really rough numbers here, so I decided to go with the first author in the Cargo.toml. It’s not perfect, but it’s good enough. (In practice, the code below asks the crates.io owners API about each crate and takes the first user it returns.)
Additionally, this method counts each crate equally: if the top crate had a million downloads and the second had ten, and each was written by a different author, each author gets one point, not a million for one and ten for the other. If that makes any sense…
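To make the difference concrete, here is a small sketch that tallies the same made-up pair of crates both ways. The `(crate, author, downloads)` tuple layout and both function names are hypothetical; only the equal-weight `per_crate` version matches what this post measures.

```rust
use std::collections::HashMap;

/// Count crates per author, giving every crate one point regardless of downloads.
fn per_crate(crates: &[(&str, &str, u64)]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for &(_crate_name, author, _downloads) in crates {
        *counts.entry(author.to_string()).or_insert(0) += 1;
    }
    counts
}

/// The alternative this post does not use: weight each crate by its downloads.
fn per_download(crates: &[(&str, &str, u64)]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for &(_crate_name, author, downloads) in crates {
        *counts.entry(author.to_string()).or_insert(0) += downloads;
    }
    counts
}

fn main() {
    // The made-up example from the text: (crate, author, recent downloads).
    let crates = [("top", "alice", 1_000_000), ("second", "bob", 10)];
    let equal = per_crate(&crates);
    let weighted = per_download(&crates);
    println!("alice: {} vs {}", equal["alice"], weighted["alice"]); // alice: 1 vs 1000000
    println!("bob: {} vs {}", equal["bob"], weighted["bob"]); // bob: 1 vs 10
}
```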
So, I guess this post could have been titled “Who typed cargo new for the crates on crates.io that had over 100k recent downloads as of October 3rd, 2018,” but that is even longer than the already long title.
I created the text file by hand, but I wasn’t gonna look up their authors and do that math myself. So I wrote some code:
use std::{
    collections::HashMap,
    error::Error,
    fs::File,
    io::{prelude::*, BufReader},
};

fn main() -> Result<(), Box<dyn Error>> {
    let file = BufReader::new(File::open("top100.txt")?);

    let mut results = HashMap::new();

    for line in file.lines() {
        let crate_name = line?;
        let url = format!("https://crates.io/api/v1/crates/{}/owners", crate_name);

        let json: serde_json::Value = reqwest::get(&url)?.json()?;

        let username = json["users"][0]["login"]
            .as_str()
            .expect(&format!("{} is not a valid crate name", crate_name))
            .to_string();

        *results.entry(username).or_insert(0) += 1;
    }

    let mut results: Vec<_> = results.iter().collect();
    results.sort_by(|a, b| b.1.cmp(a.1));

    println!("Results: {:?}", results);

    Ok(())
}
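One thing to keep in mind when reading the output: `HashMap` iteration order is unspecified, so authors with the same count (say, the two at 10) can come out in a different order from run to run. A possible tweak, not part of the program above, is to break ties by name; here it is sketched with a hypothetical `ranked` helper.

```rust
use std::collections::HashMap;

/// Turn an author -> crate-count tally into rows sorted by count (highest
/// first). Ties are broken alphabetically by name so the output is stable,
/// since `HashMap` iteration order is unspecified.
fn ranked(tally: &HashMap<String, u32>) -> Vec<(&str, u32)> {
    let mut rows: Vec<(&str, u32)> = tally
        .iter()
        .map(|(name, &count)| (name.as_str(), count))
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    rows
}

fn main() {
    // A few rows from the output; "bluss" and "seanmonstar" tie at 10.
    let tally: HashMap<String, u32> = [("seanmonstar", 10), ("alexcrichton", 61), ("bluss", 10)]
        .iter()
        .map(|&(name, count)| (name.to_string(), count))
        .collect();
    println!("{:?}", ranked(&tally));
    // prints [("alexcrichton", 61), ("bluss", 10), ("seanmonstar", 10)]
}
```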
34 lines, not too bad! This is Rust 2018, so you may spot a few new features in there. Here’s the output:
> Measure-Command {cargo run --release | Out-Default}
    Finished release [optimized] target(s) in 0.37s
     Running `target\release\effective-rust.exe`
Results: [("alexcrichton", 61), ("carllerche", 20), ("SimonSapin", 16), ("BurntSushi", 13), ("sfackler", 11), ("seanmonstar", 10), ("bluss", 10), ("cuviper", 9), ("dtolnay", 8), ("retep998", 5), ("reem", 5), ("Amanieu", 4), ("jeehoonkang", 4), ("newpavlov", 4), ("Kimundi", 4), ("raphlinus", 3), ("nrc", 2), ("erickt", 2), ("Gankro", 2), ("Stebalien", 2), ("larsbergstrom", 2), ("withoutboats", 2), ("abonander", 2), ("dragostis", 2), ("malept", 2), ("briansmith", 2), ("tailhook", 2), ("danburkert", 2), ("jackpot51", 2), ("nikomatsakis", 2), ("vitiral", 1), ("KokaKiwi", 1), ("Aaronepower", 1), ("killercup", 1), ("Byron", 1), ("paholg", 1), ("ticki", 1), ("Gilnaa", 1), ("nox", 1), ("kbknapp", 1), ("chyh1990", 1), ("ogham", 1), ("remram44", 1), ("colin-kiegel", 1), ("droundy", 1), ("mgeisler", 1), ("sile", 1), ("tomaka", 1), ("softprops", 1), ("johannhof", 1), ("alicemaz", 1), ("emilio", 1), ("oli-obk", 1), ("TyOverby", 1), ("SergioBenitez", 1), ("mrhooray", 1), ("comex", 1), ("DaGenix", 1), ("ruuda", 1), ("sunng87", 1), ("fizyk20", 1), ("mcgoo", 1), ("indiv0", 1), ("jedisct1", 1), ("pyfisch", 1), ("Manishearth", 1), ("Geal", 1), ("lifthrasiir", 1), ("mitsuhiko", 1), ("dguo", 1), ("mackwic", 1), ("utkarshkukreti", 1), ("hsivonen", 1), ("debris", 1), ("brson", 1), ("lfairy", 1), ("steveklabnik", 1), ("mystor", 1), ("m-ou-se", 1)]

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 58
Milliseconds      : 124
Ticks             : 581240326
TotalDays         : 0.000672731858796296
TotalHours        : 0.0161455646111111
TotalMinutes      : 0.968733876666667
TotalSeconds      : 58.1240326
TotalMilliseconds : 58124.0326
This uses synchronous IO, so the 264 HTTP requests, made one after another, likely dominate this time. For fun, let’s see how much async affects the code, as well as the runtime. Here’s an async version:
use std::{
    collections::HashMap,
    fs::File,
    io::{prelude::*, BufReader},
};

use futures;
use tokio::runtime::Runtime;

use futures::{Future, Stream};
use reqwest::r#async::{Client, Decoder};
use std::mem;

fn fetch() -> impl Future<Item = Vec<String>, Error = ()> {
    let file = BufReader::new(File::open("top100.txt").unwrap());

    let mut futures = Vec::new();

    for line in file.lines() {
        let crate_name = line.unwrap();
        let url = format!("https://crates.io/api/v1/crates/{}/owners", crate_name);

        futures.push(
            Client::new()
                .get(&url)
                .send()
                .and_then(|mut res| {
                    let body = mem::replace(res.body_mut(), Decoder::empty());
                    body.concat2()
                })
                .map_err(|err| println!("request error: {}", err))
                .map(move |body| {
                    let body: serde_json::Value = serde_json::from_slice(&body).unwrap();

                    let username = body["users"][0]["login"]
                        .as_str()
                        .expect(&format!("{} is not a valid crate name", crate_name))
                        .to_string();

                    username
                }),
        );
    }

    futures::future::join_all(futures)
}

fn main() {
    let mut rt = Runtime::new().unwrap();
    let results = rt.block_on(fetch()).unwrap();

    let mut counts = HashMap::new();

    for name in results {
        *counts.entry(name).or_insert(0) += 1;
    }

    let mut results: Vec<_> = counts.iter().collect();
    results.sort_by(|a, b| b.1.cmp(&a.1));

    println!("Results: {:?}", results);
}
A bit longer. I’m not super great at Tokio, so there may be a way to improve this; please let me know if you spot any noob mistakes! I had to ask in the Tokio Gitter channel for a bit of help, but as always, they were very prompt and answered my questions quickly.
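If you would rather avoid an async runtime altogether, a different technique (plain OS threads plus a channel, not what the async version does) can also overlap the requests. This is a sketch under assumptions: `fetch_owner` is a hypothetical stand-in that fakes the HTTP call so the example runs offline, and the worker count is a knob you would tune. Unlike `join_all`, which starts every request at once, this caps the number in flight at the number of workers. crates.io’s data access policy asks crawlers to keep to roughly one request per second, so check the current policy before pointing anything concurrent at the real API.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for asking crates.io who owns a crate. A real version
/// would make an HTTP request; this one fakes an answer so the sketch runs
/// offline with no external crates.
fn fetch_owner(crate_name: &str) -> String {
    format!("owner-of-{}", crate_name)
}

/// Look up owners on `workers` OS threads. Worker `w` handles items
/// w, w + workers, w + 2 * workers, ..., so at most `workers` lookups are in
/// flight at once. Each result is sent back with its index so the output
/// keeps the input order.
fn fetch_all(crates: Vec<String>, workers: usize) -> Vec<String> {
    let workers = workers.max(1); // step_by(0) would panic
    let crates = Arc::new(crates);
    let (tx, rx) = mpsc::channel();

    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone();
        let crates = Arc::clone(&crates);
        handles.push(thread::spawn(move || {
            for i in (w..crates.len()).step_by(workers) {
                tx.send((i, fetch_owner(&crates[i]))).unwrap();
            }
        }));
    }
    // Drop the original sender so the receive loop ends once every worker is done.
    drop(tx);

    let mut owners = vec![String::new(); crates.len()];
    for (i, owner) in rx {
        owners[i] = owner;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    owners
}

fn main() {
    let names: Vec<String> = ["serde", "rand", "libc"].iter().map(|s| s.to_string()).collect();
    println!("{:?}", fetch_all(names, 2));
    // prints ["owner-of-serde", "owner-of-rand", "owner-of-libc"]
}
```

With `workers` set to 1 this degenerates to the sequential version; either way the results come back in input order because each one carries its index.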
Here’s the runtime:
Seconds           : 22
Milliseconds      : 528
Not too bad, over twice as fast!
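For the record, the ratio of the two wall-clock times reported by Measure-Command (one run each) works out to:

```latex
\text{speedup} = \frac{58.124\ \text{s}}{22.528\ \text{s}} \approx 2.58
```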
So, the data! The raw data is above, but if you eyeball the gaps, it looks like this:
authors         count
alexcrichton    61 packages
carllerche      20 packages
7 authors       8-16 packages each
7 authors       3-5 packages each
14 authors      2 packages each
49 authors      1 package each

We have quite the long tail going on here!
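As a sanity check on an eyeballed grouping like this, the buckets can be recomputed from the per-author counts in the raw output. The sketch below hard-codes just those counts (the names are dropped) and uses a hypothetical `authors_in` helper to count the authors in each range.

```rust
/// Per-author crate counts from the program's output, highest first.
/// Names are dropped; the 14 authors with two crates and the 49 with one are
/// generated with `vec!` to keep the listing short.
fn counts() -> Vec<u32> {
    let mut c = vec![61, 20, 16, 13, 11, 10, 10, 9, 8, 5, 5, 4, 4, 4, 4, 3];
    c.extend(vec![2; 14]);
    c.extend(vec![1; 49]);
    c
}

/// Number of authors whose crate count falls in `lo..=hi`.
fn authors_in(counts: &[u32], lo: u32, hi: u32) -> usize {
    counts.iter().filter(|&&n| (lo..=hi).contains(&n)).count()
}

fn main() {
    let c = counts();
    println!("{} crates, {} authors", c.iter().sum::<u32>(), c.len());
    for &(lo, hi) in &[(61, 61), (20, 20), (8, 16), (3, 5), (2, 2), (1, 1)] {
        println!("{}..={} crates: {} author(s)", lo, hi, authors_in(&c, lo, hi));
    }
}
```

It reports 264 crates spread across 79 authors, which matches the size of the hand-made list.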
We can’t draw too many conclusions from this data, but I do think it’s neat. I also like that Rust is usable enough for these kinds of tasks; I didn’t feel the need to drop into Ruby. There’s a little more type tetris going on, and some more explicit error handling, but the synchronous version is surprisingly close to what I’d write in Ruby. The async version, on the other hand, is much more complex. I think async/await might help, but I haven’t had the time to try putting that together just yet.