
Who authors the most popular crates on crates.io?

 5 years ago
source link: https://www.tuicool.com/articles/hit/IrU3QnJ

I had a question this morning: who authors the most popular crates on crates.io?

First, we have to figure out what we mean by “most popular.” My first guess was “top 100 by recent downloads”, so I looked at crates.io. Once I got to 100, I found that even the next few crates were ones that I had heard of and would expect to be used often. I decided to keep going until the results felt more tenuous. This is obviously pretty subjective, but I also noticed something: the data seemed to get noisier around the 100k download mark. Sorting by recent downloads, cutting off there, and removing a few outliers (the rustc-ap-* crates don’t count, IMHO), I had a list of 264 crates in a text file.

Furthermore, how do I determine ‘crate authorship’? Many crates, especially popular ones, are worked on by more than one person. I’m trying to come up with some really rough numbers here, so I decided to go with the first author in the Cargo.toml. It’s not perfect, but it’s good enough.

Additionally, this tally counts each crate equally: if the top crate had a million downloads and the second crate had ten, each written by a different author, that counts as one for each author, not a million for one and ten for the other. If that makes any sense…
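To make that counting rule concrete, here is a minimal sketch with made-up data. The crate names, authors, download figures, and both helper functions are hypothetical and only there for illustration; this is not the code used for the post.

```rust
use std::collections::HashMap;

/// One row of a hypothetical crate list: (crate name, first author, recent downloads).
type CrateRow = (&'static str, &'static str, u64);

/// Count crates per author: every crate adds exactly one, whatever its downloads.
fn tally_by_crate(rows: &[CrateRow]) -> HashMap<&'static str, u32> {
    let mut counts = HashMap::new();
    for &(_name, author, _downloads) in rows {
        *counts.entry(author).or_insert(0) += 1;
    }
    counts
}

/// For contrast only: sum downloads per author (NOT what this post measures).
fn tally_by_downloads(rows: &[CrateRow]) -> HashMap<&'static str, u64> {
    let mut totals = HashMap::new();
    for &(_name, author, downloads) in rows {
        *totals.entry(author).or_insert(0) += downloads;
    }
    totals
}

fn main() {
    // Made-up data for illustration only.
    let rows: [CrateRow; 2] = [
        ("big-crate", "alice", 1_000_000),
        ("tiny-crate", "bob", 10),
    ];
    println!("by crate:     {:?}", tally_by_crate(&rows));
    println!("by downloads: {:?}", tally_by_downloads(&rows));
}
```

Under the first rule both hypothetical authors end up with a tally of one; under the second, one of them would dwarf the other.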

So, I guess this post could have been titled “Who typed cargo new for the crates on crates.io that had over 100k downloads recently as of October 3rd 2018”, but that is even longer than the already long title.

I created the text file by hand, but I wasn’t gonna look up their authors and do that math myself. So I wrote some code:

use std::{
    collections::HashMap,
    error::Error,
    fs::File,
    io::{prelude::*, BufReader},
};

fn main() -> Result<(), Box<dyn Error>> {
    let file = BufReader::new(File::open("top100.txt")?);

    let mut results = HashMap::new();

    for line in file.lines() {
        let crate_name = line?;

        let url = format!("https://crates.io/api/v1/crates/{}/owners", crate_name);

        let json: serde_json::Value = reqwest::get(&url)?.json()?;

        let username = json["users"][0]["login"]
            .as_str()
            .unwrap_or_else(|| panic!("{} is not a valid crate name", crate_name))
            .to_string();

        *results.entry(username).or_insert(0) += 1;
    }

    let mut results: Vec<_> = results.iter().collect();
    results.sort_by(|a, b| b.1.cmp(a.1));

    println!("Results: {:?}", results);

    Ok(())
}

34 lines, not too bad! This is Rust 2018, so you may spot a few new features in there. Here’s the output:

> Measure-Command {cargo run --release | Out-Default}
    Finished release [optimized] target(s) in 0.37s
     Running `target\release\effective-rust.exe`
Results: [("alexcrichton", 61), ("carllerche", 20), ("SimonSapin", 16), ("BurntSushi", 13), ("sfackler", 11), ("seanmonstar", 10), ("bluss", 10), ("cuviper", 9), ("dtolnay", 8), ("retep998", 5), ("reem", 5), ("Amanieu", 4), ("jeehoonkang", 4), ("newpavlov", 4), ("Kimundi", 4), ("raphlinus", 3), ("nrc", 2), ("erickt", 2), ("Gankro", 2), ("Stebalien", 2), ("larsbergstrom", 2), ("withoutboats", 2), ("abonander", 2), ("dragostis", 2), ("malept", 2), ("briansmith", 2), ("tailhook", 2), ("danburkert", 2), ("jackpot51", 2), ("nikomatsakis", 2), ("vitiral", 1), ("KokaKiwi", 1), ("Aaronepower", 1), ("killercup", 1), ("Byron", 1), ("paholg", 1), ("ticki", 1), ("Gilnaa", 1), ("nox", 1), ("kbknapp", 1), ("chyh1990", 1), ("ogham", 1), ("remram44", 1), ("colin-kiegel", 1), ("droundy", 1), ("mgeisler", 1), ("sile", 1), ("tomaka", 1), ("softprops", 1), ("johannhof", 1), ("alicemaz", 1), ("emilio", 1), ("oli-obk", 1), ("TyOverby", 1), ("SergioBenitez", 1), ("mrhooray", 1), ("comex", 1), ("DaGenix", 1), ("ruuda", 1), ("sunng87", 1), ("fizyk20", 1), ("mcgoo", 1), ("indiv0", 1), ("jedisct1", 1), ("pyfisch", 1), ("Manishearth", 1), ("Geal", 1), ("lifthrasiir", 1), ("mitsuhiko", 1), ("dguo", 1), ("mackwic", 1), ("utkarshkukreti", 1), ("hsivonen", 1), ("debris", 1), ("brson", 1), ("lfairy", 1), ("steveklabnik", 1), ("mystor", 1), ("m-ou-se", 1)]


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 58
Milliseconds      : 124
Ticks             : 581240326
TotalDays         : 0.000672731858796296
TotalHours        : 0.0161455646111111
TotalMinutes      : 0.968733876666667
TotalSeconds      : 58.1240326
TotalMilliseconds : 58124.0326
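One thing to keep in mind when reading that output: authors with the same count appear in no particular order, because HashMap iteration order is unspecified and the sort only compares counts. If reproducible output matters, a tie-breaker can be added. This is a rough sketch rather than the code that produced the output above; `sorted_counts` is a made-up helper name.

```rust
use std::collections::HashMap;

/// Sort (author, count) pairs by count descending, then by name ascending,
/// so that ties always come out in the same order.
fn sorted_counts(counts: &HashMap<String, u32>) -> Vec<(&str, u32)> {
    let mut rows: Vec<(&str, u32)> = counts
        .iter()
        .map(|(name, &count)| (name.as_str(), count))
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    rows
}

fn main() {
    // A small slice of the counts from the output above.
    let mut counts = HashMap::new();
    for (name, count) in [("seanmonstar", 10), ("bluss", 10), ("cuviper", 9)] {
        counts.insert(name.to_string(), count);
    }
    // prints [("bluss", 10), ("seanmonstar", 10), ("cuviper", 9)]
    println!("{:?}", sorted_counts(&counts));
}
```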

This is using synchronous IO, so the 264 HTTP requests (one per crate) likely dominate this time. For fun, let’s see how much async affects the code, as well as the runtime. Here’s an async version:

use std::{
    collections::HashMap,
    fs::File,
    io::{prelude::*, BufReader},
    mem,
};

use futures::{Future, Stream};
use reqwest::r#async::{Client, Decoder};
use tokio::runtime::Runtime;

fn fetch() -> impl Future<Item = Vec<String>, Error = ()> {
    let file = BufReader::new(File::open("top100.txt").unwrap());

    let mut futures = Vec::new();

    for line in file.lines() {
        let crate_name = line.unwrap();
        let url = format!("https://crates.io/api/v1/crates/{}/owners", crate_name);

        futures.push(
            // Note: this builds a new Client for every request.
            Client::new()
                .get(&url)
                .send()
                // Take ownership of the body, then collect its chunks into one buffer.
                .and_then(|mut res| {
                    let body = mem::replace(res.body_mut(), Decoder::empty());
                    body.concat2()
                })
                .map_err(|err| println!("request error: {}", err))
                // Parse the JSON and pull out the login of the first listed user.
                .map(move |body| {
                    let body: serde_json::Value = serde_json::from_slice(&body).unwrap();

                    body["users"][0]["login"]
                        .as_str()
                        .unwrap_or_else(|| panic!("{} is not a valid crate name", crate_name))
                        .to_string()
                }),
        );
    }

    futures::future::join_all(futures)
}

fn main() {
    let mut rt = Runtime::new().unwrap();
    let results = rt.block_on(fetch()).unwrap();

    let mut counts = HashMap::new();

    for name in results {
        *counts.entry(name).or_insert(0) += 1;
    }

    let mut results: Vec<_> = counts.iter().collect();
    results.sort_by(|a, b| b.1.cmp(a.1));

    println!("Results: {:?}", results);
}

A bit longer. I’m not super great at Tokio, so there might be a way to improve this; please let me know if you spot any noob mistakes! I had to ask the Tokio Gitter channel for a bit of help, but as always, they were very prompt and I got my questions answered quickly.

Here’s the runtime:

Seconds           : 22
Milliseconds      : 528

Not too bad, over twice as fast!
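The speedup presumably comes from overlapping the time spent waiting on the network, not from doing less work. Here is a minimal sketch of that effect using only the standard library. Note the assumptions: it uses OS threads rather than the futures-based approach above, and `fake_fetch` is a made-up stand-in that sleeps for 50 ms instead of making a real HTTP request.

```rust
use std::{
    thread,
    time::{Duration, Instant},
};

/// Stand-in for one HTTP request: waits a bit, then returns a fake owner name.
fn fake_fetch(crate_name: &str) -> String {
    thread::sleep(Duration::from_millis(50));
    format!("owner-of-{}", crate_name)
}

/// One request after another: the waits add up.
fn fetch_sequential(names: &[&str]) -> Vec<String> {
    names.iter().map(|n| fake_fetch(n)).collect()
}

/// One scoped thread per request: the waits overlap.
fn fetch_concurrent(names: &[&str]) -> Vec<String> {
    thread::scope(|s| {
        // Spawn everything first, then join in the original order.
        let handles: Vec<_> = names
            .iter()
            .map(|n| s.spawn(move || fake_fetch(n)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let names = ["serde", "rand", "libc", "log"];

    let start = Instant::now();
    let a = fetch_sequential(&names);
    println!("sequential: {:?}", start.elapsed());

    let start = Instant::now();
    let b = fetch_concurrent(&names);
    println!("concurrent: {:?}", start.elapsed());

    assert_eq!(a, b); // same results, in the same order
}
```

With real requests the gain depends on the server and the network, so the numbers from this toy say nothing about crates.io itself.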

So, the data! The raw data is above, but if you eyeball the gaps, it looks like this:

authors         count
alexcrichton    61 packages
carllerche      20 packages
7 authors       16-8 packages
7 authors       5-3 packages
14 authors      2 packages
49 authors      1 package
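A grouping like this is easy to get wrong by eye, so here is a small sketch for checking it against the raw output. The per-author counts are transcribed by hand from the results printed earlier, and `bucket` is a hypothetical helper, not part of the original program.

```rust
/// Count how many authors fall in each inclusive range of crate counts.
fn bucket(counts: &[u32], ranges: &[(u32, u32)]) -> Vec<usize> {
    ranges
        .iter()
        .map(|&(lo, hi)| counts.iter().filter(|&&c| lo <= c && c <= hi).count())
        .collect()
}

fn main() {
    // Per-author crate counts, transcribed from the raw output above.
    let mut counts: Vec<u32> = vec![61, 20, 16, 13, 11, 10, 10, 9, 8, 5, 5, 4, 4, 4, 4, 3];
    counts.extend(std::iter::repeat(2).take(14)); // authors with two crates
    counts.extend(std::iter::repeat(1).take(49)); // authors with one crate

    let ranges: [(u32, u32); 6] = [(61, 61), (20, 20), (8, 16), (3, 5), (2, 2), (1, 1)];
    for (&(lo, hi), n) in ranges.iter().zip(bucket(&counts, &ranges)) {
        println!("{:>2} to {:>2} crates: {} authors", lo, hi, n);
    }

    // The counts should add up to the number of crates in the input list.
    println!("total crates: {}", counts.iter().sum::<u32>());
}
```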

We have quite the long tail going on here!

We can’t draw too many conclusions from this data, but I do think it’s neat. I also think it’s neat that Rust is usable enough for these kinds of tasks; I didn’t feel the need to drop into Ruby. There’s a little more type tetris going on, and some more explicit error handling, but the two are surprisingly close. The async version, on the other hand, is much more complex. I think async/await might help, but I didn’t have the time to try and put that together just yet.
