Downloading all the crates on crates.io
source link: https://www.pietroalbini.org/blog/downloading-crates-io/
There are a lot of reasons you might want to download all the crates ever uploaded to crates.io, Rust’s package registry: code analysis across the whole public ecosystem, hosting a mirror for your company, or countless other ideas and projects.
The team behind crates.io receives a lot of support requests asking for the best and least impactful way to do this, so here is a short guide on how to do it.
Getting a list of all the crates
crates.io offers multiple ways to interact with its data: the crates.io-index GitHub repository, experimental daily database dumps and the crates.io API.
The way I recommend to get the list of all the crates is to rely on the index: the experimental database dumps are more heavyweight and are only updated daily, while usage of the API is governed by the crawlers policy (limiting you to one API call per second). If you absolutely need to use the API, please talk with us by emailing [email protected], and we’ll figure out a solution.
The index is a git repository, and the format of its content is defined by RFC 2141. There are crates such as crates-index that allow you to easily query its contents, and I recommend using them whenever possible.
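If you would rather read a local clone of the index directly, the following is a minimal sketch of how that might look. It assumes the commonly documented index layout (lowercase file names sharded by name length, and one JSON object per line with fields such as `name`, `vers`, `cksum` and `yanked`); the helper names `index_path` and `parse_index_file` are made up for this example.

```python
import json


def index_path(name: str) -> str:
    """Relative path of a crate's file inside a clone of the index.

    Assumed layout: names of 1, 2 or 3 characters live under 1/, 2/ and
    3/<first char>/, everything else under <chars 1-2>/<chars 3-4>/.
    """
    lower = name.lower()
    if len(lower) == 1:
        return f"1/{lower}"
    if len(lower) == 2:
        return f"2/{lower}"
    if len(lower) == 3:
        return f"3/{lower[0]}/{lower}"
    return f"{lower[:2]}/{lower[2:4]}/{lower}"


def parse_index_file(text: str) -> list[dict]:
    """Parse one index file: one JSON object per line, one line per version."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, `index_path("serde")` gives `se/rd/serde`, and reading that file from your clone and passing its text to `parse_index_file` yields one dictionary per published version.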
Downloading the packages
The best way to download the packages is to fetch them directly from our CDN. Compared to calling the crates.io API, the CDN does not have rate limits and is faster (as the API redirects you to the CDN after updating the download count). The CDN URLs follow this pattern:
https://static.crates.io/crates/{name}/{name}-{version}.crate
For example, Serde 1.0.0 can be downloaded from https://static.crates.io/crates/serde/serde-1.0.0.crate. Packages are tar.gz files.
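The URL pattern above can be turned into a small helper. This is only a sketch: `crate_url` and `download_crate` are illustrative names, and the `User-Agent` value is a placeholder you should replace with something that identifies your own tool.

```python
import urllib.request

CDN_PATTERN = "https://static.crates.io/crates/{name}/{name}-{version}.crate"


def crate_url(name: str, version: str) -> str:
    """Build the CDN URL for one version of a crate."""
    return CDN_PATTERN.format(name=name, version=version)


def download_crate(name: str, version: str) -> bytes:
    """Fetch the .crate file (a tar.gz archive) and return its raw bytes."""
    request = urllib.request.Request(
        crate_url(name, version),
        # Placeholder value, not something the CDN is known to require.
        headers={"User-Agent": "my-mirror-tool (contact: you@example.com)"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Since packages are tar.gz files, the returned bytes can be written to disk as they are, or opened with the standard `tarfile` module if you need to inspect the contents.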
If you want to ensure the contents of the CDN were not tampered with, you can verify the SHA256 checksum of the file you downloaded by comparing it with the cksum field in the index.
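A checksum comparison of this kind might look like the sketch below. It assumes the `cksum` value is the hex-encoded SHA256 digest of the downloaded file; `verify_crate` is a name chosen for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_crate(data: bytes, expected_cksum: str) -> bool:
    """Return True if the downloaded bytes match the index's cksum field."""
    return sha256_hex(data) == expected_cksum.strip().lower()
```

A file that fails this check should be discarded and downloaded again rather than stored in your local copy.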
Keeping your local copy up to date
The best way to keep your local copy up to date is to fetch a fresh list of crates available on crates.io and check if all of them are present in the local system, downloading the ones you’re missing. I personally recommend this approach as it’s less error-prone, and it heals your copy automatically if for whatever reason some of the changes are lost during a previous update.
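As a rough illustration of this approach, the sketch below compares a list of index entries against a local directory and reports what is missing. The on-disk layout (`<root>/<name>/<name>-<version>.crate`) and the function names are assumptions made for the example; adapt them to however you store the files.

```python
from pathlib import Path


def local_path(root: Path, name: str, version: str) -> Path:
    """Where this sketch expects a downloaded crate to live on disk."""
    return root / name / f"{name}-{version}.crate"


def missing_crates(entries: list[dict], root: Path) -> list[tuple[str, str]]:
    """Return (name, version) pairs listed in the index but absent locally.

    `entries` are parsed index lines, each with at least "name" and "vers".
    """
    return [
        (entry["name"], entry["vers"])
        for entry in entries
        if not local_path(root, entry["name"], entry["vers"]).is_file()
    ]
```

Because the check looks only at what is on disk, running it again after an interrupted or failed update picks up whatever the previous run missed.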
Another interesting approach you could implement is to get the difference since the last update of the index with git diff, parsing its output to get the list of crates that were added. There are also third-party crates such as crates-index-diff that automate this process for you. This approach is more fragile and error-prone, but it might be the only sensible solution if checking whether you downloaded a crate or not is slow or expensive.
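To give an idea of what parsing the diff could involve, here is a minimal sketch that extracts added index lines from unified `git diff` output. It assumes each index line is a JSON object with `name` and `vers` fields; `added_versions` is a hypothetical helper, not part of any existing tool.

```python
import json


def added_versions(diff_text: str) -> list[tuple[str, str]]:
    """Collect (name, version) pairs from '+' lines of a unified diff."""
    added = []
    for line in diff_text.splitlines():
        # Skip file headers ("+++ b/...") and anything that is not an addition.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        try:
            entry = json.loads(line[1:])
        except json.JSONDecodeError:
            continue  # not an index entry, e.g. a change to another file
        if isinstance(entry, dict) and "name" in entry and "vers" in entry:
            added.append((entry["name"], entry["vers"]))
    return added
```

Note that a line that was modified rather than added, for example when a version is yanked, also shows up as a `+` line in the diff, so the result should be treated as a list of candidates and not as a definitive list of new uploads.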
Common issues to be aware of
While the basics of downloading the contents of crates.io are simple, there are a couple of issues to be aware of when implementing such tooling:
- The crates.io team strives to keep the registry as immutable as possible, but we can’t always keep that promise. The technology world doesn’t exist in a bubble, and there are laws everyone needs to abide by. Occasionally we receive takedown requests due to trademark or copyright issues, and we have to remove the crates both from the registry and the CDN: your tooling should handle existing crates disappearing.
- To reduce the download size for cargo users we regularly squash the index repository into a single commit, and start the git history from scratch. The previous history is kept in a separate branch. To account for this we recommend running these commands to update the index:
  git fetch
  git reset --hard origin/master