Bulk XC downloads

Three steps:

Use Xeno-Canto to generate a list of all the records
Convert records into a bulk-downloadable format
Download all files

1. List all records

Use Xeno-Canto's web API to generate a list of all the records. For instance, https://www.xeno-canto.org/api/2/recordings?query=northern%20cardinal gives a list of all Northern Cardinal recordings.

Some species have multiple "pages" in their recording lists. You can tell from the "numPages" field in the response: if it is greater than 1, you won't get all of the records unless you request each page explicitly with the page parameter. For example, this URL returns the third page of Red Crossbill results: https://www.xeno-canto.org/api/2/recordings?query=red%20crossbill&page=3
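The page-by-page requests described above can be sketched in Python. This is a minimal sketch, not part of the original workflow: the function names (build_url, fetch_json, fetch_all_recordings) are hypothetical, and it assumes each response carries a "numPages" value and a "recordings" list as described above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.xeno-canto.org/api/2/recordings"


def build_url(query, page=1):
    """Return the API URL for one page of results, percent-encoding the query."""
    return "{}?query={}&page={}".format(API_URL, urllib.parse.quote(query), page)


def fetch_json(url):
    """Download one page and parse it as JSON (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fetch_all_recordings(query, fetch=fetch_json):
    """Collect the 'recordings' entries from every page of a query."""
    recordings = []
    page = 1
    while True:
        result = fetch(build_url(query, page))
        recordings.extend(result["recordings"])
        # Assumption: numPages is an integer (or a numeric string)
        if page >= int(result["numPages"]):
            break
        page += 1
    return recordings
```

Calling fetch_all_recordings("red crossbill") should return a single list that can be passed to pd.DataFrame in step 2. The fetch argument exists only so the paging logic can be tried with a stand-in function instead of a live request.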

Check your record list by visiting the URL in your browser. When you're satisfied with the records you see there, use wget to download the list of records (technically a "JSON document") onto your computer. Save the download to a file with the extension .json, e.g., noca-query.json, as in the command below. The URL is quoted so that the shell doesn't try to interpret the ? character:

wget -O noca-query.json "https://www.xeno-canto.org/api/2/recordings?query=northern%20cardinal"

2. Prepare for bulk downloading

Use a few lines of Python to transform the .json into a .csv file, and then the .csv into a .txt file containing only the download URLs. The .txt file of download URLs is all you'll use to actually download the files, but the .csv will help you keep track of which files you downloaded, what license each one has, and who created it. (You need to give attribution to the creator wherever you use these files!)

import json
import pandas as pd

# Load the JSON entries from your downloaded file
# (the with-block closes the file automatically)
with open('/Users/tessa/Documents/noca-query.json', 'r') as json_file:
    values = json.load(json_file)

# Create a pandas dataframe of records & convert to .csv file
record_df = pd.DataFrame(values['recordings'])
record_df.to_csv('xc-noca.csv', index=False)

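# Make wget input file: one download URL per line
url_list = []
for file_url in record_df['file'].tolist():
    # Prepend 'https:' only to protocol-relative URLs (those starting with '//')
    if file_url.startswith('//'):
        file_url = 'https:{}'.format(file_url)
    url_list.append(file_url)
with open('xc-noca-urls.txt', 'w') as f:
    for item in url_list:
        f.write("{}\n".format(item))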
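Since every recording needs attribution, it can help to generate the credit lines from the same records. The following is a minimal sketch rather than an official citation format: the helper name attribution_line is hypothetical, and the keys 'rec' (recordist), 'id', 'en' (English name) and 'lic' (license URL) are assumptions about the fields in the JSON, so check them against the columns in your .csv.

```python
def attribution_line(record):
    """Build a one-line credit for a single record (a dict or DataFrame row)."""
    # Assumed keys: 'rec' = recordist, 'id' = catalogue number,
    # 'en' = English species name, 'lic' = license URL
    license_url = record["lic"]
    if license_url.startswith("//"):
        # Turn a protocol-relative URL into a full one
        license_url = "https:" + license_url
    return "{}, XC{}, {}. License: {}".format(
        record["rec"], record["id"], record["en"], license_url
    )
```

To credit every file in the query, apply it to each row, e.g. [attribution_line(r) for r in record_df.to_dict('records')].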

3. Download files

Use wget in your terminal to download the recordings. You may want to save the files in their own directory; to do so, use the -P flag as below:

mkdir /Users/tessa/Downloads/XC
wget -P /Users/tessa/Downloads/XC --trust-server-names -i xc-noca-urls.txt
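After a long download it can be worth checking that nothing was skipped. The sketch below compares the record IDs from the .csv against the files in the download directory. It rests on an assumption that may not hold for your downloads: that each saved file name begins with "XC" followed by the recording ID. The function name missing_ids is hypothetical; adjust the pattern if your files are named differently.

```python
import os
import re


def missing_ids(expected_ids, download_dir):
    """Return the expected recording IDs with no matching file in download_dir."""
    found = set()
    for name in os.listdir(download_dir):
        # Assumption: file names start with "XC" followed by the recording ID
        match = re.match(r"XC(\d+)", name)
        if match:
            found.add(match.group(1))
    return [rec_id for rec_id in expected_ids if str(rec_id) not in found]
```

For example, missing_ids(record_df['id'].tolist(), '/Users/tessa/Downloads/XC') would list the IDs that still need to be fetched.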
