
Top 20 News Datasets Available on the Web for Free

 2 years ago
source link: https://dzone.com/articles/top-20-news-datasets-available-on-the-web-for-free-2

This article lists the top 20 news datasets that are currently available on the web for free. Read on to find out more!

Digital news sources have multiplied at an extraordinary rate, growing from a handful of online outlets to a large number of digital sources and publications. One reason is that news coverage now spans a wide range of issues and events, which extends its reach. These publications not only report on the world but also shape our perception of it.

Archiving news data is now common because of the high demand for instant access to historical news, which people often retrieve through news APIs. These news datasets are useful for research and for personal and professional artificial intelligence (AI) and machine learning (ML) projects.

If you are looking for historical news data to power your AI and ML algorithms, you can use these free news datasets or the Newsdata.io tool, which appears in the list below. News datasets can help you find a wide range of historical stories related to any topic, organization, person, and more.

In this article, we will discuss a simple and reliable way to access historical news datasets. Let’s get right into it.

Here are the top 20 news datasets that you can download for free for your personal and professional AI, machine learning, and data analytics projects.

1. Newsdata.io

Name: Covid-19 news dataset

Link

This Covid-19 dataset contains the latest world news related to Coronavirus.

2. Kaggle.com

Name: BBC News Classification (News article categorization)

Link

The dataset is split into 1,490 records for training and 735 for testing. The goal is to build a system that accurately classifies previously unseen news articles into the right category.
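To make the classification goal concrete, here is a minimal sketch of a naive Bayes text classifier written with only the Python standard library. The toy training sentences, the two category labels, and every function name are illustrative assumptions; they do not come from the dataset itself, which you would load in place of `TOY_TRAIN`.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_articles):
    """Count documents per category and word occurrences per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in labeled_articles:
        doc_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-probability for the text."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Start from the log prior of the category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy stand-in for the real training records (hypothetical examples).
TOY_TRAIN = [
    ("The striker scored twice in the cup final", "sport"),
    ("The team won the league match", "sport"),
    ("Shares fell as the bank reported weak profits", "business"),
    ("The company announced strong quarterly profits", "business"),
]

model = train_naive_bayes(TOY_TRAIN)
prediction = classify("The bank reported profits", model)
print(prediction)
```

With the real data, the training records would be used to build the model and the test records would be passed to `classify` one by one to measure accuracy.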

3. BBC

Name: BBC datasets

Link

Two news article datasets, originating from BBC News, provided for use as benchmarks for machine learning research.

4. Harvard Dataverse

Name: A Million News Headlines

Link

This dataset contains news headlines published over a period of eighteen years, sourced from the reputable Australian news outlet ABC (Australian Broadcasting Corporation).

5. Newsdata.io

Name: Covid-19 and vaccine news dataset

Link

This dataset contains the latest published news headlines from across the web, each with its metadata and full description.

6. Webz.io

Name: Political news articles

Link

This dataset contains news articles on world politics, fetched with the Webz.io news API.

7. Paperswithcode

Name: COVID-19 Fake News Dataset

Link

Along with the COVID-19 pandemic, we are also fighting an ‘infodemic’. Fake news and rumors are rampant on social media. Believing in rumors can cause significant harm.

8. Kaggle

Name: India News Headlines Dataset

Link

This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to the end of 2020, recorded in real time by the journalists of India. It contains approximately 3.4 million events published by the Times of India.
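A common first step with a headline archive like this one is to count how many headlines fall in each year. The sketch below assumes a CSV file with a `publish_date` column in `YYYYMMDD` form and a `headline_text` column; both column names and the three sample rows are assumptions for illustration, so check the downloaded file's header before reusing it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the downloaded CSV file.
SAMPLE_CSV = """publish_date,headline_text
20010102,Example headline one
20010315,Example headline two
20201231,Example headline three
"""


def headlines_per_year(csv_file, date_column="publish_date"):
    """Return a dict mapping each year to its number of headlines."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # The first four characters of a YYYYMMDD date are the year.
        year = int(row[date_column][:4])
        counts[year] += 1
    return dict(sorted(counts.items()))


yearly_counts = headlines_per_year(io.StringIO(SAMPLE_CSV))
print(yearly_counts)
```

For the real file, replace `io.StringIO(SAMPLE_CSV)` with a file object opened with `open(path, newline="", encoding="utf-8")`. Reading row by row keeps memory use low even for millions of records.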

9. Data.world

Name: Economic News Article Tone

Link

Contributors read snippets of news articles. They then noted if the article was relevant to the US economy and, if so, what the tone of the article was.

10. Archive.org

Name: World Politics news dataset

Link

This dataset contains the latest political news from around the world, along with the available metadata for each article.

11. IEEE.org

Name: Covid-19 and vaccine

Link

This dataset contains world news related to Covid-19 and vaccines, along with the available metadata for each article.

12. IEEE.org

Name: World politics news

Link

This dataset contains world news related to politics, along with the available metadata for each article.

13. IEEE.org

Name: Covid-19 news

Link

This dataset contains all the latest news data related to Covid-19 from around the world.

14. IEEE.org

Name: COVIFN : FAKE NEWS ON COVID19

Link

COVIFN is a COVID-19-specific dataset that consists of fact-checked fake news scraped from Poynter and true news from news publishers’ verified portals. The dataset was pre-processed to remove special characters and non-vital information.
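The kind of pre-processing described here can be sketched with regular expressions. This is not the COVIFN authors' actual pipeline: treating URLs as "non-vital information" and keeping only letters, digits, and spaces are assumptions made for illustration, and `clean_text` is a hypothetical helper name.

```python
import re


def clean_text(text):
    """Strip URLs and special characters, then collapse whitespace."""
    # Assumption: URLs count as non-vital information.
    text = re.sub(r"https?://\S+", " ", text)
    # Replace every character that is not a letter, digit, or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


raw = "BREAKING!!! Vaccine #rumor spreads >>> https://example.com/post?id=1"
cleaned = clean_text(raw)
print(cleaned)
```

Note that the character class above also removes non-English letters, so it would need adjusting for multilingual text.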

15. IEEE.org

Name: FAKE NEWS ON HEALTHCARE

Link

The Internet is a vast repository of useful knowledge, but it has been contaminated by the spread of false information. Relying on misinformation can be disastrous. According to a World Health Organization survey, about 6,000 individuals were hospitalized throughout the world as a result of fake news on COVID-19 in the first three months of 2020.

16. IEEE.org

Name: NEWS CREDIBILITY DATASET

Link

Features of each news item according to seven credibility categories.

17. IEEE.org

Name: AI-Based automated extraction of entities, entity categories, and sentiment on Covid-19 situation.

Link

In-depth analysis of social media content based on artificial intelligence (AI) would allow a strategic decision-maker to obtain evidence-based responses to complex queries.

18. Kaggle

Name: Reddit Omicron Panic

Link

As we all know, a new variant of COVID-19 is spreading worldwide, causing massive panic. This dataset captures mentions of the new variant on Reddit.

19. Kaggle

Name: Omicron daily cases by country (COVID-19 variant)

Link

Tracking the progression of the new omicron COVID-19 variant.

20. IEEE.org

Name: Daily report of Covid-19 confirmed cases in Thailand.

Link

This dataset contains a total of 578,375 confirmed COVID-19 cases reported in Thailand, recorded between 22 January 2021 and 30 July 2021.
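As a quick sanity check on these figures, the short sketch below works out the length of the reporting period and the average number of confirmed cases per day. Counting both the first and the last day is an assumption, since the description does not say whether the end dates are inclusive.

```python
from datetime import date

TOTAL_CASES = 578_375          # total stated in the dataset description
START = date(2021, 1, 22)      # first reporting day
END = date(2021, 7, 30)        # last reporting day


def days_inclusive(start, end):
    """Number of calendar days from start to end, counting both ends."""
    return (end - start).days + 1


def average_per_day(total, start, end):
    """Mean number of cases per day over the inclusive period."""
    return total / days_inclusive(start, end)


period_days = days_inclusive(START, END)
mean_daily_cases = average_per_day(TOTAL_CASES, START, END)
print(period_days, round(mean_daily_cases))
```

An average like this hides the shape of the outbreak, so the daily records in the dataset remain the better basis for any trend analysis.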

