ArXiv Paper Reader

Official implementation of the algorithm behind:

YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

The main idea of this work is to simplify and streamline ArXiv paper reading. If you're a visual learner, this code will convert a paper into an engaging video format. If you're on the go and prefer listening, it will also generate an audio version.

Overview

Teaser image

Here are the main steps of the algorithm:

  1. Download the paper's LaTeX source, given its arXiv ID (steps 1-2 are sketched after this list)

  2. Use latex2html or latexmlc to convert the LaTeX source to an HTML page

  3. Parse the HTML page to extract text and equations, ignoring tables, figures, etc.

  4. If creating a video, also build a map that matches PDF pages to text and text chunks to page blocks

  5. Split the text into sections and pass them through the OpenAI GPT API to paraphrase, simplify, and explain them

  6. Split the GPT-generated text into chunks and convert them to audio using the Google Text-to-Speech API (see the second sketch after this list)

  7. Pack all the necessary pieces and create a zip file for further video processing

  8. Using the text-block map computed earlier, create the video with ffmpeg
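
Steps 1-2 boil down to fetching the arXiv e-print tarball and running a LaTeX-to-HTML converter on it. A minimal sketch, not the repository's actual code: the e-print URL is arXiv's standard source endpoint, while the file names and converter flags are illustrative assumptions.

import subprocess
import tarfile
import urllib.request

def fetch_and_convert(paperid: str, workdir: str = ".") -> str:
    # Step 1: download the LaTeX source for the given arXiv ID.
    archive = f"{workdir}/{paperid}.tar.gz"
    urllib.request.urlretrieve(f"https://arxiv.org/e-print/{paperid}", archive)
    srcdir = f"{workdir}/{paperid}_src"
    with tarfile.open(archive) as tar:
        tar.extractall(srcdir)

    # Step 2: convert the main .tex file to HTML ("main.tex" is an assumption;
    # real papers may name their entry file differently).
    subprocess.run(["latex2html", "main.tex"], cwd=srcdir, check=True)
    # Fallback converter, mirroring what dropping --l2h does:
    # subprocess.run(["latexmlc", "main.tex", "--destination=main.html"], cwd=srcdir, check=True)
    return srcdir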

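Step 6 maps naturally onto the google-cloud-texttospeech client listed in the Setup section. A rough sketch, assuming naive fixed-size chunking and a default en-US voice; the repository likely splits on sentence boundaries (spacy is in the package list) and configures the voice differently.

from google.cloud import texttospeech

def synthesize_chunks(text: str, prefix: str, max_chars: int = 4500) -> list[str]:
    client = texttospeech.TextToSpeechClient()
    voice = texttospeech.VoiceSelectionParams(language_code="en-US")
    audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

    # Naive chunking by character count to stay under the per-request size limit.
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    paths = []
    for i, chunk in enumerate(chunks):
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=chunk),
            voice=voice,
            audio_config=audio_config,
        )
        path = f"{prefix}_{i:03d}.mp3"
        with open(path, "wb") as f:
            f.write(response.audio_content)
        paths.append(path)
    return paths
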
Note 1 The code can create both a long, more detailed version and a short, summarized version of the paper.

Note 2 The long video version will also contain summary blocks after each section.

Note 3 The short video version will contain automatically generated slides summarizing the paper.

Note 4 The code can also upload the generated audio files to your Google Drive, if provided with proper credentials (a sketch follows below).
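
For Note 4, an upload via pydrive2 could look roughly like this; the OAuth flow and file name are illustrative, and the code's actual credential handling may differ.

from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

gauth = GoogleAuth()           # expects OAuth client credentials (e.g., client_secrets.json)
gauth.LocalWebserverAuth()     # opens a browser window for consent
drive = GoogleDrive(gauth)

upload = drive.CreateFile({"title": "final_audio.mp3"})
upload.SetContentFile("final_audio.mp3")
upload.Upload()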

Setup

Python Packages

openai, PyPDF2, spacy, tiktoken, pyperclip, google-cloud-texttospeech, pydrive2, pdflatex
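
# one possible installation, assuming pip (latex2html/latexmlc and ffmpeg are separate system tools)

pip install openai PyPDF2 spacy tiktoken pyperclip google-cloud-texttospeech pydrive2 pdflatex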

How to run

# to create audio, both short and long, and prepare for video creation

python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_paper_id> --l2h

The default LaTeX conversion tool, latex2html, sometimes fails; in that case, remove --l2h to fall back to latexmlc. Also, by default the code processes the whole paper up to the references; if you want to stop earlier, pass --stop_word "experiments" (e.g., to stop before the Experiments section).
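
# example: fall back to latexmlc (no --l2h) and stop before the Experiments section

python main.py --verbose --include_summary --create_short --create_video --openai_key <your_key> --paperid <arxiv_paper_id> --stop_word "experiments"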

Output

<arxiv_paper_id>_files/
├── final_audio.mp3
├── final_audio_short.mp3
├── abstract.txt
├── zipfile-<time_stamp>.zip
├── ...
├── extracted_orig_text_clean.txt
├── original_text_split_pages.txt
├── original_text_split_sections.txt
├── ...
├── gpt_text.txt
├── gpt_text_short.txt
├── gpt_verb_steps.txt
├── ...
├── slides
    ├── slide1.pdf
    ├── ...

The output directory, among other things, contains the generated audio files, slides, the extracted original text, and the GPT-generated output, split across pages or sections. It also contains zipfile-<time_stamp>.zip, which includes the data needed for video generation.

# to extract only the original text from ArXiv paper, without any GPT/audio/video processing

python main.py --verbose --extract_text_only --paperid <arxiv_paper_id>

Now, we are ready to generate the video:

# to generate video based on the results from above

python makevideo.py --paperid <arxiv_paper_id>

Output

output_<time_stamp>/
├── output.mp4
├── output_short.mp4
├── ...

The output directory now contains two video files, one for the long version and one for the short version.
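
For reference, the video assembly in step 8 amounts to looping each page image for the duration of its audio chunk and concatenating the resulting clips. A rough ffmpeg-based sketch, not makevideo.py itself; file names and codec choices are assumptions.

import subprocess

def make_clip(image: str, audio: str, out: str) -> None:
    # Loop a still page image for the duration of its audio chunk.
    subprocess.run([
        "ffmpeg", "-y", "-loop", "1", "-i", image, "-i", audio,
        "-c:v", "libx264", "-tune", "stillimage", "-c:a", "aac",
        "-pix_fmt", "yuv420p", "-shortest", out,
    ], check=True)

def concat_clips(clips: list[str], out: str) -> None:
    # Concatenate the per-page clips with ffmpeg's concat demuxer.
    with open("clips.txt", "w") as f:
        f.writelines(f"file '{c}'\n" for c in clips)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "clips.txt", "-c", "copy", out],
        check=True,
    )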

