

Tricks in R to Boost Your Productivity (Part 2)
source link: https://www.tuicool.com/articles/mUzmQjr

I am always keen on tools and tricks that help me get things done faster. Higher productivity does not by itself guarantee good or correct results, but it can definitely reduce your working hours, which means less time to make mistakes. Time spent learning tricks and tools pays for itself later in your work and is well worth it. Following my previous article on tricks in R and RStudio, I will share in this article some other tricks that I think are useful.
Set up R Profiles
When an R session starts, it looks for a .Rprofile file in the current (project) directory and then in your home directory, sources the first one it finds, and runs the function .First if one is defined. A few things you can have the .Rprofile do at the beginning of each session to save time are:
- Load packages that you work with constantly (such as tidyverse, data.table, ggplot2, etc.).
- Set environment variables (such as a development environment identifier ("sandbox", "staging", "prod", etc.), AWS credentials, your database connection credentials, etc.).
- Set Java parameters such as the JVM heap size (you definitely need to set this if you pull large data from a database through the DBI/rJava packages).
- Set ggplot2 theme and color palette (useful if you need to use a brand-specific color palette).
An example of my .Rprofile is shown below:
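A minimal sketch along these lines, not the author's actual file, could cover the four items above. The package list, the environment variable name `R_ENV`, the 4 GB heap size, and the theme choice are all illustrative assumptions.

```r
# ~/.Rprofile (illustrative sketch; every value below is an assumption)

# Environment identifier read by your own scripts. The name R_ENV is hypothetical.
Sys.setenv(R_ENV = "dev")

# JVM options must be set before rJava is loaded; 4 GB is an arbitrary example.
options(java.parameters = "-Xmx4g")

# .First runs after the profile has been sourced, at the start of each session.
.First <- function() {
  # Attach packages only in interactive sessions so batch scripts stay explicit.
  if (interactive()) {
    for (pkg in c("data.table", "ggplot2")) {
      # Skip packages that are not installed instead of failing at startup.
      if (requireNamespace(pkg, quietly = TRUE)) {
        suppressPackageStartupMessages(library(pkg, character.only = TRUE))
      }
    }
    # Apply a default ggplot2 theme for the whole session.
    if (requireNamespace("ggplot2", quietly = TRUE)) {
      ggplot2::theme_set(ggplot2::theme_minimal())
    }
  }
  invisible(NULL)
}
```

Restricting the package loading to interactive sessions is a design choice in this sketch: scripts run through Rscript or cron then still have to declare their own dependencies.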
Make your Database Query Wrapper Functions
If you use R to query the database a lot, you should consider organizing your files better and writing wrapper functions to make your life easier. First of all, put all your R files into one folder and all your SQL files into another. Some people like to write SQL commands directly into their R functions or scripts, which I don't think is the best practice: it makes your R code ugly, and the SQL cannot be reused by other R code. Once the files are organized, you can write the following wrapper functions:
- connect_db is the wrapper function that returns a database connection handle after you call the driver and establish the database connection with the URL and credentials.
- get_sql is the SQL file parser.
- db_query is the function you call to execute your query, where you can pass either a SQL file path or a plain SQL string. You can pass additional arguments to the function to substitute parameters in the SQL query through the DBI::sqlInterpolate function. Most importantly, the connection must be closed after the query has been executed.
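The three wrappers described above could be sketched as follows. This is one possible shape under stated assumptions, not the author's implementation: it needs the DBI and RSQLite packages, a file-backed SQLite database stands in for the real driver, URL, and credentials, the environment variable `DEMO_DB_PATH` is hypothetical, and a query is treated as a file path only when it ends in `.sql`.

```r
# Assumption: SQLite replaces the real database so the sketch is self-contained.
connect_db <- function(path = Sys.getenv("DEMO_DB_PATH", "demo.sqlite")) {
  # Return an open connection handle; the caller must disconnect it.
  DBI::dbConnect(RSQLite::SQLite(), path)
}

# Read a SQL file into a single string, keeping its line breaks.
get_sql <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Run a query given either a path to a .sql file or a plain SQL string.
# Named arguments in ... fill ?name placeholders via DBI::sqlInterpolate.
db_query <- function(query, ...) {
  is_file <- grepl("\\.sql$", query, ignore.case = TRUE)
  sql <- if (is_file) get_sql(query) else query

  conn <- connect_db()
  # Close the connection even if the query fails.
  on.exit(DBI::dbDisconnect(conn), add = TRUE)

  if (length(list(...)) > 0) {
    # sqlInterpolate quotes the values, which guards against SQL injection.
    sql <- DBI::sqlInterpolate(conn, sql, ...)
  }
  DBI::dbGetQuery(conn, sql)
}
```

Registering the disconnect with `on.exit` means the connection is released on both the success and the error path, which is the property the last bullet asks for.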
With the above wrapper functions, you can run a SQL query by simply calling, for example, the following commands.
# String based query
accounts <- db_query('select count(*) from accounts where created_at < ?time', time = '2019-01-01')
# File based query
accounts <- db_query('sql/accounts.sql', time = '2019-01-01')
You can easily extend the above wrapper functions with more functionality such as delete, load, etc. to meet your needs.
Never Save Workspace Data on Exit
If you use the R command-line tool in a terminal for all your work, you will be asked whether to save your workspace data when you exit the R session. If you choose to save it, a hidden ".RData" file is created in your working directory. That is fine if your workspace only contains a small amount of data. However, if your workspace contains a large amount of data (512MB or more), saving can take a long time because the workspace data has to be compressed into the file, which is slow for large data. Therefore, my personal suggestion is that you should never save the workspace data. If you use RStudio, you can turn this off under Tools > Global Options > General by setting "Save workspace to .RData on exit" to "Never". If you really want to save the data on some occasions, you can simply call the save command before you exit.
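As a small illustration of saving on demand, the sketch below writes a single object with save and restores it with load. The object and the file path are made up for the example.

```r
# Keep only what you need, explicitly, instead of the whole workspace.
results <- data.frame(id = 1:3, score = c(0.2, 0.5, 0.9))  # example object
out_file <- file.path(tempdir(), "results.RData")          # hypothetical path

save(results, file = out_file)  # write just this object
rm(results)                     # simulate a fresh session
load(out_file)                  # restores `results` under its original name

# Then leave the session without writing .RData:
# quit(save = "no")
```

From a terminal, starting R with the `--no-save` flag has the same effect as answering "no" at the exit prompt.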
Package Development
Package Reload
If you are writing an R package and want to reload the whole package under development without restarting the R session, and hence without losing your data, you can use the function devtools::reload() from the devtools package. In RStudio, the shortcut CMD + SHIFT + L does a similar job by running devtools::load_all().
Package Documentation
With the cursor inside a function in your package, you can press CMD + Option + Shift + R to insert a Roxygen2 documentation skeleton for that function.
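For illustration, a filled-in version of such a skeleton might look like the following. The function `add_numbers` is a made-up example; the tags are standard Roxygen2 tags.

```r
#' Add two numbers
#'
#' @param x A numeric value.
#' @param y A numeric value.
#'
#' @return The sum of x and y.
#' @export
#'
#' @examples
#' add_numbers(1, 2)
add_numbers <- function(x, y) {
  x + y
}
```

Running devtools::document() in the package directory then turns these comments into the corresponding help file.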
RStudio Code Setting
I am a heavy user of RStudio and have been using RStudio for over 4 years. Below is my code setting recommendation for RStudio. The purpose is to make the coding experience more pleasant and efficient.
Development Environment
I believe that 99% of R users use RStudio as their primary R programming IDE. The RStudio IDE comes in two major products: RStudio Desktop and RStudio Server. Most R users start with the desktop version, mainly for ad-hoc analysis and model development. If their needs expand to model deployment, task automation, Shiny websites, dashboards, reports, etc., the server version becomes a better choice. If you use the server version, you should set up separate dev, staging, and production environments on your servers for better software management. I only have one EC2 server rather than three servers dedicated to the dev, staging, and production environments. As a workaround, you can create three R projects in three different folders, version controlled with git and GitHub, to mimic the dev, staging, and production environments as shown below:
- dev : it is my sandbox for modeling and analysis.
- stg : it is a testing environment for new features.
- prod : it contains the code and cron jobs used to generate reports, dashboards, and automated tasks. In the production environment, you should change the code only through git pull, never manually.
If you use the free version of RStudio server, switching between projects will restart your R session and hence lose data. If you use the RStudio Pro version, you can run 3 R sessions side by side in your browser, which is more convenient.
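One way code shared by the three projects could tell them apart is by reading an environment identifier at runtime. The sketch below is only an illustration: the variable name `R_ENV`, the settings, and their values are all hypothetical.

```r
# Hypothetical per-environment settings; names and values are illustrative only.
env_settings <- list(
  dev  = list(db_name = "analytics_dev",  send_reports = FALSE),
  stg  = list(db_name = "analytics_stg",  send_reports = FALSE),
  prod = list(db_name = "analytics_prod", send_reports = TRUE)
)

# Look up the settings for the current environment.
# Falls back to "dev" when the (hypothetical) R_ENV variable is not set.
get_settings <- function(env = Sys.getenv("R_ENV", "dev")) {
  if (!env %in% names(env_settings)) {
    stop("Unknown environment: ", env)
  }
  env_settings[[env]]
}
```

Each project could set its own identifier in its project-level .Rprofile, so the same script behaves differently depending on the folder it runs from.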