Reproducible and Accessible Research Using Plumber
Sometimes we want our research to be accessible to as many people as possible. Say we discovered a pattern in the stock market using a machine learning algorithm we invented, and we want everyone to be able to use that algorithm. One way to do that is to build a package and distribute it on CRAN (if you are using R). However, distributing it as a package has a limitation: it only reaches a specific type of user. People without an R background would need to learn R to benefit from the package.
To reach users of other technology stacks, we can expose the algorithm through an API. Once it is an API, it does not matter what programming language the clients use; they can still benefit from it.
Now, what if the only language we can use is R? How do we build an API in R? It turns out there is already an R library that helps us build APIs. Below we discuss the benefits of using that library and the challenges.
Plumber
The library is called Plumber. Plumber is an R library that converts your R functions into API endpoints by using special comments. What is a special comment? Let’s look at an example.
# Get list of all companies
# (`companies` is assumed to be loaded elsewhere in the app)
#' @get /all
#' @json
getAllCompanyInfo <- function() {
  companies
}
Notice the #' prefix: that is what we call a special comment. If you are familiar with building packages in R, you will recognize it as the same comment style that Roxygen uses to generate documentation. Plumber also accepts #* as the prefix, which avoids confusion with Roxygen comments.
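To make the idea a bit more concrete, here is a minimal sketch of an endpoint that takes a parameter. It is not from the original application: the `companies` data frame, its columns, the `/company` path, and the `getCompanyInfo` name are all made up for illustration. Plumber matches query string values to function arguments by name, so a request such as `/company?symbol=bbb` would call the function with `symbol = "bbb"`.

```r
# Hypothetical stand-in for the `companies` data used in this post.
companies <- data.frame(
  symbol = c("AAA", "BBB", "CCC"),
  name   = c("Alpha Corp", "Beta Inc", "Gamma Ltd"),
  stringsAsFactors = FALSE
)

#' Get one company by its ticker symbol
#' @param symbol Ticker symbol to look up
#' @get /company
#' @json
getCompanyInfo <- function(symbol = "") {
  # Query string values arrive as character strings, so normalise the case
  # before comparing. drop = FALSE keeps the result a data frame.
  companies[companies$symbol == toupper(symbol), , drop = FALSE]
}
```

Because the special comments are ordinary comments to R, the function can still be called directly, for example `getCompanyInfo("bbb")`, without any server running.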
A few more steps are required, but I won’t explain them here. If you want to learn how Plumber works, you can refer to its documentation.
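For orientation only, one of those steps is starting the server. The sketch below shows roughly what a launcher script might look like. It is an assumption, not the actual file from this project: the `R/api.R` path and the `start_api` helper are hypothetical, while `plumber::plumb()` and the router's `$run()` method are the calls Plumber provides for this.

```r
# Sketch of a launcher script (hypothetical; not the project's real file).
# "R/api.R" is an assumed path to the file holding the annotated functions.
start_api <- function(file = "R/api.R", port = 8000) {
  # Parse the special comments in `file` into a router object
  router <- plumber::plumb(file)
  # Listen on all interfaces so the API is reachable from other machines
  router$run(host = "0.0.0.0", port = port)
}

# Uncomment to launch the API when this file is sourced:
# start_api()
```

The call is left commented out so that sourcing the sketch does not block the R session; `$run()` keeps running until the server is stopped.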
Advantages
Flexibility
Plumber is a library, not a framework, so there is no rule about which folder our source code must live in. We can follow the structure of an R package: for example, source code in the R folder and tests in the tests folder. I chose to follow the R package structure because it is already neat and familiar to people who want to read the code. We can also add a DESCRIPTION file. Although Plumber does not need it to work, Travis CI can use it: Travis reads the DESCRIPTION file and installs the packages needed to run the tests.
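As an illustration of what a file in the tests folder could contain, here is a small sketch that checks an endpoint function directly. The data and the assertions are made up, and the function is redefined inline only so the sketch runs on its own; in a real project it would be loaded from the R folder instead. It uses base R's `stopifnot()`, though a testing package such as testthat would be an equally reasonable choice.

```r
# Stand-ins so the sketch is self-contained (hypothetical data).
companies <- data.frame(symbol = c("AAA", "BBB"), stringsAsFactors = FALSE)

#' @get /all
#' @json
getAllCompanyInfo <- function() {
  companies
}

# The annotations are plain comments to R, so the endpoint can be called
# directly, with no HTTP server involved.
result <- getAllCompanyInfo()

# Any failed condition raises an error, which makes the test run fail.
stopifnot(
  is.data.frame(result),
  nrow(result) == 2,
  identical(result$symbol, c("AAA", "BBB"))
)
```

Testing the function rather than the HTTP layer keeps the test fast and avoids having to start and stop a server in CI.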
Extensibility
We can combine Plumber with Docker and CI
Although we don’t strictly need Docker to deploy the API, using it makes deployment easier. We can even push the image to a Docker registry. The documentation already explains how to set up Docker, but for reference, here is my Dockerfile.
FROM rocker/r-base
LABEL maintainer="Bagus Trihatmaja <trihatmaja.a@husky.neu.edu>"
RUN apt-get update -qq && apt-get install -y \
    git-core \
    libssl-dev \
    libcurl4-gnutls-dev
RUN R -e 'install.packages(c("plumber", "tibble", "dplyr", "readr", "purrr", "modelr"))'
EXPOSE 8000
RUN mkdir app
COPY R app/R
COPY data app/data
WORKDIR app
CMD ["R", "-e", "source('R/app.R')"]
Since we can use Travis CI to test our code (which, as mentioned above, requires a DESCRIPTION file), we can also use Travis CI to build the image automatically.
This way, we can distribute our code even further: if we have multiple algorithms, we can build an API for each one and connect them with Docker Compose.
Notes
There are things that I haven’t tested yet. When I first built my application, I expected challenges to appear as the API grew more complex, but I have not run into any so far. Here is the list of things I am not yet sure about.
Security
The documentation has a section on security. However, at the time of writing, it does not cover the topic completely. The security of the API I built has not been tested yet.
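For readers wondering where a security check could even live, Plumber has the notion of filters: functions that run before the endpoint and can reject a request. The sketch below is only an illustration of that mechanism, not a tested or recommended security design. The `X-API-Key` header, the `API_KEY` environment variable, and the `checkKey` name are all assumptions made for this example.

```r
#' Reject requests that lack the expected API key (illustrative only)
#' @filter check_key
checkKey <- function(req, res) {
  expected <- Sys.getenv("API_KEY")   # hypothetical environment variable
  supplied <- req$HTTP_X_API_KEY      # value of the X-API-Key request header
  # Refuse when no key is configured, none was sent, or the two differ
  if (!nzchar(expected) || is.null(supplied) || !identical(supplied, expected)) {
    res$status <- 401                 # Unauthorized
    return(list(error = "Invalid or missing API key"))
  }
  # Key matches: pass the request on to the next filter or the endpoint
  plumber::forward()
}
```

A shared key like this says nothing about transport encryption or rate limiting, so it should be read as a starting point for experiments rather than a complete answer.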
Performance
This is another aspect I haven’t tested yet: how does it perform in a production environment? If we have many algorithms, we could probably distribute the APIs across separate containers, which might help with performance. Moving the heavy computation to Rcpp might also help.