Rserve is something I never knew I needed, and now I don’t know how I ever did without it. The core technology, available from RForge, is a server that makes R available to other languages without paying R’s startup cost on every request. My task was to package it as a containerized service, which introduced me to several new technologies, including Rserve itself, Docker and the Rocker project.
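Because Rserve is a TCP server, any language with sockets can talk to it. When a client connects, Rserve first sends a fixed 32-byte identification string, for example `Rsrv0103QAP1` followed by padding. That greeting makes a convenient readiness check for a containerized instance. The sketch below is a hypothetical Python probe and is not part of the project. It assumes Rserve's default port 6311 and a greeting layout that matches the Rserve protocol documentation for the version you run.

```python
import socket

# Rserve greets each new connection with a 32-byte ID string, e.g.
# b"Rsrv0103QAP1\r\n\r\n--------------\r\n" (signature, version, protocol, padding).
ID_LENGTH = 32


def parse_rserve_id(data: bytes) -> dict:
    """Split an Rserve ID string into its protocol version and protocol name."""
    if len(data) != ID_LENGTH:
        raise ValueError(f"expected {ID_LENGTH} bytes, got {len(data)}")
    signature, version, protocol = data[0:4], data[4:8], data[8:12]
    if signature != b"Rsrv":
        raise ValueError(f"not an Rserve greeting: {signature!r}")
    return {"version": version.decode("ascii"), "protocol": protocol.decode("ascii")}


def probe(host: str = "localhost", port: int = 6311, timeout: float = 2.0) -> dict:
    """Connect, read the greeting and close without sending any command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        while len(buf) < ID_LENGTH:
            chunk = sock.recv(ID_LENGTH - len(buf))
            if not chunk:
                raise ConnectionError("connection closed before the greeting was complete")
            buf += chunk
    return parse_rserve_id(buf)
```

Against a running container, `probe("localhost", 6311)` should return something like `{'version': '0103', 'protocol': 'QAP1'}`. A refused connection or a `ValueError` means Rserve is not ready yet. Real clients, such as the `RSclient` R package, handle the rest of the QAP1 exchange.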
Background
This project was motivated by a need for stronger support for using R in production. We consistently found it difficult to build performant applications that relied on R for statistics and niche scientific analysis. I suppose most of the trouble centered on the fact that R is single-threaded: running multiple jobs simultaneously means running multiple R processes on the same server, and each of those processes comes with its own startup cost.
Rserve solves this problem by letting you create an R process with any needed libraries already attached. As new client connections come in, that process is forked, so each one gets instant access to the loaded environment without the upfront load time, and the approach scales with the number of clients. My goal was to incorporate this resource into our infrastructure reliably.
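The fork-on-connect idea is easier to see in miniature. The sketch below is not Rserve, which is written in C and forks a full R process. It is a hypothetical Python analogue and assumes a Unix system where `os.fork` is available. The parent pays a simulated startup cost once in `expensive_startup`. Each request is then handled in a forked child that inherits the already-loaded state.

```python
import os
import time


def expensive_startup() -> dict:
    """Hypothetical stand-in for attaching R packages: slow, run once in the parent."""
    time.sleep(0.5)  # simulate library load time
    return {"model": lambda x: x * x}


def handle_request(state: dict, x: int) -> int:
    """Answer one request in a forked child that inherits the preloaded state."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: `state` is already in memory, so no startup cost is paid here.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(state["model"](x)).encode())
            status = 0
        finally:
            os._exit(status)  # exit immediately; never fall through into parent code

    # Parent: collect the child's answer from the pipe, then reap the child.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError("worker process failed")
    return int(b"".join(chunks).decode())
```

Call `state = expensive_startup()` once and then `[handle_request(state, n) for n in range(5)]`. The result is `[0, 1, 4, 9, 16]`, and the half-second setup is not repeated. Rserve makes the same trade: one slow initialization, then many cheap copies.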
Development
I started by familiarizing myself with Docker and quickly found the Rocker project. I decided to build our Rserve container on a Rocker base image rather than start from a bare Linux image and install R myself. Installing R into an Alpine Linux image might have given us a smaller image, but it would have taken far more effort, not just to develop but also to maintain. Using Rocker’s `r-ver` base image gave us reproducibility: it ensured a consistent version of R and of any installed packages.
With Rocker doing most of the heavy lifting, I focused on adding a few basic features: custom logging, the ability to start Rserve in debug mode via environment variables, and a way to source custom functions at startup. The project has remained mostly unchanged since then, for several years now.
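The post does not show how the debug and startup options are wired up. What follows is only a sketch of one possible approach: a hypothetical Python entrypoint that reads environment variables and generates an Rserve configuration file. The variable names `RSERVE_DEBUG` and `RSERVE_SOURCE_FILES` are made up for illustration. The config directives (`remote enable`, `port`, `source`) and the `R CMD Rserve` / `R CMD Rserve.dbg` commands with `--RS-conf` are my reading of the Rserve documentation. Check them against the version you install.

```python
import os
from typing import List, Mapping

# Hypothetical environment variable names, chosen for illustration only.
DEBUG_VAR = "RSERVE_DEBUG"
SOURCE_VAR = "RSERVE_SOURCE_FILES"  # colon-separated list of .R files


def build_config(env: Mapping[str, str], port: int = 6311) -> str:
    """Render Rserve config text: accept remote clients and source startup files."""
    lines = ["remote enable", f"port {port}"]
    for path in env.get(SOURCE_VAR, "").split(":"):
        if path:  # skip empty entries, e.g. when the variable is unset
            lines.append(f"source {path}")
    return "\n".join(lines) + "\n"


def build_command(env: Mapping[str, str], conf_path: str) -> List[str]:
    """Pick the debug or regular Rserve program based on the environment."""
    debug = env.get(DEBUG_VAR, "").strip().lower() in {"1", "true", "yes"}
    program = "Rserve.dbg" if debug else "Rserve"
    return ["R", "CMD", program, "--RS-conf", conf_path, "--no-save"]


def launch(conf_path: str = "/etc/Rserv.conf") -> None:
    """Write the config, then replace this process with Rserve (not called here)."""
    with open(conf_path, "w") as fh:
        fh.write(build_config(os.environ))
    cmd = build_command(os.environ, conf_path)
    os.execvp(cmd[0], cmd)
```

`build_config` and `build_command` have no side effects, which makes them easy to test. Only `launch` touches the filesystem. It then replaces the current process via `os.execvp`, so Rserve itself, rather than a wrapper, becomes the container's main process. A container exits when its main process does, so also confirm that your Rserve build stays in the foreground. The same logic could just as well live in a shell script.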
For anyone interested, the source code can be found here.