R2C


R is a statistical-computing language, popular among analysts and researchers who rely on statistical techniques, including financial analysts, medical researchers, and, of course, statisticians. R was designed by practitioners, not computer scientists. As a result, it has “features” that make it extremely difficult to reason about algorithmically. No programming-language theorist would ever design a language like that! However, that is exactly how most of the contemporary popular domain-specific scripting languages (DSLs) have evolved. Arguably, practitioners are the best people to design a DSL for their domain.

At the same time, like almost all similar DSLs, R suffers from extremely poor performance, making it (ironically) difficult to use for realistic statistical problems. The goal of this project is to develop automatic performance-optimization techniques targeting R in particular. The project draws on lessons learned from our MATLAB research, but R has unique characteristics that make it closer to highly dynamic languages such as Ruby than to MATLAB, which creates challenges for type inference, a prerequisite for most static optimizations. R is therefore likely to require radically different compilation techniques that rely on greater support from the run-time system.

Arun Chauhan / Computer Science / Indiana University