Analysis Workflow in R

The idea is to break the code into four files, all stored in your project directory. These four files are to be processed in the following order.

load.R
This file includes all code associated with loading the data. Usually, it will be a short file reading in data from files.
clean.R
This is where you do all the pre-processing of data, such as taking care of missing values, merging data frames, and handling outliers. By the end of this file, the data should be in a clean state, ready to use. It is much better to do this here than to edit the original data file, as this gives you a complete record of everything done to the data.
functions.R
All of the functions needed to perform the actual analysis are stored here. This file should do nothing other than define the functions you need for analysis. (If you require your own functions for loading or cleaning the data, include them at the top of load.R or clean.R.) In particular, functions.R should not do anything to the data. This means that you can modify this file and reload it without having to go back and rerun load.R and clean.R, which can take a long time for large data sets.
do.R
Here is the code to actually do the analysis. This file will use the functions defined in functions.R to do the calculations, produce figures and tables, etc. All figures and tables that end up in your report, paper or thesis should be coded here. Never create figures and tables manually (i.e., with the mouse and menus), as then you can't easily reproduce them.

The main motivation for this setup is working with large data, where you don't want to reload the data each time you change a later step. Also, keeping my code compartmentalized like this means I can come back to a long-forgotten project, quickly read load.R to work out what data I need to update, and then look at do.R to work out what analysis was performed.
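
As a rough illustration of the four-file order, here is a minimal sketch. Everything in it apart from the four file names is made up for the demo: the temporary project directory, the raw.csv file, the variables raw, clean and fit, and the function FitLine. In a real project the four scripts would already exist in the project directory and only the final loop would be needed.

```r
# Hypothetical demo project in a temporary directory.
proj <- file.path(tempdir(), "demo-project")
dir.create(proj, showWarnings = FALSE)

# A small made-up data file so that load.R has something to read.
write.csv(data.frame(x = c(1, 2, NA, 4), y = c(2, 4, 6, 8)),
          file.path(proj, "raw.csv"), row.names = FALSE)

# One-line stand-ins for the four scripts (illustrative contents only).
writeLines('raw <- read.csv(file.path(proj, "raw.csv"))',
           file.path(proj, "load.R"))
writeLines('clean <- raw[complete.cases(raw), ]',
           file.path(proj, "clean.R"))
writeLines('FitLine <- function(d) lm(y ~ x, data = d)',
           file.path(proj, "functions.R"))
writeLines('fit <- FitLine(clean)',
           file.path(proj, "do.R"))

# Run the files in the prescribed order. local = TRUE evaluates each
# script in the current environment, so the objects it creates are
# visible to the scripts that follow.
for (f in c("load.R", "clean.R", "functions.R", "do.R")) {
  source(file.path(proj, f), local = TRUE)
}
```

After editing functions.R, only functions.R and do.R need to be sourced again; raw and clean stay in memory, which is the point of keeping the loading and cleaning steps separate.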

Filed under  //  R  

Google's R Style Guide

R is a high-level programming language used primarily for statistical computing and graphics. The goal of the R Programming Style Guide is to make our R code easier to read, share, and verify. The rules below were designed in collaboration with the entire R user community at Google.
Important Ideas:
1. File names should end in .R and, of course, be meaningful.
2. Don't use underscores ( _ ) or hyphens ( - ) in identifiers. Identifiers should be named according to the following conventions. Variable names should have all lower case letters and words separated with dots (.); function names have initial capital letters and no dots (CapWords); constants are named like functions but with an initial k.
3. The maximum line length is 80 characters.
4. When indenting your code, use two spaces. Never use tabs or mix tabs and spaces.
5. Place spaces around all binary operators (=, +, -, <-, etc.).
6. An opening curly brace should never go on its own line; a closing curly brace should always go on its own line. 
7. Use <-, not =, for assignment.
8. Do not terminate your lines with semicolons or use semicolons to put more than one command on the same line.
9. Comment your code. Entire commented lines should begin with # and one space. Short comments can be placed after code preceded by two spaces, #, and then one space.
10. Function definitions should first list arguments without default values, followed by those with default values.
11. Functions should contain a comments section immediately below the function definition line.
12. Use a consistent style for TODOs throughout your code.
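
To make several of these rules concrete, here is a short sketch that tries to follow them. The names kTrimFraction, TrimmedMean, clean.values and sample.values are invented for illustration, and the layout of the function comment (Args and Returns sections) and the TODO(username) form are one consistent choice, assumed here rather than quoted from the list above.

```r
# Constant: named like a function, with an initial k.
kTrimFraction <- 0.25

# Function name in CapWords; the argument without a default comes first.
TrimmedMean <- function(values, trim = kTrimFraction) {
  # Computes the mean after dropping a fraction of extreme values.
  #
  # Args:
  #   values: Numeric vector; missing values are removed first.
  #   trim: Fraction trimmed from each end before averaging.
  #
  # Returns:
  #   The trimmed mean as a single number, or NA if no values remain.
  clean.values <- values[!is.na(values)]  # Variable: lower case, dots.
  if (length(clean.values) == 0) {
    return(NA_real_)
  }
  # TODO(username): Decide how to handle infinite values.
  mean(clean.values, trim = trim)
}

sample.values <- c(2, 4, 6, NA, 100)
trimmed.mean <- TrimmedMean(sample.values)
```

The sketch uses <- for assignment, two-space indentation, spaces around binary operators, opening braces at the end of a line, and no semicolons.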

For more discussion of R coding style, conventions and recommendations, please check out R Coding Convention by Henrik Bengtsson and Style guide for R code by Andrew Gelman.


How to Implement a Web Service in R in under one hour

Cumulo SAASi comes with many examples that show how to build web services in the R programming language. This article will break down one of the examples, and show you how easy it is to build web services in R. Of special note is the fact that there is no network programming involved and no web server is required. All the plumbing is done for you so that you can concentrate on implementing the web service in R, and you don't have to worry about anything else.

 
