R
hash-2.0.0
Posted by Christopher Brown in Blog, R on April 30, 2010
The hash-2.0.0 package has been uploaded to CRAN. This version was developed in conjunction with R-2.11.0 and was refactored for performance. hash-2.0.0 requires R-2.10.0 or later and will not be supported on earlier versions of R. This is a result of recent changes to the language itself.
R : NA vs. NULL
Posted by Christopher Brown in Blog, R on April 25, 2010

It is common for programming languages to have a NULL value. What often leads to confusion is the fact that NULL can have two distinct meanings. In the first, NULL is used to represent missing or undefined values. This is well appreciated in SQL. In the second, NULL is the logical representation of a statement that is neither TRUE nor FALSE. This indeterminacy is the basis for ternary logic. While these meanings are distinct, they are very often related. When missing values (the first meaning) are evaluated, the desired result is often an ambiguous result (the second). That is, the former implies the latter. In programming, the distinction is often unnecessary and glossed over, and the concepts become confounded.
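As a rough illustration of the two meanings in R itself, the sketch below assumes (following the post's title) that NULL plays the "undefined" role and NA the "missing or indeterminate" role. The variable names are made up for the example.

```r
# NULL: the absence of a value. It takes up no slot in a vector.
n_null <- length(c(1, NULL, 3))   # 2

# NA: a placeholder for a missing value. It keeps its slot.
n_na <- length(c(1, NA, 3))       # 3

# Ternary logic: the result is NA only when it truly depends on the unknown.
and_true  <- NA & TRUE    # NA: could be TRUE or FALSE
and_false <- NA & FALSE   # FALSE: FALSE whatever the unknown is
or_true   <- NA | TRUE    # TRUE: TRUE whatever the unknown is

# Evaluating a missing value gives an indeterminate result.
cmp <- NA > 1             # NA
```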
hash-1.99.x
Posted by Christopher Brown in Blog, R on February 17, 2010
hash-2.0.0 has been released; please read about it here:
Earlier today, hash-1.99.x was released to CRAN. This is a stable release and adds some more functions to an already full-featured hash implementation. This version fixes some bugs, adds some features, and improves performance and stability. You can read about the hash package in my previous blog post, The hash package: hashes come to R. All changes came in response to users who wrote in and contributed thoughts, ideas and use cases. Keep the good ideas coming. Two of the major changes are summarized below.
[, [[, $: R accessors explained
Posted by Christopher Brown in Blog, R, analytics on October 21, 2009

R Accessors
For more than ten years, I have been teaching R both formally and informally. One thing that I find often trips up students is the use of R’s accessors and mutators. (For those readers not from a formal computer science background, an accessor is a method for accessing data in an object, usually an attribute of that object.) A simple example is taking a subset of a vector:
letters[1:3]
[1] "a" "b" "c"
As you can see, the result is a character vector containing the first three letters of the letters vector.
Good programming languages have a standard pattern for accessors and mutators. For R, there are three: [, [[, and $. This confuses beginners coming from other programming languages. Java and Python have one: '.'. Why does R need three?
The reason derives from R's data-centric view of the world. R natively provides vectors, lists, data frames, matrices, etc. In truth, one can get by using only [ to extract information from these structures, but the others are handy in certain scenarios. So much so that after a while, they feel indispensable. I will explain each and hopefully by the end of this article you will understand why each exists, what to remember and, more importantly, when each should be used.
Subset with [
When you want a subset of an object use [. Remember that when you take a subset of an object you get the same type of thing. Thus, the subset of a vector will be a vector, the subset of a list will be a list and the subset of a data.frame will be a data.frame.
There is one inconsistency, however. The default in R is to reduce the results to the lowest dimension, so if your subset contains only one result, you will get only that one item, which may be something of a different type. Thus, taking a subset of the iris data frame with only one column
class( iris[ , "Petal.Length" ] )
[1] "numeric"
returns a numeric vector and not a data frame. You can override this behavior with the little publicized drop parameter, which indicates not to reduce the result. Taking the subset of iris with drop = FALSE
iris[ , "Petal.Length", drop=FALSE ]
is a proper data frame.
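A quick way to see the difference is to compare the classes of the two results. The sketch below uses the built-in iris data; the matrix m and the variable names are only there for illustration.

```r
# Default behaviour: a single column is reduced to a plain vector.
dropped <- iris[, "Petal.Length"]
class(dropped)    # "numeric"

# With drop = FALSE the result keeps its data frame shape.
kept <- iris[, "Petal.Length", drop = FALSE]
class(kept)       # "data.frame"
dim(kept)         # 150 1

# The same rule applies to matrices.
m <- matrix(1:6, nrow = 2)
row_vec <- m[1, ]                  # plain integer vector, no dim attribute
row_mat <- m[1, , drop = FALSE]    # still a 1 x 3 matrix
```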
Things to Remember:
- Most often, a subset is the same type as the original object.
- Both indices and names can be used to extract the subset. (In order to use names, the object must have a name-type attribute such as names, rownames, colnames, etc.)
- You can use negative integers to indicate exclusion.
- Unquoted variables are interpolated within the brackets.
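The points above can be tried on a small named vector. This is only a sketch; x, wanted and l are names invented for the example.

```r
x <- c(a = 1, b = 2, c = 3, d = 4)

by_index <- x[2:3]           # elements b and c
by_name  <- x[c("a", "d")]   # elements a and d
excluded <- x[-1]            # everything except the first element

# An unquoted variable inside the brackets is evaluated first.
wanted   <- c("b", "c")
by_var   <- x[wanted]        # same as x[c("b", "c")]

# The subset of a list is still a list.
l <- list(a = 1, b = "two")
class(l["a"])                # "list"
```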
Extract one item with [[
The double square brackets are used to extract one element from potentially many. Vectors yield a vector with a single value; data frames give a column vector; lists give one element:
letters[[3]]
iris[["Petal.Length"]]
The mnemonic device here is that the double square brackets look as if you are asking for something deep within a container. You are not taking a slice but reaching to get at the one thing at the core.
Three important things to remember:
- You can return only one item.
- The result is not (necessarily) the same type of object as the container.
- The dimension will be the dimension of the one item which is not necessarily 1.
And, as before:
- Names or indices can both be used.
- Variables are interpolated.
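A small list makes the contrast with [ concrete. This is a sketch; the list l and the variable key are made up for the example.

```r
l <- list(num = 1:3, chr = "x")

class(l["num"])      # "list": [ returns a smaller list
class(l[["num"]])    # "integer": [[ returns the element itself
length(l[["num"]])   # 3: one item, but its length is not 1

# A variable inside [[ ]] is evaluated, just as with [.
key <- "chr"
item <- l[[key]]     # "x"

# Indices work as well as names.
third <- letters[[3]]   # "c"
```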
Interact with $
Interestingly enough, the accessor that provides the least unique utility is also probably the most often used. $ is a special case of [[ in which you access a single item by actual name. The following are equivalent:
iris$Petal.Length
iris[["Petal.Length"]]
The appeal of this accessor is nothing more than brevity. One character, $, replaces six, [[""]]. This accessor is handiest when doing interactive programming but should be discouraged for more production-oriented code because of its limitations, namely the inability to interpolate the names or use integer indices.
Things to Remember:
- You cannot use integer indices.
- The name will not be interpolated.
- Returns only one item.
- If the name contains special characters, the name must be enclosed in backticks: ``
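The limitations are easy to see side by side. In this sketch, col and df are names invented for the example; iris is the built-in data set.

```r
col <- "Petal.Length"

# [[ evaluates the variable; $ takes the name literally.
same    <- identical(iris$Petal.Length, iris[[col]])   # TRUE
literal <- iris$col    # NULL: iris has no column literally named "col"

# A name with special characters (here a space) needs backticks with $.
df <- data.frame(`my var` = 1:3, check.names = FALSE)
quoted <- df$`my var`        # 1 2 3
also   <- df[["my var"]]     # the same column, no backticks needed
```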
That is really all there is to it. [ - for subsets, [[ - for extracting items, and $ - for extracting by name.
R: The Dummies Package
Posted by Christopher Brown in Blog, R, analytics on September 30, 2009
R-2.9.2 was released in August. While R can be considered stable and battle-ready, it is also far from stagnation. It is humbling to see such an intelligent and vibrant community helping CRAN grow faster than ever. Every day I see a new package or read a new comment on R-Help that gives me pause to think.
As much as I like R, on occasion I will find myself lost in some dark corner. Sometimes, I find light. Sometimes I am gnashing teeth and wringing hands. Frustrated. In a recent foray, I found myself trying to do something that I thought exceedingly trivial: expanding character and factor vectors to dummy variables. There must be some function, but what? Trying ?dummy didn’t turn up anything. Surely someone else must have encountered this and provided a package. I went to the Internet and sure enough the R-wiki was there to save me. And looking even harder, I found some who had trodden before me on the R-Help archives. It turns out, it’s simple. Expanding a variable as a dummy variable can be done like so:
x <- c(2, 2, 5, 3, 6, 5, NA)
xf <- factor(x, levels = 2:6)
model.matrix( ~ xf - 1)
Two problems. The first problem is that without an external source (Google), I would have never stumbled upon what I wanted. (Thanks, Google!) I understand it now, but for what I wanted to do, I would never have thought, “oh, model.matrix.”
The second problem is the arcane syntax, wtf <- ~ xf - 1. I get it now, but it took me some time to figure out what was going on. Why not just dummy(var)? That is what I want to do.
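For readers puzzling over the same formula, a short sketch of what it does: the one-sided formula ~ xf names the predictor, and - 1 removes the intercept so that every factor level gets its own column. The variable names with_intercept and no_intercept are made up here, and the row count assumes R's default na.action of na.omit.

```r
x  <- c(2, 2, 5, 3, 6, 5, NA)
xf <- factor(x, levels = 2:6)

# With the intercept, the first level is absorbed into "(Intercept)".
with_intercept <- model.matrix(~ xf)
colnames(with_intercept)   # "(Intercept)" "xf3" "xf4" "xf5" "xf6"

# Without it, each level (including the unused level 4) gets a column.
no_intercept <- model.matrix(~ xf - 1)
colnames(no_intercept)     # "xf2" "xf3" "xf4" "xf5" "xf6"

# The row containing NA is dropped under the default na.action.
nrow(no_intercept)         # 6
```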
The solution on the wiki wasn’t quite what I was looking for. For instance, you can’t say:
model.matrix( ~ xf1 + xf2 + xf3 - 1)
It turns out, you can only fully expand one variable at a time: with several factors in the formula, only the first gets a column for every level. Well, this is not good. I know that you could solve this with some sapply’s and some tests, but next time I might forget how to do it. So with a couple of spare hours, I decided that the next guy wouldn’t have to think about it. He could just use my dummies package.
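To make the limitation concrete, here is a sketch in base R. The helper expand_all is hypothetical, written only for this illustration; it is not how the dummies package is implemented, and it assumes every column is a factor with no missing values.

```r
df <- data.frame(f1 = factor(c("a", "b", "a")),
                 f2 = factor(c("x", "y", "y")))

# Only the first factor is fully expanded; f2 loses its first level.
partial <- model.matrix(~ f1 + f2 - 1, df)
colnames(partial)   # "f1a" "f1b" "f2y"

# Hypothetical workaround: expand each factor on its own, then bind.
expand_all <- function(data) {
  parts <- lapply(names(data), function(nm) {
    f <- data[[nm]]
    m <- model.matrix(~ f - 1)
    colnames(m) <- paste0(nm, levels(f))   # e.g. "f1a", "f1b"
    m
  })
  do.call(cbind, parts)
}

full <- expand_all(df)
colnames(full)      # "f1a" "f1b" "f2x" "f2y"
```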
Like the R-wiki solution, the dummies package provides a nice interface for encoding a single variable. You can pass a variable -or- a variable name with a data frame. These are equivalent:
dummy( df$var )
dummy( "var", df )
Moreover, you can choose the style of the dummy names, whether to include unused factor levels, whether to have verbose output, etc.
But more than the R-wiki solution, dummy.data.frame offers something similar for data frames. You can specify which columns to expand by name or class and whether to return non-expanded columns.
The package dummies-1.04 is available on CRAN. Comments and questions are always appreciated.