Posts Tagged R packages
hash-2.0.0
Posted by Christopher Brown in Blog, R on April 30, 2010
The hash-2.0.0 package has been uploaded to CRAN. This version was developed in conjunction with R-2.11.0 and was refactored for performance. hash-2.0.0 requires R-2.10.0 or later and will not be supported on earlier versions of R. This is a result of recent changes to the language itself.
hash-1.99.x
Posted by Christopher Brown in Blog, R on February 17, 2010
hash-2.0.0 has been released. Please read about it here.
Earlier today, hash-1.99.x was released to CRAN. This is a stable release that adds more functions to an already full-featured hash implementation. It also fixes some bugs and improves performance and stability. You can read about the hash package in my previous blog post, The hash package: hashes come to R. All of the changes came from users who wrote in and contributed thoughts, ideas and use cases. Keep the good ideas coming. Two of the major changes are summarized below.
The hash package: hashes come to R
Posted by Christopher Brown in Blog, R on July 26, 2009
hash-2.0.0 has been released. Read about it here.
Perl has hashes. Python has dictionaries. Why doesn’t R have an equivalent? Hash tables and associative arrays are indispensable tools for the programmer. One of the most common and basic tasks of a programmer is to “look up” or “map” a key to a value. In fact, there are projects whose sole raison d’être is making the hash as fast and as efficient as possible.
R actually has two equivalents, both lacking. The first is R’s named vectors and lists. Elements of vectors and lists can be accessed by name, through the standard R methods:
obj$name
obj['name']
obj[['name']]
Vectors and lists are not stored using internal hash tables: a look-up by name scans the names one by one, so performance suffers as they grow large. The performance impact is tangible even on small lists. For programs doing many look-ups, or look-ups on many objects, this can create a bottleneck.
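A rough way to see the difference is to put the same data in a named list and in an environment and time the same look-ups against both. This is only a sketch in base R: the sizes and the names lst, env and probe are arbitrary choices for illustration, and timings depend on the machine and R version, so no numbers are quoted.

```r
# Build the same key-value data twice: as a named list and as an environment.
n <- 10000L
keys <- paste0("key", seq_len(n))
lst <- setNames(as.list(seq_len(n)), keys)

env <- new.env(hash = TRUE, parent = emptyenv())
for (i in seq_len(n)) assign(keys[i], i, envir = env)

# Look up the same random sample of keys in both structures.
set.seed(1)
probe <- sample(keys, 1000L)
from_list <- vapply(probe, function(k) lst[[k]], integer(1))
from_env  <- vapply(probe, function(k) get(k, envir = env), integer(1))

# Both structures return the same values; only the look-up cost differs.
stopifnot(identical(from_list, from_env))

# Time the look-ups; results vary from machine to machine.
list_time <- system.time(for (k in probe) lst[[k]])
env_time  <- system.time(for (k in probe) get(k, envir = env))
print(c(list = list_time[["elapsed"]], env = env_time[["elapsed"]]))
```

Raising n should widen the gap, since each list look-up has to search through more names while the environment look-up does not.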
R’s environments are much closer to Perl’s hashes and Python’s dictionaries. Internally, an environment is a hash table, and look-ups do not appreciably degrade with object size. To use an R environment, you need to create it and assign key-value pairs to it.
hash <- new.env(hash=TRUE, parent=emptyenv(), size=100L)
assign(key, value, envir=hash)
get(key, envir=hash)
We can even get the keys from the hash with the ls function:
ls( envir=hash )
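Put together with concrete keys, the environment approach looks something like the sketch below. It uses only base R functions; the table name tbl and the keys are made up for illustration.

```r
# Create an empty hashed environment to act as the look-up table.
tbl <- new.env(hash = TRUE, parent = emptyenv(), size = 100L)

# Store two key-value pairs.
assign("foo", 1, envir = tbl)
assign("bar", 2, envir = tbl)

# Retrieve one value, or several at once with mget().
foo_value <- get("foo", envir = tbl)
both <- mget(c("foo", "bar"), envir = tbl)

# Check for a key without searching any parent environments.
has_baz <- exists("baz", envir = tbl, inherits = FALSE)

# List the keys, then delete one.
keys_before <- ls(envir = tbl)
rm("bar", envir = tbl)
keys_after <- ls(envir = tbl)

print(keys_before)
print(keys_after)
```

Every operation is a separate function call that takes the environment as an argument, which is exactly the style the standard accessors were meant to avoid.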
This works well and performance is good. So what’s the problem?
Usability. In designing the S language, John Chambers put much thought into how the analyst and statistician interact with data. All variables are designed to be vectors, and a standard set of accessors ($, [, [[) was defined to retrieve and set slices, subsets or elements of the data. The problem is that R environments don't follow this pattern. And this is where the hash package comes in.
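To make the mismatch concrete, here is a small sketch in base R (the names e, res and vals are made up for illustration). In current versions of R, single-key access with $ and [[ does work on an environment, but vector-style subsetting with [ does not, so fetching several keys at once falls back to a function call such as mget.

```r
e <- new.env(hash = TRUE, parent = emptyenv())
e$foo <- 1
e[["bar"]] <- 2

# Single-key access works on environments.
stopifnot(e$foo == 1, e[["bar"]] == 2)

# Vector subsetting with `[` is not defined for environments, so this
# raises an error; tryCatch() captures the message instead of stopping.
res <- tryCatch(e[c("foo", "bar")],
                error = function(err) conditionMessage(err))
print(res)

# The base R way to fetch several keys at once is mget().
vals <- mget(c("foo", "bar"), envir = e)
print(vals)
```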
The hash package is designed to provide standard R syntax for R's environments and give programmers a hash. The package provides one constructor function, hash, that will take a variety of arguments, always doing the right thing. All of the following work:
hash()
hash( keys=c('foo','bar','baz'), values=1:3 )
hash( foo=1, bar=2, baz=3 )
hash( c( foo=1, bar=2, baz=3 ) )
hash( list( foo=1, bar=2, baz=3 ) )
hash( c('foo','bar','baz'), 1:3 )
It pretty much does what you mean.
The standard accessors, [, [[ and $, are also available.
h <- hash( c('foo','bar','baz'), 1:3 )
h[ c('foo','bar') ]
h[[ 'foo' ]]
h$foo
So are their corresponding replacement methods.
h <- hash( c('foo','bar','baz'), 1:3 )
h[ c('foo','bar') ] <- c( 'fred', 'wilma' )
h[[ 'foo' ]] <- 'dino'
h$foo <- 'bam bam'
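For readers curious how accessors like these can sit on top of an environment, here is a minimal sketch using S3 methods in base R. It is not the hash package's actual implementation: the class name minihash and everything inside it are hypothetical, and it skips most of what a real package would need (printing, key listing, deletion, error handling).

```r
# Hypothetical constructor: wraps an environment in an S3 class.
minihash <- function(keys = character(0), values = list()) {
  stopifnot(length(keys) == length(values))
  env <- new.env(hash = TRUE, parent = emptyenv())
  for (i in seq_along(keys)) assign(keys[[i]], values[[i]], envir = env)
  structure(list(env = env), class = "minihash")
}

# unclass() is used throughout so that reaching the stored environment
# does not dispatch back into the `$` method defined below.

# h[keys]: return the matching values as a named list.
`[.minihash` <- function(x, i) mget(i, envir = unclass(x)$env)

# h[[key]] and h$key: return a single value.
`[[.minihash` <- function(x, i) get(i, envir = unclass(x)$env)
`$.minihash`  <- function(x, name) get(name, envir = unclass(x)$env)

# h[keys] <- values: assign each value to its key, recycling if needed.
`[<-.minihash` <- function(x, i, value) {
  env <- unclass(x)$env
  value <- rep(as.list(value), length.out = length(i))
  for (k in seq_along(i)) assign(i[[k]], value[[k]], envir = env)
  x
}

# h[[key]] <- value and h$key <- value: assign a single value.
`[[<-.minihash` <- function(x, i, value) {
  assign(i, value, envir = unclass(x)$env)
  x
}
`$<-.minihash` <- function(x, name, value) {
  assign(name, value, envir = unclass(x)$env)
  x
}

# Usage mirrors the examples above.
h <- minihash(c("foo", "bar", "baz"), 1:3)
first_two <- h[c("foo", "bar")]
h[c("foo", "bar")] <- c("fred", "wilma")
h[["foo"]] <- "dino"
h$foo <- "bam bam"
```

Because the values live in an environment, the replacement methods modify the table in place and simply hand the wrapper back.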
There you have it, hashes for R.
I am the maintainer of the package, so if you have any suggestions for the package, please let me know.