<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Open Data Group &#187; Blog</title>
	<atom:link href="http://opendatagroup.com/category/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://opendatagroup.com</link>
	<description>Open Data Group&#039;s Home Page and Blog</description>
	<lastBuildDate>Sat, 04 Sep 2010 00:51:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Prototyping Cloud Analytic Applications</title>
		<link>http://opendatagroup.com/2010/07/27/prototyping-cloud-analytic-applications/</link>
		<comments>http://opendatagroup.com/2010/07/27/prototyping-cloud-analytic-applications/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 20:50:40 +0000</pubDate>
		<dc:creator>Robert Grossman</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[cloud analytics]]></category>
		<category><![CDATA[data privacy]]></category>
		<category><![CDATA[private clouds]]></category>
		<category><![CDATA[protecting data privacy]]></category>
		<category><![CDATA[prototyping cloud applications]]></category>
		<category><![CDATA[simulated data]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=453</guid>
		<description><![CDATA[Cloud computing is changing the way that companies build and deploy their analytic solutions.  With cloud computing, computing is available on demand, scales elastically, and can be self-provisioned.  This flexibility sometimes requires developing new analytic infrastructure and new analytic algorithms, which, in turn, requires some experimenting.  This process can usually benefit from [...]]]></description>
			<content:encoded><![CDATA[<p>Cloud computing is changing the way that companies build and deploy their analytic solutions.  With cloud computing, computing is available on demand, scales elastically, and can be self-provisioned.  This flexibility sometimes requires developing new analytic infrastructure and new analytic algorithms, which, in turn, requires some experimenting.  This process can usually benefit from an external perspective.   </p>
<p>The fastest way forward is to use a public cloud, bring in external experts, and run some quick experiments and prototypes.  At this point, many companies hit a problem.  It is quite common these days for companies to have policies that prohibit placing proprietary data, or data containing information that can identify customers, on public clouds.  Providing third parties with access to this data is also usually quite difficult.</p>
<p>One practical approach is to replace actual data with simulated data and to use private clouds operated by third parties instead of public clouds.  This requires data simulators that produce realistic data.  For example, large data is rarely normally distributed, but more often follows power laws or similar types of distributions.</p>
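<p>As a rough illustration of what &#8220;realistic&#8221; can mean here, the following R sketch draws heavy-tailed values from a Pareto (power-law) distribution by inverse-transform sampling and compares their spread with normally distributed values.  The helper name <code>rpareto</code> and the parameters <code>xmin</code> and <code>alpha</code> are assumptions made for this example; they do not describe any particular simulator.</p>
<pre>
# Hypothetical helper (not part of base R): Pareto samples via the inverse CDF.
# The Pareto CDF is F(x) = 1 - (xmin / x)^alpha for x &gt;= xmin.
rpareto &lt;- function(n, xmin = 1, alpha = 2) {
  u &lt;- runif(n)            # uniform draws strictly inside (0, 1)
  xmin / u^(1 / alpha)     # invert the CDF (u and 1 - u are equally distributed)
}

set.seed(42)
heavy  &lt;- rpareto(10000)        # power-law data
normal &lt;- abs(rnorm(10000))     # magnitudes of normal data, for comparison

# The largest power-law value dwarfs the typical one; normal data do not behave this way.
max(heavy)  / median(heavy)
max(normal) / median(normal)
</pre>
<p>A simulator that only produced normally distributed values would miss these extreme observations, which are often what stresses an analytic infrastructure.</p>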
<p>As a reminder, a private cloud is a cloud that is used exclusively by a single organization. It may be managed by the organization or by a third party, and it may exist on premises (an in-house private cloud) or off premises (a third-party private cloud).  In contrast, in a public cloud, the cloud infrastructure is made available to the general public, or a large group, and is owned by an organization selling cloud services (a cloud service provider).   In this post, we assume that third-party private clouds are also single-tenant clouds; that is, only one client&#8217;s data is on the cloud at a time and the cloud is sanitized between uses by different clients.</p>
<p>In more detail, one approach for moving your analytics to clouds is:</p>
<ul>
<li> use simulated data following realistic simulations, instead of actual data;</li>
<li> supplement in-house expertise with third party experts who specialize in analytics and cloud computing;</li>
<li> use third party private clouds instead of public clouds to decrease risk or perceived risk;</li>
<li> experiment with different analytic approaches and different analytic infrastructures;</li>
<li> agree on APIs up front and transfer technology by transferring code that uses these APIs.</li>
</ul>
<p>We have found this approach works well.  We would be interested in hearing your experiences.</p>
<p>Full disclosure: Open Data Group operates private clouds, has developed software that provides simulated data for a variety of industries, including financial services, and provides consulting services using simulated data on private clouds so that companies can rapidly explore the use of cloud computing to develop innovative applications, especially analytic applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/07/27/prototyping-cloud-analytic-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>hash-2.0.0</title>
		<link>http://opendatagroup.com/2010/04/30/hash-2-0-0/</link>
		<comments>http://opendatagroup.com/2010/04/30/hash-2-0-0/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 15:34:58 +0000</pubDate>
		<dc:creator>Christopher Brown</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[CRAN]]></category>
		<category><![CDATA[hash package for R]]></category>
		<category><![CDATA[open source analytics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R packages]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=346</guid>
		<description><![CDATA[Come see my talk on hashes in R at useR! 2010.  (http://user2010.org/)
July 20-23
National Institute of Standards and Technology (NIST),
Gaithersburg, Maryland, USA]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-433 alignleft" title="hash" src="http://opendatagroup.com/files/2010/04/hash.png" alt="hash" width="221" height="128" />The <strong>hash-2.0.0</strong> package has been uploaded to <strong><a href="http://cran.r-project.org">CRAN</a></strong>.  This version was developed in conjunction with R-2.11.0 and was refactored for performance.   <strong>hash-2.0.0 </strong>requires R-2.10.0 or later and will <strong>not </strong>be supported on earlier versions of R.  This is a result of recent changes to the language itself.</p>
<p><span id="more-346"></span><span style="color: #ff0000"><span style="color: #000000">Important: </span><strong>hash-2.0.0 breaks backward compatibility</strong><span style="color: #000000">;</span> <span style="color: #000000">code written with previous versions of the hash package is not guaranteed to work with this or future versions. </span></span>This is due to changes made to achieve much higher performance.  Assignments and look-ups are faster thanks to direct inheritance from environments, removal of non-essential customizations, and reliance on core and primitive functions.</p>
<p>Here is a summary of major changes:</p>
<ul>
<li>Coercion of keys to valid R names (i.e. non-blank character values) is now the responsibility of the user.  The four accessor functions, [, [[, $ and values, no longer do this automatically.  An error results if a proper R name is not provided.</li>
</ul>
<ul>
<li>The default for missing keys has changed from <span style="color: #333333"><strong><span style="color: #808080">NA</span> </strong></span>to <span style="color: #808080"><strong>NULL</strong></span><span style="color: #000000">. This matches the behavior of lists when accessing non-existent elements in R.  (For a more complete discussion, see my previous blog post on the <a href="http://opendatagroup.com/2010/04/25/r-na-v-null/">differences between NA and NULL</a>.)</span>
<ul>
<li><span style="color: #000000">Custom behavior for accessing non-existent keys has been removed.  Access to non-existing keys will always yield NULL.  Consistency is often better than customization.</span></li>
</ul>
</li>
</ul>
<p><em>ChangeLog</em> and <em>TODO</em> track many technical details; here I will discuss only the more  important changes:</p>
<h2>Performance</h2>
<p>Included in this version is a demo script that runs benchmarks (demo(hash-benchmarks)).  One of the questions that has been repeatedly posed, often in the context of look-up, is:  <em>how does this compare to native R named lists and vectors?</em> In other words, how much quicker is accessing a value in a hash / environment as opposed to a list (or vector)?  This is a difficult question, and the answer generally depends on the size of the hash or list.  My rule of thumb is that look-ups are quicker on lists and vectors with fewer than about 500 elements.  Beyond ~500 elements, hashes and environments greatly outperform lists, and the gap widens as the object grows.  However, look-ups for all these objects are very fast if objects are small  ( &gt;120,000 / sec ).  So unless you are doing many serial look-ups, hashes are likely the better option.</p>
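<p>For readers who want to try the comparison without the package, here is a minimal sketch that times look-ups on a named list against look-ups on a plain environment.  It is not the bundled benchmark demo; it assumes that a base R environment is a fair stand-in for a hash (the package is built on environments), and the size <code>n</code> and the number of probes are arbitrary choices.  Timings depend on the machine and the R version, so no results are quoted.</p>
<pre>
# Build a named list and an environment holding the same n key-value pairs.
n    &lt;- 5000
keys &lt;- paste0("key", seq_len(n))
li   &lt;- setNames(as.list(seq_len(n)), keys)
env  &lt;- list2env(li, envir = new.env(hash = TRUE))

# Look up 1000 randomly chosen keys in each structure and time the loops.
probe &lt;- sample(keys, 1000, replace = TRUE)
system.time(for (k in probe) li[[k]])    # name matching on a list
system.time(for (k in probe) env[[k]])   # hashed look-up in an environment
</pre>
<p>Varying <code>n</code> from a few dozen to many thousands shows how the gap changes with the size of the object.</p>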
<p>I have written previously about hashes in R [<a href="http://opendatagroup.com/2009/07/26/hash-package-for-r/">1</a>]  [<a href="http://opendatagroup.com/2010/02/17/hash-1-99-x/" target="_self">2</a>], and will continue to discuss the evolution of R hashes on this blog.  Additionally, I will be speaking on this and related work at <a href="http://user2010.org/" target="_blank">useR! 2010</a> (July 20-23).</p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/04/30/hash-2-0-0/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>R : NA vs. NULL</title>
		<link>http://opendatagroup.com/2010/04/25/r-na-v-null/</link>
		<comments>http://opendatagroup.com/2010/04/25/r-na-v-null/#comments</comments>
		<pubDate>Sun, 25 Apr 2010 12:51:02 +0000</pubDate>
		<dc:creator>Christopher Brown</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=348</guid>
		<description><![CDATA[The R language has two closely related NULL-like values, NA  and NULL ... Both are used to represent missing or undefined values.  This has led to much confusion. ]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-350 alignleft" title="na-null" src="http://opendatagroup.com/files/2010/04/na-null.png" alt="na-null" width="253" height="50" /></p>
<p>It is common for programming languages to have a <a href="http://en.wikipedia.org/wiki/NULL">NULL</a> value.  What often leads to confusion is the fact that NULL can have two distinct meanings.  In the first, NULL is used to represent missing or undefined values.  This is well appreciated in SQL. In the second, NULL is the logical representation of a statement that is neither TRUE nor FALSE.  This indeterminacy is the basis for <a href="http://en.wikipedia.org/wiki/Ternary_logic">ternary logic</a>.  While these meanings are distinct, they are very often related.  When missing values (the first meaning) are evaluated, the desired result is often an ambiguous result (the second).  That is, the former implies the latter.  In programming, the distinction is often unnecessary and glossed over, and the two concepts become conflated.</p>
<p><span id="more-348"></span></p>
<p>The <strong>R</strong> language has two closely related NULL-like values, <strong><span style="color: #888888">NA</span></strong> and <span style="color: #888888"><strong>NULL</strong></span>.  Both are fully supported in the language by core functions (e.g., <span style="color: #888888"><strong>is.na</strong>, <strong>is.null</strong>, <strong>as.null</strong></span>, etc.).  And while only <strong><span style="color: #808080">NA</span></strong> is used in the logical sense, both are used to represent missing or undefined values.  This has led to much confusion.  Here&#8217;s what the R documentation has to say:</p>
<blockquote><p><strong>NULL</strong> represents the null object in R: it is a reserved word.<br />
NULL is often returned by expressions and functions whose values are<br />
undefined.</p></blockquote>
<blockquote><p><strong>NA</strong> is a logical constant of length 1 which contains a missing<br />
value indicator. NA can be freely coerced to any other vector<br />
type except raw.  There are also constants NA_integer_,<br />
NA_real_, NA_complex_ and NA_character_ of the other atomic<br />
vector types which support missing values: all of these are<br />
reserved words in the R language.</p></blockquote>
<p>There is a lot of subtlety in the treatment of these values.  A good way to understand the distinction between  <span style="color: #888888">NA</span> and <span style="color: #888888">NULL</span> is through some examples:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="312" valign="top"><strong>NA</strong></td>
<td width="293" valign="top"><strong>NULL</strong></td>
</tr>
<tr>
<td width="312" valign="top">
<pre><span style="color: #ff0000">
 &gt; NA</span><span style="color: #0000ff">
 [1] NA

</span><span style="color: #ff0000"> &gt; class(NA)</span><span style="color: #0000ff">
 [1]   "logical"

</span><span style="color: #ff0000"> &gt; NA &gt; 1</span>
<span style="color: #0000ff"> [1] NA</span></pre>
</td>
<td width="293" valign="top">
<pre><span style="color: #ff0000">
 &gt; NULL</span><span style="color: #0000ff">
 NULL</span><span style="color: #ff0000"> 

 &gt; class(NULL)</span><span style="color: #0000ff">
 [1]   "NULL"<span style="color: #ff0000">

 &gt; NULL &gt; 1</span>
 logical(0)
</span></pre>
</td>
</tr>
</tbody>
</table>
<p>The important distinction is that <span style="color: #808080">NA</span> is a &#8216;logical&#8217; value that, when evaluated in an expression, yields NA.  This is the expected behavior of a value that handles logical indeterminacy.   <span style="color: #808080">NULL</span> is its own thing: evaluated in an expression, it yields an empty result (logical(0) above) rather than a value, which is not how we would want or expect <span style="color: #808080">NA</span> to work.</p>
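<p>A few more base R calls make the same distinction concrete.  This is only a small sketch using functions that ship with R; the comments state the value each expression returns.</p>
<pre>
length(NA)      # 1: NA is a value that occupies a slot
length(NULL)    # 0: NULL is the empty object

is.na(NA)       # TRUE
is.null(NULL)   # TRUE
is.null(NA)     # FALSE
is.na(NULL)     # logical(0): an empty result, neither TRUE nor FALSE

# NA follows ternary logic: the result is NA only when it truly depends on the unknown.
NA &amp; FALSE      # FALSE
NA | TRUE       # TRUE
NA &amp; TRUE       # NA
</pre>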
<p>To delve deeper into the behavior we must look at how R&#8217;s basic data structures, vectors (including matrices and arrays) and lists (including data.frames), behave.  Vectors and lists are similar structures; both allow for multiple values and have similar <a href="http://opendatagroup.com/2009/10/21/r-accessors-explained">accessors</a>.  There are subtle differences in their treatment of <span style="color: #808080">NA </span>and<span style="color: #808080"> NULL</span>.  Let&#8217;s take a look at how they compare:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr style="text-align: left">
<td width="361" valign="top">
<p style="text-align: left"><strong> Vectors ( inc. Matrices and Arrays )</strong></p>
</td>
<td width="330" valign="top"><strong> List ( inc. data frames )</strong></td>
</tr>
<tr>
<td width="361" valign="top">
<pre> <span style="color: #ff0000">&gt; v &lt;-  c( 1, NA, NULL)
 &gt; v</span><span style="color: #0000ff"> 
 [1]  1 NA

 </span></pre>
</td>
<td width="330" valign="top">
<pre><span style="color: #ff0000">
 &gt; list(1, NA, NULL)
</span><span style="color: #0000ff"> [[1]]
 [1] 1</span><span style="color: #0000ff">

 [[2]]</span><span style="color: #0000ff">
 [1] NA
</span>
<span style="color: #0000ff"> [[3]]
 NULL</span></pre>
</td>
</tr>
</tbody>
</table>
<p>What happened?  <span style="color: #808080">NULL </span>is not allowed in a vector.  When you attempt to set it as a value in a vector, it is quietly ignored.  This is because <span style="color: #808080">NULL </span>is an object and type of its own.  <span style="color: #808080">NULL </span>does not have typed variants such as NULL_integer_.  There is just <span style="color: #808080">NULL</span>. By contrast, <span style="color: #808080">NA</span> has NA_integer_, etc., and happily coexists with any of the basic vector types.  <em>So for any vector (matrix or array), <span style="color: #808080">NA </span>represents a missing value.  <span style="color: #808080">NULL does not</span></em>.</p>
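<p>The way <span style="color: #808080">NA</span> adapts to the type of the vector that holds it can be checked directly.  The following is a short sketch in base R; the variable name <code>v</code> is reused from the example above.</p>
<pre>
v &lt;- c(1, NA, NULL)
length(v)             # 2: NULL was dropped, NA was kept

typeof(NA)            # "logical": the type of a bare NA
typeof(c(1L, NA))     # "integer": NA is coerced to NA_integer_
typeof(c("a", NA))    # "character": NA is coerced to NA_character_
is.na(c("a", NA))     # FALSE TRUE: the missing slot is still detectable
</pre>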
<p>Now, let&#8217;s look at the list example. This is interesting! Unlike the vector, the list can hold objects and values other than the basic types.  This includes the <strong><span style="color: #808080">NULL</span> </strong>value/object.  Perhaps a little inconsistent and not what we would expect.  From here, things get a little quirky.  Let&#8217;s try value assignment:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="361" valign="top"><strong>Vectors ( inc. Matrices and Arrays )</strong></td>
<td width="330" valign="top"><strong>List ( inc. data frames )</strong></td>
</tr>
<tr>
<td width="361" valign="top">
<pre><span style="color: #ff0000">
  &gt; v[[1]] &lt;- NULL</span><span style="color: #0000ff">
   Error in v[[1]] &lt;- NULL :</span><span style="color: #0000ff">
    more elements supplied than there are to replace

 </span></pre>
</td>
<td width="330" valign="top">
<pre><span style="color: #ff0000">
 &gt; li &lt;- list( 1, 2, 3 )
 &gt; li[[1]] &lt;- NULL
 &gt; li</span>
<span style="color: #0000ff"> [[1]]
 [1] 2
</span>
<span style="color: #0000ff"> [[2]]
 [1] 3</span></pre>
</td>
</tr>
</tbody>
</table>
<p>Sure enough, <span style="color: #808080">NULL</span> cannot be assigned to a vector.  So for all practical purposes, <span style="color: #666699">NA</span>, with respect to the basic vector, behaves like <span style="color: #666699">NULL</span> in other languages; <span style="color: #666699">NULL</span> itself is almost never what you want there.  On the list side, however, we see an idiom of <span style="color: #666699">NULL</span>: <em>assigning <span style="color: #808080">NULL </span>to a list item removes it</em>.  This behavior is a bit unexpected, but it is the idiom.</p>
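<p>Because assignment of <span style="color: #808080">NULL</span> deletes, storing a <span style="color: #808080">NULL</span> inside an existing list takes a different form: single-bracket assignment of <code>list(NULL)</code>.  The sketch below uses base R only; the names <code>a</code>, <code>b</code> and <code>c</code> are illustrative.</p>
<pre>
li &lt;- list(a = 1, b = 2, c = 3)

li[["a"]] &lt;- NULL         # removes element "a"
length(li)                # 2

li["b"] &lt;- list(NULL)     # keeps element "b" and stores NULL in it
length(li)                # 2
names(li)                 # "b" "c"
is.null(li[["b"]])        # TRUE

# A stored NULL and a missing name both read back as NULL,
# so test the names to tell them apart.
"b"  %in% names(li)       # TRUE
"zz" %in% names(li)       # FALSE
</pre>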
<p>There is one final idiom to know about <span style="color: #888888">NULL </span>and lists. Namely, that trying to access a list element by a non-existing name yields a <span style="color: #888888">NULL </span>value.</p>
<pre style="padding-left: 30px"><span style="color: #ff0000">&gt; li$aa</span>
<span style="color: #0000ff">NULL</span>
<span style="color: #ff0000">&gt; li[['aa']]</span>
<span style="color: #0000ff">NULL

</span><span style="color: #0000ff"> </span></pre>
<p>( Note: the same is true for trying to access non-existing objects on an environment )</p>
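<p>The environment case can be checked the same way.  This is a minimal sketch with a fresh base R environment; <code>exists</code> is shown as one way to tell a missing name from a stored value.</p>
<pre>
e &lt;- new.env()
assign("a", 1, envir = e)

e$a           # 1
e$aa          # NULL: no object named "aa" in the environment
e[["aa"]]     # NULL

# exists() reports whether the name is actually bound in this environment.
exists("a",  envir = e, inherits = FALSE)    # TRUE
exists("aa", envir = e, inherits = FALSE)    # FALSE
</pre>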
<p>R does not have a consistent or intuitive way of dealing with missing and logically ambiguous values, i.e. addressing the two meanings from the beginning of this post.  For vectors and basic variables, R mimics other languages and uses <span style="color: #888888">NA</span>.  For lists however, the syntax is more idiomatic.  It is this latter case that presents difficulty.  R has other quirks <a href="http://www.r-statistics.com/2010/04/the-difference-between-lettersc1na-and-letterscnana/">too</a>.  But all languages have quirks, and given R&#8217;s strength for statistical analysis, I have found no better tool for this.</p>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/04/25/r-na-v-null/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>GPU Computing&#8217;s Next Decade</title>
		<link>http://opendatagroup.com/2010/03/03/gpu-computings-next-decade/</link>
		<comments>http://opendatagroup.com/2010/03/03/gpu-computings-next-decade/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 16:40:39 +0000</pubDate>
		<dc:creator>Christopher Brown</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[AMD]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Intel]]></category>
		<category><![CDATA[Nvidia]]></category>
		<category><![CDATA[Tom's Hardware]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=329</guid>
		<description><![CDATA[Alan Dang at Tom&#8217;s Hardware has posted an article prognosticating what the next decade holds for GPUs.  Even if you don&#8217;t usually find pundit predictions useful, Alan&#8217;s is worth the read.  Alan has been there since the beginning and he takes his readers through the history, the motivating economics toward a coherent vision [...]]]></description>
			<content:encoded><![CDATA[<p>Alan Dang at Tom&#8217;s Hardware has posted an <a href="http://www.tomshardware.com/reviews/future-3d-graphics,2560.html">article</a> prognosticating what the next decade holds for GPUs.  Even if you don&#8217;t usually find pundit predictions useful, Alan&#8217;s is worth the read.  Alan has been there since the beginning and he takes his readers through the history, the motivating economics toward a coherent vision of the GPU&#8217;s future.  The article compares and contrasts the product mixes, technology and strategies of the three existing competitors: Nvidia, AMD and Intel.</p>
<p>Despite an emphasis on gaming and video &#8212; the primary market and impetus for the technology &#8212; the high performance computing enthusiast should take away a better understanding of the technology and of what hardware and software toys are on the horizon.</p>
<address>Posted by: Christopher Brown, Principal, Open Data Partners</address>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/03/03/gpu-computings-next-decade/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>hash-1.99.x</title>
		<link>http://opendatagroup.com/2010/02/17/hash-1-99-x/</link>
		<comments>http://opendatagroup.com/2010/02/17/hash-1-99-x/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 15:26:45 +0000</pubDate>
		<dc:creator>Christopher Brown</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[hashes]]></category>
		<category><![CDATA[R packages]]></category>
		<category><![CDATA[R programming]]></category>

		<guid isPermaLink="false">http://opendatagroup.com/?p=322</guid>
		<description><![CDATA[hash-2.0.0 has been released please read about it here: 
Earlier today, hash-1.99.x was released to CRAN.  This is a stable release and adds some more functions to an already full-featured hash implementation.  This version fixes some bugs, adds some features, improves performance and stability.  You can read about the hash package in [...]]]></description>
			<content:encoded><![CDATA[<p><span style="color: #ff0000"><strong><span>hash-2.0.0 has been released please read about it <a href="http://opendatagroup.com/2010/04/30/hash-2-0-0/">here</a>: </span></strong></span></p>
<p>Earlier today, hash-1.99.x was released to CRAN.  This is a stable release and adds some more functions to an already full-featured hash implementation.  This version fixes some bugs, adds some features, and improves performance and stability.  You can read about the hash package in my previous blog post, <a href="http://opendatagroup.com/2009/07/26/hash-package-for-r/">The hash package: hashes come to R</a>.  All changes came in response to users who wrote in and contributed thoughts, ideas and use cases.  Keep the good ideas coming.  Two of the major changes are summarized below.</p>
<p><span id="more-322"></span></p>
<p>Matthias Buch-Kromann of the Copenhagen Business School recommended the ability to access multiple keys from a single call and even access the same key multiple times.  This was previously allowed using the <code>[[</code> method, but was deprecated.  By convention, the <code>[[</code> method returns only one value.  ( You can read about the conventions of this and other R accessors in my previous blog post, <a href="http://opendatagroup.com/2009/10/21/r-accessors-explained/">R Accessors Explained</a>. ) This behavior has returned in hash-1.99.x through the <code>values</code> method and its optional <code>keys</code> argument:</p>
<p style="padding-left: 30px"><code><span style="color: #333333"><br />
h &lt;- hash( c('a','b','c'), 1:3 )<br />
values(h)<br />
values(h, keys=c('a','b','c','a','b','c' ) )</span><br />
</code></p>
<p>Matthias suggested calling the method <code>mget</code>, but there was some disparity with the <code>mget</code> function in base.  The generic function that I needed just wouldn't play nice with base::mget.</p>
<p>Another change in the behavior was prompted by Mohammad Fahim of the Department of Computer Engineering and Computer Science at the University of Louisville.  He wrote me to ask if there is a way to suppress warnings when trying to access non-existent keys.  When accessing  hashes hundreds of thousands of times, it becomes a drag to continually see:</p>
<p style="padding-left: 30px"><code>key: xxxx not found in the hash : hash_table_name</code></p>
<p>I have refactored the behavior to be more R-like by following <span style="color: #333333"><code>na.action</code>-</span>type conventions.  Now the default behavior is to return <span style="color: #333333"><code>NA</code></span> when trying to access non-existing keys.</p>
<p style="padding-left: 30px"><code><br />
<span style="color: #333333">&gt; library(hash)<br />
&gt; h &lt;- hash( c('a','b','c'), 1:3 )<br />
&gt; h[ letters[1:5] ]<br />
containing 6 key-value pair(s).<br />
a : 1<br />
b : 2<br />
c : 3<br />
d : NA<br />
e : NA</span><br />
</code></p>
<p>The behavior is also controllable through the <code>na.hash.action</code> option.  Functions are provided for most use cases:</p>
<ul>
<li><code>na.default.hash</code> (default) returns <code>NA</code> silently;</li>
<li><code>na.fail.hash</code> (old default) errors on non-existing keys;</li>
<li><code>na.warn.hash</code> returns <code>NA</code> but issues a warning.</li>
</ul>
<p>Behaviors are selected by setting the <code>na.hash.action</code> option.  For example, to restore the old default behavior of raising an error:</p>
<p style="padding-left: 30px"><code><br />
&gt; <span style="color: #333333">options( na.hash.action = na.fail.hash )<br />
&gt; h$d<br />
Error: key, d, not found in hash.<br />
&gt; h[[ 'd' ]]<br />
Error: key, d, not found in hash.</span><br />
</code></p>
<p>And, for the <span style="color: #333333"><code>[</code></span> and <span style="color: #333333"><code>[[</code> </span>methods, this behavior can be declared at access time:</p>
<p style="padding-left: 30px"><code><br />
&gt; h[[ 'd', na.action=na.warn.hash ]]<br />
Warning: key, d, not found in hash.<br />
d<br />
NA<br />
&gt; h[[ 'd', na.action=na.fail.hash ]]<br />
Error: key, d, not found in hash.<br />
&gt; h[[ 'd', na.action=na.default.hash ]]<br />
d<br />
NA<br />
</code></p>
<p>If you don&#8217;t like these hash-key-miss behaviors, you are free to write your own.  Functions should minimally accept arguments of the hash and the key.</p>
<p>Thanks to both Matthias and Mohammad for your feedback.</p>
<p>New features are on their way, notably the ability to use any object as a key and to preserve the order of the hash.  These are sometimes called Indexed Hashes.  Look for that in the hash-2.00.x release.  If you would like to see features added, contact me at cbrown -at- opendatagroup.com</p>
<p>References:</p>
<ul>
<li><a href="http://opendatagroup.com/2009/07/26/hash-package-for-r/">The hash package: hashes come to R</a></li>
<li><a href="http://opendatagroup.com/2009/10/21/r-accessors-explained/">R Accessors Explained</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://opendatagroup.com/2010/02/17/hash-1-99-x/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
