<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jacob Repp &#187; algorithms</title>
	<atom:link href="http://jrepp.com/category/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://jrepp.com</link>
	<description>Game programming, music and life</description>
	<lastBuildDate>Fri, 16 Dec 2011 06:03:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>More Database Linkage / Research Material</title>
		<link>http://jrepp.com/2009/12/23/more-database-linkage-research-material/</link>
		<comments>http://jrepp.com/2009/12/23/more-database-linkage-research-material/#comments</comments>
		<pubDate>Wed, 23 Dec 2009 08:39:58 +0000</pubDate>
		<dc:creator>proj</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[db]]></category>

		<guid isPermaLink="false">http://jrepp.com/2009/12/23/more-database-linkage-research-material/</guid>
		<description><![CDATA[Papers: Weaving Relations for Cache Performance C-Store a Column oriented DBMS Terms: Online Transaction Processing OLTP Online Analytic Processing OLAP Batch Processing Publish Subscribe Bitmap Indexes Strict Weak Ordering DB Normalization People: Ted Codd And his 12 rules Christopher Date &#8230; <a href="http://jrepp.com/2009/12/23/more-database-linkage-research-material/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Papers:</p>

<p><a href="http://www.cs.cmu.edu/~natassa/aapubs/conference/pax.pdf">Weaving Relations for Cache Performance</a></p>

<p><a href="http://db.csail.mit.edu/projects/cstore/vldb.pdf">C-Store: A Column-oriented DBMS</a></p>

<p>Terms:</p>

<p><a href="http://en.wikipedia.org/wiki/OLTP">Online Transaction Processing (OLTP)</a></p>

<p><a href="http://en.wikipedia.org/wiki/OLAP">Online Analytical Processing (OLAP)</a></p>

<p><a href="http://en.wikipedia.org/wiki/Batch_Processing">Batch Processing</a></p>

<p><a href="http://en.wikipedia.org/wiki/Publish_subscribe">Publish/Subscribe</a></p>

<p><a href="http://en.wikipedia.org/wiki/Bitmap_index">Bitmap Indexes</a></p>

<p><a href="http://www.sgi.com/tech/stl/StrictWeakOrdering.html">Strict Weak Ordering</a></p>

<p><a href="http://en.wikipedia.org/wiki/Database_normalization">DB Normalization</a></p>

<p>People:</p>

<p><a href="http://en.wikipedia.org/wiki/Ted_Codd">Ted Codd</a> <a href="http://en.wikipedia.org/wiki/Codd's_12_rules">and his 12 rules</a></p>

<p><a href="http://en.wikipedia.org/wiki/Christopher_J._Date">Christopher Date</a></p>

<p>If you know of more, feel free to leave them in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://jrepp.com/2009/12/23/more-database-linkage-research-material/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nice Collection of Hash Functions</title>
		<link>http://jrepp.com/2008/11/05/nice-collection-of-hash-functions/</link>
		<comments>http://jrepp.com/2008/11/05/nice-collection-of-hash-functions/#comments</comments>
		<pubDate>Wed, 05 Nov 2008 19:54:15 +0000</pubDate>
		<dc:creator>proj</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[d]]></category>

		<guid isPermaLink="false">http://jrepp.com/2008/11/05/nice-collection-of-hash-functions/</guid>
		<description><![CDATA[Found a nice collection of hash routines written in D. These are easy to convert to other languages. I was just looking for Robert Sedgewick&#8217;s hash function since I didn&#8217;t have his book handy.]]></description>
			<content:encoded><![CDATA[<p>Found a nice collection of <a href="http://derrick.pallas.us/d/hash/">hash routines written in D</a>. These are easy to convert to other languages. I was just looking for <a href="http://en.wikipedia.org/wiki/Robert_Sedgewick_(computer_scientist)">Robert Sedgewick&#8217;s</a> hash function since I didn&#8217;t have his book handy.</p>
]]></content:encoded>
			<wfw:commentRss>http://jrepp.com/2008/11/05/nice-collection-of-hash-functions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Investigating DB File Formats</title>
		<link>http://jrepp.com/2008/10/20/investigating-db-file-formats/</link>
		<comments>http://jrepp.com/2008/10/20/investigating-db-file-formats/#comments</comments>
		<pubDate>Mon, 20 Oct 2008 07:04:29 +0000</pubDate>
		<dc:creator>proj</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[db]]></category>

		<guid isPermaLink="false">http://jrepp.com/2008/10/20/investigating-db-file-formats/</guid>
		<description><![CDATA[Just a small post for now; there may be more to come covering actual implementation details: Tree Structured Indexes Design of BTRFS]]></description>
			<content:encoded><![CDATA[<p>Just a small post for now; there may be more to come covering actual implementation details:</p>

<p><a href="http://www.isqa.unomaha.edu/haworth/isqa3300/fs010.htm">Tree Structured Indexes</a></p>

<p><a href="http://btrfs.wiki.kernel.org/index.php/Btrfs_design">Design of BTRFS</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jrepp.com/2008/10/20/investigating-db-file-formats/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Build Huffman Compression in Ruby</title>
		<link>http://jrepp.com/2008/01/08/build-huffman-compression-in-ruby/</link>
		<comments>http://jrepp.com/2008/01/08/build-huffman-compression-in-ruby/#comments</comments>
		<pubDate>Tue, 08 Jan 2008 03:19:21 +0000</pubDate>
		<dc:creator>proj</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://jrepp.com/2008/01/08/build-huffman-compression-in-ruby/</guid>
		<description><![CDATA[Many wonderful things can be done with binary trees. One brilliant use of the binary tree was proposed by David Huffman in 1951 at MIT, and it has since become a building block of much of the compression technology available today. Huffman &#8230; <a href="http://jrepp.com/2008/01/08/build-huffman-compression-in-ruby/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Many wonderful things can be done with binary trees. One brilliant use of the binary tree was proposed by <a href="http://en.wikipedia.org/wiki/Huffman_coding">David Huffman in 1951 at MIT</a>, and it has since become a building block of much of the compression technology available today. Huffman discovered a simple way to generate a provably optimal prefix code for a set of input symbols: given how often each symbol occurs, no code that assigns each symbol its own bit string (with no code being the beginning of another) produces a shorter encoding. It&#8217;s a deep subject, and one which I&#8217;d eventually like to spend more time getting into. For now I&#8217;ll walk through how this simple idea can be used to implement some decent compression in pure Ruby.</p>

<p>First of all you need a basic binary tree representation. I previously showed a <a href="http://jrepp.com/2008/01/07/build-a-binary-tree-in-ruby/">simple binary tree example</a> that demonstrates how easily binary trees can be manipulated in Ruby. Every binary tree node holds a value and links to left and right child nodes that may also hold values. A leaf node is a node with dead links (left and right are both nil).</p>

<p>In the Huffman tree the leaves contain a single symbol while interior nodes summarize their sub-tree. I have simplified the idea so that every node wraps what is called an association list (assoc list). This is an old Lisp idiom: a list of name/value pairs in the form ((name value) (name value)). To find a value in an association list you walk the list, comparing your key against the first element of each sub-list. A reverse association list (rassoc list) has the same form with each pair swapped, so the lookup compares against the second element instead. For very small sets of data, walking a short list like this can be faster than using a heavier structure such as a hash or a tree. Ruby supports both forms directly through Array#assoc and Array#rassoc. Each leaf stores a one-entry rassoc list of the form [[occurrence count, symbol]], and each interior node stores the concatenation of its children&#8217;s lists. None of this is essential to understanding how Huffman encoding works, but the code below relies on it, so I thought I should at least call it out.</p>
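
<p>To make the idea concrete, here is a small illustration of Array#assoc and Array#rassoc on a list shaped like the ones stored in the tree. The variable names are only for illustration.</p>

```ruby
# Each entry is [occurrence count, symbol], as in the tree's leaves.
entries = [[3, "a"], [1, "b"], [2, "c"]]

found   = entries.rassoc("c")  # matches on the second element: [2, "c"]
by_cnt  = entries.assoc(1)     # matches on the first element: [1, "b"]
missing = entries.rassoc("z")  # no match: nil

# An interior node's list is its children's lists joined together.
left   = [[1, "b"]]
right  = [[2, "c"]]
merged = left + right          # [[1, "b"], [2, "c"]]
p found, by_cnt, missing, merged
```

<p>Because merged is just a longer list of the same shape, rassoc on an interior node answers the question &#8220;is this symbol somewhere in my sub-tree?&#8221;</p>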

<pre><code>class Node
    # binary tree representation
    attr_accessor :value, :left, :right
    def initialize(value=nil, left=nil, right=nil)
        @value, @left, @right = value, left, right
    end
    def leaf?
        return @left == nil &amp;&amp; @right == nil
    end
end
</code></pre>

<p>By default a node is initialized empty, with no links. A simple query method, &#8216;leaf?&#8217;, is added as syntactic sugar for the tree-handling code below.</p>

<p>The first step in building the Huffman tree is to count the occurrences of each symbol in the data you want to encode. Some algorithms, such as adaptive Huffman coding, can do this dynamically for streams of data, but in this case I walk the entire input text first to find all symbols. In this implementation a symbol is a single character. Keep in mind that it is possible to use words or other units as symbols instead of single characters, which could result in better compression.</p>

<pre><code>def occurrences(text)
    # return an ordered array of [occurrences, character] ascending order
    occ = Hash.new(0)
    text.each_char { |c| occ[c] += 1 }
    return occ.keys.map {|k| [occ[k], k]}.sort
end
</code></pre>
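
<p>On Ruby 2.7 or later, Enumerable#tally can do the counting in one call. A sketch that should return the same sorted [count, character] list as occurrences(); occurrences_tally is a hypothetical name, not part of the sample source.</p>

```ruby
# Same result as occurrences(), using Enumerable#tally
# (assumes Ruby 2.7 or later).
def occurrences_tally(text)
  text.each_char.tally.map { |ch, count| [count, ch] }.sort
end

p occurrences_tally("abracadabra")
# prints [[1, "c"], [1, "d"], [2, "b"], [2, "r"], [5, "a"]]
```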

<p>The hash table <strong>occ</strong> is initialized with a default value of zero, so every new entry starts at zero. Every character of the text is scanned and counted using the hash table. The final stage is to create a sorted array of [count, character] entries: map runs the block over each key, looks up that key&#8217;s count and outputs a single rassoc entry per hash entry, and sort then orders the entries by count, breaking ties by character.</p>

<p>The last step is to convert the occurrence list into a tree. The occurrence entries are converted into leaf nodes and put into a queue in ascending order. Two nodes at a time are removed from the queue and merged under a new interior node, which is placed onto a second queue. The interior node&#8217;s rassoc list is the concatenation of the lists of the two nodes it consumes. When the primary queue is empty, nodes are taken from the secondary queue instead. The final node remaining is the root of the tree.</p>

<pre><code>def huffman(occlist)
    # build a huffman encoding binary tree from the occurrence list
    if occlist.empty? then
        puts "warning: no occurrences provided to build huffman tree"
        return nil
    end

    # create the initial queue with leaves, trees contain assoc lists
    leaves = occlist.map {|entry| Node.new([entry]) }
    interior = []
    deq = lambda {
        if (leaves.length &gt; 0) then leaves.delete_at 0 
        else interior.delete_at 0 end
    }

    # create interior nodes
    while leaves.length + interior.length &gt; 1
        l, r = deq.call, deq.call
        node = Node.new(l.value + r.value, l, r)
        interior &lt;&lt; node
    end

    deq.call
end
</code></pre>

<p>Interior nodes are only formed in the second portion of the huffman() function, which consumes all of the leaf nodes first and then the interior nodes in the order they were created. In this way the tree is built from the bottom up and comes out close to balanced.</p>
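
<p>One caveat: deq above always drains the leaf queue before it looks at the interior queue, and it never compares counts, so for skewed inputs a frequent symbol can end up deep in the tree. The textbook two-queue construction instead takes whichever queue front has the smaller total count. Below is a sketch of that variant. It keeps the same Node layout and rassoc lists, and weighted_huffman(), weight() and code_lengths() are hypothetical names that are not part of the sample source.</p>

```ruby
# Sketch: two-queue Huffman construction that compares total counts.
# Node mirrors the class used in this post.
class Node
  attr_accessor :value, :left, :right

  def initialize(value = nil, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end

  def leaf?
    left.nil? and right.nil?
  end
end

# Total occurrence count held in a node's rassoc list.
def weight(node)
  node.value.sum { |count, _sym| count }
end

# occlist must be sorted ascending by count, as occurrences() returns it.
def weighted_huffman(occlist)
  return nil if occlist.empty?
  leaves = occlist.map { |entry| Node.new([entry]) }
  interior = []  # merged nodes come out in non-decreasing weight order

  # Take the lighter of the two queue fronts; leaves win ties.
  deq = lambda do
    if interior.empty?
      leaves.shift
    elsif leaves.empty?
      interior.shift
    elsif (weight(leaves.first) - weight(interior.first)).positive?
      interior.shift
    else
      leaves.shift
    end
  end

  until leaves.size + interior.size == 1
    l = deq.call
    r = deq.call
    interior.push(Node.new(l.value + r.value, l, r))
  end
  deq.call
end

# Depth of every leaf, i.e. the code length in bits of each symbol.
def code_lengths(node, depth = 0, out = {})
  if node.leaf?
    out[node.value[0][1]] = depth
  else
    code_lengths(node.left, depth + 1, out)
    code_lengths(node.right, depth + 1, out)
  end
  out
end

p code_lengths(weighted_huffman([[1, "a"], [1, "b"], [1, "c"], [100, "d"]]))
```

<p>With counts of 1, 1, 1 and 100, this variant gives the frequent symbol a 1-bit code, for 108 bits in total. Draining the leaves first gives every symbol a 2-bit code, for 206 bits.</p>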

<p>Next up is a utility function that takes a Huffman tree and a symbol and returns an array of 0s and 1s representing the symbol&#8217;s binary encoding. Each step down the tree is recorded as a 0 for the left sub-tree or a 1 for the right sub-tree.</p>

<pre><code>def bits(ht, sym)
    # given a tree and symbol return an array of encoded bits  
    def bits_internal(ht, sym, enc)
        return enc, nil if not ht
        return enc, ht.value if ht.leaf?

        if ht.left.value.rassoc(sym) != nil
            enc &lt;&lt; 0
            bits_internal(ht.left, sym, enc)
        else
            enc &lt;&lt; 1
            bits_internal(ht.right, sym, enc)
        end
    end

    if ht.value.rassoc(sym) == nil
        puts "warning: no binary encoding for: '#{sym}'"
        return []
    end

    enc, value = bits_internal(ht, sym, [])
    if value[0][1] != sym 
        puts "warning: binary encoding: #{enc.inspect}, value: #{value.inspect} does not match '#{sym}'"
        return []    
    else
        return enc
    end
end
</code></pre>

<p>The bits() function uses the rassoc lists stored in each node to figure out which sub-tree contains the symbol. There are more efficient ways to implement this, but this method is fine for example purposes. bits() relies on a helper method, bits_internal(), written in the same recursive style as visit() from the previous binary tree code. bits() then checks that the leaf it reached actually holds the requested symbol before returning the encoding to the caller.</p>
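
<p>One more efficient approach is to walk the tree once and record the path to every leaf, so that encoding a character becomes a single hash lookup. Here is a sketch that assumes the same Node layout. code_table() is a hypothetical helper, not part of the sample source, and the tree is built by hand for the example.</p>

```ruby
# Sketch: build a symbol-to-bits table with a single tree walk,
# instead of searching rassoc lists for every character.
# Node mirrors the class used in this post.
class Node
  attr_accessor :value, :left, :right

  def initialize(value = nil, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end

  def leaf?
    left.nil? and right.nil?
  end
end

def code_table(node, path = [], table = {})
  return table if node.nil?
  if node.leaf?
    table[node.value[0][1]] = path              # leaf: record the path taken
  else
    code_table(node.left, path + [0], table)    # 0 for left
    code_table(node.right, path + [1], table)   # 1 for right
  end
  table
end

# Tiny hand-built tree: "a" is 0, "b" is 10, "c" is 11.
tree = Node.new([[5, "a"], [2, "b"], [1, "c"]],
                Node.new([[5, "a"]]),
                Node.new([[2, "b"], [1, "c"]],
                         Node.new([[2, "b"]]),
                         Node.new([[1, "c"]])))
table = code_table(tree)
p table["b"]
```

<p>With the table in hand, the measurement loop could use table[c].length in place of bits(ht, c).length.</p>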

<p>Similar to bits_internal() above, decode() walks the tree, using the bits in the array to choose the left or right sub-tree at each step. Each time decode() calls itself it passes along the rest of the array after the first bit. If you are familiar with Lisp code, this first-and-rest style of traversal will look very familiar. When we arrive at a leaf we know we have found the symbol, and it is returned.</p>

<pre><code>def decode(ht, bits)
    # a leaf holds the decoded symbol
    return ht.value[0][1] if ht.leaf?

    # 0 walks left, 1 walks right; recurse on the remaining bits
    case bits[0]
        when 0
            decode(ht.left, bits[1..-1])
        when 1
            decode(ht.right, bits[1..-1])
    end
end
</code></pre>
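
<p>decode() returns only the first symbol in the bit array. To decode a whole message you can restart at the root each time a leaf is reached. Here is a sketch of that idea that iterates over the bits rather than recursing on ever-shorter slices. decode_all() is a hypothetical name, the tree is built by hand for the example, and it assumes the tree has at least two leaves.</p>

```ruby
# Sketch: decode a whole bit array in one pass.
# Node mirrors the class used in this post.
class Node
  attr_accessor :value, :left, :right

  def initialize(value = nil, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end

  def leaf?
    left.nil? and right.nil?
  end
end

def decode_all(root, bits)
  out = []
  node = root
  bits.each do |bit|
    node = bit.zero? ? node.left : node.right  # 0 goes left, 1 goes right
    next unless node.leaf?
    out.push(node.value[0][1])  # reached a symbol
    node = root                 # restart for the next symbol
  end
  out.join
end

# Tiny hand-built tree: "a" is 0, "b" is 10, "c" is 11.
tree = Node.new([[5, "a"], [2, "b"], [1, "c"]],
                Node.new([[5, "a"]]),
                Node.new([[2, "b"], [1, "c"]],
                         Node.new([[2, "b"]]),
                         Node.new([[1, "c"]])))
puts decode_all(tree, [1, 1, 0, 1, 0])  # prints cab
```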

<p>Now for a simple test of the Huffman encoding procedures above, using the standard Lorem Ipsum text. The <a href="http://lipsum.com/">history of this scrambled Latin text</a> is very interesting: it is derived from a passage of <em>De finibus bonorum et malorum</em>, a treatise on ethics written by Cicero in 45 BC.</p>

<pre><code>text = &lt;&lt;TEXT
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed est nulla, suscipit vel, tempus sit amet, viverra sit amet, dui. Nunc ultrices, purus vulputate luctus sodales, mauris augue elementum diam, in ornare neque nisi pharetra lectus. In hac habitasse platea dictumst. Phasellus justo turpis, laoreet id, semper at, convallis a, nisi. Duis iaculis erat et mauris. Donec a arcu. Ut sed risus vel mi mollis vehicula. Aenean laoreet, lorem dapibus aliquam ultrices, ante velit vestibulum sem, vel molestie arcu elit sit amet nunc. Vivamus venenatis placerat dui. Mauris porttitor varius velit. 
TEXT
</code></pre>

<p>To measure the effectiveness of the compression, I sum the encoded bits for every character of the text and compare the total with the raw text, assuming each raw character takes one standard 8-bit byte.</p>

<pre><code># build a huffman encoding tree
ht = huffman(occurrences(text))

# measure the compression: total encoded bits over all characters
bitsum = 0
text.each_char do |c|
    bitsum += bits(ht, c).length
end

orig = text.length
encoded = bitsum / 8.0
printf("original %s bytes, huffman encoded: %s bytes, ratio: %.2f%%\n",
    orig, encoded, encoded / orig * 100)
</code></pre>

<p>And what do we get for our efforts with this humble piece of Ruby code?</p>

<pre><code>original 598 bytes, huffman encoded: 374.25 bytes, ratio: 62.58%
</code></pre>

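<p>The encoded text is about 63% of its original size, or roughly 5 bits per character instead of 8, and that figure does not include the space needed to store the tree for decoding. Not too shabby!</p>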
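
<p>The bitsum / 8.0 figure assumes the codes are packed tightly into bytes. One way to do that packing in Ruby is Array#pack with the "B*" directive, which reads a string of 0 and 1 characters most significant bit first. String#unpack with the same directive gets them back. Here is a sketch; pack_bits() and unpack_bits() are hypothetical helpers, and a real decoder would also need the bit count and the tree, neither of which is stored here.</p>

```ruby
# Sketch: pack an array of 0/1 values into bytes, most significant
# bit first.
def pack_bits(bits)
  [bits.join].pack("B*")  # "B*" pads the final byte with zero bits
end

# n is the number of real bits; the zero padding in the last byte
# cannot otherwise be told apart from encoded data.
def unpack_bits(packed, n)
  packed.unpack("B*").first[0, n].each_char.map { |ch| ch.to_i }
end

packed = pack_bits([0, 1, 0, 0, 0, 0, 0, 1, 0, 1])
p packed.bytesize  # prints 2
```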

<p><a href="http://jrepp.com/code/huffman.rb.txt">Click here to view the sample source code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jrepp.com/2008/01/08/build-huffman-compression-in-ruby/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Build a Binary Tree in Ruby</title>
		<link>http://jrepp.com/2008/01/07/build-a-binary-tree-in-ruby/</link>
		<comments>http://jrepp.com/2008/01/07/build-a-binary-tree-in-ruby/#comments</comments>
		<pubDate>Mon, 07 Jan 2008 06:29:24 +0000</pubDate>
		<dc:creator>proj</dc:creator>
				<category><![CDATA[algorithms]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://jrepp.com/2008/01/07/build-a-binary-tree-in-ruby/</guid>
		<description><![CDATA[Building algorithms in Ruby is fun and rewarding. This binary tree doesn&#8217;t balance itself, but it is simple and flexible, using Ruby blocks for visit and insert. The traversal order can optionally be selected as :inorder, :preorder or :postorder. &#8230; <a href="http://jrepp.com/2008/01/07/build-a-binary-tree-in-ruby/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Building algorithms in Ruby is fun and rewarding. This binary tree doesn&#8217;t balance itself, but it is simple and flexible, using Ruby blocks for visit and insert. The traversal order can optionally be selected as :inorder, :preorder or :postorder.</p>

<p>Here&#8217;s your basic binary tree node representation; it holds a value and connects to a left and a right node:</p>

<pre><code>class Node
    # binary tree representation
    attr_accessor :value, :left, :right
    def initialize(value=nil, left=nil, right=nil)
        @value, @left, @right = value, left, right
    end
end
</code></pre>

<p>Insert receives a value and a block which is responsible for doing the comparison. Comparison would normally be done with the Ruby <strong>&lt;=></strong> operator (this is the case later when the chunkybacon is inserted into the tree). The block passed to insert receives two elements and should return -1 for less, 1 for greater or 0 for equal; on 0 the value is already present and nothing is inserted. The function is recursive and, notably, does no balancing at all.</p>

<pre><code>def insert(node, v, &amp;block)
    # binary tree insert without balancing, 
    # block performs the comparison operation
    return Node.new(v) if not node
    case block[v, node.value]
        when -1 
            node.left = insert(node.left, v, &amp;block)
        when 1 
            node.right = insert(node.right, v, &amp;block)
    end
    return node
end
</code></pre>
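
<p>The lack of balancing matters most for sorted input: every new value goes to the same side, and the tree degenerates into a linked list. Here is a sketch that measures this, using integers with (a - b).clamp(-1, 1) in place of a &lt;=&gt; b (Integer#clamp needs Ruby 2.4 or later). height() and build() are hypothetical helpers, and insert() is restated so that it forwards the block with yield.</p>

```ruby
# Sketch: sorted input turns the unbalanced tree into a linked list.
# Node mirrors the class used in this post.
class Node
  attr_accessor :value, :left, :right

  def initialize(value = nil, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end
end

# Same logic as insert() above, forwarding the block with yield.
def insert(node, v)
  return Node.new(v) if node.nil?
  case yield(v, node.value)
  when -1 then node.left = insert(node.left, v) { |a, b| yield(a, b) }
  when 1 then node.right = insert(node.right, v) { |a, b| yield(a, b) }
  end
  node
end

# Number of levels on the longest root-to-leaf path.
def height(node)
  return 0 if node.nil?
  1 + [height(node.left), height(node.right)].max
end

# Builds a tree from integers; (a - b).clamp(-1, 1) plays the role
# of the comparison block, returning -1, 0 or 1.
def build(values)
  values.inject(nil) do |root, v|
    insert(root, v) { |a, b| (a - b).clamp(-1, 1) }
  end
end

puts height(build((1..15).to_a))  # prints 15
puts height(build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]))  # prints 4
```

<p>The same fifteen values give a tree of height 15 when inserted in sorted order, but height 4 when inserted middle-first. A self-balancing tree would keep the height close to the second figure regardless of insertion order.</p>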

<p>Visit receives the order in which the nodes should be visited, as well as a block that acts as a visitor of the stored values. This function is also recursive.</p>

<pre><code>def visit(n, order=:preorder, &amp;block)
    # visit nodes in a binary tree in the given order (:preorder, :inorder or :postorder)
    # block performs visit action
    return false unless (n != nil)

    case order 
        when :preorder 
            yield n.value
            visit(n.left, order, &amp;block)
            visit(n.right, order, &amp;block)
        when :inorder
            visit(n.left, order, &amp;block)
            yield n.value
            visit(n.right, order, &amp;block)
        when :postorder
            visit(n.left, order, &amp;block)
            visit(n.right, order, &amp;block)
            yield n.value
    end
end
</code></pre>
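
<p>To see how the three orders differ, here is a sketch that builds a small, balanced tree by hand and collects the values in each order. visit() is restated so that it forwards the block with yield, and collect() is a hypothetical helper.</p>

```ruby
# Sketch: the three visit orders on a small hand-built tree.
# Node mirrors the class used in this post.
class Node
  attr_accessor :value, :left, :right

  def initialize(value = nil, left = nil, right = nil)
    @value, @left, @right = value, left, right
  end
end

def visit(n, order = :preorder)
  return if n.nil?
  walk = proc { |child| visit(child, order) { |v| yield v } }
  case order
  when :preorder
    yield n.value
    walk.call(n.left)
    walk.call(n.right)
  when :inorder
    walk.call(n.left)
    yield n.value
    walk.call(n.right)
  when :postorder
    walk.call(n.left)
    walk.call(n.right)
    yield n.value
  end
end

# Gather the visited values into an array.
def collect(root, order)
  out = []
  visit(root, order) { |v| out.push(v) }
  out
end

#       4
#     2   6
#    1 3 5 7
root = Node.new(4, Node.new(2, Node.new(1), Node.new(3)),
                   Node.new(6, Node.new(5), Node.new(7)))

p collect(root, :preorder)   # prints [4, 2, 1, 3, 6, 5, 7]
p collect(root, :inorder)    # prints [1, 2, 3, 4, 5, 6, 7]
p collect(root, :postorder)  # prints [1, 3, 2, 5, 7, 6, 4]
```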

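<p>And here is a simple example of inserting each character of the string &#8216;chunkybacon&#8217; into the binary tree. The result of visiting this tree using :inorder traversal is the string &#8216;abchknouy&#8217;: the letters in sorted order, with the repeated &#8216;c&#8217; and &#8216;n&#8217; stored only once.</p>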

<pre><code>if $0 == __FILE__
    # a simple test case
    root = nil
    "chunkybacon".scan(/./m) {|c| root = insert(root, c) {|a,b| a&lt;=&gt;b}}
    visit(root, :inorder) {|v| print v}
end
</code></pre>

<p>You may like it, you will see. Try it, try it, and leave a comment for me.</p>
]]></content:encoded>
			<wfw:commentRss>http://jrepp.com/2008/01/07/build-a-binary-tree-in-ruby/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

