This will be a brief overview of the hash data structure, how it is implemented and how it can be manipulated in Ruby.
What is a Hash?
A Hash is a data structure that organizes data in key-value pairs. It is also referred to as a dictionary or associative array. These properties of a hash make it one of the most useful tools in a programmer’s pocket and it is available in the core libraries of most programming languages.
Why is it efficient?
To better understand how data is stored in hashes and why they are so efficient, we need to go back to the basics of the array data structure. As you know, an array lets us access any element directly, as long as we know its index beforehand.
array = [1,2,3,4,5,6,7]
=> [1, 2, 3, 4, 5, 6, 7]
array[4]
=> 5
For example, we could just use an Array for a simple list whose range we know in advance. For a list of the integers 1 to 100, we would simply use the integer as the key. But real-life situations are not always that simple. Don't get me wrong: arrays are really powerful and we use them every day. We also use arrays as values in a Hash.
I don't want to go any deeper into arrays here; this post is simply not about them. Some time ago I wrote a separate post about arrays.
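To connect this back to hashes, here is a minimal sketch of the general idea: a hash function turns the key into an array index, so looking up a key costs roughly the same as indexing an array. ToyHash and bucket_for are hypothetical names made up for this illustration, and this is not how Ruby's Hash is actually implemented internally.

```ruby
# Toy hash table: a sketch of the idea only, not Ruby's real implementation.
class ToyHash
  def initialize(size = 8)
    # Each slot of the array is a bucket holding [key, value] pairs.
    @buckets = Array.new(size) { [] }
  end

  def []=(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value          # key already present: overwrite the value
    else
      bucket << [key, value]   # new key: append to the bucket
    end
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]            # nil when the key is missing
  end

  private

  # The key's hash value, reduced to a valid array index.
  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end

toy = ToyHash.new
toy[:lv] = "Latvia"
toy[:lv]   # => "Latvia"
toy[:jp]   # => nil
```

As long as the buckets stay short, a lookup only has to compute one hash value and check a handful of pairs, no matter how many keys are stored.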
Basics
An interesting thing to note is that hash values are unique to each Ruby process. The murmur hash function is seeded with a random value, which results in a different hash value for a particular key in each Ruby process.
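If you want to see this for yourself, the following sketch asks two fresh Ruby processes for the hash value of the same string. The helper name hash_in_new_process is made up for this example, and it assumes the running interpreter can be started again through RbConfig.ruby.

```ruby
require "rbconfig"

# Within one process, the same key always hashes to the same value.
"latvia".hash == "latvia".hash   # => true

# Start a new Ruby process and return the hash value it computes for text.
# hash_in_new_process is a hypothetical helper, not part of Ruby.
def hash_in_new_process(text)
  IO.popen([RbConfig.ruby, "-e", "print ARGV[0].hash", text], &:read).to_i
end

first  = hash_in_new_process("latvia")
second = hash_in_new_process("latvia")

# Each process picks its own seed, so the two values are expected to differ.
puts "first:  #{first}"
puts "second: #{second}"
```

This is also why you should never persist the result of calling hash on a key and expect it to match in another process.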
Let’s say we have a multi-language web page and we want to store all the languages we provide in one place.
countries = {lv: "Latvia", ee: "Estonia", lt: "Lithuania", ru: "Russia", de: "Germany", fi: "Finland", se: "Sweden"}
=> {:lv=>"Latvia", :ee=>"Estonia", :lt=>"Lithuania", :ru=>"Russia", :de=>"Germany", :fi=>"Finland", :se=>"Sweden"}
Now we have a Hash with seven countries. For simplicity, each :key is the country's two-letter ISO code, and each :value is the full name of that country.
And now we can easily get each of them.
countries[:lv]
=> "Latvia"
countries[:ee]
=> "Estonia"
countries[:lt]
=> "Lithuania"
Let’s add two more.
countries[:us] = "United States"
=> "United States"
countries[:ch] = "Switzerland"
=> "Switzerland"
We can very easily check if they have been added.
countries.has_key? :us
=> true
countries.has_key? :ch
=> true
or
countries.value? "United States"
=> true
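These checks have a few aliases that you will run into in other people's code. A small sketch, using a shortened version of the countries hash:

```ruby
countries = { lv: "Latvia", us: "United States" }

# key?, include? and member? all do the same thing as has_key?
countries.key?(:us)       # => true
countries.include?(:us)   # => true
countries.member?(:xx)    # => false

# has_value? is the same check as value?
countries.has_value?("United States")   # => true

# key goes the other way round: from a value back to its key
countries.key("Latvia")   # => :lv
```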
The next question that might arise is: how can we delete something from a hash?
It might be simpler than you expected.
countries.delete(:ru)
=> "Russia"
countries.delete(:ee)
=> "Estonia"
countries.has_key? :ee
=> false
countries.has_key? :ru
=> false
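It is worth knowing what delete does when the key is not there. A short sketch with a smaller hash:

```ruby
countries = { lv: "Latvia", lt: "Lithuania" }

# Deleting a key that does not exist just returns nil.
countries.delete(:xx)   # => nil

# With a block, the block's result is returned for a missing key.
countries.delete(:xx) { |key| "#{key} was not here" }   # => "xx was not here"

# Deleting an existing key returns its value and removes the pair.
countries.delete(:lt)   # => "Lithuania"
countries.keys          # => [:lv]
```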
There are plenty of other methods you can call on a hash, which you can look up in the Ruby docs.
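To give a taste, here are a few that I would expect most people to reach for first. This is only a small selection, not a complete list:

```ruby
countries = { lv: "Latvia", lt: "Lithuania", de: "Germany" }

countries.keys     # => [:lv, :lt, :de]
countries.values   # => ["Latvia", "Lithuania", "Germany"]
countries.size     # => 3

# Iterate over every key-value pair.
countries.each { |code, name| puts "#{code}: #{name}" }

# select returns a new hash with only the matching pairs.
baltic = countries.select { |code, _name| [:lv, :lt].include?(code) }
baltic.keys   # => [:lv, :lt]

# merge returns a new hash and leaves the original untouched.
more = countries.merge(fi: "Finland")
more.size        # => 4
countries.size   # => 3
```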
More advanced Ruby Hash Techniques
I am a Ruby developer, so I use hashes a lot, and I thought I had seen everything there is.
Ruby is an excellent language and it has some neat tricks hiding up its sleeve. From the ‘Basics’ part above, you might think a hash is just a dumb key-value system.
We still have our countries hash. What if we try to access a country that doesn’t exist in it? It will return nil, which is the default value for every hash unless you have specified something else.
countries[:jp]
=> nil
You can set whatever you want as the default value, which can be a really useful and powerful tool.
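The simplest form is a fixed default passed straight to the constructor. A small sketch; the names and counts variables are just examples:

```ruby
# Every missing key now returns "Unknown" instead of nil.
names = Hash.new("Unknown")
names[:lv] = "Latvia"
names[:lv]        # => "Latvia"
names[:jp]        # => "Unknown"
names.key?(:jp)   # => false (reading a default does not add the key)

# A default of 0 makes counting easy: no need to initialise each key.
counts = Hash.new(0)
%w[lv ee lv lt lv].each { |code| counts[code] += 1 }
counts["lv"]   # => 3
counts["de"]   # => 0
```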
If we pass a block into the constructor, we are able to generate default values programmatically.
hash = Hash.new { |hash, key| "#{key}: #{ Time.now.to_i }" }
=> {}
hash[:x]
=> "x: 1445412975"
hash[:y]
=> "y: 1445412979"
hash[:z]
=> "z: 1445412981"
hash[:z] = "Some text..."
=> "Some text..."
hash
=> {:z=>"Some text..."}
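In the example above I set the default value to a timestamp, so you can see that the default value can be generated dynamically. Note that the block only returns the value; it is not stored in the hash, which is why only :z shows up at the end.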
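If you do want the generated value to stick, the block can assign it itself, since it receives the hash as its first argument. A minimal sketch, with a fixed string instead of a timestamp so the result is predictable:

```ruby
# The block stores the generated value under the missing key.
cache = Hash.new { |hash, key| hash[key] = "#{key}: generated" }

cache.key?(:x)   # => false
cache[:x]        # => "x: generated"
cache.key?(:x)   # => true (the first lookup stored the value)
cache.size       # => 1
```

This is a handy pattern for simple caching, because the block only runs the first time a key is requested.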
One of the problems I see with the Ruby hash is that it fails silently on a missing key. We are all human and we all make mistakes. Who hasn’t made a typo?
To avoid the pain caused by typos, we could raise an exception.
countries = Hash.new { |hash, key| raise ArgumentError.new("Woops... country '#{ key }' is not specified") }
=> {}
countries[:lv] = "Latvia"
=> "Latvia"
countries[:de]
ArgumentError: Woops... country 'de' is not specified
from (irb):20:in `block in irb_binding'
from (irb):22:in `yield'
from (irb):22
from /home/lauris/.rvm/rubies/ruby-2.2.3/bin/irb:11:in `<main>'
This might be useful when refactoring or debugging your code. It’s much less obtrusive than, for example, monkey-patching the Hash class. I would still suggest using Hash#fetch in new code.
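For comparison, here is a short sketch of how fetch behaves on a plain hash, without any default block set up:

```ruby
countries = { lv: "Latvia" }

countries.fetch(:lv)   # => "Latvia"

# A second argument or a block supplies a fallback for missing keys.
countries.fetch(:de, "Unknown")                          # => "Unknown"
countries.fetch(:de) { |key| "No country for #{key}" }   # => "No country for de"

# Without a fallback, a missing key raises KeyError instead of returning nil.
begin
  countries.fetch(:de)
rescue KeyError => e
  puts e.message
end
```

The nice part is that the strictness is chosen at the call site, so the hash itself stays an ordinary hash.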
We also have the power to control the default value even after a hash has been initialized. To do this, we use the default and default_proc setters.
hash = Hash.new
=> {}
hash[:first]
=> nil
hash[:secind] = "Some random text..."
=> "Some random text..."
hash
=> {:secind=>"Some random text..."}
hash.default = "Placeholder!"
=> "Placeholder!"
hash[:first]
=> "Placeholder!"
hash
=> {:secind=>"Some random text..."}
and
hash.default_proc = Proc.new { Time.now.to_i }
=> #<Proc:0x00000000ffea10@(irb):33>
hash[:third]
=> 1445417143
hash
=> {:secind=>"Some random text..."}
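One detail that is easy to miss: a hash has either a fixed default or a default proc, never both. Setting one clears the other, as this small sketch shows:

```ruby
hash = Hash.new
hash.default = "Placeholder!"
hash.default_proc   # => nil

# Setting a default proc clears the fixed default.
hash.default_proc = proc { |_hash, key| "no value for #{key}" }
hash.default        # => nil
hash[:first]        # => "no value for first"

# And setting a fixed default again clears the proc.
hash.default = "Placeholder!"
hash.default_proc   # => nil
hash[:first]        # => "Placeholder!"
```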
Conclusion/lesson
In my programming career to this day, there hasn’t been an issue that was impossible to solve. You just need to keep digging until you get what you need. Take these default values for the Ruby hash: there was a real-life situation where they saved my life.