Aug 27 / oneofthesedays

Hosting your Rails App

So you’ve built a sweet Rails app, and it runs fine locally, but you’d like other people to be able to use it, right? For that, you can sign up with a web hosting company, but with so many to choose from, how do you pick one?

So far, I’ve used 2 different companies to host a Rails app: Rackspace and Heroku. I’ve settled on Heroku, and couldn’t be happier.

Let’s look at Rackspace first. They’re a large company with a lot of infrastructure and support. When they say they have ‘fanatical customer support’, they really mean it. I needed their support 3 times during my time with them, and it took less than 5 minutes to go from identifying the problem to talking with a real person about how to fix it. They offer a 24/7 live chat service, and, living in New Zealand, I find this particularly useful, as my 9AM is most web companies’ 6PM.

What they offer is essentially a Virtual Private Server (VPS) solution, along with your run-of-the-mill cloud storage, much like Amazon’s ubiquitous S3. Having an entirely blank server that you can install any OS and software on is a real charm: instead of having too little control, you have too much. If you’re a sysadmin geek or bash guru, you will have a blast; by the same token, this service probably wouldn’t suit someone with limited server experience.

I installed a typical LAMP stack, with Ubuntu 8.x as my distribution of choice. Setting up this server was really no different from setting up a local development machine, with the exception of DNS: alongside setting up your /etc/hosts file, you will need to use a simple control panel to add your domain.

This freedom does, however, create a large amount of setup work to get your server running, especially when it comes to getting Rails going. Following this tutorial on setting up Passenger and Capistrano, it took me around 30 minutes and a couple of headaches.

One downside with Rackspace is that it is up to the account administrator to keep the server secure. For an experienced sysadmin this is probably a non-issue, but for those competent enough to get a server running but lacking solid security experience, it can be a tad dangerous.

To sum up, if you’re looking for complete control, extremely reasonable pricing, on demand support, and don’t mind getting your hands dirty in the command line, then you will probably love Rackspace.


Heroku is the opposite of Rackspace, swapping complete control for effortless setup and deployment. You can have your app deployed and running on their servers with 2 lines:

	heroku create my-awesome-rails-app
	git push heroku master

If you then navigate to the URL it gives you, you will see your Rails app. The only caveats are that you must be in the root directory of your app, and the app must be in a git repository. If you’re starting a brand new Rails app, the setup would look like this:

	rails Skynet
	cd Skynet
	git init
	git add .
	git commit -m "Importing Skynet"
	heroku create skynet
	git push heroku master

This will then have your new Rails app running at http://skynet.heroku.com. (From Rails 3 onwards, the first command becomes rails new Skynet.)

Heroku also comes with a swathe of add-ons. From hosted MongoDB installations, to effortless DNS, to cron jobs and automatic backups, it’s like a Sith lord having the keys to the Padawan training room while all the Jedi are away on holiday.

Essentially, Heroku has done for hosting what Rails did for web development: convention over configuration, an excellent user experience, and everything “just works” with the minimum amount of effort.

Another neat thing about Heroku is that all of its commands and tools are accessible through a command line tool. You can add domain names, add more concurrency to your app, run rake commands, and do just about everything else they offer.

So if you’re looking for a no-fuss, beautifully designed, effortless, rubyesque, affordable and scalable solution, you might want to check out Heroku. (However, it’s built on Rails, which can’t actually scale, so you may want to take that into consideration.)

Jul 19 / oneofthesedays

Ruby, Day 13: Ruby on Rails


Let’s be honest, Ruby on Rails is probably why you’re learning Ruby. It couldn’t be an easier framework to switch to, not necessarily because of the framework itself, but because of all the support and information available for it. Rather than start another Rails tutorial series, I will be compiling lists of the tutorials and guides that have helped me with each part of Rails. To give this some context, I am working on a Facebook application in Rails, and so this will flavour the resources I use.

Getting Started

  • Creating a weblog in 15 minutes
    This was the video I started with, and it blew my mind (as it said it would). It really doesn’t exaggerate when it says create a blog in 15 minutes. Start with this; you may not understand the intricacies and details, but you will be exposed to the features and power of rails, and after watching you will want to dive right in.
  • Getting Started with Rails
    From the official rubyonrails.org website, this is the Rails quickstart guide. The site offers a wealth of information on all aspects of Rails, and this particular guide takes you through building the blog from the video above.
  • A List Apart: Getting Started with Ruby on Rails
    If you’re a little more sceptical about this whole Rails thing, A List Apart nicely outlines the who, where, whys and hows of Rails. It compares Rails to PHP, so this is a good read if you come from that background.
  • Ruby Forums
    When you start running into strange errors and bugs, try searching the Ruby Forums, or even posting a topic about your problem. There is an absolute wealth of knowledge there, and a wonderful group of people.

Tools and Plugins

  • RubyGems User Guide
    While not part of Rails, gems will become an essential part of your work: there is a wealth of functionality you can add to your app with only a few commands. This guide explains a little more about the tool.
  • 10 TextMate bundles/plugins to boost your Ruby on Rails development productivity
    Hopefully by now you’re getting the hang of Rails. These TextMate plugins will make your time with Rails even easier and more enjoyable. (You are using TextMate, aren’t you?)
  • Getting Started with User Authentication
    Authlogic is an amazing plugin that lets you add a fully functional user authentication system to your app with the absolute minimum amount of effort and code. This NETTUTS tutorial gives you a nice overview of how to implement it with an AJAX login form.
  • How-To Setup a Linux Server for Ruby on Rails
    This excellent and in-depth guide shows you how to set up a Linux server to host your Rails apps. It uses Capistrano, Phusion Passenger and GitHub to manage deployment in a way that, after you’ve done it once, you’ll wonder why you ever did it any other way.

Podcasts / Screencasts

  • Ruby on Rails Screencasts
    With over 200 screencasts, Railscasts is a fantastic resource for new tools, best practices, tips, and more.
  • Peepcode Rails from Scratch Screencast
    At just $9, this is fantastic value. The screencast shows you how to create a complete Rails app, touching all of the important areas. Even though there are plenty of free tutorials and information available, the money spent on this is well worth it for the complete picture it gives of how to use Rails. After I had decided for a while that Rails was too difficult, this screencast got me right back into it and over the bump.
  • The Ruby Show
    If you spend a lot of time on public transport, The Ruby Show is a fantastic podcast to have with you. It goes over the latest Ruby news in a humorous and insightful way. A great laugh, and a great resource.

That’s it for now, but these resources should be enough to get you going without completely overloading you with information.

Jul 10 / oneofthesedays

Ruby, Day 12: Unit Testing

Before we get into unit testing in Ruby, I should apologise for the inconsistent posting of content. Amid other work commitments, writing 1 post per day for 30 days is proving to be quite a challenge. I imagine at the end of this I will be able to write a post on “How Not to Write a Blog Post Series”. (Here’s a tip now: start writing at least a week before you start publishing.)

We have a reasonable HTTP server working for us now. It can handle multiple simultaneous requests, log important information to a file, and handle query parameters in the requested URL. While we can see that it works by running it, it would be good to have a more thorough set of tests that we can run often and easily to ensure that future changes do not break it. We will be using Ruby’s unit testing framework, Test::Unit, to achieve this.

Writing unit tests is a bit like using source control: once you’ve set it up and are using it regularly, your level of sanity is slightly higher than if you weren’t. Similarly, once you’ve written some good unit tests, they will always be there to check that your code still runs correctly. One example of where a good set of unit tests would be handy is a series of text manipulation operations. On a UNIX machine you may simply check for ‘\n’ to find newlines, but if the same code is run on a Windows machine, where lines end in ‘\r\n’, it may fail. By running the tests on each new machine, issues like this become apparent with very little extra effort.

As the synchronised buffer is a crucial component of the multi-threaded web server, we will create a set of unit tests specifically to test this class.

A unit test class at its most basic looks like this:

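require 'test/unit'
# The tests below also use SynchronisedBuffer, so require the file that defines it here too.

class SyncBufferTest < Test::Unit::TestCase
	def test1
		assert(true)
	end
end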

We are simply extending the Test::Unit::TestCase class and then defining our test in the method test1. There is one convention to follow when defining new tests, and it is a simple one: each method name must begin with ‘test’. You can have whatever you like afterwards, but with ‘test’ at the front, Test::Unit will know to treat the method as a test case. The following are all valid test case names:

test1
test_my_test
test_if_exception_thrown
test123ABC

In order to examine more of the Test::Unit API, we will need to make a quick addition to the SynchronisedBuffer class. Add the following line to the beginning of the initialize method (i.e. before @capacity = capacity):

raise "Capacity must be greater than 0" unless capacity > 0

This does exactly what it says: it raises an exception with the message “Capacity must be greater than 0” unless the capacity is greater than 0. This stops anyone from creating a new buffer with a capacity of 0 or -1, which simply does not make sense. We can then write test cases to check that this actually happens:

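def test_zero_capacity
    begin
        SynchronisedBuffer.new(0)
        # Only reached if no exception was raised, so the test must fail here.
        flunk("Expected an exception for a capacity of 0")
    rescue RuntimeError
        assert_equal("Capacity must be greater than 0", $!.message)
    end
end

def test_negative_capacity
    begin
        SynchronisedBuffer.new(-1)
        flunk("Expected an exception for a capacity of -1")
    rescue RuntimeError
        assert_equal("Capacity must be greater than 0", $!.message)
    end
end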

‘test_zero_capacity’ and ‘test_negative_capacity’ test the line we just added. An exception should be raised when the supplied capacity is less than or equal to 0. By using the begin/rescue error-handling syntax, we can catch that exception. In the rescue section, we call ‘assert_equal’, a method provided by Test::Unit::TestCase, to ensure that 2 values are equal. By convention, the expected value comes before the actual value, so we are checking whether “Capacity must be greater than 0” is equal to the message contained in the raised exception. ‘$!’ is a global variable that contains the most recently raised exception, and its message method returns the message the exception carries.

If we now run this file, we should see output like the following:

[Screenshot: test cases passing. Caption: “Need more dots”]

A dot, ‘.’, represents a test that has passed. We can also see the results as a sentence: ‘2 tests, 2 assertions, 0 failures, 0 errors’. We have 2 tests that asserted 1 statement each, and neither failed nor raised an error. If we removed the begin/rescue block from each test and simply had ‘SynchronisedBuffer.new(0)’, an ‘E’ for error would appear instead of a ‘.’, as the raised exception would not be rescued. Alternatively, if we had not added the line that raises the exception to SynchronisedBuffer, we would want to see an ‘F’ in place of a ‘.’, as no exception is raised. That only happens if the test fails explicitly when the constructor returns normally (for example by calling flunk straight after it); otherwise no assertion runs at all and the test passes silently. (This idea of writing tests that fail before the functionality has actually been implemented is known as Test Driven Development.)

The trick to unit testing is finding a set of test cases that don’t overlap but don’t miss anything out, either. We have tested a capacity of 0, which is neither positive nor negative; it is in a class of its own, so we must include this test. -1, however, is a negative number just like -2 or -9,235,123, and it belongs to a class with infinitely many other negative numbers. It would be impossible, and senseless, to test every negative number, as we know the condition only requires the number to be greater than 0. -1 is therefore sufficient to show that the check works for any negative number.
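The same partitioning can be tried outside the test framework. In this minimal sketch, `check_capacity` is a hypothetical stand-in for the line added to `SynchronisedBuffer#initialize`, and the `REPRESENTATIVES` hash picks one value from each class: negative, zero, and, for contrast, positive.

```ruby
# Hypothetical stand-in for the guard added to SynchronisedBuffer#initialize.
def check_capacity(capacity)
  raise "Capacity must be greater than 0" unless capacity > 0
  capacity
end

# One representative value per equivalence class is enough to cover the rule.
REPRESENTATIVES = { negative: -1, zero: 0, positive: 1 }

# Returns a hash mapping each class name to :accepted or the error message.
def classify_representatives
  REPRESENTATIVES.map do |name, value|
    begin
      check_capacity(value)
      [name, :accepted]
    rescue RuntimeError => e
      [name, e.message]
    end
  end.to_h
end

p classify_representatives
```

Adding more negative values to the hash would not exercise any new behaviour, since every value in that class takes the same path through the check.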

Let’s now test the empty? and full? methods.

def test_empty?
    buffer = SynchronisedBuffer.new(1)
    assert(buffer.empty?)
end

def test_full
    buffer = SynchronisedBuffer.new(1)
    buffer.put("item 1")
    assert(buffer.full?)
end

Here we are creating a new buffer with a valid capacity and then calling the ‘assert’ method, which simply checks that the expression passed to it is true. Without putting in any objects, empty? should return true. With a buffer of capacity 1, after putting in 1 item, full? should return true.

def test_not_full
    buffer = SynchronisedBuffer.new(2)
    buffer.put("item 1")
    assert(!buffer.full?)
end

def test_not_empty
    buffer = SynchronisedBuffer.new(2)
    buffer.put("item 1")
    assert(!buffer.empty?)
end

These tests check that full? does not return true when it shouldn’t, and likewise with empty?.

We now have some simple test cases in place, but they only cover a limited part of the functionality. What we really want to test are the put and get methods. As these methods can put the calling thread to sleep, we have to be careful when testing them. The scenarios we must test are:

  • calling put while the buffer is empty
  • calling put while the buffer is full
  • calling get while the buffer is empty
  • calling get while the buffer is full
  • calling get while a thread is waiting on put
  • calling put while a thread is waiting on get

and we can translate them into the following test case names:

def test_put_while_empty
end

def test_put_while_full
end

def test_get_while_empty
end  

def test_get_while_full
end

def test_get_waking_up_sleeping_thread
end

def test_put_waking_up_sleeping_thread
end

Our first test is rather simple

def test_put_while_empty
    buffer = SynchronisedBuffer.new(1)
    buffer.put("item 1")
    assert_equal('run', Thread.current.status)
end

We create a new buffer with a capacity of 1. After putting in 1 item, the buffer should be full. The buffer was empty when we called put, so the current thread should not have been put to sleep waiting for space. Thus we assert that the status of the current thread is ‘run’. This idea of checking a thread’s status will be very important for the rest of the test cases.

def test_put_while_full
    buffer = SynchronisedBuffer.new(1)
    buffer.put("item 1")
    thread = Thread.new(buffer) { |buf|
        buf.put("item 2")   # the buffer is full, so this thread should sleep
    }
    Thread.pass  # let 'thread' run so that it reaches put and sleeps
    assert_equal('sleep', thread.status, 'Thread should be asleep waiting to put')
    thread.kill  # clean up (Thread#kill! only existed in Ruby 1.8)
end

As before, we create a new buffer and put an item in it so that it becomes full. Any subsequent call to put will now cause the calling thread to sleep. We don’t want to put our main thread to sleep, so we create a new thread and pass in the buffer. By calling ‘put’ in this thread, we expect the thread to go to sleep. The line that comes after,

Thread.pass

is important because, without it, assert_equal might be called before the new thread has had a chance to actually call ‘put’. In that case the thread would not yet be asleep, so its status would be ‘run’, and the test would fail. ‘Thread.pass’ says to the Ruby thread scheduler “I’ve had enough, let another thread run for a while”. This gives the new thread a chance to call buffer.put, and then we can test whether its status is ‘sleep’. Lastly, while perhaps not strictly necessary, we kill the new thread so that it does not linger.
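Thread.pass only asks the scheduler to run another thread; it does not guarantee that the new thread has reached `put` by the time the assertion runs, so on a busy machine these tests can occasionally fail. One more forgiving pattern, sketched below, is to poll the thread’s status for a short time. `wait_for_status` is a hypothetical helper (not part of Test::Unit), the 1 second timeout is an arbitrary choice, and the example uses a bare Monitor condition variable rather than SynchronisedBuffer so that it runs on its own.

```ruby
require 'monitor'

# Hypothetical helper: poll until +thread+ reports +status+ or +timeout+
# seconds pass, yielding to other threads between checks.
def wait_for_status(thread, status, timeout = 1.0)
  deadline = Time.now + timeout
  until thread.status == status || Time.now > deadline
    Thread.pass
    sleep 0.001
  end
  thread.status
end

lock  = Monitor.new
cond  = lock.new_cond
ready = false

# This thread blocks on the condition variable, much like a thread calling
# put on a full SynchronisedBuffer would.
waiter = Thread.new { lock.synchronize { cond.wait_until { ready } } }

observed = wait_for_status(waiter, 'sleep')

# Let the waiter finish, then wait for it to exit.
lock.synchronize do
  ready = true
  cond.signal
end
waiter.join

puts "waiter status while blocked: #{observed}"
```

In a test, the Thread.pass line could be swapped for a call like `wait_for_status(thread, 'sleep')` before the assertion.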

Having written this test case, the subsequent ones will follow much the same pattern.

def test_get_while_empty
    buffer = SynchronisedBuffer.new(1)
    thread = Thread.new(buffer) { |buf|
        buf.get   # nothing to get, so this thread should sleep
    }
    Thread.pass # run 'thread' to ensure it sleeps
    assert_equal('sleep', thread.status, 'Thread should be asleep waiting to get')
    thread.kill
end

def test_get_while_full
    buffer = SynchronisedBuffer.new(1)
    buffer.put("item 1")
    buffer.get
    assert(buffer.empty?)
end

def test_get_waking_up_sleeping_thread
    buffer = SynchronisedBuffer.new(1)
    buffer.put("item 1")
    thread = Thread.new(buffer) { |buf|
        buf.put("item 2")   # sleeps until get makes room
    }
    Thread.pass
    buffer.get
    thread.join   # wait for the woken thread to finish its put
    assert(!thread.alive?)
end

This last test case has a minor addition, ‘thread.join’. Here we are testing what happens when a thread tries to put an item into a full buffer. That call makes the thread sleep until someone else calls ‘get’. Once get is called, we want to ensure that the waiting thread actually finishes calling ‘buffer.put’, so we use the join method, which pauses the current thread until the joined thread has finished executing. If thread.alive? returns false, we know that the thread has died. For it to have died it must have finished executing, and for that to have happened it must have been woken up. Thus, if the assertion is true, calling ‘get’ successfully woke up our thread. (If get failed to wake the thread, join would wait forever; passing a timeout, as in thread.join(1), returns nil instead of hanging.) The same reasoning applies to calling put while another thread is waiting to get from an empty buffer.

def test_put_waking_up_sleeping_thread
    buffer = SynchronisedBuffer.new(1)
    thread = Thread.new(buffer) { |buf|
        buf.get   # sleeps until put supplies an item
    }
    Thread.pass
    buffer.put("item 1")
    thread.join
    assert(!thread.alive?)
end

The results of all our test cases running should look something like the following:

[Screenshot: many passed test cases. Caption: “You will learn to love this green”]

With all these test cases in place, we can be fairly confident that our implementation is correct and that any future bugs will be caught. Note that each test case makes only 1 assertion. While this is not strictly necessary, it is good practice: if a test case fails, there is no doubt as to which assertion failed. If we placed 10 assertions into a test case called ‘test_buffer’, it would be much more difficult to find the exact cause of a failure, as ‘test_buffer’ failing tells us nothing about what was tested. If a test case makes multiple assertions, either some of them are redundant or the test case should be split into several separate cases.

Hopefully these examples give you an understanding of how to use the Ruby Unit Testing Framework, as well as how to test synchronised and multi-threaded classes. As always, please leave a comment or send me an email with any feedback, criticisms, questions or comments.

Jul 5 / oneofthesedays

Ruby, Day 11: Zlib and Gzip

While high-speed internet is gradually working its way into most homes, it’s not absolutely everywhere, and even where it is, your ISP may be limiting speeds at certain times of the day or for certain types of traffic. To make things easier on users, we can have our web server compress the data it sends. Instead of sending a 200 KB JavaScript file, the server can first compress it to ~50 KB, send it over TCP, and let the web browser decompress it. The amount of data sent is 1/4 of the original size, so the total transfer time is reduced significantly (the time taken to compress and decompress is considered negligible).
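To get a feel for the savings, the snippet below round-trips a made-up, deliberately repetitive string through Zlib’s deflate functions. It is only a sketch: repetitive text compresses far better than a typical JavaScript file, so the ratio it prints will not carry over to real files. (Zlib::Deflate produces the zlib wrapper rather than the gzip one, but both wrap the same DEFLATE-compressed data.)

```ruby
require 'zlib'

# Made-up stand-in for a JavaScript file; real files are less repetitive.
original = "function greet(name) { return 'Hello, ' + name; }\n" * 4000

compressed = Zlib::Deflate.deflate(original, Zlib::BEST_COMPRESSION)
restored   = Zlib::Inflate.inflate(compressed)

puts "original:   #{original.bytesize} bytes"
puts "compressed: #{compressed.bytesize} bytes"
puts "round trip ok: #{restored == original}"
```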

Unfortunately, my attempts to wrangle Ruby’s Zlib compression library have not been successful. The documentation is patchy, at best, with comments such as “???”, “TODO: better comments”, and the occasional snippet of Japanese. Given today’s time constraints, I’m shipping an example that I feel should work, but does not. I will work on resolving the issue, and update this post when it works.

socket.puts "HTTP/1.1 200 OK\n"
socket.puts "Connection: close\n"
socket.puts "Content-Type: application/gzip\n"

File.open(file, 'r') { |f|
    gz = Zlib::GzipWriter.new(socket, Zlib::BEST_COMPRESSION, Zlib::FINISH)
    while (line = f.gets)
        gz << line
    end
    gz.close
}

I have modified the header to tell the browser we are sending compressed data, and created a new GzipWriter object inside the response loop. The writer has been given the socket, and the contents of the requested file are written into it line by line. The documentation states that GzipWriter can take an IO object, which a socket happens to be; however, when it runs, an exception is raised saying that the IO has been closed. While possibly something simple, the answer has thus far eluded me.
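One possible explanation, offered as an untested suggestion rather than a confirmed fix: Zlib::GzipWriter#close also closes the IO it wraps, so the socket may already be closed by the time the worker uses or closes it again. GzipWriter#finish writes the gzip trailer without closing the underlying IO. Separately, the headers above never end with the blank line HTTP requires, and browsers only decompress a response transparently when it carries `Content-Encoding: gzip`; `Content-Type: application/gzip` tends to make them offer the body as a download instead. The sketch below applies those three changes. `send_gzipped_file` is a hypothetical helper name, and the `text/html` content type is an assumption; the real type depends on the file being served.

```ruby
require 'zlib'

# Sketch: send the file at +path+ to +socket+, gzip-compressed.
# +socket+ can be any writable IO (a TCPSocket in the server).
def send_gzipped_file(socket, path)
  # HTTP header lines end in CRLF, and a blank line ends the header block.
  socket.print "HTTP/1.1 200 OK\r\n"
  socket.print "Connection: close\r\n"
  socket.print "Content-Type: text/html\r\n"   # assumption: depends on the file
  socket.print "Content-Encoding: gzip\r\n"    # browser decompresses transparently
  socket.print "\r\n"

  gz = Zlib::GzipWriter.new(socket, Zlib::BEST_COMPRESSION)
  File.open(path, 'rb') do |f|
    while (chunk = f.read(16 * 1024))
      gz.write(chunk)
    end
  end
  # finish writes the gzip trailer but, unlike close, leaves the socket open,
  # so the worker can still close it itself afterwards.
  gz.finish
end
```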

If you have any suggestions or solutions, leave a comment or send a tweet/email. Stay tuned for updates.

Jul 4 / oneofthesedays

Ruby, Day 10: URI

Today we will take a brief look at Ruby’s Uniform Resource Identifier (URI) module in order to handle a wider range of requests in our web server. Currently, if we enter a URL such as ‘http://localhost:8080/test.html’ into our web browser, we will receive a request that looks like ‘GET /test.html HTTP/1.1’. If we want to send any extra parameters such as an id (http://localhost:8080/test.html?id=123), our server will naively assume that the file we are requesting is named ‘test.html?id=123’, and so it will not be able to find it.

By treating the request as an HTTP URI, we can interpret the request more intelligently, separating it into the filename and the query parameters.

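We can modify our worker class like so:

require 'uri'

file = request.split(' ')[1]   # e.g. "/test.html?id=123"
file = '.' + file              # look for the file relative to the current directory

uri = URI.split(file)
file = uri[5]                  # path, e.g. "./test.html"
query = uri[7]                 # query string, e.g. "id=123" (nil if absent)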

The ‘split’ method takes a URI and splits it into 9 components, explained below using the URL ‘http://www.google.com/index.html?user=123’ as an example:

Scheme - http
Userinfo - nil
Host - www.google.com
Port - nil (no port is given in the URL; URI.parse would report the default, 80)
Registry - nil
Path - /index.html
Opaque - nil
Query - user=123
Fragment - nil

(Userinfo, registry, opaque and fragment are not likely to be used for an HTTP request so we can ignore them.)

The two components we are interested in are path and query, at indices 5 and 7 respectively. By passing the middle part of our GET request to the split method, we can grab the 2 elements we need: ‘uri[5]’ contains the file we want, so that will be passed to the File class, and ‘uri[7]’ contains the query parameters, which can be dealt with later on.
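When we do get around to the query parameters, the standard library can already turn a query string into key/value pairs with URI.decode_www_form (available since Ruby 1.9.2). The sketch below wraps the split above in a hypothetical `parse_request_line` helper that returns the local file path and a hash of parameters; it is one possible shape for this, not the server’s actual code.

```ruby
require 'uri'

# Hypothetical helper: split an HTTP request line into a local file path
# and a hash of query parameters.
def parse_request_line(request)
  target = request.split(' ')[1]            # e.g. "/test.html?id=123"
  parts  = URI.split(target)
  path   = '.' + parts[5]                   # element 5: the path
  query  = parts[7]                         # element 7: query string or nil
  params = query ? URI.decode_www_form(query).to_h : {}
  [path, params]
end

path, params = parse_request_line("GET /test.html?id=123&name=bob HTTP/1.1")
puts path            # ./test.html
puts params["id"]    # 123
```

Note that decode_www_form also undoes percent-encoding, so a value such as `name=J%20Smith` comes back as "J Smith".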

Jul 2 / oneofthesedays

Ruby, Day 9: Logging

With most applications, it is important to keep an accurate and comprehensive record of any problems or significant events that occur while they run. This typically involves writing that information to a log file stored on the computer. Ruby’s Logger class lets us take care of this easily.

After requiring ‘logger’, we can create a new logger like so:

log = Logger.new('server.log')
log.level = Logger::DEBUG

We want to save all messages to the file ‘server.log’. Alternatively, we could have passed STDOUT (the constant, not the string ‘STDOUT’) to send all logging information to standard output. The second line sets the minimum severity of the messages we want to log, which can be any of:

FATAL:	an unhandleable error that results in a program crash
ERROR:	a handleable error condition
WARN:	a warning
INFO:	generic (useful) information about system operation
DEBUG:	low-level information for developers

(Descriptions taken from the Ruby API documentation.)

Setting log.level means that any message at that level or above will be logged, and anything below it will be ignored:

log.level = Logger::WARN

This would result in FATAL, ERROR and WARN messages being logged, but INFO and DEBUG being ignored. In a production environment you may only want to log fatal errors, while in development mode you would most likely want to see DEBUG information.

To add a message to the log we can simply call a method that has the name of the level we are logging to.

log.fatal "Program will now crash"
log.error "An exception occurred, but we're onto it"
log.warn "Memory running low"
log.info "1000 customers have now signed up"
log.debug "The program is on line 25"
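To see the level filtering in action without touching server.log, the sketch below points a Logger at a StringIO (an in-memory IO, used here only so that the output is easy to inspect) and sets the level to WARN. Only the ERROR and WARN messages should end up in the output.

```ruby
require 'logger'
require 'stringio'

# Log into memory instead of server.log so the result is easy to inspect.
output = StringIO.new
log = Logger.new(output)
log.level = Logger::WARN

log.error "An exception occurred, but we're onto it"
log.warn  "Memory running low"
log.info  "1000 customers have now signed up"  # below WARN, so discarded
log.debug "The program is on line 25"          # below WARN, so discarded

puts output.string
```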

Let’s modify our multi-threaded server to include some logging.

require 'socket'
require 'logger'

log = Logger.new('server.log')
log.level = Logger::DEBUG

server = TCPServer.new('127.0.0.1', '8080')
log.info "Server started on 127.0.0.1:8080"

buffer = SynchronisedBuffer.new(100)

workers = []

for i in (1..40)
   workers[i] = Worker.new(buffer)
   workers[i][:name] = 'worker' + i.to_s
   log.info "Worker #{workers[i][:name]} created"
end

while socket = server.accept
    log.info "New connection from #{socket.peeraddr[2]}"
    buffer.put(socket)
end

In each case we are logging useful but non-critical information, so it is logged at the INFO level. We can also add logging to the worker class; for brevity’s sake, that file has been attached rather than reproduced here.

If we run the web server with the changes made above, we will see entries in our server.log file like the following:

# Logfile created on Fri Jul 02 19:57:47 +1200 2010 by logger.rb/22285
...
I, [2010-07-02T19:57:54.231591 #14068]  INFO -- : New connection from localhost
I, [2010-07-02T19:57:54.232061 #14068]  INFO -- : worker1 has received a new socket

As each message has a precise timestamp, the log can be useful for tracking down bugs and performance issues. While ‘puts’ may be a simpler and quicker way of printing this kind of information to standard output, the extra information and structure that Logger provides make it well worth using.

Jul 2 / oneofthesedays

Ruby, Day 8: Multi-Threading, Synchronisation, and Buffers

Now that you’ve seen the HTTP protocol in action, we can move on to a more advanced version of the server. We will turn it into a multi-threaded server, allowing it to handle multiple requests simultaneously. Previously, once we had accepted a socket, we had to finish processing it before we could accept another one. This meant that if another person tried to connect while the server was processing an earlier request, they would have to wait. By sending each request to a separate thread, more than one can be processed at a time, resulting in a more responsive server for everyone.

The approach we will take is to use a so-called circular buffer. It is essentially a ring with a front and a back: items go in at the back and come out at the front. As we add items, we move the back along to the next slot, and when we take items out, we move the front along as well. When the front is at the same position as the back, we know the buffer is either completely full or completely empty. Lastly, the buffer has a fixed capacity, so when the front or the back moves past the last slot (position capacity - 1), it wraps back around to 0.
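The wrap-around arithmetic is easy to try without any threads. The class below is a hypothetical, single-threaded `RingBuffer` written only for illustration: it raises instead of waiting when full or empty, and it keeps a count of stored items rather than checking for nil slots (the approach SynchronisedBuffer takes), which also means it can store nil like any other value.

```ruby
# Hypothetical single-threaded ring buffer showing the wrap-around indices.
class RingBuffer
  def initialize(capacity)
    @elements = Array.new(capacity)
    @capacity = capacity
    @front = 0   # index of the next item to take out
    @back  = 0   # index of the next free slot
    @count = 0   # number of items currently stored
  end

  def empty?
    @count == 0
  end

  def full?
    @count == @capacity
  end

  def put(item)
    raise "buffer is full" if full?
    @elements[@back] = item
    @back = (@back + 1) % @capacity   # wraps from capacity - 1 back to 0
    @count += 1
  end

  def get
    raise "buffer is empty" if empty?
    item = @elements[@front]
    @elements[@front] = nil
    @front = (@front + 1) % @capacity
    @count -= 1
    item
  end
end

ring = RingBuffer.new(3)
ring.put(:a); ring.put(:b); ring.put(:c)  # back has wrapped around to 0
first = ring.get                          # :a, front moves to 1
ring.put(:d)                              # reuses slot 0
rest = [ring.get, ring.get, ring.get]     # [:b, :c, :d]
p first, rest
```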

When we accept a connection, the returned socket will be placed into the buffer. A group of threads will wait on the buffer for new sockets, and when one arrives, a thread will take it and process it. This is useful because it completely separates accepting connections from handling requests and generating responses. We will call these threads ‘Worker’ threads, and the collection of them a thread pool. Thus, a new request will be placed into the buffer and processed by a worker from the thread pool.

There is a slight caveat when working with threads that share a data structure (here, every thread has access to the same circular buffer). A thread can be interrupted by the scheduler at any time, so a number of problems could ensue. For example, say we have a method called ‘put’, which takes a given object, stores it, and then updates the count of stored objects. If the thread is interrupted between storing the object and updating the count, the data structure is no longer consistent with itself: if the buffer was empty before ‘put’ was called, it will still say it is empty even though one object has been added. What’s worse, these problems are wholly unpredictable, so we must find a way to ensure they can’t occur. We need a way of guaranteeing that once ‘put’ is called, no one else will access the buffer until the method has finished.

Ruby provides a class called ‘Monitor’ that lets us achieve this by creating ‘synchronised’ blocks of code. A synchronised block can only be executed by one thread at a time: a thread acquires the monitor’s lock on entering the block, and no other thread can enter until the lock is released. Let’s go over the logic we will need to implement for our buffer.

We need to put items in, and get items out.
The buffer will have a fixed size, and when it is full we should not be able to put anything more in.
When it is empty, we should not be able to get anything out.
If we want to get something while it is empty, it should pause until something has been put in, and then return it.
If we want to put in something while it is full, it should pause until something has been removed.

Let’s look at the implementation of this, then go over it.

require 'monitor'

class SynchronisedBuffer < Monitor

   def initialize(capacity)
      @capacity = capacity
      @front = 0                     # index of the next element to get
      @back = 0                      # index of the next free slot to put into
      @elements = Array.new(capacity)
      @empty_cond = new_cond         # get waits on this while the buffer is empty
      @full_cond = new_cond          # put waits on this while the buffer is full
      super()
   end

   def get
      synchronize do
         @empty_cond.wait_while { empty? }
         element = @elements[@front]
         @elements[@front] = nil
         @front = (@front + 1) % @capacity
         @full_cond.signal           # there is now room for a waiting put
         return element
      end
   end

   def put(element)
      synchronize do
         @full_cond.wait_while { full? }
         @elements[@back] = element
         @back = (@back + 1) % @capacity
         @empty_cond.signal          # there is now an element for a waiting get
      end
   end

   def full?
      synchronize do
         (@front == @back and @elements[@front] != nil)
      end
   end

   def empty?
      synchronize do
         (@front == @back and @elements[@front] == nil)
      end
   end

end

We are extending the Monitor class, which gives us access to ‘synchronize’ and the wait/signal methods. Next, the buffer is initialised:

def initialize(capacity)
   @capacity = capacity
   @front = 0
   @back = 0
   @elements = Array.new(capacity)
   @empty_cond = new_cond
   @full_cond = new_cond
   super()
end

After setting the capacity, front position and back position, and creating a new array to hold our elements, we come to our first use of the Monitor class. ‘new_cond’ is a method that returns a condition variable tied to the monitor. A condition variable lets a thread inside a synchronised block go to sleep, temporarily releasing the lock, until another thread signals that the condition it is waiting on may have changed. We have a condition that elements can’t be removed while the buffer is empty (empty_cond), and a condition that new elements can’t be added while the buffer is full (full_cond).

def get
    synchronize do
        @empty_cond.wait_while {empty?}
        element = @elements[@front]
        @elements[@front] = nil
        @front = (@front + 1) % @capacity
        @full_cond.signal
        return element
    end
end

To have a section of code that is synchronised, we call the ‘synchronize’ method and put the code inside a do-end block; only one thread at a time can be inside a synchronised block on this buffer. Once inside, we call the ‘wait_while’ method on the empty condition. We pass it a block that calls ‘empty?’, so the line can be read as ‘make any thread that calls this method wait as long as the buffer is empty’. Alternatively, we could write ‘wait_until { !empty? }’, which would read ‘make any thread that calls this method wait until the buffer is not empty’.
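To see the ‘wait_until’ form in action, here is a minimal one-slot mailbox. The Mailbox class is hypothetical and is not part of the web server; ‘take’ uses ‘wait_until { !empty? }’ where the buffer uses ‘wait_while {empty?}’, and the two behave the same way.

```ruby
require 'monitor'

# Hypothetical one-slot mailbox. To keep the sketch short, post does not
# wait for the slot to be free; it simply overwrites it.
class Mailbox < Monitor
  def initialize
    super()
    @item = nil
    @has_item = new_cond
  end

  def empty?
    synchronize { @item.nil? }
  end

  def post(item)
    synchronize do
      @item = item
      @has_item.signal            # wake a thread sleeping in take
    end
  end

  def take
    synchronize do
      # Same effect as: @has_item.wait_while { empty? }
      @has_item.wait_until { !empty? }
      item = @item
      @item = nil
      item
    end
  end
end

box = Mailbox.new
consumer = Thread.new { box.take } # sleeps until something is posted
sleep 0.05                         # let the consumer start waiting (not required for correctness)
box.post('hello')
puts consumer.value                # => hello
```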

When the buffer is empty, any thread that calls get will be put to sleep. If the buffer isn’t empty, then the element at the front is removed, and the front position is moved along by 1, wrapping back round to 0 when it reaches the end of the array (that is what ‘% @capacity’ does). The line ‘@full_cond.signal’ will wake up a thread that was put to sleep waiting for the full buffer to have some space. We have just removed an item, so there is now space to put in a new item, and we can wake up, or ‘signal’, a sleeping thread.

def put(element)
    synchronize do
        @full_cond.wait_while {full?}
        @elements[@back] = element
        @back = (@back + 1) % @capacity
        @empty_cond.signal
    end
end

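‘put’ is the opposite of get. Threads must wait if the buffer is full, and when there is space, an element is placed at the back of the buffer. Once this is done, ‘@empty_cond.signal’ is called, and a thread that was put to sleep waiting for items to be placed in the buffer is woken up. ‘signal’ wakes only one waiting thread at a time, and Ruby’s documentation describes it as the first thread in line, so waiting threads are served roughly first come, first served. A woken thread still has to re-acquire the monitor and check its condition again before it can continue, though.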
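A quick way to gain some confidence in the buffer is a small stress test: two producers and one consumer share a buffer that is deliberately too small, so both kinds of waiting happen. The class is repeated so the block runs on its own (with super() first and the explicit return dropped). The test itself, including the numbers chosen, is only an illustration.

```ruby
require 'monitor'

# The buffer from above, repeated so this block is self-contained.
class SynchronisedBuffer < Monitor
  def initialize(capacity)
    super()
    @capacity = capacity
    @front = 0
    @back = 0
    @elements = Array.new(capacity)
    @empty_cond = new_cond
    @full_cond = new_cond
  end

  def get
    synchronize do
      @empty_cond.wait_while { empty? }
      element = @elements[@front]
      @elements[@front] = nil
      @front = (@front + 1) % @capacity
      @full_cond.signal
      element
    end
  end

  def put(element)
    synchronize do
      @full_cond.wait_while { full? }
      @elements[@back] = element
      @back = (@back + 1) % @capacity
      @empty_cond.signal
    end
  end

  def full?
    synchronize { @front == @back && !@elements[@front].nil? }
  end

  def empty?
    synchronize { @front == @back && @elements[@front].nil? }
  end
end

buffer = SynchronisedBuffer.new(3) # small, so producers regularly block on full

producers = (0...2).map do |p|
  Thread.new do
    10.times { |i| buffer.put(p * 100 + i) } # producer 0: 0..9, producer 1: 100..109
  end
end

consumer = Thread.new { Array.new(20) { buffer.get } }

producers.each(&:join)
received = consumer.value
puts received.sort == (0...10).to_a + (100...110).to_a # => true
```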

The final two methods, which we have already seen used above:

def full?
  synchronize do
    (@front == @back and @elements[@front] != nil)
  end
end

def empty?
  synchronize do
    (@front == @back and @elements[@front] == nil)
  end
end

If the front is in the same position as the back, we know that the buffer is either empty or full. It is empty if the element at this position is nil, and full if the element at this position is not nil. These methods must also be synchronised, as otherwise another thread could change the buffer in between the two comparisons and the method could return the wrong answer. Because a Monitor is reentrant, it is safe for get and put to call them while already inside their own synchronize blocks.
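Using nil as the marker has one side effect: nil itself can’t be stored safely, because a nil sitting in the front slot when front and back meet makes a full buffer look empty. A common alternative is to keep a count of elements instead. The hypothetical RingBuffer below is a single-threaded sketch of that idea with no locking; a threaded version would still need the synchronize and condition-variable calls shown above.

```ruby
# Hypothetical single-threaded ring buffer that tracks its size in @count,
# so "empty" and "full" no longer depend on nil markers.
class RingBuffer
  def initialize(capacity)
    @capacity = capacity
    @elements = Array.new(capacity)
    @front = 0
    @back = 0
    @count = 0
  end

  def empty?
    @count.zero?
  end

  def full?
    @count == @capacity
  end

  def put(element)
    raise 'buffer full' if full?
    @elements[@back] = element
    @back = (@back + 1) % @capacity
    @count += 1
  end

  def get
    raise 'buffer empty' if empty?
    element = @elements[@front]
    @elements[@front] = nil
    @front = (@front + 1) % @capacity
    @count -= 1
    element
  end
end

ring = RingBuffer.new(2)
ring.put(nil)      # nil is now an ordinary element
ring.put(:b)
p ring.full?       # => true   (front == back == 0, count == 2)
p ring.get         # => nil
p ring.get         # => :b
p ring.empty?      # => true   (front == back == 0, count == 0)
```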

Sincere thanks must go to Robert Klemme from the Ruby Forum, who helped me work out the bugs in this implementation, and to Craig Taverner, whose blog introduced me to the Monitor class.

So now we have a synchronised buffer which we can fill with incoming requests. What we need next is a thread pool full of workers who are ready to process the contents of the buffer. The class below should look familiar from the basic web server, with just a few additions.

require 'thread'

class Worker < Thread

    def initialize(buffer)
        # Thread.new hands buffer on to the block run by the new thread
        super(buffer) { |buffer|
            begin
                loop do
                    # Sleeps here until the accept loop puts a socket in the buffer
                    socket = buffer.get

                    request = socket.readline

                    validGET = request.match(/GET .* HTTP\/1\.1/)

                    unless (validGET)
                        socket.print "HTTP/1.1 400 Bad Request\r\n\r\n"
                        socket.close
                        next
                    end

                    file = request.split(' ')[1]
                    file = '.' + file

                    unless ( File.exist?(file) )
                        socket.print "HTTP/1.1 404 File Not Found\r\n\r\n"
                        socket.close
                        next
                    end

                    # Status line and headers end in \r\n; an empty line ends the headers
                    socket.print "HTTP/1.1 200 OK\r\n"
                    socket.print "Connection: close\r\n"
                    socket.print "Content-Type: text/html\r\n"
                    socket.print "\r\n"
                    File.open(file, 'r') { |f|
                        while (line = f.gets)
                            socket.puts line
                        end
                    }

                    socket.close
                end
            rescue Exception => e
                # Report the error; note that this also ends the worker's loop
                $stderr.puts $!.inspect
            end
        }
    end
end

The main differences are at the top and the bottom:

class Worker < Thread

    def initialize(buffer)
        super(buffer) { |buffer|
            begin
                loop do
                    socket = buffer.get
                    ...
                end
            rescue Exception => e
                $stderr.puts $!.inspect
            end
        }
    end
end

Extending the Thread class means we can treat Worker objects exactly as if they were threads. To do this, however, we also need to implement the initialize method. When creating a regular thread, the instructions to execute are passed to it as a block:

Thread.new {
    # do something
}

Therefore, all we need to do is call ‘super’, as this will call the initialize method on the parent class, Thread. The block placed after this call is what the new thread will run. We want the worker to use our buffer, so it is passed to ‘super’ as an argument, and Thread hands it on to the block as the block’s parameter.
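Here is the same pattern in miniature, using two hypothetical Thread subclasses. Doubler forwards its argument through ‘super’ to the block, and the thread’s result is read back with Thread#value. Forgetful shows what happens when a subclass never calls ‘super’: Ruby refuses to create the thread and raises a ThreadError.

```ruby
# Hypothetical subclass: the argument given to super is passed on to the
# block that the new thread runs.
class Doubler < Thread
  def initialize(number)
    super(number) { |n| n * 2 }
  end
end

p Doubler.new(21).value # => 42

# Hypothetical subclass that forgets to call super.
class Forgetful < Thread
  def initialize; end
end

forgot_super = begin
  Forgetful.new
  false
rescue ThreadError
  true
end

p forgot_super # => true
```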

As each thread runs outside of the main execution thread, an exception raised inside it does not reach the main program. It simply ends that thread, which can make bug finding difficult. (Older versions of Ruby did this silently; since Ruby 2.5 a message is printed by default, but the thread still dies.) If we wrap the code to be executed in Ruby’s equivalent of a try/catch block, begin/rescue, then we can catch any exceptions raised and send them to standard error. ‘$stderr.puts $!.inspect’ is a neat shortcut that uses the $! global variable. This contains the exception currently being handled, so we are calling inspect on it and sending the result to stderr.
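Because the begin/rescue in Worker wraps the whole loop, the first exception ends that worker for good. For example, a client that disconnects before sending a line makes readline raise EOFError. One variation, sketched below with a plain Queue and hypothetical jobs instead of sockets, is to rescue inside the loop, so a failing job is reported and skipped while the worker keeps going. The :stop value is a made-up shutdown signal for the example.

```ruby
require 'thread'

jobs = Queue.new
results = Queue.new

# Hypothetical worker: begin/rescue sits inside the loop, so one bad job
# is reported and skipped instead of ending the thread.
worker = Thread.new do
  loop do
    job = jobs.pop
    break if job == :stop
    begin
      results << Integer(job) * 10   # Integer('oops') raises ArgumentError
    rescue StandardError => e
      $stderr.puts e.inspect
    end
  end
end

['1', 'oops', '3', :stop].each { |job| jobs << job }
worker.join

collected = []
collected << results.pop until results.empty?
p collected # => [10, 30]
```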

Also, instead of executing the processing code once, we are looping infinitely. We want to continually check the buffer for new sockets with ‘buffer.get’. When the buffer is empty, ‘buffer.get’ will put the worker to sleep, and it will be woken up when a new socket is placed into the buffer.

Last but not least, we need to create a number of workers, and set up our buffer.

require 'socket'

server = TCPServer.new('127.0.0.1', '8080')

buffer = SynchronisedBuffer.new(100)

workers = []

(1..40).each do |i|
   worker = Worker.new(buffer)
   worker[:name] = 'worker' + i.to_s
   workers << worker
end

while socket = server.accept
    buffer.put(socket)
end

We are creating our buffer with a capacity of 100. This means that up to 100 requests can be queued before the accept loop has to start waiting for the workers to take some out of the buffer. We are also creating 40 workers, and naming them so we can tell them apart. Lastly, we have our familiar ‘while socket = server.accept’ loop, but instead of doing any processing, we simply put each socket into the buffer.
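For comparison, Ruby’s standard library ships a bounded, thread-safe queue, SizedQueue, that behaves like the SynchronisedBuffer: push blocks while it is full and pop blocks while it is empty. The sketch below shows the same pool shape with numbers as jobs instead of sockets. The :stop marker used to shut the workers down is an assumption made for the example.

```ruby
require 'thread'

queue = SizedQueue.new(100)   # bounded like SynchronisedBuffer.new(100)
results = Queue.new

workers = (1..4).map do |i|
  Thread.new do
    Thread.current[:name] = "worker#{i}"
    # pop sleeps while the queue is empty; :stop is a hypothetical shutdown marker
    while (job = queue.pop) != :stop
      results << job * job
    end
  end
end

(1..10).each { |n| queue << n }     # would block if the queue were full
workers.size.times { queue << :stop }
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
p squares.sort # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

The trade-off is that SizedQueue hides the condition-variable mechanics this post is about, but it is a reasonable default when you just need a bounded work queue.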

All things going well, your web server should now be capable of handling many simultaneous requests. Leave a comment, email me, send me a tweet or a message on Facebook with any questions, criticisms or comments.

Jun 29 / oneofthesedays

Ruby, Day 7: HTTP Protocol

We have created a simple server that responds with whatever text it received when we connect to it via telnet. Its uses are limited, so now we will begin implementing the HyperText Transfer Protocol (HTTP) to enable us to show a web page. Because HTTP is a standard and universally adopted protocol, a web browser knows that if it uses HTTP to talk to a web server, they will both understand each other. HTTP defines a set of request methods, the actions a client can ask a web server to carry out. These are:

  • OPTIONS – Returns information about the options available from the server, or for a specified resource. ‘OPTIONS *’ asks what the entire server is capable of, whereas ‘OPTIONS /filename’ asks what it can do with that particular file.
  • GET – Returns the contents of the specified file or directory. ‘GET /index.html’ will return the file index.html. The GET method is what we will be implementing.
  • HEAD – Works the same as GET, except that the contents of the file or directory are not returned. Instead, only the meta-information (the headers) is returned, such as the file type or when it was last modified.
  • POST – Sends data in the body of the request to the specified resource, which decides what to do with it. Submitting a form containing foo=bar to index.php, for example, sends ‘foo=bar’ in the request body, and index.php can then read the value ‘bar’ under the name ‘foo’.
  • PUT – PUT is the opposite of GET: it sends a file along with a location, and asks the server to store that file at that location.
  • DELETE – You can ask for a specified file to be removed with ‘DELETE /filename’, however there is no guarantee that the server will actually do this.
  • TRACE – Echoes the received request back to the client, so it can see what (if anything) was added or changed by the servers the request passed through on the way, such as a proxy server.
  • CONNECT – Not defined in detail by HTTP 1.1; the name is reserved for use with a proxy that can switch to acting as a tunnel, for example to carry an encrypted SSL connection.

(Paraphrased and adapted from W3C)

For a basic web server we will only need to implement the GET method, as that will at least allow us to send content to the browser. We’ll also want to send back a few status codes, which tell the browser the outcome of its request. The most prevalent example is 404 – File Not Found (the standard wording is ‘Not Found’, but browsers act on the number rather than the text). Equally important is 200 – OK, which is returned when the server can successfully carry out the request. Along with these 2, we will send back 400 – Bad Request if anything other than a well-formed GET request is sent. The entire server code is below, and each step is explained in detail afterwards.
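Every response, whatever its status code, has the same shape: a status line, some headers, an empty line, then the body, with each line ending in a carriage return and line feed. The hypothetical helper below builds such a response as a string. It also adds a Content-Length header, which the server in this post doesn’t send, so that the client knows how many bytes of body to expect.

```ruby
# Hypothetical helper: builds a complete HTTP/1.1 response string.
# Lines are joined with CRLF, and the '' element produces the empty line
# that separates the headers from the body.
def http_response(code, reason, body = '', content_type = 'text/html')
  [
    "HTTP/1.1 #{code} #{reason}",
    "Content-Type: #{content_type}",
    "Content-Length: #{body.bytesize}",
    'Connection: close',
    '',
    body
  ].join("\r\n")
end

p http_response(404, 'Not Found', 'No such file')
# => "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\nContent-Length: 12\r\nConnection: close\r\n\r\nNo such file"
```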

require 'socket'

server = TCPServer.new('127.0.0.1', '8080')

while socket = server.accept
    request = socket.readline

    validGET = request.match(/GET .* HTTP\/1\.1/)

    unless (validGET)
        socket.print "HTTP/1.1 400 Bad Request\r\n\r\n"
        socket.close
        next
    end

    file = request.split(' ')[1]
    file = '.' + file

    unless ( File.exist?(file) )
        socket.print "HTTP/1.1 404 File Not Found\r\n\r\n"
        socket.close
        next
    end

    # Status line and headers each end in \r\n; an empty line ends the headers
    socket.print "HTTP/1.1 200 OK\r\n"
    socket.print "Connection: close\r\n"
    socket.print "Content-Type: text/html\r\n"
    socket.print "\r\n"
    File.open(file, 'r') { |f|
        while (line = f.gets)
            socket.puts line
        end
    }

    socket.close
end

To test the server, create an HTML file called test.html in the same directory as the Ruby file containing the above code. Run the program from that directory, and point your browser to ‘http://127.0.0.1:8080/test.html’. You should see the contents of the HTML file rendered by the browser.

The first 3 lines should look familiar.

server = TCPServer.new('127.0.0.1', '8080')

while socket = server.accept
    request = socket.readline

We’re setting the program to listen on TCP port 8080, and once a connection is made, we’re storing the first line the client sends (the request line, such as ‘GET /test.html HTTP/1.1’) in the ‘request’ variable.

After this we’re checking if the request is a valid HTTP 1.1 GET request.

validGET = request.match(/GET .* HTTP\/1\.1/)

We are using the regular expression ‘/GET .* HTTP\/1\.1/’ to check if the request fits the format ‘GET file HTTP/1.1’. ‘.*’ matches any sequence of characters, of any length. While this isn’t perfect (the pattern isn’t anchored, so it also accepts a line with extra text before ‘GET’ or after ‘HTTP/1.1’), it should suffice for now.
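If you want a tighter check, one option is to anchor the pattern to the whole line and capture the path at the same time. The pattern and helper name below are assumptions for illustration, not what the server above uses.

```ruby
# Hypothetical stricter request-line check: \A and \z pin the match to the
# whole line, and (\S+) captures the path so split isn't needed afterwards.
REQUEST_LINE = /\AGET (\S+) HTTP\/1\.1\r?\n?\z/

def requested_path(line)
  match = REQUEST_LINE.match(line)
  match && match[1]   # nil when the line doesn't match
end

p requested_path("GET /test.html HTTP/1.1\r\n")     # => "/test.html"
p requested_path("POST /form HTTP/1.1\r\n")         # => nil
p requested_path("xGET / HTTP/1.1 trailing\r\n")    # => nil
```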

unless (validGET)
    socket.print "HTTP/1.1 400 Bad Request\r\n\r\n"
    socket.close
    next
end

‘unless’ functions identically to ‘if not’, so the code inside will only be executed if validGET is nil (or false). ‘match’ returns nil when the request string does not match the regular expression. Hence, if the request doesn’t match, we send back ‘HTTP/1.1 400 Bad Request’, followed by an empty line to end the (empty) list of headers, and close the socket. ‘next’ stops Ruby from executing the remaining contents of the loop and forces it to start the loop again, waiting for the next connection.

file = request.split(' ')[1]
file = '.' + file

Here we are simply finding the name of the file the client asked the server to load. The GET request line consists of 3 parts, with a space in between each: ‘GET’ ‘filename’ ‘HTTP/1.1’. Thus, the second element in the array returned by split will be the filename. We prefix the filename with a ‘.’ so that the resulting path is relative to the current directory. If they request index.html, the request will come through as /index.html, which on its own would point to the root of the file system. By prepending ‘.’ the file becomes ./index.html, which is relative to the current directory.

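unless ( File.exist?(file) )
    socket.print "HTTP/1.1 404 File Not Found\r\n\r\n"
    socket.close
    next
end

Just as we checked for a correct GET request, we need to check that the requested file can actually be sent. File.exist? will return true if the given path exists on the computer (note that this includes directories). If not, we send back the famous 404 error. Older code often spells this File.exists?, with an ‘s’; that spelling was deprecated and has been removed as of Ruby 3.2.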
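Two gaps are left open by this check. Prefixing ‘.’ does not stop a request such as ‘/../secret.txt’ from climbing out of the current directory, and File.exist? is also true for a directory, which can’t be read like a file (reading one raises an error such as Errno::EISDIR on Linux). The hypothetical resolve helper below sketches one way to close both gaps. It expands the path against a document root, rejects anything outside that root, and accepts only regular files.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the full path of a regular file inside root,
# or nil for anything outside the root, a directory, or a missing file.
def resolve(root, request_path)
  root = File.expand_path(root)
  path = File.expand_path('.' + request_path, root)
  inside = path.start_with?(root + File::SEPARATOR)
  (inside && File.file?(path)) ? path : nil
end

Dir.mktmpdir do |dir|
  docs = File.join(dir, 'docs')
  Dir.mkdir(docs)
  File.write(File.join(docs, 'test.html'), '<p>hi</p>')
  File.write(File.join(dir, 'secret.txt'), 'private')

  p resolve(docs, '/test.html').nil?   # => false
  p resolve(docs, '/../secret.txt')    # => nil (outside the root)
  p resolve(docs, '/')                 # => nil (a directory)
end
```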

socket.print "HTTP/1.1 200 OK\r\n"
socket.print "Connection: close\r\n"
socket.print "Content-Type: text/html\r\n"
socket.print "\r\n"

Having got to this point in the code, we know we have received a correctly formed request for a file that we are able to send back. What we send next is defined by the HTTP protocol. The status line, ‘HTTP/1.1 200 OK’, tells the client that the request can indeed be processed. It is followed by headers, one per line: ‘Connection: close’ tells the client that we will close the connection once this response has been sent, and ‘Content-Type: text/html’ tells it to expect an HTML file. Each line ends with a carriage return and line feed (‘\r\n’), and an empty line marks the end of the headers; everything after it is the body of the response. If we want our web server to handle images and other file types, we will have to add logic to send back the correct content type. Today we will just stick with text/html.
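As a starting point for that logic, the content type can be chosen from the file extension. The table and helper name below are hypothetical and deliberately short; a real server would need many more entries. Unknown extensions fall back to the generic binary type ‘application/octet-stream’.

```ruby
# Hypothetical extension-to-type table.
CONTENT_TYPES = {
  '.html' => 'text/html',
  '.css'  => 'text/css',
  '.png'  => 'image/png',
  '.jpg'  => 'image/jpeg'
}.freeze

def content_type_for(path)
  CONTENT_TYPES.fetch(File.extname(path).downcase, 'application/octet-stream')
end

p content_type_for('./index.html') # => "text/html"
p content_type_for('./logo.PNG')   # => "image/png"
p content_type_for('./notes')      # => "application/octet-stream"
```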

File.open(file, 'r') { |f|
    while (line = f.gets)
        socket.puts line
    end
}

Finally, we can send the contents of the HTML file to the client. We simply open the requested file and loop over each line in it, sending each line back through the socket.
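Reading line by line works for HTML, but files such as images aren’t made of lines. Ruby’s IO.copy_stream copies raw bytes from a file (given by name) to any object with a write method, such as a socket. In the sketch below a StringIO stands in for the socket so the example runs without a network connection; the file name is just an example.

```ruby
require 'stringio'
require 'tmpdir'

Dir.mktmpdir do |dir|
  path = File.join(dir, 'page.html')
  File.write(path, "<h1>Hello</h1>\n<p>World</p>\n")

  fake_socket = StringIO.new                  # stands in for the client socket
  copied = IO.copy_stream(path, fake_socket)  # returns the number of bytes copied

  p copied              # => 28
  p fake_socket.string  # => "<h1>Hello</h1>\n<p>World</p>\n"
end
```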

There you have it. A web server that handles HTTP 1.1 GET requests to successfully send back an HTML file, while also handling malformed requests, and requests for non-existent files. Granted, there is a large amount of functionality that we didn’t implement, including some important error checking, but hopefully it highlights the overall mechanisms of HTTP and web servers.

Jun 28 / oneofthesedays

Ruby, Day 6: Sockets and TCP

To kick off this web-server series we’ll create a server which, at its simplest, consists of just 3 lines of Ruby code. It is called an echo server, and it does just that: it sends back to you whatever you send to it. Find a large, empty room, yell out “Ruby is awesome” and you’ll hear it repeated (this also works in a small room filled with Ruby developers). If you’re not fond of going outside (or socialising) however, you can use an echo server to achieve the same effect. Send the text “Ruby is awesome” to this echo server, and it will reply with the same message.

This simple example expresses a concept that is fundamental to how any server works. Servers listen for requests, do something, and (sometimes) send something back. A web server listens for requests to load a particular page, and in response, it sends back the contents of that page. A mail server listens for requests to download mail messages, and in response it sends the latest emails. As most servers follow this basic pattern, standards have been developed and adopted that describe the format requests and responses should take. Warning: You are now entering the world of acronyms.

TCP, UDP, HTTP, SMTP, NTP, FTP, ARP, SSH, IMAP, DNS, DHCP, IRC, to name but a few, are all protocols that describe requests and responses. Today we will be using the Transmission Control Protocol (TCP) to implement our simple echo server. To describe TCP, we can look at its simpler cousin: the User Datagram Protocol (UDP). UDP specifies how data is sent over a network in self-contained chunks called datagrams, each labelled with the port it is going to. Each datagram travels inside an Internet Protocol (IP) packet, which carries the IP address of its destination, so network hardware will (all things going well) be able to deliver it to the correct machine. The packet also contains the IP address of the sending machine, so that the receiver knows where to send its response. UDP promises nothing more than that: packets can be lost, duplicated or arrive out of order. TCP is also carried by IP, but it does a few extra things to ensure that the data sent does actually arrive correctly. This includes setting up a connection, detecting errors and retransmitting lost packets, putting packets back in order, and ensuring that the sender isn’t sending more data than the receiver can handle at once.
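For contrast with the TCP server that follows, here is a sketch of a single-message echo over UDP on the local machine, using Ruby’s UDPSocket. There is no connection to accept. Each datagram is simply sent to an address and port, and recvfrom reports who sent it so the reply can be addressed. Port 0 asks the operating system for any free port, which is an assumption made so the example doesn’t depend on a particular port being free.

```ruby
require 'socket'

server = UDPSocket.new
server.bind('127.0.0.1', 0)   # port 0: let the OS pick a free port
port = server.addr[1]

client = UDPSocket.new
client.send('Ruby is awesome', 0, '127.0.0.1', port)

# sender looks like ["AF_INET", port, host, ip]
message, sender = server.recvfrom(1024)
server.send(message, 0, sender[3], sender[1])

reply, _ = client.recvfrom(1024)
puts reply # => Ruby is awesome

client.close
server.close
```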

If we want to create an ‘echo’ server, we will need to listen for connections made to us using TCP. When a message arrives, we need to read it and send it back again. Here’s the server:

require 'socket'

server = TCPServer.new('127.0.0.1', '8080')
socket = server.accept
socket.puts socket.readline

Firstly, we’re creating a new TCPServer object that will listen for requests at the IP address 127.0.0.1 and on port 8080. 127.0.0.1 is an IP address reserved to represent the local machine. If the operating system is asked to send a packet to this address, it realises that this is actually the same machine, so it is a good way of testing without requiring 2 computers. Port 8080 is a common port for alternative web servers, which we will be creating eventually, so we may as well use it now.

As the network card receives packets for the entire operating system, and not for specific programs (mail client, chat client, etc.), a program must associate itself with a port number. The operating system can then look at the port number in a received packet and determine which program it should go to. Packets for port 21 go to the program listening on that port, usually an FTP server, and packets for port 110 usually go to a POP3 mail server. We want packets for port 8080 to be sent to our Ruby program. To avoid confusion, only one program can listen on a given port at a time, so if you receive an error about port 8080 being in use (Errno::EADDRINUSE), another program is probably already attached to it.
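You can provoke that error deliberately. The sketch below opens one listener on a free port chosen by the operating system (port 0), then tries to open a second listener on the same port. On Linux and macOS the second attempt should fail with Errno::EADDRINUSE; other platforms may behave differently.

```ruby
require 'socket'

first = TCPServer.new('127.0.0.1', 0)   # port 0: the OS picks a free port
port = first.addr[1]

in_use = begin
  second = TCPServer.new('127.0.0.1', port)
  second.close
  false
rescue Errno::EADDRINUSE
  true
ensure
  first.close
end

p in_use
```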

When you run this program, nothing will happen. The program will keep running, and nothing will be displayed on the screen. Why doesn’t the program end? The answer is in the second line, server.accept. accept is a method which pauses the program until a client connects to the TCPServer. As soon as one does, it returns a TCPSocket object representing our end of the new connection between us and them, a tube of the internet, if you will allow the analogy. Whatever they send down this tube, we will eventually receive, and vice versa. Therefore, to create our echo, we come to the third line.

socket.puts socket.readline

We read a line from the socket (readline waits until a complete line has arrived), which will be the message they sent, and we use ‘puts’ to send it straight back. To see this in action, we can use telnet to connect to our running server and send it a message.
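If you don’t have telnet handy, a Ruby TCPSocket can play the client. The sketch below runs the three-line echo server in a background thread and connects to it from the same program. It listens on port 0, so the operating system picks a free port instead of 8080, which is an assumption made so the example can run anywhere.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# The echo server from above, in a background thread
echo = Thread.new do
  socket = server.accept
  socket.puts socket.readline
  socket.close
end

client = TCPSocket.new('127.0.0.1', port)
client.puts 'Ruby is awesome'
reply = client.gets
puts reply # => Ruby is awesome

client.close
echo.join
server.close
```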

telnet 127.0.0.1 8080

The screencast below will show it working.

A Simple TCP Echo Server in Ruby from Sam Dalton on Vimeo.

Voila! Hopefully you managed to get it working, and understood why it works as well. While this is a server in just 3 lines, it’s not particularly useful. Once someone connects, the program ends and the server closes. To make it run indefinitely we will need some kind of infinite loop.

while socket = server.accept
    socket.puts socket.readline
    socket.close
end

In Ruby, every expression returns a value, even variable assignment: the value of an assignment is whatever value was assigned. So in this case, when server.accept detects a connection and returns a socket object, the while loop uses that object as its condition. In Ruby only nil and false are treated as false, so a socket object counts as true and the loop is entered. When the loop body ends, server.accept is called once more, and the process repeats indefinitely. The program will never end because server.accept will always wait until it gets a connection, and hence will always return a non-nil value to the while loop condition. Just as with files, we also have to close the socket. We didn’t do this in the first example because the program exited straight afterwards, which closes any open sockets anyway.
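The same assignment-as-condition idiom works with anything that eventually returns nil. In the sketch below, Array#shift returns nil once the array is empty, which ends the loop. server.accept, by contrast, never returns nil; it just keeps waiting. The array of ‘requests’ is made up for the example.

```ruby
# The value of `request = pending.shift` is whatever shift returned, so the
# loop runs once per element and stops when shift returns nil.
pending = ['first', 'second']
handled = []

while request = pending.shift
  handled << request.upcase
end

p handled # => ["FIRST", "SECOND"]
```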

So there you have it, a 3 (or 4) line echo server that uses the TCP protocol.

Jun 27 / oneofthesedays

Ruby, Day 5: Building a Web Server

Looking through the upcoming topics that I plan to cover, I noticed a connection: they all cover important aspects of a web server. Threads let us handle multiple requests simultaneously, queues let us store pending requests and a mutex ensures that it won’t become corrupted from multiple threads accessing it. We also need a way of logging requests, interpreting URLs, and certainly, we’d like to test it to be sure it all works. So instead of covering the remaining topics in my proposed order, I will instead create sections of the web server that each focus on a different part of the Ruby API. Not only will this provide a more hands-on and relevant example, but it will also serve as a basic tutorial on how the HTTP protocol and web servers work. The resulting server will by no means be any competition to existing Ruby servers such as Mongrel or Thin, but it will serve us well for the purposes of teaching.

First up will be sockets, as without them we have no hope of communicating with anyone.