Ruby HTTP server from the ground up
Getting something to work quickly is important when you are starting out, but if you want to become better at programming, it's important to understand a few levels below the abstractions you are used to working with.
When it comes to Web development it's important to know how HTTP works, and what better way to do that than go through baptism by fire and build our own HTTP server.
How does HTTP look anyway?
HTTP is a plaintext protocol implemented over TCP, so we can easily inspect what requests look like (HTTP/2 is actually no longer plaintext; it's binary for efficiency).
One way to look at the request structure is to use curl with the -v (verbose) flag:
curl http://example.com/something -H "x-some-header: value" -v
Outputs
GET /something HTTP/1.1
Host: example.com
User-Agent: curl/7.64.1
Accept: */*
x-some-header: value
And in response we get
HTTP/1.1 404 Not Found
Age: 442736
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 03 Jul 2021 15:02:03 GMT
Expires: Sat, 10 Jul 2021 15:02:03 GMT
...
Content-Length: 1256

<!doctype html>
<html>
<head>
...
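To see that this really is just text travelling over TCP, here is a minimal sketch that avoids curl entirely. It assumes a throwaway local server on an OS-assigned port (port 0) and writes a request by hand; the variable names are illustrative only.

```ruby
require 'socket'

# Throwaway server on an OS-assigned port, so the example needs no network access
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# Accept one connection in the background and capture the raw request text
received = Thread.new do
  conn = server.accept
  lines = []
  # read lines until the empty line that ends the header section
  while (line = conn.gets) && line != "\r\n"
    lines << line.chomp
  end
  conn.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
  conn.close
  lines
end

# Act as the client: the request is nothing more than lines of text
client = TCPSocket.new('127.0.0.1', port)
client.write("GET /something HTTP/1.1\r\nHost: example.com\r\nx-some-header: value\r\n\r\n")
response = client.read # read until the server closes the connection
client.close
server.close

request_lines = received.value
puts request_lines
puts response
```

Note that every line ends with "\r\n" on the wire, and an empty line separates the headers from the body in both directions.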
The plan
Let's define the steps we are going to need:
- Listen on a local socket for incoming TCP connections
- Read incoming request's data (text)
- Parse the text of the request to extract method, path, query, headers and body from it
- Send the request to our app and get a response
- Send the response to the remote socket via the connection
- Close the connection
With that in mind, let's set up the general structure of our program:
require 'socket'

class SingleThreadedServer
  # ENV values are strings, so normalize the port to an integer
  PORT = ENV.fetch('PORT', 3000).to_i
  HOST = ENV.fetch('HOST', '127.0.0.1').freeze
  # number of incoming connections to keep in a buffer
  SOCKET_READ_BACKLOG = ENV.fetch('TCP_BACKLOG', 12).to_i

  attr_accessor :app

  # app: a Rack app
  def initialize(app)
    self.app = app
  end

  def start
    socket = listen_on_socket
    loop do # continuously listen to new connections
      conn, _addr_info = socket.accept
      request = RequestParser.call(conn)
      status, headers, body = app.call(request)
      HttpResponder.call(conn, status, headers, body)
    rescue => e
      puts e.message
    ensure # always close the connection
      conn&.close
    end
  end
end

SingleThreadedServer.new(SomeRackApp.new).start
Listening on a socket
A "full" version of the listen_on_socket implementation looks like this:
def listen_on_socket
  socket = Socket.new(:INET, :STREAM)
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
  socket.bind(Addrinfo.tcp(HOST, PORT))
  socket.listen(SOCKET_READ_BACKLOG)
  socket # return the socket itself, not the result of #listen
end
However, there's a lot of boilerplate here, and all of this code can be replaced with:
def listen_on_socket
  socket = TCPServer.new(HOST, PORT)
  socket.listen(SOCKET_READ_BACKLOG)
  socket # return the socket so the caller can accept connections on it
end
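The two variants are not perfectly interchangeable in what accept returns, which is worth seeing once. The sketch below is an illustration under stated assumptions: it binds both kinds of socket to an OS-assigned port (port 0) instead of HOST and PORT, and the variable names are made up for the example.

```ruby
require 'socket'

# Low-level variant: create, configure, bind and listen by hand
low_level = Socket.new(:INET, :STREAM)
low_level.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
low_level.bind(Addrinfo.tcp('127.0.0.1', 0))
low_level.listen(12)
low_port = low_level.local_address.ip_port

# High-level variant: TCPServer does all of the above in one call
high_level = TCPServer.new('127.0.0.1', 0)
high_port = high_level.addr[1]

# Connect one client to each so that accept returns immediately
client_a = TCPSocket.new('127.0.0.1', low_port)
client_b = TCPSocket.new('127.0.0.1', high_port)

low_result = low_level.accept   # an Array: [Socket, Addrinfo]
high_result = high_level.accept # a single TCPSocket

puts low_result.map(&:class).inspect
puts high_result.class

# Destructuring works for both: with TCPServer the second variable is just nil
conn, addr_info = high_result
puts addr_info.inspect

[client_a, client_b, low_result.first, high_result, low_level, high_level].each(&:close)
```

This is why a line such as `conn, _addr_info = socket.accept` behaves sensibly with either kind of listening socket.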
Parsing a request
Before we start, let's define what the end result should look like. We want our server to be Rack compatible. Here's an example I found of what Rack expects in its environment as a part of the request:
{"GATEWAY_INTERFACE"=>"CGI/1.1", "PATH_INFO"=>"/", "QUERY_STRING"=>"", "REMOTE_ADDR"=>"127.0.0.1", "REMOTE_HOST"=>"localhost", "REQUEST_METHOD"=>"GET", "REQUEST_URI"=>"http://localhost:9292/", "SCRIPT_NAME"=>"", "SERVER_NAME"=>"localhost", "SERVER_PORT"=>"9292", "SERVER_PROTOCOL"=>"HTTP/1.1", "SERVER_SOFTWARE"=>"WEBrick/1.3.1 (Ruby/2.2.1/2015-02-26)", "HTTP_HOST"=>"localhost:9292", "HTTP_ACCEPT_LANGUAGE"=>"en-US,en;q=0.8,de;q=0.6", "HTTP_CACHE_CONTROL"=>"max-age=0", "HTTP_ACCEPT_ENCODING"=>"gzip", "HTTP_ACCEPT"=>"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8", "HTTP_USER_AGENT"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36", "rack.version"=>[1, 3], "rack.url_scheme"=>"http", "HTTP_VERSION"=>"HTTP/1.1", "REQUEST_PATH"=>"/"}
We are not going to return all of these params, but let's at least return the most important ones.
The first thing we need to do is parse the request line; its structure probably looks familiar to you:
# the HTTP standard itself sets no hard limit; 2083 is a common practical cap
MAX_URI_LENGTH = 2083

def read_request_line(conn)
  # e.g. "POST /some-path?query HTTP/1.1"
  # read until we encounter a newline, max length is MAX_URI_LENGTH
  request_line = conn.gets("\n", MAX_URI_LENGTH)
  method, full_path, _http_version = request_line.strip.split(' ', 3)
  path, query = full_path.split('?', 2)
  [method, full_path, path, query]
end
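A quick way to try this out without a real connection is to feed it a StringIO, which responds to gets just like a socket does. The following is a standalone copy of the method with the length limit inlined as a default argument (an adaptation made only to keep the sketch self-contained).

```ruby
require 'stringio'

# Standalone copy of read_request_line; max_length stands in for MAX_URI_LENGTH
def read_request_line(io, max_length = 2083)
  request_line = io.gets("\n", max_length)
  method, full_path, _http_version = request_line.strip.split(' ', 3)
  path, query = full_path.split('?', 2)
  [method, full_path, path, query]
end

with_query = read_request_line(StringIO.new("POST /some-path?a=1&b=2 HTTP/1.1\r\n"))
without_query = read_request_line(StringIO.new("GET /index.html HTTP/1.1\r\n"))

p with_query    # ["POST", "/some-path?a=1&b=2", "/some-path", "a=1&b=2"]
p without_query # ["GET", "/index.html", "/index.html", nil]
```

Note that the query is nil when the path contains no "?", which is something the caller has to account for.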
After the request line come the headers.
Let's recall what they look like; each header is on a separate line:
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Content-Length: 1256
MAX_HEADER_LENGTH = (112 * 1024) # how it's defined in WEBrick, Puma and other servers

def read_headers(conn)
  headers = {}
  loop do
    line = conn.gets("\n", MAX_HEADER_LENGTH)&.strip
    # an empty line (or a closed connection) marks the end of the headers
    break if line.nil? || line.empty?
    # header name and value are separated by a colon and optional whitespace
    key, value = line.split(/:\s*/, 2)
    headers[key] = value
  end
  headers
end
As a result we get:
{
  "Cache-Control" => "max-age=604800",
  "Content-Type" => "text/html; charset=UTF-8",
  "Content-Length" => "1256"
}
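The header loop can be exercised the same way, with a StringIO standing in for the connection. This is a standalone sketch; it splits on a colon followed by optional whitespace and takes the length limit as a default argument, both chosen here to keep the example self-contained.

```ruby
require 'stringio'

# Standalone copy of the header-reading loop; max_length mirrors the 112 KB limit
def read_headers(io, max_length = 112 * 1024)
  headers = {}
  loop do
    line = io.gets("\n", max_length)&.strip
    # an empty line (or end of input) ends the header section
    break if line.nil? || line.empty?
    # limit of 2 keeps colons inside the value intact, e.g. "localhost:9292"
    key, value = line.split(/:\s*/, 2)
    headers[key] = value
  end
  headers
end

raw = "Host: localhost:9292\r\nX-Compact:no-space\r\nContent-Length: 5\r\n\r\nhello"
io = StringIO.new(raw)
headers = read_headers(io)
rest = io.read # whatever follows the blank line is the body

p headers
p rest
```

After the loop finishes, the stream is positioned right at the start of the body, which is what makes reading the body the natural next step.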
Next we need to read the body. Not all requests are expected to have one; in this simplified version we only read it for POST and PUT:
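def read_body(conn:, method:, headers:)
  return nil unless ['POST', 'PUT'].include?(method)
  # header names are case-insensitive, so look the key up without regard to case
  _name, content_length = headers.find { |name, _| name.casecmp?('Content-Length') }
  remaining_size = content_length.to_i
  conn.read(remaining_size)
end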
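Two details are easy to miss here: header names are case-insensitive, and read(n) consumes exactly n bytes and leaves the rest of the stream alone. The sketch below illustrates both with a StringIO; header_value is a hypothetical helper introduced only for this example, and read_body is simplified to take the stream and headers positionally.

```ruby
require 'stringio'

# Hypothetical helper: look a header up without regard to the case of its name
def header_value(headers, name)
  _key, value = headers.find { |key, _| key.casecmp?(name) }
  value
end

# Simplified body reader for the demo: trusts Content-Length (0 when absent)
def read_body(io, headers)
  length = header_value(headers, 'Content-Length').to_i
  io.read(length)
end

payload = 'name=hello'
io = StringIO.new(payload + 'EXTRA')

# the client used a lower-case header name, which is perfectly legal
headers = { 'content-length' => payload.bytesize.to_s }

body = read_body(io, headers)
leftover = io.read

p body     # "name=hello"
p leftover # "EXTRA"
```

A missing Content-Length turns into 0 via nil.to_i, so the reader returns an empty string rather than blocking while waiting for data that may never arrive.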
Having all the blocks from above we can finish our simplified implementation:
require 'stringio'
require 'uri'

class RequestParser
  class << self
    def call(conn)
      method, full_path, path, query = read_request_line(conn)
      headers = read_headers(conn)
      body = read_body(conn: conn, method: method, headers: headers)
      # read information about the remote connection
      peeraddr = conn.peeraddr
      remote_host = peeraddr[2]
      remote_address = peeraddr[3]
      # our port
      port = conn.addr[1]
      {
        'REQUEST_METHOD' => method,
        'PATH_INFO' => path,
        # Rack expects a string here, even when there is no query
        'QUERY_STRING' => query.to_s,
        # rack.input needs to be an IO stream
        "rack.input" => body ? StringIO.new(body) : nil,
        "REMOTE_ADDR" => remote_address,
        "REMOTE_HOST" => remote_host,
        "REQUEST_URI" => make_request_uri(
          full_path: full_path,
          port: port,
          remote_host: remote_host
        )
      }.merge(rack_headers(headers))
    end

    # ... (methods we implemented above)

    def rack_headers(headers)
      # rack expects all headers to be prefixed with HTTP_,
      # upper cased and with dashes replaced by underscores
      headers.transform_keys do |key|
        "HTTP_#{key.upcase.tr('-', '_')}"
      end
    end

    def make_request_uri(full_path:, port:, remote_host:)
      request_uri = URI::parse(full_path)
      request_uri.scheme = 'http'
      request_uri.host = remote_host
      request_uri.port = port
      request_uri.to_s
    end
  end
end
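The header naming deserves a closer look, since it follows CGI conventions as the sample environment earlier shows (HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT): upper case, underscores instead of dashes, and an HTTP_ prefix. The Rack specification additionally keeps Content-Type and Content-Length without the prefix. Here is a sketch of that mapping; to_rack_headers is a hypothetical name used only for this illustration.

```ruby
# Hypothetical helper illustrating the CGI-style names Rack expects
def to_rack_headers(headers)
  headers.each_with_object({}) do |(name, value), env|
    key = name.upcase.tr('-', '_')
    # Content-Type and Content-Length are the two headers without the HTTP_ prefix
    key = "HTTP_#{key}" unless %w[CONTENT_TYPE CONTENT_LENGTH].include?(key)
    env[key] = value
  end
end

env = to_rack_headers({
  'Accept-Language' => 'en-US',
  'content-type' => 'application/json',
  'Content-Length' => '2'
})

p env
```

Because the key is upper-cased before the comparison, the mapping works regardless of how the client capitalized the header name.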
Sending a response
Let's skip the Rack app part for now (we are going to implement it later) and implement sending a response:
class HttpResponder
  STATUS_MESSAGES = {
    # ...
    200 => 'OK',
    # ...
    404 => 'Not Found',
    # ...
  }.freeze

  # status: int
  # headers: Hash
  # body: array of strings
  def self.call(conn, status, headers, body)
    # status line
    status_text = STATUS_MESSAGES[status]
    conn.send("HTTP/1.1 #{status} #{status_text}\r\n", 0)
    # headers
    # we need to tell how long the body is before sending anything,
    # this way the remote client knows when to stop reading;
    # Content-Length is counted in bytes, not characters
    content_length = body.sum(&:bytesize)
    conn.send("Content-Length: #{content_length}\r\n", 0)
    headers.each_pair do |name, value|
      conn.send("#{name}: #{value}\r\n", 0)
    end
    # tell that we don't want to keep the connection open
    conn.send("Connection: close\r\n", 0)
    # separate headers from body with an empty line
    conn.send("\r\n", 0)
    # body
    body.each do |chunk|
      conn.send(chunk, 0)
    end
  end
end
That's an example of what we can send:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 53

<html>
<head></head>
<body>hello world</body>
</html>
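To inspect the exact bytes without a socket, the same message can be assembled into a string. This is a sketch only: build_response is a hypothetical helper that mirrors the order in which the responder writes, and it takes the reason phrase as an argument instead of looking it up. It also shows why Content-Length should be computed in bytes, since the two counts differ as soon as the body contains non-ASCII characters.

```ruby
# Hypothetical helper: assemble a response in the same order the responder writes it
def build_response(status, reason, headers, body)
  out = +"HTTP/1.1 #{status} #{reason}\r\n"
  # Content-Length counts bytes, not characters
  out << "Content-Length: #{body.sum(&:bytesize)}\r\n"
  headers.each_pair { |name, value| out << "#{name}: #{value}\r\n" }
  out << "Connection: close\r\n"
  out << "\r\n" # the empty line separates headers from the body
  body.each { |chunk| out << chunk }
  out
end

body = ['héllo ', 'wörld']
response = build_response(200, 'OK', { 'Content-Type' => 'text/plain; charset=UTF-8' }, body)
head, payload = response.split("\r\n\r\n", 2)

puts head
puts payload
puts body.sum(&:length)   # 11 characters
puts body.sum(&:bytesize) # 13 bytes
```

A client that was told 11 would stop reading two bytes early and cut off the end of the body.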
Rack App
Any Rack app needs to return status, headers, body. Status is an integer, body is an array of strings (chunks).
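As a quick illustration of that contract, here is about the smallest app that satisfies it. It is a sketch only: the app and the env hash are made up for the example, and a real env would contain many more keys.

```ruby
# A Rack app is any object that responds to #call(env); a lambda qualifies
hello_app = lambda do |env|
  body = "Hello from #{env['PATH_INFO']}"
  [200, { 'Content-Type' => 'text/plain' }, [body]]
end

# A hand-made, heavily trimmed environment
env = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/greeting', 'QUERY_STRING' => '' }

status, headers, body = hello_app.call(env)

puts status          # 200
puts headers.inspect
puts body.join       # Hello from /greeting
```

Because the contract is just a method call with a hash, an app can be tested without any server or socket involved.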
With that in mind let's make an app that's going to read files from the file system based on the request path:
class FileServingApp
  # read file from the filesystem based on a path from
  # a request, e.g. "/test.txt"
  def call(env)
    # this is totally insecure, but good enough for the demo
    path = Dir.getwd + env['PATH_INFO']
    # File.file? is false for directories, so a request for "/" gets a 404
    if File.file?(path)
      body = File.read(path)
      [200, { "Content-Type" => "text/html" }, [body]]
    else
      [404, { "Content-Type" => "text/html" }, ['']]
    end
  end
end
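The comment in the app admits that joining the working directory with the request path is not safe: a path such as "/../../etc/passwd" walks right out of the served directory. One possible way to guard against that is sketched below, assuming that rejecting anything outside a root directory is the desired policy. resolve_inside is a hypothetical helper, not part of the original code, and it does not account for symlinks.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: resolve request_path inside root,
# returning nil when the result would escape root
def resolve_inside(root, request_path)
  root = File.expand_path(root)
  # expand_path collapses "." and ".." segments
  candidate = File.expand_path(File.join(root, request_path))
  inside = candidate == root || candidate.start_with?(root + File::SEPARATOR)
  inside ? candidate : nil
end

# Temporary directory standing in for the served directory
root = Dir.mktmpdir
File.write(File.join(root, 'test.txt'), 'hi')

allowed = resolve_inside(root, '/test.txt')
rejected = resolve_inside(root, '/../../etc/passwd')

puts allowed.inspect
puts rejected.inspect # nil

FileUtils.remove_entry(root)
```

An app could then answer 404 whenever the helper returns nil, and read the file otherwise.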
Final word
That was pretty simple, was it not?
Because we skipped all the corner cases!
If you want to dive into the topic in greater detail, I encourage you to jump into the WEBrick code; it's implemented in pure Ruby. You can learn more about Rack from this article.
If you want to see the full version of the code we just wrote, you can check out the GitHub repo: github.com/TheRusskiy/ruby3-http-server/blob/master/servers/single_threaded_server.rb.
Next we are going to experiment with different ways of processing requests: single threaded server, multi-threaded server and even Fibers / Ractors from Ruby 3.
Head over to part #2.