Comet Tail

Comet

Comet is a technique for HTTP content delivery that uses persistent connections between server and browser. HTTP/1.1 introduced a "chunked" transfer encoding, allowing data to be sent in "chunks" to the client over an extended period of time. Most modern web browsers will render or process each chunk of data as it is received, creating a stream of page items, executable scriptlets, or raw data.

tail -f

When invoked with the "-f" option, the Unix command tail will continuously print newly added lines from a given file to stdout. This is pretty handy when monitoring logs, debugging servers, tracking progress of a running application, etc.

"tailing" logs over HTTP using Comet

Implementing tail -f in Erlang

First off, we are going to need a mechanism for "tailing" a given file. Handily, Erlang's io module provides a function, io:get_line/2, for reading from a file line by line. Here is a simple function that reads a file from beginning to end, printing each line, and then keeps retrying at end of file, printing each newly appended line as it appears:

		
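 %% read a line from a file and print it...
 %% ... if at end of file, pause briefly and try again until a new line appears

 tail(File) ->
  case io:get_line(File, "") of
   eof ->
    timer:sleep(500),    %% pause instead of spinning the CPU while waiting
    tail(File);
   {error, Reason} ->
    {error, Reason};     %% stop on a read error rather than trying to print it
   Line ->
    io:put_chars(Line),  %% put_chars, so a "~" in the line is not read as a format directive
    tail(File)
  end.

  %% example:
  %% {ok, MyFile}=file:open("/path/to/my/file", [read]).
  %% tail(MyFile).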
		
	

A Tiny Comet Server

Erlang is a pretty handy language for writing networked, IO-driven applications, and its out-of-the-box support for TCP socket manipulation (using the OTP gen_tcp module) makes it pretty easy to hack together a tiny, specific-purpose webserver. Let's fire up erl and execute a few lines to create an HTTP server listening on a non-standard port (in this example, port 9999):


 Erlang R13B02 (erts-5.7.3) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
 Eshell V5.7.3  (abort with ^G)

 1> {ok, ListenSock}=gen_tcp:listen(9999, [list,{active, false},{packet,http}]).
 {ok,#Port<0.455>}
 2> {ok, Sock}=gen_tcp:accept(ListenSock).
	
gen_tcp:accept/1 will block until a client connects...let's use curl or wget to initiate a request:

 > curl -v http://127.0.0.1:9999
 * About to connect() to 127.0.0.1 port 9999 (#0)
 *   Trying 127.0.0.1... connected
 * Connected to 127.0.0.1 (127.0.0.1) port 9999 (#0)
 > GET / HTTP/1.1
 > User-Agent: curl/7.19.4 (universal-apple-darwin10.0) libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
 > Host: 127.0.0.1:9999
 > Accept: */*
 > 

curl will now block, waiting on a response from the server. If we switch back to our erl session, we will see that gen_tcp:accept/1 has returned a socket to which we can write data. So what should we write? Well, first, let's inform the waiting client that we will be sending data using "chunking". We will send an HTTP 200 back to the client, but instead of providing a "Content-Length" header we will send back "Transfer-Encoding: chunked". We won't close the connection, and we won't send any data right now:

 ...
 2> {ok, Sock}=gen_tcp:accept(ListenSock).	
 {ok,#Port<0.461>}
 3> gen_tcp:send(Sock, "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\nCache-Control: no-cache\r\nTransfer-Encoding: chunked\r\n\r\n").
 ok

curl will receive the headers and keep the connection open, awaiting more data:

 < HTTP/1.1 200 OK
 < Content-Type: text/html; charset=UTF-8
 < Cache-Control: no-cache
 < Transfer-Encoding: chunked
 < 

Now, let's write a small function to send data to the client. For each "chunk" of data, we need to calculate its size in bytes and send that number, hex-encoded and followed by "\r\n", ahead of the chunk data. The end of each chunk is marked by another "\r\n" (a carriage return followed by a line feed).

 4>  SendChunk=fun(Sock, Data) -> Length=io_lib:format("~.16B", [length(Data)]),
	gen_tcp:send(Sock, Length),
	gen_tcp:send(Sock, "\r\n"),
	gen_tcp:send(Sock, Data),
	gen_tcp:send(Sock, "\r\n") 
      end.
 #Fun<erl_eval.12.113037538>
 5> SendChunk(Sock, "Hello World!").
 ok
 6> SendChunk(Sock, "Chunky Bacon! Chunky Bacon!").
 ok
 7> SendChunk(Sock, "Hexapodia as the key insight").
 ok

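Flipping back over to our curl session, we should see the chunks arriving (start curl with --raw to see the hex size lines; by default curl strips the chunk framing and prints only the data):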
	
 C
 Hello World!
 1B
 Chunky Bacon! Chunky Bacon!
 1C
 Hexapodia as the key insight
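
The framing that SendChunk performs can also be written as a small module. This is only a sketch: the module name chunk_sketch and both function names are made up for illustration. It uses iolist_size/1 rather than length/1 so the size is counted in bytes for binaries and nested lists as well as flat strings (it assumes every list element is a byte, 0 to 255), and it adds the zero-length chunk that marks the end of a chunked response.

```erlang
-module(chunk_sketch).
-export([encode/1, last/0]).

%% Frame one chunk: hex-encoded size, CRLF, the data itself, CRLF.
%% The result is an iolist, so it can be handed to gen_tcp:send/2 as-is.
encode(Data) ->
    Size = integer_to_list(iolist_size(Data), 16),
    [Size, "\r\n", Data, "\r\n"].

%% A zero-length chunk tells the client that the stream is finished.
last() ->
    "0\r\n\r\n".
```

With this in place, gen_tcp:send(Sock, chunk_sketch:encode("Hello World!")) would send the size line and the data in a single call instead of four.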
			
Now let's see how a browser behaves with a chunked stream...open up Firefox or Google Chrome (don't try Safari or IE yet--we'll explain why later on in this post) and point it to http://127.0.0.1:9999/ . Switch back to our erl session so we can accept the browser's request, create a socket, and start writing chunks to the browser...

 8> {ok, BrowserSock}=gen_tcp:accept(ListenSock). 
 {ok,#Port<0.483>}
 9>  gen_tcp:send(BrowserSock, "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\nCache-Control: no-cache\r\nTransfer-Encoding: chunked\r\n\r\n").
 ok
 10> SendChunk(BrowserSock, "<head><body>").
 ok	
 11> SendChunk(BrowserSock, "<h1>One conversation centered on the ever accelerating progress of technology and changes...</h1>").	 
 ok	
 12> SendChunk(BrowserSock, "<script language='JavaScript'>alert('8.68 days since Fall of Relay');</script>").
 ok	

If you watch your browser window, you will see the page render the <h1> text as it is received, and then execute the subsequently sent scriptlet.

Putting it all together

Pretty cool, eh? Ok, let's go back and take a look at the tail function we created earlier...it's pretty obvious we can send each line to a socket instead of merely printing to standard out. Having covered the basic steps above, I won't walk through the complete solution line by line, but you can download the source here ( comet_tail.erl ) and I'll walk through a usage example below.
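
To give a rough idea of how the two pieces fit together, here is a minimal sketch that sends each tailed line to a socket as one chunk. It is not the code in comet_tail.erl; the module name tail_stream_sketch, the function names and the 500 ms polling interval are all assumptions made for illustration.

```erlang
-module(tail_stream_sketch).
-export([tail_to_socket/2, drain/2]).

%% Stream a file to a socket forever, one chunk per line.
%% Returns {error, Reason} once sending fails (e.g. the client went away).
tail_to_socket(File, Sock) ->
    Send = fun(Line) -> gen_tcp:send(Sock, chunk(Line)) end,
    loop(File, Send).

%% Send everything currently in the file, pause, then look again.
loop(File, Send) ->
    case drain(File, Send) of
        ok ->
            timer:sleep(500),
            loop(File, Send);
        {error, Reason} ->
            {error, Reason}
    end.

%% Pass every line up to the current end of file to Send, then return ok.
drain(File, Send) ->
    case io:get_line(File, "") of
        eof ->
            ok;
        {error, Reason} ->
            {error, Reason};
        Line ->
            case Send(Line) of
                ok -> drain(File, Send);
                {error, Reason} -> {error, Reason}
            end
    end.

%% Chunked framing: hex size, CRLF, data, CRLF.
chunk(Data) ->
    [integer_to_list(iolist_size(Data), 16), "\r\n", Data, "\r\n"].
```

Keeping drain/2 separate from the socket means the line-reading part can be tried out in the shell with any fun, for example one that just prints each line. The walkthrough that follows uses the downloadable comet_tail module, not this sketch.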

Let's fire up erl and compile the module, and start up a server on a given port (the first parameter of comet_tail:start/2). The second parameter is the directory prefix that will be prepended to file paths provided as request parameters:

	
 > erl -sname node1@localhost
 Erlang R13B02 (erts-5.7.3) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]
 Eshell V5.7.3  (abort with ^G) 
 (node1@localhost)1> c(comet_tail).
 {ok,comet_tail}
 (node1@localhost)2> comet_tail:start(9999, "/var/log/apache2/").

Now you have a tail -f server running on port 9999, serving up files in the /var/log/apache2/ directory (if you are on a recent version of OS X this is where your apache access logs are). File streams are requested by passing a query string consisting of the filename relative to the base dir we started the server with. (Again, for OS X you should be able to view a tailed file with the above configuration by visiting http://127.0.0.1:9999/comet-tail?access_log)
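
As an illustration of that mapping, the sketch below turns a request path such as "/comet-tail?access_log" into a file name under the base directory. It is an assumption about how this could be done, not a copy of comet_tail.erl: the module and function names are invented, and the rule that rejects names containing "/" (so a request cannot climb out of the base directory) is stricter than the server described here may be. With the {packet, http} option used earlier, gen_tcp:recv/2 delivers the request line as {http_request, Method, {abs_path, Path}, Version}, and Path is the string this function expects.

```erlang
-module(request_path_sketch).
-export([file_for/2]).

%% Map "/comet-tail?<name>" to a file inside BaseDir.
file_for(BaseDir, "/comet-tail?" ++ Name) ->
    case safe_name(Name) of
        true  -> {ok, filename:join(BaseDir, Name)};
        false -> {error, bad_name}
    end;
file_for(_BaseDir, _OtherPath) ->
    {error, not_found}.

%% Accept plain file names only: non-empty, no "/", and not "." or "..".
safe_name("")   -> false;
safe_name(".")  -> false;
safe_name("..") -> false;
safe_name(Name) -> not lists:member($/, Name).
```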

Here is a live example streaming a tail of this webserver's access log: http://www.hccp.org:9999/comet-tail?access_log You can play along at home by visiting pages on http://www.hccp.org and watching requests show up at the tail of the stream.

Various and Sundry Hacks

Scrolling

While the chunked stream will render line by line in your browser, the default window behavior is to keep the view positioned at the top of the file, even as new content loads below the fold. To provide a tail-like scrolling experience, we will need to use a bit of HTML-hackery. One way to accomplish this is to place the streaming content in an IFrame, and use JavaScript to keep scrolling the IFrame to the end of its content. Here is the script that we use to manipulate the IFrame:

	
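 function loop() {
	// scroll the log frame far enough down to reach the end of its content
	if(navigator.userAgent.indexOf('Safari') > 0) {
		window.frames['logframe'].scrollBy(0 , 10000000);
	} else  {
		document.getElementById('logframe').contentWindow.scrollBy(0, 100000);
	}
	// pass the function itself instead of a string that has to be eval'd
	setTimeout(loop, 1000);
 }
 loop();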

IFrames and the JavaScript Security Model

Another pain point is that most browsers will not allow a parent window to access/manipulate DOM properties of a child IFrame served from a different origin (this is a good thing as it prevents phishing attacks that wrap legit windows and steal passwords, debit bank accounts and other nasty stuff). In order to scroll an internal frame, both the parent and the child IFrame must be served over the same protocol, by the same host and port. Since I didn't want to have to proxy the Comet-tail server in order to provide a uniform endpoint to consumers of the parent and child IFrame, I decided to enhance the purpose-specific webserver to also serve up a static parent wrapper for the IFramed stream.

Buffering in Safari (and IE?)

Safari will not render content until a 1KB buffer has been filled. To get around this, we add a 1KB empty comment to each line we send.


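 %% generate a comment of roughly Length bytes for Safari and other lame-o browsers in order to force flush...
 %% call as generate_comment_for_length([], 1024); Length must be at least 4
 generate_comment_for_length([], Length) ->
	generate_comment_for_length("<!--", Length - 4);
 generate_comment_for_length(Comment, 0) ->
	lists:append(Comment, "-->");
 %% the guard makes a too-small Length fail fast instead of recursing forever
 generate_comment_for_length(Comment, Length) when Length > 0 ->
	generate_comment_for_length(lists:append(Comment, " "), Length - 1).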
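
The same padding can be built in one step with lists:duplicate/2, which avoids re-copying the growing string on every recursion. This is a sketch of an alternative, not the code the server uses: the module and function names are invented, and unlike the function above it produces a comment of exactly Length bytes, counting the "<!--" and "-->" markers.

```erlang
-module(pad_sketch).
-export([padding_comment/1]).

%% Build an HTML comment that is exactly Length bytes long.
%% Length must be at least 7, the combined size of "<!--" and "-->".
padding_comment(Length) when is_integer(Length), Length >= 7 ->
    "<!--" ++ lists:duplicate(Length - 7, $\s) ++ "-->".
```

For example, pad_sketch:padding_comment(1024) yields a 1024-byte comment that could be appended to each line before it is sent as a chunk.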