2009/10/21

OCaml for the impatient - part 2, reading standard input

Now that "Hello, world" is out of the way, let's look at the next step in writing our log processing script. We want to be able to read lines from standard input.
OCaml has an "input_line" function, which takes a channel as a parameter. Standard input is available without doing any extra work as the channel 'stdin'. So, to read a line of text from standard input, we just need to call;

input_line stdin

In OCaml, parameters to functions are not enclosed with braces. There are plenty of places you do need to use braces, but surrounding parameters is not one of them.
To do something useful with our line of text, we'll need to assign it to a variable. OCaml uses the "let" keyword for that, but we'll also need to declare a scope for our variable, using "in". So, the code we want is something like this;

let line = input_line stdin in
... a block of code ...


To read all the lines of text from standard input, until we run out, we'll need a loop of some kind. OCaml does allow us to write code in an imperative style, so we can just use a while loop. While loops are pretty basic in OCaml (and in functional languages in general), because you're meant to do much cleverer things with recursion.
Our loop will need to terminate when we run out of lines to read. The simplest way to do that in OCaml is to catch the "End_of_file" exception. I'm not a big fan of using exceptions for normal control flow, but we can live with it for now.
So, a simple program to read lines from standard input and echo them to standard output might look like this;

try
while true do
let line = input_line stdin in
Printf.printf "%s\n" line
done;
None
with
End_of_file -> None
;;


There are a few points to note here. The semi-colon after "done" is necessary to tell OCaml that it should evaluate everything before the semi-colon first, and then evaluate the stuff after it. Without the semi-colon, you'll get a syntax error. It needs to be ";" and not ";;" because we're not terminating a block of code.
We're using "End_of_file -> None" to discard the exception we get when "input_line" tries to read a line that isn't there. "None" is a bit like "nil" in Ruby or "undef" in Perl.
The "None" at the end of the block is required to keep the return type consistent. OCaml, like Perl or Ruby, returns whatever is the last thing evaluated in the block. OCaml requires that the try block return the same type of value as we will return if we catch an exception and end up in the with block. If you try running the code without the "None" before with, you'll get an error saying "This expression has type 'a option but is here used with type unit" (OCaml error messages are translated from French, so they're a little idiosyncratic).
The type "unit" is the empty type, like void in Java. Our with block is returning "None", so it's return type is unit, and the try block must return the same type.
If we change the with to say;

End_of_file -> "whatever"

Then the error becomes "This expression has type string but is here used with type 'a option". So, we can make it go away by replacing the earlier None with any string constant (like "hello" - try it).
The last thing we're going to do is to take our inline "Printf.printf" statement and turn it into a function call, so that we can do something more interesting with line later.
In OCaml, functions are values we can assign to variables. So, to define a function, we use the same let statement as we used to define line. Here's a function to print out our line;

let out = Printf.printf "%s\n";;

Notice that we terminated the statement without specifying what is supposed to be printed. If you type the code above into the interactive ocaml interpreter, you get this;

# let out = Printf.printf "%s\n";;
val out : string -> unit =


That's saying "the value out is a function which takes a single string and doesn't return anything". OCaml decided we were defining a function because we didn't specify all the arguments. If we had, it would have simply evaluated it and assigned the result to 'out'.
Now, we can simplify our program a little;

let out = Printf.printf "%s\n";;

try
while true do
let line = input_line stdin in
out line
done;
None
with
End_of_file -> None
;;


Try running the program like this "ls | ocaml foo.ml", or by compiling it as shown in part 1.
So far, we haven't done anything very useful overall, but we've covered reading from standard input and writing to standard output, looping over all the available input, assigning each line to a variable and calling a function with that variable.
In part 3, we'll actually do something!