2009/10/21

OCaml for the impatient - part 2, reading standard input

Now that "Hello, world" is out of the way, let's look at the next step in writing our log processing script. We want to be able to read lines from standard input.
OCaml has an "input_line" function, which takes a channel as a parameter. Standard input is available without doing any extra work as the channel 'stdin'. So, to read a line of text from standard input, we just need to call;

input_line stdin

In OCaml, arguments to functions are not enclosed in parentheses. There are plenty of places where you do need parentheses, but surrounding arguments is not one of them.
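As a rough illustration of that calling style, here is a minimal sketch. The function "add" is a made-up name for this example and is not part of the log processing script.

```ocaml
(* add is a hypothetical two-argument function, defined only for this example. *)
let add a b = a + b

(* Arguments are separated by spaces, with no parentheses or commas. *)
let five = add 2 3

(* Parentheses are only needed to group a sub-expression into one argument. *)
let nine = add (2 + 3) 4

let () = Printf.printf "%d %d\n" five nine
```

Without the parentheses in the second call, OCaml would read "add 2 + 3 4" quite differently and reject it.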
To do something useful with our line of text, we'll need to assign it to a variable. OCaml uses the "let" keyword for that, but we'll also need to declare a scope for our variable, using "in". So, the code we want is something like this;

let line = input_line stdin in
... a block of code ...
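
To make the scoping a little more concrete, here is a small sketch. The function "describe" is a hypothetical helper invented for this example; the point is only that "n" exists in the expression after "in" and nowhere else.

```ocaml
(* describe is a hypothetical helper, used only to show "let ... in". *)
let describe (line : string) : string =
  (* n is visible only in the expression that follows "in" *)
  let n = String.length line in
  Printf.sprintf "%d characters: %s" n line

let () = print_endline (describe "hello")
```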


To read all the lines of text from standard input, until we run out, we'll need a loop of some kind. OCaml does allow us to write code in an imperative style, so we can just use a while loop. While loops are pretty basic in OCaml (and in functional languages in general), because you're meant to do much cleverer things with recursion.
Our loop will need to terminate when we run out of lines to read. The simplest way to do that in OCaml is to catch the "End_of_file" exception. I'm not a big fan of using exceptions for normal control flow, but we can live with it for now.
So, a simple program to read lines from standard input and echo them to standard output might look like this;

try
  while true do
    let line = input_line stdin in
    Printf.printf "%s\n" line
  done;
  None
with
  End_of_file -> None
;;


There are a few points to note here. The semi-colon after "done" is necessary to tell OCaml that it should evaluate everything before the semi-colon first, and then evaluate the stuff after it. Without the semi-colon, you'll get a syntax error. It needs to be ";" and not ";;" because we're not terminating a block of code.
We're using "End_of_file -> None" to discard the exception we get when "input_line" tries to read a line that isn't there. "None" is a bit like "nil" in Ruby or "undef" in Perl, although in OCaml it is a value of the "option" type, which is why its type shows up as 'a option in the error below.
The "None" at the end of the try block is required to keep the return type consistent. OCaml, like Perl or Ruby, returns whatever is the last thing evaluated in the block, and it requires that the try block return the same type of value as the with block returns if we catch an exception. If you try running the code without the "None" before with, you'll get an error saying "This expression has type 'a option but is here used with type unit" (OCaml comes from INRIA in France, and its error messages can be a little idiosyncratic).
The type "unit" is the type with a single value, written "()", and it plays much the same role as void in Java. A while loop always has type unit, so without the "None" the try block has type unit, while the with block returns "None", which has type 'a option. The two types must match, and adding "None" after the loop makes the try block an 'a option as well.
If we change the with to say;

End_of_file -> "whatever"

Then the error becomes "This expression has type string but is here used with type 'a option". So, we can make it go away by replacing the earlier None with any string constant (like "hello" - try it).
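Since a while loop already has type unit, another way to make the types agree is to return "()" from the with block and drop the value after the loop altogether. The sketch below shows that variant. Wrapping it in a function called "echo_lines" that takes the channels as arguments is my own assumption, made so it can be tried on any channel.

```ocaml
(* echo_lines is a hypothetical name. It copies every line from ic to oc. *)
let echo_lines (ic : in_channel) (oc : out_channel) : unit =
  try
    while true do
      let line = input_line ic in
      Printf.fprintf oc "%s\n" line
    done
    (* no trailing value is needed: a while loop already has type unit *)
  with
    End_of_file -> ()   (* () is the only value of type unit *)
```

Calling "echo_lines stdin stdout" should behave like the program above.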
The last thing we're going to do is to take our inline "Printf.printf" statement and turn it into a function call, so that we can do something more interesting with line later.
In OCaml, functions are values we can assign to variables. So, to define a function, we use the same let statement as we used to define line. Here's a function to print out our line;

let out = Printf.printf "%s\n";;

Notice that we terminated the statement without specifying what is supposed to be printed. If you type the code above into the interactive ocaml interpreter, you get this;

# let out = Printf.printf "%s\n";;
val out : string -> unit = <fun>


That's saying "the value out is a function which takes a single string and returns unit" (in other words, nothing useful). OCaml gave us a function because we didn't supply all the arguments to printf, which is known as partial application. If we had, it would have simply evaluated the call and assigned the result to 'out'.
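The same thing happens with any function that is given fewer arguments than it expects. Here is a minimal sketch; "add", "add_ten" and "greet" are hypothetical names invented for this example.

```ocaml
(* add is a hypothetical two-argument function for illustration. *)
let add a b = a + b

(* Supplying only the first argument gives back a function awaiting the rest. *)
let add_ten = add 10          (* add_ten : int -> int *)

(* The same happens with a format string: greet : string -> string *)
let greet = Printf.sprintf "Hello, %s!"

let () = Printf.printf "%d %s\n" (add_ten 5) (greet "world")
```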
Now, we can simplify our program a little;

let out = Printf.printf "%s\n";;

try
  while true do
    let line = input_line stdin in
    out line
  done;
  None
with
  End_of_file -> None
;;


Try running the program like this "ls | ocaml foo.ml", or by compiling it as shown in part 1.
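For comparison, here is a rough sketch of the recursive style mentioned earlier, which also keeps the exception out of the main loop. The helper names "read_line_opt" and "each_line" are my own inventions, and this is only one of several ways to write it.

```ocaml
(* read_line_opt is a hypothetical helper: it turns the End_of_file
   exception into an option value, either Some line or None. *)
let read_line_opt (ic : in_channel) : string option =
  try Some (input_line ic) with End_of_file -> None

(* each_line is also a hypothetical name. It calls f on every line of ic,
   recursing until read_line_opt reports the end of the input. *)
let rec each_line (f : string -> unit) (ic : in_channel) : unit =
  match read_line_opt ic with
  | Some line -> f line; each_line f ic
  | None -> ()
```

With these in place, "each_line print_endline stdin" would echo standard input. Keeping the try inside the small helper means the recursive call is the last thing "each_line" does, which should let OCaml run it in constant stack space.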
So far, we haven't done anything very useful overall, but we've covered reading from standard input and writing to standard output, looping over all the available input, assigning each line to a variable and calling a function with that variable.
In part 3, we'll actually do something!

2009/10/17

OCaml for the impatient - part 1, "Hello, world!"

After an inspiring presentation by Tom Stuart at LRUG earlier this week, I decided to have a go at learning a functional programming language. I picked OCaml, partly on recommendation from Tom, although he's since changed his advice to recommend Clojure because of its "smart and comprehensive treatment of mutable state".

I found quite a few introductions to OCaml, but many of them seem quite theoretical and a bit dry. Plus, I'm really impatient. I tend to learn best by just diving in and trying to perform a real-world task. So, I decided to start by writing a basic filter to take apache access log lines and spit out some fields in CSV form.


I know this is not necessarily the best task for OCaml, and I'm sure my coding style is missing the point and doing many things the 'wrong' way. But, at least it's a practical way to get my hands dirty with the language.

First of all, let's do the traditional "Hello, world!" exercise;

1. Installing OCaml

Lots of instructions for various platforms here;

http://wiki.cocan.org/getting_started_with_ocaml

I used the INRIA binary on Mac OS X.

2. Hello World

A basic hello world program in OCaml. Using a text editor (preferably vim, but I've heard that other programs sort of work), put the following into a file called "hello.ml"


Printf.printf "Hello, world!\n";;


Some points to note;

1. The 'printf' function comes from the 'Printf' module. This module is available to all programs, so there's no need to do anything special to gain access to it.

2. ";;" denotes the end of a chunk of code.

You can run this code interactively using the "ocaml toplevel"



$ ocaml
Objective Caml version 3.10.1

# Printf.printf "Hello, world!\n";;     <--- type this and press enter
Hello, world!
- : unit = ()
#                                       <--- Ctrl-D to exit
$


To run the program from the command-line there are several options.

Option 1


$ ocaml hello.ml


This is the easiest way, using the ocaml interpreter.

Option 2


$ ocamlc hello.ml -o hello
$ ./hello


This creates an executable "hello" file of OCaml bytecode, which should run on any machine with the OCaml bytecode interpreter, "ocamlrun", installed. It also creates "hello.cmo" and "hello.cmi" files, which are OCaml object and interface files, respectively. These seem to be intermediate files, since "hello" runs just fine if you delete them.

Option 3


$ ocamlopt hello.ml -o hello
$ ./hello


This time, "hello" is a compiled binary which can run standalone. There will also be the "hello.cmi" file, as well as "hello.cmx" (OCaml *native* object file), and "hello.o" (object file for your OS).

The interactive toplevel is great for noodling around, and option 1 is what I spend most of my time doing. Option 3 is what I will use if and when I write something that I want to use in production.

So, that's "Hello, world" out of the way. In the next part, we'll look at reading from standard input and writing to standard output.
