Wednesday, February 29, 2012

How to use Apache HttpGet in Clojure

Clojure's slurp function is great for quick retrieval of websites, but sometimes it's not enough and you'll get a 403 error.

As a quick fix I suggest using the Apache httpclient library. You will need httpclient-4.x.x.jar, httpcore-4.x.x.jar and commons-logging-1.x.x.jar in your classpath, which you can find here:
The function http-get is a simple replacement for slurp. You can easily extend it, e.g. by adding timeouts.


(ns apache-http
  (:import
   (org.apache.http.client.methods HttpGet)
   (org.apache.http.impl.client BasicResponseHandler DefaultHttpClient)))

;; Fetches url and returns the response body as a string.
;; If the request fails, the exception is printed and nil is returned.
;; BasicResponseHandler throws for status codes of 300 and above.
(defn http-get [url]
  (let [client  (DefaultHttpClient.)
        httpget (HttpGet. url)
        handler (BasicResponseHandler.)]
    (try
      (.execute client httpget handler)
      (catch Exception e
        (println e))
      (finally
        ;; Always release the connection resources, even on failure.
        (.shutdown (.getConnectionManager client))))))

Usage is simple:
(http-get "http://www.google.com")
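
As an example of the timeout extension mentioned earlier, here is a minimal sketch. It assumes an HttpClient 4.x release in which DefaultHttpClient and HttpConnectionParams are available (both are deprecated from 4.3 onward). The names make-client and http-get-with-timeout, and the namespace apache-http-timeout, are made up for this illustration. Unlike http-get above, this variant lets exceptions propagate to the caller instead of printing them.

```clojure
(ns apache-http-timeout
  (:import
   (org.apache.http.client.methods HttpGet)
   (org.apache.http.impl.client BasicResponseHandler DefaultHttpClient)
   (org.apache.http.params HttpConnectionParams)))

;; Hypothetical helper: builds a client with a connect timeout and a
;; socket read timeout, both given in milliseconds.
(defn make-client [connect-ms read-ms]
  (let [client (DefaultHttpClient.)
        params (.getParams client)]
    (HttpConnectionParams/setConnectionTimeout params (int connect-ms))
    (HttpConnectionParams/setSoTimeout params (int read-ms))
    client))

;; Hypothetical variant of http-get: returns the body as a string and
;; throws if the request fails or one of the timeouts is exceeded.
(defn http-get-with-timeout [url connect-ms read-ms]
  (let [client (make-client connect-ms read-ms)]
    (try
      (.execute client (HttpGet. url) (BasicResponseHandler.))
      (finally
        ;; Always release the connection resources.
        (.shutdown (.getConnectionManager client))))))
```

It would be called like this, with 5 seconds to connect and 10 seconds to read:
(http-get-with-timeout "http://www.google.com" 5000 10000)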
