Parse RSS/Atom feeds with a simple, clojure-friendly API. Uses the Java ROME library, wrapped in Clojure records.
Usable for parsing and exploring feeds. No escaping of potentially-malicious content is performed, and we've inherited any quirks that ROME itself has.
Supports the following syndication formats:
- RSS 0.90
- RSS 0.91 Netscape
- RSS 0.91 Userland
- RSS 0.92
- RSS 0.93
- RSS 0.94
- RSS 1.0
- RSS 2.0
- Atom 0.3
- Atom 1.0
For a more detailed understanding about supported feed types and meanings, the ROME javadocs (under com.sun.syndication.feed.synd) are a good resource.
There is only one function, parse-feed, which takes a URL and returns a StructMap with all the feed's structure and content.
The following REPL session should give an idea about the capabilities and usage of feedparser-clj.
Load the package into your namespace:
user=> (ns user (:use feedparser-clj.core) (:require [clojure.contrib.string :as string]))
Retrieve and parse a feed:
user=> (def f (parse-feed "http://gregheartsfield.com/atom.xml"))
parse-feed also accepts a java.io.InputStream for reading from a file or other sources (see clojure.java.io/input-stream):
;; Contents of resources/feed.rss
<rss>
...
</rss>
user=> (def f (with-open
[feed-stream (-> "feed.rss"
clojure.java.io/resource
clojure.java.io/input-stream)]
(parse-feed feed-stream)))
f is now a map that can be accessed by key to retrieve feed information:
user=> (keys f)
(:authors :categories :contributors :copyright :description :encoding :entries :feed-type :image :language :link :entry-links :published-date :title :uri)
A key applied to the feed gives the value, or nil if it was not defined for the feed.
user=> (:title f)
"Greg Heartsfield"
Feed/entry ID or GUID can be obtained with the :uri key:
user=> (:uri f)
"http://gregheartsfield.com/"
Some feed attributes are maps themselves (like :image) or lists of structs (like :entries and :authors):
user=> (map :email (:authors f))
("scsibug@imap.cc")
Check how many entries are in the feed:
user=> (count (:entries f))
18
Determine the feed type:
user=> (:feed-type f)
"atom_1.0"
Look at the first few entry titles:
user=> (map :title (take 3 (:entries f)))
("Version Control Diagrams with TikZ" "Introducing cabal2doap" "hS3, with ByteString")
Find the most recently updated entry's title:
user=> (first (map :title (reverse (sort-by :updated-date (:entries f)))))
"Version Control Diagrams with TikZ"
Compute what percentage of entries have the word "haskell" in the body (uses clojure.contrib.string):
user=> (let [es (:entries f)]
(* 100.0 (/ (count (filter #(string/substring? "haskell"
(:value (first (:contents %)))) es))
(count es))))
55.55555555555556
This library uses the Leiningen build tool.
Simply add the following dependency in your project.clj.
Distributed under the BSD-3 License.
Copyright (C) 2010 Greg Heartsfield