YD Blog: 2010

2010年11月18日星期四

2010年11月3日星期三

Clojure Stream


(define fibs
  (stream-cons 0
    (stream-cons 1
      (stream-add (stream-cdr fibs) fibs))))

The code above is an excerpt from SICP that generates an infinite Fibonacci sequence. I've tried to migrate it to Clojure.

The following code is the implementations of the stream concept mentioned in SICP and the the Fibonacci function. The essential part is this: (defmacro stream-cons [e s] ...). Because Clojure's cons requires that s to be a ISeq or Seqable, that makes the "(stream-cdr fibs)" part always fails. Because this piece of code is executed before the outer stream-cons ever happened. By the time we execute "(stream-cdr fibs)", we haven't had the value of "fibs" yet.


(defn pair [x y] (cons x [y]))
  
(defn pair-car [p] (first p))

(defn pair-cdr [p] (second p))

(defmacro stream-cons [e s]
  `(pair ~e (delay ~s)))
  
(defn stream-car [s]
  (pair-car s))
  
(defn stream-cdr [s]
  (force (pair-cdr s)))
  
(defn stream-map
  ([f s]
    (stream-cons
      (f (stream-car s))
      (stream-map f (stream-cdr s))))
  ([f s & ss]
    (let [step (fn step [ss]
                  (when (not (empty? ss))
                    (stream-cons (map stream-car ss) (step (map stream-cdr ss)))))]
      (stream-map #(apply f %) (step (cons s ss))))))
    
(defn stream-add [& ss]
  (apply stream-map + ss))
  
(defn take-stream [s count]
  (when (> count 0)
    (stream-cons
      (stream-car s)
      (take-stream (stream-cdr s) (dec count)))))
      
(defn print-stream [s]
  (let [step (fn step [s]
                (when (not (empty? s))
                  (print (stream-car s) "")
                  (recur (stream-cdr s))))]
    (step s)
    (println)))
    
(def fibs
  (stream-cons 0
    (stream-cons 1
      (stream-add (stream-cdr fibs) fibs))))
      
(def ones (stream-cons 1 ones))
(def twos (stream-add ones ones))

(print-stream (take-stream twos 10)) ; => 2 2 2 2 2 2 2 2 2 2
(print-stream (take-stream fibs 10)) ; => 0 1 1 2 3 5 8 13 21 34

Interesting enough that I haven't thought of a way to seamlessly integrate the above stream code into Clojure. If we implement it using (deftype LazyStream [] clojure.lang.Seq ...) that would make cons a function instead of a macro. Therefore I temporarily think that it's impossible to seamlessly integrate the implementation.

2010年10月31日星期日

Definitions of Imperative & Functional Programming from SICP

Programming without any use of assignments is accordingly known as functional programming.

In contrast to functional programming, programming that makes extensive use of assignment is known as imperative programming.

2010年10月30日星期六

求对数

下面是求以 1.12 为底，2.7 的对数。本方法极其低效，仅仅标记一下以备以后参考。


(ns t
  (:require [clojure.contrib.generic.math-functions :as math]))

(defn abs [x]
  (if (< x 0) (* -1 x) x))

(defn close-enough? [delta]
  #(< (abs (- %1 %2)) delta))
  
(defn calc-log [lg lg-level acc-stack]
  (let [acc (math/pow 1.12 lg)
        close? (close-enough? 0.00000000000000000001M)
        lg-inc (/ 1 (math/pow 10 lg-level))
        finer-lg-inc (/ 1 (math/pow 10 (inc lg-level)))]
    (if (close? acc 2.7)
      lg
      (if (> acc 2.7)
        (recur (+ (- lg lg-inc) finer-lg-inc) (inc lg-level) acc-stack)
        (recur (+ lg lg-inc) lg-level (conj acc-stack acc))))))
      
(println (calc-log 1 0 []))

2010年10月26日星期二

空间

《Structure and Interpretation of Computer Programs》中提到，“The ability to create pairs whose elements are pairs is the essence of list structure's importance as a representational tool. We refer to this ability as the closure property of cons.”这话表面上是说 cons 作为 LISP 的核心，能够把自己产生的 pair 作为 cons 的参数进而去产生新的组合 pair，是列表数据结构在表示上非常本质的东西。“closure property”我倾向于翻译成封闭性。因为在代数中，给定 f(x) = y，如果任意 x 属于 Z，且 y 也属于 Z，那么 Z 就对 f 是封闭的。

当然我们可以很容易地想到高中数学里面的列表。这跟 LISP 中的列表是完全对应的。正是这种特性造就了 LISP 语言的强大。但是这种模式对于即使不熟悉数学的程序员来说也应该似曾相识。比如 JUnit4 和 JMock 里面用的 Hamcrest Matcher。你可以任意组合各种 matcher 达到想要的效果。比如 assertThat(result, is(not(greaterThan(10)))。它使得 JUnit 摆脱了不停地往 Assert 类里面增加方法的局面。也是为什么 JUnit4 要绑定 Hamcrest Matcher 发布的原因。如今你很难看到一个优秀的测试框架里面不用 Hamcrest Matcher 的。Matcher 的模式就是：makeMatcher(someMather) === AnotherMatcher。也就是说，从逻辑上，makeMatcher 和 Matcher 这个抽象集合组成了一个代数空间。

找到这样的空间，我们就可能找到了非常强大而优雅的工具。比如 SICP 里面提到的 painter。我们可以生成一个给定 painter 的水平镜像 painter、垂直镜像 painter、扭曲 painter，等等。这点对于我们做面向对象设计的时候也极有启发价值。想想 Hamcreset Matcher……

2010年8月30日星期一

快速排序

看了《The Joy of Clojure》的快速排序，觉得比较丑，而且不通用。所以自己又写了一个。不过有一个比不上书里面的。书里面的支持 lazy-seq，而下面这个不支持。可能正是因为要支持 lazy-seq 所以书里面的那个才会那么繁琐吧……


(defn- split-by [pred coll]
  [(filter pred coll) (remove pred coll)])

(defn qsort-by [comparator work]
  (letfn [(step [ret [pivot & rst :as work]]
            (if-not work
              ret
              (let [[left right] (split-by #(< (comparator % pivot) 0) rst)]
                (if (seq left)
                  (recur ret (concat left [pivot] right))
                  (recur (conj ret pivot) rst)))))]
    (seq (step [] (seq work)))))

(println (qsort-by - [57 66 72 27 16]))
(println (qsort-by - []))
(println (qsort-by - nil))

;=> (16 27 57 66 72)
;=> nil
;=> nil

快速排序（升序）我理解为：给定一个轴（pivot)，遍历所有还没有排序完整的元素（rst），把小于轴的元素放到轴的左边（left），把不小于轴的元素放到轴的右边（right）。完全排序好的元素放到结果集（ret）里。如果左边有元素，说明找到了比轴还要小的元素。那么结果集不动，因为还没有找到最小的元素。如果左边没有元素，说明当前的轴就是最小的，把它加入结果集。

split-by 不同于 split-with。split-with 在遇到第一个不满足条件的元素后就终止了。而 split-by 会把整个集合都遍历一遍（这里的实现是遍历了两遍）。

如果需要排序，请用 Clojure 核心库的 sort 函数。这个函数转调 java.util.Arrays.sort 函数，它是经过高度优化的。我的测试表明我的函数比标准库的慢了两三个数量级…… -_-!

2010年8月25日星期三

Named arguments in Clojure

Clojure doesn't provide direct support for named arguments, which are supported under some popular dynamic languages like Ruby, Python. The following Python code was exerpted from `The Joy of Clojure':


def slope(p1=(0,0), p2=(1,1)):
return (float(p2[1] - p1[1])) / (p2[0] - p1[0])
slope((4,15), (3,21))
#=> -6.0
slope(p2=(2,1))
#=> 0.5
slope()
#=> 1.0

It calculates the slope of a line, which can be determined by two given points. The following Clojure code mimics the named arguments:


(defn slope
([] (slope [0 0] [1 1]))
([{:keys [p1 p2] :or {p1 [0 0] p2 [1 1]}}] (slope p1 p2))
([p1 p2]
(/ (- (float (p2 1)) (p1 1)) (- (p2 0) (p1 0)))))
(slope [4 15] [3 21])
;=> -6.0
(slope {:p2 [2 1]})
;=> 0.5
(slope)
;=> 1.0

I think it's sufficient for most of the cases. However, if you want to go deeper, let's use the defnk from clojure.contrib.def.


-------------------------
clojure.contrib.def/defnk
([fn-name & fn-tail])
Macro
  Define a function accepting keyword arguments. Symbols up to the first
 keyword in the parameter list are taken as positional arguments.  Then
 an alternating sequence of keywords and defaults values is expected. The
 values of the keyword arguments are available in the function body by
 virtue of the symbol corresponding to the keyword (cf. :keys destructuring).
 defnk accepts an optional docstring as well as an optional metadata map.

Unfornately, no examples were given in the doc string. Here's a simple example:


(require '[clojure.contrib.def :as def])
(def/defnk t [a :b 1] [a b])
(t 3)
;=> [3 1]
(t 3 :b 4)
;=> [3 4]

It should be noted that, the optional/named arguments must be put at the tail of the argument list. i.e. You can't write this: (def/defnk t [a :b 1 c] [a b c]) If you do, weird thing would happen. Or, your function may not even compile.


(def/defnk t [a :b 1 c] [a b c])
(t 1 2 3)
;=> [1 1 nil]
(def/defnk t [a :b 1 c d] [a b c d])
;=> java.lang.Exception: Unable to resolve symbol: d in this context

2010年8月16日星期一

在 Clojure 中处理异常的重要方法

Clojure 中虽然使用了 Java 的异常处理机制。但是，Clojure 很难自然地自定义自己的异常。我在与 Java 类库进行交互就时恰恰遇到了这种需求。下面的代码是与 svn-kit 进行交互的代码，它们提供了 svn-kit 的一个 wrapper。


(defmacro- try-catch-svn-ex [& exprs]
  `(try ~@exprs
    (catch org.tmatesoft.svn.core.SVNAuthenticationException e#
      :auth-ex)
    (catch org.tmatesoft.svn.core.SVNException e#
      (if (re-matches #".*404 Not Found.*" (.getMessage e#))
        nil
        (throw e#)))))

(defn svn-get-file! [svn-repo file-path local-file]
  (with-open [os (output-stream (file local-file))]
    (try-catch-svn-ex
      (.getFile svn-repo file-path -1 (SVNProperties.) os)
      local-file)))

调用 svn-get-file! 时可能会出现用户名密码无效的问题，这时候我希望能给用户重新输入的机会。但是又不想被其它的异常干扰。这时候我可以选择将 SVNAuthenticationException 暴露出去，但是明显捕获这样一个异常是很让外层函数头疼的事。同时，自定义 Clojure 异常在外部捕获更让人头疼。所以，我在捕获了 SVNAuthenticationException 后返回一个 :auth-ex。

这种异常处理机制的最大的问题就是回到 C 语言时代检查函数返回值的方式上。这种方式写出来的程序会比较繁琐。最好的办法是用 Stuart Chouser 写的 clojure.contrib.error-kit 库。它提供了类似 Common Lisp 的异常处理体系。比传统的 try...catch 要强大很多。现在，我用 error-kit 库重写上面的函数：


(require '[clojure.contrib.error-kit :as ek])

(ek/deferror *svn-auth-error* [] [msg]
  (:msg msg)
  (:unhandled (ek/throw-msg Exception)))
    
(defmacro- try-catch-svn-ex [& exprs]
  `(try ~@exprs
    (catch org.tmatesoft.svn.core.SVNAuthenticationException e#
      (ek/raise *svn-auth-error* (.getMessage e#)))
    (catch org.tmatesoft.svn.core.SVNException e#
      (if (re-matches #".*404 Not Found.*" (.getMessage e#))
        nil
        (throw e#)))))

(defn svn-get-file! [svn-repo file-path local-file]
  (with-open [os (output-stream (file local-file))]
    (try-catch-svn-ex
      (.getFile svn-repo file-path -1 (SVNProperties.) os)
      local-file)))

注意我用 raise 调用代替了 :auth-ex 返回值。如果捕获到了权限异常，那么我们就 raise 一个 error。这个 error 必须用 deferror 函数定义。这个 *svn-auth-error* 在没有处理函数来处理它时会通过 throw-msg 调用抛出 Exception 异常，异常的消息内容就是 :msg 所指定的消息。

注意 *svn-auth-error* 后面的第一个括号表示“父”error 是谁。这个父子关系内部通过标准库的 derive 方法定义。这里它没有父 error，所以留空。这时调用 svn-get-file! 的函数就可以拿到这个 error，可以选择让栈爆掉，也可以选择在异常抛出点继续执行。这里我们选择简单地处理后重新执行函数：


(defn svn-get-file-ex! [svn-repo file-path local-file]
  (let [ret (ek/with-handler
              (svn-get-file! svn-repo file-path local-file)
              (ek/handle *svn-auth-error* [msg]
                (println (str "Error getting " file-path ", authentication failed"))
                (rm-scm-repo-username!)
                (rm-scm-repo-password!)
                (get-scm-repo-username!)
                (get-scm-repo-password!)
                (svn-get-file-ex! (get-scm-repo) file-path local-file)))]
    (if
      (nil? ret)
        (ek/raise *get-scm-file-error* (str "404 not found: " file-path))
      ret)))

注意此时对 svn-get-file-ex! 的递归调用不能用 recur。很遗憾，可能是因为 with-handler 或 handle 宏展开后定义了新的函数或者 loop。同时也请注意 deferror 时的 :unhandled 后面的 throw-msg 不要用 (throw (Exception. msg)) 来代替。如果这样做，你会发现异常是抛出去了，但是却捕获不到。原因是 :unhandled 后面期望跟的是一个函数定义。具体可以参看 throw-msg 的实现。

更多关于 error-kit 的信息，比如 continue，请参阅：ANN: clojure.contrib.error-kit。

但是如果你不需要 error-kit 里的 continue 相关的功能的话，也可以使用 clojure.contrib.condition。这个库比较容易使用。而且还带了一个 print-stack-trace 方法，可以打印出比较干净的栈。示例可以参看 contrib 库源代码里面的 example 目录中的 condition/example.clj。

这两种库实现上都利用 Java 的异常来跳出栈。所以，如果你想捕获所有的异常，包括这两种库抛出来的，可以用 catch Throwable。值得一提的是，condition 库的 print-stack-trace 是通用的。不仅可以打印 condition 库抛出来的异常，也可以打印其它的异常。

contrib 库中还有一个 except，也是用来处理异常的。作者跟 condition 库是一个人。根据作者的原话，condition 库是 except 库的加强。

2010年8月6日星期五

Functions added in Clojure 1.2

The following function docs were generated from Clojure 1.2.0-RC1, in the namespace of clojure.core.

-------------------------
clojure.core/agent-error
([a])
Returns the exception thrown during an asynchronous action of the
agent if the agent is failed. Returns nil if the agent is not
failed.
-------------------------
clojure.core/bound?
([& vars])
Returns true if all of the vars provided as arguments have any bound value, root or thread-local.
Implies that deref'ing the provided vars will succeed. Returns true if no vars are provided.
-------------------------
clojure.core/case
([e & clauses])
Macro
Takes an expression, and a set of clauses.

Each clause can take the form of either:

test-constant result-expr

(test-constant1 ... test-constantN) result-expr

The test-constants are not evaluated. They must be compile-time
literals, and need not be quoted. If the expression is equal to a
test-constant, the corresponding result-expr is returned. A single
default expression can follow the clauses, and its value will be
returned if no clause matches. If no default expression is provided
and no clause matches, an IllegalArgumentException is thrown.

Unlike cond and condp, case does a constant-time dispatch, the
clauses are not considered sequentially. All manner of constant
expressions are acceptable in case, including numbers, strings,
symbols, keywords, and (Clojure) composites thereof. Note that since
lists are used to group multiple constants that map to the same
expression, a vector can be used to match a list if needed. The
test-constants need not be all of the same type.
-------------------------
clojure.core/defprotocol
([name & opts+sigs])
Macro
A protocol is a named set of named methods and their signatures:
(defprotocol AProtocolName

;optional doc string
"A doc string for AProtocol abstraction"

;method signatures
(bar [this a b] "bar docs")
(baz [this a] [this a b] [this a b c] "baz docs"))

No implementations are provided. Docs can be specified for the
protocol overall and for each method. The above yields a set of
polymorphic functions and a protocol object. All are
namespace-qualified by the ns enclosing the definition The resulting
functions dispatch on the type of their first argument, which is
required and corresponds to the implicit target object ('this' in
Java parlance). defprotocol is dynamic, has no special compile-time
effect, and defines no new types or classes. Implementations of
the protocol methods can be provided using extend.

defprotocol will automatically generate a corresponding interface,
with the same name as the protocol, i.e. given a protocol:
my.ns/Protocol, an interface: my.ns.Protocol. The interface will
have methods corresponding to the protocol functions, and the
protocol will automatically work with instances of the interface.

Note that you should not use this interface with deftype or
reify, as they support the protocol directly:

(defprotocol P
(foo [this])
(bar-me [this] [this y]))

(deftype Foo [a b c]
P
(foo [this] a)
(bar-me [this] b)
(bar-me [this y] (+ c y)))

(bar-me (Foo. 1 2 3) 42)
=> 45

(foo
(let [x 42]
(reify P
(foo [this] 17)
(bar-me [this] x)
(bar-me [this y] x))))
=> 17
-------------------------
clojure.core/defrecord
([name [& fields] & opts+specs])
Macro
Alpha - subject to change

(defrecord name [fields*] options* specs*)

Currently there are no options.

Each spec consists of a protocol or interface name followed by zero
or more method bodies:

protocol-or-interface-or-Object
(methodName [args*] body)*

Dynamically generates compiled bytecode for class with the given
name, in a package with the same name as the current namespace, the
given fields, and, optionally, methods for protocols and/or
interfaces.

The class will have the (immutable) fields named by
fields, which can have type hints. Protocols/interfaces and methods
are optional. The only methods that can be supplied are those
declared in the protocols/interfaces. Note that method bodies are
not closures, the local environment includes only the named fields,
and those fields can be accessed directy.

Method definitions take the form:

(methodname [args*] body)

The argument and return types can be hinted on the arg and
methodname symbols. If not supplied, they will be inferred, so type
hints should be reserved for disambiguation.

Methods should be supplied for all methods of the desired
protocol(s) and interface(s). You can also define overrides for
methods of Object. Note that a parameter must be supplied to
correspond to the target object ('this' in Java parlance). Thus
methods for interfaces will take one more argument than do the
interface declarations. Note also that recur calls to the method
head should *not* pass the target object, it will be supplied
automatically and can not be substituted.

In the method bodies, the (unqualified) name can be used to name the
class (for calls to new, instance? etc).

The class will have implementations of several (clojure.lang)
interfaces generated automatically: IObj (metadata support) and
IPersistentMap, and all of their superinterfaces.

In addition, defrecord will define type-and-value-based equality and
hashCode.

When AOT compiling, generates compiled bytecode for a class with the
given name (a symbol), prepends the current ns as the package, and
writes the .class file to the *compile-path* directory.

Two constructors will be defined, one taking the designated fields
followed by a metadata map (nil for none) and an extension field
map (nil for none), and one taking only the fields (using nil for
meta and extension fields).
-------------------------
clojure.core/deftype
([name [& fields] & opts+specs])
Macro
Alpha - subject to change

(deftype name [fields*] options* specs*)

Currently there are no options.

Each spec consists of a protocol or interface name followed by zero
or more method bodies:

protocol-or-interface-or-Object
(methodName [args*] body)*

Dynamically generates compiled bytecode for class with the given
name, in a package with the same name as the current namespace, the
given fields, and, optionally, methods for protocols and/or
interfaces.

The class will have the (by default, immutable) fields named by
fields, which can have type hints. Protocols/interfaces and methods
are optional. The only methods that can be supplied are those
declared in the protocols/interfaces. Note that method bodies are
not closures, the local environment includes only the named fields,
and those fields can be accessed directy. Fields can be qualified
with the metadata :volatile-mutable true or :unsynchronized-mutable
true, at which point (set! afield aval) will be supported in method
bodies. Note well that mutable fields are extremely difficult to use
correctly, and are present only to facilitate the building of higher
level constructs, such as Clojure's reference types, in Clojure
itself. They are for experts only - if the semantics and
implications of :volatile-mutable or :unsynchronized-mutable are not
immediately apparent to you, you should not be using them.

Method definitions take the form:

(methodname [args*] body)

The argument and return types can be hinted on the arg and
methodname symbols. If not supplied, they will be inferred, so type
hints should be reserved for disambiguation.

Methods should be supplied for all methods of the desired
protocol(s) and interface(s). You can also define overrides for
methods of Object. Note that a parameter must be supplied to
correspond to the target object ('this' in Java parlance). Thus
methods for interfaces will take one more argument than do the
interface declarations. Note also that recur calls to the method
head should *not* pass the target object, it will be supplied
automatically and can not be substituted.

In the method bodies, the (unqualified) name can be used to name the
class (for calls to new, instance? etc).

When AOT compiling, generates compiled bytecode for a class with the
given name (a symbol), prepends the current ns as the package, and
writes the .class file to the *compile-path* directory.

One constructors will be defined, taking the designated fields.
-------------------------
clojure.core/denominator
([r])
Returns the denominator part of a Ratio.
-------------------------
clojure.core/error-handler
([a])
Returns the error-handler of agent a, or nil if there is none.
See set-error-handler!
-------------------------
clojure.core/error-mode
([a])
Returns the error-mode of agent a. See set-error-mode!
-------------------------
clojure.core/extend
([atype & proto+mmaps])
Implementations of protocol methods can be provided using the extend construct:

(extend AType
AProtocol
{:foo an-existing-fn
:bar (fn [a b] ...)
:baz (fn ([a]...) ([a b] ...)...)}
BProtocol
{...}
...)

extend takes a type/class (or interface, see below), and one or more
protocol + method map pairs. It will extend the polymorphism of the
protocol's methods to call the supplied methods when an AType is
provided as the first argument.

Method maps are maps of the keyword-ized method names to ordinary
fns. This facilitates easy reuse of existing fns and fn maps, for
code reuse/mixins without derivation or composition. You can extend
an interface to a protocol. This is primarily to facilitate interop
with the host (e.g. Java) but opens the door to incidental multiple
inheritance of implementation since a class can inherit from more
than one interface, both of which extend the protocol. It is TBD how
to specify which impl to use. You can extend a protocol on nil.

If you are supplying the definitions explicitly (i.e. not reusing
exsting functions or mixin maps), you may find it more convenient to
use the extend-type or extend-protocol macros.

Note that multiple independent extend clauses can exist for the same
type, not all protocols need be defined in a single extend call.

See also:
extends?, satisfies?, extenders
-------------------------
clojure.core/extend-protocol
([p & specs])
Macro
Useful when you want to provide several implementations of the same
protocol all at once. Takes a single protocol and the implementation
of that protocol for one or more types. Expands into calls to
extend-type:

(extend-protocol Protocol
AType
(foo [x] ...)
(bar [x y] ...)
BType
(foo [x] ...)
(bar [x y] ...)
AClass
(foo [x] ...)
(bar [x y] ...)
nil
(foo [x] ...)
(bar [x y] ...))

expands into:

(do
(clojure.core/extend-type AType Protocol
(foo [x] ...)
(bar [x y] ...))
(clojure.core/extend-type BType Protocol
(foo [x] ...)
(bar [x y] ...))
(clojure.core/extend-type AClass Protocol
(foo [x] ...)
(bar [x y] ...))
(clojure.core/extend-type nil Protocol
(foo [x] ...)
(bar [x y] ...)))
-------------------------
clojure.core/extend-type
([t & specs])
Macro
A macro that expands into an extend call. Useful when you are
supplying the definitions explicitly inline, extend-type
automatically creates the maps required by extend. Propagates the
class as a type hint on the first argument of all fns.

(extend-type MyType
Countable
(cnt [c] ...)
Foo
(bar [x y] ...)
(baz ([x] ...) ([x y & zs] ...)))

expands into:

(extend MyType
Countable
{:cnt (fn [c] ...)}
Foo
{:baz (fn ([x] ...) ([x y & zs] ...))
:bar (fn [x y] ...)})
-------------------------
clojure.core/extenders
([protocol])
Returns a collection of the types explicitly extending protocol
-------------------------
clojure.core/extends?
([protocol atype])
Returns true if atype extends protocol
-------------------------
clojure.core/flatten
([x])
Takes any nested combination of sequential things (lists, vectors,
etc.) and returns their contents as a single, flat sequence.
(flatten nil) returns nil.
-------------------------
clojure.core/fnil
([f x] [f x y] [f x y z])
Takes a function f, and returns a function that calls f, replacing
a nil first argument to f with the supplied value x. Higher arity
versions can replace arguments in the second and third
positions (y, z). Note that the function f can take any number of
arguments, not just the one(s) being nil-patched.
-------------------------
clojure.core/frequencies
([coll])
Returns a map from distinct items in coll to the number of times
they appear.
-------------------------
clojure.core/get-in
([m ks] [m ks not-found])
Returns the value in a nested associative structure,
where ks is a sequence of ke(ys. Returns nil if the key is not present,
or the not-found value if supplied.
-------------------------
clojure.core/group-by
([f coll])
Returns a map of the elements of coll keyed by the result of
f on each element. The value at each key will be a vector of the
corresponding elements, in the order they appeared in coll.
-------------------------
clojure.core/keep
([f coll])
Returns a lazy sequence of the non-nil results of (f item). Note,
this means false return values will be included. f must be free of
side-effects.
-------------------------
clojure.core/keep-indexed
([f coll])
Returns a lazy sequence of the non-nil results of (f index item). Note,
this means false return values will be included. f must be free of
side-effects.
-------------------------
clojure.core/map-indexed
([f coll])
Returns a lazy sequence consisting of the result of applying f to 0
and the first item of coll, followed by applying f to 1 and the second
item in coll, etc, until coll is exhausted. Thus function f should
accept 2 arguments, index and item.
-------------------------
clojure.core/namespace-munge
([ns])
Convert a Clojure namespace name to a legal Java package name.
-------------------------
clojure.core/numerator
([r])
Returns the numerator part of a Ratio.
-------------------------
clojure.core/object-array
([size-or-seq])
Creates an array of objects
-------------------------
clojure.core/partition-all
([n coll] [n step coll])
Returns a lazy sequence of lists like partition, but may include
partitions with fewer than n items at the end.
-------------------------
clojure.core/partition-by
([f coll])
Applies f to each value in coll, splitting it each time f returns
a new value. Returns a lazy seq of partitions.
-------------------------
clojure.core/rand-nth
([coll])
Return a random element of the (sequential) collection. Will have
the same performance characteristics as nth for the given
collection.
-------------------------
clojure.core/reductions
([f coll] [f init coll])
Returns a lazy seq of the intermediate values of the reduction (as
per reduce) of coll by f, starting with init.
-------------------------
clojure.core/reify
([& opts+specs])
Macro
reify is a macro with the following structure:

(reify options* specs*)

Currently there are no options.

Each spec consists of the protocol or interface name followed by zero
or more method bodies:

protocol-or-interface-or-Object
(methodName [args+] body)*

Methods should be supplied for all methods of the desired
protocol(s) and interface(s). You can also define overrides for
methods of Object. Note that the first parameter must be supplied to
correspond to the target object ('this' in Java parlance). Thus
methods for interfaces will take one more argument than do the
interface declarations. Note also that recur calls to the method
head should *not* pass the target object, it will be supplied
automatically and can not be substituted.

The return type can be indicated by a type hint on the method name,
and arg types can be indicated by a type hint on arg names. If you
leave out all hints, reify will try to match on same name/arity
method in the protocol(s)/interface(s) - this is preferred. If you
supply any hints at all, no inference is done, so all hints (or
default of Object) must be correct, for both arguments and return
type. If a method is overloaded in a protocol/interface, multiple
independent method definitions must be supplied. If overloaded with
same arity in an interface you must specify complete hints to
disambiguate - a missing hint implies Object.

recur works to method heads The method bodies of reify are lexical
closures, and can refer to the surrounding local scope:

(str (let [f "foo"]
(reify Object
(toString [this] f))))
== "foo"

(seq (let [f "foo"]
(reify clojure.lang.Seqable
(seq [this] (seq f)))))
== (\f \o \o))
-------------------------
clojure.core/remove-all-methods
([multifn])
Removes all of the methods of multimethod.
-------------------------
clojure.core/restart-agent
([a new-state & options])
When an agent is failed, changes the agent state to new-state and
then un-fails the agent so that sends are allowed again. If
a :clear-actions true option is given, any actions queued on the
agent that were being held while it was failed will be discarded,
otherwise those held actions will proceed. The new-state must pass
the validator if any, or restart will throw an exception and the
agent will remain failed with its old state and error. Watchers, if
any, will NOT be notified of the new state. Throws an exception if
the agent is not failed.
-------------------------
clojure.core/satisfies?
([protocol x])
Returns true if x satisfies the protocol
-------------------------
clojure.core/set-error-handler!
([a handler-fn])
Sets the error-handler of agent a to handler-fn. If an action
being run by the agent throws an exception or doesn't pass the
validator fn, handler-fn will be called with two arguments: the
agent and the exception.
-------------------------
clojure.core/set-error-mode!
([a mode-keyword])
Sets the error-mode of agent a to mode-keyword, which must be
either :fail or :continue. If an action being run by the agent
throws an exception or doesn't pass the validator fn, an
error-handler may be called (see set-error-handler!), after which,
if the mode is :continue, the agent will continue as if neither the
action that caused the error nor the error itself ever happened.

If the mode is :fail, the agent will become failed and will stop
accepting new 'send' and 'send-off' actions, and any previously
queued actions will be held until a 'restart-agent'. Deref will
still work, returning the state of the agent before the error.
-------------------------
clojure.core/shuffle
([coll])
Return a random permutation of coll
-------------------------
clojure.core/spit
([f content & options])
Opposite of slurp. Opens f with writer, writes content, then
closes f. Options passed to clojure.java.io/writer.
-------------------------
clojure.core/thread-bound?
([& vars])
Returns true if all of the vars provided as arguments have thread-local bindings.
Implies that set!'ing the provided vars will succeed. Returns true if no vars are provided.
-------------------------
clojure.core/vector-of
([t])
Creates a new vector of a single primitive type t, where t is one
of :int :long :float :double :byte :short :char or :boolean. The
resulting vector complies with the interface of vectors in general,
but stores the values unboxed internally.

2010年7月24日星期六

2010年6月10日星期四

Use Areca backup on 64 bit Windows

Areca is an open-source backup software which is located on Source Forge. Its official releases doesn't support 64-bit Windows which is quite unconvient for me, because I'm using 64-bit Windows on two machines, and backup is simply an essential to me.

The anonying part when you try to start areca is that you can't see any error message and no log can be found. The problem is Areca uses 32-bit SWT library. You can fix this just by replacing the 32-bit SWT library with a 64-bit one. Say you installed Areca to C:\Program Files (x86)\Areca. Go to that directory, delete the swt-win32-*.dll and paste your 64-bit SWT jar file to the lib directory. Go into the lib directory, delete the original org.eclipse.swt.win32.win32.x86_3.2.0.v3232m.jar，and rename the 64-bit jar file to this name. To say this way, you make Areca believes everything is fine, but actually the SWT library is 64-bit now.

Now all set, enjoy using Areca.

2010年5月30日星期日

Clojure 的类型提示

为了提高与 Java 类库交互时的性能，Clojure 提供了元数据用以定义某个给定的变量的类型。比如在函数定义的时候可以指定参数与返回值的类型：


(defn add-elem [#^java.util.List java-list elem]
  (.add java-list elem)
  java-list)

等效于


(defn add-elem [#^{:tag java.util.List} java-list elem]
  (.add java-list elem)
  java-list)

如果不告诉编译器这个类型


(defn naive-add-elem [java-list elem]
  (.add java-list elem)
  java-list)

Clojure 一样可以顺利执行，只不过 Clojure 是通过反射去调用 add 方法的。因为编译器没办法知道运行时的参数可能会是什么类型。所以，两种方法的执行时间差距还是有点大的：


(def java-list (java.util.ArrayList.))

(time (dotimes [_ 100000]
(naive-add-elem java-list 1))) ==> "Elapsed time: 342.663873 msecs"

(time (dotimes [_ 100000]
(add-elem java-list 1)))       ==> "Elapsed time: 18.652259 msecs"

有一个数量级的差距，跟反射与普通方法调用的差距差不多。但是不带类型提示的版本会在重用性上取胜。比如：


(def java-list (java.util.ArrayList.))
(def java-set (java.util.HashSet.))

(naive-add-elem java-list 1)  ==> #<ArrayList [1]>
(naive-add-elem java-set 1)   ==> #<Hashset [1]>

(add-elem java-list 1)  ==> #<ArrayList [1]>
(add-elem java-set 1)   ==> ClassCastException!

与类型提示有点关系的就是 :inline。为了加快 Clojure 在 JVM 上的数字运算，Clojure 提供了内联。因为 Clojure 与 C/C++ 一样具有宏的功能（只不过强大了太多），所以 Clojure 也可以做到源代码展开。比如 num 的实现：


(defn num
  {:tag Number
   :inline (fn  [x] `(. clojure.lang.Numbers (num ~x)))}
  [x] (. clojure.lang.Numbers (num x)))

编译器何时会选择内联我不知道，只是它的确有这种优化。因为在频繁数字运算时 HotSpot 会将包装类型优化成原始类型，所以 Clojure 依赖 JVM 的这个特性，通过内联将优化的任务交给 JVM 去做。Clojure 的官网说下面的两段代码在 JVM 的 -server 属性开启时运行速度一样：


static public float asum(float[] xs) {
    float ret = 0;
    for (int i = 0; i < xs.length; i++)
        ret += xs[i];
    return ret;
}

(defn asum [#^floats xs]
  (areduce xs i ret (float 0)
    (+ ret (aget xs i))))

Clojure 在给数字运算型函数做内联的时候与普通宏不一样的地方是，内联的宏都返回一个函数。这样它们就可以被扔到 map, reduce 这样的函数去执行了。对于原始类型数组，Clojure 特别添加了支持：#^ints, #^floats, #^longs, #^doubles。关于内联具体请参看：http://clojure.org/news

对于那些无法在函数声明时在参数列表就加上类型提示的，可以在 Java 调用的时候再指明参数类型。比如：


(defn bigint
  {:tag BigInteger}
  [x] (cond
       (instance? BigInteger x) x
       (decimal? x) (.toBigInteger #^BigDecimal x)
       (number? x) (BigInteger/valueOf (long x))
       :else (BigInteger. x)))

这个函数在声明参数 x 的时候并没有指明类型，但是通过 decimal? 测试后 x 就应该是 BigDecimal 类型的了，这个时候就可以告诉编译 x 的类型是 BigDecimal。

2010年5月29日星期六

Clojure agents

Clojure agents 是用 JDK 的线程池实现的。比如 send


(defn send
  [#^clojure.lang.Agent a f & args]
    (. a (dispatch f args false)))

clojure.lang.Agent#dispatch 方法把 f 封装成一个 Action 对象，再把执行这个 Action 的任务代理给 Action 自身的 execute 方法。execute 方法会把自身的 action 实例扔到线程池中去执行。看看 send 的兄弟，send-off


(defn send-off
  [#^clojure.lang.Agent a f & args]
    (. a (dispatch f args true)))

注意到了吗？这两个函数只差一个参数，send 在调用 dispatch 的时候最后一个参数是 false，而 send-off 是 true。这个参数为真表示 Action 在执行的时候是使用 CachedThreadPool，否则使用 FixedThreadPool 。FixedThreadPool 是大小固定的线程池，所以 send 方法的 f 函数不能包含 IO 操作，而 send-off 可以。

相应地，agent 在执行时线程池会被初始化。所以，如果你的程序中使用到了 agents，在程序的最后要记得关闭线程池。如果你不关闭，在 eclipse 或者 IDEA 中你会看到程序执行完毕后标准输出窗口的结束程序的按钮仍然可用。原因就是线程池未关闭。当然关闭线程池最重要的作用是确保线程池中所有的操作都已完成，然后释放资源。结束 agents 系统，关闭线程池的函数是 shutdown-agents，不带任何参数。

2010年5月28日星期五

Clojure 核心库代码阅读技巧

核心库（core.clj）的实现代表了 Clojure 的标准和规范。通过阅读核心库的代码可以提高自身对 Clojure 的理解，并且帮助自己从 C 语系的命令式（imperative）的思想转变到非命令式。

核心库的函数实现通常不超过 10 行，以五行左右居多。它的简洁性来源于对递归的深入使用。同时，也依赖于 if, cons, conj, seq, next 对 nil 的特殊处理。

Clojure 中尽管有布尔型，但是不像其它的高级语言一样强制要求 if 接受布尔型。而是像 C/C++ 一样将 nil 作为 false 来看待。所以像 JavaScript 一样，经常会看到这样的代码：


(if x
  then ; x 存在
  else) ; x 不存在

cons 用来连接元素和集合。但是，如果 cons 的第二个参数，也就是集合，如果为 nil 的话，cons 会把这个集合当作空集来看待。比如：


(cons 1 [])   ==> (1)
(cons 1 nil)  ==> (1)

conj 类似，只不过 conj 是第一个参数接集合。如果集合为 nil，也被当作空集。（当然 cons 和 conj 作用并不完全一样）seq 和 next 的特殊性是这样：


(seq [])  ==> nil
(next []) ==> nil

Clojure 标准库中各种函数对 nil 的宽容对待都是基于这几个基本函数做到的。比如下面是经过简化的 reduce 实现：


(defn reduce
  ([f val coll]
     (if-let [s (seq coll)]
       (recur f (f val (first s)) (next s))
       val)))

if-let 只有在 coll 不为空集时才会隐式地用 let 绑定 s。再比如 conj 的实现：


(def
 conj (fn conj 
        ([coll x] (. clojure.lang.RT (conj coll x)))
        ([coll x & xs]
         (if xs
           (recur (conj coll x) (first xs) (next xs))
           (conj coll x)))))

注意第二个函数重载，当函数递归到最后一层的时候因为 next 调用的原因， xs 必为 nil。这时候 (if xs) 就会是 false，函数停止递归。

核心库中大部分都是这样的短小的函数。但是因为性能优化的需要，函数可能会比你现在看到的要臃肿一些。比如 reduce 在 1.1.0 版本中的实际实现：


(defn reduce
  ([f coll]
   (let [s (seq coll)]
     (if s
       (reduce f (first s) (next s))
       (f))))
  ([f val coll]
     (let [s (seq coll)]
       (if s
         (if (chunked-seq? s)
           (recur f 
                  (.reduce (chunk-first s) f val)
                  (chunk-next s))
           (recur f (f val (first s)) (next s)))
         val))))

除去第一个重载给用户提供了便利之外，第二个重载中使用了 chunked-seq。chunked-seq 就是性能优化的产物。看核心库代码的时候如果感觉函数有点长，不好懂，那就先尝试把这些为了性能优化而加的代码去掉。可以在心里去掉，也可以把代码拷贝出来，从视觉上去掉。

目前我看到的最难懂的代码就是 doseq 和 for 的实现，达到了 30 - 80 行。我的智商不高，看 doseq 花了很长时间。不过也因此得出一条重要的经验：看复杂代码的时候要先揣摩作者的意图，以意图作为切入点，以具体代码逻辑作验证，看看自己揣摩出来的意图是不是跟作者一致。作为一名传统的 C 语系的程序员，对 FP 风格的代码不熟的确给我的阅读带来了一定障碍。但是我保证，如果你跟我一样坚持下来，你会发现 Clojure 在微观级别的抽象表达能力比 C 语系的命令式语言要强上一个档次。这点对于你理解算法来说影响非常大。直观上觉得同样的程序 Java 写出来更好懂。但如果你试着去熟悉 Clojure 会发现事实刚好相反。看 doseq 的实现：


(defmacro doseq
  "Repeatedly executes body (presumably for side-effects) with
  bindings and filtering as provided by \"for\".  Does not retain
  the head of the sequence. Returns nil."
  [seq-exprs & body]
  (assert-args doseq
     (vector? seq-exprs) "a vector for its binding"
     (even? (count seq-exprs)) "an even number of forms in binding vector")
  (let [step (fn step [recform exprs]
               (if-not exprs
                 [true `(do ~@body)]
                 (let [k (first exprs)
                       v (second exprs)]
                   (if (keyword? k)
                     (let [steppair (step recform (nnext exprs))
                           needrec (steppair 0)
                           subform (steppair 1)]
                       (cond
                         (= k :let) [needrec `(let ~v ~subform)]
                         (= k :while) [false `(when ~v
                                                ~subform
                                                ~@(when needrec [recform]))]
                         (= k :when) [false `(if ~v
                                               (do
                                                 ~subform
                                                 ~@(when needrec [recform]))
                                               ~recform)]))
                     (let [seq- (gensym "seq_")
                           chunk- (with-meta (gensym "chunk_")
                                             {:tag 'clojure.lang.IChunk})
                           count- (gensym "count_")
                           i- (gensym "i_")
                           recform `(recur (next ~seq-) nil (int 0) (int 0))
                           steppair (step recform (nnext exprs))
                           needrec (steppair 0)
                           subform (steppair 1)
                           recform-chunk 
                             `(recur ~seq- ~chunk- ~count- (unchecked-inc ~i-))
                           steppair-chunk (step recform-chunk (nnext exprs))
                           subform-chunk (steppair-chunk 1)]
                       [true
                        `(loop [~seq- (seq ~v), ~chunk- nil,
                                ~count- (int 0), ~i- (int 0)]
                           (if (< ~i- ~count-)
                             (let [~k (.nth ~chunk- ~i-)]
                               ~subform-chunk
                               ~@(when needrec [recform-chunk]))
                             (when-let [~seq- (seq ~seq-)]
                               (if (chunked-seq? ~seq-)
                                 (let [c# (chunk-first ~seq-)]
                                   (recur (chunk-rest ~seq-) c#
                                          (int (count c#)) (int 0)))
                                 (let [~k (first ~seq-)]
                                   ~subform
                                   ~@(when needrec [recform]))))))])))))]
    (nth (step nil (seq seq-exprs)) 1)))

首先不去看有关 chunked-seq 的部分，在这么复杂的函数（而且还是宏！）中它会极大地干扰你的阅读。随便一扫就可以发现函数的主体都在 step 这个内部函数里面。递归包含两部分：1、递推基础 2、递推规则。标准库很多函数都会定义一个内部函数 step。它就承担着定义递推规则的责任。所以 doseq 里面只有一句话 (nth (step nil (seq seq-exprs)) 1)))，它只负责提供递推基础。

step 的第二个参数 exprs 从名字和调用上就可以看出是绑定表达式串（使用 doseq 时的中括号的部分）。recform 从名字上不好猜是什么，搜索一下 recform，发现这样一句：

recform `(recur (next ~seq-) nil (int 0) (int 0))

可以看出 recform 就是 recur-form 的意思。注意如果你跟我一样是严谨逻辑派的人，你会发现这样的假设并不严谨，不过不重要，我们目前的首要目标是猜出作者的意图，最后再用严谨的逻辑验证一次就好了。当然，这个过程可能因为猜测错误而反复多次。目前我们知道 recform 应该是用来承载 recur 的内容。那么相应地，needrec 也就应该表示是否需要继续递归的意思。通过这句话

(let [steppair (step recform (nnext exprs))

可以看出 steppair 表示 step 的返回值，它应该是一个大小为二的向量。再结合


(let [steppair (step recform (nnext exprs))
      needrec (steppair 0)
      subform (steppair 1)]

可以看出这个向量的第一个元素表示是否需要继续递归，而第二元素表示 step 递归缩减出来的子 form。有了上面这些猜测，我们可以看出 step 的最后三行是核心。


(let [~k (first ~seq-)]
  ~subform
  ~@(when needrec [recform]))))))])))))]

它们表示局部绑定 doseq 中指定集合的第一个元素，然后递归展开子 form。通过


(= k :let) [needrec `(let ~v ~subform)]
(= k :while) [false `(when ~v
  ~subform
  ~@(when needrec [recform]))]
(= k :when) [false `(if ~v
  (do
    ~subform
    ~@(when needrec [recform]))
  ~recform)]))

可以看出 doseq 支持 for 所支持的三个额外参数 let, when, while。当这些都理解的时候就可以回过头去看 chunked-seq 的部分了。

2010年5月25日星期二

精巧的状态机实现


public enum SniperState {
  JOINING {
    @Override public SniperState whenAuctionClosed() { return LOST; }
  },
  BIDDING {
    @Override public SniperState whenAuctionClosed() { return LOST; }
  },
  WINNING {
    @Override public SniperState whenAuctionClosed() { return WON; }
  },
  LOST,
  WON;
  public SniperState whenAuctionClosed() {
    throw new Defect("Auction is already closed");
  }
}

Sniper 英语是狙击手的意思，在这个程序的上下文中表示拍卖交易中的自动出价器。它的状态包括刚刚加入拍卖交易过程、正在出价但未占上风、正在出价但暂时领先、已经失败、已经成功拍下。whenAuctionClosed() 是一个事件回调，表示拍卖交易已经关闭时的回调。

因为已经失败和成功拍下这两个状态都是最终状态，所以调用 whenAuctionClosed() 会抛异常。而其它几个状态会重载这个方法，返回交易关闭后对应的状态。

这段代码摘自《Growing Object-Oriented Software, Guided by Tests》。

订阅：博文 (Atom)

YD Blog