Programming Clojure - Unifying Data with Sequences


In Clojure, all these data structures can be accessed through a single abstraction: the sequence (or seq).
A seq (pronounced "seek") is a logical list. It is an abstraction that can be used everywhere.


Collections that can be viewed as seqs are called seq-able. In this chapter, you will meet a variety of seq-able collections:

• All Clojure collections 
• All Java collections 
• Java arrays and strings 
• Regular expression matches 
• Directory structures 
• I/O streams 
• XML trees

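The sequence abstraction is central to every Lisp-like language. In these functional languages almost everything is a list, that is, a sequence, which is why Lisp is the "list processing" language.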


4.1 Everything Is a Sequence

Every aggregate data structure in Clojure can be viewed as a sequence. 
A sequence has three core capabilities:

(first aseq) ;get the first item

(rest aseq) ;get everything after the first item

(cons elem aseq) ;construct a new sequence by adding an item

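A sequence really has only three core operations: first, rest, and cons. They are very simple, and the other sequence operations are built on top of them.

Why provide only first and rest? It seemed odd to me at first.
My view is that these two operations are already powerful enough. To traverse a seq, you keep taking first of the rest; to reach the nth item, you apply rest n times and then take first. They also make lazy sequences easy to design: the rest does not have to be computed until someone asks for its first.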
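The traversal idea above can be sketched in a few lines. The helper name my-nth is hypothetical (it is not from the book, and clojure.core already provides nth); it only illustrates that indexed access can be built from first and rest alone.

```clojure
;; Hypothetical helper: zero-based lookup using only first and rest.
(defn my-nth
  "Steps through rest n times, then returns first. Returns nil past the end."
  [coll n]
  (if (zero? n)
    (first coll)
    (recur (rest coll) (dec n))))

(my-nth '(10 20 30) 0) ; => 10
(my-nth [10 20 30] 2)  ; => 30
(my-nth "abc" 1)       ; => \b
```

Because my-nth relies only on the seq contract, the same function works on a list, a vector, and a string.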

(first '(1 2 3))
1
(rest '(1 2 3))
(2 3)
(cons 0 '(1 2 3))
(0 1 2 3)

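; Applying a seq function to any seq-able collection returns a seq, so rest on a vector returns a seq, not a vector.
; The REPL prints every seq the same way it prints a list, which is why the result below is shown in parentheses.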
(rest [1 2 3]) 
(2 3)

(first {:fname "Stu" :lname "Halloway"})
[:fname "Stu"]

(first #{:the :quick :brown :fox})  ; a set is unordered, so which element first returns is not guaranteed

(sorted-set :the :quick :brown :fox) ; a sorted set
#{:brown :fox :quick :the}
(sorted-map :c 3 :b 2 :a 1)    ; a sorted map
{:a 1, :b 2, :c 3}

conj adds one or more elements to a collection, and into adds all the items in one collection to another.

(conj '(1 2 3) :a)
(:a 1 2 3)
(into '(1 2 3) '(:a :b :c))
(:c :b :a 1 2 3)
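The reversed order in the into example follows from where conj adds elements: at the front for a list, at the end for a vector. A small sketch of the contrast, using only clojure.core functions:

```clojure
;; conj adds where it is cheapest for the given collection type.
(conj '(1 2 3) :a)          ; => (:a 1 2 3)        lists grow at the front
(conj [1 2 3] :a)           ; => [1 2 3 :a]        vectors grow at the end

;; into behaves like repeated conj, so the target type decides the order.
(into '(1 2 3) [:a :b :c])  ; => (:c :b :a 1 2 3)
(into [1 2 3] '(:a :b :c))  ; => [1 2 3 :a :b :c]
```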

4.2 Using the Sequence Library

The Clojure sequence library provides a rich set of functionality that can work with any sequence.

The functions provide a rich backbone of functionality that can take advantage of any data structure that obeys the basic first/rest/cons contract.

The following functions are grouped into four broad categories:
• Functions that create sequences 
• Functions that filter sequences 
• Sequence predicates 
• Functions that transform sequences


Creating Sequences


(range start? end step?)

(range 10)
(0 1 2 3 4 5 6 7 8 9)
(range 10 20)
(10 11 12 13 14 15 16 17 18 19)

(range 1 25 2)
(1 3 5 7 9 11 13 15 17 19 21 23)


(repeat n x)

(repeat 5 1)
(1 1 1 1 1)
(repeat 10 "x")
("x" "x" "x" "x" "x" "x" "x" "x" "x" "x")

iterate is the infinite extension of range: it begins with x and keeps applying f to the previous value, forever.

(iterate f x)

(take 10 (iterate inc 1)) ; take copes with infinite seqs by returning a finite prefix
(1 2 3 4 5 6 7 8 9 10)
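Several later examples call (whole-numbers), which these notes never define. A definition consistent with the outputs shown in those examples would be the following; treat it as an assumption rather than a quotation from the book.

```clojure
;; Assumed definition: the whole numbers 1, 2, 3, ... as an infinite lazy seq.
(defn whole-numbers [] (iterate inc 1))

(take 5 (whole-numbers)) ; => (1 2 3 4 5)
```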


(cycle coll)

(take 10 (cycle (range 3)))
(0 1 2 0 1 2 0 1 2 0)


interleave takes multiple collections and produces a new collection that interleaves values from each collection until one of the collections is exhausted.

(interleave & colls) ; variadic: accepts any number of seqs

(interleave (whole-numbers) ["A" "B" "C" "D" "E"])
(1 "A" 2 "B" 3 "C" 4 "D" 5 "E")

interpose returns a sequence with each of the elements of the input collection separated by a separator.

(interpose separator coll)

(interpose "," ["apples" "bananas" "grapes"])
("apples" "," "bananas" "," "grapes")


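(apply str (interpose \, ["apples" "bananas" "grapes"])) ; join the pieces into one string
"apples,bananas,grapes"

(use '[clojure.contrib.str-utils :only (str-join)])
(str-join \, ["apples" "bananas" "grapes"]) ; shorthand for the above; note that clojure.contrib has since been retired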
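On Clojure 1.2 and later, the same join is available in the standard clojure.string namespace, so the contrib dependency is not needed. A minimal sketch (the alias str is my own choice):

```clojure
(require '[clojure.string :as str])

;; Joins the elements with the separator and returns a string.
(str/join "," ["apples" "bananas" "grapes"])
;; => "apples,bananas,grapes"

;; Equivalent to the interpose version shown above.
(apply str (interpose "," ["apples" "bananas" "grapes"]))
;; => "apples,bananas,grapes"
```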


Filtering Sequences


(filter pred coll)

(take 10 (filter even? (whole-numbers)))
(2 4 6 8 10 12 14 16 18 20)
(take 10 (filter odd? (whole-numbers)))
(1 3 5 7 9 11 13 15 17 19)

take-while, drop-while
(take-while pred coll) behaves like: while pred holds, take the next element from coll. It therefore stops taking at the first element that fails the predicate.

(drop-while pred coll)

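(take-while even? [2 4 6 1 8]) ; (2 4 6)

(take-while #(> % 0) [3 2 1 0 -1 -2]) ; (3 2 1) note the #(...) shorthand for an anonymous function


(drop-while even? [2 4 6 1 3 5]) ; (1 3 5)

(drop-while even? [1 3 2 4 5]) ; (1 3 2 4 5) dropping stops at the very first element, 1, so every element is returned (as a seq, not a vector)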
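The difference between stopping at the first failure and testing every element is easiest to see side by side with filter and remove. A short comparison on one input (the name xs is just for illustration):

```clojure
(def xs [2 4 6 1 8])

(take-while even? xs) ; => (2 4 6)    stops at 1 and never sees 8
(filter even? xs)     ; => (2 4 6 8)  tests every element

(drop-while even? xs) ; => (1 8)      drops the leading evens only
(remove even? xs)     ; => (1)        drops every even element
```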


split-at, split-with

split-at takes an index, and split-with takes a predicate

(split-at 5 (range 10))
[(0 1 2 3 4) (5 6 7 8 9)]
(split-with #(<= % 10) (range 0 20 2))
[(0 2 4 6 8 10) (12 14 16 18)]

Sequence Predicates


(every? pred coll)

(every? odd? [1 3 5])
true
(every? odd? [1 3 5 8])
false


(some pred coll)

some returns the first nonfalse value for its predicate or returns nil if no element matched

(some even? [1 2 3])
true
(some even? [1 3 5])
nil
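Because some returns whatever the predicate returns, it is not limited to true and nil. Two common uses, sketched with plain clojure.core calls:

```clojure
;; Return the matching element itself instead of true.
(some #(when (even? %) %) [1 2 3 4])   ; => 2

;; A set is a function of its members, so it works as a predicate:
;; this returns the first element of the vector that is in the set.
(some #{:fox :dog} [:the :quick :fox]) ; => :fox
(some #{:cat} [:the :quick :fox])      ; => nil
```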

not-every?, not-any?

(not-every? even? (whole-numbers))
true
(not-any? even? (whole-numbers))
false

Transforming Sequences


(map f coll)

(map #(format "<p>%s</p>" %) ["the" "quick" "brown" "fox"])
("<p>the</p>" "<p>quick</p>" "<p>brown</p>" "<p>fox</p>")

(map #(format "<%s>%s</%s>" %1 %2 %1) ["h1" "h2" "h3" "h1"] ["the" "quick" "brown" "fox"])
("<h1>the</h1>" "<h2>quick</h2>" "<h3>brown</h3>" "<h1>fox</h1>")


(reduce f coll)

reduce applies f on the first two elements in coll, then applies f to the result and the third element, and so on. 
reduce is useful for functions that “total up” a sequence in some way.

(reduce + (range 1 11))
55
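To watch the running total that reduce builds, reductions (in clojure.core since 1.2) returns every intermediate result. The sketch below also shows the optional initial value form, (reduce f val coll).

```clojure
(reduce + (range 1 11))
;; => 55

;; Each intermediate accumulator, in order.
(reductions + (range 1 11))
;; => (1 3 6 10 15 21 28 36 45 55)

;; With an explicit starting value.
(reduce + 100 (range 1 11))
;; => 155
```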

sort, sort-by

(sort comp? coll) 
(sort-by a-fn comp? coll) ; takes an extra a-fn compared with sort: it applies a-fn to each element and sorts by the results

(sort [42 1 7 11])
(1 7 11 42)

(sort > [42 1 7 11])
(42 11 7 1)

(sort-by #(.toString %) [42 1 7 11]) ; compares as strings, so "7" sorts after "42"
(1 11 42 7)

(sort-by :grade > [{:grade 83} {:grade 90} {:grade 77}])
({:grade 90} {:grade 83} {:grade 77})

List Comprehension

The granddaddy of all filters and transformations is the list comprehension. 
A list comprehension creates a list based on an existing list, using set notation.



(for [binding-form coll-expr filter-expr? ...] expr) 
for takes a vector of binding-form/coll-exprs, plus an optional filter-expr, and then yields a sequence of exprs. 
List comprehension is more general than functions such as map and filter and can in fact emulate most of the filtering and transformation functions described earlier.

(for [word ["the" "quick" "brown" "fox"]] (format "<p>%s</p>" word))
("<p>the</p>" "<p>quick</p>" "<p>brown</p>" "<p>fox</p>")

Comprehensions can emulate filter using a :when clause

(take 10 (for [n (whole-numbers) :when (even? n)] n))
(2 4 6 8 10 12 14 16 18 20)

The :while clause continues the evaluation only while its expression holds true. Unlike :when, it stops at the first element that fails the test.

(for [n (whole-numbers) :while (even? n)] n)
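The contrast between :when and :while is clearer on a small finite input. This is a sketch with an illustrative vector xs; the last form uses (iterate inc 1) in place of whole-numbers, on the assumption that whole-numbers starts at 1.

```clojure
(def xs [2 4 5 6 8])

;; :when skips elements that fail the test and keeps going.
(for [n xs :when (even? n)] n)  ; => (2 4 6 8)

;; :while ends the comprehension at the first failure.
(for [n xs :while (even? n)] n) ; => (2 4)

;; Starting from 1, the very first element fails, so the result is empty.
(for [n (iterate inc 1) :while (even? n)] n) ; => ()
```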

The real power of for comes when you work with more than one binding expression.

(for [file "ABCDEFGH" rank (range 1 9)] (format "%c%d" file rank))
("A1" "A2" ... elided ... "H7" "H8")


;Clojure iterates over the rightmost binding expression in a sequence comprehension first and then works its way left

(for [rank (range 1 9) file "ABCDEFGH"] (format "%c%d" file rank))
("A1" "B1" ... elided ... "G8" "H8") ; a different binding order produces the combinations in a different order

4.3 Lazy and Infinite Sequences

Most Clojure sequences are lazy; in other words, elements are not calculated until they are needed. 
Using lazy sequences has many benefits:

• You can postpone expensive computations that may not in fact be needed. 
• You can work with huge data sets that do not fit into memory. 
• You can delay I/O until it is absolutely needed.


Forcing Sequences

Sometimes you do not want laziness and need to force a seq to be realized, but avoid doing so where you can.

The problem usually arises when the code generating the sequence has side effects. Consider the following sequence, which embeds side effects via println:

(def x (for [i (range 1 3)] (do (println i) i))) ; the seq is lazy, so nothing is printed yet

doall, dorun

(doall coll)

doall forces Clojure to walk the elements of a sequence and returns the elements as a result:

(doall x)
| 1
| 2
(1 2)

(dorun coll) ; walks the seq once without keeping the results
dorun walks the elements of a sequence without keeping past elements in memory. 
As a result, dorun can walk collections too large to fit in memory.

(dorun x)
| 1
| 2
nil

The nil return value is a telltale reminder that dorun does not hold a reference to the entire sequence.
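The effect of forcing can also be observed without printing, by counting how often the element function runs. This is a sketch; the names calls and xs are illustrative.

```clojure
;; Counter that records how many elements have actually been computed.
(def calls (atom 0))

;; map is lazy, so defining xs computes nothing.
(def xs (map (fn [i] (swap! calls inc) i) [1 2 3]))
@calls      ; => 0

;; doall realizes every element and returns the seq.
(doall xs)  ; => (1 2 3)
@calls      ; => 3

;; dorun walks the (already cached) seq and returns nil.
(dorun xs)  ; => nil
@calls      ; => 3
```

The count stays at 3 after dorun because a realized lazy seq caches its elements instead of recomputing them.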

The dorun and doall functions help you deal with side effects, while most of the rest of Clojure discourages side effects. 
You should use these functions rarely. (The Clojure core calls each of these functions only once in about 4,000 lines of code.)

4.4 Clojure Makes Java Seq-able

The seq abstraction of first/rest applies to anything that there can be more than one of.

In the Java world, that includes the following: 
• The Collections API 
• Regular expressions 
• File system traversal 
• XML processing 
• Relational database results 
Clojure wraps these Java APIs, making the sequence library available for almost everything you do.

Skipped for now; I will come back to this section when I need it...


4.5 Calling Structure-Specific Functions

Clojure’s sequence functions allow you to write very general code. 
Sometimes you will want to be more specific and take advantage of the characteristics of a specific data structure. 
Clojure includes functions that specifically target lists, vectors, maps, structs, and sets.

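These functions exist partly for convenience but mainly for efficiency: the generic functions offer a better abstraction, but they can be less efficient.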

Functions on Lists

(peek '(1 2 3))  ; first
1
(pop '(1 2 3)) ; rest
(2 3)

Functions on Vectors


(get [:a :b :c] 1)
:b
(get [:a :b :c] 5)
nil

([:a :b :c] 1) ; a vector is itself a function of its indices
:b
([:a :b :c] 5)

java.lang.ArrayIndexOutOfBoundsException: 5 ; unlike get, an out-of-range index throws
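When a missing index should not throw, get and nth both accept a default value. A short sketch; the exact exception class raised by calling the vector may differ between Clojure versions, so the example catches the common superclass IndexOutOfBoundsException.

```clojure
(get [:a :b :c] 5)          ; => nil
(get [:a :b :c] 5 :missing) ; => :missing
(nth [:a :b :c] 5 :missing) ; => :missing

;; Calling the vector as a function throws for an out-of-range index.
(try
  ([:a :b :c] 5)
  (catch IndexOutOfBoundsException _ :out-of-range))
;; => :out-of-range
```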

assoc associates a new value with a particular index:

(assoc [0 1 2 3 4] 2 :two)
[0 1 :two 3 4]


subvec returns a subvector of a vector: 
(subvec avec start end?)

(subvec [1 2 3 4 5] 3) ;end is not specified, it defaults to the end of the vector
[4 5]
(subvec [1 2 3 4 5] 1 3)
[2 3]

Functions on Maps

(keys {:sundance "spaniel", :darwin "beagle"})
(:sundance :darwin)
(vals {:sundance "spaniel", :darwin "beagle"})
("spaniel" "beagle")

(get {:sundance "spaniel", :darwin "beagle"} :darwin)
"beagle"
(get {:sundance "spaniel", :darwin "beagle"} :snoopy)
nil

({:sundance "spaniel", :darwin "beagle"} :darwin) ; a map is itself a function of its keys
"beagle"
({:sundance "spaniel", :darwin "beagle"} :snoopy)
nil

(:darwin {:sundance "spaniel", :darwin "beagle"} ) ; a keyword is also a function
"beagle"
(:snoopy {:sundance "spaniel", :darwin "beagle"} )
nil


assoc returns a map with a key/value pair added. 
dissoc returns a map with a key removed. 
select-keys returns a map, keeping only the keys passed in 
merge combines maps. If multiple maps contain a key, the rightmost map wins.

(def song {:name "Agnus Dei"
:artist "Krzysztof Penderecki"
:album "Polish Requiem"
:genre "Classical" })


(assoc song :kind "MPEG Audio File") ; adds :kind "MPEG Audio File"
{:name "Agnus Dei", :album "Polish Requiem",
:kind "MPEG Audio File", :genre "Classical",
:artist "Krzysztof Penderecki"}


(dissoc song :genre) ; removes :genre
{:name "Agnus Dei", :album "Polish Requiem",
:artist "Krzysztof Penderecki"}


(select-keys song [:name :artist])
{:name "Agnus Dei", :artist "Krzysztof Penderecki"}


(merge song {:size 8118166, :time 507245})
{:name "Agnus Dei", :album "Polish Requiem",
:genre "Classical", :size 8118166,
:artist "Krzysztof Penderecki", :time 507245}

merge-with is like merge, except that when two or more maps have the same key, you can specify your own function for combining the values under the key.

(merge-with merge-fn & maps) ; merge-fn decides how values are combined when the same key appears in more than one map

(merge-with
concat ; merge-fn
{:rubble ["Barney"], :flintstone ["Fred"]}
{:rubble ["Betty"], :flintstone ["Wilma"]}
{:rubble ["Bam-Bam"], :flintstone ["Pebbles"]})
{:rubble ("Barney" "Betty" "Bam-Bam"),
:flintstone ("Fred" "Wilma" "Pebbles")}
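The merge function is called only for keys that appear in more than one map; other keys are copied through unchanged. Two further sketches with different merge functions:

```clojure
;; Sum the values under duplicate keys; :b and :c appear once and pass through.
(merge-with + {:a 1 :b 2} {:a 10 :c 3})
;; => {:a 11, :b 2, :c 3}

;; into keeps the combined values as a vector, where concat yields a lazy seq.
(merge-with into {:rubble ["Barney"]} {:rubble ["Betty"]})
;; => {:rubble ["Barney" "Betty"]}
```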


Functions on Sets

union, intersection, difference, select

(def languages #{"java" "c" "d" "clojure" })
(def letters #{"a" "b" "c" "d" "e" })
(def beverages #{"java" "chai" "pop" })

;union returns the set of all elements present in either input set
(union languages beverages)
#{"java" "c" "d" "clojure" "chai" "pop"}


;intersection returns the set of all elements present in both input sets
(intersection languages beverages)
#{"java"}


;difference returns the set of all elements present in the first input set, minus those in the second
(difference languages beverages) ; the elements of languages that are not in beverages
#{"c" "d" "clojure"}


;select returns the set of all elements matching a predicate

(select #(= 1 (.length %)) languages)
#{"c" "d"}
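These functions live in the clojure.set namespace, so the unqualified calls above assume it has been loaded and referred first, for example with (use 'clojure.set). A sketch using an alias instead (the alias set is my own choice), which also shows that difference depends on argument order:

```clojure
(require '[clojure.set :as set])

(def languages #{"java" "c" "d" "clojure"})
(def beverages #{"java" "chai" "pop"})

(set/intersection languages beverages) ; => #{"java"}

;; difference is not symmetric: it keeps what is in the first set only.
(set/difference languages beverages)   ; => #{"c" "d" "clojure"}
(set/difference beverages languages)   ; => #{"chai" "pop"}

(set/select #(= 1 (count %)) languages) ; => #{"c" "d"}
```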

Relational Algebra

Set union and difference are part of set theory, but they are also part of relational algebra, which is the basis for query languages such as SQL. 
The relational algebra consists of six primitive operators: set union and set difference (described earlier), plus rename, selection, projection, and cross product.

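This is convenient and rather interesting: Clojure supports relational algebra.
Each row is represented as a map, and a table is represented as a set of maps.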

(def compositions
#{{:name "The Art of the Fugue" :composer "J. S. Bach" }
{:name "Musical Offering" :composer "J. S. Bach" }
{:name "Requiem" :composer "Giuseppe Verdi" }
{:name "Requiem" :composer "W. A. Mozart" }})

(def composers
#{{:composer "J. S. Bach" :country "Germany" }
{:composer "W. A. Mozart" :country "Austria" }
{:composer "Giuseppe Verdi" :country "Italy" }})

(def nations
#{{:nation "Germany" :language "German" }
{:nation "Austria" :language "German" }
{:nation "Italy" :language "Italian" }})

The rename function renames keys (“database columns”), based on a map from original names to new names. 
(rename relation rename-map)

(rename compositions {:name :title}) ; renames the :name key to :title
#{{:title "Requiem", :composer "Giuseppe Verdi"}
{:title "Musical Offering", :composer "J. S. Bach"}
{:title "Requiem", :composer "W. A. Mozart"}
{:title "The Art of the Fugue", :composer "J. S. Bach"}}

The select function returns maps for which a predicate is true and is analogous to the WHERE portion of a SQL SELECT: 
(select pred relation)

(select #(= (:name %) "Requiem") compositions) ; rows whose :name equals "Requiem"
#{{:name "Requiem", :composer "W. A. Mozart"}
{:name "Requiem", :composer "Giuseppe Verdi"}}

The project function returns only the portions of the maps that match a set of keys. 
(project relation keys)

(project compositions [:name])
#{{:name "Musical Offering"}
{:name "Requiem"}
{:name "The Art of the Fugue"}}

The cross product returns every possible combination of rows in the different tables. You can do this easily enough in Clojure with a list comprehension: 
(for [m compositions c composers] (concat m c)) ; every combination of rows
; ... 4 x 3 = 12 rows ...

Although the cross product is theoretically interesting, you will typically want some subset of the full cross product. 
For example, you might want to join sets based on shared keys: 
(join relation-1 relation-2 keymap?)

(join compositions composers)   ;join the composition names and composers on the shared key :composer

(join composers nations {:country :nation}) ; the keymap says that :country in composers matches :nation in nations
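The notes do not show what the keymap join returns. As a sketch, using the same data and the clojure.set alias set (my own choice of alias), each joined row is the merge of the two matching maps, so it carries both :country and :nation:

```clojure
(require '[clojure.set :as set])

(def composers
  #{{:composer "J. S. Bach" :country "Germany"}
    {:composer "W. A. Mozart" :country "Austria"}
    {:composer "Giuseppe Verdi" :country "Italy"}})

(def nations
  #{{:nation "Germany" :language "German"}
    {:nation "Austria" :language "German"}
    {:nation "Italy" :language "Italian"}})

;; Rows are paired where (:country composer-row) equals (:nation nation-row).
(def joined (set/join composers nations {:country :nation}))

(count joined) ; => 3
(contains? joined {:composer "J. S. Bach" :country "Germany"
                   :nation "Germany" :language "German"}) ; => true
```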

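You can combine the relational primitives. For example, to find the countries that are home to the composer of a Requiem, select the matching compositions, join them with composers, and project the :country key:

(project
(join
(select #(= (:name %) "Requiem") compositions)
composers)
[:country])
#{{:country "Italy"} {:country "Austria"}}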


