hfs-delimited and lfs-delimited

Hey guys,

I've pushed a snapshot update to Cascalog that includes two new taps -- hfs-delimited and lfs-delimited. These support the same keyword options as the other hfs-* and lfs-* taps, with a few extras I'll detail below.

If any of you find these useful, I'd really appreciate it if you would give them a try and let me know how the API works out for you. This feature is available in either of the following builds:

[cascalog "1.8.7-SNAPSHOT"]
[cascalog "1.9.0-wip8"]

As an example, say you had a text file with data like this:

exchange,stock_symbol,date,open,high,low,close,volume,adj
NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6
NYSE,AA,2008-03-04,38.85,39.28,38.26,38.37,11279900,38.37


The default separator is a tab character. Since this file contains no tabs, the standard hfs-delimited tap with no options produces 1-tuples, each holding a single full line of text:

(hfs-delimited "/path/to/data")
;; produces 1-tuples of raw text lines

The :delimiter option lets you change this:

(hfs-delimited "/path/to/data"
               :delimiter ",")

;; produces 9-tuples, all strings
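To make the splitting behaviour concrete, here is a rough plain-Clojure sketch of what happens to each line. `split-line` is a hypothetical helper, not part of Cascalog, and treating the delimiter as a literal string rather than a regex is an assumption:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not part of Cascalog: split one line of text on a
;; delimiter, the way a delimited tap turns a line into a tuple.
(defn split-line
  ([line] (split-line line "\t"))          ; tab is the default delimiter
  ([line delimiter]
   ;; Pattern/quote makes the delimiter match literally (assumption: the
   ;; tap treats it as plain text); limit -1 keeps trailing empty fields.
   (str/split line (re-pattern (java.util.regex.Pattern/quote delimiter)) -1)))

(def sample "NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6")

(split-line sample)      ;; => a 1-element vector holding the whole line
(split-line sample ",")  ;; => a 9-element vector of strings
```

The 1-tuple case falls out of the data: with no tab anywhere in the line, there is nothing to split on.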

Now the header line gets in the way, since it comes through as a 9-tuple of column names. :skip-header? to the rescue:

(hfs-delimited "/path/to/data"
               :delimiter ","
               :skip-header? true)

;; produces 9-tuples of strings, with the header line dropped
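A minimal sketch of the header-skipping step, assuming a single small file held in memory. `read-records` and `csv-text` are hypothetical names for illustration; this is not how Cascalog reads files, just the effect :skip-header? is described as having:

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch (not Cascalog's code): read one in-memory file,
;; optionally dropping its first line, then split each remaining line.
(defn read-records
  [text delimiter skip-header?]
  (let [lines (str/split-lines text)
        lines (if skip-header? (rest lines) lines)
        re    (re-pattern (java.util.regex.Pattern/quote delimiter))]
    ;; mapv returns a vector of vectors so the result is easy to inspect
    (mapv #(str/split % re -1) lines)))

(def csv-text
  (str "exchange,stock_symbol,date,open,high,low,close,volume,adj\n"
       "NYSE,AA,2008-03-05,37.01,37.9,36.13,36.6,17752400,36.6\n"
       "NYSE,AA,2008-03-04,38.85,39.28,38.26,38.37,11279900,38.37\n"))

(read-records csv-text "," false) ;; => 3 rows, the first being the column names
(read-records csv-text "," true)  ;; => 2 rows of 9 strings each
```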

Next, if you include a vector of classes with the :classes keyword, the tap will do class conversions on the fields for you:

(hfs-delimited "/path/to/data"
               :delimiter ","
               :classes [String String String Float Float Float Float Integer Float]
               :skip-header? true)

;; produces 9-tuples with the above classes -- numbers are parsed properly, strings stay strings.
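One way to picture the :classes conversion is a lookup from each class to a parse function, applied position by position. This is a hypothetical sketch: `parsers` and `coerce-fields` are illustrative names, and only the three classes used in the example are covered:

```clojure
;; Hypothetical sketch (not Cascalog's code): map each class in a :classes
;; vector to a function that parses the raw string field into that type.
(def parsers
  {String  identity
   Float   (fn [^String s] (Float/valueOf s))
   Integer (fn [^String s] (Integer/valueOf s))})

(defn coerce-fields
  "Converts a vector of string fields using the parallel vector of classes."
  [classes fields]
  (mapv (fn [cls field]
          (let [parse (or (get parsers cls)
                          ;; classes missing from the table fail loudly here
                          (throw (ex-info (str "No parser for " cls) {:class cls})))]
            (parse field)))
        classes
        fields))

(def row-classes [String String String Float Float Float Float Integer Float])

(def raw-row ["NYSE" "AA" "2008-03-05" "37.01" "37.9" "36.13" "36.6" "17752400" "36.6"])

(coerce-fields row-classes raw-row)
```

Keeping the classes in a vector parallel to the fields means the conversion is purely positional, which is why the :classes vector has to line up with the column order in the file.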

Finally, by providing :outfields you gain the ability to select out specific fields by name:

(def stock-tap
  (hfs-delimited "/path/to/data"
                 :delimiter ","
                 :outfields ["?exchange" "?stock-sym" "?date" "?open" "?high" "?low" "?close" "?volume" "?adj"]
                 :classes [String String String Float Float Float Float Integer Float]
                 :skip-header? true))


(select-fields stock-tap ["?stock-sym" "?open"])
;; returns 2-tuples of [String, Float] pairs representing the stock symbol and opening price for each line.
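Conceptually, selecting by name amounts to pairing each :outfields name with the value in the same position and then looking up the requested names. The sketch below is hypothetical (`select-by-name` and `parsed-row` are illustrative, and the row uses plain Clojure numbers in place of the Float/Integer values a real tap would produce):

```clojure
;; Hypothetical sketch, not Cascalog's implementation: name-based selection
;; pairs each :outfields name with the value in the same position, then
;; looks up the requested names in the order given.
(def outfields
  ["?exchange" "?stock-sym" "?date" "?open" "?high" "?low" "?close" "?volume" "?adj"])

(defn select-by-name
  [field-names wanted row]
  (let [by-name (zipmap field-names row)]
    ;; unknown names come back as nil in this sketch
    (mapv by-name wanted)))

(def parsed-row ["NYSE" "AA" "2008-03-05" 37.01 37.9 36.13 36.6 17752400 36.6])

(select-by-name outfields ["?stock-sym" "?open"] parsed-row) ;; => ["AA" 37.01]
```

In the sketch, the order of the requested names decides the order of the output tuple, independent of the column order in the file.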

Looking forward to hearing your feedback! The API here will probably change a bit before release, so get your notes in now.

Cheers,


http://grokbase.com/t/gg/cascalog-user/123ky5apsx/new-taps-hfs-delimited-and-lfs-delimited
