A Unix Utility You Should Know About: Pipe Viewer

简介: Hi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about.

Hi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about. The articles will discuss one Unix program at a time. I'll try to write a good introduction to the tool and give as many examples as I can think of.

Before I start, I want to clarify one thing - Why am I starting so many article series? The answer is that I want to write about many topics simultaneously and switch between them as I feel inspired.

The first post in this series is going to be about not so well known Unix program called Pipe Viewer or pv for short. Pipe viewer is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Update: French translation available.

Pipe viewer is written by Andrew Wood, an experienced Unix sysadmin. The homepage of pv utility is here: pv utility.

If you feel like you are interested in this stuff, I suggest that you subscribe to my rss feed to receive my future posts automatically.

How to use pv?

Ok, let's start with some really easy examples and progress to more complicated ones.

Suppose that you had a file "access.log" that is a few gigabytes in size and contains web logs. You want to compress it into a smaller file, let's say a gunzip archive (.gz). The obvious way would be to do:

$ gzip -c access.log > access.log.gz

As the file is so huge (several gigabytes), you have no idea how long to wait. Will it finish soon? Or will it take another 30 mins?

By using pv you can precisely time how long it will take. Take a look at doing the same through pv:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59

Pipe viewer acts as "cat" here, except it also adds a progress bar. We can see that gzip processed 611MB of data in 11 seconds. It has processed 15% of all data and it will take 59 more seconds to finish.

You may stick several pv processes in between. For example, you can time how fast the data is being read from the disk and how much data is gzip outputting:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source:  760MB 0:00:15 [37.4MB/s] [=>     ] 19% ETA 0:01:02
  gzip: 34.5MB 0:00:15 [1.74MB/s] [  <=>  ]

Here we specified the "-N" parameter to pv to create a named stream. The "-c" parameter makes sure the output is not garbaged by one pv process writing over the other.

This example shows that "access.log" file is being read at a speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can immediately calculate the compression rate. It's 37.4/1.74 = 21x!

Notice how the gzip does not include how much data is left or how fast it will finish. It's because the pv process after gzip has no idea how much data gzip will produce (it's just outputting compressed data from input stream). The first pv process, however, knows how much data is left, because it's reading it.

Another similar example would be to pack the whole directory of files into a compressed tarball:

$ tar -czf - . | pv > out.tgz
 117MB 0:00:55 [2.7MB/s] [>         ]

In this example pv shows just the output rate of "tar -czf" command. Not very interesting and it does not provide information about how much data is left. We need to provide the total size of data we are tarring to pv, it's done this way:

$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
 253MB 0:00:05 [46.7MB/s] [>     ]  1% ETA 0:04:49

What happens here is we tell tar to create "-c" an archive of all files in current dir "." (recursively) and output the data to stdout "-f -". Next we specify the size "-s" to pv of all files in current dir. The "du -sb . | awk '{print $1}'" returns number of bytes in current dir, and it gets fed as "-s" parameter to pv. Next we gzip the whole content and output the result to out.tgz file. This way "pv" knows how much data is still left to be processed and shows us that it will take yet another 4 mins 49 secs to finish.

Another fine example is copying large amounts of data over network by using help of "nc" utility that I will write about some other time.

Suppose you have two computers A and B. You want to transfer a directory from A to B very quickly. The fastest way is to use tar and nc, and time the operation with pv.

# on computer A, with IP address 192.168.1.100
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 192.168.1.100 6666 | pv | tar -xf -

That's it. All the files in /path/to/dir on computer A will get transferred to computer B, and you'll be able to see how fast the operation is going.

If you want the progress bar, you have to do the "pv -s $(...)" trick from the previous example (only on computer A).

Another funny example is by my blog reader alexandru. He shows how to time how fast the computer reads from /dev/zero:

$ pv /dev/zero > /dev/null
 157GB 0:00:38 [4,17GB/s]

That's about it. I hope you enjoyed my examples and learned something new. I love explaining things and teaching! :)

How to install pv?

If you're on Debian or Debian based system such as Ubuntu do the following:

$ sudo aptitude install pv

If you're on Fedora or Fedora based system such as CentOS do:

$ sudo yum install pv

If you're on Slackware, go to pv homepage, download the pv-version.tar.gz archive and do:

$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && sudo make install

If you're a Mac user:

$ sudo port install pv

If you're OpenSolaris user:

$ pfexec pkg install pv

If you're a Windows user on Cygwin:

$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install

The manual of the utility can be found here man pv.

Have fun measuring your pipes with pv, and until next time!

A question to my readers: what other not so well known Unix utilities do you use and/or know about?

相关文章
|
Unix Perl Ubuntu
A Unix Utility You Should Know About: Pipe Viewer
Hi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about.
926 0
|
Unix Shell Windows
A Unix Utility You Should Know About: Netcat
This is the second post in the article series about Unix utilities that you should know about.
932 0
|
Unix Perl Ubuntu
A Unix Utility You Should Know About: Pipe Viewer
Hi all. I'm starting yet another article series here.
783 0
|
Unix Shell Windows
A Unix Utility You Should Know About: Netcat
This is the second post in the article series about Unix utilities that you should know about.
856 0
|
网络协议 Unix Apache
A Unix Utility You Should Know About: lsof
This is the third post in the article series about Unix and Linux utilities that you should know about.
839 0
|
Unix Shell Windows
A Unix Utility You Should Know About: Netcat
This is the second post in the article series about Unix utilities that you should know about.
1207 0
|
网络协议 Unix Apache
A Unix Utility You Should Know About: lsof
This is the third post in the article series about Unix and Linux utilities that you should know about.
1245 0
|
7月前
|
Unix Shell Linux
在Unix/Linux操作系统中,Shell脚本广泛用于自动化任务
在Unix/Linux操作系统中,Shell脚本广泛用于自动化任务
71 2
|
5天前
|
Unix Linux 编译器
UNIX/Linux 上的安装
UNIX/Linux 上的安装。
21 2
|
2月前
|
Unix 物联网 大数据
操作系统的演化与比较:从Unix到Linux
本文将探讨操作系统的历史发展,重点关注Unix和Linux两个主要的操作系统分支。通过分析它们的起源、设计哲学、技术特点以及在现代计算中的影响,我们可以更好地理解操作系统在计算机科学中的核心地位及其未来发展趋势。