Cloud and Grid in the Eyes of a Foreign Technology “Guru”


[Ian Foster's blog]  You’ve probably seen the recent flurry of news concerning “Cloud computing.”  Business Week had a long article on it (with an amusing and pointed critique here). Nick Carr has even written a book about it. So what is it about, what is new, and what does it mean for information technology?

 

The basic idea seems to be that in the future, we won’t compute on local computers, we will compute in centralized facilities operated by third-party compute and storage utilities. To which I say, Hallelujah, assuming that it means no more shrink-wrapped software to unwrap and install.

 

Needless to say, this is not a new idea. In fact, back in 1960, computing pioneer John McCarthy predicted that “computation may someday be organized as a public utility”—and went on to speculate how this might occur.

 

In the mid 1990s, the term grid was coined to describe technologies that would allow consumers to obtain computing power on demand. I and others posited that by standardizing the protocols used to request computing power, we could spur the creation of a computing grid, analogous in form and utility to the electric power grid. Researchers subsequently developed these ideas in many exciting ways, producing for example large-scale federated systems (TeraGrid, Open Science Grid, caBIG, EGEE, Earth System Grid, …) that provide not just computing power, but also data and software, on demand. Standards organizations (e.g., OGF, OASIS) defined relevant standards. More prosaically, the term was also co-opted by industry as a marketing term for clusters. But no viable commercial grid computing providers emerged, at least not until recently.

 

So is “cloud computing” just a new name for grid? In information technology, where technology scales by an order of magnitude, and in the process reinvents itself, every five years, there is no straightforward answer to such questions.

 

Yes: the vision is the same—to reduce the cost of computing, increase reliability, and increase flexibility by transforming computers from something that we buy and operate ourselves to something that is operated by a third party.

 

But no: things are different now than they were 10 years ago. We have a new need to analyze massive data, thus motivating greatly increased demand for computing. Having realized the benefits of moving from mainframes to commodity clusters, we find that those clusters are darn expensive to operate. We have low-cost virtualization. And, above all, we have multiple billions of dollars being spent by the likes of Amazon, Google, and Microsoft to create real commercial grids containing hundreds of thousands of computers. The prospect of needing only a credit card to get on-demand access to 100,000+ computers in tens of data centers distributed throughout the world, resources that can be applied to problems with massive, potentially distributed data, is exciting! So we’re operating at a different scale, and operating at these new, more massive scales can demand fundamentally different approaches to tackling problems. It also enables—indeed is often only applicable to—entirely new problems.

 

Nevertheless, yes: the problems are mostly the same in cloud and grid. There is a common need to be able to manage large facilities; to define methods by which consumers discover, request, and use resources provided by the central facilities; and to implement the often highly parallel computations that execute on those resources. Details differ, but the two communities are struggling with many of the same issues.
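As a rough illustration of what such consumer-facing methods might look like, here is a minimal Python sketch; the class and field names (ComputeFacility, ResourceOffer, and so on) are hypothetical and do not correspond to any real grid or cloud API.

```python
# Hypothetical sketch of the consumer-facing interface described above:
# discover resources at a central facility, request an allocation, and
# run a (possibly highly parallel) computation on it.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResourceOffer:
    provider: str              # e.g. a federated facility or a commercial cloud
    cpus: int                  # cores offered
    price_per_cpu_hour: float  # illustrative pricing field


class ComputeFacility:
    """Abstract view of a central facility, as a consumer would see it."""

    def discover(self, min_cpus: int) -> List[ResourceOffer]:
        """Return offers that satisfy the consumer's minimum requirements."""
        raise NotImplementedError

    def request(self, offer: ResourceOffer, hours: float) -> str:
        """Reserve the offered resources; return an opaque reservation id."""
        raise NotImplementedError

    def run(self, reservation_id: str, task: Callable[[], object]) -> object:
        """Execute a computation on the reserved resources."""
        raise NotImplementedError
```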

 

Unfortunately, at least to date, the methods used to achieve these goals in today’s commercial clouds have not been open and general purpose, but have instead been mostly proprietary and specialized for the specific internal uses (e.g., large-scale data analysis) of the companies that developed them. The idea that we might want to enable interoperability between providers (as in the electric power grid) has not yet surfaced. Grid technologies and protocols speak precisely to these issues, and should be considered.

 

      A final point of commonality: we seem to be seeing the same marketing. The first “cloud computing clusters”—remarkably similar to the “grid clusters” of a few years ago—are appearing. Perhaps Oracle 11c is on the horizon?

 

      What does the future hold? I will hazard a few predictions, based on my belief that the economics of computing will look more and more like those of energy. Neither the energy nor the computing grids of tomorrow will look like yesterday’s electric power grid. Both will move towards a mix of microproduction and large utilities, with increasing numbers of small-scale producers (wind, solar, biomass, etc., for energy; for computing, local clusters and embedded processors—in shoes and walls?) co-existing with large-scale regional producers, and load being distributed among them dynamically. Yes, I know that computing isn’t really like electricity, but I do believe that we will nevertheless see parallel evolution, driven by similar forces.

 

      In building this distributed “cloud” or “grid” (“groud”?), we will need to support on-demand provisioning and configuration of integrated “virtual systems” providing the precise capabilities needed by an end-user. We will need to define protocols that allow users and service providers to discover and hand off demands to other providers, to monitor and manage their reservations, and arrange payment. We will need tools for managing both the underlying resources and the resulting distributed computations. We will need the centralized scale of today’s cloud utilities, and the distribution and interoperability of today’s grid facilities.
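To make the hand-off idea concrete, here is a small, purely illustrative Python sketch (all names invented) of a provider that satisfies a demand locally when it can, forwards it to a peer provider when it cannot, and returns a single reservation handle either way.

```python
# Illustrative sketch of demand hand-off between providers; not a real protocol.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Demand:
    cpus: int
    hours: float


@dataclass
class Reservation:
    provider: str
    reservation_id: str
    cost_estimate: float


class Provider:
    def __init__(self, name: str, free_cpus: int, price: float,
                 peers: Optional[List["Provider"]] = None):
        self.name = name
        self.free_cpus = free_cpus
        self.price = price          # per cpu-hour, illustrative
        self.peers = peers or []

    def provision(self, demand: Demand) -> Optional[Reservation]:
        # Satisfy locally if capacity allows...
        if demand.cpus <= self.free_cpus:
            self.free_cpus -= demand.cpus
            cost = demand.cpus * demand.hours * self.price
            return Reservation(self.name, f"{self.name}-res-{id(demand)}", cost)
        # ...otherwise hand the demand off to a peer provider.
        for peer in self.peers:
            reservation = peer.provision(demand)
            if reservation is not None:
                return reservation
        return None


# Usage: a small local cluster forwards an oversized demand to a larger utility.
big = Provider("regional-utility", free_cpus=100_000, price=0.10)
small = Provider("local-cluster", free_cpus=64, price=0.05, peers=[big])
print(small.provision(Demand(cpus=1_000, hours=2.0)))
```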

 

      Some of the required protocols and tools will come from the smart people at Amazon and Google. Others will come from the smart people working on grid. Others will come from those creating whatever we call this stuff after grid and cloud. It will be interesting to see to what extent these different communities manage to find common cause, or instead proceed along parallel paths.

 


==============  Below are some comments from readers  =========================

Lisa Childers says: I agree. While a scientist running a simulation may rely on the computing power provided by a cloud (or petascale machine, or whatever) for runs of a massive scale, results will often be propagated within the interested scientific community for further analysis. There is life outside the cloud!

 

Bert Armijo says: “the methods used to achieve these goals in today’s commercial clouds have not been open and general purpose, but instead been mostly proprietary and specialized for the specific internal uses (e.g., large-scale data analysis) of the companies that developed them”

I agree, but this is a common first step in technological development. Those who encounter the need first build solutions of necessity.

In the case of cloud/utility computing, general purpose software and computer hardware vendors didn’t foresee or respond to the needs of internet operators who needed to run massive server farms because they were focused on corporate IT. As a direct result Google, Amazon, Yahoo, and others were forced to build their own infrastructure systems which naturally are tailored to their unique needs.

However, while well publicized, these are not the only systems. We at 3tera have built a grid OS specifically for enabling utility computing services. Plus, just to further your point about mixing microproduction and large utilities, rather than build our own data centers we’re partnering with commodity hosting providers who already operate numerous data centers and hundreds of thousands of servers.

 

Michael Behrens says: Thanks for the blog entry, Ian.
I would like to see less focus on clouds and more focus on grid 2.0 capabilities, which might include autonomic application/service virtualization and transparency. I’m looking forward to future advances in distributed computing made by academia and industry. And I hope that all of us, together with the Standards Developing Organizations (SDOs), can solidify them into adopted interoperable standards.

 

Marlon Pierce says: I particularly agree with your point that the next generation of computing will need to range seamlessly from "micro-production" computers to mega-clusters. The thought that I would want to rely solely on online cloud computing and data services when I can have a supercomputer under my desk seems counter-intuitive. Most people will always be consumers of services, but there will be an increasing number of "long-tail" (sorry) service providers, including home enthusiasts. There should therefore be an important opportunity to provide "home edition" cloud and Web service software to these folks.

 

Paul Wallis says: Ian,

During 2003, the late Jim Gray made an analysis of Distributed Computing Economics:

“’On Demand’ computing is only economical for very cpu-intensive (100,000 instructions per byte or a cpu-day-per gigabyte of network traffic) applications. Pre-provisioned computing is likely to be more economical for most applications - especially data-intensive ones.”

And

“If telecom prices drop faster than Moore’s law, the analysis fails. If telecom prices drop slower than Moore’s law, the analysis becomes stronger.”

Since then, telecom prices have fallen and bandwidth has increased, but more slowly than processing power, leaving the economics worse than in 2003.
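As a quick sanity check of the figures Gray quotes, and assuming a processor of roughly 1 GIPS (10^9 instructions per second, an assumption not stated in the comment itself), the two thresholds, 100,000 instructions per byte and one CPU-day per gigabyte of traffic, do indeed describe the same break-even point:

```python
# Back-of-the-envelope check of Gray's break-even figures (1 GIPS CPU assumed).
instructions_per_byte = 100_000
bytes_per_gb = 10**9
instructions_per_gb = instructions_per_byte * bytes_per_gb        # 1e14 instructions

cpu_instructions_per_second = 10**9                               # ~1 GIPS, assumed
cpu_instructions_per_day = cpu_instructions_per_second * 86_400   # ~8.6e13 per day

print(instructions_per_gb / cpu_instructions_per_day)             # ~1.2 CPU-days per GB
```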

By 2012, the proposed Blue Gene/Q is expected to operate at about 10,000 TFLOPS, outstripping Moore’s law by a factor of about 10.

I’ve tried to put The Cloud in historical context and discussed some of its forerunners here

http://www.keystonesandrivets.com/kar/2008/02/cloud-computing.html

My take is that:

“I’m sure that advances will appear over the coming years to bring us closer, but at the moment there are too many issues and costs with network traffic and data movements to allow it to happen for all but select processor intensive applications, such as image rendering and finite modelling.”

I don’t know when enterprises will switch to “The Cloud,” but given current technological circumstances, and recent events like the Gulf cables being cut and Amazon S3 failing, businesses today are being asked to take a leap of faith in putting mission-critical applications in The Cloud.

 

Suraj Pandey says: With the advent of new requirements for large-scale computing, such as dynamic provisioning, massive storage space, on-the-fly computation, etc., researchers coined the new term "Cloud Computing," citing the inadequacies of grid in providing these facilities.
"Grid Computing," as it has emerged, is considered loosely coupled and fragile, unlike the new vision of Clouds.
The problems are the same; the scale is different. With multi-core technologies getting a firm grip on computers, the centralized Grid or Cloud is already becoming a reality.

 

Sambath Narayanan says: It appears to me that cloud is a new name given to Grid. One of the key differences between Grid computing and Cloud computing lies in the nature of the applications. The applications that run on clouds seem to be general purpose, ranging from traditional scientific applications to modern social networking applications. Ian describes in his book a ’managed, shared, virtual system’ as the next step in the evolution of Grid. Maybe this is nothing but Cloud.

 

igre says: Interesting reading, and I agree that the methods used to achieve these goals in today’s commercial clouds have not been open and general purpose, but instead mostly proprietary and specialized for the specific internal uses of the companies that developed them.

 

Snehal Antani says: Clouds and Grids are complements, not substitutes. Moreover, they solve very different problems and should really be treated as such. Unfortunately, though, Cloud/Grid/Utility are terms that have become very overloaded :).

Cloud Computing is about the dynamic provisioning of logical partitions (LPARs) and leveraging utility-computing technologies like metering and workload management to provide chargeback and goals-oriented execution for the physical resources consumed by those LPARs. So Amazon EC2, for example, can quickly and cheaply provision new OS images (LPARs) on which applications can run. The LPARs don’t care about the applications, what they do, what data they need to access, etc.
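As a toy illustration of the metering and chargeback idea described here (the rates, class names, and partition identifier are all made up):

```python
# Toy metering/chargeback model: each provisioned partition is metered for the
# resources it consumes, and its tenant is charged back accordingly.
from dataclasses import dataclass
from typing import Dict


@dataclass
class Partition:
    tenant: str
    cpu_hours: float = 0.0
    gb_hours: float = 0.0


class Meter:
    RATES = {"cpu_hours": 0.10, "gb_hours": 0.05}   # illustrative prices

    def __init__(self) -> None:
        self.partitions: Dict[str, Partition] = {}

    def provision(self, partition_id: str, tenant: str) -> None:
        self.partitions[partition_id] = Partition(tenant)

    def record_usage(self, partition_id: str, cpu_hours: float, gb_hours: float) -> None:
        p = self.partitions[partition_id]
        p.cpu_hours += cpu_hours
        p.gb_hours += gb_hours

    def chargeback(self, partition_id: str) -> float:
        p = self.partitions[partition_id]
        return p.cpu_hours * self.RATES["cpu_hours"] + p.gb_hours * self.RATES["gb_hours"]


m = Meter()
m.provision("lpar-1", tenant="analytics-team")
m.record_usage("lpar-1", cpu_hours=24.0, gb_hours=100.0)
print(m.chargeback("lpar-1"))   # 24*0.10 + 100*0.05 = 7.40
```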

Grid Computing is about the coordinated execution of some complex task across a collection of resources. For example, protein folding is a complex task which could be broken into discrete units of work, and each unit of work could be executed concurrently across a cluster of servers. The grid application infrastructure has the burden of creating the partitions of work and providing operational management (start, stop, dispatching, result aggregation, etc.) of those discrete chunks.
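A minimal sketch of this partition, dispatch, and aggregate pattern, using a local process pool to stand in for a cluster of servers (the "work" is deliberately trivial):

```python
# Break a large task into discrete units of work, execute them concurrently,
# and aggregate the results.
from concurrent.futures import ProcessPoolExecutor


def work_unit(chunk: range) -> int:
    # Stand-in for an expensive computation on one partition of the problem.
    return sum(i * i for i in chunk)


def run_grid_job(n: int, workers: int = 4) -> int:
    step = n // workers
    chunks = [range(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(work_unit, chunks)   # dispatch units of work
    return sum(partial_results)                          # aggregate the results


if __name__ == "__main__":
    print(run_grid_job(1_000_000))
```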

Within grid computing there are sub-categories: compute grids and data grids. Compute Grids are responsible for breaking large tasks into discrete chunks, executing some computations on them, and aggregating the results. Data Grids are about partitioning data across a collection of resources for scalability and higher performance. You should see Compute Grids and Data Grids working together to provide a high-performance, scalable processing infrastructure. You can read more about building high performance grids at: http://www-128.ibm.com/developerworks/websphere/techjournal/0804_antani/0804_antani.html. For an example architecture where compute grids and data grids are working together, see this specific section: http://www-128.ibm.com/developerworks/websphere/techjournal/0804_antani/0804_antani.html#xdegc
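And a correspondingly small sketch of the data-grid side: records hash-partitioned across a few (hypothetical) nodes, with a compute-grid unit of work run against each local partition and the results aggregated.

```python
# Hash-partition records across data-grid nodes, then run a compute-grid unit
# of work against each node's local partition and aggregate the results.
from collections import defaultdict
from typing import Dict, List

NODES = ["node-a", "node-b", "node-c"]   # illustrative node names


def partition(records: List[str]) -> Dict[str, List[str]]:
    """Hash-partition records across the data grid's nodes."""
    placement: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        placement[NODES[hash(record) % len(NODES)]].append(record)
    return placement


def local_count(partition_records: List[str]) -> int:
    """A compute-grid unit of work executed against one local partition."""
    return len(partition_records)


data = [f"record-{i}" for i in range(10)]
placement = partition(data)
total = sum(local_count(recs) for recs in placement.values())   # aggregate
print(sorted(placement), total)
```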

So to summarize: Clouds are about the dynamic provisioning of LPARs and leveraging metering and WLM technologies to manage and charge for those logical partitions. Grid computing is an application infrastructure and programming style that allows complex tasks to be broken into smaller pieces and executed across a collection of resources. Grid Computing is about executing business logic quickly; Cloud Computing is about provisioning infrastructure. The grid computing infrastructure would run on top of a cloud computing infrastructure.

This isn’t all that new. Big Iron hardware, their operating systems, and the middleware stack have been doing this type of work for decades. The difference today is that Amazon EC2 can quickly and cheaply provision new LPARs, whereas LPARs in big iron machines are statically defined. Both leverage hardware virtualization (executing on shared hardware resources) and OS virtualization via a VM/hypervisor, have some application container for executing the business logic per some QoS requirements (application servers, etc.), and leverage some type of workload management mechanism for metering and goals-oriented execution.

There is nothing wrong with statically defined LPARs as long as we have very smart workload management, good hypervisors, and hardware virtualization technologies (see System Z). Dynamic provisioning of resources within the datacenter already exists; see Tivoli Provisioning Manager and other such technologies. The future will probably be a more integrated and cohesive hardware and software stack for "private clouds" and "enterprise grids" (the ’enterprise’ in "enterprise grids" implies that a high level of QoS, such as security, resource management, and transactions, is expected).

I also discuss this in another blog post: http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=214794&tstart=0

 

Sam Johnston says: Hi Ian,

Having spent the last few days working on finding a consensus definition for Cloud Computing, it seems there is a lot (too much, IMO) of focus on grid computing, and indeed confusion between the more general concept of cloud computing and the very specific grid computing solution to horizontal scalability. A grid, however intelligent, is one component of cloud computing rather than cloud computing itself... I don’t care if my cloud-based services are powered by a mainframe, an army of cheap PCs, my neighbour’s screensaver, or a cluster of Commodore 64s, so long as it’s fast, cheap, and secure - it just so happens that the best way to scale today is by strapping a bunch of PCs together... will this still be true when chips with thousands of cores arrive? I doubt it.

I agree fully with your vision of the creation of a computing grid (like the electricity grid), but I think the same opportunistic vendors and overnight experts who hijacked the term ’grid computing’ for local installations (the electrical equivalent of in-house diesel gensets) are trying to do the same with cloud computing.

Perhaps you will find my recent article (The Cloud and Cloud Computing consensus definition?) on the topic interesting.

Sam

 

Jacquette says: This post is fairly even-handed; to date, we have yet to see a commercially successful grid in operation. "But no viable commercial grid computing providers emerged, at least not until recently."
Industry has mostly used the word "grid" as a marketing term to promote its cluster products or standards.

This largely rejects the original "grid computing" ideal of "plug in and the power is there": "computation" is not like electricity, which has a single, standard way of being delivered.
Computing models are diverse: there is standalone computing, network computing, cluster computing, embedded computing, and so on.

Cloud computing now, unfortunately, shows the same tendency to become a "marketing term."

When a new idea is put forward, the important question is: what is its killer app, or killer scenario?
