What is the Actual Performance of HANA?


http://www.linkedin.com/pulse/what-actual-hana-performance-shaun-snapp?trk=hp-feed-article-title-publish

 

Introduction

I covered the topic of the actual performance of HANA versus competing databases in the article Which Is Faster, HANA or Oracle 12c? In this article I will cover the various database benchmarks on HANA and its competitors in more detail.

 

Who Performs Benchmark Testing in Databases?

The first thing to establish is that there is no independent body, such as a Consumer Reports, for database benchmarking. This means that the benchmarks I reviewed were performed by the vendors themselves. This is obviously a major issue.

Let us enumerate the problems with having no independent source for benchmarking as it relates to databases.

Selective Release: A vendor would never release a benchmark that showed it losing to a competing vendor across the board. The results would have to be positive for the vendor in some dimension, and more positive than negative, for them to be released. This parallels pharmaceutical drug testing, where negative studies tend to go unpublished:

“…studies about antidepressants made the drugs appear to work much better than they really did. Of 74 antidepressant studies registered with the FDA, 37 studies that showed positive results ended up being published. By contrast, studies that showed iffy or negative results mostly ended up going unpublished or had their data distorted to appear positive, Turner found. The missing or skewed studies helped create the impression that 94 percent of antidepressant trials had produced positive results, according to Turner's analysis, published in the New England Journal of Medicine. In reality, all the studies together showed just 51-percent positive results.”

For instance, a past analysis of clinical trials supporting new drugs approved by the FDA showed that just 43 percent of more than 900 trials on 90 new drugs ended up being published. In other words, about 60 percent of the related studies remained unpublished even five years after the FDA had approved the drugs for market. That meant physicians were prescribing the drugs and patients were taking them without full knowledge of how well the treatments worked. – LiveScience

We will address this topic directly, as it appears that SAP is doing the same thing with its OLTP benchmarks for HANA.

Skill Familiarity Bias: A vendor will always have more skill with its own solution than with a competitor's solution. Add to that the fact that databases can be “tuned up,” that the selected hardware differs, and a number of other differences, and even if a vendor were 100% above board, it would still tend to observe better performance in its own solution than in a competing one.

Hardware Bias: The vendors spare no expense on hardware for these tests. Customers will often purchase hardware with a lower specification than what the vendor used.

The Laboratory Environment Bias: The hardware and database are run in a “lab” environment, with no other batch jobs pulling at their resources, which is of course unrealistic. Therefore the performance shown in a benchmark would normally not be attainable in a production setting. I see benchmark results as more comparable between different benchmarks than between a benchmark and a production environment.

Sales Bias: Every benchmark paper I looked at had one clear purpose: to improve the sales of the product benchmarked by the vendor that wrote the paper.

Interpretational Bias: The benchmarks that are released are then viewed through a prism of bias, that is, by people who have an incentive to prefer a particular software vendor. One entity that has published inaccurate information about benchmarks, right in line with its financial bias, is the consulting firm Bluefin, which is overall one of the least reliable providers of information on HANA.

 

The Benchmark Tests

The following benchmarks, performed for these databases, were reviewed.

SAP OLTP Benchmark: This is a benchmark for transaction processing, that is, the things that ERP systems do the most, such as recording journal entries and decrementing inventory when performing a goods issue.

SAP BW-EML (Business Warehouse Enhanced Mixed Load) Benchmark: This is an analytics benchmark. (A short sketch contrasting the two workload shapes follows below.)
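To make the distinction concrete, here is a minimal sketch of the two workload shapes. It is illustrative only: the table and column names are invented, and SQLite is used purely so the snippet runs anywhere; it says nothing about HANA's SQL dialect or the benchmarks' actual schemas.

```python
# Illustrative only: hypothetical table/column names; SQLite keeps it runnable.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales_orders (order_id INTEGER PRIMARY KEY, "
            "material TEXT, qty INTEGER, amount REAL, created_on TEXT)")

# OLTP-style work (what an OLTP/SD benchmark exercises): many small inserts
# and updates, each touching one or a few rows.
cur.execute("INSERT INTO sales_orders VALUES (?, ?, ?, ?, ?)",
            (1, "MAT-100", 5, 250.0, "2016-01-15"))
cur.execute("UPDATE sales_orders SET qty = qty - 1 WHERE order_id = ?", (1,))

# Analytics-style work (what BW-EML exercises): scan-and-aggregate queries
# that touch many rows but only a few columns.
cur.execute("SELECT material, SUM(amount) FROM sales_orders "
            "WHERE created_on >= '2016-01-01' GROUP BY material")
print(cur.fetchall())
```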

 

SAP’s Missing Benchmarks

For years SAP released an OLTP benchmark for databases. However, with HANA, SAP stopped releasing this benchmark. Database design would predict that HANA would perform poorly in this benchmark, and this is the most likely reason why SAP never produced it. However, the consulting firm Bluefin has the following way of covering this up:

“The SAP HANA platform was designed to be a data platform on which to build the business applications of the future. One of the interesting impacts of this is that the benchmarks of the past (e.g. Sales/Distribution) were not the right metric by which to measure SAP HANA.” – Behind the SAP BW EML Benchmark

At no point in this article does John Appleby declare that he has a quota, or leads a group with a quota, to sell HANA. He presents himself as if he were a disinterested third party. That is problem number one. The second problem is that what Appleby says in this quotation amounts to gibberish.

S4 has a Sales module. This module will perform the same functions as the current ECC SD module. Will there be analytics involved in the Sales module? Of course. However, there will also be transactions, that is, OLTP, performed. S4 Sales will record sales orders, update sales orders, and so on.

Therefore it is demonstrably untrue that an OLTP benchmark is now irrelevant because “the platform was designed to be a data platform on which to build business applications of the future.” That sentence is just a straight-up lie, and one would have to twist oneself into a pretzel to try to defend it. The person seems to be preparing to run for political office.

Appleby’s interpretation of the BW-EML benchmark contains other nonsense, such as the following:

“the configuration used by published results is the stock installation…there are not performance constructs like additional indexes or aggregates in use.”

The reason this is nonsensical is that column-oriented databases don’t use indexes. They don’t need them. Why Appleby is impressed by this is a head-scratcher. How many times has it been established that the primary reason for the reduction in the database footprint is the removal of indexes? If this is widely accepted, why is it surprising to Appleby that the BW-EML benchmark for a column-oriented database does not use indexes?
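To make this point concrete, here is a toy sketch (invented data, not HANA internals or any vendor's implementation) of the difference between a row-oriented and a column-oriented layout, and of why an analytic filter against a column store is a scan over a compact array rather than an index lookup.

```python
# Toy illustration of row store vs. column store; the data is made up.
rows = [
    {"order_id": 1, "region": "EMEA", "amount": 250.0},
    {"order_id": 2, "region": "APJ",  "amount": 400.0},
    {"order_id": 3, "region": "EMEA", "amount": 120.0},
]

# Row store: each record is kept together; selective lookups lean on indexes.
# Column store: each attribute is kept as its own contiguous array.
columns = {
    "order_id": [1, 2, 3],
    "region":   ["EMEA", "APJ", "EMEA"],
    "amount":   [250.0, 400.0, 120.0],
}

# Analytic question "total amount for EMEA": scan only the two columns needed,
# no separate index structure required.
total = sum(amt for reg, amt in zip(columns["region"], columns["amount"])
            if reg == "EMEA")
print(total)  # 370.0
```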

On the topic of aggregates, HANA does use aggregates, it just does not call them aggregates, so what Appleby is saying here is incorrect, although there are fewer of them. Hasso Plattner has had an obsession with eliminating aggregates for some time and rails against them in his articles and books, but in many cases aggregates are beneficial. Contrary to what Hasso Plattner states, not everything needs to be constantly recalculated, and not everything needs to be recalculated every time it is accessed; that is just a waste of processing cycles. Let us take an example. Say we want a report of all the sales orders that a company has processed over the past 3 months. This report was processed and aggregated along different dimensional attributes yesterday. Under Hasso Plattner’s logic this aggregate is worthless because it is pre-calculated. However, let us look at that statement in detail.

Let us say that the aggregate was calculated yesterday exactly 24 hours prior.

1 day is roughly 1/90th of a 3 month period.

If we look back 90 days in the report, we would see, say, 100,000 sales orders. That is an average of roughly 1,111 sales orders per day (yes, weekends would be lower than workdays, but the average is about 1,111).

Now let us say that the day that drops off when we run the report anew had 1,500 sales orders created (a high day), and that the day that was added, meaning yesterday plus the hours up to the present hour, had 700 sales orders (a low day).

So instead of looking at 100,000 sales orders, we are now looking at 99,200 sales orders: 1,500 − 700 = 800 fewer sales orders, a change of 0.8% in the number of sales orders. Is that a real problem? Is the last 24 hours more representative than the 24-hour period from 3 months ago? Probably not. But if it is, how much more should the company be willing to spend to get rid of all aggregates? And are there other investments that might be a better use of that money?
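As a sanity check on the arithmetic, the same back-of-the-envelope calculation is written out below; the order counts are the illustrative figures from the example above, not real data.

```python
# Rolling 90-day window: replace the oldest day with the newest (partial) day.
window_total = 100_000  # sales orders in the window as of yesterday's aggregate
dropped_day  = 1_500    # orders on the day that falls out of the window (a high day)
added_day    = 700      # orders on the day that enters the window (a low day)

new_total = window_total - dropped_day + added_day   # 99,200
change = (window_total - new_total) / window_total   # 0.008 -> 0.8%
print(new_total, f"{change:.1%}")
```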

There is an unlimited number of scenarios that could be imagined to test the importance of removing aggregates. For instance, if just two days of sales orders were reviewed, the company would see a much larger variation. Generally, however, the need for instantly recalculated information is greatly overestimated in vendor marketing documentation, and in analytics vendor documentation in particular. I am preparing a future article that describes testing the long-held belief that forecasting information must be frequently updated with the most recent sales history to obtain the highest forecast accuracy. I have been testing actual client data, from a client with difficult-to-forecast sales history, and will show that, as with the tests I performed at previous clients, recency is actually not all that important and contributes little to forecast accuracy.

So while there can be scenarios where getting the most up-to-date information is critical, SAP tends to take these few scenarios and generalize them as “normal,” when in fact they tend to be the exceptions. Hasso Plattner has a way of presenting things that are often quite grey as black or white. And of course, all of Hasso Plattner’s examples have the peculiar and consistent outcome of handing more money over to SAP. Unlike Hasso Plattner, I don’t make more money by exaggerating, and therefore his proposals tend to come off as sales fluff, at least to me.

 

Logic for the Improved Analytical Performance of Column-Oriented Databases

I found the following quotations from IDC to be a very good explanation of why column-oriented databases are so effective for analytics.

"The established approach to setting up a query/reporting database (ODS, data mart, data warehouse) has involved establishing indexes for all columns that might have value lookup operations in the queries. Many organizations now use columnar databases, which have the same relational characteristics as row-oriented databases but store the data in blocks of column rather than row data for speed of retrieval. This obviates the need for indexes and, in some cases, for cubes and materialized views." - IDC

"If live data is to be queried and updated at the same time, the queries must be very fast in order to avoid consuming resources on the database server and slowing down transactions. A number of vendors have created database technologies that optimize query performance by combining two key elements: query-optimized columnar organization for the data and memory-optimized database operations. In the case discussed here, however, there is an additional challenge, which is to maintain that data in a form that also supports a high-performance transactional database." – IDC

“Database In-Memory leverages a unique “dual-format” architecture that enables tables to be in memory simultaneously in a traditional row format and a new in-memory column format. The Oracle SQL Optimizer automatically routes analytic queries to the column format and OLTP queries to the row format, transparently delivering best-of-both-worlds performance. Oracle Database 12 c automatically maintains full transactional consistency between the row and the column formats, just as it maintains consistency between tables and indexes today.

·     Access only the columns that are needed.

·     Scan and filter data in a compressed format.

·     Prune out any unnecessary data within each column.

·     Use SIMD to apply filter predicates.” – Oracle

However, this does not mean, and Oracle is not implying, that a column-oriented database is better for applications outside of analytics. As far as I can determine from reading the perspectives of different database vendors on this topic, SAP is the only database vendor that proposes that a column-oriented design is better for all types of applications.
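The sketch below illustrates the general idea the quotations above describe: keep each attribute as a contiguous array, touch only the columns a query needs, and evaluate the filter predicate over the whole array at once, which is what lets an engine apply SIMD under the hood. This is a conceptual illustration using NumPy's vectorized operations, not Oracle's or SAP's implementation.

```python
# Conceptual columnar scan with a vectorized (SIMD-friendly) predicate; data is invented.
import numpy as np

amount = np.array([250.0, 400.0, 120.0, 900.0])
region = np.array(["EMEA", "APJ", "EMEA", "EMEA"])

mask = region == "EMEA"      # predicate evaluated over the entire column at once
print(amount[mask].sum())    # 1270.0 -- only the columns needed were touched
```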

 

Some of the Results

For instance, in the Oracle benchmark paper released in 2015, the benchmark was run on hardware similar to what SAP used in its BW-EML benchmark, but the paper leaves out the question of how many customers would actually use this hardware configuration. I don’t know myself, as I have not recorded the hardware specifications of many clients, but the hardware used by Oracle appeared quite advanced. At one point SAP’s benchmark used a machine with 1,536 GB of RAM. I have personally never heard of this much RAM being used on a server at any account that I have worked on. It probably exists, as there are very advanced companies out there doing scientific computing, but their number is small. At one point Oracle points out that the monster machine used by SAP beat Oracle’s BW-EML benchmark, but needed 3 times the amount of memory to do it. This brings up the question of whether SAP’s hardware was simply re-engineered to beat the Oracle benchmark. Did SAP first try the machine with 1,000 GB of RAM, then add 200 GB of RAM and test again, then add another 200 and test again, and so on, until it finally beat the Oracle score? In another benchmark, SAP installed 100 IBM servers in a single SAP HANA cluster. Furthermore, if no one outside of the NSA, Amazon AWS (which resells portions of its hardware over the cloud), or a scientific computing center will be willing to buy hardware of this size, how relevant are these benchmarks to the majority of HANA customers?

 

The Impact of Marketing on SAP Benchmarking

SAP needs to get marketing out of the process of releasing benchmarking information. In the benchmark publication SAP HANA Performance: Efficient Speed and Scale-Out for Real-Time Business Intelligence, I don’t need to see a cover plastered with stock photography of a man pulling a “fly” snowboarding maneuver, followed by an image of a bunch of men rowing together, along with a marketing-written introduction that uses a word salad of terms like NetWeaver components. This should be a scientific paper, not something word-smithed and couched in deceptive marketing language. SAP marketing must accept that not every paper produced by SAP needs to have its fingerprints on it. This is the type of BS writing that I am referring to:

“The drill-down queries (276 to 483 milliseconds) demonstrate SAP HANA’s aggressive support for ad hoc joins and, therefore, to provide unrestricted ability for users to “slice and dice” without having to first involve the technical staff to provide indexes to support it (as would be the case with a conventional database).”

Please do not use the term “slice and dice” in a technical paper, or the term “unrestricted,” or the colorful “HANA’s aggressive support.” This is not scientific terminology. SAP’s benchmarking paper needs to be completely rewritten using just the original data. Then at the end SAP includes quotations like the following:

“We have seen massive system speed improvements and increased ability to analyze the most detailed levels of customers and products.” – Colgate Palmolive

So this is an anecdote, and it sounds like it was written by Donald Trump (except it uses the word “massive” instead of “tremendous”). What is an anecdote doing in a benchmarking study? Does SAP marketing have any idea of what a study like this is actually supposed to contain?

 

Conclusion

When one compares what Bill McDermott, Hasso Plattner, SAP marketing, Bluefin, Deloitte, and others say about the game-changing aspects of HANA to the technical benchmarks, there is absolutely no correspondence. SAP invests comparatively little in benchmarking, but its marketing spending on HANA is off the charts. This is reminiscent of pharmaceutical companies, which spend far more on marketing than on research; and the research is mostly just running clinical trials, built on work that is performed by universities and publicly funded. I call this the illusion of innovation.

Oracle has provided compelling evidence that its 12c database outperforms SAP HANA. I say this while acknowledging that there is no independent body that performs database benchmarking. Oracle invests much more in database benchmarking, and its benchmarking studies are more transparent and make their case far better than SAP’s. For all the talk of HANA’s performance, SAP produces a single benchmark to support its supposed claims of superiority over Oracle 12c and others. While we do not have independent verification, sifting through the results it seems more likely than not that Oracle 12c is not just a little faster but far faster than HANA. Furthermore, while SAP has placed speed as the first priority in the design of its database, Oracle’s orientation is far more holistic, placing reliability first. Finally, given 12c’s design, it will almost certainly beat HANA easily for OLTP processing.

This article did not review the benchmarking of other database vendors. However, I find it more likely than not that vendors like Teradata, given the database talent they have, also offer a solution that is superior to HANA in performance. And the list of database vendors that can beat HANA is likely longer than just Oracle and Teradata.

I was not paid or otherwise compensated by any vendor or other entity to write this article.

References

Memory-Optimized Transactions and Analytics in One Platform: Achieving Business Agility with Oracle Database (IDC Sponsored by Oracle)

OracleVoice: Oracle Challenges SAP On In-Memory Database Claims, John Soat, Oracle.

Benchmark Results Reveal the Benefits of Oracle Database In-Memory for SAP Applications, Oracle White Paper, September 2015

http://www.livescience.com/8365-dark-side-medical-research-widespread-bias-omissions.html

http://www.livescience.com/5815-fraud-errors-misconceptions-medical-research.html

Behind the SAP BW EML Benchmark, John Appleby, Bluefin, March 19, 2015. 
