SQL Server Parallel Data Warehouse (PDW) 介绍

本文涉及的产品
云数据库 RDS SQL Server,基础系列 2核4GB
RDS SQL Server Serverless,2-4RCU 50GB 3个月
推荐场景:
云原生大数据计算服务 MaxCompute,5000CU*H 100GB 3个月
简介:

最近大数据概念非常火热,各个厂家都讲大数据视为未来IT的一个重要方向,因此各个厂家都想在这个领域有所作为。前几天参加了IBM大数据研讨会,会上IBM推出了他们针对于大数据的解决方案,三种一体机(PureSystem,另外IBM在推出了DB2 v10,为了打Oracle RAC专门设计的PureScale正式加入了DB2大版本中)。

 

在MPP架构方面,以前微软是被诟病的,缺乏产品应对大数据的挑战。之后从网上查了一下发现微软从2008 R2之后也发布了MPP数据仓库架构,并且在今年会推出自己的一体机。

 

针对于SQL Server PDW机构,有一篇文章讲的非常详细,大家有兴趣可以看一下:(文章来源于:http://www.jamesserra.com/archive/2011/08/microsoft-sql-server-parallel-data-warehouse-pdw-explained/

 

Microsoft SQL Server Parallel Data Warehouse (PDW), formally called by its code name “Project Madison”, is an edition of Microsoft’s SQL Server 2008 R2 that was released in December 2010. PDW is Microsoft’s reworking of theDatAllegro Inc. massive parallel processing (MPP) product that Microsoftacquired in July 2008. It only works with certain hardware (two so far), the first of which isHP Enterprise Data Warehouse Appliance (Dell is the other, with a couple more to come in the near future: IBM and Bull). This edition of SQL Server can’t be bought as an independent piece of software, it has to be bought along with the hardware.

So what is MPP? Until now, SQL Server has been a Symmetric Multiprocessing (SMP) solution, which essentially means it uses one server. MPP provides scalability and query performance by running independent servers in parallel. That is the quick definition. For more details, read What MPP means to SQL Server Parallel Data Warehouse.

MPP is also available from other companies such as EMC GreenplumTeradataOracle ExadataHP Vertica, andIBM Netezza, but those use proprietary systems, where PDW can be used with commodity hardware, providing a much lower cost per terabyte. But it’s still not in-expensive: The hardware and installation will cost around $2 million (not including software licenses), but gets you “200 times faster queries and 10 times the scalable than traditional Microsoft SQL Server deployments” (seepress release). PDW also comes with its own support model.

Microsoft has had clustering capabilities in SQL Server for a while, but the scalability part was lacking. This is where PDW comes in. Scalability in PDW means handling tens of terabytes of data and then moving to hundreds of terabytes worth (up to 600 TB). At about 50 terabytes to 60 terabytes of data, clustering is needed; thereafter, clustering starts to approach its limits, and that is when you need to move to PDW. Clustering brings concurrency to the system and reduces load, but it can’t reduce the time that a single query would take without any resource latency. To break this barrier, parallelism would be required to execute bits of the same request simultaneously and this is what exactly this setup would bring to the table. PDW partitions large tables across multiple physical nodes, each having its own dedicated CPU, memory, storage, and each running its own instance of SQL Server in a parallel shared nothing design. Tables can either be replicated, where a copy will be on each node (usually for dimension tables), or distributed, where portions of a table are uniformly distributed across all nodes (usually for fact tables).

One drawback to PDW is that it does not use SQL Server Management Studio, but uses a third-party tool calledNexus Chameleon (this third-party tool is needed because SSMS hasn’t been reworked to connect directly to the control node of the Parallel Data Warehouse). It also uses its own query engine and not all features of SQL Server are supported. So, you might not be able to use all your DBA tricks. And you wouldn’t want to build a solution against SQL Server and then just hope to upsize it to Parallel Data Warehouse Edition.

PDW uses multiple servers within the appliance, virtualized as if they were one unified data warehousing resource available. It can use up to 480 cores. PDW Works by controlling several different physical servers each running their own instance of SQL Server 2008 R2. The database and it’s tables are spread across these physical servers but appear as one database and table(s) to the end-user. Thedata warehouse appliance or brain of the PDW manages query execution and the meta data for what is stored and processed on what portion of the PDW.

Microsoft IT’s experience with PDW showed when they migrated Information Security Consolidated Event Management (ICE) to PDW, they saw query performance improve to an average of 15-20 times faster in PDW, SSIS data load throughput of up to 285 GB/hour (with minimal query performance impact), and support for up to 12 TB/day in throughput in SSIS. Seevideo.

Part of the technology incorporated into PDW includes a parallel database copy that enables rapid data movement and consistency between PDW and data marts used by SSAS.

In short, PDW is ideal for large data warehouses and BI, but not for OLTP systems. Write one check, and you get a complete soup-to-nuts data warehouse storage engine that includes everything from the servers, SAN, configuration, and training.

HP calls PDW by a different name: Enterprise Data Warehouse (EDW). Here is the layout of the HP Enterprise Data Warehouse Appliance (full specshere withreview and architecture overview and performance guide). The architecture is hub-and-spoke and supports up to 47 servers, made up of a control rack and a data rack. A one rack system has 17 servers, 22 processors/132 cores, and 125TB and can be scaled out to a four rack system with 47 servers, 82 processors/492 cores, and 500TB:

The future road map for PDW includes column store, petabyte scalability, real-time data warehousing, MDM, and data 


本文转自 lzf328 51CTO博客,原文链接:

http://blog.51cto.com/lzf328/1110581
相关实践学习
使用SQL语句管理索引
本次实验主要介绍如何在RDS-SQLServer数据库中,使用SQL语句管理索引。
SQL Server on Linux入门教程
SQL Server数据库一直只提供Windows下的版本。2016年微软宣布推出可运行在Linux系统下的SQL Server数据库,该版本目前还是早期预览版本。本课程主要介绍SQLServer On Linux的基本知识。 相关的阿里云产品:云数据库RDS SQL Server版 RDS SQL Server不仅拥有高可用架构和任意时间点的数据恢复功能,强力支撑各种企业应用,同时也包含了微软的License费用,减少额外支出。 了解产品详情: https://www.aliyun.com/product/rds/sqlserver
相关文章
|
7月前
|
SQL
启动mysq异常The server quit without updating PID file [FAILED]sql/data/***.pi根本解决方案
启动mysq异常The server quit without updating PID file [FAILED]sql/data/***.pi根本解决方案
66 0
|
2月前
|
SQL 存储 机器学习/深度学习
将 AWS Data Lake 和 S3 与 SQL Server 结合使用
将 AWS Data Lake 和 S3 与 SQL Server 结合使用
61 0
|
7月前
|
SQL Java 关系型数据库
spring data elasticsearch 打印sql(DSL)语句
spring data elasticsearch 打印sql(DSL)语句
466 0
|
SQL JSON 分布式计算
Spark SQL实战(07)-Data Sources
Spark SQL通过DataFrame接口支持对多种数据源进行操作。 DataFrame可使用关系型变换进行操作,也可用于创建临时视图。将DataFrame注册为临时视图可以让你对其数据运行SQL查询。
231 0
|
SQL Java 数据库连接
Data Access 之 MyBatis(四) - Dynamic SQL(下)
Data Access 之 MyBatis(四) - Dynamic SQL
Data Access 之 MyBatis(四) - Dynamic SQL(下)
|
SQL Java 数据库连接
Data Access 之 MyBatis(四) - Dynamic SQL(上)
Data Access 之 MyBatis(四) - Dynamic SQL
Data Access 之 MyBatis(四) - Dynamic SQL(上)
|
SQL XML Java
Data Access 之 MyBatis(三) - SQL Mapping XML(Part C)(下)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part C)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part C)(下)
|
SQL XML Java
Data Access 之 MyBatis(三) - SQL Mapping XML(Part C)(上)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part C)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part C)(上)
|
SQL XML 安全
Data Access 之 MyBatis(三) - SQL Mapping XML(Part B)(下)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part B)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part B)(下)
|
SQL XML 缓存
Data Access 之 MyBatis(三) - SQL Mapping XML(Part B)(上)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part B)
Data Access 之 MyBatis(三) - SQL Mapping XML(Part B)(上)