SAS vs. R (vs. Python) – which tool should I learn?

简介:
原文  : 
http://www.analyticsvidhya.com/blog/2014/03/sas-vs-vs-python-tool-learn/

We love comparisons!

From Samsung vs. Apple vs. HTC in smartphones; iOS vs. Android vs. Windows in mobile OS to comparing candidates for upcoming elections or selecting captain for the world cup team, comparisons and discussions enrich us in our life. If you love discussions, all you need to do is pop up a relevant question in middle of a passionate community and then watch it explode! The beauty of the process is that everyone in the room walks away as a more knowledgeable person.

I am sparking something similar here. SAS vs. R has probably been the biggest debate analytics industry might have witnessed. Python is another worthy candidate to put in the mix now. The reason for me to start this discussion is not to watch it explode (that would be fun as well though). I know that we all will benefit from the discussion.

This has also been one of the most commonly asked question to me on this blog. So, I thought I’ll discuss it with all my readers and visitors!

analytics tool fight

Hasn’t a lot already been said on this topic?

Probably yes! But I still feel the need for discussion for following reasons:

  • The industry is very dynamic. Any comparison which was done 2 years back might not be relevant any more.
  • Traditionally Python has been left out of the comparison. I think it is a worthy consideration now.
  • While I’ll discuss global trends about the languages, I’ll add specific information with regards to Indian analytics industry (which is at a different level of evolution)

So, without any further delay, let the combat begin!

Background:

Here is a brief description about the 3 ecosystems:

  • SAS: SAS has been the undisputed market leader in commercial analytics space. The software offers huge array of statistical functions, has good GUI (Enterprise Guide & Miner) for people to learn quickly and provides awesome technical support. However, it ends up being the most expensive option and is not always enriched with latest statistical functions.
  • R: R is the Open source counterpart of SAS, which has traditionally been used in academics and research. Because of its open source nature, latest techniques get released quickly. There is a lot of documentation available over the internet and it is a very cost-effective option.
  • Python: With origination as an open source scripting language, Python usage has grown over time. Today, it sports libraries (numpy, scipy and matplotlib) and functions for almost any statistical operation / model building you may want to do. Since introduction of pandas, it has become very strong in operations on structured data.

Attributes for comparison:

I’ll compare these languages on following attributes:

  1. Availability / Cost
  2. Ease of learning
  3. Data handling capabilities
  4. Graphical capabilities
  5. Advancements in tool
  6. Job scenario
  7. Customer service support and Community

I am comparing these from point of view of an analyst. So, if you are looking for purchasing a tool for your company, you may not get complete answer here. The information below will still be useful. For each attribute I give a score to each of these 3 languages (1 – Low; 5 – High).

The weightage for these parameters will vary depending on what point of career you are in and your ambitions.

1. Availability / Cost:

SAS is a commercial software. It is expensive and still beyond reach for most of the professionals (in individual capacity). However, it holds the highest market share in Private Organizations. So, until and unless you are in an Organization which has invested in SAS, it might be difficult to access one.

R & Python, on the other hand are free and can be downloaded by any one. Here are my scores on this parameter:

SAS – 2

R – 5

Python – 5

2. Ease of learning:

SAS is easy to learn and provides easy option (PROC SQL) for people who already know SQL. Even otherwise, it has a good stable GUI interface in its repository. In terms of resources, there are tutorials available on websites of various university and SAS has a comprehensive documentation. There are certifications from SAS training institutes, but they again come at a cost.

R has the steepest learning curve among the 3 languages listed here. It requires you to learn and understand coding. R is a low level programming language and hence simple procedures can take longer codes.

Python is known for its simplicity in programming world. This remains true for data analysis as well. While there are no widespread GUI interfaces as of now, I am hoping Python notebooks will become more and more mainstream. They provide awesome features for documentation and sharing.

SAS – 4.5

R – 2.5

Python – 3.5

3. Data handling capabilities:

This used to be an advantage for SAS till some time back. R computes every thing in memory (RAM) and hence the computations were limited by the amount of RAM on 32 bit machines. This is no longer the case. All three languages have good data handling capabilities and options for parallel computations. This I feel is no longer a big differentiation. Also, I might not be aware of the latest innovation in each ecosystem and hence I see all 3 as equally capable.

SAS – 4

R – 4

Python – 4

4. Graphical capabilities:

SAS has decent functional graphical capabilities. However, it is just functional. Any customization on plots are difficult and requires you to understand intricacies of SAS Graph package.

R has the most advanced graphical capabilities among the three. There are numerous packages which provide you advanced graphical capabilities.

Python capabilities will lie somewhere in between, with options to use native libraries (matplotlib) or derived libraries (allowing calling R functions).

SAS – 3

R – 4.5

Python – 4

5. Advancements in tool:

All 3 ecosystems have all the basic and most needed functions available. This feature only matters if you are working on latest technologies and algorithms.

Due to their open nature, R & Python get latest features quickly (R more so compared to Python). SAS, on the other hand updates its capabilities in new version roll-outs. Since R has been used widely in academics in past, development of new techniques is fast.

Having said this, SAS releases updates in controlled environment, hence they are well tested. R & Python on the other hand, have open contribution and there are chances of errors in latest developments.

SAS – 4

R – 4.5

Python – 4

6. Job scenario:

Globally, SAS is still the market leader in available corporate jobs. Most of the big organizations still work on SAS. R / Python, on the other hand are better options for start-ups and companies looking for cost efficiency. Also, number of jobs on R / Python have been reported to increase over last few years. Here is a trend widely published on internet, which shows the trend for R and SAS jobs. Python jobs for data analysis will have similar trend as R jobs:

Source: r4stats.com

Source: r4stats.com

In India, specifically, the gap in SAS vs. R is bigger. My estimate is that SAS would have about 70% of market share, R around 15% and Python less than 5%. However, the trends are similar to global trends.

SAS – 4.5

R – 3.5

Python – 2.5

7. Customer service support & community:

R has the biggest online community but no customer service support. So if have trouble, you are on your own. You will get a lot of help though. Similar for python, though at a lower scale.

SAS on the other hand has dedicated customer service along with the community. So, if you have problems in installation or any other technical challenges, you can reach out to them.

SAS – 4

R – 3.5

Python – 3

Other factors:

Following are some more points worthy to note:

  • Python is used widely in web development. So if you are in an online business, using Python for web development and analytics can provide synergies
  • SAS used to have a big advantage of deploying end to end infrastructure (Visual Analytics, Data warehouse, Data quality, reporting and analytics), which has been mitigated by integration / support of R on platforms like SAP HANA and Tableau. It is still, far away from seamless integration like SAS, but the journey has started.

Conclusion:

Clearly, there is no winner in this race yet. It will be pre-mature to place bets on what will prevail, given the dynamic nature of industry. Depending on your circumstances (career stage, financials etc.) you can add your own weights and come up with what might be suitable for you. Here are a few specific scenarios:

  • If you are a fresher entering in analytics industry (specifically so in India), I would recommend to learn SAS as your first language. It is easy to learn and holds highest job market share.
  • If you are some one who has already spent time in industry, you should try and diversify your expertise be learning a new tool.
  • For experts and pros in industry, people should know at least 2 of these. That would add a lot of flexibility for future and open up new opportunities.
  • If you are in a start-up / freelancing, R / Python is more useful

Here is the final scorecard:

table - tool comparison

These are my views on this comparison. Now, its your turn to share your views through the comments below.

目录
相关文章
|
5月前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
【4月更文挑战第9天】本文对比了Python三大Web框架Django、Flask和Pyramid。Django功能全面,适合快速开发,但学习曲线较陡;Flask轻量灵活,易于入门,但默认配置简单,需自行添加功能;Pyramid兼顾灵活性和可扩展性,适合不同规模项目,但社区及资源相对较少。选择框架应考虑项目需求和开发者偏好。
243 0
|
8天前
|
数据采集 缓存 Java
Python vs Java:爬虫任务中的效率比较
Python vs Java:爬虫任务中的效率比较
|
3月前
|
安全 数据安全/隐私保护 数据中心
Python并发编程大挑战:线程安全VS进程隔离,你的选择影响深远!
【7月更文挑战第9天】Python并发:线程共享内存,高效但需处理线程安全(GIL限制并发),适合IO密集型;进程独立内存,安全但通信复杂,适合CPU密集型。使用`threading.Lock`保证线程安全,`multiprocessing.Queue`实现进程间通信。选择取决于任务性质和性能需求。
85 1
|
3天前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
Python Web框架比较:Django vs Flask vs Pyramid
9 4
|
4天前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
【10月更文挑战第10天】本文比较了Python中三个最受欢迎的Web框架:Django、Flask和Pyramid。Django以功能全面、文档完善著称,适合快速开发;Flask轻量灵活,易于上手;Pyramid介于两者之间,兼顾灵活性和安全性。选择框架时需考虑项目需求和个人偏好。
18 1
|
9天前
|
安全 数据库 C++
Python Web框架比较:Django vs Flask vs Pyramid
【10月更文挑战第6天】本文比较了Python中三个最受欢迎的Web框架:Django、Flask和Pyramid。Django功能全面,适合快速开发;Flask灵活轻量,易于上手;Pyramid介于两者之间,兼顾灵活性和可扩展性。文章分析了各框架的优缺点,帮助开发者根据项目需求和个人偏好做出合适的选择。
20 4
|
15天前
|
C++ Python
Python Tricks--- Object Comparisons:“is” vs “==”
Python Tricks--- Object Comparisons:“is” vs “==”
15 1
|
2月前
|
Shell 数据处理 C++
【震撼揭秘】Python正则VS Shell正则:一场跨越编程边界的史诗级对决!你绝不能错过的精彩较量,带你领略文本处理的极致魅力!
【8月更文挑战第19天】正则表达式是文本处理的强大工具,在Python与Shell中有广泛应用。两者虽语法各异,但仍共享许多基本元素,如`.`、`*`及`[]`等。Python通过`re`模块支持丰富的功能,如非捕获组及命名捕获组;而Shell则依赖`grep`、`sed`和`awk`等命令实现类似效果。尽管Python提供了更高级的特性和函数,Shell在处理文本文件方面仍有其独特优势。选择合适工具需根据具体需求和个人偏好决定。
34 1
|
2月前
|
存储 C语言 Python
|
2月前
|
机器学习/深度学习 数据可视化 数据处理
Python vs R:机器学习项目中的实用性与生态系统比较
【8月更文第6天】Python 和 R 是数据科学和机器学习领域中最受欢迎的两种编程语言。两者都有各自的优点和适用场景,选择哪种语言取决于项目的具体需求、团队的技能水平以及个人偏好。本文将从实用性和生态系统两个方面进行比较,并提供代码示例来展示这两种语言在典型机器学习任务中的应用。
68 1