Data analysis plays a vital role in organizations today. It enables effective decision-making by addressing fundamental business questions based on the understanding of the available data. While there are tons of open source and enterprise tools for conducting data analysis, IBM SPSS Statistics has emerged as a popular tool among statistical analysts and researchers. It offers them the perfect platform to quickly perform data exploration and analysis, and share their findings with ease.
In this interview, we take a look at the world of statistical data analysis and see how IBM SPSS Statistics makes it easier to derive business sense from data. Kenneth and Anthony also walk us through their recently published book – Data Analysis with IBM SPSS Statistics – and tell us how it benefits aspiring data analysts and statistical researchers.
Key Takeaways – IBM SPSS Statistics
- IBM SPSS Statistics is a key offering of IBM Analytics – providing an integrated interface for statistical analysis on-premise and on the cloud
- SPSS Statistics is a self-sufficient tool – it does not require you to have any knowledge of SQL or any other scripting language
- SPSS Statistics helps you avoid the 3 most common pitfalls in data analysis, i.e. handling missing data, choosing the best statistical method for analysis and understanding the results of the analysis
- R and Python are not direct competitors to SPSS Statistics – instead, you can create customized solutions by integrating SPSS Statistics with these tools for effective analyses and visualization
- Data Analysis with IBM SPSS Statistics highlights various popular statistical techniques to the readers, and how to use them in order to gather useful hidden insights from their data
IBM SPSS Statistics is a popular tool for efficient statistical analysis. What do you think are the 3 notable features of SPSS Statistics that make it stand apart from the other tools available out there?
SPSS Statistics has a very short learning curve which makes it ideal for analysts to use efficiently. It also has a very comprehensive set of statistical capabilities so virtually everything a researcher would ever need is encompassed in a single application. Finally, SPSS Statistics provides a wealth of features for preparing and managing data so it is not necessary to master SQL or another database language to address data-related tasks.
With over 20 years of experience in this field, you have a solid understanding of the subject and, equally, of SPSS Statistics. How do you use the tool in your work? How does it simplify your day to day tasks related to data analysis?
I have used SPSS Statistics in my work with SPSS and IBM clients over the years. In addition, I use SPSS for my own research analysis. It allows me to make good use of my time whether I’m serving clients or doing my own analysis because of the breadth of capabilities available within this one program. The fact that SPSS produces presentation-ready output further simplifies things for me since I can collect key results as I work and put them into a draft report and share them as required.
What are the prerequisites to use SPSS Statistics effectively? For someone who intends to use SPSS Statistics for their data analysis tasks, how steep is the curve when it comes to mastering the tool?
It certainly helps to have a understanding of basic statistics when you begin to use SPSS Statistics but it can be a valuable tool even with a limited background in statistics. The learning curve is a very “gentle slope” when it comes to acquiring sufficient familiarity with SPSS Statistics to use it very effectively. Mastering the software does involve more time and effort but one can accomplish this over time as one builds on the initial knowledge that comes fairly easily. The good news is that one can obtain a lot of value from the software well before one truly masters it by discovering the many features.
What are some of the common problems in data analysis? How does this book help the readers overcome them?
Some of the most common pitfalls encountered when analyzing data involve handling missing/incomplete data, deciding which statistical method(s) to employ and understanding the results. In the book, we go into the details of detecting and addressing data issues including missing data. We also describe what each statistical technique provides and when it is most appropriate to use each of them. There are numerous examples of SPSS Statistics output and how the results can be used to assess whether a meaningful pattern exists.
In the context of all the above, how does your book Data Analysis with IBM SPSS Statistics help readers in their statistical analysis journey? What, according to you, are the 3 key takeaways for the readers from this book?
The approach we took with our book was to share with readers the most straightforward ways to use SPSS Statistics to quickly obtain the results needed to effectively conduct data analysis. We did this by showing the best way to proceed when it comes to analyzing data and then showing how this process can be done best in the software. The key takeaways from our book are the way to approach the discovery process when analyzing data, how to find hidden patterns present in the data and what to look for in the results provided by the statistical techniques covered in the book.
IBM SPSS Statistics 25 was released recently. What are the major improvements or features introduced in this version? How do these features help the analysts and researchers?
There are a lot of interesting new features introduced in SPSS Statistics 25. For starters, you can copy charts as Microsoft Graphic Objects, which allows you to manipulate charts in Microsoft Office. There are changes to the chart editor that make it easier to customize colors, borders, and grid line settings in charts. Most importantly, it allows the implementation of Bayesian statistical methods. Bayesian statistical methods enable the researcher to incorporate prior knowledge and assumptions about model parameters. This facility looks like a good teaching tool for Statistical Educators.
Data visualization goes a long way in helping decision-makers get an accurate sense of their data. How does SPSS Statistics help them in this regard?
Kenneth: Data visualization is very helpful when it comes to communicating findings to a broader audience and we spend time in the book describing when and how to create useful graphics to use for this purpose. Graphical examination of the data can also provide clues regarding data issues and hidden patterns that warrant deeper exploration. These topics are also covered in the book.
Anthony: SPSS Statistics’ data visualizations capabilities are excellent. The menu system makes it easy to generate common chart types. You can develop customized looks and save them as a template to be applied to future charts. Underlying SPSS Graphics is an influential approach called the Grammar of Graphics. The SPSS graphics capabilities are embodied in a versatile syntax called Graphics Programming Language.
Do you foresee SPSS Statistics facing stiff competition from open source alternatives in the near future? What is the current sentiment in the SPSS community regarding these topics?
Kenneth: Open source tools based alternatives such as Python and R are potential competition for SPSS Statistics but I would argue otherwise. These tools, while powerful, have a much steeper learning curve and will prove difficult for subject matter experts that periodically need to analyze data. SPSS is ideally suited for these periodic analysts whose main expertise lies in their field which could be healthcare, law enforcement, education, human resources, marketing, etc.
Anthony: The open source programs have a lot of capability but they are also fairly low-level languages, so you must learn to code. The learning curve is steep, and there are many maintainability issues. R has 2 major releases a year. You can have a situation where the data and commands remain the same, but the result changes when you update R. There are many dependencies among R packages. R has many contributors and is an avenue for getting your hands on new methods. However, there is a wide variance in the quality of the contributors and contributed packages. The occasional user of SPSS has an easier time jumping back in than does the occasional user of open source software. Most importantly, it is easier to employ SPSS in production settings.
SPSS Statistics supports custom analytical solutions through integration with R and Python. Is this an intent from IBM to join hands with the open source community?
This is a good follow-up question to the one asked before. Actually, the integration with R and Python allows SPSS Statistics to be extended to accommodate a situation in which an analyst wishes to try an algorithm or graphical technique not directly available in the software but which is supported in one of these languages. It also allows those familiar with R or Python to use SPSS Statistics as their platform and take advantage of all the built-in features it comes with, out of the box while still having the option to employ these other languages where they provide additional value.
Lastly, this book is designed for analysts and researchers who want to get meaningful insights from their data as quickly as possible. How does this book help them in this regard?
SPSS Statistics does make it possible to very quickly pull in data and get insightful results. This book is designed to streamline the steps involved in getting this done while also pointing out some of the less obvious “hidden gems” that we have discovered during the decades of using SPSS in virtually every possible situation.