In this article by Vinay Singh, author of the book Real Time Analytics with SAP HANA, this article covers Full Text Search and hierarchies in SAP HANA, and how to create and use them in our data models.
After completing this article, you should be able to:
- Create and use Full Text Search
- Create hierarchies—level and parent child hierarchies
(For more resources related to this topic, see here.)
Creating and using Full Text Search
Before we proceed with the creation and use of Full Text Search, let’s quickly go through the basic terms associated with it. They are as follows:
- Text Analysis: This is the process of analyzing unstructured text, extracting relevant information, and then transforming this information into structure information that can be leveraged in different ways. The scripts provide additional possibilities to analyze strings or large text columns by providing analysis rules for many industries in many languages in SAP HANA.
- Full Text Search: This capability of HANA helps to speed up search capabilities within large amounts of text data significantly. The primary function of Full Text Search is to optimize linguistic searches.
- Fuzzy Search: This functionality enables to find strings that match a pattern approximately (rather than exactly). It’s a fault-tolerant search, meaning that a query returns records even if the search term contains additional or missing characters, or even spelling mistakes. It is an alternative to a non-fault tolerant SQL statement.
- The score() function: When using contains() in the where clause of a select statement, the score() function can be used to retrieve the score. This is a numeric value between 0.0 and 1.0. The score defines the similarity between the user input and the records returned by the search. A score of 0.0 means that there is no similarity. The higher the score, the more similar a record is to the search input.
Some of the applied applications of fuzzy search could be:
- Fault-tolerant check for duplicate records. Its helps to prevent duplication entry in Systems by searching similar entries.
- Fault-tolerant search in text columns—for example, search documents on diode and find all documents that contain the term “triode”.
- Fault-tolerant search in structure database content search for rhyming words, for example coffee Krispy biscuit and find toffee crisp biscuits (the standard example given by SAP).
Let’s see what are the use cases for text search:
- Combining structure and unstructured data
- Medicine and healthcare
- Brand monitoring and the buying pattern of consumer
- Real-time analytics on a large volume of data
- Data from social media
- Finance data
- Sales optimization
- Monitoring and production planning
The results of text analysis are stored in a table and therefore, can be leveraged in all the HANA- supported scenarios:
- Standard Analytics: Create analytical views and calculation views on top. For example, companies mentioned in news articles over time.
- Data mining, predictive: Using R, Predictive Analysis Library (PAL) functions. For example, clustering, time series analysis, and so on.
- Search-based applications: Create a search model and build a search UI with the HANA Info Access (InA) toolkit for HTML5. Text analysis results can be used to navigate and filter search results. For example, People finder, search UI for internal documents.
The capabilities of HANA Full Text Search and text analysis are as follows:
- Native full text search
- Database text analysis
- The graphical modeling of search models
- Info Access toolkit for HTML5 UIs.
The benefits of full text search:
- Extract unstructured content with no additional cost
- Combine structure and unstructured information for unified information access
- Less data duplication and transfer
- Harness the benefit of InA (Info Access toolkit ) for an HTML5 application
The following are the supported data types by fuzzy search:
- Short text
- Data with full text index.
Enabling search option
Before we can use the search option in any attribute or analytical view, we will need to enable this functionality in the SAP HANA Studio Preferences as shown in the following screenshot:
We are well prepared to move ahead with the creation and use of Full Text search. Let’s do this step by step as follows:
- Create the table that we will use to perform the Full Text Search statements:
Create Schema <DEMO>; // I am creating , it would be already present from our
previous exercises. SET SCHEMA DEMO; // Set the schema name Create a Column Table including FUZZY SEARCH indexed columns. DROP TABLE DEMO.searchtbl_FUZZY; CREATE COLUMN TABLE DEMO.searchtbl_FUZZY ( CUST_NAME TEXT FUZZY SEARCH INDEX ON, CUST_COUNTY TEXT FUZZY SEARCH INDEX ON, CUST_DEPT TEXT FUZZY SEARCH INDEX ON, );
- Prepare the fuzzy search logic (SQL logic):
Search for customers in the countries that contain the ‘MAIN’ word:
SELECT score() AS score, * FROM searchtbl_FUZZY WHERE CONTAINS(cust_county, 'MAIN');
Search for customers in the countries that contain the ‘MAIN’ word but with Fuzzy parameter 0.4
SELECT score() AS score, * FROM searchtbl_FUZZY WHERE CONTAINS(cust_county, 'West', FUZZY(0.3));
Perform a fuzzy search for a customer working in a department that includes the department word :
SELECT highlighted(cust_dept), score() AS score, * FROM searchtbl_FUZZY WHERE CONTAINS(cust_dept, 'Department', FUZZY(0.5));
Fuzzy search for all the columns by looking for the customer word:
SELECT score() AS score, * FROM searchtbl_FUZZY WHERE CONTAINS(*, 'Customer', FUZZY(0.5));
Hierarchies are created to maintain data in a structured format, such as maintaining customer or employee data based on their roles and splitting the data based on geographies. Hierarchical data is very useful for organizational purposes during decision making.
Two types of hierarchies can be created in SAP HANA:
- The level hierarchy
- Parent-child hierarchy
The hierarchies are initially created in the attribute view and later can be combined in the analytic view or calculation view for consumption in a report as per business requirements. Let’s create both types of hierarchies in attribute views.
Creating level hierarchy
Each level represents a position in the hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Each level above the base level contains aggregate values for the levels below it.
- Create a new attribute view (for your own practice, I would suggest you to create a new one). You can also use an existing one. Use the SNWD_PD EPM sample tables.
- In output view, mark the following as output:
- In the semantic node of the view, create new hierarchy as shown in the following screenshot and fill the details:
- Save and Activate the view.
- Now the hierarchy is ready to be used in an analytical view.
Add a client and node key again as output to your attribute view that you just created, that is AT_LEVEL_HIERARCY_DEMO, as we will use these two fields in
- Create an analytical view. It should look like the following screenshot.
Add the attribute view created in the preceding step and the SNWD_SO_I table to the data foundation:
- Join client to client and product guide to node key:
- Save and activate.
- Go to MS Excel | All Programs | Microsoft Office | Microsoft Excel 2010 then go to Data tab | From Other Sources | From Data Connection Wizard.
- You will get a new popup for Data Connection Wizard | Other/Advanced | SAP HANA MDX Provider:
- You will be asked to provide the connection details, fill the details, and test the connection (these are the same details that you used while adding the system to SAP HANA Studio).
- Data Connection Wizard will now ask you to choose the analytical view (choose the one that you just created in the preceding step):
- The preceding steps will take you to an excel sheet and you will see data as per the choices that you chose in the Pivot table field list:
Create parent-child hierarchy
The parent-child hierarchy is a simple, two-level hierarchy where the child element has an attribute containing the parent element. These two columns define the hierarchical relationships among the members of the dimension. The first column, called the member key column, identifies each dimension member. The other column, called the parent column, identifies the parent of each dimension member. The parent attribute determines the name of each level in the parent-child hierarchy and determines whether the data for parent members should be displayed
Let’s create a parent-child hierarchy using the following steps:
- Create an attribute view.
- Create a table that has the parent-child information:
The following is the sample code and the insert statement:
CREATE COLUMN TABLE "DEMO"."CCTR_HIE"( "CC_CHILD" NVARCHAR(4), "CC_PARENT" NVARCHAR(4)); insert into "DEMO"."CCTR_HIE" values('','') insert into "DEMO"."CCTR_HIE" values('C11','c1'); insert into "DEMO"."CCTR_HIE" values('C12','c1'); insert into "DEMO"."CCTR_HIE" values('C13','c1'); insert into "DEMO"."CCTR_HIE" values('C14','c2'); insert into "DEMO"."CCTR_HIE" values('C21','c2'); insert into "DEMO"."CCTR_HIE" values('C22','c2'); insert into "DEMO"."CCTR_HIE" values('C31','c3'); insert into "DEMO"."CCTR_HIE" values('C1','c'); insert into "DEMO"."CCTR_HIE" values('C2','c'); insert into "DEMO"."CCTR_HIE" values('C3','c');
- We will put the preceding table into our data foundation of attribute view as follows:
- Make CC_CHILD as the key attribute.
- Now let’s create new hierarchy as shown in the following screenshot:
- Save and activate the hierarchy.
- Create a new analytical view and add the HIE_PARENT_CHILD_DEMO view and the CCTR_COST table in data foundation.
- Join CCTR to CCTR_CILD with many is to one relationship. Make sure that in the semantic node, COST is set as a measure.
- Save and Activate the analytical view.
- Preview the data.
As per the business need, we can use one of the two hierarchies along with attribute view or analytical view.
In this article, we took a deep dive into Full Text Search, fuzzy logic, and hierarchies concepts. We learned how to create and use text search and fuzzy logic. The parent-child and level hierarchies were discussed in detail with a hands-on approach on both.
Resources for Article:
- Sabermetrics with Apache Spark [article]
- Meeting SAP Lumira [article]
- Achieving High-Availability on AWS Cloud [article]