In this article by Sumit Gupta, author of the book Neo4j Essentials, we will discuss data modeling in Neo4j, which is evolving and flexible enough to adapt to changing business requirements. The model captures new data sources, entities, and their relationships as they naturally occur, so the database can adapt to change easily. This in turn makes development agile and keeps it responsive to changing business requirements.
Data modeling is a multistep process and involves the following steps:
This whole process is applied iteratively and incrementally, as in agile development, and has to be repeated whenever we change our goals or add new goals or questions that the graph model needs to answer.
Let's see in detail how data is organized/structured and implemented in Neo4j to bring in the agility of graph models.
Based on the principles of graph data structure available at http://en.wikipedia.org/wiki/Graph_(abstract_data_type), Neo4j implements the property graph data model at storage level, which is efficient, flexible, adaptive, and capable of effectively storing/representing any kind of data in the form of nodes, properties, and relationships.
Neo4j not only implements the property graph model, but has also evolved the traditional model and added the feature of tagging nodes with labels, which is now referred to as the labeled property graph.
Essentially, in Neo4j, everything needs to be defined in one of the following forms: nodes, relationships, properties, or labels.
Let's see how all of these four forms are related to each other and represented within Neo4j.
A graph essentially consists of nodes, which can also have properties. Nodes are linked to other nodes. The link between two nodes is known as a relationship, which also can have properties. Nodes can also have labels, which are used for grouping the nodes.
Let's take up a use case to understand data modeling in Neo4j. John is a male and his age is 24. He is married to a female named Mary whose age is 20. John and Mary got married in 2012.
Now, let's develop the data model for the preceding use case in Neo4j:
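One way to express this use case in Cypher is sketched below. The labels (Person, Male, Female), the relationship type MARRIED_TO, and the property names Name, Age, and Year are assumptions chosen for illustration and are not prescribed by Neo4j.

```cypher
// Two nodes with labels and properties, linked by one relationship
// that carries its own property (the year of the marriage).
// All labels, property names, and the relationship type are hypothetical.
CREATE (john:Person:Male {Name: 'John', Age: 24}),
       (mary:Person:Female {Name: 'Mary', Age: 20}),
       (john)-[:MARRIED_TO {Year: 2012}]->(mary);

// Read the model back: who is married to whom, and since when?
MATCH (a:Male)-[r:MARRIED_TO]-(b:Female)
RETURN a.Name AS Husband, b.Name AS Wife, r.Year AS Year;
```

The gender is modeled here as a label rather than a property, which makes it easy to group nodes. Storing it as a property would be an equally valid choice.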
Easy, simple, flexible, and natural… isn't it?
The data structure in Neo4j is adaptive and can effectively model anything that is not fixed and evolves over time.
The next step in data modeling is fetching the data from the data model, which is done through traversals. Traversals are another important aspect of graphs: you follow paths within the graph, starting from a given node and following its relationships to other nodes. Neo4j provides two kinds of traversals: breadth first (http://en.wikipedia.org/wiki/Breadth-first_search) and depth first (http://en.wikipedia.org/wiki/Depth-first_search).
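In Cypher, following paths from a given node can be written as a variable-length pattern. The sketch below assumes a node with a hypothetical Name property set to 'John', as in the use case above. Note that the query only describes which paths to follow and does not itself choose between breadth-first and depth-first order.

```cypher
// Start from the node named John (hypothetical Name property) and follow
// relationships of any type and direction, between one and three hops away.
MATCH (start {Name: 'John'})-[*1..3]-(reachable)
RETURN DISTINCT reachable;
```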
If you are from the RDBMS world, then you must now be wondering, "What about the schema?" and you will be surprised to know that Neo4j is a schemaless or schema-optional graph database. We do not have to define the schema unless we are at a stage where we want to provide some structure to our data for performance gains. Once performance becomes a focus area, then you can define a schema and create indexes/constraints/rules over data.
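As a minimal sketch of what adding such structure could look like, the statements below create an index and a uniqueness constraint. The Person label and the Name and Age properties are hypothetical. The syntax shown is the Neo4j 2.x-era form that matches the rest of this article; newer Neo4j releases use the CREATE INDEX FOR ... ON ... and CREATE CONSTRAINT FOR ... REQUIRE ... forms instead, so check the documentation for your version.

```cypher
// Index to speed up lookups of Person nodes by Age (hypothetical label/property).
CREATE INDEX ON :Person(Age);

// Uniqueness constraint: no two Person nodes may share the same Name.
// It is defined on a different property from the index above, because a
// constraint cannot be created on a label/property pair that is already indexed.
CREATE CONSTRAINT ON (p:Person) ASSERT p.Name IS UNIQUE;
```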
Unlike the traditional models where we freeze requirements and then draw our models, Neo4j embraces data modeling in an agile way so that it can be evolved over a period of time and is highly responsive to the dynamic and changing business requirements.
In this section, we will discuss one of the most important aspects of Neo4j, that is, read-only Cypher queries.
Read-only Cypher queries are not only the core component of Cypher but also help us in exploring and leveraging various patterns and pattern matching constructs. A read-only query begins with MATCH, OPTIONAL MATCH, or START, which can be used in conjunction with the WHERE clause, can be followed by WITH, and ends with RETURN. Constructs such as ORDER BY, SKIP, and LIMIT can also be used with WITH and RETURN.
We will discuss the read-only constructs in detail, but before that, let's create a sample dataset that we can use to illustrate the constructs and syntax of read-only Cypher queries.
Let's perform the following steps to clean up our Neo4j database and insert some data which will help us in exploring various constructs of Cypher queries:
//Delete all relationships between Nodes
MATCH ()-[r]-() DELETE r;
//Delete all Nodes
MATCH (n) DELETE n;
CREATE (:Movie {Title : 'Rocky', Year : '1976'});
CREATE (:Movie {Title : 'Rocky II', Year : '1979'});
CREATE (:Movie {Title : 'Rocky III', Year : '1982'});
CREATE (:Movie {Title : 'Rocky IV', Year : '1985'});
CREATE (:Movie {Title : 'Rocky V', Year : '1990'});
CREATE (:Movie {Title : 'The Expendables', Year : '2010'});
CREATE (:Movie {Title : 'The Expendables II', Year : '2012'});
CREATE (:Movie {Title : 'The Karate Kid', Year : '1984'});
CREATE (:Movie {Title : 'The Karate Kid II', Year : '1986'});
CREATE (:Artist {Name : 'Sylvester Stallone', WorkedAs : ["Actor", "Director"]});
CREATE (:Artist {Name : 'John G. Avildsen', WorkedAs : ["Director"]});
CREATE (:Artist {Name : 'Ralph Macchio', WorkedAs : ["Actor"]});
CREATE (:Artist {Name : 'Simon West', WorkedAs : ["Director"]});
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky"}) CREATE (artist)-[:ACTED_IN {Role : "Rocky Balboa"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky II"}) CREATE (artist)-[:ACTED_IN {Role : "Rocky Balboa"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky III"}) CREATE (artist)-[:ACTED_IN {Role : "Rocky Balboa"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky IV"}) CREATE (artist)-[:ACTED_IN {Role : "Rocky Balboa"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky V"}) CREATE (artist)-[:ACTED_IN {Role : "Rocky Balboa"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "The Expendables"}) CREATE (artist)-[:ACTED_IN {Role : "Barney Ross"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "The Expendables II"}) CREATE (artist)-[:ACTED_IN {Role : "Barney Ross"}]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky II"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky III"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "Rocky IV"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "Sylvester Stallone"}), (movie:Movie {Title: "The Expendables"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "John G. Avildsen"}), (movie:Movie {Title: "Rocky"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "John G. Avildsen"}), (movie:Movie {Title: "Rocky V"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "John G. Avildsen"}), (movie:Movie {Title: "The Karate Kid"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "John G. Avildsen"}), (movie:Movie {Title: "The Karate Kid II"}) CREATE (artist)-[:DIRECTED]->(movie);
MATCH (artist:Artist {Name : "Ralph Macchio"}), (movie:Movie {Title: "The Karate Kid"}) CREATE (artist)-[:ACTED_IN {Role:"Daniel LaRusso"}]->(movie);
MATCH (artist:Artist {Name : "Ralph Macchio"}), (movie:Movie {Title: "The Karate Kid II"}) CREATE (artist)-[:ACTED_IN {Role:"Daniel LaRusso"}]->(movie);
MATCH (artist:Artist {Name : "Simon West"}), (movie:Movie {Title: "The Expendables II"}) CREATE (artist)-[:DIRECTED]->(movie);
Now, let's understand the different pieces of read-only queries and execute those against our movie dataset.
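Before looking at each clause, here is a hedged sketch of how the clauses of a read-only query fit together when run against the movie dataset created above. The aliases rockyMovies and actedIn are illustrative names only.

```cypher
// MATCH finds the pattern, WHERE filters it, WITH carries an aggregate forward,
// and RETURN ends the query; ORDER BY, SKIP, and LIMIT shape the final result.
MATCH (a:Artist)-[:ACTED_IN]->(m:Movie)
WHERE m.Title =~ 'Rocky.*'
WITH a, count(m) AS rockyMovies
RETURN a.Name AS Name, rockyMovies
ORDER BY rockyMovies DESC
SKIP 0
LIMIT 3;

// OPTIONAL MATCH keeps artists even when they have no ACTED_IN relationship;
// count(m) ignores the resulting nulls, so such artists are reported with 0.
MATCH (a:Artist)
OPTIONAL MATCH (a)-[:ACTED_IN]->(m:Movie)
RETURN a.Name AS Name, count(m) AS actedIn;
```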
MATCH is the most important clause used to fetch data from the database. It accepts a pattern, which defines "What to search?" and "From where to search?". If the latter is not provided, then Cypher will scan the whole graph and use indexes (if defined) in order to make searching faster and performance more efficient.
Let's start asking questions of our movie dataset, form the corresponding Cypher queries, execute them in <$NEO4J_HOME>/bin/neo4j-shell against the movie dataset, and read the answers to our questions from the results:
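We can also return specific columns (similar to SQL). For example, the preceding statement can also be formed as MATCH (n:Artist {WorkedAs:["Actor"]}) RETURN n.Name AS Name;. Note that property names are case-sensitive, so the property must be written as Name, exactly as it was created.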
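A detail worth illustrating is how the WorkedAs array is compared. In the sketch below, run against the movie dataset created earlier, the first query matches the array as a whole, while the second tests for membership.

```cypher
// Exact match on the whole array: only artists whose WorkedAs is exactly ["Actor"].
MATCH (n:Artist {WorkedAs: ["Actor"]})
RETURN n.Name AS Name;

// Membership test: artists whose WorkedAs list contains "Actor",
// regardless of what else the list holds (for example ["Actor", "Director"]).
MATCH (n:Artist)
WHERE "Actor" IN n.WorkedAs
RETURN n.Name AS Name;
```

The second form is usually what is meant by a question such as "who has worked as an actor?", because it also finds artists who have worked in more than one capacity.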
Let's understand the process of defining relationships in the form of Cypher queries in the same way as you did in the previous section while working with nodes:
To match multiple relationship types, replace [r:ACTED_IN] with [r:ACTED_IN|DIRECTED], and enclose the relationship name in backticks wherever it contains special characters.
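The relationship patterns discussed here can be sketched as follows against the movie dataset. The column aliases are illustrative, and the relationship type WORKED WITH in the last query is hypothetical: it does not exist in our dataset and only shows the quoting syntax.

```cypher
// Which role did each artist play, and in which movie?
MATCH (a:Artist)-[r:ACTED_IN]->(m:Movie)
RETURN a.Name AS Artist, r.Role AS Role, m.Title AS Movie;

// Match either relationship type and report which one connected the pair.
MATCH (a:Artist)-[r:ACTED_IN|DIRECTED]->(m:Movie)
RETURN a.Name AS Artist, type(r) AS Relationship, m.Title AS Movie;

// A relationship type containing a space must be enclosed in backticks.
// WORKED WITH is a hypothetical type used only to show the syntax.
MATCH (a)-[r:`WORKED WITH`]->(b)
RETURN a, b;
```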
In this section, we will talk about the integration of the BI tool QlikView with Neo4j. QlikView is available only on the Windows platform, so this section is applicable only to Windows users.
Neo4j, as an open source database, exposes its core APIs so that developers can write plugins and extend its intrinsic capabilities.
Neo4j JDBC is one such plugin that enables the integration of Neo4j with various BI / visualization and ETL tools such as QlikView, Jaspersoft, Talend, Hive, HBase, and many more.
Let's perform the following steps for integrating Neo4j with QlikView on Windows:
You can configure the Logging Level and also define the JVM runtime options such as -Xmx and -Xms in the textbox provided for Option in the preceding screenshot.
Instead of adding individual libraries, we can also add a folder containing the same list of libraries by clicking on the Add Folder option.
We can also use non JDBC-4 compliant drivers by mentioning the name of the driver class in the Advanced Settings tab. There is no need to do that, however, if you are setting up a configuration profile that uses a JDBC-4 compliant driver.
CREATE (movies1:Movies {Title : 'Rocky', Year : '1976'});
CREATE (movies2:Movies {Title : 'Rocky II', Year : '1979'});
CREATE (movies3:Movies {Title : 'Rocky III', Year : '1982'});
CREATE (movies4:Movies {Title : 'Rocky IV', Year : '1985'});
CREATE (movies5:Movies {Title : 'Rocky V', Year : '1990'});
CREATE (movies6:Movies {Title : 'The Expendables', Year : '2010'});
CREATE (movies7:Movies {Title : 'The Expendables II', Year : '2012'});
CREATE (movies8:Movies {Title : 'The Karate Kid', Year : '1984'});
CREATE (movies9:Movies {Title : 'The Karate Kid II', Year : '1986'});
And we are done!!!!
You will see the data appearing in the listbox in the newly created Table Object. The data is fetched from the Neo4j database and QlikView is used to render this data.
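The data shown in the listbox comes from a Cypher query over the Movies nodes created above. A minimal sketch of such a query is given below; the column aliases Title and Year are assumptions, and how the query is embedded in the QlikView script depends on the connector and is not shown here.

```cypher
// Return one row per Movies node, as flat columns that a BI tool can display.
MATCH (m:Movies)
RETURN m.Title AS Title, m.Year AS Year
ORDER BY Year;
```

Returning scalar properties rather than whole nodes keeps the result tabular, which is the shape JDBC-based tools generally expect.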
The same process is used for connecting to other JDBC-compliant BI / visualization / ETL tools such as Jaspersoft, Talend, Hive, HBase, and so on. We just need to define the appropriate JDBC Type-4 drivers in JDBC Connector.
We can also use ODBC-JDBC Bridge provided by EasySoft at http://www.easysoft.com/products/data_access/odbc_jdbc_gateway/index.html. EasySoft provides the ODBC-JDBC Gateway, which facilitates ODBC access from applications such as MS Access, MS Excel, Delphi, and C++ to Java databases. It is a fully functional ODBC 3.5 driver that allows you to access any JDBC data source from any ODBC-compatible application.
In this article, you have learned the basic concepts of data modeling in Neo4j, and we have walked you through the process of BI integration with Neo4j.