Nonrelational Data Stores
A wide variety of data doesn’t fit in a relational model. Data stores built for this kind of data are known as nonrelational, or NoSQL (Not Only SQL), data stores. Some solutions require a more flexible data model than a relational database provides and can afford to trade ACID compliance for speed and horizontal scale. Because NoSQL databases don’t force data to conform to a specific structure, they can handle volatile data that is written and read quickly better than relational databases can.
Binary objects such as images, videos, and PDFs are also considered nonrelational data. While relational databases such as SQL Server can store files, such as a PDF copy of this book, by using features such as FILESTREAM, this is not the optimal solution for file storage. Object stores are optimized for binary file storage and can be easily accessed to serve these files to applications. They can also be used to create highly scalable data lake ecosystems for big data processing solutions.
NoSQL Databases
NoSQL databases do not impose a schema on the data they store, allowing data to maintain its natural format as it is ingested. In fact, one of their primary benefits is that users designing a NoSQL database solution do not need to define the schema ahead of time. This flexibility makes NoSQL databases a strong choice for solutions that require millisecond response times and need to scale rapidly. Scenarios where NoSQL databases are potentially better options than relational databases include ingesting and analyzing bursts of data from IoT sensors, storing product catalog data for an e-commerce site’s web search functionality, and storing user-generated content for web, mobile, and social media applications.
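To make the idea of ingesting data without a predefined schema concrete, here is a minimal sketch that uses a plain Python list as a stand-in for a NoSQL container. The sensor readings and their field names (`deviceId`, `temperature`, `humidity`, `location`) are hypothetical and chosen only for illustration; a real NoSQL service would expose its own client API.

```python
import json

# Hypothetical IoT readings arriving as JSON text. The field names are
# assumptions for illustration and do not come from any real device.
raw_events = [
    '{"deviceId": "sensor-1", "temperature": 21.5}',
    '{"deviceId": "sensor-2", "temperature": 19.0, "humidity": 40}',
    '{"deviceId": "sensor-3", "location": {"lat": 47.6, "lon": -122.3}}',
]

# A plain list stands in for a NoSQL container in this sketch.
collection = []

for line in raw_events:
    # Each entity is stored in its natural shape; nothing is validated
    # against a schema defined ahead of time.
    collection.append(json.loads(line))

# Entities in the same collection carry different sets of fields.
field_sets = [sorted(entity.keys()) for entity in collection]
print(field_sets)
```

Each reading is stored as it arrives, so adding a new field to future readings would not require changing any existing definition.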
Instead of storing data as rows in a table as in a relational database, data is stored as entities in collections or containers. Unlike rows in a table, entities in the same collection can have a different set of fields. This flexibility allows for several different implementations of NoSQL databases depending on the solution requirements. Generally, these implementations fall into the following four categories:
- Key-value stores are the simplest type of NoSQL database for inserting and querying data (see Figure 1.1). Each piece of data contains a key and a value. The key serves as a unique identifier for the piece of data, and the value contains the data. Values can be scalar values or complex structures such as a JSON array. Applications query a key-value store by specifying the keys whose values they want to retrieve. Figure 1.1 is an example of a phone directory that stores one or more phone numbers per person in a key-value store. Examples of key-value stores include Python dictionary objects, Azure Table storage, and the Azure Cosmos DB Table API.
FIGURE 1.1 Key-value store
- Document databases are the most common type of NoSQL database (see Figure 1.2). Pieces of data are defined as documents and are typically stored in JSON, XML, YAML, or BSON format. Each document includes a document key that serves as a unique identifier for management and query lookups. Unlike a key-value store, which can only retrieve data by searching on the key, a document database lets applications perform lookups on a document’s key, on one or more of its fields, or on both to retrieve specific sets of data. This feature makes document databases a better option for applications that need to be more selective about the data they retrieve. Figure 1.2 illustrates an example of customer orders stored as documents in a document database. Examples of document databases include MongoDB and the Azure Cosmos DB Core (SQL) API.
FIGURE 1.2 Document database
- Columnar databases conceptually resemble relational databases (see Figure 1.3). They organize data into rows and columns but denormalize the data so that it is divided into multiple column families. Each column family holds a set of columns that are logically related. Figure 1.3 is an example of a bicycle company’s product information stored in a columnar format. An example of a columnar database is the Azure Cosmos DB Cassandra API.
FIGURE 1.3 Columnar database
- Graph databases store data as entities and focus on the relationships that these entities have with each other (see Figure 1.4). Entities are defined as nodes, while the relationships between them are defined as edges. Applications querying a graph database do so by traversing the network of nodes and edges, analyzing the relationships between entities. While relational databases can accomplish similar goals, large graph databases can traverse relationships very quickly, bypassing the need to perform multiple join operations on many tables. Figure 1.4 illustrates an example of a graph database that stores an organization’s personnel chart. The entities represent different job titles and departments, while the edges represent how each entity is related. Examples of graph databases include Neo4j and the Azure Cosmos DB Gremlin API.
FIGURE 1.4 Graph database
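The four categories above differ mainly in how data is looked up. The following is a minimal sketch that models each one with in-memory Python structures. All names, sample values, and helper functions (`find_orders`, `read_family`, `reachable`) are hypothetical and exist only for illustration; they are not the API of Azure Cosmos DB, MongoDB, Neo4j, or any other product, and the sample data is not taken from the figures.

```python
from collections import deque

# Key-value store: the only lookup path is the key.
phone_directory = {
    "alice": ["555-0100", "555-0101"],
    "bob": ["555-0123"],
}
alice_numbers = phone_directory["alice"]

# Document database: documents can be found by key or by field values,
# and documents in the same collection may have different fields.
orders = {
    "order-1": {"customer": "alice", "total": 120.0, "items": ["helmet"]},
    "order-2": {"customer": "bob", "total": 35.5},
    "order-3": {"customer": "alice", "total": 15.0, "coupon": "SPRING"},
}

def find_orders(documents, **criteria):
    """Return the keys of documents whose fields match every criterion."""
    return [
        doc_id
        for doc_id, doc in documents.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

alice_orders = find_orders(orders, customer="alice")

# Columnar database: each row key maps to column families, and each
# family groups logically related columns.
products = {
    "bike-1": {
        "details": {"name": "Road Bike", "color": "red"},
        "pricing": {"list_price": 899.0},
    },
    "bike-2": {
        "details": {"name": "Mountain Bike", "color": "black"},
        "pricing": {"list_price": 1200.0},
    },
}

def read_family(rows, family):
    """Read a single column family for every row that has it."""
    return {key: families[family] for key, families in rows.items()
            if family in families}

pricing = read_family(products, "pricing")

# Graph database: nodes connected by edges, queried by traversal.
# Each edge here points from a role to the roles that report to it.
org_chart = {
    "CEO": ["VP Engineering", "VP Sales"],
    "VP Engineering": ["Engineer"],
    "VP Sales": ["Account Manager"],
}

def reachable(graph, start):
    """Breadth-first traversal returning every node reachable from start."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

under_ceo = reachable(org_chart, "CEO")
```

In this sketch the key-value lookup needs the exact key, the document query filters on a field, the columnar read touches one column family without loading the others, and the graph query follows edges rather than joining tables.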