Data Value
The first question that needs to be answered when designing a data-driven solution is, what value will be gained by processing, storing, and analyzing potential data sources? What answers are the business trying to solve? While it is true that having more data can provide new and more fine-grained insights, it can sometimes come at a cost. Organizations must give considerable thought to what data is valuable and what data is not, all the while trying to minimize the amount of time spent in the decision-making process.
Designing a data-driven solution requires everyone involved to focus on deriving value from every process in the solution. This means that data architects must know the business goal of the solution from the beginning. Is this going to be a transactional database that provides the backend for a business’s e-commerce site? Will it be a data warehouse aggregating data from multiple source systems to provide a holistic view of a business’s performance? Or will the data store need to be able to ingest bursts of IoT data for real-time analytics? To answer these questions, we first need to understand the different types of data stores and the scenarios for which each one is best suited.
Relational Databases
Relational databases organize data into tables that can be linked based on data common to each other. The relationship between tables allows users to easily query multiple tables in the same query by joining columns from multiple tables together. Database tables store data as rows and are organized into a set number of columns. Columns are defined by specific data types such as integer or string so that only specific types of data from new or modified rows of data is accepted. For example, if you have a database table with a name column that only accepts string values, then trying to insert a number into that column will fail. Relational databases allow designers to go a step forward and design constraints on columns so that data must meet predefined criteria. This predefined structure that data in relational databases must adhere to is called a schema and is fundamental to how users query relational data.
Users querying a relational database use a version of the Structured Query Language (SQL) to issue queries to the database. Depending on the vendor, most relational database management systems (RDBMSs) have their own variation of SQL that are based on the ANSI standardized version of SQL. For example, the Microsoft suite of RDBMSs (e.g., SQL Server, Azure SQL Database, Azure SQL Managed Instance) can be interacted with using Transact SQL (T-SQL). T-SQL provides four flavors of commands for query development:
- Data Manipulation Language (DML) commands are used to manipulate data in database tables. DML commands include SELECT, INSERT, UPDATE, and DELETE.
- Data Definition Language (DDL) commands are used to define RDBMS objects such as databases, tables, views, stored procedures, and triggers. DDL commands include CREATE, ALTER, and DROP.
- Data Control Language (DCL) commands are used to manage permissions and access control for users in a database. DCL commands include GRANT, REVOKE, and DENY.
- Transaction Control Language (TCL) commands are used to explicitly manage and control transaction execution to ensure that a specific transaction is successfully done without violating database integrity. TCL commands include BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION.
Relational database design considerations largely depend on what the database will be supporting. A database that’s supporting a business’s e-commerce site and needs to log every transaction made by a customer has vastly different requirements than a database that supports a report application. While there are many different design patterns for data-driven solutions, most of them fall into one of two broad categories: transactional processing systems or analytical systems.