
Identify Microsoft cloud services for real-time analytics

In an age where decisions must often be made in the blink of an eye, the role of real-time analytics has become paramount. The ability to rapidly sift through vast streams of data, distill meaningful insights, and act on them instantaneously can often mean the difference between seizing an opportunity and missing it entirely. But beyond the buzzwords, what does real-time analytics truly entail, especially when you're navigating the vast offerings of the Azure ecosystem? This section will guide you through Azure's real-time analytics technologies, demystifying their capabilities and applications and setting you on a course to harness their full potential. From understanding the prowess of Azure Stream Analytics to grasping the nuances of Azure Synapse Data Explorer and Spark Structured Streaming, you're about to get to the heart of instant data processing and analytics.

■ Stream processing platforms: At center stage of real-time analytics are stream processing platforms. A stalwart example is Azure Stream Analytics, which you can use to ingest, process, and analyze data as it flows. To visualize its power, consider monitoring a vast power grid, instantly detecting surges, and redirecting power to prevent outages. Just like the grid managers, you can harness Azure Stream Analytics to react immediately to your business's data.

■ Azure Synapse Data Explorer: This isn't just another tool; it's your window into the massive streams of data you're dealing with. With Azure Synapse Data Explorer you can query, visualize, and explore your data in real time (a sample query appears after this list). It's like having a magnifying glass over a rushing river of data, where you can pick out and examine individual drops (or data points) as they flow by.

■ Spark Structured Streaming: An integral part of the Apache Spark ecosystem, Spark Structured Streaming facilitates scalable and fault-tolerant stream processing of live data streams. Imagine standing amidst a bustling stock market, with traders shouting orders and prices fluctuating wildly. Now, imagine you could process, aggregate, and make sense of all that data in real time. That's the magic Spark Structured Streaming brings to the table. Figure 4-16 shows you streaming lines of data converging into structured blocks of information.

FIGURE 4-16 Streaming data converging into structured datasets

■ Message brokers: Azure Event Hubs stands tall as a premier message broker. As you navigate the labyrinth of real-time data, you'll realize the critical role these brokers play in ensuring data is delivered reliably and promptly to the systems that process it. It's the backbone, the silent carrier ensuring every piece of data reaches its destination.

■ NoSQL databases: In the realm of real-time data, traditional databases can become bottlenecks. This is where powerhouses like Cosmos DB shine. Designed for breakneck speeds and unmatched scalability, they provide the storage that might be required for the deluge of real-time data. If you've ever wondered how global social media platforms can show trending topics within seconds of an event unfolding, NoSQL databases are a big part of that answer.

■ Data visualization tools: The journey from data to decision is completed when insights are visualized and made actionable. Power BI serves as a beacon here, integrating with real-time analytics platforms to deliver live data dashboards. These aren't just numbers and graphs; they're the pulse of your operations, showcased in real time.
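To make the Azure Synapse Data Explorer bullet concrete, here is a minimal sketch of the kind of Kusto Query Language (KQL) query you might run over a stream of sensor readings. The table and column names (SensorReadings, DeviceId, Temperature, Timestamp) are hypothetical, invented purely for illustration:

```kusto
// Hypothetical table of streaming sensor events.
SensorReadings
| where Timestamp > ago(5m)              // look only at the last five minutes of the stream
| summarize AvgTemp = avg(Temperature)   // per-device, per-minute average temperature
    by DeviceId, bin(Timestamp, 1m)
| order by DeviceId asc
```

A query like this lets you pick individual "drops" out of the river of data described above, aggregating fresh events seconds after they arrive.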

The ecosystem of real-time analytics is vast and ever-evolving. As you delve deeper, be prepared to witness the symphony of technologies working in unison, each playing its unique note in the grand composition of real-time insights. Each technology, be it Azure Stream Analytics, Azure Synapse Data Explorer, or Spark Structured Streaming, has its own nuances, applications, and potential.


Azure Stream Analytics

In today’s data-driven world, the need to react immediately to unfolding events has never been greater. Picture yourself on the trading floor, where milliseconds can decide millions. Or consider a bustling metropolis where urban sensors constantly monitor traffic, air quality, and energy consumption. Azure Stream Analytics is Microsoft’s answer to the challenges of real-time data ingestion, processing, and analytics.

Azure Stream Analytics is a real-time event data processing service that you can use to harness the power of fast-moving streams of data. But what does it really mean for you?

WHY AZURE STREAM ANALYTICS?

Azure Stream Analytics brings the following tools to your toolkit:

■ Seamless integration: Azure Stream Analytics integrates beautifully with other Azure services. Whether you're pulling data from Azure IoT Hub, Azure Event Hubs, or Azure Blob Storage, Stream Analytics acts as your cohesive layer, processing and redirecting the data to databases, dashboards, or even other applications, as shown in Figure 4-17.

■ SQL-based query language: You don't need to be a programming wizard to harness Azure Stream Analytics. If you're familiar with SQL, you're already ahead of the curve. Stream Analytics employs a SQL-like language, allowing you to create transformation queries on your real-time data (a sample query appears after this list).

FIGURE 4-17  Azure Stream Analytics

■ Scalability and reliability: One of the hallmarks of Azure Stream Analytics is its ability to scale. Whether you're processing a few records or millions every second, Stream Analytics can handle it. Moreover, its built-in recovery capabilities ensure that no data is lost in the case of failures.


■ Real-time dashboards: Azure Stream Analytics is not just about processing; it's also about visualization. With its ability to integrate seamlessly with tools like Power BI, you can access real-time dashboards that update as events unfold.

■ Time windowing: One of the standout features you'll appreciate is the ease with which you can perform operations over specific time windows, be they tumbling, sliding, or hopping. For instance, you might want to calculate the average temperature from IoT sensors every five minutes; Stream Analytics has you covered. The three window types are defined next, and a query sketch using them follows this list.

A tumbling window in stream processing is a fixed-duration, nonoverlapping interval used to segment time-series data. Each piece of data falls into exactly one window, defined by a distinct start and end time, ensuring that data groups are mutually exclusive. For instance, with a 5-minute tumbling window, data from 00:00 to 00:04 would be aggregated in one window, and data from 00:05 to 00:09 in the next, facilitating structured, periodic analysis of streaming data.

A sliding window in stream processing is a data analysis technique where the window of time for data aggregation "slides" continuously over the data stream. The window moves forward by a specified slide interval and overlaps with previous windows. Each window has a fixed length, but unlike tumbling windows, sliding windows can cover overlapping periods of time, allowing for more frequent analysis and updates. For example, if you have a sliding window of 10 minutes with a slide interval of 5 minutes, a new window starts every 5 minutes, and each window overlaps with the previous one for 5 minutes, providing a more continuous and overlapping view of the data stream.

A hopping window in stream processing is a time-based window that moves forward in fixed increments, known as the hop size. Each window has a specified duration, and the start of the next window is determined by the hop size rather than the end of the previous window. This approach allows for overlaps between windows, where data can be included in multiple consecutive windows if it falls within their time frames. For example, with a window duration of 10 minutes and a hop size of 5 minutes, a new window starts every 5 minutes, and each window overlaps with the next one for a duration determined by the difference between the window size and the hop size.

■ Anomaly detection: Dive into the built-in machine learning capabilities to detect anomalies in your real-time data streams. Whether you're monitoring web clickstreams or machinery in a factory, Azure Stream Analytics can alert you to significant deviations in patterns.
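As a hedged illustration of the SQL-like language and the window types just described, here is a minimal Stream Analytics query sketch. The input and output aliases ([sensor-input], [avg-output]) and the field names (deviceId, temperature, eventTime) are placeholders, not real resources:

```sql
-- Average temperature per device over 5-minute tumbling windows.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd   -- the closing edge of each window
INTO
    [avg-output]
FROM
    [sensor-input] TIMESTAMP BY eventTime
GROUP BY
    deviceId,
    TumblingWindow(minute, 5)
    -- Swap in HoppingWindow(minute, 10, 5) for 10-minute windows that hop
    -- every 5 minutes, or SlidingWindow(minute, 10) for overlapping
    -- 10-minute windows, per the definitions above.
```

The anomaly detection capability mentioned in the last bullet is exposed through built-in functions such as AnomalyDetection_SpikeAndDip, which you call from this same query language.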

As a practical example to truly appreciate the potential of Azure Stream Analytics, consider a smart city initiative. Urban sensors, spread across the city, send real-time data about traffic, energy consumption, and more. Through Azure Stream Analytics, this data is ingested in real time, processed to detect any irregularities such as traffic jams or power surges, and then passed on to Power BI dashboards that city officials monitor. The officials can then take immediate action, such as rerouting traffic or adjusting power distribution.


In summary, Azure Stream Analytics is a tool for those yearning to transform raw, real-time data streams into actionable, meaningful insights. And as you delve deeper into its features and integrations, you’ll realize that its possibilities are vast and ever-evolving.


Stream Processing

Instead of processing groups of data at scheduled intervals as you would with batch processing, stream processing performs actions on data in real time as it is generated. The proliferation of connected applications and IoT sensor devices in recent years has led to a boom in the number of data sources that can stream data. Organizations that leverage data streams can innovate on the fly, responding instantly to the needs of their customers.

You can think of a stream of data as a continuous flow of data from some source, also known as a message producer. Each piece of data in a stream is often referred to as an event or a message and typically arrives in an unstructured or semi-structured format such as JSON. The following list includes some examples of stream processing:

  • An e-commerce company analyzing click-stream data as consumers are browsing the company’s website to provide product recommendations in real time
  • Fitness trackers streaming heart rate and movement data to a mobile app and providing real-time updates of the wearer’s workout efficiency
  • Financial institutions tracking stock market changes in real time and automatically making portfolio decisions as stock prices change
  • Oil companies monitoring the status of pipelines and drilling equipment

While these examples include the same transformation activities as many batch processes, they have vastly shorter latency requirements.
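To picture what a single event in such a stream looks like, consider a hypothetical clickstream message from the e-commerce example above; every field name and value here is invented for illustration:

```json
{
  "eventId": "b7f1c2d0-3a5e-4f6b-9c8d-1e2f3a4b5c6d",
  "eventType": "pageView",
  "userId": "user-8842",
  "url": "/products/wireless-headphones",
  "referrer": "/search?q=headphones",
  "timestamp": "2024-02-01T14:32:07Z"
}
```

Because the format is semi-structured, producers can add or omit fields over time without breaking downstream consumers.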

Stream processing is just one step in designing a real-time data processing solution. The following logical components will need to be considered when designing a real-time solution:

  • Real-time message ingestion—The architecture must include a way to capture and store real-time messages regardless of the technology that is creating the stream of data. Message brokers such as Azure Event Hubs, Azure IoT Hub, and Apache Kafka are used to ingest millions of events per second from one or many message producers. These technologies will then queue messages before sending them to the next appropriate step in the architecture. Most of the time this will be a processing engine of some type, but some solutions will require sending the raw messages to a long-term storage solution such as Azure Blob Storage or Azure Data Lake Storage (ADLS) for future batch analysis.
  • Stream processing—Stream processing engines are the compute platforms that process, aggregate, and transform data streams. Technologies such as Azure Functions, Azure Stream Analytics, and Azure Databricks Structured Streaming can create time-boxed insights from data that is queued in a real-time message broker. These technologies will then write the results to message consumers such as an analytical data store or a reporting tool that can display real-time updates.
  • Analytical data store—Processed real-time data can be written to databases such as Azure Synapse Analytics, Azure Data Explorer, and Azure Cosmos DB that power analytical applications.
  • Analysis and reporting—Instead of being written to an analytical data store first, processed real-time data can be published directly from the stream processing engine to report applications like Power BI.
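A hedged sketch in the Stream Analytics query language shows how these components meet in practice: one query step is computed from a message broker input, then fanned out to both an analytical data store and a reporting tool. All aliases ([eventhub-input], [cosmosdb-output], [powerbi-output]) are hypothetical; in a real job they are configured as inputs and outputs on the service, not in the query text:

```sql
-- Reusable step: clicks per user over 1-minute tumbling windows.
WITH ClicksPerMinute AS (
    SELECT
        userId,
        COUNT(*) AS clicks,
        System.Timestamp() AS windowEnd
    FROM [eventhub-input]                 -- real-time message ingestion
    GROUP BY userId, TumblingWindow(minute, 1)
)

-- Write the results to an analytical data store...
SELECT * INTO [cosmosdb-output] FROM ClicksPerMinute

-- ...and publish the same results directly to a live dashboard.
SELECT * INTO [powerbi-output] FROM ClicksPerMinute
```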


Skill 4.3 Describe data visualization in Microsoft Power BI

Dive into the transformative world of data visualization with Microsoft Power BI, a tool that not only brings your data to life but also empowers you to extract insights with unparalleled ease and finesse. As you delve deeper into this segment, imagine the vast swathes of data currently sitting in spreadsheets or databases metamorphosing into vibrant charts, intricate graphs, and interactive dashboards. With Power BI, you can tailor every detail of your visualizations to your precise needs.

Picture a dashboard where sales metrics, customer demographics, and operational efficiencies merge seamlessly, with each visual element telling its part of the larger story, as shown in Figure 4-20. That's the promise of Power BI, a canvas where data finds its voice. And while the visual elements captivate, remember that beneath them lie robust analytical capabilities. Want to drill down into a specific data point? Curious about trends over time? Power BI is more than up to the task, offering you both the broad view and the minute details.

In this section, you’ll encounter vivid examples that underscore the versatility and power of Power BI. From crafting simple bar charts to designing multidimensional maps, you’ll learn the art and science of making data dance to your tune.

And while our guidance here is comprehensive, Power BI's expansive capabilities mean there's always more to explore. Consider referring to Microsoft's official resources for deeper dives, advanced tutorials, and community-driven insights. Let's embark on this enlightening journey, ensuring that, by its end, you're not just a data analyst but also a data storyteller.

FIGURE 4-20  Power BI interactive dashboard


This skill covers how to:

  • Identify capabilities of Power BI
  • Describe features of data models in Power BI


EXPLORING SCHEMAS

These are the different types of schemas:

■ Star schema: One of the most prevalent schema designs in Power BI, the star schema consists of a central fact table surrounded by dimension tables. The fact table houses quantitative data (like sales amounts), while dimension tables contain descriptive attributes (like product names or customer details). This structure, resembling a star as shown in Figure 4-23, ensures streamlined data queries and optimal performance.

FIGURE 4-23  Star schema

■ Snowflake schema: An evolution of the star schema, the snowflake schema sees dimension tables normalized into additional tables, as shown in Figure 4-24. This schema can be more complex but offers a more granular approach to data relationships, making it apt for intricate datasets.


FIGURE 4-24 Snowflake schema


A PRACTICAL SCENARIO

Imagine running an online bookstore. You have tables for orders, customers, books, and authors. Using the star schema, the Orders table sits at the center as your fact table, containing transaction amounts. The other tables serve as dimensions, with the Books table containing a foreign key linking to the Authors table, establishing a many-to-one relationship. A sketch of these tables in SQL follows.
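Here is a rough sketch of how these tables might be declared in SQL, with every table and column name invented for this bookstore scenario:

```sql
-- Dimension tables hold descriptive attributes.
CREATE TABLE DimAuthor (
    AuthorKey    INT PRIMARY KEY,
    AuthorName   VARCHAR(200)
);

CREATE TABLE DimBook (
    BookKey      INT PRIMARY KEY,
    Title        VARCHAR(300),
    AuthorKey    INT REFERENCES DimAuthor (AuthorKey)  -- many books per author
);

CREATE TABLE DimCustomer (
    CustomerKey  INT PRIMARY KEY,
    CustomerName VARCHAR(200)
);

-- The fact table holds quantitative transaction data,
-- keyed to the surrounding dimensions.
CREATE TABLE FactOrders (
    OrderKey     INT PRIMARY KEY,
    BookKey      INT REFERENCES DimBook (BookKey),
    CustomerKey  INT REFERENCES DimCustomer (CustomerKey),
    OrderDate    DATE,
    Amount       DECIMAL(10, 2)
);
```

Note that the DimBook-to-DimAuthor link lightly normalizes the design; taken further, exactly this kind of refinement turns a star schema into a snowflake.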

Harnessing the potential of table relationships and choosing the right schema in Power BI isn’t just a technical endeavor; it’s an art form. By understanding and correctly implementing these relationships, you’re crafting a tapestry where data flows seamlessly to offer insights that are both deep and interconnected.


Hierarchies

Hierarchies in Power BI allow you to layer different fields in a structured order, offering a multilevel perspective on your data. At a basic level, think of hierarchies as ladders of information, where each rung offers a more granular view than the last.

For instance, in a time hierarchy, you might start with years and descend to months, then weeks, and, finally, days. Each level represents a deeper dive into your data, allowing for detailed drill-down analysis. As shown in Figure 4-25, you are able to view your total yearly sales and then drill down to see a more detailed breakdown of your yearly sales by month.

FIGURE 4-25  Power BI hierarchies

WHY HIERARCHIES MATTER

In the realm of data analytics within Power BI, hierarchies represent a fundamental and sophisticated mechanism for organizing and dissecting complex datasets. These structured frameworks are not merely for organizational clarity; they serve as critical tools for enhancing analytical depth and navigational efficiency. Hierarchies in Power BI facilitate a multilayered approach to data examination, providing a powerful means to dissect, understand, and visualize data in a methodical and meaningful way. The following are some essential facets of hierarchies that underscore their significance in professional and technical data analysis:

■ Efficient data exploration: With hierarchies, you can seamlessly navigate between different levels of data. This efficiency facilitates intuitive data exploration, letting you zoom in on details or pull back to view broader trends.

■ Enhanced visualizations: Hierarchies bring a dynamic dimension to visualizations. Whether it's a column chart or a map, the ability to drill down through hierarchical levels enriches the visual story, making it more interactive and engaging.

■ Consistent analysis framework: Hierarchies provide a structured framework for analysis. By establishing a clear order of fields, they ensure consistency in how data is viewed and analyzed across reports and dashboards.


Data categorization

Data categorization in Power BI involves assigning a specific type or category to a data column, thereby providing hints to Power BI about the nature of the data. This categorization ensures that Power BI understands and appropriately represents the data, especially when used in visuals or calculations.

WHY DATA CATEGORIZATION MATTERS

Data categorization in Power BI is pivotal for extracting maximum value from your datasets, impacting everything from visualization choices to data integrity. It enables Power BI to provide tailored visual suggestions, enhances the effectiveness of natural language queries, and serves as a critical tool for data validation. Here's why categorizing your data correctly matters:

  • Enhanced visualization interpretation: By understanding the context of your data, Power BI can auto-suggest relevant visuals. Geographical data, for instance, would prompt map-based visualizations, while date fields might suggest time-series charts.
  • Improved search and Q&A features: Power BI’s Q&A tool, which allows natural language queries, leans on data categorization. When you ask for “sales by city,” the tool knows to reference geographical data due to the categorization of the City column.
  • Data validation: Categorization can act as a form of data validation. By marking a column as a date, any nondate values become evident, highlighting potential data quality issues.


COMMON DATA TYPES IN POWER BI

In Power BI, the clarity and accuracy of your reports hinge on understanding the core data types at your disposal. Each data type serves a specific purpose, shaping how information is stored, analyzed, and presented. The following are common data types:

  • Text: Generic textual data, from product names to descriptions
  • Whole number: Numeric data without decimal points, like quantities or counts
  • Decimal number: Numeric data with decimal precision, suitable for price or rate data
  • Date/time: Fields that have timestamps, including date, time, or both


THE POWER OF QUICK MEASURES

Quick Measures in Power BI streamlines the analytical process by offering a suite of predefined calculations. Here is a list of the key benefits Quick Measures brings to Power BI:

■ Efficiency: You no longer need to remember or construct intricate DAX formulas for common calculations. Quick Measures offers a library of these, ready to be deployed.
■ Consistency: By using standardized formulas, you ensure consistency in your metrics, which is especially beneficial if sharing reports or datasets across teams.
■ Learning tool: For those new to DAX or Power BI, Quick Measures can act as an educational tool, offering insights into how specific formulas are constructed.

POPULAR QUICK MEASURES

Power BI’s Quick Measures feature offers a range of popular calculations designed to enhance data analysis. The following are the key measures:

■ Time intelligence: Calculating year-to-date, quarter-to-date, and month-over-month changes, as well as running totals
■ Mathematical operations: Calculating percentages, differences, or products of columns
■ Statistical measures: Calculating averages, medians, or standard deviations
■ Aggregations: Summing, counting, or finding the minimum or maximum of a column based on certain conditions

A PRACTICAL WALK-THROUGH

Imagine you’re analyzing sales data and you want to understand month-over-month growth for a particular product.

Instead of manually creating a DAX formula to compute this, follow these steps:

  1. In a table or matrix visual, right-click a numerical column, like Sales, and select “New quick measure,” as shown in Figure 4-26.


  2. From the Quick Measures dialog box, choose "Month-over-month change" from the Time Intelligence category.
  3. Follow the prompts, selecting the appropriate fields (e.g., Sales and Date).
  4. Once added, the new quick measure will compute the month-over-month growth for sales, dynamically adjusting based on filters or slicers applied to your report.

By employing Quick Measures, you've saved precious time and ensured accuracy, letting Power BI handle the complexities of DAX on your behalf.

When embarking on the Power BI journey, you'll find many tools designed to make your analytical process smoother and more efficient. Quick Measures stands as a testament to this, offering you a shortcut to insights without compromising on depth or accuracy.
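For reference, the DAX that a month-over-month quick measure generates is roughly of the following shape. The Sales[Amount] column and 'Date'[Date] table are assumptions made for illustration, and the formula Power BI actually emits may differ in its details:

```dax
Sales MoM % =
VAR __PrevMonthSales =
    CALCULATE (
        SUM ( Sales[Amount] ),
        DATEADD ( 'Date'[Date], -1, MONTH )  -- shift the filter context back one month
    )
RETURN
    DIVIDE (
        SUM ( Sales[Amount] ) - __PrevMonthSales,
        __PrevMonthSales                     -- DIVIDE avoids divide-by-zero errors
    )
```

You rarely need to write this yourself; the point of Quick Measures is that the dialog box assembles it for you.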


News and commentary about the exam objective updates

The updates to the DP-900 exam objectives effective February 1, 2024, reveal a few noteworthy changes and refinements compared to the previous version. The following is commentary on each of the updates:

Audience Profile

■ Before & After Update: The target audience remains consistent. The exam is aimed at candidates new to working with data in the cloud, requiring familiarity with core data concepts and Microsoft Azure data services.


Describe Core Data Concepts (25–30%)

■ Before & After Update: This section remains largely unchanged, focusing on representing data (structured, semi-structured, unstructured), data storage options, and common data workloads (transactional, analytical). The roles and responsibilities associated with these workloads are also consistently covered.

Identify Considerations for Relational Data on Azure (20–25%)

■ Before & After Update: Both versions cover relational concepts, including features of relational data, normalization, SQL statements, and common database objects. A notable change is the explicit mention of the "Azure SQL family of products" in the updated objectives, offering a clearer focus on specific Azure services.

Describe Considerations for Working with Non-Relational Data on Azure (15–20%)

■ Before & After Update: This section remains consistent in both versions, covering Azure storage capabilities (Blob, File, Table storage) and Azure Cosmos DB features. The emphasis on understanding Azure's storage solutions and Cosmos DB's use cases and APIs continues to be a crucial part of this section.

Describe an Analytics Workload on Azure (25–30%)

■ Before Update: This section previously included details on Azure services for data warehousing, real-time data analytics technologies (Azure Stream Analytics, Azure Synapse Data Explorer, Spark Structured Streaming), and data visualization in Power BI.

■ After Update: The updated objectives maintain the focus on large-scale analytics, data warehousing, and real-time data analytics but have removed specific mentions of technologies like Azure Stream Analytics, Azure Synapse Data Explorer, and Spark Structured Streaming. Instead, there's a broader reference to "Microsoft cloud services for real-time analytics," suggesting a more general approach. The section on Power BI remains similar, emphasizing its capabilities, data models, and visualization options.

General Observations:

■ The updates indicate a shift toward a more generalized and possibly up-to-date overview of Azure services, especially in the analytics workload section.
■ The explicit mention of the Azure SQL family of products under relational data shows an emphasis on Azure-specific services.
■ Overall, the changes seem to align the exam more closely with current Azure offerings and trends in cloud data management without significantly altering the core content or focus areas of the exam.

These updates suggest a continued emphasis on ensuring that candidates have a well-rounded understanding of Azure’s data services, both relational and non-relational, along with a solid grasp of analytical workloads as they pertain to Azure’s environment.



Describe Types of Core Data Workloads

The volume of data that the world has generated has exploded in recent years. Zettabytes of data are created every year, and the variety is seemingly endless. Competing in a rapidly changing world requires companies to utilize massive amounts of data that they have only recently been exposed to. What's more, with edge devices that allow Internet of Things (IoT) data to move seamlessly between the cloud and local devices, companies can make valuable data-driven decisions in real time.

It is imperative that organizations leverage data when making critical business decisions. But how do they turn raw data into usable information? How do they decide what is valuable and what is noise? With the power of cloud computing and storage costs falling every year, it's easy for companies to store all the data at their disposal and build creative solutions that combine a multitude of different design patterns. For example, modern data storage and computing techniques allow sports franchises to create more sophisticated training programs by combining traditional statistical information with real-time data captured from sensors that measure features such as speed and agility. E-commerce companies leverage click-stream data to track a user's activity while on their website, allowing them to build custom experiences that reduce customer churn.

The exponential growth in data and the number of sources organizations can leverage to make decisions have put an increased focus on making the right solution design decisions. Deciding on the most optimal data store for the different types of data involved and the most optimal analytical pattern for processing data can make or break a project before it ever gets started. Ultimately, there are four key questions that need to be answered when making design decisions for a data-driven solution:

  • What value will the data powering the solution provide?
  • How large is the volume of data involved?
  • What is the variety of the data included in the solution?
  • What is the velocity of the data that will be ingested in the target platform?