Empowering Fabric Lakehouse Users Microsoft Fabric allows developers to create delta tables in the Lakehouse. However, automating the copying, updating, and deleting of data files in the Lakehouse might not be possible. How can we empower the end users to make these changes? Business Problem Today, we are going to cover two new tools from Microsoft that are in preview. First, the OneLake file explorer allows users to access files in the Lakehouse as if they were in Windows Explorer. Second, the Data Wrangler extension for Visual Studio Code…
Category: Fabric
Articles related to Microsoft’s newest service.
Thread 05 – Data Engineering with Fabric
Data Presentation Layer Microsoft Fabric allows the developer to create delta tables in the lakehouse. The bronze tables contain multiple versions of the truth, and the silver tables contain a cleaned-up, single version of the truth. How can we combine the silver tables into a relational model for consumption from the gold layer? Business Problem Our manager at Adventure Works has asked us to use a metadata-driven solution to ingest CSV files from external storage into Microsoft Fabric. A typical medallion architecture will be used in the…
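The excerpt above describes rolling cleaned silver tables up into a relational gold model. A minimal sketch of that step in a Fabric Spark notebook might look like the following; the table and column names (silver_customer, silver_sales, gold_fact_sales) are hypothetical and not taken from the article.

```python
# Minimal sketch: build a gold fact table by joining hypothetical silver
# tables in a Fabric Spark notebook (the `spark` session is pre-defined there).
# Table and column names are illustrative, not taken from the article.
from pyspark.sql import functions as F

customers = spark.table("silver_customer")   # cleaned dimension data
sales = spark.table("silver_sales")          # cleaned transactional data

# Denormalize the silver tables into a single gold fact table for reporting.
gold_fact_sales = (
    sales.join(customers, on="customer_id", how="inner")
         .select(
             "sales_order_id",
             "customer_id",
             "customer_name",
             "order_date",
             (F.col("quantity") * F.col("unit_price")).alias("extended_amount"),
         )
)

# Overwrite the gold delta table so downstream reports see the latest model.
gold_fact_sales.write.mode("overwrite").format("delta").saveAsTable("gold_fact_sales")
```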
Thread 04 – Data Engineering with Fabric
Metadata Driven Pipelines What is a metadata driven pipeline? Wikipedia defines metadata as “data that provides information about other data”. As a developer, we can create a non-parameterized pipeline and/or notebook to solve a business problem. However, if we have to solve the same problem a hundred times, the amount of code can get unwieldy. A better way to solve this problem is to store metadata in the delta lake. This data will drive how the Azure Data Factory pipelines and Spark notebooks execute. Business Problem Our manager has asked…
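To make the idea concrete, here is a minimal sketch of a metadata-driven load in a Spark notebook; the control table name and its columns (source_path, target_table, load_type) are assumptions for illustration, not the article's actual schema.

```python
# Minimal sketch of a metadata-driven load in a Fabric Spark notebook
# (the `spark` session is pre-defined there). The control table name and
# its columns are hypothetical.

# Read the control table that describes every source file to ingest.
meta_rows = spark.table("meta_control").collect()

# Loop over the metadata and run the same parameterized load logic each time.
for row in meta_rows:
    df = (
        spark.read.format("csv")
             .option("header", "true")
             .load(row["source_path"])
    )
    write_mode = "overwrite" if row["load_type"] == "full" else "append"
    df.write.mode(write_mode).format("delta").saveAsTable(row["target_table"])
```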
Thread 03 – Data Engineering with Fabric
Full versus Incremental Loads The loading of data from a source system to a target system has been well documented over the years. My first introduction to an Extract, Transform and Load program was DTS for SQL Server 7.0 in 1998. In a data lake, we have a bronze quality zone that is supposed to represent the raw data in a delta file format. This might include versions of the files for auditing. In the silver quality zone, we have a single version of the truth. The data is de-duplicated and cleaned up.…
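A sketch of the silver-zone step described above, de-duplicating versioned bronze rows down to a single version of the truth, might look like this; the table and column names (bronze_weather, natural_key, load_date) are assumptions for illustration.

```python
# Minimal sketch: promote raw, possibly versioned bronze rows to a silver
# table holding a single version of the truth. Runs in a Fabric Spark
# notebook; table and column names are illustrative only.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

bronze = spark.table("bronze_weather")

# Keep only the latest record per natural key, then drop exact duplicates.
latest_first = Window.partitionBy("natural_key").orderBy(F.col("load_date").desc())
silver = (
    bronze.withColumn("row_num", F.row_number().over(latest_first))
          .filter("row_num = 1")
          .drop("row_num")
          .dropDuplicates()
)

silver.write.mode("overwrite").format("delta").saveAsTable("silver_weather")
```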
Thread 02 – Data Engineering with Fabric
Managing Files and Folders What is a data lake? It is just a bunch of files organized by folders. Keeping these files organized prevents your data lake from becoming a data swamp. Today, we are going to learn about a Python library that can help you. Business Problem Our manager has given us weather data to load into Microsoft Fabric. We need to create folders in the landing zone to organize these files by both full and incremental loads. How can we accomplish this task? Technical Solution This use case…
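The excerpt does not name the library, so the sketch below assumes mssparkutils, the file-system helper available in Fabric notebooks; the folder layout is only an illustration of organizing full and incremental landing zones.

```python
# Minimal sketch of organizing a landing zone with mssparkutils in a Fabric
# notebook. The library choice and folder layout are assumptions, not taken
# from the article. Relative "Files/..." paths assume a default lakehouse
# is attached to the notebook.
from notebookutils import mssparkutils

# Create separate folders for full and incremental weather loads.
for folder in ["Files/landing/weather/full", "Files/landing/weather/incremental"]:
    mssparkutils.fs.mkdirs(folder)

# List the landing zone to confirm the new structure.
for item in mssparkutils.fs.ls("Files/landing/weather"):
    print(item.path, item.isDir)
```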
Thread 01 – Data Engineering with Fabric
Managed Vs Unmanaged Tables Microsoft Fabric was released to general availability on November 15th, 2023. I will be writing a quick post periodically in 2024 to get you up to speed on how to manipulate data in the lakehouse using Spark. I really like the speed of the starter pools in Microsoft Fabric. A one-to-ten node pool will be available for consumption in less than 10 seconds. Read all about this new compute on this Learn page. Business Problem Our manager has given us weather data…
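As a quick illustration of the managed versus unmanaged distinction in a Spark notebook, the sketch below writes the same hypothetical data both ways; the table names, sample rows, and file path are assumptions, and the relative path presumes a default lakehouse is attached.

```python
# Minimal sketch contrasting managed and unmanaged delta tables in a Fabric
# Spark notebook (the `spark` session is pre-defined there). Sample data,
# table names, and the path are illustrative only.
df = spark.createDataFrame(
    [("2024-01-01", 42.0), ("2024-01-02", 38.5)],
    ["observation_date", "temperature_f"],
)

# Managed table: Spark owns both the metadata and the files, so dropping
# the table also deletes the data.
df.write.mode("overwrite").format("delta").saveAsTable("weather_managed")

# Unmanaged (external) table: the data lives at a path we control, so
# dropping the table removes only the metadata and leaves the files behind.
# A full abfss:// URI may be required if no default lakehouse is attached.
(
    df.write.mode("overwrite")
      .format("delta")
      .option("path", "Files/raw/weather_unmanaged")
      .saveAsTable("weather_unmanaged")
)
```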