First things first: Data Lake Planning!

In today's world, data is king. Businesses are generating more data than ever before, and they need a way to store, manage, and analyze it all. A data lake is a centralized repository for all of an organization's data, both structured and unstructured. It can be a valuable tool for businesses of all sizes, but it's important to plan carefully before implementing a data lake.


Four questions drive the plan: What data will you store? How will you store it? How will you manage it? How will you analyze it?



Here are some of the things to consider when planning a data lake:

  • What data will you store in the data lake? This will depend on your business needs. Some common types of data that are stored in data lakes include:

    • Operational data from your business systems

    • Data from external sources, such as social media, sensors, and the internet of things

    • Data that is not yet ready for analysis, such as raw data or data that needs to be cleaned and transformed

  • How will you store the data? Data lakes can be stored on-premises or in the cloud. There are a variety of storage options available, so you'll need to choose the one that best meets your needs.

  • How will you manage the data? Data lakes can be complex to manage, so it's important to have a plan in place. Some of the things you'll need to consider include:

    • Data governance: How will you ensure that the data in your data lake is accurate, secure, and compliant?

    • Data security: How will you protect your data from unauthorized access, use, or disclosure?

    • Data quality: How will you ensure that the data in your data lake is accurate and complete?

  • How will you analyze the data? Data lakes can be used for a variety of analytics tasks, such as:

    • Data exploration: This is the process of getting to know your data and identifying patterns and trends.

    • Data visualization: This is the process of creating charts, graphs, and other visuals to help you understand your data.

    • Machine learning: This is the process of using algorithms to learn from data and make predictions.


Once you've considered all of these factors, you can start planning your data lake.

It's important to remember that data lakes are not a one-size-fits-all solution. The best way to plan a data lake is to work with a qualified data architect who can help you design a solution that meets your specific needs.

Note that your data lake structure can span more than one physical storage account. In some scenarios, three separate data lakes are advisable.
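If you do split the lake across accounts, it helps to record which zone lives where in a single place. The sketch below is only an illustration: it assumes one account per top-level zone used in this article's example structure (Raw, Enriched, Curated), and the account names and the `locate` helper are made up for the example.

```python
from dataclasses import dataclass

# Hypothetical account names, one per top-level zone. Replace with your own.
ZONE_ACCOUNTS = {
    "Raw": "contosolakeraw",
    "Enriched": "contosolakeenriched",
    "Curated": "contosolakecurated",
}


@dataclass(frozen=True)
class Location:
    account: str  # storage account that holds the zone
    path: str     # folder path inside that account, starting with the zone


def locate(zone: str, relative_path: str) -> Location:
    """Return the storage account and full folder path for a folder in a zone."""
    if zone not in ZONE_ACCOUNTS:
        raise ValueError(f"Unknown zone: {zone!r}")
    clean = relative_path.strip("/")  # tolerate leading/trailing slashes
    return Location(ZONE_ACCOUNTS[zone], f"{zone}/{clean}")
```

For example, `locate("Raw", "Landing/Log/webshop")` resolves to the hypothetical `contosolakeraw` account and the path `Raw/Landing/Log/webshop`. Keeping the mapping in one table means pipelines do not hard-code account names.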


Example of a data lake structure

|Raw 
|-Landing
|--Log
|---{Application Name}
|--Master and Reference
|---{Source System}
|--Telemetry
|---{Source System}
|----{Application}
|--Transactional
|---{Source System}
|----{Entity}
|-----{Version}
|------Delta
|-------{date (ex. rundate=2019-08-22)}
|------Full
|-Conformance
|--Log
|---{Application Name}
|--Master and Reference
|---{Source System}
|--Telemetry
|---{Source System}
|----{Application}
|--Transactional
|---{Source System}
|----{Entity}
|-----{Version}
|------Delta
|-------Input
|--------{date (ex. rundate=2019-08-22)}
|-------Output
|--------{date (ex. rundate=2019-08-22)}
|-------Error
|--------{date (ex. rundate=2019-08-22)}
|------Full
|-------Input
|--------{date (ex. rundate=2019-08-22)}
|-------Output
|--------{date (ex. rundate=2019-08-22)}
|-------Error
|--------{date (ex. rundate=2019-08-22)}
|Enriched
|-Standardized
|--Log
|---{Application Name}
|--Master and Reference
|---{Source System}
|--Telemetry
|---{Source System}
|----{Application}
|--Transactional
|---{Source System}
|----{Entity}
|-----{Version}
|------General
|-------{date (ex. rundate=2019-08-22)}
|------Sensitive
|-------{date (ex. rundate=2019-08-22)}
|Curated
|-{Data Product}
|--{Entity}
|---{Version}
|----General
|-----{date (ex. rundate=2019-08-22)}
|----Sensitive
|-----{date (ex. rundate=2019-08-22)}
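
A layout like this is easiest to keep consistent when paths are generated by code rather than typed by hand. Here is a minimal sketch for the Raw/Landing/Transactional branch above. The function name and the sample values (`erp`, `orders`, `v1`) are hypothetical, and it assumes that only Delta loads get a `rundate=` folder, as the tree shows.

```python
from datetime import date
from typing import Optional


def landing_path(source_system: str, entity: str, version: str,
                 load_type: str, run_date: Optional[date] = None) -> str:
    """Build a Raw/Landing/Transactional folder path following the layout above."""
    if load_type not in ("Delta", "Full"):
        raise ValueError("load_type must be 'Delta' or 'Full'")
    parts = ["Raw", "Landing", "Transactional",
             source_system, entity, version, load_type]
    if load_type == "Delta":
        # Delta loads are partitioned by run date, e.g. rundate=2019-08-22
        if run_date is None:
            raise ValueError("Delta loads need a run_date")
        parts.append(f"rundate={run_date.isoformat()}")
    return "/".join(parts)


print(landing_path("erp", "orders", "v1", "Delta", date(2019, 8, 22)))
# Raw/Landing/Transactional/erp/orders/v1/Delta/rundate=2019-08-22
```

The same idea extends to the Conformance, Standardized, and Curated branches by swapping the fixed prefix and adding the Input/Output/Error or General/Sensitive level.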

 

Any questions? Feel free to contact us!
