How To Avoid Data Lake Crocodiles
Data lakes offer a powerful way to store massive volumes of structured and unstructured data—from IoT sensors to AI-generated output. In his recent Forbes article, technology journalist Adrian Bridgwater explores the benefits and risks of this approach, warning that without proper controls, data lakes can become dangerous waters. With everything flowing into one place, a single access point can expose sensitive data—especially if legacy systems or poorly tracked databases are involved. Add in AI tools that indiscriminately consume data, and you risk exposing personally identifiable information (PII) or duplicating flawed data across systems.
To avoid these “data lake crocodiles,” Bridgwater advises organizations to consider the following:
- Structure the lake using models like Microsoft’s Medallion Architecture to organize data by quality and sensitivity; the pattern typically layers data into bronze (raw), silver (cleansed), and gold (curated) tiers (see the first sketch after this list).
- Mask or synthesize data for development and testing, keeping real customer data safe while still enabling realistic scenarios (see the second sketch after this list).
- Segment environments, creating separate, secure “shallow ends” for non-production use.
- Control AI access to ensure it doesn’t pull or reveal data it shouldn’t.
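To make the layering concrete, here is a minimal PySpark sketch of a Medallion-style flow. The storage paths, column names, and aggregation are illustrative assumptions, not anything prescribed in Bridgwater’s article.

```python
# A minimal sketch of Medallion-style layering with PySpark.
# Paths, columns, and the aggregation are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw sensor events exactly as received.
bronze = spark.read.json("s3://example-lake/raw/iot_events/")
bronze.write.mode("overwrite").parquet("s3://example-lake/bronze/iot_events/")

# Silver: cleanse and conform -- drop malformed rows, standardize types.
silver = (
    bronze
    .dropna(subset=["device_id", "event_time"])
    .withColumn("event_time", F.to_timestamp("event_time"))
)
silver.write.mode("overwrite").parquet("s3://example-lake/silver/iot_events/")

# Gold: curated aggregates that are safe to expose to analysts and AI tools.
gold = silver.groupBy("device_id").agg(F.avg("temperature").alias("avg_temperature"))
gold.write.mode("overwrite").parquet("s3://example-lake/gold/device_temperatures/")
```

In a setup like this, the gold layer is typically the only tier that analysts and AI tooling are allowed to query directly, which keeps raw and partially cleansed data out of reach.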
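And here is a similarly minimal sketch of masking and synthesizing customer records before they are copied into a non-production “shallow end”. The field names, salt, and perturbation range are hypothetical.

```python
# A minimal sketch of masking PII before data reaches a test environment.
# Field names and the salt are illustrative assumptions.
import hashlib
import random

SALT = "replace-with-a-secret-salt"

def mask_email(email: str) -> str:
    """Replace an email with a stable pseudonym so joins still work in tests."""
    digest = hashlib.sha256((SALT + email.lower()).encode()).hexdigest()[:12]
    return f"user_{digest}@example.test"

def synthesize_record(record: dict) -> dict:
    """Return a copy that keeps the analytic shape but drops real identity."""
    return {
        **record,
        "email": mask_email(record["email"]),
        "full_name": f"Test User {random.randint(1000, 9999)}",
        "account_balance": round(record["account_balance"] * random.uniform(0.8, 1.2), 2),
    }

if __name__ == "__main__":
    prod_row = {"email": "jane@example.com", "full_name": "Jane Doe", "account_balance": 1523.40}
    print(synthesize_record(prod_row))
```

Because the pseudonym is derived deterministically from the real value, joins across masked tables still line up, which is what keeps the synthetic data realistic enough for testing.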
Data lakes are valuable tools—but only when managed properly. By combining thoughtful architecture, strong governance, and safe testing practices, businesses can unlock the full potential of their data—without becoming prey to lurking risks.