Fortune 500 Energy Client

Cloud Transformation | Data Engineering | Data Science

The Challenge

This client’s on-premises enterprise data warehouse (EDW) had been stretched to its limits and could no longer sustain additional load. Growing adoption of Internet of Things (IoT) devices was expected to sharply increase the volume of data being captured and analyzed, and the hardware supporting the EDW was reaching end of life. Leadership decided to migrate off the EDW to a cloud-based data lake.

The Solution

Deployed an AWS Storage Gateway solution to receive EDW data (batch and Change Data Capture) from Informatica, along with a multi-stage Amazon S3 data lake providing geographically specific access to raw and processed data. Developed a customizable, serverless data validation system that ingests a variety of files and formats into the data lake, cataloged the metadata with AWS Glue, and used serverless ETL to convert CSV files into the self-describing Apache Parquet format. Created an Amazon WorkSpaces virtual desktop infrastructure (VDI) solution through which data scientists work with Amazon SageMaker for machine learning, and exposed the processed, scoped data to business intelligence tools through Amazon Athena.
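The validation-and-staging step described above can be sketched as a Lambda-style handler. This is a minimal, standard-library-only sketch under stated assumptions, not the client's actual design: the rule table `VALIDATION_RULES`, the `raw` and `quarantine` stage names, the `region=` key prefix, and the event shape passed to `handler` are all hypothetical. It checks an incoming file against a configurable rule for its format, then routes it to a region-partitioned key in the appropriate stage; a region prefix of this kind is one way prefix-scoped access policies could enforce geographically specific access.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical, configurable rules: file extension -> required CSV header columns.
# Adding a new format means adding an entry here rather than changing the code.
VALIDATION_RULES = {
    ".csv": {"required_columns": {"device_id", "timestamp", "reading"}},
}

# Hypothetical stage names; a later ETL step would write to a "processed" stage.
RAW_STAGE = "raw"
QUARANTINE_STAGE = "quarantine"


@dataclass
class ValidationResult:
    valid: bool
    reason: str


def validate_file(filename: str, body: str) -> ValidationResult:
    """Check a file's contents against the rule registered for its extension."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    rule = VALIDATION_RULES.get(ext)
    if rule is None:
        return ValidationResult(False, f"unsupported format: {ext or 'none'}")
    # Read only the header row to confirm the required columns are present.
    reader = csv.DictReader(io.StringIO(body))
    header = set(reader.fieldnames or [])
    missing = rule["required_columns"] - header
    if missing:
        return ValidationResult(False, f"missing columns: {sorted(missing)}")
    return ValidationResult(True, "ok")


def target_key(region: str, filename: str, result: ValidationResult) -> str:
    """Build a region-partitioned object key in the raw or quarantine stage."""
    stage = RAW_STAGE if result.valid else QUARANTINE_STAGE
    return f"{stage}/region={region}/{filename}"


def handler(event: dict, context=None) -> dict:
    """Lambda-style entry point; the event fields used here are hypothetical."""
    result = validate_file(event["filename"], event["body"])
    return {
        "valid": result.valid,
        "reason": result.reason,
        "key": target_key(event["region"], event["filename"], result),
    }
```

For example, a `meters.csv` file from region `us` whose header is `device_id,timestamp,reading` is marked valid and routed to `raw/region=us/meters.csv`, while a `.json` file is rejected as an unsupported format and routed to the quarantine stage. Keeping the rules in data rather than code is what makes a validator like this customizable.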

Successful Results

As a result, the organization can now ingest, process, analyze, and report on both its existing data and the anticipated influx of new data cost-effectively and elastically, with storage and compute decoupled and provisioned on demand.

Technologies Used

  • AWS
  • AWS Storage Gateway
  • Amazon S3
  • AWS Lambda
  • AWS Step Functions
  • Amazon DynamoDB
  • Amazon Athena
  • Amazon EMR
  • Apache Parquet
  • Amazon WorkSpaces
  • Amazon SageMaker
  • Python
  • Informatica
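Lambda, Step Functions, and DynamoDB appear together in the list above, while the Solution describes a serverless validation system without detailing how it is orchestrated. One plausible wiring, offered only as a sketch under assumptions, is a Step Functions state machine in Amazon States Language that runs a validation function, records each outcome in a DynamoDB table, and then either converts the file or fails. The function ARNs (using the placeholder account `123456789012`), the `ingest-audit` table, the state names, and the `valid`/`reason`/`key` fields the validator is assumed to return are all hypothetical.

```python
import json

# Placeholder ARNs: region, account ID, and function names are hypothetical.
VALIDATE_FN = "arn:aws:lambda:us-east-1:123456789012:function:validate-file"
CONVERT_FN = "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"
AUDIT_TABLE = "ingest-audit"  # hypothetical DynamoDB table name


def build_state_machine() -> dict:
    """Return an Amazon States Language definition for validate -> record -> convert."""
    return {
        "Comment": "Validate an incoming file, record the outcome, then convert it.",
        "StartAt": "Validate",
        "States": {
            # Assumed to return {"valid": bool, "reason": str, "key": str}.
            "Validate": {"Type": "Task", "Resource": VALIDATE_FN, "Next": "RecordOutcome"},
            # Step Functions' DynamoDB integration: write one audit item per file.
            "RecordOutcome": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:putItem",
                "Parameters": {
                    "TableName": AUDIT_TABLE,
                    "Item": {
                        "object_key": {"S.$": "$.key"},
                        "reason": {"S.$": "$.reason"},
                    },
                },
                # null ResultPath discards the putItem result and passes the input on.
                "ResultPath": None,
                "Next": "IsValid",
            },
            "IsValid": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.valid", "BooleanEquals": True, "Next": "ConvertToParquet"}
                ],
                "Default": "Rejected",
            },
            "ConvertToParquet": {"Type": "Task", "Resource": CONVERT_FN, "End": True},
            "Rejected": {
                "Type": "Fail",
                "Error": "ValidationFailed",
                "Cause": "File failed validation and was quarantined.",
            },
        },
    }


def reachable_states(definition: dict) -> set:
    """Follow Next, Default, and Choices links from StartAt; unreachable states are wiring bugs."""
    states = definition["States"]
    seen, stack = set(), [definition["StartAt"]]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        state = states[name]
        targets = [state.get("Next"), state.get("Default")]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
        stack.extend(t for t in targets if t)
    return seen
```

The dictionary can be serialized with `json.dumps(build_state_machine())` and supplied as the `definition` argument of boto3's Step Functions `create_state_machine` call, together with a name and an IAM role ARN. Recording every outcome before branching keeps an audit trail for rejected files as well as accepted ones.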