Kate Drogaieva: Data Engineer Portfolio

 

Data Warehouse

Insurance Data Warehouse Modeling

                            

About

GitHub

Data Lake

Parsing XML

Levenshtein distance in Data Analysis and Load

 

DevOps

 

 

AWS Redshift schema changes deployment

       CI/CD pipeline that deploys data warehouse schema changes to Redshift, built on AWS CodePipeline, CodeBuild, and Flyway, with JUnit for testing

GitHub

 

Snowflake schema changes & DBT deployment

       CI/CD pipeline based on GitHub Actions, Flyway, and DBT tests

GitHub

 

Data Pipelines

 

 

Pentaho Data Integration ETL

       Extracting from AWS Aurora into staging tables in MS SQL Server, transforming in SQL and Pentaho Data Integration, and loading into AWS Redshift with post-processing in stored procedures

About

GitHub

 

Matillion ELT

       Extracting from AWS Aurora and loading into Redshift via Fivetran; transforming in Matillion and Redshift stored procedures

GitHub

 

 

dbt_scd2_plus Slowly Changing Dimension Type 2 (scd2) Custom Materialization dbt Package

       This DBT package provides a materialization that builds an advanced version of a slowly changing dimension type 2 (SCD2)

GitHub
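For readers new to the pattern, a slowly changing dimension type 2 keeps one row per version of a record, each with a validity range. The sketch below is a plain-Python illustration of that idea only, not the package's materialization logic; the function name `apply_scd2`, the field names `valid_from` and `valid_to`, and the open-ended sentinel date are all assumptions made for the example.

```python
from datetime import date

# Assumed sentinel marking the row that is currently valid.
OPEN_END = date(9999, 12, 31)


def apply_scd2(history, key, attrs, effective):
    """Return a new history list with one change applied SCD2-style.

    history   -- list of dicts with "key", "attrs", "valid_from", "valid_to"
    key       -- business key of the record being changed
    attrs     -- dict of tracked attribute values as of `effective`
    effective -- date from which the new values apply
    """
    result = [dict(row) for row in history]  # copy rows so the input is not mutated
    current = next(
        (r for r in result if r["key"] == key and r["valid_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attrs"] == attrs:
            return result  # no attribute changed, so no new version is needed
        current["valid_to"] = effective  # close the previous version
    result.append(
        {"key": key, "attrs": dict(attrs), "valid_from": effective, "valid_to": OPEN_END}
    )
    return result


# Hypothetical policy record with two versions.
history = apply_scd2([], "P-1", {"premium": 100}, date(2024, 1, 1))
history = apply_scd2(history, "P-1", {"premium": 120}, date(2024, 6, 1))
```

This sketch assumes changes arrive in date order; handling out-of-order changes needs additional logic to split and re-close existing versions.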

 

 

Airflow + DBT + Postgres

       Simplified version of insurance policy transaction modeling and transformation in a Postgres database

GitHub

DBT SCD2 from historical data and incremental changes not in order

 

Snowflake Pipe + Tasks + Stored Procedures

       Simplified version of insurance policy transaction modeling and transformation in Snowflake

GitHub

 

Snowflake Pipe + Dynamic Tables

GitHub

 

Snowflake Pipe + External Iceberg Tables + Tasks + Stored Procedures

GitHub

 

Airflow + DBT Python Model (Snowpark) + Snowflake Match Pattern Recognition

       Downloading daily market data from the Yahoo! Finance API and reporting growing stocks using structural breaks in a DBT Python model and Snowflake SQL.

GitHub

 

Airflow + Web Scraping + REST API + JSON parsing using Snowflake and DBT macro

       Loading stock fundamental data from GuruFocus through its REST API, based on recent changes detected on scraped webpages.

GitHub

 

 

 

Data Feeds and Analysis (advanced SQL)

 

    Enterprise Rate Indication System

     Combining data from three different transactional systems, implementing several levels of data aggregation, and applying capping and cumulative multiplication.

About

GitHub
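As a rough illustration of the last two steps named above, the sketch below caps each period factor to a range and then chains the capped factors by cumulative multiplication. It is one plausible reading, not the project's actual SQL; the function names, the cap bounds, and the sample factors are hypothetical.

```python
from itertools import accumulate
from operator import mul


def cap(value, lower, upper):
    """Clamp a value to the closed range [lower, upper]."""
    return max(lower, min(value, upper))


def cumulative_factors(factors, lower=0.9, upper=1.1):
    """Cap each factor, then return the running product of the capped factors."""
    capped = [cap(f, lower, upper) for f in factors]
    return list(accumulate(capped, mul))


# Hypothetical period-over-period factors; 1.30 and 0.80 fall outside the caps.
result = cumulative_factors([1.05, 1.30, 0.80])
```

In SQL the same running product is often written as a window function over logarithms, since most databases lack a built-in product aggregate.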

   

    Modeling Data

     Datasets for modeling insurance rates across California for Auto, Home, and Landlord products.

About

GitHub

Modeling Data in Property and Casualty Insurance

   

   Snowflake Cortex functions

    Monthly summaries and sentiment analysis of helpdesk ticket surveys

GitHub

 

   Snowflake Timeseries Forecasting

    Monthly forecast of helpdesk tickets

GitHub

 

Data Governance

Implementation of the Atlan data catalog

 

Data Catalog

Case Study

 

Tableau: Product Performance Dashboards

More than 50 complex calculations based on analysis of transactional data, analytical functions, different levels of aggregation, and sophisticated capping rules. The calculations are performed in Redshift stored procedures, views, and Tableau dashboards.

  

About

GitHub

Tableau Public Workbook

 

 

 

 

 

 

 

Looker Studio: Incidents Management Dashboards

Helpdesk ticket trends and executive summaries

  

Looker Studio

 

 

Python and Machine Learning

 

AI and Transformers (Headlines generation model)

GitHub

                      

Hugging Face

Water Peril Claims Research with XGB and GLM models

About

GitHub

Auto Insurance Risk Classification and Claim Prediction

About

GitHub

 

Other

·        AWS SageMaker Machine Learning Experiments Automation

·        Designing Machine Learning Experiment and Interpreting Experimental Results

·        Slowly Changing Dimension type 2 in Property and Casualty Insurance Company Data Warehouse

·        What Women Talk About (Topic Modeling)

 

 

LinkedIn

GitHub

Tableau Public