⬅️ [Previously] Data Extraction: Write Python code to extract weather data from the OpenWeather API


🧬 Current Weather Data Transformation:

In this chapter, we'll transform the raw weather data fetched from the Weather API into a clean and structured format suitable for storage in our Data Warehouse (BigQuery). This involves cleaning the data, selecting relevant information, and converting it into Pandas DataFrames - a tabular format ideal for relational databases like BigQuery.

Understanding the Data and Transformation Approach

Data Snapshot and Challenges

Below is an example of the raw current weather data we're working with:

Screenshot 2024-04-14 at 11.54.18 PM.png

The retrieved weather data is currently stored in Python dictionaries. These dictionaries may contain a mix of:

Our goal is to convert these dictionaries into well-structured Pandas DataFrames for efficient storage and analysis in BigQuery.

Transformation Approach:

Here's the approach we'll take to transform the data:

  1. Flattening Nested Structures: If a dictionary key (e.g., "base", "cod") holds a simple value, it becomes a column in the DataFrame.
  2. Exploding Nested Dictionaries: If a key holds a nested dictionary (e.g., "coord"), we "explode" it into separate columns. Each key-value pair in the nested dictionary becomes its own column prefixed with the original key. For instance, the "coord" key becomes "coord_lon" and "coord_lat".