I am trying to convert a NetCDF file into a pandas DataFrame. I am using the following data: https://www.ufz.de/export/data/2/248980_SMI_SM_L02_Oberboden_monatlich_1951-2020_inv.nc
The file contains the following information:
<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    standard_name: time
    long_name: time
    units: days since 1951-01-16 00:00:00
    calendar: standard
    axis: T
unlimited dimensions: time
current shape = (840,)
filling off

<class 'netCDF4._netCDF4.Variable'>
int32 easting(easting)
    axis: X
unlimited dimensions:
current shape = (175,)
filling off

<class 'netCDF4._netCDF4.Variable'>
int32 northing(northing)
    axis: Y
unlimited dimensions:
current shape = (225,)
filling off

<class 'netCDF4._netCDF4.Variable'>
float32 SMI(time, northing, easting)
    long_name: soil moisture index
    units: -
    _FillValue: -9999.0
    missing_value: -9999.0
unlimited dimensions: time
current shape = (840, 225, 175)
filling off

<class 'netCDF4._netCDF4.Variable'>
float64 lat(northing, easting)
    long_name: latitude
    units: degrees_north
    _FillValue: -9999.0
    missing_value: -9999.0
unlimited dimensions:
current shape = (225, 175)
filling off

<class 'netCDF4._netCDF4.Variable'>
float64 lon(northing, easting)
    long_name: longitude
    units: degrees_east
    _FillValue: -9999.0
    missing_value: -9999.0
unlimited dimensions:
current shape = (225, 175)
filling off
My goal is to extract the SMI values (which are indexed by time, northing and easting) and convert them into a DataFrame.
My current code is the following:
import os
from matplotlib import pyplot as plt
import pandas as pd
import netCDF4
import numpy as np
import xarray as xr
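# Define working directory
os.chdir('C:/Users/Documents/Project/ClimateRisks')
# Open the NetCDF file and select the SMI variable
dp = xr.open_dataset('SMI_Oberboden.nc')
dp = dp.SMI
# Convert to a DataFrame and drop cells without data
m2 = dp.to_dataframe()
m2 = m2.dropna()
print(m2.head(15))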
The outcome is the following:
                                  SMI
time       northing easting
1951-01-16 5238000  4360000  0.445849
                    4364000  0.473440
                    4368000  0.309218
           5242000  4364000  0.365326
                    4368000  0.426184
                    4372000  0.344188
                    4376000  0.284556
           5246000  4364000  0.390772
                    4368000  0.521810
                    4372000  0.586828
                    4376000  0.344797
                    4380000  0.394820
           5250000  4356000  0.470163
                    4360000  0.619951
                    4364000  0.540267
The issue is that the final DataFrame has only one column (SMI), while time, northing and easting are not treated as columns (they appear to be part of the index instead). My goal is to end up with four columns: time, northing, easting and SMI. As I am still new to this, I would really appreciate your help.
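One possible approach, sketched under the assumption that time, northing and easting form a pandas MultiIndex (which the printed output suggests): calling reset_index() on the DataFrame should turn the index levels into ordinary columns. The small frame below is a hand-built stand-in using three rows copied from the output above, not the real file, and the name `flat` is only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the result of dp.to_dataframe().dropna();
# the three rows are copied from the printed output in the question.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("1951-01-16"), 5238000, 4360000),
        (pd.Timestamp("1951-01-16"), 5238000, 4364000),
        (pd.Timestamp("1951-01-16"), 5242000, 4364000),
    ],
    names=["time", "northing", "easting"],
)
m2 = pd.DataFrame({"SMI": [0.445849, 0.473440, 0.365326]}, index=index)

# reset_index() moves every index level into a regular column
# and replaces the index with a default 0..n-1 range.
flat = m2.reset_index()

print(flat.columns.tolist())
print(flat)
```

With the real data, the same call would presumably be `m2 = m2.reset_index()` right after the `dropna()` step.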