
I am trying to convert a NetCDF file into a pandas DataFrame. I am using the following data: https://www.ufz.de/export/data/2/248980_SMI_SM_L02_Oberboden_monatlich_1951-2020_inv.nc

The file contains the following information:

<class 'netCDF4._netCDF4.Variable'>
int32 time(time)
    standard_name: time
    long_name: time
    units: days since 1951-01-16 00:00:00
    calendar: standard
    axis: T
unlimited dimensions: time
current shape = (840,)
filling off

<class 'netCDF4._netCDF4.Variable'>
int32 easting(easting)
    axis: X
unlimited dimensions:
current shape = (175,)
filling off

<class 'netCDF4._netCDF4.Variable'>
int32 northing(northing)
    axis: Y
unlimited dimensions:
current shape = (225,)
filling off

<class 'netCDF4._netCDF4.Variable'>
float32 SMI(time, northing, easting)
    long_name: soil moisture index
    units: -
    _FillValue: -9999.0
    missing_value: -9999.0
unlimited dimensions: time
current shape = (840, 225, 175)
filling off

<class 'netCDF4._netCDF4.Variable'>
float64 lat(northing, easting)
    long_name: latitude
    units: degrees_north
    _FillValue: -9999.0
    missing_value: -9999.0
unlimited dimensions:
current shape = (225, 175)
filling off

<class 'netCDF4._netCDF4.Variable'>
float64 lon(northing, easting)
    long_name: longitude
    units: degrees_east
    _FillValue: -9999.0
    missing_value: -9999.0
unlimited dimensions:
current shape = (225, 175)
filling off

My goal is to extract the SMI values (which are indexed by time, northing, and easting) and transform them into a DataFrame.

My current code is the following:

import os
from matplotlib import pyplot as plt
import pandas as pd
import netCDF4
import numpy as np
import xarray as xr

# Define directory
os.chdir('C:/Users/Documents/Project/ClimateRisks')

dp = xr.open_dataset('SMI_Oberboden.nc')
dp = dp.SMI


m2 = dp.to_dataframe()
m2 = m2.dropna()
print(m2.head(15))

The outcome is the following:

                                  SMI
time       northing easting          
1951-01-16 5238000  4360000  0.445849
                    4364000  0.473440
                    4368000  0.309218
           5242000  4364000  0.365326
                    4368000  0.426184
                    4372000  0.344188
                    4376000  0.284556
           5246000  4364000  0.390772
                    4368000  0.521810
                    4372000  0.586828
                    4376000  0.344797
                    4380000  0.394820
           5250000  4356000  0.470163
                    4360000  0.619951
                    4364000  0.540267

The issue is that the resulting DataFrame has only one column (SMI); time, northing, and easting end up in the index rather than as columns. My goal is to have four columns (time, northing, easting, SMI). As I am still new to this, I would really appreciate your help.


1 Answer


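You just need to reset the index. to_dataframe() puts the dimensions (time, northing, easting) into a MultiIndex, and reset_index() turns those index levels back into ordinary columns. So change the second-to-last line to: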

m2 = m2.dropna().reset_index()
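
To see the effect without downloading the file, here is a minimal sketch using a small synthetic frame. It assumes a DataFrame with the same index layout as the one shown in the question (a MultiIndex of time, northing, easting and a single SMI column); the values and the two NaN cells are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the SMI grid (made-up values, not the real data):
# 2 time steps x 2 northings x 3 eastings = 12 cells.
times = pd.to_datetime(["1951-01-16", "1951-02-15"])
northing = [5238000, 5242000]
easting = [4360000, 4364000, 4368000]

# Same index layout as the frame printed in the question.
index = pd.MultiIndex.from_product(
    [times, northing, easting], names=["time", "northing", "easting"]
)
values = np.arange(12, dtype="float64") / 12.0
values[[0, 7]] = np.nan  # pretend two cells are masked out

m2 = pd.DataFrame({"SMI": values}, index=index)

# dropna() removes the masked cells; reset_index() moves the three
# index levels into regular columns and installs a default integer index.
flat = m2.dropna().reset_index()

print(flat.columns.tolist())  # ['time', 'northing', 'easting', 'SMI']
print(flat.shape)             # (10, 4)
```

After this, flat has the four columns asked for, and each row is one (time, northing, easting) cell with a valid SMI value.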