
I am trying to pass data into a python script in Data Lake Analytics.

I've stripped this back to show the error clearly. I understand the Python isn't actually doing anything... :-)

I have a very simple table

@FormattedCasinoData =
    SELECT int.Parse(UserID) AS [UserID],
           int.Parse(ModelID) AS [ModelID],
           float.Parse(Value) AS [Value]
    FROM @CasinoData
    WHERE UserID != "UserID"
    ORDER BY UserID
    FETCH 1000 ROWS;

So the table format is int, int, float.

When I try to run this:

REFERENCE ASSEMBLY [ExtPython];

DECLARE @myScript = @"
def usqlml_main(df):
    return df
";

@pythonOutput  =
    REDUCE @FormattedCasinoData ON [UserID]
    PRODUCE [UserID] int, [ModelID] int, [Value] float
    USING new Extension.Python.Reducer(pyScript:@myScript);

OUTPUT @pythonOutput
  TO @"adl://mydatalake.azuredatalakestore.net/myFolder/PythonOutput20171208.csv"
  USING Outputters.Csv();

I get the following error:

"Python returned dataframe schema (System.Int32, System.Int32, System.Double) does match U-SQL schema (System.Int32, System.Int32, System.Single)"

Any idea why the U-SQL schema expects System.Single for the third column when I have explicitly declared it as "float" in the output schema?

Thanks!


Sorry for the late reply. This must have slipped through.

In C#, float is an alias for System.Single, a 32-bit floating-point type (see https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/float), whereas double is an alias for the 64-bit System.Double.
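To make the width difference concrete, here is a small stdlib-only Python sketch. It shows that Python's built-in float uses the same 64-bit layout as System.Double, while a System.Single value only has 32 bits. The helper name `to_single` is purely illustrative and not part of any U-SQL or extension API.

```python
import struct
import sys

# C# 'float' (System.Single) is a 32-bit IEEE 754 value;
# C# 'double' (System.Double) is a 64-bit IEEE 754 value.
SINGLE_BYTES = struct.calcsize("<f")  # size of a 32-bit float
DOUBLE_BYTES = struct.calcsize("<d")  # size of a 64-bit double


def to_single(x: float) -> float:
    """Hypothetical helper: round-trip a Python float through 32-bit storage,
    which is what squeezing it into a System.Single column would require."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    print("single:", SINGLE_BYTES, "bytes; double:", DOUBLE_BYTES, "bytes")
    # Python's float is a C double, so it carries 53 mantissa bits like System.Double.
    print("Python float mantissa bits:", sys.float_info.mant_dig)
    value = 0.1
    # The 32-bit round trip changes the value, i.e. precision is lost.
    print(value, "->", to_single(value))
```

This narrowing is why the extension reports a mismatch rather than silently converting the column.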

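The DataFrame returned from Python carries a 64-bit floating-point column, which the extension reports as System.Double (as the error message shows). So specify double as the target type in your reducer's schema:

    PRODUCE [UserID] int, [ModelID] int, [Value] double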
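A rough Python model of the comparison the error message describes, written only to make the keyword-to-.NET-type mapping explicit. `CSHARP_TO_CLR`, `produce_schema` and `PYTHON_SCHEMA` are hypothetical names. This is not how the Python extension is actually implemented; it just mirrors the two schemas quoted in the error.

```python
# Hypothetical mapping covering only the C# type keywords used in this question.
CSHARP_TO_CLR = {
    "int": "System.Int32",
    "float": "System.Single",
    "double": "System.Double",
}


def produce_schema(columns):
    """Translate (name, C# keyword) pairs from a PRODUCE clause into .NET type names."""
    return tuple(CSHARP_TO_CLR[type_name] for _, type_name in columns)


# Schema the Python side reported in the error message.
PYTHON_SCHEMA = ("System.Int32", "System.Int32", "System.Double")

original = [("UserID", "int"), ("ModelID", "int"), ("Value", "float")]
fixed = [("UserID", "int"), ("ModelID", "int"), ("Value", "double")]

if __name__ == "__main__":
    for label, cols in (("float", original), ("double", fixed)):
        # Only the 'double' declaration lines up with what Python returned.
        print(label, produce_schema(cols) == PYTHON_SCHEMA)
```

Under this model, declaring the third column as float yields System.Single and fails the comparison, while double yields System.Double and matches.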