I am trying to mock DataFrame creation in Python using MagicMock(), but part of the code fails when it is called from the unit test, because the comparison it performs is not supported by MagicMock.

This is my testcase.py code:

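import sys
from unittest.mock import MagicMock

# Replace the real pyspark.sql module with a mock before it is imported
sys.modules["pyspark.sql"] = MagicMock()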

def test_process_batch():
    df = (
        [
            (1, "foo"),
            (2, "bar"),
        ],
        ["id", "label"]
    )
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    spark_df = spark.createDataFrame(df)

    process_batch(spark_df, "123")
    assert True

In the main program, process_batch() compares the DataFrame count to a number, as shown below.

def process_batch(data_frame, batchId):
    """Process streamed batch dataframe"""

    if (data_frame.count() > 0):
     ...

The unit test fails with the following error:

[CPython38-test] =================================== FAILURES ===================================
[CPython38-test] ______________________________ test_process_batch ______________________________
[CPython38-test] 
[CPython38-test]     def test_process_batch():
[CPython38-test]         df = (
[CPython38-test]             [
[CPython38-test]                 (1, "foo"),
[CPython38-test]                 (2, "bar"),
[CPython38-test]             ],
[CPython38-test]             ["id", "label"]
[CPython38-test]         )
[CPython38-test]         from pyspark.sql import SparkSession
[CPython38-test]         spark = SparkSession.builder.getOrCreate()
[CPython38-test]         spark_df = spark.createDataFrame(df)
[CPython38-test]     
[CPython38-test] >       process_batch(spark_df, "123")
[CPython38-test] 
[CPython38-test] test/test_cia_optics_ingestion_glue_spark_streaming.py:54: 
[CPython38-test] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[CPython38-test] 
[CPython38-test] data_frame = <MagicMock name='mock.SparkSession.builder.getOrCreate().createDataFrame()' id='140188327287056'>
[CPython38-test] batchId = '123'
[CPython38-test] 
[CPython38-test]     def process_batch(data_frame, batchId):
[CPython38-test]         """Process streamed batch dataframe"""
[CPython38-test]     
[CPython38-test] >       if (data_frame.count() > 0):
[CPython38-test] E       TypeError: '>' not supported between instances of 'MagicMock' and 'int'
[CPython38-test] 

Could you please guide me on how to overcome this?
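
For reference, here is a minimal sketch of one possible workaround. It relies on the documented behaviour that MagicMock configures the ordering methods (`__lt__`, `__gt__`, `__le__`, `__ge__`) to return NotImplemented by default, which is why `data_frame.count() > 0` raises TypeError when count() itself returns a MagicMock. Giving count() a real integer return value avoids the comparison on a mock. The process_batch below is a hypothetical stand-in that reproduces only the count check (its return values are made up for illustration), and the value 2 is an arbitrary choice.

```python
from unittest.mock import MagicMock


def process_batch(data_frame, batch_id):
    """Hypothetical stand-in for the real function: only the count check."""
    if data_frame.count() > 0:
        return "processed"
    return "skipped"


# Option 1: build the mocked DataFrame directly and make count() return an int.
spark_df = MagicMock(name="spark_df")
spark_df.count.return_value = 2
result = process_batch(spark_df, "123")

# Option 2: configure the value through the call chain, so that whatever
# createDataFrame() returns already has a count() that yields an int.
spark = MagicMock(name="spark")
spark.createDataFrame.return_value.count.return_value = 2
chained_df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "label"])
chained_result = process_batch(chained_df, "123")
```

With the `sys.modules` patch from the test above, the same idea would presumably be written as `SparkSession.builder.getOrCreate.return_value.createDataFrame.return_value.count.return_value = 2` before calling process_batch(), since every call on a MagicMock returns its `return_value`.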