
I've connected to a BigQuery table of reviews left by customers. When I import this data, some of the review values get repeated across rows. For example, the source table contains only one review that starts with "Worst insurance services", yet after import that text shows up in several rows. This is just one example; other values get repeated too.

The created_datetime_review, id and rating are all correct for each row; only the review is wrong.

The aggregation on all fields is set to "Don't summarize". I've made this in a fresh report, so the reviews data is the only table imported and there are no transformations or formulas applied at all in DAX or the Query Editor.

When I look in the Data tab I see the same repeating values.

However, in Query Editor, if I filter the review column for begins with "Worst insurance services", I get only one row (which is correct). So I can't understand what's changing between that step and this table.

Does anyone know how to stop this duplication? I've tried refreshing the data, making a new workbook, and using Direct Query instead of Import, all without success.


1 Answer


I eventually figured out that this is caused by text strings that are 512 characters or longer.

With the column limited to 511 characters, the correct text is shown. Once the limit is 512 characters, any values that reach it start getting duplicated and overwrite other rows in the column. How many other rows get overwritten, and which ones, seems to be fairly random.

I was initially using version 2.71 (July 2019) and have updated to the latest version, 2.75.5649.961 64-bit (November 2019). The bug is present in both.
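
To check whether your own data is affected, it may help to flag the reviews that go past the 511-character mark before they reach the report. Below is a minimal Python sketch of that check, not a confirmed fix. The function names and sample rows are made up for illustration; only the `id`, `rating` and `review` column names come from the question. It also assumes that Python's `len` counts characters the same way the report does, which I have not verified.

```python
# Threshold taken from the observation above: 511 characters displayed
# correctly, 512 or more triggered the duplication.
SAFE_MAX_CHARS = 511


def find_long_reviews(rows, max_chars=SAFE_MAX_CHARS):
    """Return the ids of rows whose review text is longer than max_chars."""
    return [row["id"] for row in rows if len(row["review"] or "") > max_chars]


def truncate_reviews(rows, max_chars=SAFE_MAX_CHARS):
    """Return copies of the rows with each review cut to at most max_chars."""
    return [
        {**row, "review": (row["review"] or "")[:max_chars]}
        for row in rows
    ]


# Hypothetical sample rows, not real data from the source table.
sample_rows = [
    {"id": 1, "rating": 1, "review": "Worst insurance services " + "x" * 600},
    {"id": 2, "rating": 5, "review": "Great service"},
    {"id": 3, "rating": 3, "review": "y" * 511},
]

long_ids = find_long_reviews(sample_rows)    # ids that cross the threshold
safe_rows = truncate_reviews(sample_rows)    # every review now <= 511 chars
print(long_ids)
```

Truncating obviously loses the tail of long reviews, so this is more useful for confirming the cause than as a permanent workaround.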