
I'm trying to store some tweets in a Cassandra database using Python and the DataStax driver (Python -> Cassandra).

Everything works well, but there's something I can't understand: how do I insert a row without null values?

For example:

CREATE TABLE tweets (
id_tweet text PRIMARY KEY,
texttweet text,
hashtag text,
url text
);

If I insert a row without a url value, it works, but in Cassandra I see "null" in the url column.

I checked this doc:

http://datastax.github.io/python-driver/getting_started.html#passing-parameters-to-cql-queries

So I tried two different ways:

First, I build the whole query as a single string and execute it:

requete = "insert into Tweets(id_tweet,texttweet,hashtag,url) values ('%s','%s','%s','%s')"%(id_tweet,texttweet,hashtag,url)
session.execute(requete)

Or I pass the parameters to the execute function:

requete2 = "insert into Tweets(id_tweet,texttweet,hashtag,url) values ('%s','%s','%s','%s')"
session.execute(requete2,(id_tweet,texttweet,hashtag,url))

The problem is that both ways give me null values when, for example, a tweet has no URL or hashtag.

Is it possible to not see the column at all when it's empty in a row, as I see in a lot of tutorials?


Thanks.


1 Answer


This is possible if you are using Cassandra 2.2 or later, which introduced the concept of 'UNSET'. It lets you use the same statement to insert a row even when you don't want to provide some of the values. Here's how you would do it:

from cassandra.query import UNSET_VALUE
ps = session.prepare("insert into tweets(id_tweet,texttweet,hashtag,url) values (?,?,?,?)")
session.execute(ps, ("id", "hello world!", UNSET_VALUE, UNSET_VALUE))

This tells Cassandra that you don't want to insert these values as null; instead they are omitted altogether, so no 'null' values (internally these are tombstones) are written to Cassandra.

On your side, I think you would need some preprocessing logic to convert any incoming None values into UNSET_VALUE. The pre-2.2 solution is to adjust your query based on which columns are absent, e.g. insert into tweets(id_tweet,texttweet) values (?,?) if hashtag and url are None.
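Both approaches can be sketched roughly as below. This is only an illustration: the helper names none_to_unset and build_insert are made up for this example and are not part of the driver. The first helper takes the sentinel as an argument, so you would pass in UNSET_VALUE from cassandra.query. The second uses %s placeholders, which is what the Python driver expects for non-prepared statements, and it assumes the table and column names come from your own code, since identifiers cannot be bound as parameters.

```python
def none_to_unset(values, unset):
    """Replace every None in `values` with the given `unset` sentinel.

    Hypothetical helper: with Cassandra 2.2+ you would pass
    cassandra.query.UNSET_VALUE as `unset` and bind the result
    to a prepared statement.
    """
    return tuple(unset if v is None else v for v in values)


def build_insert(table, row):
    """Build an INSERT that only names the columns whose value is not None.

    Hypothetical helper for the pre-2.2 approach. `row` maps column
    names to values. Returns (cql, params) for session.execute(cql, params).
    `table` and the column names must be trusted, because they are
    interpolated into the statement text rather than bound.
    """
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("at least one non-None column is required")
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, tuple(present.values())
```

With the prepared statement above, the first helper would be used as session.execute(ps, none_to_unset((id_tweet, texttweet, hashtag, url), UNSET_VALUE)). The second would be used as cql, params = build_insert("tweets", row) followed by session.execute(cql, params).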

On the retrieval end, there should technically be a way to distinguish between null and unset values (I'll look into this), but I don't think such a mechanism exists in the Python driver. I'll open a ticket if it's possible to do in the protocol but the feature isn't present in the driver. EDIT: It doesn't look like Cassandra differentiates, when returning data, between values that were explicitly set to null (which are marked internally as tombstones) and those that were never set.

You can read more about 'UNSET' and other 2.2 features in the Python driver in this blog post.