2
votes

I put some log files into sql tables through Spark and my schema looks like this:

|-- timestamp: timestamp (nullable = true) 
|-- c_ip: string (nullable = true) 
|-- cs_username: string (nullable = true) 
|-- s_ip: string (nullable = true) 
|-- s_port: string (nullable = true) 
|-- cs_method: string (nullable = true) 
|-- cs_uri_stem: string (nullable = true) 
|-- cs_query: string (nullable = true) 
|-- sc_status: integer (nullable = false) 
|-- sc_bytes: integer (nullable = false) 
|-- cs_bytes: integer (nullable = false) 
|-- time_taken: integer (nullable = false) 
|-- User_Agent: string (nullable = true) 
|-- Referrer: string (nullable = true) 

As you can notice I created a timestamp field which I read is supported by Spark (Date wouldn't work as far as I understood). I would love to use for queries like "where timestamp>(2012-10-08 16:10:36.0)" but when I run it I keep getting errors. I tried these 2 following sintax forms: For the second one I parse a string so Im sure Im actually pass it in a timestamp format. I use 2 functions: parse and date2timestamp.

Any hint on how I should handle timestamp values?

Thanks!

1) scala> sqlContext.sql("SELECT * FROM Logs as l where l.timestamp=(2012-10-08 16:10:36.0)").collect

java.lang.RuntimeException: [1.55] failure: ``)'' expected but 16 found 

SELECT * FROM Logs as l where l.timestamp=(2012-10-08 16:10:36.0) 
                                                  ^ 

2) sqlContext.sql("SELECT * FROM Logs as l where l.timestamp="+date2timestamp(formatTime3.parse("2012-10-08 16:10:36.0"))).collect

java.lang.RuntimeException: [1.54] failure: ``UNION'' expected but 16 found 

SELECT * FROM Logs as l where l.timestamp=2012-10-08 16:10:36.0 
                                                 ^ 
4

4 Answers

6
votes

I figured that the problem was the precision of the timestamp first of all and also the string that I pass representing the timestamp has to be casted as a String

So this query works now:

sqlContext.sql("SELECT * FROM Logs as l where cast(l.timestampLog as String) <= '2012-10-08 16:10:36'")
5
votes

You forgot the quotation marks.

Try something with this syntax:

L.timestamp = '2012-07-16 00:00:00'

Alternatively, try

L.timestamp = CAST('2012-07-16 00:00:00' AS TIMESTAMP)
2
votes

Cast the string representation of the timestamp to timestamp. cast('2012-10-10 12:00:00' as timestamp) Then you can do comparison as timestamps, not strings. Instead of:

sqlContext.sql("SELECT * FROM Logs as l where cast(l.timestamp as String) <= '2012-10-08 16:10:36'")

try

sqlContext.sql("SELECT * FROM Logs as l where l.timestamp <= cast('2012-10-08 16:10:36' as timestamp)")
0
votes

Sadly this didn't work for me. I am using Apache Spark 1.4.1. The following code is my solution:

Date date = new Date();

String query = "SELECT * FROM Logs as l where l.timestampLog <= CAST('" + new java.sql.Timestamp(date.getTime()) + "' as TIMESTAMP)";

sqlContext.sql(query);

Casting the timestampLog as string did not throw any errors but returned no data.