0 votes

I have two structurally identical tables in a Cassandra database, both in the same keyspace, and I need to move data from one table to the other. I created a standard CSV file with COPY ... TO, and now I want to load its content into the second table with COPY ... FROM. However, I get the following error:

Failed to import 1926 rows: AttributeError - 'NoneType' object has no attribute 'is_up',  given up after 1 attempts
Exceeded maximum number of insert errors 1000

What I used:

  1. cqlsh 5.0.1
  2. Cassandra 3.11.2
  3. CQL spec 3.4.4
  4. Native protocol v4

I used the following command to create the CSV file on my local machine:

COPY "keyspace_1"."table_1" (column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9,column_10,column_11,column_12,column_13,column_14,column_15) TO 'test.csv' WITH delimiter=';' AND header=TRUE;

The command above created the CSV file without any problem. I don't have a lot of data; the first table has only 1926 entries. Here are the first 5 lines of the CSV file to be imported:

column_1;column_2;column_3;column_4;column_5;column_6;column_7;column_8;column_9;column_10;column_11;column_12;column_13;column_14;column_15
a83aaa26-2f0d-11eb-9330-af4bd388f154;a829040d-2f1d-11eb-9a4c-0b934b0a1818;791d6ed2-e5ec-4860-a165-e25b77dcb075;69f2f19a-3647-4719-abea-315fcba0c29b;2020-11-25 12:56:38.676+0000;;False;True;True;Hello!;2020-11-25 12:56:38.676+0000;;;;
a83aaa26-2f0d-11eb-9330-af4bd388f154;ea7d7c94-2f1c-11eb-a27a-0b934b0a1818;c0bc8368-644b-4238-b629-773f7f3163d8;69f2f19a-3647-4719-abea-315fcba0c29b;2020-11-25 12:51:20.466+0000;;False;False;True;dddd;2020-11-25 12:51:20.467+0000;;;;
a83aaa26-2f0d-11eb-9330-af4bd388f154;e702d2d4-2f1c-11eb-ae91-0b934b0a1818;791d6ed2-e5ec-4860-a165-e25b77dcb075;69f2f19a-3647-4719-abea-315fcba0c29b;2020-11-25 12:51:14.625+0000;;True;True;True;d;2020-11-25 12:51:14.625+0000;;;;
a83aaa26-2f0d-11eb-9330-af4bd388f154;e45d01eb-2f1c-11eb-b7a1-0b934b0a1818;791d6ed2-e5ec-4860-a165-e25b77dcb075;69f2f19a-3647-4719-abea-315fcba0c29b;2020-11-25 12:51:10.187+0000;;True;True;True;1;2020-11-25 12:51:10.187+0000;;;;
a83aaa26-2f0d-11eb-9330-af4bd388f154;7da3e5ae-2f0f-11eb-87a2-5120df6c4a8a;791d6ed2-e5ec-4860-a165-e25b77dcb075;69f2f19a-3647-4719-abea-315fcba0c29b;2020-11-25 11:15:14.385+0000;;True;True;True;123;2020-11-25 11:15:14.385+0000;;;;

After that, I ran the second command, which should load the content into the second table:

COPY "keyspace_1"."table_2" (column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9,column_10,column_11,column_12,column_13,column_14,column_15) FROM 'test.csv' WITH delimiter=';' AND header=TRUE;

What could be causing this problem, and how can I solve it?

I created the first table with the CQL query below. The second table has the same structure.

create table table_1 (
    column_1 uuid,
    column_2 timeuuid,
    column_3 uuid,
    column_4 uuid,
    column_11 text,
    column_14 uuid,
    column_12 uuid,
    column_13 uuid,
    column_15 text,
    column_10 boolean,
    column_8 boolean,
    column_9 boolean,
    column_5 timestamp,
    column_6 timestamp,
    column_7 timestamp,
    primary key (
        column_1,
        column_2
    )
) with clustering order by (
    column_2 desc
);

EDIT 1:

I used the following command in the terminal:

dsbulk load -url '/my_path/data.csv' -h '"my_host"' -port my_port -k 'keyspace_1' -t 'table_1' -header true -delim ';' -m '0=column_1,1=column_2,2=column_3'

ERROR message:

[driver] Error connecting to Node(endPoint=my_host/x.xxx.xx.xxx:xxxx, hostId=null, hashCode=7edbe679), trying next node (ConnectionInitException: [driver|control|id: 0x7bfdbb2f, L:/xxx.xxx.x.xx:xxxxx - R:my_host/x.xxx.xx.xxx:xxxx] Protocol initialization request, step 1 (OPTIONS): unexpected failure (com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Lost connection to remote peer))
Operation LOAD_20210211-073148-547063 failed: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=my_host/x.xxx.xx.xxx:xxxx, hostId=null, hashCode=7edbe679): [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [driver|control|id: 0x7bfdbb2f, L:/xxx.xxx.x.xx:xxxxx - R:my_host/x.xxx.xx.xxx:xxxx] Protocol initialization request, step 1 (OPTIONS): unexpected failure (com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Lost connection to remote peer)].
   Suppressed: [driver|control|id: 0x7bfdbb2f, L:/xxx.xxx.x.xx:xxxxx - R:my_host/x.xxx.xx.xxx:xxxx] Protocol initialization request, step 1 (OPTIONS): unexpected failure (com.datastax.oss.driver.api.core.connection.ClosedConnectionException: Lost connection to remote peer).
     Caused by: Lost connection to remote peer.
Can you provide the schema or the create table statement? – MikeJPR
Hello! Can you check my post one more time please? I added the CQL statement that I used to create the tables. – Nurzhan Nogerbek

2 Answers

2 votes

You will want to download the PEM file:

curl https://certs.secureserver.net/repository/sf-class2-root.crt -O

Create the truststore

openssl x509 -outform der -in /home/ec2-user/sf-class2-root.crt -out temp_file.der
keytool -import -alias new-cassandra -keystore /home/ec2-user/cassandra_truststore.jks -file temp_file.der -storepass amazon -noprompt
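
To quickly confirm the certificate made it into the truststore, you can list its contents (an optional sanity check; the path and password follow the example above):

keytool -list -keystore /home/ec2-user/cassandra_truststore.jks -storepass amazon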

Set the environment variables

ENV javax.net.ssl.trustStore=/home/ec2-user/cassandra_truststore.jks
ENV javax.net.ssl.trustStorePassword=amazon
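
The ENV lines above are Dockerfile syntax. When running dsbulk directly from a shell instead, the same JVM system properties can usually be passed through the launcher - for example (a sketch that assumes your dsbulk launch script honors the DSBULK_JAVA_OPTS environment variable; if it doesn't, set the truststore in the driver config as shown below):

export DSBULK_JAVA_OPTS="-Djavax.net.ssl.trustStore=/home/ec2-user/cassandra_truststore.jks -Djavax.net.ssl.trustStorePassword=amazon"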

Configure the driver

dsbulk {
  monitoring.csv = true
  batch.maxBatchStatements = 1
  batch.mode = DISABLED
  log.maxErrors = -1
}

datastax-java-driver {
  basic {
    request.consistency = "LOCAL_QUORUM"
    request.default-idempotence = true
  }
  advanced {
    metadata.schema.enabled = false
    metadata.token-map.enabled = false
    prepared-statements.prepare-on-all-nodes = false
    prepared-statements.reprepare-on-up.enabled = false
    netty.io-group.size = 64

    reconnect-on-init = true
    reconnection-policy {
      base-delay = 100 milliseconds
      max-delay = 1 seconds
    }

    ssl-engine-factory {
      class = DefaultSslEngineFactory
      hostname-validation = false
    }
    connection {
      pool {
        local.size = 3
        remote.size = 1
      }
    }
  }
}
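
Alternatively, instead of relying on JVM system properties, the truststore can be referenced directly in the ssl-engine-factory section of the config file (a sketch based on the Java driver 4.x DefaultSslEngineFactory settings; the path and password follow the example above):

datastax-java-driver {
  advanced.ssl-engine-factory {
    class = DefaultSslEngineFactory
    truststore-path = "/home/ec2-user/cassandra_truststore.jks"
    truststore-password = "amazon"
    hostname-validation = false
  }
}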

Execute the command

dsbulk load -f default-ks-bulk-loader.conf -k aws -t smoketest -url random.csv --driver.basic.contact-points '["cassandra.us-east-1.amazonaws.com:9142"]' -m "0=id,1=event,2=data" -u mike -p pass
1 vote

From the error, it looks like there is a problem with the connection.

But really, it's better not to use cqlsh's COPY command - it has a lot of problems and doesn't handle corner cases well. There is an open-source tool called DSBulk that is written specifically for loading/unloading large amounts of data into/from Cassandra/DSE.

To unload data, just execute:

dsbulk unload -k keyspace1 -t table1 -url path_to_unload

and to load the same data:

dsbulk load -k keyspace2 -t table2 -url path_to_unload

If the tables are identical, you may not need to specify any additional parameters - everything will be handled automatically.
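
Applied to the tables from the question, that would look roughly like this (a sketch; the export directory path is only an example):

dsbulk unload -k keyspace_1 -t table_1 -url /tmp/table_1_export
dsbulk load -k keyspace_1 -t table_2 -url /tmp/table_1_export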

P.S. DSBulk is heavily optimized for loading/unloading operations and has a lot of options, including import/export from/to compressed files, providing custom queries, etc.

There is a series of blog posts about DSBulk that provides more information and examples: 1, 2, 3, 4, 5, 6