42
votes

I am trying to load a simple text file instead of standard input in Kafka. After downloading Kafka, I performed the following steps:

Started zookeeper:

bin/zookeeper-server-start.sh config/zookeeper.properties

Started the server:

bin/kafka-server-start.sh config/server.properties

Created a topic named "test":

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Ran the Producer:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test 
Test1
Test2

Listened with the consumer:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Test1
Test2

Instead of standard input, I want to pass a data file, or even a simple text file, to the producer so that its contents are seen directly by the consumer. Any help would really be appreciated. Thanks!

4 Answers

92
votes

You can redirect the file into the producer's standard input:

kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic --new-producer < my_file.txt

Found here.

From 0.9.0 onwards, the --new-producer flag is no longer needed:

kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic < my_file.txt

11
votes

$ kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic < my_file.txt

worked for me in Kafka-0.9.0
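
If you do this often, the redirect can be wrapped in a small shell function. This is only a sketch: send_file_to_topic, PRODUCER_CMD and BROKER_LIST are names made up for illustration (they are not Kafka options), and the broker address is the one from the question.

```shell
# Hypothetical helper: send every line of a file to a topic as one message each.
# PRODUCER_CMD and BROKER_LIST are illustrative overrides, not Kafka settings.
send_file_to_topic() {
    local file="$1" topic="$2"
    local broker="${BROKER_LIST:-localhost:9092}"
    local producer="${PRODUCER_CMD:-kafka-console-producer.sh}"

    # Fail early with a clear message instead of a shell redirect error.
    if [ ! -r "$file" ]; then
        echo "cannot read file: $file" >&2
        return 1
    fi

    # Same redirect as in the answers above: the file becomes standard input.
    "$producer" --broker-list "$broker" --topic "$topic" < "$file"
}
```

Usage would be something like send_file_to_topic my_file.txt my_topic. Pointing PRODUCER_CMD at another command (cat, for example) lets you dry-run it without a broker.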

7
votes

Here are a few ways that are a little more generalised, but may be overkill for a simple file:

tail

tail -n0 -F my_file.txt | kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic

Explanation

  1. tail reads from the end of the file and keeps printing as the file grows, e.g. while log lines are continuously appended
  2. -n0 outputs the last 0 lines, so only newly added lines are sent
  3. -F follows the file by name instead of by descriptor, so it keeps working even if the file is rotated
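
To try the tail approach without leaving a process running forever, the pipeline can be bounded in time. A rough sketch, assuming GNU coreutils timeout is available; follow_file_for, PRODUCER_CMD and BROKER_LIST are hypothetical names used only for this example.

```shell
# Hypothetical helper: forward lines appended to a file for a limited time.
# Assumes GNU coreutils `timeout`; PRODUCER_CMD and BROKER_LIST are
# illustrative overrides, not Kafka settings.
follow_file_for() {
    local seconds="$1" file="$2" topic="$3"
    local broker="${BROKER_LIST:-localhost:9092}"
    local producer="${PRODUCER_CMD:-kafka-console-producer.sh}"

    # -n0 skips lines already in the file; -F keeps following across rotation.
    # When timeout stops tail, the pipe closes and the producer sees end of input.
    timeout "$seconds" tail -n0 -F "$file" 2>/dev/null |
        "$producer" --broker-list "$broker" --topic "$topic"
}
```

For example, follow_file_for 60 my_file.txt my_topic would forward whatever gets appended during the next minute and then stop.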

syslog-ng

options {
    flush_lines (0);
    time_reopen (10);
    log_fifo_size (1000);
    long_hostnames (off);
    use_dns (no);
    use_fqdn (no);
    create_dirs (no);
    keep_hostname (no);
};

source s_file {
    file("path to my-file.txt" flags(no-parse));
};

destination loghost {
    tcp("*.*.*.*" port(5140));
};

log {
    source(s_file);
    destination(loghost);
};

consuming

nc -k -l 5140 | kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic

Explanation (from man nc)

-k Forces nc to stay listening for another connection after its current connection is completed. It is an error to use this option without the -l option.

-l Used to specify that nc should listen for an incoming connection rather than initiate a connection to a remote host. It is an error to use this option in conjunction with the -p, -s, or -z options. Additionally, any timeouts specified with the -w option are ignored.

Ref

Syslog-ng

1
votes
echo "Hello" | kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic
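
The same idea extends to several messages at once, since the console producer treats each input line as one message. A small sketch; send_lines_to_topic, PRODUCER_CMD and BROKER_LIST are hypothetical names for illustration, not part of Kafka.

```shell
# Hypothetical helper: send each remaining argument as a separate message.
# PRODUCER_CMD and BROKER_LIST are illustrative overrides, not Kafka settings.
send_lines_to_topic() {
    local topic="$1"
    shift
    local broker="${BROKER_LIST:-localhost:9092}"
    local producer="${PRODUCER_CMD:-kafka-console-producer.sh}"

    # printf reuses the format for every argument, giving one line per message.
    printf '%s\n' "$@" | "$producer" --broker-list "$broker" --topic "$topic"
}
```

Something like send_lines_to_topic my_topic "Hello" "World" would then produce two messages.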