2
votes

I am running Apache Airflow 1.8 and trying to add connections via the command line interface for a Hive client wrapper. However, when I run the command

airflow connections -a --conn_id HIVE_CONN2 --conn_uri hive_cli://hiveserver/default

the command line reports success, but the Conn Type is not set correctly in the Airflow UI and the connection won't work.

I think the error is related to the underscore (_) in the URI prefix (scheme). I have confirmed that the urlparse function used in models.py to split the URI doesn't allow underscores in the scheme.

Other than setting it manually in the UI, is there another approach to adding connections to Airflow? Is this a defect? Airflow should not use underscores in connection types, to avoid this issue.

2
Is there any reason why you need to set the conn type? As far as I can determine, all that setting the type does is show/hide certain fields in the UI. In fact, you can access the connection as if it were a hive_id or not without any effect on your code. - Daniel Lee
Hi Daniel - I am a little confused about what you mean. The connection type is used to determine the type of connection to make / use with one of the Airflow sensors; in this case it's a Hive connection using the HivePartitionSensor. Our Airflow reset script creates all the connections and configurations except those that include an underscore in the conn_uri. The DAGs use the conn_id as part of the HivePartitionSensor to check that a certain Hive partition exists. These checks fail to run if the connection type is not set correctly. - user193616
@DanielLee The conn_type is used by the code that determines which hook to use. See get_hook github.com/apache/incubator-airflow/blob/master/airflow/… - Davos

2 Answers

4
votes

This has been fixed in Airflow 1.9.0 with the addition of some extra arguments to the connections subcommand:

airflow connections -a --conn_id hive_cli_test --conn_type hive_cli --conn_host something/something
[2018-08-09 10:28:41,377] {__init__.py:51} INFO - Using executor SequentialExecutor

        Successfully added `conn_id`=hive_cli_test : hive_cli://:@something/something:
2
votes

You're right.

The conn_type is used to determine which hook to use as an interface to an external data source / sink.

conn_type is either extracted from the URI as you've specified correctly above, or from a connection created in the UI (and stored in the connection table in the Meta DB).

In your case, the conn_type is extracted from the supplied URI by the parse_from_uri method in models.py, which sets conn_type from the scheme returned by urlparse: https://github.com/apache/incubator-airflow/blob/master/airflow/models.py

According to https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse the scheme is extracted from the first part of the URI.

And as you found, urlparse doesn't return a scheme when there's an underscore in the URI before the ://. A URI scheme may contain only letters, digits, +, - and ., so hive_cli is not recognized as one.

To verify this, try variations on this URI with and without the underscore:

from urllib.parse import urlparse
# Print each parsed component in order: scheme, netloc, path, params, query, fragment
for part in urlparse("hive_cli://hiveserver/default"):
    print(part)
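
A side-by-side comparison makes the difference easier to see. This is a minimal sketch using only the standard library; the hivecli scheme is a hypothetical underscore-free name used purely for contrast, not an existing Airflow connection type.

```python
from urllib.parse import urlparse

# The same URI with and without an underscore in the scheme.
# "hivecli" is a hypothetical name, used only to show the contrast.
with_underscore = urlparse("hive_cli://hiveserver/default")
without_underscore = urlparse("hivecli://hiveserver/default")

for label, parsed in [("hive_cli", with_underscore), ("hivecli", without_underscore)]:
    print(label, "-> scheme:", repr(parsed.scheme),
          "host:", repr(parsed.hostname),
          "path:", repr(parsed.path))
```

With the underscore, the scheme comes back empty and the whole string lands in the path, which is why the connection type never gets set.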

It works slightly differently if you use beeline, as it will create a JDBC connection, but if you're not using beeline (I can see you aren't because it would be part of the --conn_extra in the command) then it runs a subprocess.

Following the code, the hive_cli type is ultimately run via subprocess.Popen, i.e. directly on the Airflow machine (or worker), not via JDBC or some other connection.

https://github.com/apache/incubator-airflow/blob/master/airflow/hooks/hive_hooks.py#L208

So it doesn't really need a URL-type connection string; it's just using that format to shoehorn itself into the airflow connections --conn_uri option. Since it never gets pieced back together as a URL, the choice to call it hive_cli appears arbitrary, and it doesn't work from the Airflow CLI. This all works when you use the UI because the UI form constructs the connection with the conn_type specified directly.

It's a bug: the type name should be changed from hive_cli to hivecli, or to something else that is descriptive and compatible with urlparse.
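
Another possible fix on the parsing side would be to split off the scheme by hand before calling urlparse. The following is a rough sketch, not Airflow's actual code: parse_conn_uri is a hypothetical helper, the placeholder scheme is just one assumed way around the restriction, and treating the path as the schema is an assumption about how the connection URI is laid out.

```python
from urllib.parse import urlparse


def parse_conn_uri(uri):
    """Hypothetical helper: return (conn_type, host, schema) for a connection URI.

    The scheme is split off manually so that underscores survive, and the
    remainder is parsed with a placeholder scheme that urlparse accepts.
    """
    conn_type, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError("not a connection URI: %r" % uri)
    parsed = urlparse("placeholder://" + rest)
    # Drop the leading slash and treat the rest of the path as the schema
    # (an assumption made for this sketch).
    return conn_type, parsed.hostname, parsed.path[1:]


print(parse_conn_uri("hive_cli://hiveserver/default"))
```

The idea is only that the connection type does not have to pass through urlparse at all; whether that is preferable to renaming the type is a design choice.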