
I'm planning to use PyYAML for a configuration file. Some of the items in that configuration file are Python tuples of tuples, so I need a convenient way to represent them. One can represent Python tuples of tuples using PyYAML as follows:

print yaml.load("!!python/tuple [ !!python/tuple [1, 2], !!python/tuple [3, 4]]")

However, this is not convenient notation for a long sequence of items. I think it should be possible to define a custom tag, like python/tuple_of_tuples. I.e. something like

yaml.load("!!python/tuple_of_tuples [[1,2], [3,4]]")

See my first attempt to define this below, by mimicking how python/tuple is defined, and trying to do similar subclassing. It fails, but gives an idea what I am after, I think. I have a second attempt that works, but is a cheat, since it just calls eval.

If I can't find anything better I'll just use that. However, YAML is intended as a replacement for ConfigObj, which uses INI files, and is considerably less powerful than YAML, and I used the same approach (namely eval) for tuples of tuples. So in that respect it will be no worse.

A proper solution would be most welcome.

I have a couple of comments on my first solution.

  1. I'd have thought that the constructor construct_python_tuple_of_tuples would return the completed structure, but in fact it seems to return an empty structure as follows

    ([], [])
    

    I traced the calls, and there seems to be a lot of complicated stuff happening after construct_python_tuple_of_tuples is called.

    The value that is returned is a tuple of lists of integers, so quite close to the desired result. So, the structure must be completed later.

    The line with

    tuple([tuple(t) for t in x])
    

    was my attempt to coerce the list of tuples to a tuple of tuples, but if I return that from construct_python_tuple_of_tuples, then the resulting call to yaml.load("!!python/tuple_of_tuples [[1,2], [3,4]]") is just

    ((),())
    
  2. Not sure what is with the

    yaml.org,2002
    

    Why 2002?

First attempt

import yaml
from yaml.constructor import Constructor

def construct_python_tuple_of_tuples(self, node):
     # Complete content of construct_python_tuple
     # is
     # return tuple(self.construct_sequence(node))

     print "node", node
     x = tuple(self.construct_sequence(node))
     print "x", x
     foo = tuple([tuple(t) for t in x])
     print "foo", foo
     return x

Constructor.construct_python_tuple_of_tuples = construct_python_tuple_of_tuples

Constructor.add_constructor(
         u'tag:yaml.org,2002:python/tuple_of_tuples',
         Constructor.construct_python_tuple_of_tuples)

y = yaml.load("!!python/tuple_of_tuples [[1,2], [3,4]]")
print "y", y, type(y)
print y[0], type(y[0])
print y[0][0], type(y[0][0])

The results are

node SequenceNode(tag=u'tag:yaml.org,2002:python/tuple_of_tuples',
value=[SequenceNode(tag=u'tag:yaml.org,2002:seq',
value=[ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'1'),
ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'2')]),
SequenceNode(tag=u'tag:yaml.org,2002:seq',
value=[ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'3'),
ScalarNode(tag=u'tag:yaml.org,2002:int', value=u'4')])])

x ([], [])

foo ((), ())

y ([1, 2], [3, 4]) <type 'tuple'>

y[0] [1, 2] <type 'list'>

y[0][0] 1 <type 'int'>

Second attempt

import yaml
from yaml import YAMLObject, Loader, Dumper

class TupleOfTuples(YAMLObject):
    yaml_loader = Loader
    yaml_dumper = Dumper

    yaml_tag = u'!TupleOfTuples'
    #yaml_flow_style = ...

    @classmethod
    def from_yaml(cls, loader, node):
        import ast
        print "node", node
        print "node.value", node.value, type(node.value)
        return ast.literal_eval(node.value)

    @classmethod
    def to_yaml(cls, dumper, data):
        return node

t = yaml.load("!TupleOfTuples ((1, 2), (3, 4))")
print "t", t, type(t)

The results are:

node ScalarNode(tag=u'!TupleOfTuples', value=u'((1, 2), (3, 4))')
node.value ((1, 2), (3, 4)) <type 'unicode'>
t ((1, 2), (3, 4)) <type 'tuple'>
Just loading lists as usual, then converting the lists to tuples before passing the configuration on to the rest of the code is not acceptable, I suppose? - user395760
@delnan Lists of lists of integers would probably be OK. Is that something that is easier to do? - Faheem Mitha
It's trivial, as YAML supports lists and integers natively ;-) - user395760
@delnan So it does. I wonder why it can't handle tuples of tuples as well. Maybe I can manage with lists of lists. I'll try it. - Faheem Mitha
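
The approach suggested in these comments can be sketched roughly as follows: load plain lists, then convert them recursively before passing the configuration on. The helper name to_tuples and the sample data are made up for illustration; the nested list stands in for what loading "[[1, 2], [3, 4]]" would return.

```python
def to_tuples(value):
    """Recursively turn lists (and tuples) into tuples; leave other values alone."""
    if isinstance(value, (list, tuple)):
        return tuple(to_tuples(item) for item in value)
    if isinstance(value, dict):
        # Convert the values of a mapping, keep the keys untouched.
        return {key: to_tuples(item) for key, item in value.items()}
    return value


# Stand-in for a configuration loaded from YAML (hypothetical content).
loaded = {"pairs": [[1, 2], [3, 4]], "name": "demo"}
config = to_tuples(loaded)
print(config["pairs"])  # ((1, 2), (3, 4))
```

This keeps the YAML file free of tags, at the cost of converting every list in the configuration, which may or may not be what is wanted.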

1 Answer


To start with question 2: 2002 is the year this kind of tag was introduced, in the Sep 1, 2002 version of the YAML 1.0 draft.

Question 1 is more complicated. If you do:

from __future__ import print_function

import yaml

lol = [[1,2], [3,4]]  # list of lists
print(yaml.dump(lol))

you get (A):

[[1, 2], [3, 4]]

But actually this is short for (B):

!!seq [
  !!seq [
    !!int "1",
    !!int "2",
  ],
  !!seq [
    !!int "3",
    !!int "4",
  ],
]

which is short for (C):

!<tag:yaml.org,2002:seq> [
  !<tag:yaml.org,2002:seq> [
    !<tag:yaml.org,2002:int> "1",
    !<tag:yaml.org,2002:int> "2",
  ],
  !<tag:yaml.org,2002:seq> [
    !<tag:yaml.org,2002:int> "3",
    !<tag:yaml.org,2002:int> "4",
  ],
]

A, B and C all load to the original list of lists, because the seq(uence) is a built-in type.
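
As a quick check of that equivalence, here is a small sketch that loads compact one-line versions of the three forms and compares the results. The variable names are arbitrary, and the strings are condensed rewrites of A, B and C rather than the exact text above.

```python
import yaml

# (A) plain flow sequence, tags resolved implicitly
short_form = "[[1, 2], [3, 4]]"

# (B) the same data with explicit shorthand tags
tagged_form = '!!seq [!!seq [!!int "1", !!int "2"], !!seq [!!int "3", !!int "4"]]'

# (C) the same data with verbatim (fully spelled out) tags
verbatim_form = (
    '!<tag:yaml.org,2002:seq> ['
    '!<tag:yaml.org,2002:seq> [!<tag:yaml.org,2002:int> "1", !<tag:yaml.org,2002:int> "2"], '
    '!<tag:yaml.org,2002:seq> [!<tag:yaml.org,2002:int> "3", !<tag:yaml.org,2002:int> "4"]]'
)

results = [yaml.safe_load(text) for text in (short_form, tagged_form, verbatim_form)]
print(results[0] == results[1] == results[2])
```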

I don't think that extending the syntax of YAML (e.g. with () indicating a tuple) would be a good idea. To minimize tags you can reduce your example to:

yaml_in = "!tuple [ !tuple [1, 2], !tuple [3, 4]]"

and add a constructor:

yaml.add_constructor("!tuple", construct_tuple)

but this pushes the problem to creating the construct_tuple function. The one for a sequence (in constructor.py) is:

def construct_yaml_seq(self, node):
    data = []
    yield data
    data.extend(self.construct_sequence(node))

But you cannot just replace the [] in there with (), because a tuple cannot be changed by extending it after it has been created (the reason for this two-step creation, with a yield, is e.g. to allow circular references in complex types like sequence and mapping).
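
The two-step creation also explains the ([], []) seen in the question: the inner lists are handed out empty and only filled in afterwards. The following is a simplified model of that mechanism in plain Python, not PyYAML's actual code. Nodes are modelled as nested Python lists, and the pending list and all function bodies are invented for this illustration.

```python
def construct_sequence(node, pending):
    # Build each child; nested sequences come back as (still empty) lists.
    return [construct_object(child, pending) for child in node]


def construct_yaml_seq(node, pending):
    # Two-step constructor: hand out the empty list first, fill it later.
    data = []
    yield data
    data.extend(construct_sequence(node, pending))


def construct_object(node, pending):
    if not isinstance(node, list):
        return node                   # scalar: nothing to defer
    generator = construct_yaml_seq(node, pending)
    data = next(generator)            # the empty list
    pending.append(generator)         # the filling step is queued
    return data


def construct_tuple(node, pending):
    # Mirrors the constructor in the question.
    return tuple(construct_sequence(node, pending))


pending = []
x = construct_tuple([[1, 2], [3, 4]], pending)
before = repr(x)                      # what the question printed as x
while pending:                        # second step: run the queued generators
    for _ in pending.pop(0):
        pass
after = repr(x)                       # what the question printed as y
print(before)  # ([], [])
print(after)   # ([1, 2], [3, 4])
```

In this model, converting the inner lists to tuples inside construct_tuple copies them while they are still empty, which matches the ((), ()) reported in the question.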

You should define a Tuple() class that behaves like a list until "locked" (which you would do at the end of the construction), and from then on behaves like a tuple (i.e. no more modification). The following does so without subclassing yaml.YAMLObject, so you have to explicitly provide and register the constructor and representer for the class.

class Tuple(list):

    def _lock(self):
        if hasattr(self, '_is_locked'):
            return
        self._is_locked = True
        self.append = self._append
        self.extend = self._extend

    def _append(self, item):
        raise AttributeError("'Tuple' object has no attribute 'append'")

    def _extend(self, items):
        raise AttributeError("'Tuple' object has no attribute 'extend'")

    def __str__(self):
        return '(' + ', '.join((str(e) for e in self)) + ')'

    # new style class cannot assign something to special method
    def __setitem__(self, key, value):
        if getattr(self, '_is_locked', False):
            raise TypeError("'Tuple' object does not support item assignment")
        list.__setitem__(self, key, value)

    # __delitem__ only receives the key (there is no value to pass)
    def __delitem__(self, key):
        if getattr(self, '_is_locked', False):
            raise TypeError("'Tuple' object does not support item deletion")
        list.__delitem__(self, key)

    @staticmethod
    def _construct_tuple(loader, data):
        result = Tuple()
        yield result
        result.extend(loader.construct_sequence(data))
        result._lock()

    @staticmethod
    def _represent_tuple(dumper, node):
        return dumper.represent_sequence("!tuple", node)

# let yaml know how to handle this
yaml.add_constructor("!tuple", Tuple._construct_tuple)
yaml.add_representer(Tuple, Tuple._represent_tuple)

With that in place you can do:

yaml_in = "!tuple [ !tuple [1, 2], !tuple [3, 4]]"
#yaml_in = "!tuple [1, 2]"

# newer PyYAML versions require an explicit Loader argument
data = yaml.load(yaml_in, Loader=yaml.Loader)
print(data)
print(data[1][0])
print(type(data))

to get:

((1, 2), (3, 4))
3
<class '__main__.Tuple'>

This is not a real tuple, but it doesn't allow list-like actions. The following activities all throw the appropriate error:

# test appending to the tuple,
try:
    data.append(Tuple([5, 6]))
except AttributeError:
    pass
else:
    raise NotImplementedError
# test extending the tuple,
try:
    data.extend([5, 6])
except AttributeError:
    pass
else:
    raise NotImplementedError
# test replacement of an item
try:
    data[0] = Tuple([5, 6])
except TypeError:
    pass
else:
    raise NotImplementedError
# test deletion of an item
try:
    del data[0]
except TypeError:
    pass
else:
    raise NotImplementedError

And finally you can do:

print(yaml.dump(data, default_flow_style=True))

for the following output:

!tuple [!tuple [1, 2], !tuple [3, 4]]

If you really want !tuple [[1, 2], [3, 4]] to create a Tuple of Tuples, you can do so by keeping context state in the BaseLoader class of yaml and overriding the method that constructs Python objects from sequences, so that it creates Tuples or lists depending on context. That would probably have to be a stack of context states, to allow for nested as well as non-nested use of !tuple, and some explicit overriding to get lists within tuples when using !!seq as the tag.
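
If circular references are not needed, a simpler route to the notation from the question might be to pass deep=True to construct_sequence, so that the children are built completely before it returns. The following is a sketch under that assumption. The tag name !tuple_of_tuples and the function name are chosen for illustration, and only the sequence and its direct children are converted to real tuples.

```python
import yaml


def construct_tuple_of_tuples(loader, node):
    # deep=True builds the nested sequences completely at this point,
    # instead of returning empty lists that are filled in later.
    rows = loader.construct_sequence(node, deep=True)
    return tuple(tuple(row) for row in rows)


# Register the (hypothetical) tag on SafeLoader only.
yaml.add_constructor("!tuple_of_tuples", construct_tuple_of_tuples,
                     Loader=yaml.SafeLoader)

data = yaml.load("!tuple_of_tuples [[1, 2], [3, 4]]", Loader=yaml.SafeLoader)
print(data)  # ((1, 2), (3, 4))
```

Because the result is built in one step, a sequence tagged this way cannot contain a reference to itself.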


I may not have checked Tuple() for completeness; I only implemented the restrictions of tuple compared to list that immediately came to mind.
I tested this with my enhanced version of PyYAML, ruamel.yaml, but this should work the same in PyYAML itself.