1
votes

My question is similar to this unanswered question: SQLAlchemy commits makes float to be rounded

I have a text file of data that looks like this:

#file camera date mjd focus error
ibcy02blq UVIS1 08/03/09   55046.196630   0.57857   0.55440
ibcy02bnq UVIS1 08/03/09   55046.198330  -0.15000   0.42111
ibcy03j8q UVIS1 08/11/09   55054.041650  -0.37143   0.40802
ibcy03jaq UVIS1 08/11/09   55054.043350  -0.91857   0.51859
ibcy04m4q UVIS1 08/18/09   55061.154900  -0.32333   0.52327
ibcy04m6q UVIS1 08/18/09   55061.156600  -0.24867   0.66651
ibcy05b7q UVIS1 09/05/09   55079.912670   0.64900   0.58423
ibcy05b9q UVIS1 09/05/09   55079.914370   0.82000   0.50202
ibcy06meq UVIS1 10/02/09   55106.909840  -0.09667   0.24016

But once I read it into my MySQL database it looks like this:

+------+-----------+--------+------------+---------+----------+
| id   | filename  | camera | date       | mjd     | focus    |
+------+-----------+--------+------------+---------+----------+
| 1026 | ibcy02blq | UVIS1  | 2009-08-03 | 55046.2 |  0.57857 |
| 1027 | ibcy02bnq | UVIS1  | 2009-08-03 | 55046.2 |    -0.15 |
| 1028 | ibcy03j8q | UVIS1  | 2009-08-11 |   55054 | -0.37143 |
| 1029 | ibcy03jaq | UVIS1  | 2009-08-11 |   55054 | -0.91857 |
| 1030 | ibcy04m4q | UVIS1  | 2009-08-18 | 55061.2 | -0.32333 |
| 1031 | ibcy04m6q | UVIS1  | 2009-08-18 | 55061.2 | -0.24867 |
| 1032 | ibcy05b7q | UVIS1  | 2009-09-05 | 55079.9 |    0.649 |
| 1033 | ibcy05b9q | UVIS1  | 2009-09-05 | 55079.9 |     0.82 |
| 1034 | ibcy06meq | UVIS1  | 2009-10-02 | 55106.9 | -0.09667 |
| 1035 | ibcy06mgq | UVIS1  | 2009-10-02 | 55106.9 |  -0.1425 |
+------+-----------+--------+------------+---------+----------+

The mjd column is being truncated and I'm not sure why. I understand that there are floating point precision errors for something like 1/3 but this looks more like some type of rounding is being implemented.

Here is the code I use to ingest the data into the database:

def make_focus_table_main():
    """The main controller for the make_focus_table 
    module."""
    logging.info('Process Starting')
    filename_list = glob.glob('/grp/hst/OTA/focus/source/FocusModel/UVIS*FocusHistory.txt')
    logging.info('Found {} files'.format(len(filename_list)))
    for filename in filename_list:
        logging.info('Reading data from {}'.format(filename))
        output_list = []
        with open(filename, 'r') as f:
            data = f.readlines()
            for line in data[1:]:
                line = line.split()
                output_dict = {}
                output_dict['filename'] = line[0]
                output_dict['camera'] = line[1]
                output_dict['date'] = datetime.strptime(line[2], '%m/%d/%y')
                output_dict['mjd'] = float(line[3])
                output_dict['focus'] = float(line[4])
                output_list.append(output_dict)
        logging.info('Beginning bulk insert of records.')
        engine.execute(Focus.__table__.insert(), output_list)
        logging.info('Database insert complete.')
    logging.info('Process Complete')

I've used pdb to check that the values are not being truncated prior to being passed to the database (i.e. Python/SQLAlchemy is not performing the rounding). I can verify this in the INSERT command SQLAlchemy issues:

2014-04-11 13:08:20,522 INFO sqlalchemy.engine.base.Engine INSERT INTO focus (filename, camera, date, mjd, focus) VALUES (%s, %s, %s, %s, %s)
2014-04-11 13:08:20,602 INFO sqlalchemy.engine.base.Engine (
    ('ibcy02blq', 'UVIS2', datetime.datetime(2009, 8, 3, 0, 0), 55046.19663, 1.05778), 
    ('ibcy02bnq', 'UVIS2', datetime.datetime(2009, 8, 3, 0, 0), 55046.19833, 1.32333), 
    ('ibcy03j8q', 'UVIS2', datetime.datetime(2009, 8, 11, 0, 0), 55054.04165, 1.57333), 
    ('ibcy03jaq', 'UVIS2', datetime.datetime(2009, 8, 11, 0, 0), 55054.04335, 0.54333), 
    ('ibcy04m4q', 'UVIS2', datetime.datetime(2009, 8, 18, 0, 0), 55061.1549, -1.152), 
    ('ibcy04m6q', 'UVIS2', datetime.datetime(2009, 8, 18, 0, 0), 55061.1566, -1.20733), 
    ('ibcy05b7q', 'UVIS2', datetime.datetime(2009, 9, 5, 0, 0), 55079.91267, 2.35905), 
    ('ibcy05b9q', 'UVIS2', datetime.datetime(2009, 9, 5, 0, 0), 55079.91437, 1.84524)  
    ... displaying 10 of 1025 total bound parameter sets ...  
    ('ichl05qwq', 'UVIS2', datetime.datetime(2014, 4, 2, 0, 0), 56749.05103, -2.98), 
    ('ichl05qxq', 'UVIS2', datetime.datetime(2014, 4, 2, 0, 0), 56749.05177, -3.07))
2014-04-11 13:08:20,959 INFO sqlalchemy.engine.base.Engine COMMIT

Here is how the column is defined in my SQLAlchemy classes:

class Focus(Base):
    """ORM for the table storing the focus measurement information."""
    __tablename__ = 'focus'
    id = Column(Integer(), primary_key=True)
    filename = Column(String(17), index=True, nullable=False)
    camera = Column(String(5), index=True, nullable=False)
    date = Column(Date(), index=True, nullable=False)
    mjd = Column(Float(precision=20, scale=10), index=True, nullable=False)
    focus = Column(Float(15), nullable=False)
    __table_args__ = (UniqueConstraint('filename', 'camera', 
                      name='focus_uniqueness_constraint'),)

Here is the SQL that's logged from SQLAlchemy with echo=True when I create the table:

CREATE TABLE focus (
    id INTEGER NOT NULL AUTO_INCREMENT, 
    filename VARCHAR(17) NOT NULL, 
    camera VARCHAR(5) NOT NULL, 
    date DATE NOT NULL, 
    mjd FLOAT(20) NOT NULL, 
    focus FLOAT(15) NOT NULL, 
    PRIMARY KEY (id), 
    CONSTRAINT focus_uniqueness_constraint UNIQUE (filename, camera)
)

So far, so good. But here's what I see MySQL with a SHOW CREATE TABLE focus;:

CREATE TABLE `focus` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `filename` varchar(17) NOT NULL,
  `camera` varchar(5) NOT NULL,
  `date` date NOT NULL,
  `mjd` float NOT NULL,
  `focus` float NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `focus_uniqueness_constraint` (`filename`,`camera`),
  KEY `ix_focus_filename` (`filename`),
  KEY `ix_focus_mjd` (`mjd`),
  KEY `ix_focus_date` (`date`),
  KEY `ix_focus_camera` (`camera`)
) ENGINE=InnoDB AUTO_INCREMENT=1193 DEFAULT CHARSET=latin1

Somehow the FLOAT definition changed! Is this some type of MySQL configuration setting? I'm just running this on my local host right now, but if this is a configuration setting I'm concerned about the portability of this code onto a production server if I continue to use floats. I could just switch to a decimal column type as I've seen in other SO questions since I need exact values but I would like to understand what's going on here.


Update: Just to expand a little on two-bit-alchemist's answer, here is how it changes my query:

> SELECT ROUND(mjd,10) FROM focus LIMIT 10;
+------------------+
| ROUND(mjd,10)    |
+------------------+
| 55046.1953125000 |
| 55046.1992187500 |
| 55054.0429687500 |
| 55054.0429687500 |
| 55061.1562500000 |
| 55061.1562500000 |
| 55079.9140625000 |
| 55079.9140625000 |
| 55106.9101562500 |
| 55106.9101562500 |
+------------------+
10 rows in set (0.00 sec)

Notice that all the decimal precision is still there. I had no idea SELECT was rounding values but I guess this makes sense if you think about how a floating point representation works. It uses the full bytes allocated for that number, how many decimals you display is arbitrary up to the full length of the float:https://stackoverflow.com/a/20482699/1216837

Specifying the precision only seems to affect if it's stored as a double or a single: http://dev.mysql.com/doc/refman/5.0/en/floating-point-types.html.

But, what's also interesting/annoying is that I have to worry about this same thing when issuing a SELECT from the SQLAlchemy layer:

query = psf_session.query(Focus).first()
print query.filename, query.mjd, query.focus

Gives me bcy02blq 55046.2 1.05778 so the values are still being rounded. Again, this makes sense because SQLAlchemy is just issuing SQL commands anyway. All in all this is motivating me to switch to a DECIMAL column type: http://dev.mysql.com/doc/refman/5.0/en/fixed-point-types.html

1
Are you sure the definition changed? I'm not familiar enough with MySQL's implementation to know by just looking at this, but I wouldn't assume from this that it had changed. I assume that MySQL for whatever reason only gives you 6 significant digits when you SELECT a float column, and to get a different value you need to ROUND to a different number of digits.Two-Bit Alchemist
How do you "read" the data into the database? Using SA? Mind sharing the code? Or enable SQL statement loggging (engine.echo = True) when doing so in order to see what INSERT statements are generated.van
@van, please see my edits.ACV
@Two-BitAlchemist, Sorry, I'm not following you. Are you saying this is just a "printing" issue and I can use ROUND to see additional digits?ACV
@ACV It's just an educated guess (I do not use MySQL personally lately) but it appears that all your values were printed with exactly six digits (except two where the sixth was .0 so it was left off), and I think based on your CREATE TABLE statement the internal representation should be quite a bit longer. I'm wondering if you would get different results using, e.g., ROUND(mjd, 3).Two-Bit Alchemist

1 Answers

2
votes

It looks like all your values were printed with exactly six digits (except where .0 was left off in a couple of places). While I can't find any documentation on this, I suspect this is simply a default MySQL behavior for displaying float values in the context of a SELECT statement.

Based on the CREATE TABLE statement you provided, the internal representation is correct, so you need only add something like ROUND(mjd, 3) to your statement, with the first argument being the fields to round and the last being the number of digits to round to (which can be longer than what is displaying now).