6
votes

I am storing data on journal papers from PubMed in Postgres, and I want to store the publication date of the paper. However, sometimes the date is just a year (see the PubDate field), sometimes it is a month and a year, and sometimes it is a day, month and year.

What's a sensible way to store this data in my database?

If I use a Postgres date field, I obviously need to specify day and month as well, but I'll need to invent them in some cases, and I don't like the idea of losing the information about the date's precision.

For my purposes I think I will only need the year, but I don't know that to be true for all possible future purposes.

Perhaps I should just have text fields for day, month and year, and then I can always convert them to a date field in the future if I need them?

2
Yes, your suggestion at the end seems very reasonable to me. You could also add a CHECK constraint to check that if day is not null, then month is not null too. An alternative would be to create a date column and a second column to indicate the accuracy of the date column.redneb

2 Answers

4
votes

I would store it as date and store the precision as well.

For example:

CREATE TYPE date_prec AS ENUM ('day', 'month', 'year');

CREATE TABLE pub (
   pub_id integer PRIMARY KEY,
   pub_date date NOT NULL,
   pub_date_prec date_prec NOT NULL
);

Then you can query the table like this:

SELECT pub_id, date_trunc(pub_date_prec::text, pub_date)::date FROM pub;

to ignore any “random” day and month values in pub_date.

0
votes

You can have a Date field and have another field - lets say 'dateflag' of type integer. You have have values such as 1 for use full date, 2 for use date and month, 3 for use year only. This way while retrieving the data you would know what to do. Of course for opt 2 and 3 you will need to put in a dummy date/month besides the year while doing the insert.

Your idea of 3 fields is perfectly valid too - but it will require lots of checking to be sure you maintain the sanctity of the date (Like if day is there - then month has to be there, rules for value of days and months etc).