1
votes

I have a PostgreSQL database that stores species occurrence data. One of the fields is a true date field that indicates the date of an occurrence.

However, especially with historic data, we do not always have precise information. Sometimes we know only the month or the year; other times, only a season and the year.

In the past, we handled this with a "dummy" date and additional remarks in other fields. For instance, for March 2010, we may default to 2010-03-01.

However, one of our biologists has raised some concerns over this. She doesn't like the dummy data at all.

I have searched high and low to see whether Postgres allows entering only a month and year, or just a year, in a date field. Alternatively, whether it accepts out-of-range dates to flag missing parts, e.g. 2010-03 to capture only month and year, 2010 for just the year, or 2010-03-00 to indicate that the day is missing. I am coming up empty. (This comes close: How to store dates with different levels of precision in postgres?)

Any suggestions? Am I overlooking something? Has this been addressed elsewhere before (if so, I didn't find it)?

2
What are the concerns that were raised about the current data? – GMB
Yeah, if the information is not there, then I'm not sure any system is going to improve that. It comes down to either taking the information you have or, if it is not complete (whatever that means), declaring it NULL. At some point someone is going to have to decide how to slot the data. – Adrian Klaver

2 Answers

3
votes

You could convert the column to the daterange type. For a precise date, both bounds of the range would be the same date (with both bounds inclusive). For a month, the lower bound would be the 1st of the month, and the upper bound would be the last day of the month.

More info here: https://www.postgresql.org/docs/current/rangetypes.html
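
As a rough sketch of the daterange idea, the following shows how an exact day, a month, and a whole year might be stored and queried. The table name `occurrence` and the columns `occurrence_id` and `observed` are hypothetical, chosen only for illustration. Note that PostgreSQL normalizes daterange values to an inclusive lower bound and an exclusive upper bound, so an exact day entered as `[2010-03-15,2010-03-15]` is displayed as `[2010-03-15,2010-03-16)`.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE occurrence (
    occurrence_id integer PRIMARY KEY,
    observed      daterange NOT NULL
);

INSERT INTO occurrence (occurrence_id, observed) VALUES
    (1, daterange(DATE '2010-03-15', DATE '2010-03-15', '[]')),  -- exact day
    (2, daterange(DATE '2010-03-01', DATE '2010-03-31', '[]')),  -- March 2010
    (3, daterange(DATE '2010-01-01', DATE '2010-12-31', '[]'));  -- the year 2010

-- Width of each range in days: 1 means the date is exact,
-- anything larger means the date is only known approximately.
SELECT occurrence_id,
       observed,
       upper(observed) - lower(observed) AS days_covered
FROM occurrence;

-- Every occurrence that could have happened on 2010-03-15.
SELECT occurrence_id
FROM occurrence
WHERE observed @> DATE '2010-03-15';
```

With this layout the precision is carried by the value itself, so no dummy day and no separate remarks column are needed; a season could be stored the same way once you decide which calendar dates delimit it.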

1
votes

Postgres will only accept valid values in date/timestamp fields, so something like 2010-03-00 will not work. One solution would be to use a text field and do the date conversion on the fly. Another would be to keep doing what you are doing now and add another field (varchar) that serves as a template for the date. Something like:

occurrence_fld     date_template
2020-06-21         season:summer
2020-08-01         format:yyyy-mm
2020-08-31         format:yyyy-mm-dd

You could then create a function that uses the template to build the displayed date based on the tag (season, format). As an example of a query that uses the 'format' template:

select to_char('2020-08-01'::date, split_part('format:yyyy-mm', ':', 2));
 to_char 
---------
 2020-08

select to_char('2020-08-31'::date, split_part('format:yyyy-mm-dd', ':', 2));
  to_char   
------------
 2020-08-31
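
A function along those lines might look like the sketch below. The name `display_occurrence` and its parameters are hypothetical, and the output chosen for the 'season' tag (season name followed by the year) is just one possible convention, not something prescribed above.

```sql
-- Hypothetical helper: render a stored date at the precision
-- declared by its template ('format:<pattern>' or 'season:<name>').
CREATE FUNCTION display_occurrence(d date, tmpl text) RETURNS text
LANGUAGE sql STABLE AS $$
    SELECT CASE split_part(tmpl, ':', 1)
        -- 'format:yyyy-mm' -> apply the pattern after the colon
        WHEN 'format' THEN to_char(d, split_part(tmpl, ':', 2))
        -- 'season:summer' -> season name plus the year of the stored date
        WHEN 'season' THEN split_part(tmpl, ':', 2) || ' ' || to_char(d, 'yyyy')
        -- unknown or missing tag -> fall back to the full date
        ELSE to_char(d, 'yyyy-mm-dd')
    END;
$$;

SELECT display_occurrence(DATE '2020-06-21', 'season:summer');   -- summer 2020
SELECT display_occurrence(DATE '2020-08-01', 'format:yyyy-mm');  -- 2020-08
```

The stored date stays a valid date for sorting and indexing, while the template records how much of it is actually known.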