2
votes

Repositories, their use, and their layout are debated heavily on Stack Overflow and across the web. I am confused about how to efficiently implement the data access abstraction (e.g. for a database) behind a Repository.

I am not using an ORM tool/framework because I want to see the gritty details myself. At the moment I am using DAO objects to access a MySQL database and provide Business Objects (domain objects). Associations given by foreign keys in the database tables are resolved and loaded within the DAO of the respective object (no lazy loading currently). Because I don't want to use my database DAOs directly in the business logic, I considered a Repository to be a good further abstraction. I got stuck when implementing more sophisticated queries such as GetEmployeesByShopAndPosition() in the Repository. I see two ways to implement this:

  1. Brute Force: Use the Employee DAO and load all employees as business objects (including associated shops/positions) from the database into the Repository's collection. Iterate through the collection and return those Employees working in the given Shop and Position.
  2. Efficient: Implement a database query in the EmployeeDAO that joins the relevant tables and uses a WHERE clause to return only the wanted Employees.

The first approach uses the collection nature a Repository should actually have, but seems to be very inefficient. The second approach generates a bloated DAO but is much more efficient.
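To make the trade-off concrete, here is a minimal sketch of both approaches. It is only an illustration under assumptions: `sqlite3` stands in for MySQL so the code runs anywhere, and the schema, table names (`shop`, `job_position`, `employee`), and function names are hypothetical, not taken from the question.

```python
import sqlite3

# Hypothetical schema; an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job_position (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        shop_id INTEGER REFERENCES shop(id),
        position_id INTEGER REFERENCES job_position(id)
    );
    INSERT INTO shop VALUES (1, 'Downtown'), (2, 'Airport');
    INSERT INTO job_position VALUES (1, 'Cashier'), (2, 'Manager');
    INSERT INTO employee VALUES
        (1, 'Ann', 1, 1), (2, 'Bob', 1, 2), (3, 'Cy', 2, 1);
""")


def employees_brute_force(shop_id, position_id):
    # Approach 1: load every employee, then filter in application code.
    rows = conn.execute(
        "SELECT id, name, shop_id, position_id FROM employee ORDER BY id"
    ).fetchall()
    return [r[1] for r in rows if r[2] == shop_id and r[3] == position_id]


def employees_by_query(shop_id, position_id):
    # Approach 2: let the database join and filter via a parameterized WHERE.
    rows = conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN shop s ON s.id = e.shop_id "
        "JOIN job_position p ON p.id = e.position_id "
        "WHERE s.id = ? AND p.id = ? ORDER BY e.id",
        (shop_id, position_id),
    ).fetchall()
    return [r[0] for r in rows]
```

Both functions return the same names for the same arguments; the difference is that the first transfers the whole table to the application before filtering, while the second transfers only the matching rows.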

My questions:

  1. Which approach is preferable, and how is this done in practice?
  2. Am I wrong to use a Repository in conjunction with DAOs? Can the database-related code go directly in the Repository instead?
  3. As a Repository deals with aggregates, should it (rather than the DAOs I am currently using) resolve the foreign-key associations and assemble the full Business Objects?

I know this topic is not black and white, as the design patterns involved can be implemented in different ways, but I guess there are some guidelines that should not be broken or confused if I want to achieve separation of concerns and Persistence Ignorance (PI).

2
While I understand the desire to avoid using an ORM tool/framework, I do think you are making things much harder on yourself by not looking at their documentation and examples. PHP's Doctrine 2 ORM (doctrine-orm.readthedocs.org/en/latest/tutorials/…) is easy to understand and should answer most of your questions. (Comment by Cerad)

2 Answers

6
votes

You are actually asking quite a number of questions here so I'll try to keep the answers as terse as possible :)

Repositories return an Aggregate Root or an Entity. Some are quite adamant that repositories only return ARs and that is fine and will always suffice.

There are two types of repositories (as nicely described by Vaughn Vernon in his book Implementing Domain-Driven Design):

  • collection-oriented
  • persistence-oriented

You'll probably come across and use the persistence-oriented more often. This is probably where the confusion comes in w.r.t. DAO. A DAO may, of course, return a business object but it is probably going to return more than that.

Your query example is where a DAO may be more appropriate. So in the domain-driven design field you'll come across CQRS (command/query responsibility segregation) quite often. It boils down to not querying your domain.

You should have a thin, dedicated query layer that returns results in the most appropriate format (but not entities). In C# I use things like DataTable, DataRow, string and sometimes a complex DTO if required.

A repository is only concerned with ARs, e.g.:

  • Get
  • Add
  • Remove

Repositories basically use a logical DAO of sorts (ADO.NET or an ORM, though I try to avoid ORMs).
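As a rough illustration of a repository limited to Get, Add and Remove, here is a minimal sketch. The `Employee` aggregate, the `EmployeeRepository` class, and the table layout are all hypothetical, and `sqlite3` is used only so the example is self-contained; the table is created in the constructor purely for that reason.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Employee:
    # Hypothetical aggregate root, kept tiny for the sketch.
    id: int
    name: str


class EmployeeRepository:
    """Repository exposing only Get, Add and Remove for one aggregate."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        # Creating the table here is an assumption made for self-containment.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS employee "
            "(id INTEGER PRIMARY KEY, name TEXT)"
        )

    def get(self, employee_id: int) -> Optional[Employee]:
        row = self._conn.execute(
            "SELECT id, name FROM employee WHERE id = ?", (employee_id,)
        ).fetchone()
        return Employee(*row) if row else None

    def add(self, employee: Employee) -> None:
        self._conn.execute(
            "INSERT INTO employee (id, name) VALUES (?, ?)",
            (employee.id, employee.name),
        )

    def remove(self, employee_id: int) -> None:
        self._conn.execute(
            "DELETE FROM employee WHERE id = ?", (employee_id,)
        )


repo = EmployeeRepository(sqlite3.connect(":memory:"))
repo.add(Employee(1, "Ann"))
```

Note that there is no GetEmployeesByShopAndPosition() here; under this view, such a query would live in a separate query layer rather than on the repository.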

The second bit about retrieving an AR with the associated foreign keys: an AR should never contain a reference to another AR. Entities and Value Objects are fine. For associated ARs either use an ID or a Value Object to represent the foreign AR. The AR may consist of a complex structure itself but do not confuse ownership with containment. An OrderLine is contained in an Order. A Customer owns an Order. So Order will have an OrderLine collection but not a reference to a Customer object (rather an ID/VO).
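The containment-versus-ownership distinction can be sketched as follows. This is only an illustration: the class and field names (`CustomerId`, `product_code`, `unit_price`, and so on) are assumptions, not part of the answer.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass(frozen=True)
class CustomerId:
    # Value Object standing in for the Customer aggregate root.
    value: int


@dataclass
class OrderLine:
    # Entity contained in the Order aggregate.
    product_code: str
    quantity: int
    unit_price: Decimal


@dataclass
class Order:
    # Aggregate root: holds its lines, but only an ID for the Customer.
    id: int
    customer_id: CustomerId
    lines: List[OrderLine] = field(default_factory=list)

    def add_line(self, product_code: str, quantity: int, unit_price: Decimal) -> None:
        self.lines.append(OrderLine(product_code, quantity, unit_price))

    def total(self) -> Decimal:
        return sum(
            (line.quantity * line.unit_price for line in self.lines),
            Decimal("0"),
        )


order = Order(id=1, customer_id=CustomerId(42))
order.add_line("SKU-1", 2, Decimal("9.50"))
order.add_line("SKU-2", 1, Decimal("1.00"))
```

Loading this `Order` never requires loading a `Customer`; code that needs the customer would fetch it separately through its own repository using `customer_id`.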

The Order/OrderLine example illustrates why we wouldn't query the domain. When we want a list of orders between a given start and end date we are probably not interested in all the order data and certainly not in the order lines, so there is no sense in loading these aggregates. This is where, when querying the domain, nasty things like lazy-loading creep in. Lazy-loading should not exist IMHO :) A simple query layer would suffice here.
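A thin query layer for the date-range case might look roughly like this. Again this is a sketch under assumptions: `sqlite3` replaces the real database, the `orders` and `order_line` tables and the `orders_between` function are hypothetical, and plain dicts play the role that DataTable/DataRow play in C#.

```python
import sqlite3

# Hypothetical schema; an in-memory SQLite database keeps the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer_id INTEGER, placed_on TEXT
    );
    CREATE TABLE order_line (
        order_id INTEGER, product_code TEXT, quantity INTEGER
    );
    INSERT INTO orders VALUES
        (1, 42, '2024-01-05'), (2, 42, '2024-02-10'), (3, 7, '2024-03-01');
    INSERT INTO order_line VALUES (1, 'SKU-1', 2), (2, 'SKU-2', 1);
""")


def orders_between(start, end):
    """Return lightweight rows as dicts, not Order aggregates.

    Dates are ISO-8601 strings, which compare correctly as text.
    The order_line table is never touched.
    """
    rows = conn.execute(
        "SELECT id, customer_id, placed_on FROM orders "
        "WHERE placed_on BETWEEN ? AND ? ORDER BY placed_on",
        (start, end),
    ).fetchall()
    return [dict(row) for row in rows]
```

Because the function selects only the columns the caller needs and never builds aggregates, nothing here would trigger lazy loading.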

Hope that helps.

1
votes
  1. Use the power of the database whenever you can. That's what an ORM would try to do: push as much filtering, sorting, etc. as possible into the SQL query.

  2. I see little value in having both Repositories and DAOs. They both abstract out the persistent storage. If you want to go without an ORM, you'll typically handle the database query generation part in the concrete Repository implementation. But in a collaborative application, this also leaves other complex things to implement, such as change tracking, transaction management, etc.

  3. Full in the sense of one whole Aggregate, yes. But as good modelling practice, try not to store (and thus rehydrate) references to other Aggregate Roots in an Aggregate.

That's from a "classical" DDD perspective, but if you want to go the CQRS route (some will say it's the only sensible one ;), definitely follow @EbenRoux's advice.