0
votes

I have an application in CakePHP 3.5.13. I have baked a legacy database which has not been written according to Cake's naming conventions.

One part of the application has a table of 255,693 rows called substances. There are associated CAS numbers which have been put in a table called cas and a mapping between these 2 tables called cas_substances.

I am attempting to use CakePHP's ORM to write a query that searches for a given CAS.

I can't seem to get the query I want to execute written in the ORM, even though the MySQL equivalent for it is pretty simple. Let's say I was searching for all substance ID's which have a CAS that contained "1234" the query would look like this in MySQL:

SELECT DISTINCT( s.id ) FROM substances s 
JOIN cas AS cas 
ON ( (cas.value LIKE '%1234%') ) 
JOIN cas_substances AS cassub 
ON (s.id = cassub.substance_id AND cassub.cas_id = cas.id)

Running this directly on the database (through Navicat) gives me 63 rows in 0.39 seconds - expected.

So in attempting to write this in Cake, I have configured my Table classes as follows:

// src/Model/Table/CasTable.php
public function initialize(array $config)
{
    parent::initialize($config);

    $this->setTable('cas');
    $this->setDisplayField('value');
    $this->setPrimaryKey('id');

    $this->belongsToMany('Substances', [
        'foreignKey' => 'cas_id',
        'targetForeignKey' => 'substance_id',
        'joinTable' => 'cas_substances'
    ]);
}


// src/Model/Table/CasSubstancesTable.php
public function initialize(array $config)
{
    parent::initialize($config);

    $this->setTable('cas_substances');
    $this->setDisplayField('id');
    $this->setPrimaryKey('id');

    $this->belongsTo('Cas', [
        'foreignKey' => 'cas_id',
        'joinType' => 'INNER'
    ]);
    $this->belongsTo('Substances', [
        'foreignKey' => 'substance_id',
        'joinType' => 'INNER'
    ]);
}


// src/Model/Table/SubstancesTable.php
public function initialize(array $config)
{
    parent::initialize($config);

    $this->setTable('substances');
    $this->setDisplayField('name');
    $this->setPrimaryKey('id');

    $this->belongsToMany('Cas', [
        'foreignKey' => 'substance_id',
        'targetForeignKey' => 'cas_id',
        'joinTable' => 'cas_substances'
    ]);
    // ...
 }

Then in the Controller I am attempting to get the distinct (MySQL equivalent DISTINCT()) substances.id:

// Begin the query
$query = $Substances->find()->select(['id' => 'id'])->distinct();

Then modify the query to filter for my CAS:

$query = $query->contain('Cas', function ($q) {
    return $q->where(['Cas.value' => '%'.$this->request->getData('cas_number').'%']);
});

When I attempt to output the results using debug($query->all()) it is giving me a PHP fatal error:

Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes)

On closer inspection, it appears that my condition to filter the query based on CAS is not being applied. If I do debug($query->all()->count()) it is giving me 255,693 - the whole substances table without any filtering.

There are a few problems I have:

  1. How do I write this query to filter the associated data? My work here is based on the Passing Conditions to Contain part of the documentation.

  2. I'm concerned about how much data is being returned. If I run the MySQL equivalent of that query, it is giving me back just substances.id which is what I want. Cake is producing large objects - I know this is because of how the ORM works - but surely there are memory implications here? I need to write the results of my query to another table. How is using the ORM better (or easier) than just writing vanilla SQL then doing CREATE TABLE tmp_table AS . $sql_select_string (where $sql_select_string is the SELECT statement given earlier)?

1
why do you put the conditions in the on clause instead of putting in the where clause? Did you run your query directly in the DB? How much time does it takes?arilia
I'm not sure what you mean. I'm trying to follow the documentation. The first example under the linked section mentions "filter them by conditions" so I figured that would be an appropriate thing to try? The original query (in plain MySQL) runs in 0.39 seconds and returns 63 rows. The Cake one doesn't run at all because PHP runs in to a Fatal Error due to it using too much memory.Andy
I mean that I have never seen putting the LIKE alone inside the ON clause. I'm not sure but seems that you need to use matching method in your situation where - from what I understand - you want all substances that have al least one CAS number like %1234%arilia
Yes, your assumption about the data I want is correct. I'm unclear why you're referring to an ON clause or where this appears in my code? I have $q->where which I understood to be equivalent to a WHERE SQL condition? If you could post an answer which shows the basics of doing a search on a related table, that would give me a great start. At the moment I can't even get the query to execute due to the memory issue, and possibly the fact it's not even considering my CAS filter condition.Andy
@arilia Basically all I'm trying to do is write the equivalent of my "vanilla" MySQL query in Cake's ORM syntax, and get it to return the data in the same format - just a list of substance ID's.Andy

1 Answers

1
votes

why your code runs out of memory

when you use contain this tells cake to retrieve all the records and their related ones

In other words your code will get 255,693 rows of substances, and for every row its Cas numbers, but only the ones matching the LIKE

Instead you want to retrieve all and only the records that have matching Cas number

one possible solution

Seems you need matching method here

$cas_number = $this->request->getData('cas_number');
$query = $Substances->find()
    ->select(['id' => 'Substances.id'])
    ->distinct()
    ->matching('Cas', function ($q) use ($cas_number) {
        return $q->where([
             'Cas.value LIKE' => '%'.$cas_number.'%'
        ]);
     });

in this way cake joins the three tables and performs the search

Usually this query gives duplicate records and you have to group by to filter them. In this case you are using DISTINCT that will do the job

this will give you a query that sounds like

SELECT DISTINCT Substances.id AS `id` 
FROM substances Substances
INNER JOIN cas_substances CasSubstances 
ON Substances.id = CasSubstances.substance_id 
INNER JOIN cas Cas 
ON (
    Cas.value like %1234% 
    AND Cas.id = CasSubstances.cas_id
)

see the manual here

A simpler solution

since you need just the ids you can simply do

$Substances->Cas->find()
    ->where([
         'Cas.value LIKE' => '%'.$cas_number.'%'
    ])
    ->contain(['CasSubstances'])
    ->select(['id' => 'CasSubstances.substance_id'])
    ->distinct();

this will save you one join