0
votes

I have the most typical problem. My new company has an age old oracle database. The tables are defined with primary keys only (no foreign keys or unique keys or ..). Now, the creator of the schema was retired. It is now getting difficult to reverse engineer to know the connection between tables.

To add more to this problem, we identified an sql join between two tables (A,B) columns which have different names. It was found the the column of table A is a foreign key of table B column. These column names are quite different.

My question is, is there a way to generate ERD based on the content of the columns?

For e.g. I would like to search the content across database to see if there are tables with columns having the similar content and suggest that it could be a foreign key to the table. I know this is really bad situation but just to give a try.

There are more than 40 tables as such the relations has to be identified. Doing a manual reverse engineering is time taking and need more efforts.

Just want to know a way if we can identify the relationship between tables in this kind of database efficiently.

The only good part is that the primary keys are not just integers. They are randomly generated identifiers with length above 8 characters. These values are used in other tables for connection

1
SQL Developer Data Modeler can infer relationships based on column names, but I know of no such tool that would interrogate the data itself looking for patterns. You ONLY have 40 tables. Imagine having 400-4000 and consider yourself lucky. - thatjeffsmith
For only 40 tables I'm guessing the best option will be to do it manually table by table. Searching for (or building) an automated solution would probably cost more time. - Geert Bellekens

1 Answers

0
votes

I like to do this kind of thing in 3 steps. First, we build a list of all the tables/columns we want to work with. From your requirements, this is every pair of columns (on different tables) in your schema which are VARCHAR2 with more than 8 characters.

select tc1.table_name t1, tc1.column_name c1,
    tc2.table_name t2, tc2.column_name c2
from user_tab_cols tc1
join user_tab_cols tc2
    on tc2.table_name > tc1.table_name -- only pick 1 of each twin match
where tc1.data_type = 'VARCHAR2' and tc1.data_length > 8
  and tc2.data_type = 'VARCHAR2' and tc2.data_length > 8

Then use that to generate some dynamic SQL

with test_cols as (
  select tc1.table_name t1, tc1.column_name c1,
      tc2.table_name t2, tc2.column_name c2
  from user_tab_cols tc1
  join user_tab_cols tc2
      on tc2.table_name > tc1.table_name -- only pick 1 of each twin match
  where tc1.data_type = 'VARCHAR2' and tc1.data_length > 8
    and tc2.data_type = 'VARCHAR2' and tc2.data_length > 8)
select 'select ''' || t1 || '.' || c1 || ' = ' || t2 || '.' || c2 || '''
    as candidate
from ' || t1 || '
join ' || t2 || ' on ' || t1 || '.' || c1 || ' = ' || t2 || '.' || c2 || '
group by 1
having count(1) > 0;'
from test_cols;

Run that, and copy/paste the output in a new SQL script:

SET HEADING OFF
SET FEEDBACK OFF
--copy/paste your result from step 2 here:
select 'MYTABLE1.COL1 = MYTABLE2.COL1'
    as candidate
from MYTABLE1
join MYTABLE2 on MYTABLE1.COL1 = MYTABLE2.COL1
group by 1
having count(1) > 0;
... a bunch more select statements

And run the whole thing as a script. Output is all the columns which had at least 1 matching value between them -- good candidates to look at for foreign keys.

MYTABLE1.COL3 = MYTABLE2.COL5
MYTABLE1.COL3 = MYTABLE2.COL6
etc