
I am a hobbyist Xojo user. I want to import a Gedcom file into my program, specifically into a SQLite database.

Structure of the Database

Tables

Persons

 - ID: Integer
 - Gender: Varchar // M, F or U
 - Surname: Varchar
 - Givenname: Varchar

Relationships

 - ID: Integer
 - Husband: Integer
 - Wife: Integer

Children

 - ID: Integer
 - PersonID: Integer
 - FamilyID: Integer
 - Order: Integer

PersonEvents

 - ID: Integer
 - PersonID: Integer
 - EventType: Varchar // e.g. BIRT, DEAT, BURI, CHR
 - Date: Varchar
 - Description: Varchar
 - Order: Integer

RelationshipEvents

 - ID: Integer
 - RelationshipID: Integer
 - EventType: Varchar // e.g. MARR, DIV, DIVF
 - Date: Varchar
 - Description: Varchar
 - Order: Integer
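
As a rough illustration, the table structure above could be created as follows. This is only a sketch, written in Python with the standard sqlite3 module rather than Xojo so that it is self-contained; the SQL statements themselves are not Python-specific. The primary keys and REFERENCES clauses are assumptions that the listing above does not state, and Description in RelationshipEvents is assumed to be text, as in PersonEvents. Note that Order is an SQL keyword, so it has to be quoted when used as a column name.

```python
import sqlite3

# Assumption: ID columns are primary keys and the *ID columns are foreign keys.
# "Order" is an SQL keyword and must be quoted to serve as a column name.
SCHEMA = """
CREATE TABLE Persons (
    ID        INTEGER PRIMARY KEY,
    Gender    VARCHAR,            -- M, F or U
    Surname   VARCHAR,
    Givenname VARCHAR
);
CREATE TABLE Relationships (
    ID      INTEGER PRIMARY KEY,
    Husband INTEGER REFERENCES Persons(ID),
    Wife    INTEGER REFERENCES Persons(ID)
);
CREATE TABLE Children (
    ID       INTEGER PRIMARY KEY,
    PersonID INTEGER REFERENCES Persons(ID),
    FamilyID INTEGER REFERENCES Relationships(ID),
    "Order"  INTEGER
);
CREATE TABLE PersonEvents (
    ID          INTEGER PRIMARY KEY,
    PersonID    INTEGER REFERENCES Persons(ID),
    EventType   VARCHAR,          -- e.g. BIRT, DEAT, BURI, CHR
    Date        VARCHAR,
    Description VARCHAR,
    "Order"     INTEGER
);
CREATE TABLE RelationshipEvents (
    ID             INTEGER PRIMARY KEY,
    RelationshipID INTEGER REFERENCES Relationships(ID),
    EventType      VARCHAR,       -- e.g. MARR, DIV, DIVF
    Date           VARCHAR,
    Description    VARCHAR,       -- assumed to be text, as in PersonEvents
    "Order"        INTEGER
);
"""

def create_schema(conn):
    """Create all five tables on an open SQLite connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
```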

I wrote a working Gedcom line parser. It splits a single Gedcom line into:

 - Level As Integer
 - Reference As String // optional
 - Tag As String
 - Value As String // optional
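
For readers without such a parser, the four fields can be obtained with plain string splitting. The following is a minimal sketch in Python (not Xojo, and not the code mentioned above); the names GedcomLine and parse_line are made up for this example. It assumes that fields are separated by single spaces and that a reference always starts with "@". A malformed line raises a ValueError.

```python
from dataclasses import dataclass

@dataclass
class GedcomLine:
    level: int
    reference: str  # "" when the line carries no @...@ reference
    tag: str
    value: str      # "" when the line carries no value

def parse_line(line: str) -> GedcomLine:
    # Split off the level number; the remainder is handled step by step.
    level_text, rest = line.strip().split(" ", 1)
    reference = ""
    if rest.startswith("@"):
        # Record lines such as "0 @I1@ INDI" put the reference before the tag.
        reference, rest = rest.split(" ", 1)
    # Everything after the tag (if anything) is the value.
    tag, _, value = rest.partition(" ")
    return GedcomLine(int(level_text), reference, tag, value)
```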

I load the Gedcom file via TextInputStream (working fine). Now I need to parse every line.

Gedcom individual sample

0 @I1@ INDI
1 NAME George /Clooney/
2 GIVN George
2 SURN Clooney
1 BIRT
2 DATE 6 MAY 1961
2 PLAC Lexington, Fayette County, Kentucky, USA

As you can see, the level numbers describe a tree structure. So I thought the best and simplest way would be to parse the file into separate objects (PersonObj, RelationshipObj, EventObj, etc.) inside a JSONItem, because it is easy to get the children of a node there. Later on, I can simply walk the nodes and child nodes to create the database entries. But I don't know how to write such an algorithm.
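
One common way to turn level numbers into a tree is to keep a stack of the most recently seen node per level: a line at level n becomes a child of the latest node at level n-1. The sketch below illustrates this in Python rather than Xojo, using nested dictionaries as a stand-in for a JSONItem. The function name build_tree and the keys ref, tag, value and children are invented for this example, and lines that lack a tag are not handled.

```python
def build_tree(lines):
    """Nest Gedcom lines by level number; returns an artificial root node."""
    root = {"ref": "", "tag": "ROOT", "value": "", "children": []}
    stack = [root]  # stack[n + 1] holds the latest node seen at level n
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        parts = raw.split(" ", 2)
        level = int(parts[0])
        if len(parts) > 2 and parts[1].startswith("@"):
            # Record line such as "0 @I1@ INDI": reference comes before the tag
            ref = parts[1]
            tag, _, value = parts[2].partition(" ")
        else:
            ref = ""
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
        node = {"ref": ref, "tag": tag, "value": value, "children": []}
        del stack[level + 1:]               # forget nodes at this level or deeper
        stack[-1]["children"].append(node)  # parent is the latest shallower node
        stack.append(node)
    return root

sample = """0 @I1@ INDI
1 NAME George /Clooney/
2 GIVN George
2 SURN Clooney
1 BIRT
2 DATE 6 MAY 1961
2 PLAC Lexington, Fayette County, Kentucky, USA""".splitlines()

tree = build_tree(sample)
person = tree["children"][0]  # the INDI record with NAME and BIRT as children
```

Each line is visited exactly once, so the work grows linearly with the number of lines. Afterwards, the database entries can be created by walking the children of each level-0 node.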

Can anyone help me, please?

As this question is rather complex and individual and probably requires some back-and-forth discussion, I think this is better asked in the Xojo forum. - Thomas Tempelmann
Hi Thomas Tempelmann ;) I have done this many times before, but it looks like no one is really interested in this area :/ That's why I'm asking here. Maybe you can give me some more input? - Genealogy
I don't even understand where your difficulties are. On one hand, you can describe the algorithm, yet you say you cannot code it. There are so many things you may or may not know, and it's a bit too much to ask that I assume you don't know anything and provide a complete solution here, spending maybe half an hour on it all. - Thomas Tempelmann
OK, I spent the whole night writing and describing a structure and a simple parser. It's working fine, but I think that if there is a Gedcom file with 10000 or more lines, the app will freeze. I'll keep you informed! - Genealogy
Yes, if you have something working and need ideas to optimize or fix it, this is where people are more willing to help as they can see what you've done exactly. - Thomas Tempelmann

1 Answer


To parse the Gedcom lines at a good speed, try these ideas:

Read the entire file into a String and split the lines up:

dim f as FolderItem = ...
dim fileContent as String = TextInputStream.Open(f).ReadAll
fileContent = fileContent.DefineEncoding (Encodings.WindowsLatin1)
dim lines() as String = ReplaceLineEndings(fileContent,EndOfLine).Split(EndOfLine)

Parse every line using RegEx to extract its 3 columns:

dim re as new RegEx
re.SearchPattern = "^(\d+) ([^ ]+)(.*)$"
for each line as String in lines
  dim rm as RegExMatch = re.Search(line)
  if rm = nil then
    // No match: the line is blank or malformed. Is this expected?
    break // stops in the debugger during debug runs; it does not exit the loop
    continue // -> onward with next line
  end if
  dim level as Integer = rm.SubExpressionString(1).Val
  dim code as String = rm.SubExpressionString(2)
  dim value as String = rm.SubExpressionString(3).Trim
  // ... process the level, code and value here
next

The RegEx search pattern means that it looks for the start of the line ("^"), then for one or more digits ("\d+"), a blank, one or more non-blank chars ("[^ ]+"), and finally any remaining chars (".*", which may be none) before the end of the string ("$"). The parentheses around each of these groups are for extracting their results with SubExpressionString() afterwards.

The check for rm = nil hits whenever the line does not contain at least a number, a blank and at least one more character. If the Gedcom file is malformed or has blank lines, this may be the case.
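
Note that with the three-column pattern, a record line such as "0 @I1@ INDI" yields "@I1@" as the code and "INDI" as the value. If the reference should come out as a separate column, an optional group can be added to the pattern. The following sketch shows one possible variant, written in Python rather than Xojo so that it can be run standalone; the pattern and the name split_line are suggestions for illustration, not part of the code above, and the pattern assumes that a reference contains no blanks.

```python
import re

# level, optional @...@ reference, tag, optional value
LINE_RE = re.compile(r"^(\d+) (?:(@[^@ ]+@) )?([^ ]+)(?: (.*))?$")

def split_line(line):
    """Return (level, reference, tag, value), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # blank or malformed line
    level, reference, tag, value = m.groups()
    # Groups that did not take part in the match are None; use "" instead.
    return int(level), reference or "", tag, value or ""
```

Pointer values such as the one in "1 HUSB @I1@" still end up in the value column, because the optional group is only tried directly after the level number.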

Hope this helps.