Custom File Parser

Question

I am building a parser for a custom pipe-delimited file format, and my code is becoming very bulky. Could someone suggest a better way to parse this data?

Each line of the file is one record whose columns are delimited by a pipe (|). Each line starts with a record type, followed by an ID, followed by a varying number of columns.

Ex:

    CDI|11111|OTHERDATA|somemore|other
    CEX001|123131|DATA|data

I am splitting each line by pipe, grabbing the first two columns, and then using a switch on the first column (the record type) to call a function that parses the remaining columns into an object purpose-built for that record type. I would really like a more elegant method.

    public Dictionary<string, DataRecord> Parse()
    { 
        var data = new Dictionary<string, DataRecord>();

        var rawDataDict = new Dictionary<string, List<List<string>>>();
        foreach (var line in File.ReadLines(_path))
        {
            var split = line.Split('|');
            var Id = split[1];
            if (!rawDataDict.ContainsKey(Id))
            {
                rawDataDict.Add(Id, new List<List<string>> {split.ToList()});
            }
            else
            {
                rawDataDict[Id].Add(split.ToList());
            }
        }

        rawDataDict.ToList().ForEach(pair =>
        {
            var key = pair.Key.ToString();
            var values = pair.Value;

            foreach (var value in values)
            {

                var recordType = value[0];

                switch (recordType)
                {
                    case "CDI":
                        var cdiRecord = ParseCdi(value);
                        if (!data.ContainsKey(key))
                        {
                            data.Add(key, new DataRecord
                            {
                                Id = key, CdiRecords = new List<CdiRecord>() {  cdiRecord }
                            });
                        }
                        else
                        {
                            data[key].CdiRecords.Add(cdiRecord);
                        }
                        break;
                    case "CEX015":
                        var cexRecord = ParseCex(value);
                        if (!data.ContainsKey(key))
                        {
                            data.Add(key, new DataRecord
                            {
                                Id = key,
                                CexRecords = new List<Cex015Record>() { cexRecord }
                            });
                        }
                        else
                        {
                            data[key].CexRecords.Add(cexRecord);
                        }
                        break;
                    case "CPH":
                        CphRecord cphRecord = ParseCph(value);
                        if (!data.ContainsKey(key))
                        {
                            data.Add(key, new DataRecord
                            {
                                Id = key,
                                CphRecords = new List<CphRecord>() { cphRecord }
                            });
                        }
                        else
                        {
                            data[key].CphRecords.Add(cphRecord);
                        }
                        break;
                }
            }
        });

        return data;
    }
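A minimal sketch of the same split-and-dispatch idea with the switch replaced by a lookup table from record type to handler. CdiRecord, CexRecord, their fields, the DataRecord shape and PipeParser are hypothetical stand-ins for classes the question does not show; the sketch assumes DataRecord creates its lists up front, which removes the per-case "does this key exist yet" branching:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shapes; the real CdiRecord/Cex015Record/CphRecord classes are not shown.
public class CdiRecord { public string OtherData = ""; }
public class CexRecord { public string Data = ""; }

public class DataRecord
{
    public string Id = "";
    // Lists are created up front, so any record type can be appended without null checks.
    public List<CdiRecord> CdiRecords = new List<CdiRecord>();
    public List<CexRecord> CexRecords = new List<CexRecord>();
}

public static class PipeParser
{
    // Record type -> handler that parses the type-specific columns and attaches the result.
    // f[0] is the record type, f[1] the ID, f[2..] the remaining columns.
    static readonly Dictionary<string, Action<DataRecord, string[]>> Handlers =
        new Dictionary<string, Action<DataRecord, string[]>>
        {
            ["CDI"] = (rec, f) => rec.CdiRecords.Add(new CdiRecord { OtherData = f[2] }),
            ["CEX001"] = (rec, f) => rec.CexRecords.Add(new CexRecord { Data = f[2] }),
        };

    public static Dictionary<string, DataRecord> Parse(IEnumerable<string> lines)
    {
        var data = new Dictionary<string, DataRecord>();
        foreach (var line in lines)
        {
            var f = line.Split('|');
            // Skip short lines and unknown record types (throwing is the other obvious choice).
            if (f.Length < 3 || !Handlers.TryGetValue(f[0], out var handle))
                continue;

            // One DataRecord per ID, created the first time that ID is seen.
            if (!data.TryGetValue(f[1], out var record))
            {
                record = new DataRecord { Id = f[1] };
                data.Add(f[1], record);
            }
            handle(record, f);
        }
        return data;
    }
}
```

Called as `PipeParser.Parse(File.ReadLines(path))`, it makes one pass over the file instead of grouping first, and supporting a new record type means adding one dictionary entry rather than another switch case.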

You could use the FileHelpers CSV parser, set the delimiter to |, and then just read each line and handle it with a switch statement on the record type, or whatever suits your fancy — BugFinder
FileHelpers is made for a file where every line has the same format, isn't it? I have multiple record types in a single file. — Adam Reed
That's even more bloated than what I have :(. Maybe I've found the best solution :(. — Adam Reed

Kevin Smith · Accepted Answer · 2016-12-08T14:42:00

Try out FileHelpers; here is an example close to yours: http://www.filehelpers.net/example/QuickStart/ReadFileDelimited/

Given data such as

    CDI|11111|OTHERDATA|Datas
    CEX001|123131|DATA
    CCC|123131

You could create a class to model this to allow FileHelpers to parse the delimited file:

    using FileHelpers;

    [DelimitedRecord("|")]
    public class Record
    {
        public string Type { get; set; }

        public string[] Fields { get; set; }
    }

Then let FileHelpers parse the file into this type:

    var engine = new FileHelperEngine<Record>();
    var records = engine.ReadFile("Input.txt");
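If you would rather not take a dependency for this step, `string.Split` gives the same Type/Fields shape as the Record class. A minimal sketch; RawRecord and RawReader.LoadRecords are hypothetical names, and it assumes every non-blank line has at least a type column:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Record class: first column is the type, the rest are fields.
public sealed class RawRecord
{
    public string Type { get; }
    public string[] Fields { get; }

    public RawRecord(string type, string[] fields)
    {
        Type = type;
        Fields = fields;
    }
}

public static class RawReader
{
    // Pass File.ReadLines(path) for a file on disk, or any other sequence of lines.
    public static RawRecord[] LoadRecords(IEnumerable<string> lines) =>
        lines.Where(line => line.Length > 0)               // ignore blank lines
             .Select(line => line.Split('|'))
             .Select(parts => new RawRecord(parts[0], parts.Skip(1).ToArray()))
             .ToArray();
}
```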

After all the records are loaded into Record objects, a bit of LINQ maps them to their specific types:

    using System.Linq;

    var cdis = records.Where(x => x.Type == "CDI")
                      .Select(x => new Cdi(x.Fields[0], x.Fields[1], x.Fields[2]))
                      .ToArray();

    var cexs = records.Where(x => x.Type == "CEX001")
                      .Select(x => new Cex(x.Fields[0], x.Fields[1]))
                      .ToArray();

    var cccs = records.Where(x => x.Type == "CCC")
                      .Select(x => new Ccc(x.Fields[0]))
                      .ToArray();
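Each `Where` in the snippet above scans the whole record array once per type. Grouping once with `Enumerable.ToLookup` makes a single pass and lets you pull each type out by key; an `ILookup` returns an empty sequence for a missing key instead of throwing. A sketch, where Rec stands in for the Record class and the Cdi/Ccc constructors and RecordGrouper are hypothetical:

```csharp
using System;
using System.Linq;

// Stand-in for the Record class (type plus the remaining columns).
public class Rec
{
    public string Type = "";
    public string[] Fields = Array.Empty<string>();
}

// Hypothetical typed records built from positional fields.
public class Cdi
{
    public string Id, Data1, Data2;
    public Cdi(string id, string data1, string data2) { Id = id; Data1 = data1; Data2 = data2; }
}

public class Ccc
{
    public string Id;
    public Ccc(string id) { Id = id; }
}

public static class RecordGrouper
{
    public static (Cdi[] cdis, Ccc[] cccs) SplitByType(Rec[] records)
    {
        // One pass over the input builds a Type -> records index.
        var byType = records.ToLookup(r => r.Type);

        var cdis = byType["CDI"].Select(r => new Cdi(r.Fields[0], r.Fields[1], r.Fields[2])).ToArray();
        var cccs = byType["CCC"].Select(r => new Ccc(r.Fields[0])).ToArray();
        return (cdis, cccs);
    }
}
```

Further types (CEX001 and so on) are added the same way, one `byType["..."]` line each.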

You could also simplify the above using something like AutoMapper - http://automapper.org/

Alternatively, you could use ConditionalRecord attributes, which make an engine parse only the lines that match a given criterion. This will be slower the more record types you have, because each engine reads the whole input, but your code will be cleaner and FileHelpers will do most of the heavy lifting:

    using FileHelpers;

    [DelimitedRecord("|")]
    [ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CDI")]
    public class Cdi
    {
        public string Type { get; set; }

        public int Number { get; set; }

        public string Data1 { get; set; }

        public string Data2 { get; set; }

        public string Data3 { get; set; }
    }

    [DelimitedRecord("|")]
    [ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CEX001")]
    public class Cex001
    {
        public string Type { get; set; }

        public int Number { get; set; }

        public string Data1 { get; set; }
    }

    [DelimitedRecord("|")]
    [ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CCC")]
    public class Ccc
    {
        public string Type { get; set; }

        public int Number { get; set; }
    }


    var input = @"CDI|11111|Data1|Data2|Data3
    CEX001|123131|Data1
    CCC|123131";

    var cdiEngine = new FileHelperEngine<Cdi>();
    var cdis = cdiEngine.ReadString(input);

    var cexEngine = new FileHelperEngine<Cex001>();
    var cexs = cexEngine.ReadString(input);

    var cccEngine = new FileHelperEngine<Ccc>();
    var cccs = cccEngine.ReadString(input);
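Each `FileHelperEngine<T>` in the snippet above is handed the whole input, so every extra record type costs another full pass over the lines; that is the slowdown mentioned earlier. The same include-if-match rule can be applied in a single pass by testing each line against an ordered list of patterns and stopping at the first match. A minimal sketch using only `System.Text.RegularExpressions`, not FileHelpers; RegexRouter and the kind-to-fields dictionary it returns are hypothetical. Anchoring each pattern on the delimiter (`^CDI\|` rather than `^CDI`) also keeps a longer type name such as a hypothetical CDIX from being picked up by the CDI rule:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexRouter
{
    // Ordered (pattern, kind) rules, one per record class, like one ConditionalRecord attribute each.
    // Anchoring on the delimiter stops ^CDI from also matching a longer type such as CDIX.
    static readonly (Regex Pattern, string Kind)[] Routes =
    {
        (new Regex(@"^CDI\|"), "CDI"),
        (new Regex(@"^CEX001\|"), "CEX001"),
        (new Regex(@"^CCC\|"), "CCC"),
    };

    // Returns kind -> field arrays (type column removed), in input order.
    // Lines that match no rule are dropped, as ConditionalRecord filtering would drop them.
    public static Dictionary<string, List<string[]>> Route(string input)
    {
        var result = new Dictionary<string, List<string[]>>();
        var lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            foreach (var (pattern, kind) in Routes)
            {
                if (!pattern.IsMatch(line))
                    continue;
                if (!result.TryGetValue(kind, out var list))
                {
                    list = new List<string[]>();
                    result.Add(kind, list);
                }
                list.Add(line.Split('|').Skip(1).ToArray());
                break; // first matching rule wins, so each line is routed exactly once
            }
        }
        return result;
    }
}
```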
