0
votes

I have a String that I want to split on whitespace and store in a dictionary of words (simple enough). However, I also want each word's index and length.

So far, I just have a Dictionary of the words, keyed by the order in which they were found:

    private Dictionary<int, String> makeDictionary(String ASCII)
    {
        string[] t = ASCII.Split(new[] { ' ' },
            StringSplitOptions.RemoveEmptyEntries);
        Dictionary<int, string> aDictionary = new Dictionary<int, string>();
        for (int i = 0; i < t.Length; i++)
        {
            t[i] = stripSymbolsFromString(t[i]);

            if (!aDictionary.ContainsValue(t[i]) && t[i] != "")
            {
                aDictionary.Add(i, t[i]);
            }
        }
        return aDictionary;
    }

Does anyone have any idea how I can use .Split() while keeping the indexes, or will I have to use a different technique? As someone posted below, using Regex will give the index of each match.

EDIT: I do not need the length. As someone pointed out, I can just get it from the string. I will just need the starting index of the word.

EDIT2: I will ignore duplicate words.

EDIT3: Here is an example of a string I would be using:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

So the first couple elements would be

[0]=>Lorem,

[6]=>Ipsum,

[12]=>is

where the numbers 0, 6, and 12 are the original indexes of the words within the String.

2
Are you talking about retaining the ordinal position when the strings are being split? Also, can you paste an example of the string you are trying to split? - MethodMan
Do you mean index as in position in the original string? Or position within the split array? Also, your dictionary seems to be backwards. It ought to use string as the key, which avoids needing to use .ContainsValue in every iteration. - Matt Burland
I agree with Matt: the dictionary should be Dictionary<string, int> aDictionary = new Dictionary<string, int>(); - MethodMan
How do you expect to search or retrieve these? Technically, the split will keep the information, based on what I am reading. array.Length - item[i] gives its relation, similar to how .Last() would essentially be array.Length - 1... - Austin T French
How do you expect to store index and length in List<int>? And length is just a property of String. Is index the index of the word or the character position? - paparazzo

2 Answers

2
votes
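using System.Linq;
using System.Text.RegularExpressions;

string s = "abc def ghijkl mno abc";

// Each match is one run of non-space characters; m.Index is its offset in s.
var words = Regex.Matches(s, @"[^ ]+").Cast<Match>()
                .Select(m => new
                {
                    Str = m.Value,   // the length is available as m.Length if needed
                    Offset = m.Index
                })
                .ToList();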

You can further process the words to form a dictionary that maps each word to the list of offsets where it occurs:

var dict = words.GroupBy(w => w.Str)
                .ToDictionary(g => g.Key, g => g.Select(x => x.Offset).ToList());
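If you want the shape from the question instead (start index as the key, duplicates ignored), something along these lines might work. This is only a sketch: the names WordIndexer and IndexWords are made up for illustration, it splits on any whitespace with \S+ rather than only on spaces, and it does not apply the asker's stripSymbolsFromString, so punctuation such as the period in "industry." stays attached to the word.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class WordIndexer
{
    // Maps each word's starting character offset to the word itself.
    // Only the first occurrence of a word is kept; later duplicates are skipped.
    public static Dictionary<int, string> IndexWords(string text)
    {
        var byOffset = new Dictionary<int, string>();
        var seen = new HashSet<string>();

        // \S+ matches each run of non-whitespace characters.
        foreach (Match m in Regex.Matches(text, @"\S+"))
        {
            // HashSet.Add returns false when the word was already seen.
            if (seen.Add(m.Value))
            {
                byOffset.Add(m.Index, m.Value);
            }
        }
        return byOffset;
    }
}
```

Calling WordIndexer.IndexWords("Lorem Ipsum is simply Lorem") should give four entries: 0 => Lorem, 6 => Ipsum, 12 => is, 15 => simply. The second "Lorem" is dropped. Tracking seen words in a HashSet avoids the ContainsValue scan on every iteration that was mentioned in the comments.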
0
votes

Regex

Match Class

Each Match has Index and Length properties.
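As a rough illustration of reading those two properties, here is a small sketch. MatchSpans and Find are hypothetical names, and the \S+ pattern is my assumption about what counts as a word.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper showing Match.Index and Match.Length.
public static class MatchSpans
{
    // Returns the (start offset, length) of every whitespace-delimited word.
    public static List<(int Index, int Length)> Find(string text)
    {
        var spans = new List<(int Index, int Length)>();
        foreach (Match m in Regex.Matches(text, @"\S+"))
        {
            spans.Add((m.Index, m.Length));
        }
        return spans;
    }
}
```

For "abc def ghijkl" this should return (0, 3), (4, 3), (8, 6), and text.Substring(span.Index, span.Length) recovers each word from the original string.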