2
votes

I am looking for the most efficient way to grab a set number of words (in order) from a string.

So let's say I have a paragraph of text:

"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

I want to be able to grab a variable number of consecutive words starting at a random position in the paragraph. So if 5 words were wanted, some example outputs could be:

  • "release of Letraset sheets containing"
  • "Lorem Ipsum is simply dummy"
  • "only five centuries, but also"

What would be the best way of going about doing this?

5 Answers

5
votes

Split the data on spaces to get a list of words, pick a random starting position (leaving at least 5 words before the end), take the words from there, and then join them back together.

using System;
using System.Linq;

private static readonly Random random = new Random();
public static void Main(string[] args)
{
    var data =
        "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
    Console.WriteLine(GetRandomWords(data, 5));
    Console.ReadLine();
}

private static string GetRandomWords(string data, int x)
{
    // Split data into words.
    var words = data.Split(' ');
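    // Find a random place to start, leaving at least x words after it.
    // Random.Next's upper bound is exclusive, so add 1 to allow the final window.
    // Assumes data contains at least x words.
    var start = random.Next(0, words.Length - x + 1);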
    // Select the words.
    var selectedWords = words.Skip(start).Take(x);
    return string.Join(" ", selectedWords);
}

Example output:

the 1960s with the release
PageMaker including versions of Lorem
since the 1500s, when an
leap into electronic typesetting, remaining
typesetting, remaining essentially unchanged. It
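
If the input might contain double spaces or line breaks, or fewer words than requested, the same idea can be hardened a little. The following is only a minimal sketch under those assumptions; the class name WordWindow and the choice to return all words when the text is too short are illustrative choices, not part of the answer above.

```csharp
using System;

public static class WordWindow
{
    private static readonly Random Rng = new Random();

    // Hypothetical helper: returns `count` consecutive words starting at a
    // random position, or every word if the text has `count` words or fewer.
    public static string GetRandomWords(string text, int count)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (count <= 0) return string.Empty;

        // A null separator splits on any whitespace; RemoveEmptyEntries drops
        // the empty "words" that repeated spaces would otherwise produce.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length <= count) return string.Join(" ", words);

        // Random.Next's upper bound is exclusive, hence the + 1:
        // valid start indexes are 0 .. words.Length - count.
        int start = Rng.Next(0, words.Length - count + 1);
        return string.Join(" ", words, start, count);
    }
}
```

Calling WordWindow.GetRandomWords(data, 5) then behaves like the method above, except that it can also return the last five words and does not throw on short input.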
1
votes

For a sequence of consecutive words, I would do this:

  1. Split the text into an array of words with Split(' ')
  2. Generate a random start index between 0 and the array length minus 5 using Random
  3. Join the 5 words starting at that index back into a sentence, separated by spaces.

VB version + testing result

(This might be what you are more interested in)

Imports System
Imports System.Text

Public Module Module1
    Public Sub Main()
        Randomize() ' Seed Rnd() from the system timer; without this every run yields the same sequence.
        Dim str As String = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
        Console.WriteLine(GrabRandSequence(str))
        Console.WriteLine(GrabRandSequence(str))
        Console.WriteLine(GrabRandSequence(str))
        Console.ReadKey()
    End Sub

    Public Function GrabRandSequence(inputstr As String) As String
        Try
            Dim words As String() = inputstr.Split(New Char() {" "c})
            ' Rnd() returns a value in [0, 1), so scale by the number of valid start positions.
            Dim index As Integer = CInt(Math.Floor((words.Length - 5 + 1) * Rnd()))
            Return [String].Join(" ", words, index, 5)

        Catch e As Exception
            Return e.ToString()
        End Try
    End Function    
End Module

C# version

string[] words = input.Split(' '); // Step 1.
int val = (new Random()).Next(0, words.Length - 5 + 1); // Step 2. The upper bound is exclusive, hence the + 1.
string result = string.Join(" ", words, val, 5); // Step 3. Improved by Enigmativy's suggestion.

Additional try

For a random selection of words (not consecutive), I would do this:

  1. Strip out unnecessary characters (periods, commas, etc.)
  2. Put the words in a List using Split(' ') and LINQ
  3. Keep only the distinct words using LINQ (optional, to avoid results like Lorem Lorem Lorem Lorem Lorem)
  4. Generate 5 distinct random indexes from 0 to the size of the List using Random (pick again whenever an index repeats)
  5. Pick the words at those indexes from the List
  6. Join them into a sentence, separated by spaces.

Warning: the sentence may not make any sense at all!!


C# version (only)

string input = "the input sentence, blabla";
input = input.Replace(",","").Replace(".",""); // Step 1. Add as many Replace calls as you need.
List<string> words = input.Split(' ').Distinct().ToList(); // Steps 2 and 3. Requires using System.Linq.
Random rand = new Random(); 
List<int> vals = new List<int>();

int count = Math.Min(5, words.Count); // With fewer than 5 distinct words, asking for 5 would loop forever.

while (vals.Count < count) { // Step 4.
    int val = rand.Next(0, words.Count);
    if (!vals.Contains(val))
        vals.Add(val);
}

string result = string.Join(" ", vals.Select(i => words[i])); // Steps 5 and 6.

The finished sentence ends up in result.
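
As an alternative to re-picking until the indexes are distinct, a partial Fisher-Yates shuffle selects distinct words without retries. This is a different technique from the loop above, shown only as a rough sketch; the names RandomWordPicker and PickDistinctWords and the set of trimmed punctuation characters are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomWordPicker
{
    // Hypothetical helper: picks up to `count` distinct words in random order.
    public static string PickDistinctWords(string text, int count, Random rng)
    {
        // Split on whitespace, trim common punctuation, drop duplicates.
        List<string> words = text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim('.', ',', ';', ':', '!', '?', '"'))
            .Where(w => w.Length > 0)
            .Distinct()
            .ToList();

        // Never ask for more words than are available.
        int take = Math.Min(count, words.Count);

        // Partial Fisher-Yates: swap a random remaining word into slot i,
        // so the first `take` slots end up holding a random selection.
        for (int i = 0; i < take; i++)
        {
            int j = rng.Next(i, words.Count);
            string tmp = words[i];
            words[i] = words[j];
            words[j] = tmp;
        }

        return string.Join(" ", words.Take(take));
    }
}
```

A call such as RandomWordPicker.PickDistinctWords(input, 5, new Random()) returns at most five different words, and simply returns fewer when the text does not contain five distinct ones.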

0
votes
string sentense = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
string[] wordCollections = sentense.Split(' ');
Random rnd = new Random();
// The upper bound is exclusive; stop 5 words before the end so Join stays in range.
int randomPos = rnd.Next(0, wordCollections.Length - 5 + 1);
string grabAttempt1 = String.Join(" ", wordCollections, randomPos, 5);
// Gives you a random string of 5 words
randomPos = rnd.Next(0, wordCollections.Length - 5 + 1);
string grabAttempt2 = String.Join(" ", wordCollections, randomPos, 5);
// Gives you another random string of 5 words
0
votes
        string input = "Your long sentence here";
        int noOfWords = 5;

        string[] arr = input.Split(' ');

        Random rnd = new Random();
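        // Assumes input has at least noOfWords words.
        // The upper bound of Next is exclusive, so add 1 to allow the last possible start.
        int start = rnd.Next(0, arr.Length - noOfWords + 1);

        // Join avoids the trailing space that appending arr[i] + " " in a loop leaves behind.
        string output = string.Join(" ", arr, start, noOfWords);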

        Console.WriteLine(output);
0
votes

This might do the trick for you

    private void pickRandom()
    {
        string somestr = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
        string[] newinp = somestr.Split(' ');
        Random rnd = new Random();
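        // The upper bound of Next is exclusive, so add 1 to allow the last possible start.
        int strtindex = rnd.Next(0, newinp.Length - 5 + 1);
        string fivewordString = String.Join(" ", newinp.Skip(strtindex).Take(5));
        Console.WriteLine(fivewordString); // Print the selected words.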
    }