I am currently trying to produce a scatterplot of a .txt file that is structured like this in 25 rows:

age income weight

33       63      180

25       72      220 

However, when I try to convert it to a csv and then produce a scatterplot with the following code:

my_input <- read.csv2('dataInput.txt', sep = '\t', header = T)

plot(x = my_input$ageX, y = my_input$weightY)

I get an error message. I also notice that there is now a period between 'age', 'income' and 'weight', which I don't understand, since I would expect a comma between them. The error message is as follows:

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

Any ideas on how to actually get a scatterplot of the data?

Edit: executing

head(my_input)

age. income. weight
1  56     63     185
2  38     72     156
3  28     75     178
4  49     59     205
5  69     65     235
6  19     70     195

Edit:

str(my_input)

age.income.weight: Factor w/ 18 levels "56  63     185",..: 1 2 3 4 5 6 7 8 9 10 ...
summary(my_input)
age.income.weight

 56     63     185: 1     
 38     72     156: 1     
 28     75     178: 1     
 49     59     205: 1     
 69     65     235: 1     
 19     70     195: 1     
 (Other)          :19     
Sure, it's up now. - noobmaster69
@dc37 Thanks, that seems to produce a scatterplot, although I'm not sure it's providing the right output. - noobmaster69
Well, when looking at the scatterplot I am getting a perfectly straight line, but the numbers are not 1:1, so I would not expect such a straight line, but more of a curved line or something. - noobmaster69
@dc37 Picture has been added. - noobmaster69
Your image shows that your data were not plotted. You should check that your data are in a numerical format using str(my_input). - dc37

1 Answer


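Based on the edits to your question, the problem is in how the txt file is loaded. Your str() output shows a single column named age.income.weight, which means read.csv2 found no tab separators and read each line as one value. Checking the structure of the text file, the columns are separated by a varying number of spaces, not by a consistent delimiter.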
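To see where the period comes from and why plot() has nothing to draw, here is a small reproduction. It is only a sketch: the two rows are typed in from the question and passed through the text argument instead of being read from the real file, so the exact spacing is an assumption.

```r
# Two rows typed in from the question; the spacing is assumed, not copied from the file.
sample_text <- "age income weight\n33       63      180\n25       72      220"

# No tab characters are present, so sep = "\t" yields a single column.
demo <- read.csv2(text = sample_text, sep = "\t", header = TRUE)

# The header becomes one name; R replaces the spaces with periods
# to make it a syntactically valid column name.
print(names(demo))

# There is no column called ageX, so this is NULL and plot() gets no data.
print(is.null(demo$ageX))
```

A NULL x value is what leads to the "need finite 'xlim' values" error: min() and max() are called on an empty input.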

One way to get it to work is to build the data frame from scratch, reading the file with readLines:

my_input <- readLines("crime_input.txt")
my_input <- unlist(strsplit(my_input," "))

You can now see that the split leaves many empty strings, one for each extra space in the file:

> my_input
  [1] "age"    "income" "crimes" "16"     ""       ""       ""       ""       "63"     ""       ""       ""      
 [13] ""       "23"     "18"     ""       ""       ""       ""       "72"     ""       ""       ""       ""      
 [25] "25"     "18"     ""       ""       ""       ""       "75"     ""       ""       ""       ""       "22"    
 [37] "19"     ""       ""       ""       ""       "59"     ""       ""       ""       ""       "16"     "19"    
 [49] ""       ""       ""       ""       "65"     ""       ""       ""       ""       "19"     "19"     ""      
 [61] ""       ""       ""       "70"     ""       ""       ""       ""       "19"     "20"     ""       ""      
 [73] ""       ""       "78"     ""       ""       ""       ""       "18"     "21"     ""       ""       ""      
 [85] ""       "35"     ""       ""       ""       ""       "11"     "21"     ""       ""       ""       ""      
 [97] "53"     ""       ""       ""       ""       "15"     "23"     ""       ""       ""       ""       "28"    
[109] ""       ""       ""       ""       ""       "9"      "27"     ""       ""       ""       ""       "56"    
[121] ""       ""       ""       ""       "16"     "28"     ""       ""       ""       ""       "52"     ""      
[133] ""       ""       ""       "14"     "29"     ""       ""       ""       ""       "63"     ""       ""      
[145] ""       ""       "25"     "30"     ""       ""       ""       ""       "46"     ""       ""       ""      
[157] ""       "17"     "30"     ""       ""       ""       ""       "55"     ""       ""       ""       ""      
[169] "19"     "31"     ""       ""       ""       ""       "29"     ""       ""       ""       ""       ""      
[181] "8"      "32"     ""       ""       ""       ""       "55"     ""       ""       ""       ""       "22"    
[193] "32"     ""       ""       ""       ""       "62"     ""       ""       ""       ""       "25"    
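The empty strings appear because the split is on a single space. As a possible variation, assuming the columns are separated only by runs of whitespace, you could split on a regular expression instead. The sample lines below are made up to mirror the layout of the first rows:

```r
# Header plus one record, with five spaces between the values (assumed layout).
sample_lines <- c("age income crimes", "16     63     23")

# Split on one or more whitespace characters so that runs of spaces
# do not produce empty strings; trimws() drops leading/trailing blanks first.
tokens <- unlist(strsplit(trimws(sample_lines), "[[:space:]]+"))
print(tokens)
```

With this split, the later NA filtering only has to remove the three header names.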

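So, we can convert everything to numeric and remove the NA values. The header names and the empty strings cannot be parsed as numbers and become NA, so R will warn that NAs were introduced by coercion, which is expected here: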

my_input <- as.numeric(my_input)
my_input <- my_input[!is.na(my_input)]

The result:

> my_input
 [1] 16 63 23 18 72 25 18 75 22 19 59 16 19 65 19 19 70 19 20 78 18 21 35 11 21 53 15 23 28  9 27 56 16 28 52 14
[37] 29 63 25 30 46 17 30 55 19 31 29  8 32 55 22 32 62 25
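To make the filtering step concrete, here is a small sketch on a made-up token vector. It relies on the assumption that every real data value is numeric and that none are missing; a missing value in the file would silently shift the later columns.

```r
# Made-up tokens: header names, numbers, and empty strings from extra spaces.
tokens <- c("age", "income", "crimes", "16", "", "", "63", "", "23")

# Non-numeric strings become NA; suppressWarnings() hides the expected
# "NAs introduced by coercion" warning.
values <- suppressWarnings(as.numeric(tokens))

# Keep only the entries that were successfully converted.
values <- values[!is.na(values)]
print(values)
```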

Next, we can fill a matrix with this vector. matrix() fills column by column, so with nrow = 3 each column holds one record (age, income, crimes):

my_input <- matrix(my_input, nrow = 3, ncol = length(my_input)/3)
> my_input
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]   16   18   18   19   19   19   20   21   21    23    27    28    29    30    30    31    32    32
[2,]   63   72   75   59   65   70   78   35   53    28    56    52    63    46    55    29    55    62
[3,]   23   25   22   16   19   19   18   11   15     9    16    14    25    17    19     8    22    25

Now, we can transpose the matrix, convert it to a data.frame, and add column names:

my_input <- as.data.frame(t(my_input))
colnames(my_input) <- c("age","income","crimes")
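If you prefer to skip the transpose, filling the matrix row by row should give the same layout. A minimal sketch using the first nine values from the vector above:

```r
# First three records (age, income, crimes) from the numeric vector.
values <- c(16, 63, 23, 18, 72, 25, 18, 75, 22)

# Approach used above: fill by column with 3 rows, then transpose.
by_column <- t(matrix(values, nrow = 3))

# Alternative: fill by row with 3 columns, no transpose needed.
by_row <- matrix(values, ncol = 3, byrow = TRUE)
print(by_row)
```

Both give one record per row, so either can be passed to as.data.frame().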

And finally, you get:

> head(my_input)
   age income crimes
1   16     63     23
2   18     72     25
3   18     75     22
4   19     59     16
5   19     65     19
6   19     70     19

And if you check the format of my_input:

> str(my_input)
'data.frame':   18 obs. of  3 variables:
 $ age   : num  16 18 18 19 19 19 20 21 21 23 ...
 $ income: num  63 72 75 59 65 70 78 35 53 28 ...
 $ crimes: num  23 25 22 16 19 19 18 11 15 9 ...

So, now, you can plot it:

my_input <- my_input[order(my_input$age), ]
plot(x = my_input$age, y = my_input$crimes, type = "b")


You can now work with this data frame. I hope this helps you solve the issue.
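A shorter route that may be worth trying first is read.table(), which is a different technique from the manual steps above. Its default sep = "" treats any run of spaces or tabs as one separator. This is only a sketch, assuming every row has exactly three whitespace-separated values; the temporary file and its three rows stand in for the real file.

```r
# Write a small whitespace-delimited sample to a temporary file.
# The rows are copied from the head() output above.
sample_path <- tempfile(fileext = ".txt")
writeLines(c("age income crimes",
             "16     63     23",
             "18     72     25",
             "18     75     22"), sample_path)

# sep = "" is the default, so irregular spacing between columns is handled.
crime_data <- read.table(sample_path, header = TRUE)
str(crime_data)

# The default plot type draws points only, i.e. a plain scatterplot.
plot(x = crime_data$age, y = crime_data$crimes)
```

With your own file, replace sample_path with the path to the txt file and use the column names from its header line.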