
I have a data frame in which I've created new variables (which are 'cleaned' versions of the originals). When I subset the data frame, these new variables don't seem to be in the subsetted data frame. Do I need to create these new variables again? Or is there a way to ensure that they are in the subsetted data frame?

A little more detail: I have 'attach'-ed a data frame 'x'.

newdf <- subset (x, (income %in% c('<20000')))

(income is a cleaned version of another variable, and is a factor variable.) So the new data frame should contain only those observations with income less than 20000.

This seems to work and does indeed give me a new data frame with the correct number of observations.

However, when I try to do

freq (newdf$income) 

I get:

Error in plot.window(xlim, ylim, log = log, ...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(w.l) : no non-missing arguments to min; returning Inf
2: In max(w.r) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

Thanks!

Can you provide some sample data? - Wimpel
Please post a dataset example using dput(head(data, 20)), along with sample code. As it stands, it's impossible for anyone to help you. - Rui Barradas

1 Answer


Use the table() function to check the frequency (count) of each income level in the subsetted data frame.

As far as I understand, you want to subset your dataset to just one level of the income variable, i.e. <20000, and then check the number of observations with income <20000 in newdf.

Here is the same approach applied to the iris dataset:

dim(iris)
# [1] 150   5  

table(iris$Species)

# setosa versicolor  virginica 
#     50         50         50 


newdf <- subset(iris, Species %in% "virginica")
dim(newdf)
# [1] 50  5

table(newdf$Species)

# setosa versicolor  virginica 
#      0          0         50 
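
Note that the subset keeps all three factor levels, which is why the unused ones appear with a count of 0. If those zero counts are unwanted, one option is base R's droplevels(). A minimal sketch on the same iris subset (the name newdf_clean is only illustrative):

```r
# Subset iris to a single species; the factor still carries all three levels
newdf <- subset(iris, Species %in% "virginica")

# droplevels() removes factor levels that no longer occur in the data
newdf_clean <- droplevels(newdf)

levels(newdf_clean$Species)  # only the level still present remains
table(newdf_clean$Species)   # no zero-count entries for the dropped levels
```

Whether to drop the levels depends on what comes next: keeping them can be useful when several subsets need to be tabulated with the same set of categories.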

Another example:

df <- data.frame(a = 1:9, b = as.factor(rep(c("<100","<200", "<300"), each = 3)))
df
#   a    b
# 1 1 <100
# 2 2 <100
# 3 3 <100
# 4 4 <200
# 5 5 <200
# 6 6 <200
# 7 7 <300
# 8 8 <300
# 9 9 <300

table(df$b)

# <100 <200 <300 
#    3    3    3 

newdf <- subset(df, b %in% "<300")
newdf
#   a    b
# 7 7 <300
# 8 8 <300
# 9 9 <300

table(newdf$b)

# <100 <200 <300 
#    0    0    3