1
votes

Consider this funny example:

library(tibble)    # tibble()
library(magrittr)  # %>%
library(quanteda)  # corpus(), tokens(), dfm()

mytib <- tibble(text = c('i can see clearly now',
                         'the rain is gone'),
                myweight = c(1.7, 0.005)) 
# A tibble: 2 x 2
  text                  myweight
  <chr>                    <dbl>
1 i can see clearly now    1.7  
2 the rain is gone         0.005

I know how to create a dfm weighted by the docvar myweight. I proceed as follows:

dftest <- mytib %>% 
  corpus() %>% 
  tokens() %>% 
  dfm()

dftest * mytib$myweight 

Document-feature matrix of: 2 documents, 9 features (50.0% sparse).
2 x 9 sparse Matrix of class "dfm"
       features
docs      i can see clearly now   the  rain    is  gone
  text1 1.7 1.7 1.7     1.7 1.7 0     0     0     0    
  text2 0   0   0       0   0   0.005 0.005 0.005 0.005

However, the issue is that I can use neither topfeatures nor colSums on the result.

How can I sum the values in every column then?

> dftest*mytib$myweight %>% Matrix::colSums(.)
Error in base::colSums(x, na.rm = na.rm, dims = dims, ...) : 
  'x' must be an array of at least two dimensions

Thanks!

2
colSums(dftest * mytib$myweight) works for me. - Jason Mathews
crazy stuff. why?? - ℕʘʘḆḽḘ

2 Answers

3
votes

Sometimes the %>% operator harms rather than helps. This works:

colSums(dftest * mytib$myweight)
##      i     can     see clearly     now     the    rain      is    gone 
##  1.700   1.700   1.700   1.700   1.700   0.005   0.005   0.005   0.005 

Also consider using dfm_weight(x, weights = ...) if you have a vector of weights for each feature. The multiplication above gives the result you want only because R recycles the weight vector through the matrix in column-major order, and the vector has exactly one element per document (row), so every cell in row i is multiplied by weight i. It is worth understanding this before relying on it.
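To illustrate the recycling point, here is a minimal base R sketch that uses a plain matrix as a stand-in for the dfm. The names m, doc_w and feat_w are made up for this illustration, and sweep() is shown only as a base R way to apply per-column weights, not as a replacement for dfm_weight().

```r
# A plain 2 x 3 matrix of ones stands in for the dfm (documents in rows).
m <- matrix(1, nrow = 2, ncol = 3,
            dimnames = list(c("text1", "text2"), c("a", "b", "c")))

# One weight per document (row): recycling walks down each column,
# so row i is always paired with doc_w[i].
doc_w <- c(1.7, 0.005)
by_doc <- m * doc_w
print(by_doc)

# One weight per feature (column): plain `*` still pairs weights with cells
# in column-major order, so the weights do NOT line up with the columns.
feat_w <- c(1, 2, 3)
misaligned <- m * feat_w
print(misaligned)

# sweep() over margin 2 multiplies column j by feat_w[j] explicitly.
by_feat <- sweep(m, 2, feat_w, "*")
print(by_feat)
```

The first multiplication behaves like the dfm example because the weight vector has the same length as the number of rows. With any other length, the pairing of weights and cells shifts from column to column.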

1
votes

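It is because of operator precedence. According to ?Syntax, special operators of the form %any% (which includes %>%) have higher precedence than multiplication (*). So dftest*mytib$myweight %>% Matrix::colSums(.) is parsed as dftest * (mytib$myweight %>% Matrix::colSums(.)), and colSums is called on the plain numeric vector mytib$myweight, which has no dimensions.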

...
%any%   special operators (including %% and %/%)  ###
* / multiply, divide   ###
...

Wrap the expression in parentheses and it should work:

(dftest*mytib$myweight) %>% 
       colSums
#     i     can     see clearly     now     the    rain      is    gone 
#   1.700   1.700   1.700   1.700   1.700   0.005   0.005   0.005   0.005
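The grouping can be checked without quanteda at all. The sketch below is base R only: quote() just parses the expression, so no pipe package has to be loaded for that part, and %then% is a hypothetical stand-in pipe defined here purely for illustration (it is not magrittr's %>%, but it is parsed with the same %any% precedence).

```r
# quote() parses without evaluating, so the symbols need not exist.
unparenthesised <- quote(x * w %>% colSums)
print(unparenthesised[[1]])  # outermost call is the multiplication
print(unparenthesised[[3]])  # the pipe is nested in its right-hand argument

parenthesised <- quote((x * w) %>% colSums)
print(parenthesised[[1]])    # outermost call is now the pipe

# Hypothetical stand-in pipe: applies the function on the right to the left side.
`%then%` <- function(lhs, f) f(lhs)

x <- matrix(1, nrow = 2, ncol = 3)  # plain matrix standing in for the dfm
w <- c(1.7, 0.005)

# Without parentheses, colSums() receives the bare vector w and fails.
msg <- tryCatch(x * w %then% colSums,
                error = function(e) conditionMessage(e))
print(msg)

# With parentheses, colSums() receives the weighted matrix.
ok <- (x * w) %then% colSums
print(ok)
```

The failing call is the same situation as in the question: the function on the right of the pipe sees only the weight vector, never the product.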