0
votes

I have the following list of HTML inputs. The list has a nested structure:

  1. Level 1 contains the names of the inputs (e.g. input1).
  2. Level 2 contains some information about each input - name, attribs, children
  3. Level 3 branches off children, which is a list of length 2 - the first element contains information about the input's label and the second contains information about the type of input. Since I need the input labels, I need to extract the first element of this list for each input.

The list:

library(purrr)

inputs = list(
  input1 = list(
    name = 'div', 
    attribs = list(class = 'form-group'), 
    children = list(list(name = 'label', 
                         attribs = list(`for` = 'email'), 
                         children = list('Email')), 
                    list(
                      list(name = 'input', 
                           attribs = list(id = 'email', type = 'text'), 
                           children = list()))
                    )))

str(inputs)
List of 1
 $ input1:List of 3
  ..$ name    : chr "div"
  ..$ attribs :List of 1
  .. ..$ class: chr "form-group"
  ..$ children:List of 2
  .. ..$ :List of 3
  .. .. ..$ name    : chr "label"
  .. .. ..$ attribs :List of 1
  .. .. .. ..$ for: chr "email"
  .. .. ..$ children:List of 1
  .. .. .. ..$ : chr "Email"
  .. ..$ :List of 1
  .. .. ..$ :List of 3
  .. .. .. ..$ name    : chr "input"
  .. .. .. ..$ attribs :List of 2
  .. .. .. .. ..$ id  : chr "email"
  .. .. .. .. ..$ type: chr "text"
  .. .. .. ..$ children: list()

I am able to do this using keep() and has_element():

label = inputs %>% 
  map_depth(2, ~keep(., ~has_element(., 'label'))) %>%
  map('children') %>%
  flatten %>% 
  map('children') %>%
  flatten

Output:

str(label)
List of 1
 $ input1: chr "Email"

When I was looking through the purrr help pages, keep seemed to be the function I was after but I still had to use map and flatten twice to get to the label, which seems clumsy. So I was wondering if there is a more direct way to achieve the same output? I am not so much interested in the solution as I am in the thought process behind working with nested lists like these.

2
I noticed that the second element of children is nested one level deeper than the first. Was this intentional? Right now it is: { {n,a,c={ { {n,a,c} , {{n,a,c}} } } } }. Did you mean { {n,a,c={ { {n,a,c} , {n,a,c} } } } }? - Marian Minar
@Marian I'm not sure I understand the notation in the comment, but the actual list contains N inputs, i.e. input1, input2, ..., inputN, each with a sublist of the same structure as the one in my question. Sorry I wasn't clearer in my post. - user51462

2 Answers

1
votes

If every input has the same structure, then you don't need keep, which is for removing list elements that don't meet some condition. Instead, you can map over the inputs with pluck and give it the path to the element you want; the example below pulls the name of each input's first child. Of course, this method discards all the other data for each input. You may want to do something different if the end goal is "rectangling", i.e. getting all the information for each input into a flat structure.

library(purrr)

inputs = list(
  input1 = list(
    name = 'div', 
    attribs = list(class = 'form-group'), 
    children = list(
      list(
        name = 'label', 
        attribs = list(`for` = 'email'), 
        children = list('Email')
      ), 
      list(
        list(
          name = 'input', 
          attribs = list(id = 'email', type = 'text'), 
          children = list()
        )
      )
    )
  )
)

inputs %>%
  map(~ pluck(., "children", 1, "name"))
#> $input1
#> [1] "label"

Created on 2019-06-14 by the reprex package (v0.3.0)
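
The pluck call above stops at the child's name, so it returns "label" rather than the label text. As a minimal sketch, assuming every input has the same structure as input1 in the question, the path can be extended by two more steps to reach "Email". The variable name labels is only illustrative.

```r
library(purrr)

# Same structure as in the question, with a single input.
inputs <- list(
  input1 = list(
    name = "div",
    attribs = list(class = "form-group"),
    children = list(
      list(name = "label",
           attribs = list(`for` = "email"),
           children = list("Email")),
      list(list(name = "input",
                attribs = list(id = "email", type = "text"),
                children = list()))
    )
  )
)

# Path: children -> first child (the label) -> its children -> first element.
# map_chr() returns a named character vector instead of a list.
labels <- map_chr(inputs, ~ pluck(.x, "children", 1, "children", 1))
labels
# named character vector: c(input1 = "Email")
```

Reading the path off the str() output one level at a time is the general approach: each `$` or `[[ ]]` step you would type by hand becomes one argument to pluck.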

1
votes

Try:

map(inputs, "children") %>% map_depth(2, "children")

Output:

$input1
$input1[[1]]
$input1[[1]][[1]]
[1] "Email"


$input1[[2]]
NULL
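
The result above still has an extra list level and a NULL for the second child, whose contents are wrapped in an unnamed list and so have no "children" element to extract. One possible way to tidy it, sketched here under the assumption that every input has exactly one label child, is to drop the empty entries with compact() and unwrap what remains. The names res and labels are only illustrative.

```r
library(purrr)

# Same structure as in the question, with a single input.
inputs <- list(
  input1 = list(
    name = "div",
    attribs = list(class = "form-group"),
    children = list(
      list(name = "label",
           attribs = list(`for` = "email"),
           children = list("Email")),
      list(list(name = "input",
                attribs = list(id = "email", type = "text"),
                children = list()))
    )
  )
)

# The extraction from the answer: one entry per child of each input.
res <- map(inputs, "children") %>% map_depth(2, "children")

# compact() drops the NULL left by the second child; unlist() removes
# the remaining list wrapper around the label text.
labels <- map_chr(res, ~ unlist(compact(.x)))
labels
# named character vector: c(input1 = "Email")
```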