I would like to retrieve information from Wikidata and store it in a data frame. To keep things simple, suppose I want to get the genre of the following movies and then keep only those that are sci-fi:
movies = c("Star Wars Episode IV: A New Hope", "Interstellar",
"Happythankyoumoreplease")
I know there is a package called WikidataR. If I am not mistaken, and according to its vignettes, two functions may be useful: find_item and find_property let you retrieve a set of Wikidata items or properties whose aliases or descriptions match a particular search term. They looked like a good fit, so I thought of doing something like:
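library(WikidataR)
info <- list()
for (i in movies) {
  # keep the search results for each title instead of overwriting them
  info[[i]] <- find_item(i)
}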
This is what I get for each title, for example:
> find_item("Interstellar")
Wikidata item search
Number of results: 10
Results:
1 Interstellar (Q13417189) - 2014 US science fiction film
2 Interstellar (Q6057099)
3 interstellar medium (Q41872) - matter and fields (radiation) that exist in the space between the star systems in a galaxy;includes gas in ionic, atomic or molecular form, dust and cosmic rays. It fills interstellar space and blends smoothly into the surrounding intergalactic space
4 space colonization (Q686876) - concept of permanent human habitation outside of Earth
5 rogue planet (Q167910) - planetary-mass object that orbits the galaxy directly
6 interstellar cloud (Q1054444) - accumulation of gas, plasma and dust in a galaxy
7 interstellar travel (Q834826) - term used for hypothetical manned or unmanned travel between stars
8 Interstellar Boundary Explorer (Q835898)
9 starship (Q2003852) - spacecraft designed for interstellar travel
10 interstellar object (Q2441216) - astronomical object in interstellar space, such as a comet
>
Unfortunately, the result of find_item (see above) has two problems:
- It is not a data frame with all the Wikidata information for the item I am searching for, but a list of what seems to be metadata (the Wikidata ID, link...).
- It does not contain the information I need (the properties of each particular Wikidata item).
Similarly, find_property provides metadata about a given property. find_property("genre") retrieves the following information:
> find_property("genre")
Wikidata property search
Number of results: 4
Results:
1 genre (P136) - a creative work's genre or an artist's field of work (P101). Use main subject (P921) to relate creative works to their topic
2 radio format (P415) - describes the overall content broadcast on a radio station
3 sex or gender (P21) - sexual identity of subject: male (Q6581097), female (Q6581072), intersex (Q1097630), transgender female (Q1052281), transgender male (Q2449503). Animals: male animal (Q44148), female animal (Q43445). Groups of same gender use "subclass of" (P279)
4 gender of a scientific name of a genus (P2433) - determines the correct form of some names of species and subdivisions of species, also subdivisions of a genus
This has similar problems:
- It is not a data frame.
- It only stores metadata about the property.
- I cannot find any way to link each property to each item in the movies vector.
Is there any way to end up with a data frame containing the genres of those movies (or a data frame with all of the Wikidata information, which I would then filter or subset to get the data I need)?