4
votes

I would like to subscribe to an RSS/XML feed from Google News that captures the following query:

Articles mentioning "studie" (German for "study"), written in German, emanating from any country.

I'm using https://news.google.com/rss/search, but for this example, it's easier to see the UI output at https://news.google.com/search, so I'll use the latter URL base in this example.

Now, in the XML API reference, Google mentions four different parameters that influence either language or country:

  • hl (host language): the language that the end user is assumed to be typing in. For example, an English-language speaker types "study," and Google assumes that term is in English and then machine-translates the results back to English. For me, navigating to the Google News home page redirects to a URL with hl=en-US (the full URL is https://news.google.com/?hl=en-US&gl=US&ceid=US:en).

  • gl: boosts search results whose country of origin matches the parameter value. The default in my web browser is gl=US.

  • lr (language restrict): restricts search results to documents written in a particular language

  • cr (country restrict): restricts search results to documents originating in a particular country

Based on all of the above, that would imply a URL of*:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

That attempt, however, fails miserably; it shows English-language results from the U.S., and it 302 redirects to:

https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en
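To make it easier to see what the redirect changed, the two query strings can be compared. This is only a minimal sketch using the standard library; `query_params` is a helper name made up for this illustration, and the two URLs are the ones quoted above.

```python
from urllib.parse import urlsplit, parse_qs

requested = "https://news.google.com/search?q=study&hl=en-US&lr=lang_de"
redirected = "https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en"


def query_params(url):
    """Return the query string of `url` as a dict of single values (hypothetical helper)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


before = query_params(requested)
after = query_params(redirected)

# Parameters that appear only after the redirect
added = {key: value for key, value in after.items() if key not in before}
print(added)  # {'gl': 'US', 'ceid': 'US:en'}
```

So the redirect keeps q, hl, and lr untouched and only appends gl and ceid.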

So, to that end:

  • How can I properly structure URL parameters to capture 'Articles mentioning "studie" (German for "study"), written in German, from any country.'?
  • What the heck is ceid and why is it documented absolutely nowhere by Google?

* I.e.:

>>> import urllib.parse
>>> urllib.parse.parse_qs('q=study&hl=en-US&lr=lang_de')
{'q': ['study'], 'hl': ['en-US'], 'lr': ['lang_de']}

Related but not resolving any of this:

3
Have you tried this in Postman or curl? - Edward Aung
Just Python and in a browser @EdwardAung. (Which both allow redirects by default.) Would you suspect curl would produce different behavior? - Brad Solomon
The linked documentation mentions that the client, output, and cx parameters are all required. - Ezphares
Yes, but that's for Google Custom Search Engine @Ezphares. news.google.com doesn't seem to require those. - Brad Solomon
If the documentation is only valid for Custom Search then I would expect any information on hl and lr to also be valid only in that context. - Ezphares

3 Answers

0
votes

I'm using the following URL, and it works for me:

https://news.google.com/rss?q=studie&hl=de-DE&gl=DE&ceid=DE:de

You can also search within topics; please refer to this answer: URL format for Google News RSS feed

0
votes

I know nothing about the RSS interface, but as for the standard news UI, maybe this can be of use:

ceid (country:language) is Google's news filter, so lr (which Google News seems to ignore) and cr are restricted even further by only sifting through the news defined by the news filter. For US news in English it's ceid=US:en, and for news in Great Britain it's ceid=GB:en. Source: https://rapidapi.com/apigeek/api/google-search3/details

NOTE: If you don't specify a ceid, one will be applied based on your current location. Also, Google News doesn't seem to care at all about the lr parameter: it sticks to the language of ceid and that's it.

For your query (articles mentioning "studie" (German for "study"), written in German, emanating from any country), I would suggest a value of DE:de. You may find the ceid parameter somewhat constricting with regard to "emanating from any country", but there's nothing you can do about that. Google News is built on the concept that every place has its own news feed, and "emanating from any country" sounds an awful lot like "all the news from all places on Earth", and there is no such Google News edition. "World" news is, as you know, not quite the same thing.

If you need no restrictions at all on the country of production or publication, you'll be better off looking for another outlet. Within the Google universe, a regular Google search using the advanced option that restricts results by publication date (for freshness) is probably impossible to beat.

The four other parameters involved in your search are:

hl, host(interface) language: hl=de
gl, boost country of origin: gl=DE
lr, restrict results to language: lr=de
cr, restrict results to country: none

There are two mistakes in the suggested search string:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

q=studie, not study, and
lr=de, not lang_de.

However, Google News doesn't care about the lr parameter: it sticks to the language of ceid. Also, hl is always set to the language part of ceid and gl to the country part, and I recommend a ceid of DE:de for your query.

So the search string for DE:de becomes:

https://news.google.com/search?q=studie&hl=de&gl=DE&ceid=DE:de
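As a rough sketch of that rule (hl and gl derived from the two halves of ceid), the URL could be assembled like this. `google_news_search_url` is a name made up for this illustration, not anything Google provides, and the RSS base is simply the https://news.google.com/rss/search endpoint mentioned in the question; I haven't verified the behavior beyond the UI.

```python
from urllib.parse import urlencode


def google_news_search_url(query, country, language, rss=False):
    """Build a Google News search URL for one edition (hypothetical helper)."""
    country = country.upper()    # country part of ceid, e.g. "DE"
    language = language.lower()  # language part of ceid, e.g. "de"
    params = {
        "q": query,
        "hl": language,                   # interface language follows ceid
        "gl": country,                    # country follows ceid
        "ceid": f"{country}:{language}",  # edition, as country:language
    }
    base = "https://news.google.com/rss/search" if rss else "https://news.google.com/search"
    # safe=":" keeps the colon in ceid readable instead of encoding it as %3A
    return f"{base}?{urlencode(params, safe=':')}"


print(google_news_search_url("studie", "DE", "de"))
# https://news.google.com/search?q=studie&hl=de&gl=DE&ceid=DE:de
```

Passing rss=True only swaps the base path; the query parameters stay the same.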

Also, to add to the Library of Congress link given by Sreeram Nair: there are no country codes given there. You can find country codes here:

• the ISO 3166-1 alpha-2 (2-letter country) standard, https://en.m.wikipedia.org/wiki/ISO_3166-1_alpha-2

You may also find this document with language codes easier to read on a mobile device:

• List of ISO 639-1 (language) codes https://en.m.wikipedia.org/wiki/List_of_ISO_639-1_codes

Sources: Wikipedia articles

• the software term Locale, https://en.m.wikipedia.org/wiki/Locale_(computer_software)

• the ISO 639 (language) standard, https://en.m.wikipedia.org/wiki/ISO_639

-1
votes

The URL format for Google News RSS has changed. You can use the following format for fetching. Examples can also be seen here.

usage: gnrss2opml.py [-h] [-o OUTPUT] [-c COUNTRY] [-l LANGUAGE] [-s]
                     [-t [TOPIC [TOPIC ...]]] [-g [LOCATION [LOCATION ...]]]
                     [-q [QUERY [QUERY ...]]]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file name (default: print to stdout)
  -c COUNTRY, --country COUNTRY
                        country / Google News edition (default: us)
  -l LANGUAGE, --language LANGUAGE
                        language (default: en)
  -s, --stories         include Top Stories
  -t [TOPIC [TOPIC ...]], --topics [TOPIC [TOPIC ...]]
                        list of topics, will be converted to uppercase
                        (default: WORLD NATION BUSINESS TECHNOLOGY
                        ENTERTAINMENT SPORTS SCIENCE HEALTH)
  -g [LOCATION [LOCATION ...]], --locations [LOCATION [LOCATION ...]]
                        list of geographic locations (default: None)
  -q [QUERY [QUERY ...]], --queries [QUERY [QUERY ...]]
                        list of search queries (default: None)

EDIT1:

The two-letter language code and country code can be specified in the arguments.

Get the codes from here