I downloaded a Wikipedia dump from http://download.wikipedia.com/enwiki/latest/enwiki-latest-pages-articles.xml.bz2, unzipped it to enwiki.xml, and ran:

php importDump.php < enwiki.xml
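For reference, this is roughly the full sequence I used (the paths are placeholders for my own setup, with /var/www/mediawiki standing in for my MediaWiki root):

wget http://download.wikipedia.com/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
bunzip2 enwiki-latest-pages-articles.xml.bz2      # decompress the dump to a single XML file
mv enwiki-latest-pages-articles.xml enwiki.xml
cd /var/www/mediawiki/maintenance                 # placeholder path to my MediaWiki install
php importDump.php < /path/to/enwiki.xml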
The import took almost two days to finish, yet somehow my local MediaWiki has far fewer articles/pages/categories than the online wiki does.

select count(*) from page;

only gives me 691716. Another telling example: the page "United States" is missing from my local MediaWiki.
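One way to check for a specific page directly in MySQL (namespace 0 is the article namespace, and titles are stored with underscores):

select page_id, page_title from page where page_namespace = 0 and page_title = 'United_States';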
I also tried exporting a small XML file from https://en.wikipedia.org/wiki/Special:Export and using importDump.php to insert it into MySQL. That result looks fine: no pages are missing.
1. Am I downloading the wrong Wikipedia dump, or did something go wrong with the import process because the XML file is so huge?
I've also tried both mwdumper.jar and the Perl script (mwimport) mentioned in this question on Stack Overflow. Even though I altered the page table to add the page_counter column, all the articles are missing their content. Every page says:
There is currently no text in this page.
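For completeness, this is roughly what I did for the mwdumper attempt (wikidb and the root account are placeholders for my database name and MySQL user; sql:1.5 is the output format documented for mwdumper):

# add the page_counter column my schema was missing (my guess at its original definition)
mysql -u root -p wikidb -e "ALTER TABLE page ADD COLUMN page_counter bigint unsigned NOT NULL DEFAULT 0;"

# pipe the dump through mwdumper straight into MySQL
java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 | mysql -u root -p wikidb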
2. Are mwimport.perl and mwdumper.jar out of date?
3. Where can I get a complete Wikipedia dump, and how do I import it into MySQL correctly?
Thank you.