0
votes

I am reading a large CSV file in separate pieces, because calling pd.read_csv on the whole file usually raises an error and shuts down the kernel in IPython Notebook.

However, skiprows does not work in my case. I have updated pandas to the newest version, 0.20.1, but skiprows still does not work.

In the following code, I would like to skip the first 2 rows and read only rows 3 to 6, but skiprows in pd.read_csv fails to skip the first 2 rows.

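import pandas as pd

def read(path, header):
    # Read the first 6 data rows; the first line of the file is used as the header.
    df = pd.read_csv(path, nrows=6, engine='python')
    # Skip the first 2 lines, then read 6 rows counted from that offset.
    df1 = pd.read_csv(path, skiprows=2, nrows=6, engine='python')
    df.columns = header

    print(df.shape)
    print(df1.shape)
    return df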

and the result turns out to be

(6, 26)
(6, 26)

which suggests that skiprows does not work at all. I have googled but did not find anyone with the same problem. I am wondering whether I have missed something important that causes this problem.

Thanks in advance.


Added information:

The first 7 rows of my CSV file:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) ウイザードリイ・外伝4 (管理:4366),4988606101009,998,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) スーパードラッケン (管理:3701),4906571521028,298,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(FC) サンダーバード  (管理:9347),4988110900051,498,1,17302,2511,2161,16899,16904,16908,,,,,shopping,game_and_toy,video_game,retro_game,nes,software,,,,"

"

20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(FC) ガンサイト (管理:8853),4988602564624,198,1,17302,2511,2161,16899,16904,16908,,,,,shopping,game_and_toy,video_game,retro_game,nes,software,,,,"

"


  20151201000000,b616e9b1f0b488ed2aacf08b6165fc4f76f664aeae46c20c49b7e1e2c81e5f71-ee42bb396f6f56f518c5b04df271c1f173c0bcf13496294464b8d87d3ee17945,(SFC) プリンセスメーカー (管理:4201),4904880133802,298,1,17297,2511,2161,16899,16900,16903,,,,,shopping,game_and_toy,video_game,retro_game,super_famicom,software,,,,"

The file is very dirty: a redundant line containing only " appears on every other row.

1
What does your file look like? - EdChum
Hi, I would like to read lines 3 to 6 by skipping the first 2 lines. - Leigh Tsai
Please add a sample of your CSV file - Jan Trienes
@EdChum Hello, I added the information to the question. Thanks! - Leigh Tsai
The problem was clarified by @Kyle and the question is closed. Thanks! - Leigh Tsai

1 Answer

1
votes

nrows is counted from the starting offset (that is, after the rows removed by skiprows), not from the beginning of the file. To read rows 3 to 6, you want nrows=4.
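
To illustrate how the two parameters interact, here is a minimal sketch. It assumes a small, hypothetical 8-line CSV built in memory (not the file from the question) with no header row, which is why header=None is passed; without it, pandas would take the first line after the skipped ones as the column names.

```python
import io
import pandas as pd

# Hypothetical 8-line file with no header row; each line starts with its own line number.
csv_text = "\n".join("{0},row{0}".format(i) for i in range(1, 9)) + "\n"

# skiprows=2 drops lines 1 and 2; nrows=4 then reads the next four lines (3 to 6).
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=2, nrows=4)

print(df.shape)        # (4, 2)
print(df[0].tolist())  # [3, 4, 5, 6]
```

With nrows=6 instead, the same call would return lines 3 to 8, which matches the (6, 26) shape seen in the question: the rows were skipped, but six rows were still read from the new offset.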