0
votes

I have a following .txt file:

Mark1[Country1]
type1=1 type2=5 
type1=1.50 EUR type2=21.00 EUR 
Mark2[Country2]
type1=2 type2=1 type3=1 
type1=197.50 EUR type2=201.00 EUR type3= 312.50 EUR
....

I am trying to input it in my SAS program, so that it would look something like that:

  Mark  Country   Type  Count   Price

1 Mark1 Country1  type1   1     1.50 
2 Mark1 Country1  type2   5     21.00 
3 Mark1 Country1  type3   NA     NA 
4 Mark2 Country2  type1   2     197.50 
5 Mark2 Country2  type2   2     201.00 
6 Mark2 Country2  type3   1     312.50 

Or maybe something else, but i need it to be possible to print two way report

       Country1   Country2 
Type1    ...        ...  
Type2    ...        ...   
Type3    ...        ...  

But the question is how to read that kind of txt file:

  1. read and separate Mark1[Country1] to two columns Mark and Country;
  2. retain Mark and Country and read info for each Type (+somehow ignoring type1=, maybe using formats) and input it in a table. Maybe there is a way to use some kind of input templates to achive that or nasted queries.
2
So sometimes there is one word after the equal sign (type1=1) and somethings there are more (type1=1.50 EUR). Is there any pattern to when you have one type of string and when you have the other? Do you always have three lines per group?Tom
@Tom Yes it is exactly three lines per group. First with mark and country, second with count for each type, and third line with prices.adeline

2 Answers

2
votes

You can specify the name of variable with the DLM= option on the INFILE statement. That way you can change the delimiter depending on the type of line being read.

It looks like you have three lines per group. The first one have the MARK and COUNTRY values. The second one has a list of COUNT values and the third one has a list of PRICE values. So something like this should work.

data want ;
  length dlm $2 ;
  length Mark $8 Country $20 rectype $8 recno 8 type $10 value1 8 value2 $8 ;
  infile cards dlm=dlm truncover ;
  dlm='[]';
  input mark country ;
  dlm='= ';
  do rectype='Count','Price';
    do recno=1 by 1 until(type=' ');
      input type value1 @;
      if rectype='Price' then input value2 @;
      if type ne ' ' then output;
    end;
    input;
  end;
cards;
Mark1[Country1]
type1=1 type2=5 
type1=1.50 EUR type2=21.00 EUR 
Mark2[Country2]
type1=2 type2=1 type3=1 
type1=197.50 EUR type2=201.00 EUR type3= 312.50 EUR
;

Results:

Obs    Mark     Country     rectype    recno    type     value1    value2

  1    Mark1    Country1     Count       1      type1       1.0
  2    Mark1    Country1     Count       2      type2       5.0
  3    Mark1    Country1     Price       1      type1       1.5     EUR
  4    Mark1    Country1     Price       2      type2      21.0     EUR
  5    Mark2    Country2     Count       1      type1       2.0
  6    Mark2    Country2     Count       2      type2       1.0
  7    Mark2    Country2     Count       3      type3       1.0
  8    Mark2    Country2     Price       1      type1     197.5     EUR
  9    Mark2    Country2     Price       2      type2     201.0     EUR
 10    Mark2    Country2     Price       3      type3     312.5     EUR
2
votes

You have 3 name/value pairs, but the pairs are split between two rows. An unusual text file requiring creative input. The INPUT statement has a line control feature # to read relative future rows within the implicit DATA Step loop.

Example (Proc REPORT)

Read the mark and country from the current row (relative row #1), the counts from relative row #2 using #2 and the prices from relative row #3. After the name/value inputs are made for a given mark country perform an array based pivot, transposing two variables (count and price) at a time into a categorical (type) data form.

Proc REPORT produces a 'two-way' listing. The listing is actually a summary report (cells under count and price are a default SUM aggregate), but each cell has only one contributing value so the SUM is the original individual value.

data have(keep=Mark Country Type Count Price);
  attrib mark country length=$10;

  infile cards delimiter='[ ]' missover; 

  input mark country;

  input #2 @'type1=' count_1 @'type2=' count_2 @'type3=' count_3;
  input #3 @'type1=' price_1 @'type2=' price_2 @'type3=' price_3;

  array counts count_:;
  array prices price_:;

  do _i_ = 1 to dim(counts);
    Type = cats('type',_i_);
    Count = counts(_i_);
    Price = prices(_i_);
    output;
  end;
datalines;
Mark1[Country1]
type1=1 type2=5 
type1=1.50 EUR type2=21.00 EUR
Mark2[Country2]
type1=2 type2=1 type3=1 
type1=197.50 EUR type2=201.00 EUR type3= 312.50 EUR
;

ods html file='twoway.html';

proc report data=have;
  column type country,(count price);
  define type / group;
  define country / ' ' across;
run;

ods html close;

Output image

enter image description here

Combined aggregation

proc means nway data=have noprint;
  class type country;
  var count price;
  output out=stats max(price)=price_max sum(count)=count_sum;
run;

data cells;
  set stats;
  if not missing(price_max) then 
    cell = cats(price_max,'(',count_sum,')');
run;

proc transpose data=cells out=twoway(drop=_name_);
  by type;
  id country;
  var cell;
run;

proc print noobs data=twoway;
run;

enter image description here