slang-users mailing list


Re: [slang-users] Using the csv module


Hi Morten,

Morten Bo Johansen <mbj@xxxxxxxxxxx> wrote:
> Hi
>
> I am dabbling a little with the csv module. I would like to
> create a few functions whereby I can calculate my daily intake
> of various nutrients from a csv data file and print out how
> these align to the official recommendations. I suspect the
> results from my diet will be appalling, but that's another
> matter ;)
>
> Of course, I need to be able to get the data from the intersecting
> cells of rows and columns. The lines (very curtailed) of the
> data file look like this in an English version:
>
>   ,,,Waste,"Energy, kJ","Energy, kcal",Nitrogen-to-protein factor,"Nitrogen, total" ...
>   Number,Group,Name,%,kJ,kcal,-,g,g,g,g,g ...
>   1,Soft fruit,"Strawberry, raw",4,162,38,6.25,0.106 ...
>   2,Pome fruit,"Apple, raw, all varieties",10,233,55,6.25,0.043 ...
>   3,Tropical or subtropical fruit,"Banana, raw",41,396,93,6.25,0.183 ...
>   4,Root and tuber vegetables,"Potato, raw",25,326,77,6.25,0.324 ...

The first two lines appear to be a header that gives names to the
columns.  I think that it would be better to use a single header line
containing distinct column names:

  number,group,name,pct_waste,en_kJ,en_kcal,N2_protein_factor,N2_total,...
  1,Soft fruit,"Strawberry, raw",4,162,38,6.25,0.106 ...
  2,Pome fruit,"Apple, raw, all varieties",10,233,55,6.25,0.043 ...
  3,Tropical or subtropical fruit,"Banana, raw",41,396,93,6.25,0.183 ...
  4,Root and tuber vegetables,"Potato, raw",25,326,77,6.25,0.324 ...

The use of this single header is also consistent with RFC 4180
<https://tools.ietf.org/html/rfc4180>.

If you then read the file using

   csv = csv_readcol (csv_file; has_header);

then csv will be a struct whose field names correspond to the
(lower-cased) column names specified in the file, e.g., csv.number
would be an array with values ["1","2","3","4"].

In the example you gave below, you want to know the number of kJ for
a raw strawberry.  To get this, you would use:

   strawberry_kj = csv.en_kj[wherefirst(csv.name == "Strawberry, raw")];
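
Since the columns come back as arrays of strings, strawberry_kj above
is the string "162" and not a number.  Here is a minimal sketch of the
lookup with a numeric conversion added.  The file name food_demo.csv
and its two data rows are only stand-ins that I made up so the snippet
runs on its own; atof and wherefirst are ordinary intrinsics.

```slang
require ("csv");

% Write a tiny stand-in file using the single header line suggested above.
variable file = "food_demo.csv";     % hypothetical file name
variable fp = fopen (file, "w");
if (fp == NULL)
  throw OpenError, "unable to create " + file;
() = fputs ("number,group,name,pct_waste,en_kJ,en_kcal\n", fp);
() = fputs ("1,Soft fruit,\"Strawberry, raw\",4,162,38\n", fp);
() = fputs ("2,Pome fruit,\"Apple, raw, all varieties\",10,233,55\n", fp);
() = fclose (fp);

variable csv = csv_readcol (file; has_header);

% wherefirst returns NULL when no element matches, so check before indexing.
variable i = wherefirst (csv.name == "Strawberry, raw");
if (i == NULL)
  throw RunTimeError, "food item not found in " + file;

% The field holds strings, so convert before doing arithmetic.
variable strawberry_kj = atof (csv.en_kj[i]);
vmessage ("%s: %g kJ", csv.name[i], strawberry_kj);
```

The NULL check matters when a name is misspelled, because indexing an
array with NULL would otherwise raise an error at the lookup itself.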

Or, if you want to get all the values associated with the strawberry
you can use:

   strawberry
     = struct_filter (csv, wherefirst(csv.name == "Strawberry, raw"); copy);

   strawberry_kj = strawberry.en_kj;
   strawberry_kcal = strawberry.en_kcal;
      .
      .
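
If you need this for many food items, the row lookup can be wrapped in
a small function.  The following is only a sketch: food_row is a name I
made up, the struct literal stands in for what csv_readcol would
return, and it assumes struct_filter is available without an explicit
require, as in the lines above.

```slang
% Stand-in for the struct returned by csv_readcol (file; has_header).
variable csv = struct
{
   name    = ["Strawberry, raw", "Apple, raw, all varieties"],
   en_kj   = ["162", "233"],
   en_kcal = ["38", "55"],
};

% Hypothetical helper: return a copy of the row whose name field matches,
% or NULL if there is no such row.
define food_row (csv, name)
{
   variable i = wherefirst (csv.name == name);
   if (i == NULL)
     return NULL;
   return struct_filter (csv, i; copy);
}

variable row = food_row (csv, "Strawberry, raw");

% Print every field of the selected row.
variable field;
foreach field (get_struct_field_names (row))
  vmessage ("%-8s %s", field, get_struct_field (row, field));
```

Because of the copy qualifier, csv itself is left untouched and can be
searched again for the next item.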

Does this help?  I think that it is far simpler than your approach.
Thanks,
--John

> So if I do:
>
>   variable csv_file = "food_data.csv";
>   variable csv = csv_decoder_new (csv_file);
>   variable datastruct = csv.readcol ();
>   variable colstruct = csv_readcol (csv_file; has_header);
>
> then I can get all the food items with
>
>   variable food_items = datastruct.col3;
>
> and all the nutritional items with	
>
>   variable nutr_items = get_struct_field_names (colstruct); % [""], lower case
>
> finding row and column numbers for particular items with e.g.
>
>   variable N = where (nutr_items == "energy, kj")[0];
>   variable F = where (food_items == "Strawberry, raw")[0];
>
> But how do I access the intersecting cell of N and F to get the value, 162?
> I don't understand csv.readrow. It seems that you can't read a particular
> numbered row with it? If I use it on this example, it just returns NULL.
> It also returns NULL with the example from csvfuns.hlp.
>
> In this example, the "energy kj" values would be in datastruct.col5 and so I
> could get the value, 162, with datastruct.col5[F], but this is not the way,
> I suppose.
>
> Thanks,
> Morten
>
> _______________________________________________
> For list information, visit <http://jedsoft.org/slang/mailinglists.html>.
>