jed-users mailing list

[2020 Date Index] [2020 Thread Index] [Other years]
[Thread Prev] [Thread Next]      [Date Prev] [Date Next]

[jed-users] Re: Concatenating arrays


John E. Davis <jed@xxxxxxxxxxx> wrote:

> Morten Bo Johansen <mbj@xxxxxxxxx> wrote:
> > Concatenating arrays can be quite slow. In casu, I want to
> > concatenate arrays with file contents obtained with fgetslines
> > () - which returns the file as an array of lines. I wonder if
> > converting these arrays to lists and then concatenate them
> > would be more efficient?
> >
> > But if so, I am looking for an "array_to_list" function. There is
> > one such function in datutils.sl on jedmodes, but it fails with
> > a buffer overflow error.
> 
> My feeling is that array_to_list would result in a performace hit that
> would defeat the purpose of using it.
> 
> > To describe my problem in detail, then I want to hash a
> > spellchecking wordlist of ~500.000 words, each on a line by
> > themselves, to complete from but also include other files to
> > complete from, e.g. in markup modes. Obtaining the array (a) of
> > the 500.000 lines with fgetslines and hashing them takes about
> > 1.5 seconds, but if I add just a tiny array (b) to (a) like in
> > c = [a, b], then the execution time more than doubles.
> 
> How do you intend to use the result?  For example, could your code use
> a list of arrays?  That is, instead of c = [a, b], could you code use
> c = {a, b}, which is a list of 2 arrays?

Hi all,

John provided some assistance to me with the above problem,
off-list. So to conclude this thread properly - dangling
threads are not good for list morale ;) - I will give you
a run-down of our discussion.

To recapitulate, my goal was to concatenate arrays of lines
from different files and hash the concatenated array for use
with a word completion function.

As it turned out the concatenation of arrays - c = (a, b, ..) -
was only very partially the problem in the slow execution time,
even if concatenating or joining lists is still much faster.

By using the slang profiler, slprof, which is part of slsh,
many hotspots in my code could be pinpointed and improved for
better speed. Using tic/toc does not provide the same accuracy
as slprof, as it does not compensate for additional overhead.

I attach here two test files, "test0.sl" and "test1.sl". The
former is my original code and the latter is my code improved
by John following the analysis from slprof's output. I also
attach the outputs, "prof-test0.dat" and "prof-test1.dat"
from the application of slprof on the two test files.
You can run slprof like this:

  slprof -l -o <out.dat> <infile.sl>

The input files to the "make_completions_hash()" function was a
large text file of 471.000 lines and a couple of smaller files.

From the data of the two profiles, one can readily see that in
my original code I ran the "strchop2()" function ~471.000
times, one for each line in the input files even if it was only
needed 101 times. That shaved off 2 seconds.

The "read_file_lines()" function in test1.sl was an alternative
to "fgetslines()" that stripped the "\n" characters so I could
avoid running strtrim() on all the lines, which further reduced
the execution time.

In a line like this

  lines = lines [where (0 != strncmp ("#", lines, 1))];

I always created a new array, even if some files did not
contain any comments ("#").

As you can see the execution time was brought down quite
remarkably - from 6.5324 secs to 0.5960 secs.

You might of course attribute the fact that the code could be
improved so much to my rather modest skill level in slang
programming, so the real object of this post probably lies more
in drawing attention to the slang profiler. The profiler is not
yet incorporated into jed, but the test files show some
examples of using it with jed specific code. I hope that
someone might find it useful.

  Morten
% Use a string delimiter instead of a character delimiter to chop up a
% string - from a JED post on jed-users
private define strchop2 (str, delim)
{
  variable list = {};
  variable len = strbytelen (delim);
  variable i0 = 1, i;
  
  while (i = is_substrbytes (str, delim, i0), i != 0)
  {
    list_append (list, substrbytes (str, i0, i-i0));
    i0 = i + len;
  }
  
  list_append (list, substrbytes (str, i0, -1));
  return list_to_array (list);
}

#ifexists SLSH_PATH_ENV
private define flush (x)
{
  message (x);
}
#endif

private variable F = NULL;
private variable Words;

%% Create the hash of the completions file to complete from
private define make_completions_hash (Completions_File)
{
#ifnexists SLSH_PATH_ENV
  % In slang_mode, if no completions file, then generate one
  if ((0 == file_status (Completions_File)) && ("slang" == detect_mode ()))
  {
    variable funs = _apropos ("Global", ".", 15);
    Words = funs[array_sort (funs)];
    () = write_string_to_file (strjoin (Words, "\n"), Completions_File);
  }
#endif
  
  variable
    fp = fopen (Completions_File, "r"),
    include_files = [""],
    line_array = [""],
    inc_lines = [""],
    val_array = [""],
    line_arr = [""],
    lines = [""],
    include_file = "",
    all_lines = {},
  line_arrays,
    line = "",
    key = "";
  
  if (fp == NULL)
    throw OpenError, "could not open completions file, $Completions_File"$;
  
  if (length (F)) % purge a possibly existing hash
    F = Assoc_Type[Array_Type];
  
  % tic ();
  flush ("hashing $Completions_File ..."$);
  lines = fgetslines (fp);
  () = fclose (fp);
  include_files = lines [where (0 == strncmp ("#INC", lines, 4))]; % include other files
  
  if (length (include_files))
  {
    foreach include_file (include_files)
    {
      include_file = strtok (include_file)[1];
      include_file = strtrim (include_file, "\"");
      fp = fopen (include_file, "r");
      
      if (fp == NULL)
      {
        flush ("Could not open \"$include_file\""$);
        sleep (2);
        continue;
      }
      
      inc_lines = fgetslines (fp);
      lines = [lines, inc_lines];
      () = fclose (fp);
    }
  }
  
  lines = strtrim (lines);
  lines = lines [where (0 != strncmp ("#", lines, 1))];
  
  if (any (is_substr (lines, "::")))
  {
    line_arrays = array_map (Array_Type, &strchop2, lines, "::");
    foreach line_array (line_arrays)
    {
      line_array = strtrim (line_array);
      key = line_array[0];
      val_array = line_array[[1:]];
      F[key] = val_array;
    }
    
    Words = assoc_get_keys (F);
  }
  else
    Words = lines;
  
#ifnexists SLSH_PATH_ENV
  define_blocal_var ("Words", Words);
#endif
  
  flush ("completing from $Completions_File ..."$);
  % vmessage ("%f", toc ());
}

define slsh_main ()
{
  make_completions_hash (".completions_text");
  
  % Note: Words contains all of the words defined in the input files
  % F contains only those words with non-empty values.
  % But, F has been defined such that F[ANYTHING] will produce
  % an array.
  vmessage ("Length of Words: %u, length of F: %u",
            length(Words), length(F));
}
% Use a string delimiter instead of a character delimiter to chop up a
% string - from a JED post on jed-users
private define strchop2 (str, delim)
{
  variable list = {};
  variable len = strbytelen (delim);
  variable i0 = 1, i;
  
  while (i = is_substrbytes (str, delim, i0), i != 0)
  {
    list_append (list, substrbytes (str, i0, i-i0));
    i0 = i + len;
  }
  
  list_append (list, substrbytes (str, i0, -1));
  return list_to_array (list);
}

#ifexists SLSH_PATH_ENV
private define flush (x)
{
  message (x);
}
#endif

private define read_file_lines (file)
{
  variable st = stat_file (file);
  if (st == NULL) return NULL;
  variable bytes;
  () = fread_bytes (&bytes, st.st_size, fopen (file, "r"));
  return strtok (bytes, "\n");	       %  use strtok remove sequences of \n chars
}

private variable F = NULL;
private variable Words;

%% Create the hash of the completions file to complete from
private define make_completions_hash (Completions_File)
{
#ifnexists SLSH_PATH_ENV
  % In slang_mode, if no completions file, then generate one
  if ((0 == file_status (Completions_File)) && ("slang" == detect_mode ()))
  {
    variable funs = _apropos ("Global", ".", 15);
    Words = funs[array_sort (funs)];
    () = write_string_to_file (strjoin (Words, "\n"), Completions_File);
  }
#endif
  
  variable
    include_files,
    inc_lines,
    lines,
    include_file,
    i, j;
  
  if ((F == NULL) || length (F)) % purge a possibly existing hash
    F = Assoc_Type[Array_Type, String_Type[0]];
  
  % tic ();
  flush ("hashing $Completions_File ..."$);
  lines = read_file_lines (Completions_File);
  if (lines == NULL)
    throw OpenError, "could not open completions file, $Completions_File"$;
  
  include_files = lines [where (0 == strncmp ("#INC", lines, 4), &i)]; % include other files
  lines = lines[i];		       % don't include #INC lines
  
  variable lines_list = {lines};
  foreach include_file (include_files)
  {
    include_file = strtok (include_file)[1];
    include_file = strtrim (include_file, "\"");
    lines = read_file_lines (include_file);
    if (lines == NULL)
	  {
      flush ("Could not open \"$include_file\""$);
      sleep (2);
      continue;
	  }
    
    i = where (strncmp (lines, "#", 1));
    if (length(i) != length(lines)) lines = lines [i];
    
    list_append (lines_list, lines);
  }
  
  lines = [__push_list(lines_list)];
  % lines = strtrim (lines);
  
  foreach i (where (is_substr (lines, "::"), &j))
  {
    variable line_array = strtrim (strchop2 (lines[i], "::"));
    variable key = line_array[0];
    F[key] = line_array[[1:]];
    lines[i] = key;
  }
  
  Words = lines;
  
#ifnexists SLSH_PATH_ENV
  define_blocal_var ("Words", Words);
#endif
  flush ("completing from $Completions_File ..."$);
  % vmessage ("%f", toc ());
}

define slsh_main ()
{
  tic();
  
  make_completions_hash (".completions_text");
  
  % Note: Words contains all of the words defined in the input files
  % F contains only those words with non-empty values.
  % But, F has been defined such that F[ANYTHING] will produce
  % an array.
  vmessage ("T=%S, Length of Words: %u, length of F: %u", toc(),
            length(Words), length(F));
  
  % print(Words[array_sort(Words)], "/tmp/4.dat");
}
# slprof profile report for:
#  /home/mojo/bin/slprof -l -o prof-test0.dat ./test0.sl


#----------------------------------------------------------------
#                  Function Call Profile Report
#----------------------------------------------------------------

#function                 ncalls      ms/call  totalselfms    totalsecs Function File
make_completions_hash          1    6531.9493    4501.3176       6.5319  471898 1887625 /home/mojo/./test0.sl
strchop2                  471896       0.0043    2030.6072       2.0306       0 2831979 /home/mojo/./test0.sl
slsh_main                      1    6532.4024       0.4531       6.5324       1       2 /home/mojo/./test0.sl
flush                          2       0.0123       0.0246       0.0000       0       2 /home/mojo/./test0.sl

#----------------------------------------------------------------
#                  Line by Line Profile Report
#----------------------------------------------------------------

#ncalls      ms/call totalselfms     totalsecs  Fcalls   Scalls File:line
      1    631.06179    631.06179      0.63106       0       0 /home/mojo/./test0.sl:89
      2    268.62279    537.24558      0.53725       0       0 /home/mojo/./test0.sl:84
 471896      0.00099    466.88256      0.46688       0       0 /home/mojo/./test0.sl:97
      1    391.84379    391.84379      0.39184       0       0 /home/mojo/./test0.sl:90
 471896      0.00080    379.15456      0.37915       0       0 /home/mojo/./test0.sl:99
 472097      0.00079    372.81917      0.37282       0       0 /home/mojo/./test0.sl:9
 471896      0.00078    369.86056      0.36986       0       0 /home/mojo/./test0.sl:15
      1   2386.15895    355.55177      2.38616  471896 2831979 /home/mojo/./test0.sl:94
      2    172.14379    344.28758      0.34429       0       0 /home/mojo/./test0.sl:83
 471896      0.00052    244.31156      0.24431       0       0 /home/mojo/./test0.sl:16
 471896      0.00047    219.77956      0.21978       0       0 /home/mojo/./test0.sl:100
      1    142.46779    142.46779      0.14247       0       0 /home/mojo/./test0.sl:103
 471896      0.00019     88.75056      0.08875       0       0 /home/mojo/./test0.sl:98
 471896      0.00015     70.91356      0.07091       0       0 /home/mojo/./test0.sl:6
 471896      0.00012     56.07456      0.05607       0       0 /home/mojo/./test0.sl:5
 471896      0.00005     22.29756      0.02230       0       0 /home/mojo/./test0.sl:7
      1      5.83379      5.83379      0.00583       0       0 /home/mojo/./test0.sl:92
      1      0.44979      0.44979      0.00045       0       0 /home/mojo/./test0.sl:124
    201      0.00080      0.16161      0.00016       0       0 /home/mojo/./test0.sl:11
      2      0.01229      0.02458      0.00002       0       0 /home/mojo/./test0.sl:22
      1      0.02379      0.02379      0.00002       0       0 /home/mojo/./test0.sl:64
    201      0.00011      0.02161      0.00002       0       0 /home/mojo/./test0.sl:12
      2      0.00779      0.01558      0.00002       0       0 /home/mojo/./test0.sl:74
      2      0.00729      0.01458      0.00001       0       0 /home/mojo/./test0.sl:85
      1      0.03007      0.01429      0.00003       1       1 /home/mojo/./test0.sl:112
      2      0.00479      0.00958      0.00001       0       0 /home/mojo/./test0.sl:72
      1      0.00879      0.00879      0.00001       0       0 /home/mojo/./test0.sl:66
      1      0.00679      0.00679      0.00001       0       0 /home/mojo/./test0.sl:60
      1      0.00579      0.00579      0.00001       0       0 /home/mojo/./test0.sl:43
      2      0.00229      0.00458      0.00000       0       0 /home/mojo/./test0.sl:73
      1      0.01307      0.00429      0.00001       1       1 /home/mojo/./test0.sl:63
      1      0.00279      0.00279      0.00000       0       0 /home/mojo/./test0.sl:65
      1      0.00179      0.00179      0.00000       0       0 /home/mojo/./test0.sl:44
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:95
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:51
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:56
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:54
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:68
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:49
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:45
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:46
      1      0.00079      0.00079      0.00000       0       0 /home/mojo/./test0.sl:26
      2      0.00029      0.00058      0.00000       0       0 /home/mojo/./test0.sl:76
      1   6531.94961      0.00029      6.53195  471899 4719606 /home/mojo/./test0.sl:118
      1     -0.00021     -0.00021     -0.00000       0       0 /home/mojo/./test0.sl:70
      1     -0.00021     -0.00021     -0.00000       0       0 /home/mojo/./test0.sl:50
      1     -0.00021     -0.00021     -0.00000       0       0 /home/mojo/./test0.sl:47
      1     -0.00021     -0.00021     -0.00000       0       0 /home/mojo/./test0.sl:48
      1     -0.00021     -0.00021     -0.00000       0       0 /home/mojo/./test0.sl:53
      1     -0.00021     -0.00021     -0.00000       0       0 /home/mojo/./test0.sl:59
# slprof profile report for:
#  /home/mojo/bin/slprof -l -o prof-test1.dat ./test1.sl


#----------------------------------------------------------------
#                  Function Call Profile Report
#----------------------------------------------------------------

#function                 ncalls      ms/call  totalselfms    totalsecs Function File
make_completions_hash          1     596.2284     400.5660       0.5962     106     431 /home/mojo/./test1.sl
read_file_lines                3      64.8555     194.5666       0.1946       0      12 /home/mojo/./test1.sl
strchop2                     101       0.0105       1.0613       0.0011       0    1209 /home/mojo/./test1.sl
flush                          2       0.0173       0.0346       0.0000       0       2 /home/mojo/./test1.sl
slsh_main                      1     595.9904      -0.2380       0.5960       1       3 /home/mojo/./test1.sl

#----------------------------------------------------------------
#                  Line by Line Profile Report
#----------------------------------------------------------------

#ncalls      ms/call totalselfms     totalsecs  Fcalls   Scalls File:line
      3     64.25180    192.75539      0.19276       0       0 /home/mojo/./test1.sl:32
      1    179.83680    179.83680      0.17984       0       0 /home/mojo/./test1.sl:89
      2     14.14130     28.28260      0.02828       0       0 /home/mojo/./test1.sl:83
      1     11.18180     11.18180      0.01118       0       0 /home/mojo/./test1.sl:92
      3      0.55046      1.65139      0.00165       0       0 /home/mojo/./test1.sl:31
    101      0.01338      0.28997      0.00135     101    1209 /home/mojo/./test1.sl:94
    302      0.00087      0.26387      0.00026       0       0 /home/mojo/./test1.sl:9
    201      0.00088      0.17731      0.00018       0       0 /home/mojo/./test1.sl:11
    101      0.00113      0.11456      0.00011       0       0 /home/mojo/./test1.sl:96
    101      0.00112      0.11356      0.00011       0       0 /home/mojo/./test1.sl:16
    101      0.00101      0.10156      0.00010       0       0 /home/mojo/./test1.sl:15
      3      0.01246      0.03739      0.00004       0       0 /home/mojo/./test1.sl:28
    101      0.00033      0.03356      0.00003       0       0 /home/mojo/./test1.sl:97
      1      0.03180      0.03180      0.00003       0       0 /home/mojo/./test1.sl:119
    101      0.00027      0.02756      0.00003       0       0 /home/mojo/./test1.sl:95
    101      0.00024      0.02456      0.00002       0       0 /home/mojo/./test1.sl:5
    201      0.00012      0.02331      0.00002       0       0 /home/mojo/./test1.sl:12
      2      0.00930      0.01860      0.00002       0       0 /home/mojo/./test1.sl:22
      1      0.04115      0.01636      0.00004       1       1 /home/mojo/./test1.sl:105
    101      0.00014      0.01456      0.00001       0       0 /home/mojo/./test1.sl:6
      1      0.00880      0.00880      0.00001       0       0 /home/mojo/./test1.sl:67
      1      0.00880      0.00880      0.00001       0       0 /home/mojo/./test1.sl:59
      2      0.00380      0.00760      0.00001       0       0 /home/mojo/./test1.sl:73
      1      0.01515      0.00536      0.00002       1       1 /home/mojo/./test1.sl:62
      2      0.00230      0.00460      0.00000       0       0 /home/mojo/./test1.sl:74
    101      0.00005      0.00456      0.00000       0       0 /home/mojo/./test1.sl:7
      2     97.25805      0.00371      0.19452       2       8 /home/mojo/./test1.sl:75
      2      0.00130      0.00260      0.00000       0       0 /home/mojo/./test1.sl:86
      3      0.00080      0.00239      0.00000       0       0 /home/mojo/./test1.sl:29
      1      0.00180      0.00180      0.00000       0       0 /home/mojo/./test1.sl:35
      2      0.00080      0.00160      0.00000       0       0 /home/mojo/./test1.sl:76
      1      0.00080      0.00080      0.00000       0       0 /home/mojo/./test1.sl:100
      1      0.00080      0.00080      0.00000       0       0 /home/mojo/./test1.sl:58
      1      0.00080      0.00080      0.00000       0       0 /home/mojo/./test1.sl:70
      1      0.00080      0.00080      0.00000       0       0 /home/mojo/./test1.sl:64
      1      0.00080      0.00080      0.00000       0       0 /home/mojo/./test1.sl:68
      2      0.00030      0.00060      0.00000       0       0 /home/mojo/./test1.sl:84
      1    596.22877      0.00036      0.59623     107    1654 /home/mojo/./test1.sl:113
      1      0.05455      0.00036      0.00005       1       4 /home/mojo/./test1.sl:63
      1     -0.00020     -0.00020     -0.00000       0       0 /home/mojo/./test1.sl:71
      1     -0.27220     -0.27220     -0.00027       0       0 /home/mojo/./test1.sl:111

[2020 date index] [2020 thread index]
[Thread Prev] [Thread Next]      [Date Prev] [Date Next]