code parser to help checking

Stuart
Posts: 136
Joined: Fri Feb 19, 2021 7:46 pm
Has thanked: 5 times
Been thanked: 20 times

code parser to help checking

Post by Stuart »

This little program scans your code and lists all user-defined names (variables and procedures) together with the lines on which they appear.
The name of the source file is hard-coded in fname$ and will need editing. The list of BASIC keywords also needs work: there are many, and I have only included those used in this program. The listing below is the result of running the program on itself. It is fine for relatively small code files, but it gets indigestion on very large ones, either through lack of memory or by losing its connection to the browser. If memory is a problem, the dimension of varname$ can be reduced to suit your program.

Code:

file /parser.bas exists
1  'program to list all user-defined names in a basic program, with line numbers listed where they are used.
2  'use to identify unused names, misspelled names, to find references etc.
3  'the keyword list is not complete, use the first run to find keywords that need to be added to the list
4  'feel free to improve the code, there is plenty of scope to do so
5  dim varname$(400,2) 'rows of variable names, string of line numbers
6  dim keywordlist$(33) = "-","space$","dim","if","then","wlog","do","until", "loop","sub","select","case","end","while","else","file","read$", "len","mid$","to","str$","exists","lcase$","wend","for","next","exit","or","ramfree","and","endif"
7  keywordcount = 33
8  line$ = ""
9  result = 9
10  varcount = 0
11  fname$ = "/parser.bas"
12  foundvar= 0
13  if file.exists(fname$) then wlog "file "+fname$+" exists" 
14  i = 1
15  line$ = file.read$(fname$,i) 'line by line
16  line$ = lcase$(line$)
17  do while line$ <> "_eof_"
18     wlog str$(i)+"  "+line$
19     if len(line$)>1 then process_line
20     i = i + 1
21     line$ = file.read$(fname$,i) 'line by line
22     line$ = lcase$(line$)
23  loop
24  i = i - 1
25  wlog "lines read "+str$(i)
26  listvars
27  end
28  
29  sub process_line
30  extractedvar$ = ""
31  pos = 1
32  foundvar = 0
33  while pos <= len(line$)
34  thischar$ = mid$(line$,pos,1)
35  select case thischar$
36  case "'": pos = len(line$) 'skip
37  case "a" to "z", "a" to "z"
38           extractedvar$=extractedvar$ + thischar$
39           foundvar = 1
40  case "0" to "9","$","_"
41           if foundvar = 1 then extractedvar$=extractedvar$ + thischar$
42  case """":
43           pos = pos + 1
44           do while mid$(line$,pos,1)<> """"
45              pos = pos + 1
46           loop    'scan to closing quote
47  case else: '=() etc. etc.
48           if foundvar = 1 then capture_name
49           extractedvar$ = ""
50           foundvar = 0
51  end select
52  if foundvar = 1 and pos = len(line$) then capture_name
53  pos = pos+1
54  wend
55  end sub
56  
57  sub capture_name
58  keyword_test 
59  if result = 1 then savevar
60  end sub
61  
62  sub keyword_test
63  result = 1
64    for k = 1 to keywordcount
65       if extractedvar$ = keywordlist$(k) then  result = 0
66     next
67  end sub
68  
69  sub savevar 'adds the var to the list
70  j=1
71  do until (varname$(j,1) = extractedvar$) or (varname$(j,1)="")
72  j = j + 1
73  loop
74  if (varname$(j,1) = extractedvar$) then 
75    varname$(j,2) = varname$(j,2)+" "+str$(i)
76  else
77    varname$(j,1) = extractedvar$
78    varname$(j,2) = varname$(j,2)+" "+str$(i)
79    varcount = varcount+1
80    'wlog "varcount = "+str$(varcount)
81    j=j+1
82  endif
83  end sub
84  
85  sub listvars
86  wlog "========================================="
87  wlog "   program names in file: "+ fname$
88  wlog "========================================="
89  wlog "no of variable and subprogram names: "+str$(varcount)
90  for i = 1 to varcount
91  tablength = 20-len(varname$(i,1))
92  wlog varname$(i,1)+space$(tablength)+varname$(i,2)
93  next
94  end sub
95  
96  
lines read 96
=========================================
   PROGRAM NAMES IN FILE: /parser.bas
=========================================
no of variable and subprogram names: 20
varname$             5 71 71 74 75 75 77 78 78 91 92 92
keywordlist$         6 65
keywordcount         7 64
line$                8 15 16 16 17 18 19 21 22 22 33 34 36 44 52
result               9 59 63 65
varcount             10 79 79 89 90
fname$               11 13 13 15 21 87
foundvar             12 32 39 41 48 50 52
i                    14 15 18 20 20 21 24 24 25 75 78 90 91 92 92
process_line         19 29
listvars             26 85
extractedvar$        30 38 38 41 41 49 65 71 74 77
pos                  31 33 34 36 43 43 44 45 45 52 53 53
thischar$            34 35 38 41
capture_name         48 52 57
keyword_test         58 62
savevar              59 69
k                    64 65
j                    70 71 71 72 72 74 75 75 77 78 78 81 81
tablength            91 92
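
The program's header comments suggest using the listing to spot unused or misspelled names. A minimal Python sketch of that check follows, assuming each report row is a name followed by its line numbers on a single unwrapped line, as in the listing above. The function name `names_used_once` and the `tabength` row are made up for illustration.

```python
# Hypothetical helper: flag names that appear on only one line of the report.
def names_used_once(report_lines):
    """Return names whose cross-reference holds exactly one distinct line number."""
    suspects = []
    for row in report_lines:
        fields = row.split()
        # A report row is a name followed by one or more line numbers;
        # banners, counters and bare number rows are skipped.
        if len(fields) < 2 or fields[0].isdigit():
            continue
        if not all(f.isdigit() for f in fields[1:]):
            continue
        if len(set(fields[1:])) == 1:
            suspects.append(fields[0])
    return suspects

sample = [
    "keywordlist$         6 65",
    "tablength            91 92",
    "tabength             92",   # made-up misspelling for illustration
    "lines read 96",
]
print(names_used_once(sample))   # prints ['tabength']
```

A name that shows up on only one line is either never used after being set or is a typo of another name, so it is worth a second look.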
Stuart

Re: code parser to help checking

Post by Stuart »

Here is a better version, written in Python. I plan to improve it incrementally, hence the messy code. It reads an Annex file and lists the variables and where they appear, sorted alphabetically. The name of the source file to be analysed is hard-coded, so it will need editing. It seems to work in a flash even on a large program. Plenty of scope for improvement. On Linux I type 'python3 parser.py > report.txt' and the listing appears in report.txt for reading or printing.

Code:

#program to list all user-defined names in a basic program, with line numbers listed where they are used.
#use to identify unused names, misspelled names, to find references etc.
#import keyword
#print (keyword.kwlist)
#entries must be lower case: names are lower-cased before being looked up here
keywordlist = {"-","space","dim","if",":","print","do","asc","then","cls",
"until", "loop","select","case","end","while","else","file","read",
"len","mid","to","str","exists","lcase","wend","for","next","exit","or",
"ramfree","and","endif","elif","set","chr","in","not",
"sort","open","def","keys","str$","html","dateunix","timeunix","resetreason",
"ssid$","chr$","mod","json$","tr","td","button$","getelementbyid"}
print("list all user-defined names in an AnnexRDS basic program")
#global subcount 
#subcount = 0
extractedvar = ""
valid_first = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&")  #upper-case T was missing
valid_subsequent = set("01234567890$_")
quote = (chr(124),chr(34))   #allows " and | as quotes, and ' as comment/not quote
vardict = {"firstvar":"0"}  #null entry so it exists
subdict = {"firstsub":"0"}
fname = "Annex_chw_1.bas"
f = open(fname,"r")
print("=========================================================")
print("scanning file ",fname," for user-defined names ")
print("=========================================================")
print("skips to end of line after finding single unquoted quote")
def process_line(thislineno):
    extractedvar = ""
    foundvar = 0
    lineno =  1
    pos = 0
    while pos < len(line):
       thischar = line[pos]
       if thischar == "'": 
          pos = len(line) #skip
       elif thischar in valid_first:
          extractedvar = extractedvar + thischar
          foundvar = 1
       elif thischar in valid_subsequent:
          if foundvar == 1 : extractedvar=extractedvar + thischar
       elif thischar in quote:
          quotefound = thischar
          #print("double quote found")
          pos += 1
          while pos < len(line) and line[pos] != quotefound :  #bounds check first, so an unclosed quote cannot raise IndexError
             #print("scanning")
             pos +=  1 #scan to closing quote
       else:
         # if foundvar == 1 and extractedvar not in keyword.kwlist and extractedvar not in keywordlist:#for python code
          if foundvar == 1 and extractedvar.lower() not in keywordlist:#for AnnexRDS code
             # if extractedvar == "sub":
              #    subcount += 1
              if extractedvar not in vardict: 
                  vardict[extractedvar] = str(thislineno)
              else: 
                  vardict[extractedvar] =  vardict[extractedvar]+ " "+str(thislineno)
          extractedvar = ""
          foundvar = 0
       pos += 1

lineno = 0
#subcount = 0
for line in f:
  lineno += 1
  if len(line)>1 : result = process_line(lineno)
  
#print ("variables found ",vardict.keys())
#vardict.sort()
sorted_vardict = sorted(vardict.items(),key = lambda x:x[0].lower())
converted_vardict = dict(sorted_vardict)
#print ("No of subs detected ",subcount)

for varname in converted_vardict:
    printstring = converted_vardict[varname]
    spos = 0
    epos = 64
    while spos < len(printstring) :   
        if spos == 0 :
            printline = varname+"\t"+printstring[spos:epos]
        else:
            printline = "\t"+printstring[spos:epos]
        printexline =  printline.expandtabs(20)
        print(printexline)
        spos += 64
        epos += 64
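
Since the source file name is hard-coded, one small improvement would be to take it from the command line. The sketch below is only an illustration: the helper `source_name` is a hypothetical name, and it falls back to the file name used in the script above when no argument is given.

```python
import sys

def source_name(argv, default="Annex_chw_1.bas"):
    """Return the file to scan: the first command-line argument, else the default."""
    return argv[1] if len(argv) > 1 else default

fname = source_name(sys.argv)
print("scanning file", fname)
```

With this in place the script could be run as 'python3 parser.py myprog.bas > report.txt' (myprog.bas being a placeholder) without editing the code each time.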


cicciocb
Site Admin
Posts: 1989
Joined: Mon Feb 03, 2020 1:15 pm
Location: Toulouse
Has thanked: 426 times
Been thanked: 1329 times

Re: code parser to help checking

Post by cicciocb »

I could integrate this functionality directly into the Annex editor, using JavaScript.
Electroguard
Posts: 855
Joined: Mon Feb 08, 2021 6:22 pm
Has thanked: 273 times
Been thanked: 321 times

Re: code parser to help checking

Post by Electroguard »

An integrated JavaScript Vars feature might offer the opportunity for some additional benefits.
I seem to remember Espbasic having a Vars page, and I think it may even have shown the current variable values when refreshed.
If Annex had such a page, it could be very useful for debugging.
And it should be fairly easy to add a Save Vars button to the page for saving the current variable=value list to a file.
Why not also an Annex Save Vars instruction? It could be a handy precaution before going into Sleep mode, or useful before halting on error.
It might also offer a fairly simple mechanism for passing existing vars and values to newly loaded script 'modules', perhaps as a bas.load option.
Stuart

Re: code parser to help checking

Post by Stuart »

Yes to the ideas above. Anything that helps catch errors. Integration would mean the keyword list was complete, and the editor could distinguish types of name, e.g. subs, gosubs, etc. I did try running a version on an M5Stack using the filesystem (probably the old version); it was rather slow and didn't like large files, hence Python on my laptop, which is instant on 1500 lines of code.

Meanwhile here is a slightly improved version:

Code:

#program to list all user-defined names in a basic program, with line numbers listed where they are used.
#use to identify unused names, misspelled names, to find references etc.
#everything is converted to lower case for comparison
keywordlist = {"-","space","dim","if",":","print","do","and","asc","then","cls","init",
"until", "loop","select","case","end","while","else","file","read","writeregbyte",
"len","mid","to","str","exists","lcase","wend","for","next","exit","or","gosub",
"ramfree","and","endif","elif","set","chr","in","not","sub","udp","wgetasync","tft",
"sort","open","def","keys","str$","html","dateunix","timeunix","resetreason","rtcmem",
"onerror","onhtmlreload","onwgetasync","readregbyte","reboot","bas","cssid","rgb",
"errline","errmsg","errnum","font","goto","lowram","mac","millis","pos","return",
"wdt","wdtreset", #add any of your favorite keywords - these are just ones I have used
"ssid$","chr$","mod","asc","json$","tr","td","button$","getelementbyid","status"}
print("list all user-defined names in an AnnexRDS basic program")
extractedvar = ""
valid_first = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&")  #upper-case T was missing
valid_subsequent = set("01234567890$_")
quote = (chr(124),chr(34))  #allows " and | as quotes, and ' as comment/not quote
# single quote ignored if inside quoted string
vardict = {"firstvar":"0"}  #null entry so it exists
subdict = {"firstsub":"0"}
fname = "Annex_chw_7.bas"
f = open(fname,"r")
print("=========================================================")
print("scanning file ",fname," for user-defined names ")
print("=========================================================")
print("skips to end of line after finding single unquoted quote")
def process_line(thislineno):
    extractedvar = ""
    foundvar = 0
    lineno =  1
    pos = 0
    while pos < len(line):
       thischar = line[pos].lower()
       if thischar == "'": 
          pos = len(line) #skip
       elif thischar in valid_first:
          extractedvar = extractedvar + thischar
          foundvar = 1
       elif thischar in valid_subsequent:
          if foundvar == 1 : extractedvar=extractedvar + thischar
       elif thischar in quote:
          quotefound = thischar
          pos += 1
          while pos < len(line) and line[pos] != quotefound :  #bounds check first, so an unclosed quote cannot raise IndexError
             pos +=  1 #scan to closing quote
       else:
          if foundvar == 1 and extractedvar.lower() not in keywordlist:#for AnnexRDS code
              if extractedvar not in vardict: 
                  vardict[extractedvar] = str(thislineno)
              else: 
                  vardict[extractedvar] =  vardict[extractedvar]+ " "+str(thislineno)
          extractedvar = ""
          foundvar = 0
       pos += 1

lineno = 0
#subcount = 0
for line in f:
  lineno += 1
  if len(line)>1 : result = process_line(lineno)
  
print ("count of variable names found: ",len(vardict))
print ("check the list for mis-spellings, variables mentioned only once, etc.")
print(" ")
sorted_vardict = sorted(vardict.items(),key = lambda x:x[0].lower())
converted_vardict = dict(sorted_vardict)

for varname in converted_vardict:
    printstring = converted_vardict[varname]
    spos = 0
    epos = 64
    while spos < len(printstring) :   
        if spos == 0 :
            printline = varname+"\t"+printstring[spos:epos]
        else:
            printline = "\t"+printstring[spos:epos]
        printexline =  printline.expandtabs(20)
        print(printexline)
        spos += 64
        epos += 64
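
Distinguishing types of name, as mentioned at the top of this post, could be roughed out along the following lines. This is a sketch under assumptions not confirmed in the thread: a subroutine is taken to be declared as "sub name", and a gosub target as a label written "name:" on a line of its own. The function `classify_definitions` and the sample lines are hypothetical.

```python
import re

# Assumed syntax: "sub name" declares a subroutine, "name:" alone on a line is a label.
SUB_DEF = re.compile(r"^\s*sub\s+([a-z][a-z0-9_$]*)", re.IGNORECASE)
LABEL_DEF = re.compile(r"^\s*([a-z][a-z0-9_$]*)\s*:\s*$", re.IGNORECASE)

def classify_definitions(lines):
    """Map each sub or label name (lower-cased) to (kind, line number)."""
    kinds = {}
    for lineno, text in enumerate(lines, start=1):
        code = text.split("'", 1)[0]   # crude comment strip; ignores quoted apostrophes
        m = SUB_DEF.match(code)
        if m:
            kinds[m.group(1).lower()] = ("sub", lineno)
            continue
        m = LABEL_DEF.match(code)
        if m:
            kinds[m.group(1).lower()] = ("label", lineno)
    return kinds

sample = ["gosub blink", "end", "blink:", '  wlog "on"', "return",
          "sub listvars", "end sub"]
print(classify_definitions(sample))
# prints {'blink': ('label', 3), 'listvars': ('sub', 6)}
```

The result could be merged with vardict so that the report marks each name as a sub, a label, or a plain variable. A keyword followed by a colon on its own line would be mistaken for a label, so the keyword list would still need to be consulted.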


cicciocb

Re: code parser to help checking

Post by cicciocb »

If you give me good ideas, I can integrate new functionality into the JavaScript inside the editor (which is very fast).