OS923

  • Platinum Member (500+ Posts)
  • Posts: 888
  • Likes Given: 10
I linked Google's HTML5 parser
« on: December 01, 2021, 09:11:54 AM »
I linked Google's Gumbo parser for OS 9.
It parses at around 15 MB/s, and its memory use is around 7 times the size of the input file.
It performs complete validation and builds a DOM tree.
Unfortunately, error handling is done with assert. A debug build reports the error and stops, but in a release build the asserts are normally compiled out (NDEBUG), so the program may simply crash, for example when it runs out of memory.
Error handling has to be improved before this is really usable, for example in a web spider.
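One common way to turn out-of-memory into a recoverable error, for code that never checks its allocation results, is to hand it an allocator that jumps back to the caller when a memory budget runs out. Google's Gumbo lets the caller supply `allocator` and `deallocator` callbacks plus a `userdata` pointer through `GumboOptions`, but the sketch below is self-contained and does not call Gumbo. `Arena`, `arena_alloc`, `hypothetical_parse` and `parse_with_budget` are hypothetical names, and whether Gumbo's internal state tolerates being unwound with `longjmp` is an assumption that would need testing. The budget could be set to a small multiple of the input length, given the roughly 7 times ratio measured above.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Alignment malloc guarantees; every block handed out keeps it. */
#define ARENA_ALIGN _Alignof(max_align_t)

/* Fixed-size arena with an escape hatch: when it is exhausted, control
   jumps back to the caller instead of returning NULL to code that would
   dereference it without checking. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
    jmp_buf on_fail;
} Arena;

/* Allocator callback in a (void *userdata, size_t size) shape.
   Frees are not tracked: the whole arena is released at once. */
static void *arena_alloc(void *userdata, size_t n) {
    Arena *a = userdata;
    size_t rounded = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (rounded < n || rounded > a->size - a->used)
        longjmp(a->on_fail, 1);          /* budget exceeded: unwind */
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Hypothetical stand-in for a parser that never checks allocation
   results: it builds one 64-byte node per input byte. */
static void hypothetical_parse(size_t input_len,
                               void *(*alloc)(void *, size_t), void *ud) {
    for (size_t i = 0; i < input_len; i++) {
        unsigned char *node = alloc(ud, 64);
        memset(node, 0, 64);             /* would crash if node were NULL */
    }
}

/* Runs the parse under a memory budget.
   Returns 0 on success, -1 if the budget was exhausted. */
static int parse_with_budget(size_t input_len, size_t budget) {
    /* Kept outside the Arena and never modified after setjmp, so its
       value is still well defined when longjmp lands here. */
    unsigned char *const base = malloc(budget);
    if (base == NULL)
        return -1;
    Arena a;
    a.base = base;
    a.size = budget;
    a.used = 0;
    if (setjmp(a.on_fail) != 0) {
        free(base);                      /* one free releases every node */
        return -1;
    }
    hypothetical_parse(input_len, arena_alloc, &a);
    free(base);
    return 0;
}
```

For example, `parse_with_budget(100, 6400)` returns 0 because 100 nodes of 64 bytes fit exactly, while `parse_with_budget(100, 1000)` returns -1 instead of crashing. Because individual frees are ignored, memory released during the parse is not reused, so a real budget would need some headroom.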

OS923
Re: I linked Google's HTML5 parser
« Reply #1 on: February 22, 2022, 09:06:13 AM »
I linked Lexbor. It's about 6 times faster than the Gumbo parser.

I'm trying to improve it with shorter identifiers and better includes: right now the headers have to be included in a particular order, but I want them to work in any order, because I keep my includes in alphabetical order.

My goal is to convert HTML files to a binary format that can easily be used in C++ programs. For example, you could use it to write your own browser or an HTML-simplifying proxy like the ones used on Palm handhelds, or you could simplify an HTML file to view it in iCab.
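The thread does not describe the binary format itself. As a purely hypothetical illustration of the idea, the sketch below flattens a parsed tree into a compact preorder byte stream that a C or C++ program can walk without reparsing HTML. The `Node` type, the record layout (a kind byte, a 32-bit little-endian length, the bytes, and for elements a 32-bit child count) and all function names are assumptions made for this example, not Lexbor's API or the format being built in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory node: an element with children, or a text run. */
typedef struct Node {
    int is_text;
    const char *str;          /* tag name for elements, contents for text */
    struct Node **children;   /* unused for text nodes */
    size_t child_count;
} Node;

/* Growable byte buffer that receives the encoding. */
typedef struct {
    unsigned char *data;
    size_t len, cap;
} Buf;

static int put(Buf *b, const void *src, size_t n) {
    if (n == 0)
        return 0;
    if (n > b->cap - b->len) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap - b->len < n)
            cap *= 2;
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Fixed little-endian layout, so the file reads the same on any CPU
   (the 68k and PowerPC Macs that run OS 9 are big-endian). */
static int put_u32(Buf *b, uint32_t v) {
    unsigned char le[4] = {
        (unsigned char)v, (unsigned char)(v >> 8),
        (unsigned char)(v >> 16), (unsigned char)(v >> 24)
    };
    return put(b, le, 4);
}

/* Preorder encoding of a node and its subtree: kind byte (0 = element,
   1 = text), u32 length, the bytes, then for elements a u32 child count
   followed by each child. */
static int write_node(Buf *b, const Node *n) {
    uint32_t len = (uint32_t)strlen(n->str);
    unsigned char kind = n->is_text ? 1 : 0;
    if (put(b, &kind, 1) || put_u32(b, len) || put(b, n->str, len))
        return -1;
    if (n->is_text)
        return 0;
    if (put_u32(b, (uint32_t)n->child_count))
        return -1;
    for (size_t i = 0; i < n->child_count; i++)
        if (write_node(b, n->children[i]))
            return -1;
    return 0;
}

static uint32_t get_u32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walks the subtree encoded at *pos without building any objects.
   Returns its node count, or 0 if the data is truncated. */
static size_t count_nodes(const unsigned char *d, size_t len, size_t *pos) {
    if (len - *pos < 5)
        return 0;
    unsigned char kind = d[*pos];
    uint32_t n = get_u32(d + *pos + 1);
    *pos += 5;
    if (len - *pos < n)
        return 0;
    *pos += n;
    if (kind == 1)
        return 1;
    if (len - *pos < 4)
        return 0;
    uint32_t kids = get_u32(d + *pos);
    *pos += 4;
    size_t total = 1;
    for (uint32_t i = 0; i < kids; i++) {
        size_t c = count_nodes(d, len, pos);
        if (c == 0)
            return 0;
        total += c;
    }
    return total;
}
```

Encoding `<p>Hello <b>world</b></p>` this way takes 41 bytes for 4 nodes. Because every record starts with its kind and length, a reader can skip whole subtrees or walk the document sequentially, as `count_nodes` does; attributes, comments and namespaces would need further record kinds.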

OS923
Re: I linked Google's HTML5 parser
« Reply #2 on: March 02, 2022, 11:13:19 AM »
It's 305,000 lines of code, but everything is going as planned.

OS923
Re: I linked Google's HTML5 parser
« Reply #3 on: April 05, 2022, 08:09:15 AM »
Finished renaming. Now sorting.