Topic: Browsing Wikipedia offline on OS 9 (Read 8988 times)

OS923

  • Platinum Member
  • *****
  • Posts: 888
Browsing Wikipedia offline on OS 9
« on: February 07, 2019, 05:05:02 AM »
WikiTaxi stopped working on my Windows computer (out of memory), so I switched to BzReader. Its advantage is that you can select and copy the text, styles and links. Unfortunately, like WikiTaxi, it doesn't understand templates.

I did some calculations for having Wikipedia offline on OS 9, and it looks like a realistic plan. A Wikipedia XML data dump is now around 66 GB. The longest page title is 266 characters. A plain-text index is around 750 MB. I could split this index into 676 (26 × 26) files: aa.idx, ab.idx and so on. If I want page "aax", I search for "aax" in aa.idx, which requires reading around 1.1 MB (750 MB / 676). There I find the offset of the "<page>" tag in the XML. Then I need to read a few lines and convert the Wikitext to HTML. It should be possible to do this in a fraction of a second and with less than 5 MB of memory.
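
A minimal Python sketch of the lookup described above, under assumptions the post doesn't spell out: each index line is `title<TAB>byte offset of <page>` in UTF-8, titles that don't start with two ASCII letters go to a hypothetical catch-all `other.idx`, and `bucket_name`, `find_offset` and `read_page` are illustrative names, not the author's code.

```python
import os


def bucket_name(title):
    """Map a title to one of the 26 * 26 = 676 bucket files aa.idx ... zz.idx.

    Titles that don't start with two ASCII letters go to other.idx, a
    hypothetical catch-all bucket (the post doesn't say how they're handled).
    """
    key = title[:2].lower()
    if len(key) == 2 and key.isascii() and key.isalpha():
        return key + ".idx"
    return "other.idx"


def find_offset(index_dir, title):
    """Scan one bucket (about 750 MB / 676, i.e. 1.1 MB) for an exact title.

    Each line is assumed to be "title<TAB>byte offset of <page>" in UTF-8.
    Returns the offset as an int, or None if the title isn't indexed.
    """
    path = os.path.join(index_dir, bucket_name(title))
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, _, offset = line.rstrip("\n").rpartition("\t")
            if name == title:
                return int(offset)
    return None


def read_page(dump_path, offset):
    """Seek to a <page> tag in the XML dump and read up to its closing </page>."""
    lines = []
    with open(dump_path, "rb") as f:
        f.seek(offset)
        for raw in f:
            lines.append(raw)
            if b"</page>" in raw:
                break
    return b"".join(lines).decode("utf-8")
```

Only one bucket file and a few lines of the dump are touched per request, which is what keeps memory use small.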

The idea is to install a small HTML page and an AppleScript CGI in MacHTTP that communicates with my program via AppleScript.
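
The request flow can be sketched in Python as a stand-in for the AppleScript CGI. The `title` query parameter, `handle_request`, and the `lookup_page` callable (which takes the place of the Apple event sent to the program) are all hypothetical.

```python
from html import escape
from urllib.parse import parse_qs


def handle_request(query_string, lookup_page):
    """Stand-in for the AppleScript CGI: parse the query, ask the reader, reply.

    lookup_page is a hypothetical callable standing in for the Apple event
    sent to the Offline Wikipedia program; it returns HTML or None.
    """
    params = parse_qs(query_string)
    title = params.get("title", [""])[0]
    html = lookup_page(title) if title else None
    if html is None:
        html = "<p>No page named %s.</p>" % escape(title)
    body = "<html><body>%s</body></html>" % html
    # A CGI reply is headers, a blank line, then the body.
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body
```

Passing the lookup in as a function keeps the CGI side thin: it only parses the request and wraps the reply, while all Wikipedia logic stays in the program.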

The source code of BzReader shows how Wikitext has to be converted to HTML and how formulas can be converted to pictures. It doesn't seem too difficult.
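
To give an idea of the job, here is a minimal Python sketch that converts a small subset of Wikitext: headings, bold, italic, internal links and blank-line paragraphs. It is not BzReader's code; the `?title=` link format is a hypothetical URL scheme, and templates, tables, lists, bold-italic and references are not handled.

```python
import html
import re
from urllib.parse import quote

HEADING = re.compile(r"(={2,6})\s*(.+?)\s*\1\s*")
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def inline(text):
    """Apply inline markup to already HTML-escaped text: links, bold, italic."""
    def link(match):
        target = html.unescape(match.group(1)).strip()
        label = match.group(2) or match.group(1)
        # Hypothetical URL scheme for the local reader.
        return '<a href="?title=%s">%s</a>' % (quote(target), label)
    text = LINK.sub(link, text)
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


def wikitext_to_html(wikitext):
    out, paragraph = [], []

    def flush():
        # Emit the collected lines as one paragraph.
        if paragraph:
            out.append("<p>%s</p>" % inline(" ".join(paragraph)))
            paragraph.clear()

    for line in wikitext.splitlines():
        escaped = html.escape(line.strip(), quote=False)
        heading = HEADING.fullmatch(escaped)
        if heading:
            flush()
            level = len(heading.group(1))
            out.append("<h%d>%s</h%d>" % (level, inline(heading.group(2)), level))
        elif not escaped:
            flush()  # a blank line ends the paragraph
        else:
            paragraph.append(escaped)
    flush()
    return "\n".join(out)
```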

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #1 on: February 13, 2019, 06:36:16 AM »
The indexing program already works; this was one day of work. Its speed is comparable to similar programs for Windows. The solution will keep working until the uncompressed Wikipedia XML data dump reaches around 1 TB.
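
An indexer along these lines can be sketched in Python. It reads the dump in binary so that `tell()` gives exact byte offsets for a later `seek()`, records the offset of each `<page>` tag together with the title that follows it, and buffers lines per bucket before appending them to disk. The line format, the catch-all `other.idx` bucket and all names are assumptions, and the sketch expects an empty index directory.

```python
import html
import os
import re

TITLE = re.compile(rb"<title>(.*?)</title>")


def bucket_name(title):
    # Two-letter buckets aa.idx ... zz.idx; anything else goes to other.idx
    # (the catch-all bucket is an assumption).
    key = title[:2].lower()
    if len(key) == 2 and key.isascii() and key.isalpha():
        return key + ".idx"
    return "other.idx"


def flush(index_dir, buffers):
    """Append buffered index lines to their bucket files and empty the buffers."""
    for name, lines in buffers.items():
        with open(os.path.join(index_dir, name), "a", encoding="utf-8") as f:
            f.writelines(lines)
    buffers.clear()


def build_index(dump_path, index_dir, flush_every=100_000):
    """Write "title<TAB>byte offset of <page>" lines; return the page count."""
    os.makedirs(index_dir, exist_ok=True)
    buffers, pending, page_offset, count = {}, 0, None, 0
    with open(dump_path, "rb") as f:
        while True:
            line_start = f.tell()
            line = f.readline()
            if not line:
                break
            tag = line.find(b"<page>")
            if tag != -1:
                page_offset = line_start + tag
            match = TITLE.search(line)
            if match and page_offset is not None:
                # Titles are XML-escaped in the dump (&amp; etc.).
                title = html.unescape(match.group(1).decode("utf-8"))
                buffers.setdefault(bucket_name(title), []).append(
                    "%s\t%d\n" % (title, page_offset))
                page_offset = None
                count += 1
                pending += 1
                if pending >= flush_every:
                    flush(index_dir, buffers)
                    pending = 0
    flush(index_dir, buffers)
    return count
```

Buffering and appending avoids keeping all 676 bucket files open at once, at the cost of some memory for the pending lines.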

This is just an interim project, because my Windows solution was insufficient.

The classes for reading and writing the index will be published as a library. The translation of Wikitext and the rendering of formulas will be open source plugins.

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #2 on: September 16, 2019, 08:51:50 AM »
"Build index" has been replaced by "Index Wikipedia". It works with multiple languages; in the example I use English, French and Dutch. I'm now working on "Offline Wikipedia".

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #3 on: September 20, 2019, 08:58:20 AM »
Offline Wikipedia already works. The pages use an AppleScript CGI that communicates with the Offline Wikipedia program via AppleScript. It's fast, and Unicode was never an issue.

Unfortunately, my pages are truncated after 32K. It looks like the author of MacHTTP limited the text that an AppleScript CGI can return to 32K because he thought that AppleScript strings are limited to 32K.

For now I do a simple conversion from Wikitext to HTML: I replace the special characters with spaces. I'll do the proper translation once I've solved the 32K problem.
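
The crude conversion could look like this in Python. Which characters count as "special" is an assumption (the post doesn't list them); replacing `'` also blanks out ordinary apostrophes. `<`, `>` and `&` are escaped so page text can't break the surrounding HTML, and the `<pre>` wrapper is an assumed way to keep line breaks.

```python
import html

# Which characters count as "special" is an assumption; the post doesn't list them.
WIKI_MARKUP = "[]{}|'=*#"
BLANK_MARKUP = str.maketrans(WIKI_MARKUP, " " * len(WIKI_MARKUP))


def crude_wikitext_to_html(wikitext):
    """Replace Wikitext markup characters with spaces and escape the rest."""
    plain = wikitext.translate(BLANK_MARKUP)
    # Escape <, > and & so the page text can't break the surrounding HTML.
    return "<pre>%s</pre>" % html.escape(plain, quote=False)
```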

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #4 on: September 22, 2019, 01:19:05 AM »
I got the source code of MacHTTP and spent 2 hours trying to get it working, hoping to change the 32K limit. Nothing but crashes.

Then I switched to Apple's "Web Sharing". My URLs don't look as nice, but it worked immediately and without the 32K limit.

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #5 on: September 23, 2019, 09:05:51 AM »
I linked MimeText 1.77 to convert formulas to pictures. It already works as a shared library. The output looks OK, but with antialiasing some characters are not "closed".

IIO

  • Platinum Member
  • *****
  • Posts: 4439
  • just a number
Re: Browsing Wikipedia offline on OS 9
« Reply #6 on: September 24, 2019, 04:08:39 PM »
seems readable from the third one on (24pt?)
insert arbitrary signature here

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #7 on: September 25, 2019, 01:57:17 AM »
It's mimeTeX, not MimeText. One feature (calendar) doesn't work because it uses too much memory. Antialiasing and transparency are optional.

IIO
Re: Browsing Wikipedia offline on OS 9
« Reply #8 on: September 25, 2019, 01:47:29 PM »
or print to a bitmap?

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #9 on: September 26, 2019, 02:36:20 AM »
mimeTeX can return a picture as a file or on stdout. I changed the stdout path so that it returns the picture in memory instead, and I request the in-memory result. Then I copy it into the reply of the Apple event received from the AppleScript CGI.
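
The change amounts to letting the encoder write to any byte stream instead of hard-wiring stdout. A Python sketch of the idea, using binary PBM only because it is a tiny format to encode (mimeTeX itself produces GIF); `write_pbm`, `render_to_memory` and `render_to_stdout` are hypothetical names. The bytes returned by `render_to_memory` are what would then be copied into the Apple event reply.

```python
import io
import sys


def write_pbm(bitmap, out):
    """Write a 1-bit bitmap (rows of 0/1) as binary PBM (P4) to a byte stream."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    out.write(b"P4\n%d %d\n" % (width, height))
    for row in bitmap:
        # Pack 8 pixels per byte, most significant bit first.
        packed = bytearray((width + 7) // 8)
        for x, bit in enumerate(row):
            if bit:
                packed[x // 8] |= 0x80 >> (x % 8)
        out.write(bytes(packed))


def render_to_memory(bitmap):
    """Return the image bytes instead of writing them to stdout."""
    buffer = io.BytesIO()
    write_pbm(bitmap, buffer)
    return buffer.getvalue()


def render_to_stdout(bitmap):
    """The original behaviour: stream the image to standard output."""
    write_pbm(bitmap, sys.stdout.buffer)
```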

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #10 on: September 26, 2019, 02:41:51 AM »
I found an interesting program that converts Wikitext to XML:
https://dizzylogic.com/wiki-parser/

Unfortunately, its output isn't exactly like Wikipedia:
Quote
Wiki Parser currently omits tables in Wikipedia pages as they are almost impossible to present in textual format. It also flattens multi-level lists (but keeps every list element in its own XML node).

It's open source, and its code shows how to handle templates.
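
For reference, the core of template handling can be sketched in Python: repeatedly expand the innermost `{{Name|arg|key=value}}` call, substituting `{{{1}}}` and `{{{key|default}}}` parameters in the template body. This is a simplified illustration, not Wiki Parser's or MediaWiki's algorithm: arguments are split naively on `|` (so links inside arguments break it), missing parameters without a default become empty, parser functions such as `#if` aren't supported, and unknown templates become a `[[Template:Name]]` link.

```python
import re

# Innermost template call: {{...}} with no further braces inside.
TEMPLATE_CALL = re.compile(r"\{\{([^{}]*)\}\}")
# Parameter reference in a template body: {{{name}}} or {{{name|default}}}.
PARAM_REF = re.compile(r"\{\{\{([^{}|]*)(?:\|([^{}]*))?\}\}\}")


def parse_args(parts):
    """Turn ["a", "key=b"] into {"1": "a", "key": "b"}."""
    args, position = {}, 1
    for part in parts:
        if "=" in part:
            key, value = part.split("=", 1)
            args[key.strip()] = value.strip()
        else:
            args[str(position)] = part.strip()
            position += 1
    return args


def substitute_params(body, args):
    def replace(match):
        name, default = match.group(1).strip(), match.group(2)
        if name in args:
            return args[name]
        return default if default is not None else ""
    return PARAM_REF.sub(replace, body)


def expand_templates(text, templates, max_passes=50):
    """Expand innermost calls until nothing changes; max_passes stops self-recursion."""
    def expand_call(match):
        name, *parts = match.group(1).split("|")
        body = templates.get(name.strip())
        if body is None:
            return "[[Template:%s]]" % name.strip()
        return substitute_params(body, parse_args(parts))

    for _ in range(max_passes):
        new_text = TEMPLATE_CALL.sub(expand_call, text)
        if new_text == text:
            break
        text = new_text
    return text
```

Expanding innermost calls first means an argument like `{{Name}}` is already plain text by the time the outer template sees it.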

teroyk

  • Platinum Member
  • *****
  • Posts: 623
  • -
Re: Browsing Wikipedia offline on OS 9
« Reply #11 on: February 18, 2020, 02:27:20 AM »
Any news about this project?

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #12 on: February 20, 2020, 06:42:32 AM »
No.

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #13 on: July 15, 2020, 11:18:40 AM »
This project is discontinued because I now use Kiwix on my tablet.

teroyk
Re: Browsing Wikipedia offline on OS 9
« Reply #14 on: July 24, 2020, 10:32:03 AM »
Quote
This project is discontinued because I now use Kiwix on my tablet.

From the Kiwix page: "The Kiwix Reader runs on all platforms and operating systems"
OK... I started with platforms: Mac PPC 32-bit, no. Then Mac PPC 64-bit, no; Atari, no; MSX, no; 486, no... OK, how about phones and tablets? Nokia, no. Jolla, no...
Maybe I should search by OS: Mac OS 9, no; Mac OS X, no; TOS, no; SymbOS, no; MenuetOS, no. OK, how about phones and tablets? Symbian, no; Maemo, no; Sailfish, no...
OK... I might be interested in continuing your work someday.

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #15 on: July 24, 2020, 10:45:06 AM »
I would be interested in converting XML data dumps if the Wikitext has already been converted to XML. I don't want to dabble in Wikitext anymore.

teroyk
Re: Browsing Wikipedia offline on OS 9
« Reply #16 on: July 24, 2020, 12:48:31 PM »
Quote
I would be interested in converting XML data dumps if the Wikitext has already been converted to XML. I don't want to dabble in Wikitext anymore.

Interestingly, XOWA (another offline Wikipedia reader, http://xowa.org) runs on top of Java 1.7+, and Mac OS 9.2.2 has Java 2.2.5 (Mac OS Runtime for Java 2.2.5).
Sadly, I don't know much Java, but the sources have .java or .jar extensions. Would it be too hard to make a Mac OS 9 version?

OS923
Re: Browsing Wikipedia offline on OS 9
« Reply #17 on: July 28, 2020, 08:12:56 AM »
Java 7.0 is the product version; 1.7.0 is the developer version. So "Java 1.7+" means Java 7 or later.