Mac OS 9 Lives

Classic Mac OS Software => Application Development & Programming => Topic started by: OS923 on February 07, 2019, 05:05:02 AM

Title: Browsing Wikipedia offline on OS 9
Post by: OS923 on February 07, 2019, 05:05:02 AM
WikiTaxi stopped working on my Windows computer (out of memory), so I switched to BzReader. Its advantage is that you can select and copy the text, styles and links. Unfortunately, just like WikiTaxi, it doesn't understand templates.

I did some calculations for having Wikipedia offline on OS 9, and it looks like a realistic plan. The Wikipedia XML data dump is now around 66 GB. The longest page title is 266 characters. A plain-text index is around 750 MB. I could split this index into 676 (26 × 26) files named aa.idx, ab.idx and so on. If I want page "aax", I search for "aax" in aa.idx, which requires reading around 1.1 MB. There I find the offset of the "<page>" tag in the XML. Then I need to read a few lines and convert the Wikitext to HTML. It should be possible to do this in a fraction of a second with less than 5 MB of memory.
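The lookup scheme above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's program: the index line format (title, a tab, then the byte offset of "<page>"), the lowercase two-letter bucket key, the "other.idx" bucket for titles that don't start with two ASCII letters, and all function names are hypothetical. To stay self-contained it keeps the buckets in a dict instead of 676 files on disk. With 26 × 26 = 676 buckets, a 750 MB index averages 750 / 676 ≈ 1.1 MB per bucket, which is roughly what one lookup has to read.

```python
import io
import re

# Minimal sketch of a two-letter-prefix title index (hypothetical format).
# Assumed index line: title, a tab, then the byte offset of "<page>".
# Buckets live in a dict here instead of 676 .idx files on disk.

PAGE_RE = re.compile(rb"<page>\s*<title>(.*?)</title>", re.S)


def bucket_name(title: str) -> str:
    """Return the bucket file name for a title, e.g. 'aax' -> 'aa.idx'."""
    prefix = title[:2].lower()
    if len(prefix) == 2 and all("a" <= c <= "z" for c in prefix):
        return prefix + ".idx"
    return "other.idx"  # hypothetical catch-all for digits, symbols, etc.


def build_index(xml: bytes) -> dict:
    """Map each bucket name to its index lines (title, tab, offset)."""
    buckets = {}
    for m in PAGE_RE.finditer(xml):
        title = m.group(1).decode("utf-8")
        line = f"{title}\t{m.start()}"  # m.start() is the offset of "<page>"
        buckets.setdefault(bucket_name(title), []).append(line)
    return buckets


def lookup(buckets: dict, title: str):
    """Scan only the bucket that can hold the title; return offset or None."""
    for line in buckets.get(bucket_name(title), []):
        entry_title, offset = line.rsplit("\t", 1)
        if entry_title == title:
            return int(offset)
    return None


def read_page(f, offset: int) -> bytes:
    """Seek to the <page> tag and read lines up to and including </page>."""
    f.seek(offset)
    chunks = []
    for line in f:
        chunks.append(line)
        if b"</page>" in line:
            break
    return b"".join(chunks)


if __name__ == "__main__":
    dump = (b"<mediawiki>\n"
            b"  <page>\n    <title>Aax</title>\n    <text>Hello</text>\n  </page>\n"
            b"</mediawiki>\n")
    index = build_index(dump)
    print(read_page(io.BytesIO(dump), lookup(index, "Aax")).decode())
```

`build_index` reads the whole dump into memory, which is only reasonable for a toy example; a real 66 GB dump would have to be scanned as a stream, writing each bucket to its own file.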

The idea is to install a small HTML page and an AppleScript CGI in MacHTTP that communicates with my program via AppleScript.

How the Wikitext has to be converted to HTML and how formulas can be converted to pictures can be found in the source code of BzReader. It doesn't seem too difficult.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on February 13, 2019, 06:36:16 AM
The indexing program already works; it took one day of work. Its speed is comparable to similar programs for Windows. The solution will keep working until the uncompressed Wikipedia XML data dump reaches around 1 TB.

This is just an intermediate project because my Windows solution was insufficient.

The classes for reading and writing the index will be published as a library. The translation of Wikitext and the rendering of formulas will be open source plugins.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 16, 2019, 08:51:50 AM
"Build index" is replaced with "Index Wikipedia". It works with multiple languages. In the example I use English, French and Dutch. I'm now working on "Offline Wikipedia".
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 20, 2019, 08:58:20 AM
Offline Wikipedia already works. The pages use an AppleScript CGI, which communicates with the Offline Wikipedia program via AppleScript. It's fast, and Unicode was never an issue.

Unfortunately, my pages are truncated after 32K. It looks like the author of MacHTTP limited the text an AppleScript CGI can return to 32K because he thought that AppleScript strings are limited to 32K.

I do a simple conversion from Wikitext to HTML: I replace the special characters with spaces. I'll do the correct translation when I've solved the 32K problem.
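The interim conversion described above (special characters replaced with spaces) can be sketched as follows. This is a hypothetical illustration, not the author's code: the post doesn't say which characters count as "special", so `SPECIAL_CHARS` is an assumption, as are the function name and the choice to wrap blank-line-separated blocks in `<p>` tags. It also assumes the page text has already been unescaped from the XML dump (e.g. `&lt;` back to `<`), otherwise `html.escape` would escape it a second time.

```python
import html

# Hypothetical set of Wikitext markup characters; the post doesn't say
# which characters it treats as "special".
SPECIAL_CHARS = "[]{}|'=*#"
_BLANK_OUT = str.maketrans({c: " " for c in SPECIAL_CHARS})


def wikitext_to_html_simple(wikitext: str) -> str:
    """Crude Wikitext -> HTML: replace markup characters with spaces,
    HTML-escape the rest, and wrap blank-line-separated blocks in <p>."""
    text = wikitext.translate(_BLANK_OUT)
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```

For example, `wikitext_to_html_simple("'''Bold''' and [[Link]]")` returns `"<p>Bold    and   Link</p>"`: the markup disappears but so do bold, links and templates, which is why it is only a stopgap until the proper translation.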
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 22, 2019, 01:19:05 AM
I got a copy of the MacHTTP source code and spent 2 hours trying to make it work, hoping to change the 32K limit. Nothing but crashes.

Then I switched to Apple's "Web Sharing". My URLs don't look as nice, but it worked immediately and without the 32K limit.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 23, 2019, 09:05:51 AM
I linked MimeText 1.77 to convert formulas to pictures. It already works as a shared library. I find it OK, but with antialiasing enabled some characters are not "closed".
Title: Re: Browsing Wikipedia offline on OS 9
Post by: IIO on September 24, 2019, 04:08:39 PM
seems readable from the third one on (24pt?)
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 25, 2019, 01:57:17 AM
It's MimeTeX, not MimeText. One feature (calendar) doesn't work because it uses too much memory. Antialiasing and transparency are optional.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: IIO on September 25, 2019, 01:47:29 PM
or print to a bitmap?
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 26, 2019, 02:36:20 AM
It can return a picture as a file or on stdout. I changed it so that it can return the picture in memory instead of on stdout, and I ask for it in memory. Then I copy it into the reply of the Apple event that was received from the AppleScript CGI.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on September 26, 2019, 02:41:51 AM
I found an interesting program that converts Wikitext to XML:
https://dizzylogic.com/wiki-parser/

Unfortunately, it's not exactly like Wikipedia:
Quote:
"Wiki Parser currently omits tables in Wikipedia pages as they are almost impossible to present in textual format. It also flattens multi-level lists (but keeps every list element in its own XML node)."

It's open source, and its code shows how to handle templates.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: teroyk on February 18, 2020, 02:27:20 AM
Any news about this project?
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on February 20, 2020, 06:42:32 AM
No.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on July 15, 2020, 11:18:40 AM
This is discontinued because I now use Kiwix on my tablet.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: teroyk on July 24, 2020, 10:32:03 AM
Quote from OS923: "This is discontinued because I use now Kiwix on my tablet."

From the Kiwix page: "The Kiwix Reader runs on all platforms and operating systems"
OK... I started with platforms: Mac PPC 32-bit, no... then Mac PPC 64-bit, no, Atari, no, MSX, no, 486, no... OK, how about phones and tablets... Nokia, no... Jolla, no...
Maybe I should search by OS... Mac OS 9, no, Mac OS X, no, TOS, no, SymbOS, no, MenuetOS, no... OK, how about phones and tablets... Symbian, no, Maemo, no... Sailfish, no...
OK... I might be interested in continuing your work someday...
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on July 24, 2020, 10:45:06 AM
I would be interested in converting XML data dumps if the Wikitext had already been converted to XML. I don't want to dabble in Wikitext anymore.
Title: Re: Browsing Wikipedia offline on OS 9
Post by: teroyk on July 24, 2020, 12:48:31 PM
Quote from OS923: "I would be interested to convert XML data dumps if the Wikitext has already been converted to XML. I don't want to dabble in Wikitext anymore."

Interesting: XOWA (another offline Wikipedia reader, http://xowa.org) runs on top of Java 1.7+... and Mac OS 9.2.2 has Java 2.2.5.
Sadly I don't know much Java, but the sources have .java or .jar extensions... would it be too hard to make a Mac OS 9 version?
Title: Re: Browsing Wikipedia offline on OS 9
Post by: OS923 on July 28, 2020, 08:12:56 AM
Java 7.0 is the product version; 1.7.0 is the developer version.