User:Dušan Kreheľ

From Wikimedia Belgium

About me

Name: Dušan Kreheľ
Born: 1991, Prešov, Czecho-Slovakia
Lives in: Prešov Region
Email: dusankrehel@gmail.com
Language (native): Slovak
Language (understood): Czech
Foreign languages: German, English, Croatian
Social networks:

Bot

Articles

Exports

Technologies

d0cmf

d0cmf – short for Dušan's zero matrix format

Practical:

Practical comparison (2023-01 to 2023-06)

              Original                          d0cmf
              RAW              bz2              RAW             bz2
Bytes         91531991545 B    16923192176 B    8272043931 B    1415546226 B
Size          91.5 GB          16.9 GB          8.2 GB          1.4 GB
Of original                                     9%              8%
Note: In this practical comparison, the d0cmf pageviews are split by local Wikipedia, and the sizes are calculated on that basis. Sources: [1] [2].
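
The percentages in the comparison above follow directly from the byte counts. As a small check, the sketch below recomputes them; the dictionary layout and the function name are only illustrative and are not part of d0cmf.

```python
# Byte counts copied from the practical comparison (2023-01 to 2023-06).
SIZES = {
    "original": {"raw": 91_531_991_545, "bz2": 16_923_192_176},
    "d0cmf": {"raw": 8_272_043_931, "bz2": 1_415_546_226},
}


def percent_of_original(kind: str) -> float:
    """Return the d0cmf size as a percentage of the original size."""
    return 100 * SIZES["d0cmf"][kind] / SIZES["original"][kind]


if __name__ == "__main__":
    for kind in ("raw", "bz2"):
        print(f"{kind}: {percent_of_original(kind):.1f}% of original")
```

Rounded to whole numbers, the results match the 9% (RAW) and 8% (bz2) given in the comparison.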

A bonus for the community (if implemented):

  • Pageview statistics:
    • Smaller compressed files.
    • Support for storing time intervals of any length.
    • Statistics stored separately for each local Wikipedia.

Revision databases

  • A different way of storing site data.
  • Revision line encoding: a line index is built from all revision lines, and each revision is then represented as a group of line indexes. The line indexes of a revision are stored in a binary format.
  • More: https://archive.org/details/revision-database
Demonstration on skwiki-20240101-pages-meta-history.xml.bz2

               Now        Concept
Database       ~19 GB     1 to 5 GB (5% to 26%)
Export (bz2)   ~2.8 GB    ~1.1 GB (39%)
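
The revision line encoding described above can be illustrated with a minimal sketch. It is not the actual implementation: the function names are hypothetical, and the binary layout (little-endian unsigned 32-bit integers) is an assumption made only for this example.

```python
import struct


def encode_revisions(revisions):
    """Build a shared line index and encode each revision as packed line IDs."""
    line_ids = {}  # line text -> integer ID
    lines = []     # integer ID -> line text (the line index)
    encoded = []
    for text in revisions:
        ids = []
        for line in text.split("\n"):
            if line not in line_ids:
                line_ids[line] = len(lines)
                lines.append(line)
            ids.append(line_ids[line])
        # Assumed binary layout: little-endian unsigned 32-bit integers.
        encoded.append(struct.pack(f"<{len(ids)}I", *ids))
    return lines, encoded


def decode_revision(lines, blob):
    """Rebuild the revision text from its packed line IDs."""
    count = len(blob) // 4
    ids = struct.unpack(f"<{count}I", blob)
    return "\n".join(lines[i] for i in ids)


if __name__ == "__main__":
    revisions = ["intro\nbody", "intro\nbody\nfooter", "intro\nfooter"]
    lines, encoded = encode_revisions(revisions)
    print(len(lines), "unique lines for", len(revisions), "revisions")
    print(decode_revision(lines, encoded[1]))
```

Because consecutive revisions of a page usually share most of their lines, each distinct line is stored once and every revision shrinks to a short list of integers.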

Wiki page language

Idea: standardize the wiki page language and provide a wiki ⇒ HTML converter with a DOM and a DOM manipulation API.

  • Benefits:
    • Defining the boundaries where bots and users interact,
    • better tools for bots,
    • one standard and one change-tracking document,
    • support for MediaWiki tables in third-party software.
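
To make the idea of a converter with a DOM more concrete, here is a toy sketch. It is not "dwiki" and does not follow any standard: it handles only `== Heading ==` lines and plain paragraphs, and the `Node` class and `parse` function are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """A minimal DOM node: a tag, optional text, and child nodes."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)

    def find_all(self, tag):
        """Return all descendant nodes with the given tag, in document order."""
        found = []
        for child in self.children:
            if child.tag == tag:
                found.append(child)
            found.extend(child.find_all(tag))
        return found

    def to_html(self):
        """Serialize this node and its children to HTML."""
        inner = escape(self.text) + "".join(c.to_html() for c in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


# Matches lines such as "== Title ==" with 2 to 6 equals signs on each side.
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1$")


def parse(wikitext):
    """Parse a tiny wikitext subset (headings, paragraphs) into a DOM tree."""
    root = Node("div")
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            root.children.append(Node(f"h{level}", match.group(2)))
        else:
            root.children.append(Node("p", line))
    return root


if __name__ == "__main__":
    doc = parse("== Title ==\nSome text.\n")
    # A bot could change the tree directly instead of editing raw wikitext.
    doc.find_all("h2")[0].text = "New title"
    print(doc.to_html())
```

The point of the sketch is that a bot works on typed nodes rather than on raw text, which is what makes the boundaries between bot and user edits easier to define.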

Test implementations (2022-12-09)

0.000333s "dwiki"
0.000275s "dwiki editor"
0.016512s Wikimedia parser
1.260279s Parsoid

More: