Article
· Jun 12, 2023 2m read

OEX mapping

Scenario

You all know Open Exchange (OEX), so there is no need for a detailed explanation.

It consists of a directory with various filters and detail pages for packages.

This is great for manual navigation.
But the most interesting information for me is the content of the blue box on the right.
All content comes from a database somewhere in the background that is not accessible to you or me.

Manually navigating more than 700 packages in search of a particular entity is no fun.
So I decided to build my own table with the criteria that interest me.

  • url        // relative as in OEX directory; UNIQUE
  • label      // package name
  • author
  • technology
  • zpmmodul
  • review     // flag if review exists
  • page       // page in OEX directory
  • stars      // assigned in reviews
  • version
  • lastupdate // date
  • IRIS       // flag for IRIS
  • ZPM        // flag for support of IPM/ZPM
  • xurl       // full URL of package
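
As a rough illustration of this layout, here is a minimal sketch of the table using Python's built-in sqlite3 as a stand-in for IRIS SQL, chosen only so the example runs anywhere. The table name `oex_package` and all column types are assumptions; only the column names and the UNIQUE constraint on url come from the list above.

```python
import sqlite3

# Stand-in for the IRIS table; sqlite3 is used only to keep the sketch runnable.
# Column names follow the list above. Table name and types are assumptions.
DDL = """
CREATE TABLE oex_package (
    url        TEXT NOT NULL UNIQUE,  -- relative URL as in the OEX directory
    label      TEXT,                  -- package name
    author     TEXT,
    technology TEXT,
    zpmmodul   TEXT,
    review     INTEGER DEFAULT 0,     -- flag: a review exists
    page       INTEGER,               -- page in the OEX directory
    stars      REAL,                  -- assigned in reviews
    version    TEXT,
    lastupdate TEXT,                  -- date, stored as text in this sketch
    IRIS       INTEGER DEFAULT 0,     -- flag for IRIS
    ZPM        INTEGER DEFAULT 0,     -- flag for IPM/ZPM support
    xurl       TEXT                   -- full URL of the package
)
"""


def create_table(conn):
    """Create the table and return its column names in declaration order."""
    conn.execute(DDL)
    return [row[1] for row in conn.execute("PRAGMA table_info(oex_package)")]


conn = sqlite3.connect(":memory:")
columns = create_table(conn)
print(columns)
```

Keeping url unique means a later load step can find and update an existing row instead of inserting a duplicate.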

To fill this table, I decided to use only embedded Python methods,
all projected as SQL procedures.
So there is no need for terminal access, except for the SQL shell.

Data Loading Strategy

  1. Scan the OEX directory pages to collect the label, URL, and page number of each package.
    • This is an acceptably fast step: loading and scanning ~25 pages yields ~730 records.
  2. Based on the URLs, load and scan the detail pages and review pages.
    • This results in loading and scanning ~1500 pages,
    • which takes quite some time, depending on your network capacity.
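
The first step of this strategy could look roughly like the sketch below. It is not the code from the package: the sample markup, the `/package/` path prefix, the base URL parameter, and the names `PackageLinkParser`, `scan_directory_page`, and `add_full_url` are assumptions made for illustration. It parses a hard-coded snippet with Python's standard html.parser instead of fetching live pages.

```python
from html.parser import HTMLParser

# Hypothetical directory-page markup; the real OEX HTML will differ.
SAMPLE_PAGE = """
<div class="tile"><a href="/package/demo-one">Demo One</a></div>
<div class="tile"><a href="/package/demo-two">Demo Two</a></div>
<a href="/about">About</a>
"""


class PackageLinkParser(HTMLParser):
    """Collects (url, label) pairs from anchors whose href starts with /package/."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the package anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/package/"):  # assumed path prefix
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scan_directory_page(html, page_number):
    """Step 1: turn one directory page into rows holding url, label, and page."""
    parser = PackageLinkParser()
    parser.feed(html)
    parser.close()
    return [{"url": u, "label": l, "page": page_number} for u, l in parser.links]


def add_full_url(row, base="https://openexchange.intersystems.com"):
    """Input for step 2: derive the full URL (xurl) from the relative one."""
    return {**row, "xurl": base + row["url"]}


rows = [add_full_url(r) for r in scan_directory_page(SAMPLE_PAGE, 1)]
for r in rows:
    print(r)
```

Splitting the work this way keeps the cheap directory scan separate from the slow per-package requests, so the second step can be rerun for selected rows without rescanning the directory.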

And then you are free to navigate and query with SQL as you like.     

Video      

GitHub
 
