
OwlRepo - a repository of transcribed owl searches

Discussion in 'Items & Mesos' started by geospiza, Jul 3, 2020.

    Hey all, I'd like to share with all of you a site I built for fun and learning. OwlRepo is a site that machine transcribes owl searches and summarizes them. The best part is that anyone can help out by uploading screenshots from in-game.

    upload_2020-12-1_22-26-10.png

    The front page contains a searchable summary of every upload to the site. This is subject to change while I figure out the best way to present information about current prices. I usually set prices in my stores around the p25 value, i.e. the 25th percentile: the price below which 25% of the listings on the market fall at that time.
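    To make the p25 idea concrete, here's a small sketch of a discrete percentile over a list of hypothetical listing prices (all numbers made up for illustration):

    ```python
    import math

    # Hypothetical listing prices for one item, in mesos.
    prices = [1_200_000, 1_500_000, 2_000_000, 2_200_000,
              3_000_000, 3_500_000, 4_000_000, 5_000_000]

    def percentile_disc(values, q):
        """Discrete percentile: the smallest value such that at least
        a fraction q of the data is at or below it."""
        ordered = sorted(values)
        i = max(math.ceil(q * len(ordered)) - 1, 0)
        return ordered[i]

    p25 = percentile_disc(prices, 0.25)
    print(p25)  # 1_500_000: a quarter of the listings sit at or below this price
    ```

    Pricing slightly above p25 undercuts most of the market without racing the very cheapest sellers to the bottom.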

    If you click on an item, it will take you to the history of owl screenshots. You can get a sense for the change in the market by looking at the plot over time.

    upload_2020-12-1_22-40-25.png

    If you click on one of the items on this page, you will see the complete contents of the machine transcription. You can copy and paste the results into a spreadsheet for your own purposes. For a quick example of what's currently possible, take a look at these owl searches for all of the cape scrolls for base stats.

    upload_2020-12-1_22-41-34.png

    The centerpiece of the site is the ability to upload your own owl searches. I've made uploading as streamlined as possible: no account is necessary, and no more of the screenshot is kept than is needed to transcribe the owl search. Try it out :)

    I use the Owl of Minerva to search for all of these items, and take an in-game screenshot of each page.

    upload_2020-7-3_13-19-51.png

    I then use the Upload tab to submit the screenshots from the MapleLegendsHD folder.

    upload_2020-7-3_11-47-27.png

    A minute later, the transcribed data is available in the index.

    upload_2020-7-3_11-48-41.png


    In case you're interested in some of the details of how this thing was built:

    The heart of this is an OCR engine called tesseract. I put together a small library that extracts the data from the screenshots into a machine-readable format. Each batch of owl searches is put into a JSON file that can be used in other applications. The web service exposes this as part of its API:

    https://owlrepo.com/api/v1/data/bcba6c04-b249-4905-8685-e4d45134bc5e/slim.json
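    As a sketch of what you can do with such a response, here's how the nested entries might be flattened into spreadsheet-friendly rows. The payload here is a hand-made stand-in shaped like the documented schema, not real data:

    ```python
    import csv
    import io

    # Hand-made stand-in for a slim.json response (see the schema below).
    payload = {
        "screenshot": {"timestamp": "2020-07-03T11:47:27Z",
                       "name": "Maple_200703.png", "sha1": "..."},
        "search": {"text": "...", "item": "Scroll for Cape for STR 60%",
                   "results": 2},
        "paginator": {"text": "(1/1)", "current": 1, "total": 1},
        "body": {
            "text": "...",
            "entries": [
                {"id": "1", "store_name": "shopA", "bundle": 1,
                 "price": 2_000_000, "quantity": 5},
                {"id": "2", "store_name": "shopB", "bundle": 1,
                 "price": 1_800_000, "quantity": 2},
            ],
        },
    }

    # Flatten the nested entries into one CSV row per seller.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["item", "store_name", "price", "quantity"])
    for entry in payload["body"]["entries"]:
        writer.writerow([payload["search"]["item"], entry["store_name"],
                         entry["price"], entry["quantity"]])
    print(buf.getvalue())
    ```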

    Since I have not open-sourced the server code at the time of writing, here's the schema for the response.

    Code:
    {
      "type": "object",
      "properties": {
        "screenshot": {
          "type": "object",
          "properties": {
            "timestamp": { "type": "string", "format": "date-time" },
            "name": { "type": "string" },
            "sha1": { "type": "string", "format": "uri" }
          },
          "required": ["timestamp", "name", "sha1"]
        },
        "search": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "item": { "type": "string" },
            "results": { "type": "integer" }
          },
          "required": ["text", "item", "results"]
        },
        "paginator": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "current": { "type": "integer" },
            "total": { "type": "integer" }
          },
          "required": ["text", "current", "total"]
        },
        "body": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "entries": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "id": { "type": "string" },
                  "store_name": { "type": "string" },
                  "bundle": { "type": "integer", "minimum": 1, "maximum": 200 },
                  "price": { "type": "integer", "minimum": 1 },
                  "quantity": { "type": "integer", "minimum": 1 }
                },
                "required": ["id", "store_name", "bundle", "price", "quantity"]
              }
            }
          },
          "required": ["text", "entries"]
        }
      },
      "required": ["screenshot", "search", "paginator", "body"]
    }
    
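    As a quick sanity check, the required-key lists above can be verified with a few lines of stdlib Python. A full validator (e.g. the jsonschema package) would also check types and ranges; this is only a sketch:

    ```python
    # Required keys per top-level section, taken from the schema above.
    REQUIRED = {
        "screenshot": ["timestamp", "name", "sha1"],
        "search": ["text", "item", "results"],
        "paginator": ["text", "current", "total"],
        "body": ["text", "entries"],
    }

    def check_slim(payload):
        """Return a list of missing required keys, empty if the payload passes."""
        missing = []
        for section, keys in REQUIRED.items():
            part = payload.get(section)
            if part is None:
                missing.append(section)
                continue
            missing += [f"{section}.{k}" for k in keys if k not in part]
        return missing

    # Minimal payload that satisfies the required-key lists.
    example = {
        "screenshot": {"timestamp": "2020-07-03T11:47:27Z",
                       "name": "shot.png", "sha1": "..."},
        "search": {"text": "...", "item": "Scroll for Cape for STR 60%",
                   "results": 3},
        "paginator": {"text": "(1/1)", "current": 1, "total": 1},
        "body": {"text": "...", "entries": []},
    }
    print(check_slim(example))  # [] means the payload passes
    ```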

    When I'm ready to ship changes to the back-end, a new Docker image is built and deployed to various services. The site is hosted on Google Cloud Platform and takes advantage of their free tier.


    upload_2020-7-3_11-58-45.png
    [source]

    It should scale to millions of owl searches a month. Of course, I'm the only user at the moment, uploading maybe hundreds of pages a day.

    The data gets inserted into a BigQuery table, which can be used to generate tables and charts about everything that is uploaded. Here's the SQL that's used to generate the current chart pictured above. There are more sophisticated queries that can be used to handle bad actors.

    Code:
    WITH
      -- convert the nested JSON structure into a row per seller
      flattened AS (
      SELECT
        task_id,
        batch_sha1,
        payload.screenshot.timestamp AS screenshot_timestamp,
        payload.screenshot.sha1 AS screenshot_sha1,
        payload.search.item AS search_item,
        payload.search.results AS search_results,
        REPLACE(REPLACE(paginator.text, "(", ""), ")", "") AS paginator_text,
        entries.*,
        entry_index
      FROM
        owl.upload_v1 AS upload,
        UNNEST(upload.payload) payload,
        UNNEST(body.entries) entries
      WITH
      OFFSET
        AS entry_index ),
      extracted AS (
      SELECT
        task_id,
        DATE(screenshot_timestamp) AS screenshot_date,
        search_item,
        search_results,
        price
      FROM
        flattened )
      -- generate statistics over time/search result window
    SELECT
      DISTINCT * EXCEPT(price),
      COUNT(*) OVER (PARTITION BY task_id, screenshot_date, search_item, search_results) AS search_results_captured,
      percentile_disc(price,
        0.5) OVER (PARTITION BY task_id, screenshot_date, search_item, search_results) AS median_price
    FROM
      extracted
    ORDER BY
      search_item,
      screenshot_date,
      search_results_captured DESC
    
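    In Python terms, the grouped statistics at the end of that query amount to something like this (the rows are made up for illustration; `median_low` matches the discrete, lower-value behavior of `PERCENTILE_DISC(price, 0.5)`):

    ```python
    from collections import defaultdict
    from statistics import median_low

    # Made-up flattened rows: (search_item, screenshot_date, price).
    rows = [
        ("Scroll for Cape for STR 60%", "2020-07-03", 2_000_000),
        ("Scroll for Cape for STR 60%", "2020-07-03", 1_800_000),
        ("Scroll for Cape for STR 60%", "2020-07-03", 2_500_000),
        ("Scroll for Cape for DEX 60%", "2020-07-03", 1_000_000),
    ]

    # Group prices by the partition key (item, date).
    groups = defaultdict(list)
    for item, date, price in rows:
        groups[(item, date)].append(price)

    # COUNT(*) and PERCENTILE_DISC(price, 0.5) per partition.
    stats = {key: (len(prices), median_low(prices))
             for key, prices in groups.items()}
    ```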


    The back-end is otherwise a dumb pipe that just transcribes the owls.

    The site itself handles displaying the machine-read files in a human-friendly way. It's written in Svelte, which has been enjoyable to work with so far. If you're curious, the source can be inspected in the browser's development tools. I'm no web developer, though, so parts of it may look rough.

    Old individual listing page screenshot for cape scrolls.
    upload_2020-7-3_11-38-52.png

    This dashboard is no more; you did well, Data Studio.


    Updated 2020-12-01
    • Moved out-of-date screenshots into an appendix section
    • Quoted my closing thoughts to keep context for the discussion, though the information is mostly out of date.
     
