
OwlRepo - a repository of transcribed owl searches

Discussion in 'Items & Mesos' started by geospiza, Jul 3, 2020.

  1. Tate
    Offline

    Tate Capt. Latanica

    352
    229
    278
    Apr 16, 2020
    New Zealand
    8:36 PM
    Potayto
    Shadower
    175
    Beaters
    So I'm trying to upload owls, but it's giving me a process error. Any ideas why?
     
  2. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    The one possibility that I'm aware of is using HDClient=1 on macOS. I actually didn't realize this was possible until a few weeks ago and haven't touched the site since. The other possibility is that you're taking a screenshot using the Windows snipping feature. If this is the case, you should make sure that the screenshot is coming from the screenshot folder in the MapleLegends install directory.

    If you go into the developer console (ctrl-shift-j on Chrome, ctrl-shift-k on Firefox) and then upload a few screenshots, it should give you more hints about what the processing error is. If you let me know what your system/resolution and the error are, I can make a few updates if need be.
     
  3. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    I've put some time into the site while the server was down today to push out small things I've been working on. First is a toy idea to show the store names of recent uploads.

    upload_2020-9-28_23-57-18.png

    Here are the most popular channel/fm using the same data across all time. I still get confused between fm-channel or channel-fm, so take it with a grain of salt.

    upload_2020-9-29_0-0-21.png

    I'm also looking to build a clean dataset to improve the quality of transcribing screenshots (https://owlrepo.com/curate). I put together a small tool for editing and submitting changes. Try it if you want to help out, but be mindful that I'm going to have to review each one manually.

    upload_2020-9-29_0-7-34.png

    Every task you contribute will get recorded locally and in a database: https://owlrepo.com/personal

    upload_2020-9-29_0-13-55.png
     
  4. zendaine
    Offline

    zendaine Blue Snail

    2
    0
    7
    Aug 29, 2020
    Male
    4:36 AM
    Stooo
    F/P Mage, Beginner
    This is an awesome tool! I've curated a few owls now and I've noticed that sometimes the only thing that's incorrect is the store name. I can't imagine that the store name would be very useful -- so if your model detects a problem in only store name, would it be better to not add to the curation list?

    Also, have you thought about highlighting the entries in the table that your model thinks may be incorrect? Maybe you could color code based on confidence that the entry is correct (red = not certain at all, green = complete certainty).
     
  5. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    Thanks for trying it out and for the great ideas! The goal of curating the images is to build a ground truth data set that can be used to fine-tune a pre-trained OCR model. Without a ground truth, it's hard to quantify confidence on the quality of the results that are coming out. I spent a lot of time playing with knobs and parameters to get the half-decent results on the site now, but there's room for improvement.

    That being said, I *think* it's important that all of the text is correct when it's fed back into the model. Feeding in half-baked data might lead to wonkier results. I have a pretty simple heuristic for gathering up candidate images, but it might be too lax if the store names are the only things that have typos.

    Code:
    WITH
      bad_rows AS (
      SELECT
        *
      FROM
        owlrepo.upload_flat
      WHERE
        price = 0
        OR quantity = 0
        OR bundle = 0
        -- sometimes a 1 is mistaken for a 4
        OR bundle = 4 )
    SELECT
      MIN(task_id) AS task_id,
      screenshot_sha1,
      screenshot_name
    FROM
      bad_rows
    GROUP BY
      screenshot_sha1,
      screenshot_name
    ORDER BY
      screenshot_sha1
    

    I haven't considered adding in confidence to the entries. It looks like tesseract exposes internal statistics about box boundaries and confidence, but I'd estimate a few weekends of dedicated work to get to something like "table entry confidence" in the owlrepo api. It's appealing, but it's a bit too much to implement for this UI atm. Having it show up on the site would require reprocessing all of the images, which is something that I'm saving for a v2 (with the fine-tuned model).
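
    A rough sketch of how word-level confidences could roll up into a per-entry score and a red-to-green color, along the lines suggested above. Tesseract's TSV output reports a `conf` value per word from 0 to 100 (and -1 for boxes that are not words). Everything else here is an assumption for illustration: the input shape, the function names, and the "weakest word decides" rule are hypothetical and not part of the owlrepo API.

```python
from typing import List, Tuple

Word = Tuple[str, float]  # (recognized text, confidence in 0..100)


def cell_confidence(words: List[Word]) -> float:
    """Confidence of a table cell: the weakest word in it decides."""
    scores = [conf for _, conf in words if conf >= 0]  # drop -1 placeholder rows
    return min(scores) if scores else 0.0


def confidence_color(conf: float) -> str:
    """Map 0..100 onto a red-to-green hex color."""
    conf = max(0.0, min(100.0, conf))
    red = round(255 * (100 - conf) / 100)
    green = round(255 * conf / 100)
    return f"#{red:02x}{green:02x}00"


# Hypothetical OCR output for one table row.
row = {
    "store_name": [("Cheap", 91.0), ("Scr0lls", 42.0)],
    "price": [("1,200,000", 96.0)],
}
colors = {col: confidence_color(cell_confidence(words)) for col, words in row.items()}
```

    Taking the minimum is deliberately pessimistic: one badly read word is enough to flag the whole entry for review.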
     
  6. zendaine
    Offline

    zendaine Blue Snail

    2
    0
    7
    Aug 29, 2020
    Male
    4:36 AM
    Stooo
    F/P Mage, Beginner
    Yeah that makes sense -- it's just a thought. I curated an image today and there was a row missing in the table:

    upload_2020-10-3_8-53-53.png

    See icyMageXI's row at the bottom. I'm not sure how common this is, but I had no way of fixing it.
     
  7. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    I added a checkbox you can mark if a row is missing. It's not too uncommon (e.g., % uploaded is 0.98), but these cases haven't been explicitly added to candidate tasks. The SQL is a bit trickier, but the pages with missing rows are probably important.
     
    • Like Like x 1
    • Agree Agree x 1
  8. Joker
    Offline

    Joker King Slime Retired Staff

    27
    19
    36
    Oct 20, 2020
    Male
    3:36 AM
    Sanji
    Buccaneer
    200
    Nimbus
    I just wanted to post to appreciate how well designed and programmed this site is. Props to you and everyone who’s worked on this. I appreciate it.
     
    • Friendly Friendly x 2
  9. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    I've added a new page at owlrepo.com/merchants that lists the items that have been seen across uploaded owl searches by a seller. Not very useful information, but it's neat to be able to look yourself up.

    upload_2020-10-25_23-20-24.png
     
    • Great Work Great Work x 1
  10. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    I've updated the original post since some of the screenshots are wildly out of date. The new interface looks like this:

    upload_2020-12-1_22-54-38.png

    I am now cleaning the data before creating this summary table (which means anything outside of 3 MAD is thrown out). The box plot contains all of the important information including the min, p25, p50, p75, max, average, and standard deviation. I also now include the sum of bundle sizes to get a total number. The min and bundle sizes are the most unreliable out of all of these measures because there are cases when the price text overflows into the bundle column (look for this when people price things over 10B). If things look wonky, it's probably because they are.
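
    A minimal sketch of this kind of cleaning, assuming "MAD" means the unscaled median absolute deviation around the median. The function names and the sample prices are hypothetical; the site's actual SQL may scale or compute this differently.

```python
from statistics import mean, median, quantiles, stdev


def drop_outliers(prices, k=3.0):
    """Keep prices within k median absolute deviations of the median."""
    center = median(prices)
    mad = median(abs(p - center) for p in prices)
    if mad == 0:
        return list(prices)  # no spread to measure against; keep everything
    return [p for p in prices if abs(p - center) <= k * mad]


def summarize(prices):
    """Box-plot style summary of the cleaned prices (needs 2+ clean values)."""
    clean = sorted(drop_outliers(prices))
    p25, p50, p75 = quantiles(clean, n=4, method="inclusive")
    return {
        "min": clean[0], "p25": p25, "p50": p50, "p75": p75, "max": clean[-1],
        "mean": mean(clean), "std": stdev(clean), "n": len(clean),
    }


# Hypothetical prices in mesos; the last one mimics a misread screenshot.
prices = [900_000, 950_000, 1_000_000, 1_050_000, 1_100_000, 10_000_000_000]
summary = summarize(prices)
```

    With these numbers the 10B entry falls far outside 3 MAD and is dropped, so the max and mean reflect only the five plausible listings.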

    The scrolls on the front page now rotate through the 6 scrolls that I've been uploading every 4-5 days instead of being stuck with just Claw for ATT 30. Thanks again NiseNise for the scroll suggestions.

    I also swapped over to Plot.ly for all charts. If you are looking at any of the time series plots, you can click on the legend at the bottom to remove lines from the plot. For example, here is the median price for CSS 20%.

    upload_2020-12-1_23-3-47.png

    Finally, I've added some interesting cryptography to prove that an upload was made by you. I'm using JSON Web Keys (JWK) and JSON Web Tokens (JWT) to help secure the upload page. In the personal page, there's a new section:

    upload_2020-12-1_23-7-25.png

    This is unique for every device and is automatically generated for you. The thumbprint is included on every upload, alongside a local timestamp so the owl screenshots align to server time instead of local time. Treat this like you would a bitcoin wallet (with about the same level of anonymity) and don't lose your keys. I am planning on adding a way to claim a nickname, as well as an option to create new keys. I'll put together a small thank you/ranking page now that there's an account-like system in place.
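
    For the curious, a sketch of how a JWK thumbprint can be derived, assuming the standard SHA-256 construction from RFC 7638 (the post does not say which construction or key type the site uses). The key values below are placeholders, not a real key, and `jwk_thumbprint` is a hypothetical helper name.

```python
import base64
import hashlib
import json

# Members that RFC 7638 requires in the hash input, per key type.
REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint (RFC 7638), base64url without padding."""
    members = REQUIRED_MEMBERS[jwk["kty"]]
    # Canonical form: required members only, sorted keys, no whitespace.
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Placeholder public key: the x and y values are made up, not a real curve point.
public_jwk = {
    "kty": "EC",
    "crv": "P-256",
    "x": "hypothetical-x-coordinate",
    "y": "hypothetical-y-coordinate",
    "kid": "optional members like this one are ignored",
}
thumbprint = jwk_thumbprint(public_jwk)
```

    Because only the required public members go into the hash, the same key always yields the same thumbprint regardless of member order or extra metadata, which is what makes it usable as a stable identifier.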

    Usage has been steadily growing, which I find exciting. The last day of November (2020-11-30) had 346 unique visitors to the site, which is something like 5% of the daily active player base. I am looking for feedback on the front page because I know that it is incredibly information dense and probably intimidating to folks. Is it too much? Is it confusing? Are there things you want to know more about? Would you rather see prices in a different way?

    Let me know your thoughts -- feel free to reach out to me here or on discord.
     


    • Great Work Great Work x 6
  11. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    • Great Work Great Work x 4
    • Like Like x 1
  12. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    The OwlRepo dataset is public with the upload history and SQL views that the website uses. You can query the data for free with a BigQuery sandbox account. You can run SQL queries like this:

    Code:
    SELECT * FROM owlrepo.owlrepo.uploads_v1
    
    Here's a starter notebook. If you're familiar with the tools, it only takes a few lines to make this:

    upload_2020-12-5_23-51-8.png
     
    • Great Work Great Work x 1
  13. robert8
    Offline

    robert8 Mushmom

    57
    102
    65
    Oct 17, 2018
    Male
    UK
    9:36 AM
    Robert8
    I/L Arch Mage
    165
    Hogwarts
    Can you add a figure of tasks remaining on the curation section? Just something to work towards...
     
  14. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    Yeah, I can add some stats to that page for progress. It's an ever growing list though :)
     
    • Funny Funny x 2
  15. Trion
    Offline

    Trion Capt. Latanica

    306
    80
    273
    Jul 23, 2019
    4:36 AM
    Trion
    Beginner
    1
    Thanks so much for this! I have been waiting for something like this for years. Considering I was never the most savvy FM merchant, this has helped me a lot! I appreciate it.
     
    • Friendly Friendly x 2
  16. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    I made a few minor updates, and realized that I haven't updated this thread in some time. First off, I fixed some recent usability issues:
    • The search was not returning correct results until you typed something and cleared it. Search is tolerable again (please let me know if it still sucks)
    • 100% scrolls were not showing up in the scroll guide. I was wondering why there were two listings for earring int 10%. It turns out that "10" is also a pattern in "100"...
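
    The "10" inside "100" problem is easy to reproduce. Here is a small sketch of the bug and one possible fix; the function names are hypothetical and the site's real matching code is not shown in this thread.

```python
import re


def naive_match(name: str, percent: int) -> bool:
    """Buggy version: '10' is also a substring of '100'."""
    return str(percent) in name


def exact_match(name: str, percent: int) -> bool:
    """Only match when the digits are not part of a longer number."""
    return re.search(rf"(?<!\d){percent}(?!\d)", name) is not None


# Example item names used only for illustration.
scrolls = ["Scroll for Earring for INT 10%", "Scroll for Earring for INT 100%"]
naive_hits = [s for s in scrolls if naive_match(s, 10)]
exact_hits = [s for s in scrolls if exact_match(s, 10)]
```

    The naive check returns both scrolls for 10%, which gives the duplicate listing, while the lookaround version returns only the 10% scroll.
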
    Anyhow, since the time of last writing (2021-01-14), I've made some more significant changes to the site. A notable one is that the site has some new styling that's easier on the eyes during the hours I look at it. I expanded the scroll guide to include mastery books and some random etc. (let me know if you have suggestions). Finally, I refactored the site to use sapper for routing, which lays the groundwork for optimizing page loads.

    It's been great seeing people other than me uploading to the site. To date, there have been ~48 unique contributors.

    upload_2021-3-2_0-20-57.png

    Thanks all for helping out :)
     
    • Great Work Great Work x 9
    • Like Like x 1
  17. Nise
    Offline

    Nise Supervisor Staff Member Supervisor Game Moderator

    2,059
    693
    500
    Jul 5, 2017
    Male
    Korea
    5:36 PM
    NoraONE
    Corsair
    189
    Sweetdreams
    The search does seem better! I had an incredibly hard time last night searching "helmet" scrolls, it kept popping up as gloves and I was super confused T_T but it worked pretty solidly just now when I tested ^^

    Keep up the great work!
     
    • Informative Informative x 1
  18. Floris
    Offline

    Floris Capt. Latanica

    367
    261
    273
    May 27, 2020
    Male
    10:36 AM
    Hi,
    For me the search feature still isn't really working (on the home page). When I type in 'scroll for helmet for' I get 'wand for magic att', 'two-handed sword for att' and 'topwear for str' scrolls as the first hits?

    upload_2021-3-2_12-39-11.png
     
    • Agree Agree x 1
    • Informative Informative x 1
  19. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    I can't reproduce this in Firefox 87, Chrome 88, or whatever version of Safari comes built into iOS 13. I get helmet dex 60, followed by helmet int 60. What browser are you using? Do you know if this is an issue on OwlRepo 2.2.1 (aa4cd7b0) specifically (bottom of the page)?
     
  20. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    1:36 AM
    geospiza
    Dark Knight
    146
    Funk
    Yet another small update: the guide will let you set a lower bound on the scroll prices. The prices might be more out of date the less valuable the scroll is. I use the recommendation page to remind me what to upload, and it prioritizes more expensive scrolls over cheaper scrolls. If something is out-of-date and you happen to owl it, consider uploading and updating the guide.
     
