OwlRepo - a repository of transcribed owl searches

Discussion in 'Items & Mesos' started by geospiza, Jul 3, 2020.

  1. geospiza (Web Developer, Staff Member)
    Hey all, I'd like to share a site I built for fun and learning. OwlRepo is a site that machine-transcribes owl searches and summarizes them. The best part is that anyone can help out by uploading screenshots from in-game.

    upload_2020-12-1_22-26-10.png

    The front page contains a searchable summary of every upload to the site. This is subject to change while I figure out the best way to present information about current prices. I usually set prices in my stores around the p25 value, which is the 25th percentile or the price that is greater than 25% of the items on the market at that time.

    If you click on an item, it will take you to the history of owl screenshots. You can get a sense for the change in the market by looking at the plot over time.

    upload_2020-12-1_22-40-25.png

    If you click on one of the items on this page, you will see the complete contents of the machine transcription. You can copy and paste the results into a spreadsheet for your own purposes. For a quick example of what's currently possible, take a look at these owl searches for all of the cape scrolls for base stats.

    upload_2020-12-1_22-41-34.png

    The centerpiece of the site is the ability to upload your own owl searches. I've made uploading as streamlined as possible: no account is necessary, and no more of the screenshot is kept than is needed to transcribe the owl search. Try it out :)

    I use the Owl of Minerva to search for all of these items, and take an in-game screenshot of each page.

    upload_2020-7-3_13-19-51.png

    I then use the Upload tab to submit the screenshots from the MapleLegendsHD folder.

    upload_2020-7-3_11-47-27.png

    A minute later, the transcribed data is available in the index.

    upload_2020-7-3_11-48-41.png

    You can also download a desktop version of OwlRepo, which will automatically pull owl screenshots from your screenshot directory:

    https://github.com/geospiza-fortis/owlrepo-client/releases

    In case you're interested in some of the details of how this thing was built:

    The heart of this is an OCR engine called tesseract. I put together a small library that extracts the data from the screenshots into a machine-readable format. Each batch of owl searches is put into a JSON file that can be used in other applications. The web service exposes this as part of its API:

    https://owlrepo.com/api/v1/data/bcba6c04-b249-4905-8685-e4d45134bc5e/slim.json
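
    Roughly, the transcription step boils down to something like the sketch below. This is not the actual library: it assumes the pytesseract bindings and Pillow, and the crop box is made up; it would need to match the owl results region of a real screenshot.

    Code:
    # Sketch only, not the owlrepo extraction library.
    import pytesseract
    from PIL import Image

    def transcribe_owl_page(path):
        screenshot = Image.open(path)
        # hypothetical bounding box around the owl results window
        results_region = screenshot.crop((180, 120, 620, 420))
        # run tesseract's LSTM engine over the cropped region
        text = pytesseract.image_to_string(results_region)
        # each remaining line should be one seller row: store name, bundle, price
        return [line for line in text.splitlines() if line.strip()]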

    Since I have not open sourced the server code at the time of writing, here's the schema for the response.

    Code:
    {
      "type": "object",
      "properties": {
        "screenshot": {
          "type": "object",
          "properties": {
            "timestamp": { "type": "string", "format": "date-time" },
            "name": { "type": "string" },
            "sha1": { "type": "string", "format": "uri" }
          },
          "required": ["timestamp", "name", "sha1"]
        },
        "search": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "item": { "type": "string" },
            "results": { "type": "integer" }
          },
          "required": ["text", "item", "results"]
        },
        "paginator": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "current": { "type": "integer" },
            "total": { "type": "integer" }
          },
          "required": ["text", "current", "total"]
        },
        "body": {
          "type": "object",
          "properties": {
            "text": { "type": "string" },
            "entries": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "id": { "type": "string" },
                  "store_name": { "type": "string" },
                  "bundle": { "type": "integer", "minimum": 1, "maximum": 200 },
                  "price": { "type": "integer", "minimum": 1 },
                  "quantity": { "type": "integer", "minimum": 1 }
                },
                "required": ["id", "store_name", "bundle", "price", "quantity"]
              }
            }
          },
          "required": ["body", "entries"]
        }
      },
      "required": ["screenshot", "search", "paginator", "body"]
    }
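
    To show how the schema maps onto something usable, here's a small sketch that pulls one batch and computes the p25 price per item (the same statistic I mentioned using for my own stores). It assumes the response is a list of page objects matching the schema and that you have the requests package installed; the helper below isn't part of owlrepo.

    Code:
    # Sketch: summarize one upload batch from the public slim.json route.
    # Assumes the response is a JSON array of page objects per the schema above.
    import requests
    from statistics import quantiles

    URL = "https://owlrepo.com/api/v1/data/bcba6c04-b249-4905-8685-e4d45134bc5e/slim.json"

    def p25_by_item(pages):
        prices = {}
        for page in pages:
            item = page["search"]["item"]
            for entry in page["body"]["entries"]:
                prices.setdefault(item, []).append(entry["price"])
        # quantiles(values, n=4)[0] is the first quartile, i.e. the p25 value
        return {
            item: quantiles(values, n=4)[0] if len(values) > 1 else values[0]
            for item, values in prices.items()
        }

    print(p25_by_item(requests.get(URL, timeout=30).json()))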
    

    When I'm ready to ship changes to the back-end, a new Docker image is built and deployed to various services. The site is hosted on Google Cloud Platform and takes advantage of their free tier.


    upload_2020-7-3_11-58-45.png
    [source]

    It should scale to millions of owl searches a month without trouble. Of course, I'm the only user at the moment, uploading maybe hundreds of pages a day.

    The data gets inserted into a BigQuery table, which can be used to generate tables and charts about everything that is uploaded. Here's the SQL that's used to generate the current chart pictured above. There are more sophisticated queries that can be used to handle bad actors.

    Code:
    WITH
      -- convert the nested JSON structure into a row per seller
      flattened AS (
      SELECT
        task_id,
        batch_sha1,
        payload.screenshot.timestamp AS screenshot_timestamp,
        payload.screenshot.sha1 AS screenshot_sha1,
        payload.search.item AS search_item,
        payload.search.results AS search_results,
        REPLACE(REPLACE(paginator.text, "(", ""), ")", "") AS paginator_text,
        entries.*,
        entry_index
      FROM
        owl.upload_v1 AS upload,
        UNNEST(upload.payload) payload,
        UNNEST(body.entries) entries
      WITH
      OFFSET
        AS entry_index ),
      extracted AS (
      SELECT
        task_id,
        DATE(screenshot_timestamp) AS screenshot_date,
        search_item,
        search_results,
        price
      FROM
        flattened )
      -- generate statistics over time/search result window
    SELECT
      DISTINCT * EXCEPT(price),
      COUNT(*) OVER (PARTITION BY task_id, screenshot_date, search_item, search_results) AS search_results_captured,
      percentile_disc(price,
        0.5) OVER (PARTITION BY task_id, screenshot_date, search_item, search_results) AS median_price
    FROM
      extracted
    ORDER BY
      search_item,
      screenshot_date,
      search_results_captured DESC
    


    The back-end is otherwise a dumb pipe that just transcribes the owls.

    The site itself handles displaying the machine-read files in a human-friendly way. It's written in Svelte, which has been enjoyable to work with so far. If you're curious, the source can be inspected in the browser development tools. I'm no web developer though, so it can look ugly.

    Old individual listing page screenshot for cape scrolls.
    upload_2020-7-3_11-38-52.png

    This dashboard is no more; you did well, Data Studio.


    Updated 2020-12-01
    • Moved out-of-date screenshots into an appendix section
    • Quoted my closing thoughts to keep context for the discussion, but the information is mostly out of date.
    Updated 2022-09-18
    • Added a link to the desktop client
     

  2. Chew
    This is amazing. Great job!
    You could use some help on the front end; I hope someone here can help you :) Being able to search for a specific item in the index would be nice, like if I wanted to see the history of prices for GFA 60%.
     
  3. Luscious
    I'm impressed with the data extraction from screenshots.

    However, the area that I'm mostly interested in discussing is the statistics. A problem that occurred with previous price guides was intentional manipulation, both in-game and in the database. The goal, then, is to proactively counter abuse. Your method of extracting data from screenshots is pretty robust against the database attacks (except for photo editing), but in-game players could put lots of a certain item at much higher prices than would actually sell to force your model to trend upwards. Even using MAD wouldn't help against a flood attack on a low-volume item. So I believe the representation should have some sort of bias toward the lower end of the data while maintaining a heavy bias against outliers.

    I can't come up with a perfect solution yet, but maybe think about using MAD on Q1-Q2 of the data?
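
    Something like this is roughly what I mean, as a throwaway sketch (the Q1-Q2 cutoff and the MAD threshold are arbitrary choices, not something I've validated on real data):

    Code:
    # Rough sketch: estimate the price from the lower half of listings (Q1-Q2),
    # then reject whatever outliers remain with a MAD filter.
    from statistics import median

    def robust_low_price(prices, k=3.0):
        mid = median(prices)
        lower_half = [p for p in prices if p <= mid]  # keep Q1-Q2, drop the right tail
        center = median(lower_half)
        mad = median(abs(p - center) for p in lower_half) or 1
        kept = [p for p in lower_half if abs(p - center) <= k * mad]
        return median(kept)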

    Edit: An additional way of combating intentional price manipulation would be to count prices from distinct user IDs instead of counting every listing of an item from a single individual (correct me if I'm wrong, but it looks like you're already doing this?). This would prevent somebody who has a huge stock of an item from arbitrarily dictating what the price looks like in the database. This could be circumvented by an abuser by spreading the stock of the item across many mules, but it's still a good mitigation.
     
  4. geospiza (OP)
    Thanks for the feedback!

    I totally agree; my skills are pretty limited on this front. But on the other hand, it's totally functional and exceeds my initial expectations for the project. If there are folks with a good eye for design or an interest in data visualization, feel free to reach out to me. It would be a good excuse to open-source the client and split it off from the server bits.

    After I add support for macOS folks (who don't have the ability to take in-game screenshots), I'm planning on adding limited search with elasticlunr.js to get historical prices for a single item. It is a bit frustrating that the only way to do this now is through the embedded chart.

    Sellers do this to some effect now. There were a few days where several Maple Shields were listed at >2B, clearly looking for offers (which, funnily enough, exceeds the price column width in-game). These items have enough volume that it should be difficult for a single player to manipulate a robust estimator. Another benefit of having the entire owl transcribed is that seller and shop names can be used for de-duplication or for detecting manipulated images (e.g. bypassing built-in checks to transcribe owls from another game).

    But yes, unfortunately there's no stopping someone from listing an Omok set for 20m because no one else is selling. However, there is now the context of having both the full transcript and (hopefully) a history for an item. There is also an element of randomness as to when owl searches will be uploaded for low-volume items, which means it would take sustained effort from a seller to affect prices over the long term.

    I haven't thought about trimming the upper part of the distribution -- I'm also not sure what the implications are for the interpretation of that number. A guildie suggested adding a confidence interval with Q1/Q3 to the line chart (which is actually very doable). Similarly, a box and whisker plot might be a good way to communicate the data without having to trim too much of it. I am all ears on the statistics side of things though; it's something that I would like to understand in more depth. If you're interested, this paper on quantitative data cleaning has influenced a lot of my thoughts on how I'm approaching the necessary stats.

    On a slight tangent, the discussion in C++'s scroll guide lays out some of the concerns around how guides can manipulate the market. I'm unclear on what effect owlrepo will have on the market, but I'm hoping that owl transcripts and historical data will be a net positive, and kind of neat.
     
  5. Luscious
    Equips are a completely different ball game; I'm surprised you even included them. Honestly, no useful information can be gleaned from a Bathrobe for Men having a price range from 50k to hundreds of mil without the context of the equip's stats.

    I've put some more thought into the implications of trimming the data and cleaning it in general. Applying standard statistics to this model assumes that the data is normally distributed, but I suspect that is largely not the case. Take for example this result: https://owlrepo.com/listing/ceb50828-a18d-41e9-9a53-ed22a3199a6c. If I plot it out, it looks like this: face avoid 60.png
    As you can see, there's an obvious right tail. There are different conclusions you can draw from this. The one I suspect is that the price listings are actually roughly normally distributed, but the left tail is missing from the data set because those listings have all sold out. This is an inherent bias in the data. To account for it, taking Q1-Q2 and trimming Q3 sort of compensates for the loss of the left tail. However, this is subjective based on one's definition of "The Price" of an item.

    Also thanks for the paper, it looks like a good read and I'll definitely keep it.
     
  6. geospiza (OP)
    For completeness, I'm just capturing everything in the top 10 searched items. Even without the item stats, you can probably infer the price of clean/scrolled items. I'm a little sad when I'm reminded of the time I sold a brown work glove for 2m...

    That being said, I have another goal in mind where having more data is useful. There are obvious transcription errors that can be caught using simple methods. I can use these to determine which pages are affected by poor tuning of the underlying OCR model. Tesseract uses an LSTM character model for performing OCR, and allows for fine-tuning of its final layers. So I would write a query (like "give me all of the images where the bundle count is greater than 200"), build an image dataset that I verify by hand, and fine-tune the model. The more the merrier. Then I can reprocess the data so I don't have to look at an "Avoiclability" scroll or a "Maple Skan«a".
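
    As a rough illustration, catching the obvious errors really is just a handful of range checks against the transcribed entries. This is a sketch only; the bounds mirror the constraints in the schema from the first post.

    Code:
    # Flag entries that violate basic sanity bounds, so the source screenshots
    # can be pulled into a hand-verified fine-tuning dataset.
    def suspicious_entries(page):
        flagged = []
        for entry in page["body"]["entries"]:
            bad_bundle = not (1 <= entry["bundle"] <= 200)
            bad_price = entry["price"] < 1
            bad_quantity = entry["quantity"] < 1
            if bad_bundle or bad_price or bad_quantity:
                flagged.append((page["screenshot"]["sha1"], entry))
        return flagged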


    I agree with the overall story you're telling with this plot. These numbers don't reflect the price at which items actually exchange hands -- only the parties involved and the MapleLegends database know those precise figures. I do want to keep an objective view that is informed purely by the results of the owl search page. But I also think a complementary summary vetted by a person, or a way to toggle the allowable range of prices, would be useful too. The browser can do a surprising amount of work when generating a summary. This is definitely something to revisit once there's a time series to look at.
     
  7. Chew
    I uploaded the Skanda prices, and it was super easy. It literally took me like 30 seconds to take the screenshots and upload them.

    Could you add thousand separators (1,000,000)?
    And make the numbers hug the right side for easier readability?
    And maybe arrange from cheapest to most expensive? Some will be looking to sell quickly, others will be looking to buy, and both will want to find the cheapest price.
    Like this:
    [image: mockup of the suggested formatting]

    I experienced an issue, or perhaps I didn't use the program as intended.
    I uploaded owls for 2 separate items at the same time, and it just lumped them together under 1 item. https://owlrepo.com/listing/bed74601-a99d-4c36-b074-24e51b31d067
     
  8. geospiza (OP)
    Excellent, thank you for trying it out. The tables should now be formatted like the one you showed. It's a bit ugly at the moment, but I'll let you try it out. The default view keeps the ordering from the uploads, but it's possible to sort on the columns now.

    Default view:
    upload_2020-7-4_0-49-52.png

    After sorting:

    upload_2020-7-4_0-50-14.png

    Also that empty space on the right is ugly, I'll fix it later.

    I quite like this, especially since it's common to put the fm/cc in the store name.

    This is actually a feature, not a bug. It can get very annoying to upload many small things, so it's possible to upload them in one big batch. For example, look at this upload:

    https://owlrepo.com/listing/d18a5a9e-a3eb-4954-ad90-83ddd400509b

    The index looks over the upload batches, not the individual items in the batches. There's some extra processing that needs to be done to group screenshots for the same item together, so don't worry about the double uploads since that will be taken care of.
     
  9. geospiza (OP)
    Here are a few ergonomic updates. The first is a contribution heat map to see active contributions. I'm hoping to keep this green through the end of July with the top 10 owl results.

    upload_2020-7-5_19-55-55.png

    The second is an items page. It includes some basic stats and a link to the full transcript. Here's what filtering on "chaos" looks like:

    upload_2020-7-5_19-57-42.png

    This page is updated twice a day, once at server midnight and once at server noon.

    These are part of the API now, under the following routes:

    Code:
    owlrepo.com/api/v1/query/heatmap
    Code:
    owlrepo.com/api/v1/query/search_item_listing
    The heat map was an excuse for me to figure out how to structure exposing the data to the front-end. It turns out that running some SQL and putting the result into static hosting is the cheapest way to do this. The heat map is updated 4x a day, while the listing is only updated 2x a day. I can now write SQL and not have to worry about messing with the server bits. The index for upload tasks still remains the most up-to-date view of new uploads, though.

    The query for the search listing is here for transparency. The one thing that I'd like to figure out is how to group bursts of screenshots (e.g. consecutive and taken within 1 min of each other).

    Code:
    WITH
      extracted AS (
      SELECT
        DISTINCT search_item,
        search_results,
        task_id,
        -- todo: screenshot_date does not work over day boundaries, it would
        -- be nicer to capture windows based on distance between consecutive screenshots in a batch
        MIN(screenshot_timestamp) OVER (PARTITION BY task_id, DATE(screenshot_timestamp),
          search_item,
          search_results) AS item_timestamp,
        -- todo: deduplicate before performing analytical functions
        COUNT(DISTINCT CONCAT(screenshot_sha1, entry_index)) OVER (PARTITION BY task_id, DATE(screenshot_timestamp),
          search_item,
          search_results) AS search_results_captured,
        percentile_disc(price,
          0.25) OVER (PARTITION BY task_id, DATE(screenshot_timestamp),
          search_item,
          search_results) AS p25,
        percentile_disc(price,
          0.5) OVER (PARTITION BY task_id, DATE(screenshot_timestamp),
          search_item,
          search_results) AS p50,
        percentile_disc(price,
          0.75) OVER (PARTITION BY task_id, DATE(screenshot_timestamp),
          search_item,
          search_results) AS p75
      FROM
        owl.upload_flat
      WHERE
        search_item IS NOT NULL
      ORDER BY
        search_item,
        item_timestamp )
    SELECT
      * except(item_timestamp),
      FORMAT_TIMESTAMP("%Y-%m-%dT%X%Ez", item_timestamp) as search_item_timestamp,
      ROUND(search_results_captured / search_results, 2) AS percent_complete
    FROM
      extracted
    

    Right now, this is about 100kb, but I'm going to have to figure out a better way of doing things once the size of this file goes into the tens of MB. I'm pretty happy with this though.
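
    On the burst-grouping todo in the query above, the logic I have in mind looks roughly like this outside of SQL (a sketch only; the timestamps would come from the flattened upload rows, and the one-minute gap is a guess):

    Code:
    # Group screenshot timestamps into bursts: a new burst starts whenever the
    # gap from the previous screenshot exceeds the threshold (here one minute).
    from datetime import timedelta

    def group_bursts(timestamps, gap=timedelta(minutes=1)):
        bursts = []
        for ts in sorted(timestamps):
            if not bursts or ts - bursts[-1][-1] > gap:
                bursts.append([ts])  # start a new burst
            else:
                bursts[-1].append(ts)  # continue the current burst
        return bursts

    In BigQuery the same thing falls out of a LAG over the timestamps plus a running sum over a "new burst" flag, but it's easier to read this way.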

    And sorry to those who can't take in-game screenshots, it's more difficult than I thought to support system screenshots safely.

    ----
    Also:

    LF> old owl screenshots to fill in history
     
  10. Chew
    Any news on a search function?
    Honestly I'd love to pour some NX into the project, but it's hard to justify if I can't search the many results. Clicking through the pages manually while looking for a scroll gets old quickly.
     
  11. geospiza (OP)
    Try out the items tab. I've been using it to check prices as I sell my gached scrolls; it works well imo, it's just missing data. There should be a box under "Search Item" when you go to the page. I used "chaos" for the example before, but here's another example. This page is updated on a schedule, so it's not always up to date. I can make it refresh more often though.

    upload_2020-7-9_15-5-33.png

    Still thinking through the interface, but these are the things I'm probably going to do in the near future:

    1. Remove the "index" and put it under the uploads section, and rename the section on the homepage "most recent uploads". This is currently confusing with the addition of the items page (which is slightly different).
    2. Add another view on the results that actually indexes the stuff in the items tables, one row per item. If you play with the items page enough, you'll notice it includes all of the history, which makes it a bit harder to scan quickly. Here, you would be able to click on an item and go to the items tab with the search pre-populated.
    3. Add a place to see which items you uploaded. Currently, there isn't a way to tell which listings are the ones you uploaded, plus it might be nice to have a personalized contribution graph.
     
  12. geospiza (OP)
    Most of that near future ended up happening today. The home page now has a searchable table. I've increased the refresh rate on the tables to hourly, too.

    upload_2020-7-10_0-10-4.png

    What used to be known as the index is now "recent uploads" and has been moved into the uploads page. It is updated immediately when an upload is finished.

    upload_2020-7-10_0-12-19.png

    I still need to add a way to keep track of what items have been uploaded. When I do implement it, the history will belong to a browser and won't travel between devices.
     
  13. Chew
    Thank you for the great effort you have put into the project so far. The search function is everything I had hoped for.
    I think all the site needs now to gain popularity is a simple makeover, to make it easier to understand and easy on the eye :)

    I have put 167 images in the database today, mostly gacha scrolls haha!
     
  14. geospiza (OP)
    \o/

    I'm actually a bit surprised that your 167-image upload got through without an issue. Currently, if an upload fails, it does so silently. I'd like to fix that when I also add a place for storing a little upload history.

    I've also been thinking about the site aesthetics a bit; it'd be cool to get a Helios Tower Library/Magatia PQ (first stage) vibe for the background imagery. I might play with the color palette, but nothing too complicated.
     
  15. Chew
    When you first launched the site, it was possible to copy the datasets into Excel or Google Sheets with just a copy-paste. Is there any way to make it work again? I assume it has something to do with the new tables that I can't just copy-paste from?
     
  16. geospiza (OP)
    Good catch, it was entirely broken. I enabled copy and paste on the listings page. It's enabled on the home page too, but it only copies what's on the screen and not the entire table. It's broken on the items page for some reason :/

    You can click on the table and ctrl+c, which should copy everything you can see. There's also a big button you can click now, too.

    Click the table to select all elements, or hit the button.

    upload_2020-7-11_16-24-36.png

    Paste into spreadsheet of choice.

    upload_2020-7-11_16-26-48.png


    Also, this is a good time to mention that people playing on Macs can upload screenshots now too. It has to be a capture of the MapleLegends window (Command+Shift+4, then the spacebar, then clicking on the window). The check to make sure that the PNG is a screenshot of an owl window is less brittle now.
     
  17. riceylin
    Been using this for a few days now and I love it! Best way to track pricing and understand the market out of the different resources I've used.

    Is this the first of its kind in maple history?
     
  18. dnbMovement
    This is so similar to what I envisioned. This looks amazing! Nice work!

    I actually had this exact same idea last weekend - photo-to-JSON transcription and all - and just now, I was scouring the forum to look for available APIs (I wanted to implement OAuth / OpenID Connect to verify users before uploading screencaps to prevent market manipulation. May be overkill for this).

    I think it would look great to get those distribution values and graph them as a box and whisker plot over time. I've seen some JS open source projects that did exactly that, so it should be pretty easy.

    I would love to help out in any way I can. Please let me know! I'm currently working on backend at work, but I've had my fair share of front end web dev.
     
  19. geospiza (OP)
    Awesome to see others with the same ideas. I decided not to go with auth because I thought it would be too much of a barrier to entry and I don't like the idea of collecting email addresses. I'm confident duplication, spam, and manipulation can be dealt with using server-side filters (also an excuse to play with noisy data).

    Probably the easiest way to help out is to upload owls. It's currently week 2 of data collection for the most frequent owls, but adding a larger variety of scrolls would be awesome. The more pages/results there are in the owl, the more valuable it is to have in the database imo.

    If you're interested in development work, feel free to reach out to me on Discord (geospiza#5912). There is some clean up I have to do, but I can give you access to the Bitbucket repo and a few of the non-production services later this week if you're interested.

    Here's a list of features that might be interesting to work on:
    • Visualizations and plots. In the listing page, the JSON response for an upload should be enough (e.g. https://owlrepo.com/api/v1/data/9a65df8c-7fae-484a-9982-0d8ecdb2c4e2/slim.json). For stuff that looks over history, the response from the item listing (e.g. https://owlrepo.com/api/v1/query/search_item_listing) probably works. There are definitely a lot of nice-looking plotting libraries out there; I haven't looked into them too deeply. Mostly JavaScript, maybe a little bit of SQL.
    • Local storage for storing upload ids. This is low-hanging fruit, and also a good intro to how the frontend is organized. Right now you can't tell which uploads are yours, but this would let you keep track of simple stats like the number of uploads. All frontend/JavaScript work.
    • Recommendations for contributions. I've been trying to figure out the best way to optimize the use of owls. I scraped a list of all the tradable scrolls in the library to help figure out what scrolls still need to be uploaded. It would also be useful to rank the existing screenshots to determine which ones are stale and valuable to have up-to-date information on (there's a rough sketch of this after the list). Mostly SQL, with a little bit of Python/JavaScript mixed in.
    • Building scripts and a dataset for fine-tuning the OCR. Before fine tuning can even begin, all of the screenshots and output JSON need to be reviewed line by line to create ground truth. A small set of screenshots with high errors (probably 100-1000) can be corrected by hand and used for fine tuning. Then a big reprocessing of all the data. Neat if you're into machine learning, but also a lot of work.
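
    For the recommendations item, here's a rough sketch of the staleness ranking I have in mind. It assumes the search_item_listing route returns a JSON array of rows with the columns from the query above; the script itself isn't part of owlrepo.

    Code:
    # Sketch: rank items by how stale their newest upload is, so contributors
    # know what to owl next. Assumes a list of row objects with the columns
    # produced by the search listing query above.
    from datetime import datetime, timezone
    import requests

    URL = "https://owlrepo.com/api/v1/query/search_item_listing"

    def stale_items(rows, top=20):
        latest = {}
        for row in rows:
            ts = datetime.fromisoformat(row["search_item_timestamp"])
            if row["search_item"] not in latest or ts > latest[row["search_item"]]:
                latest[row["search_item"]] = ts
        now = datetime.now(timezone.utc)
        ranked = sorted(latest.items(), key=lambda kv: kv[1])
        return [(item, (now - ts).days) for item, ts in ranked[:top]]

    for item, age_days in stale_items(requests.get(URL, timeout=30).json()):
        print(f"{age_days:>4} days stale: {item}")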
     
  20. AzamatBagato
    Omg this is so cool! Great job
     
