
Exploring player growth via rankings page

Discussion in 'General Discussion' started by geospiza, May 8, 2021.

  1. Nightz
    Offline

    Nightz Supervisor Staff Member Supervisor Game Moderator

    1,782
    1,033
    490
    Oct 22, 2020
    Male
    3:30 PM
    Nightz
    I/L Arch Mage
    200
    Funk & Pasta
    Moderator Post
    Something seems way off here, not sure how you're getting this data but a quick check on the rankings gets me 498 results typing shivfame and they all seem to end with numbers, then there's also his NLC store mules that go like Shivbull8 etc

    Should net a lot more chars ending with a number.
    upload_2021-5-20_10-48-33.png

    Or am I just reading over things and missing the discussion? it's early :3
     
  2. Gurk
    Offline

    Gurk Nightshadow

    677
    447
    350
    Mar 9, 2020
    Male
    7:30 AM
    Gxrk
    Hero, Bishop, Marksman, Shadower, Buccaneer, Corsair

    This is based off of sampled data; it is not comprehensive, which is why it is maybe amusing that 4 of shiv's fame mules managed to make it in.
     
    • Agree Agree x 2
    • Like Like x 1
    • Funny Funny x 1
    • Informative Informative x 1
  3. Cascades
    Offline

    Cascades Slime

    23
    13
    30
    Jul 2, 2020
    Male
    7:30 AM
    Cascades, Crater
    Hero
    135
    Homies
    Just a thought - maybe filter on names containing "HS", "HB", or "MU" instead? I swear most of those mules have that in their name. :)
     
  4. Gurk
    Offline

    Gurk Nightshadow

    677
    447
    350
    Mar 9, 2020
    Male
    7:30 AM
    Gxrk
    Hero, Bishop, Marksman, Shadower, Buccaneer, Corsair
    rip PippyHS iPippy
     
    • Funny Funny x 1
  5. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    I'm using that sampled dataset of about 1000 characters I made a few weeks ago. I don't have any data that isn't already public, so things like death timestamps aren't doable unfortunately.

    If I do scrape a bunch of characters, I'll keep in mind the different ways to filter out mules :)
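
A minimal sketch of the name-based mule filters suggested in this thread (names containing "HS", "HB", or "MU", or ending in digits). The pattern and the `looks_like_mule` helper are hypothetical heuristics for illustration, not rules used by the server or by the notebook, and they will produce false positives for ordinary names that happen to match.

```python
import re

# Hypothetical heuristic assembled from suggestions in the thread:
# a name contains HS/HB/MU (case-sensitive) or ends in one or more digits.
MULE_PATTERN = re.compile(r"(HS|HB|MU)|\d+$")


def looks_like_mule(name: str) -> bool:
    """Return True if the character name matches the mule heuristic."""
    return MULE_PATTERN.search(name) is not None


names = ["PippyHS", "Shivbull8", "geospiza", "Nightz"]
likely_players = [n for n in names if not looks_like_mule(n)]
print(likely_players)
```

A name such as lordswag69 would also be flagged by the trailing-digit rule, so a filter like this is better suited to a sensitivity check than to a hard exclusion.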
     
    • Great Work Great Work x 1
  6. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    I went ahead and scraped all the leveling history for mages. Now I don't have to think about all the issues around sampling. There were a total of 51168 mages when I did my scrape last night (2021-05-21). I moved over to using Spark because the size of the dataset is large enough to call for parallelization.

    Notebook: https://github.com/geospiza-fortis/ml-player-growth/blob/main/notebooks/2021-05-22-levels.ipynb
    Data:
    Here's a breakdown by the job name:

    Code:
    +------------------------+-----+
    |mastery                 |n    |
    +------------------------+-----+
    |magician                |21241|
    |cleric                  |9877 |
    |wizard (ice/lightning)  |5634 |
    |priest                  |4160 |
    |bishop                  |3906 |
    |wizard (fire/poison)    |2554 |
    |archmage (ice/lightning)|1329 |
    |mage (ice/lightning)    |1087 |
    |archmage (fire/poison)  |862  |
    |mage (fire/poison)      |518  |
    +------------------------+-----+
    
    And here's a plot of the 2nd job specialization. It's "all" if they haven't reached 2nd job yet:

    upload_2021-5-22_13-19-31.png

    The distribution of levels across mages is interesting:

    upload_2021-5-22_13-17-46.png

    I believe that spike around level 81 is HS mules :)

    I took a look at the time to next level for 3rd job classes. I used the 10th percentile of leveling times in minutes to make the plot (i.e. 10% of players level faster than this time). The sample size in each plot is the number of players at the last level shown in the plot.
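
The 10th-percentile calculation described above can be sketched in plain Python. This is an illustration under assumptions, not the notebook's Spark code: the `(name, level, timestamp)` row layout, the nearest-rank percentile definition, and the sample rows are all hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def minutes_to_level(rows):
    """Map each level to the minutes every character took to reach it
    from the previous level. rows are (name, level, timestamp) tuples."""
    by_name = defaultdict(list)
    for name, level, ts in rows:
        by_name[name].append((level, ts))
    times = defaultdict(list)
    for history in by_name.values():
        history.sort()
        for (prev_level, prev_ts), (level, ts) in zip(history, history[1:]):
            if level == prev_level + 1:  # skip gaps in the recorded history
                times[level].append((ts - prev_ts).total_seconds() / 60)
    return times


# Hypothetical sample: two characters going from level 30 to 31.
rows = [
    ("a", 30, datetime(2021, 5, 1, 10, 0)),
    ("a", 31, datetime(2021, 5, 1, 11, 0)),
    ("b", 30, datetime(2021, 5, 1, 10, 0)),
    ("b", 31, datetime(2021, 5, 1, 10, 30)),
]
times = minutes_to_level(rows)
p10 = {level: percentile(v, 10) for level, v in times.items()}
print(p10)
```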

    upload_2021-5-22_13-21-26.png

    The fire/poison meta is very apparent here. Here's another view of the cumulative time to level 120 in hours by each specialization:

    upload_2021-5-22_13-24-41.png

    That's a whole 20-hour difference for the top 10% of fp mages.

    Going back, we can expand the span of time out further:

    upload_2021-5-22_13-22-23.png

    We see that bishops take a while after hitting 4th job, presumably for the genesis quest.

    I'll be playing around with this for a while in preparation for a final scrape of the entire rankings. Let me know if you have any ideas you'd like to see for mages.
     
    • Great Work Great Work x 6
  7. Slime
    Offline

    Slime Pixel Artist Retired Staff

    641
    1,184
    381
    Apr 8, 2015
    Male
    Israel
    4:30 PM
    Slime / OmokTeacher
    Beginner
    102
    Flow
    Very interesting graphs, gotta love how intuitive the distribution of levels across mages is:
    upload_2021-5-23_1-56-7.png
     
    • Like Like x 1
  8. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    I spent a day making another plot of the estimated experience per hour:

    upload_2021-5-23_16-45-18.png

    This is the experience for a level divided by the top 5% of leveling times. It's reasonable, somewhere around 30-50m eph after level 120. There's a dip in the eph after 130 because it generally takes multiple days to level at that point. There's no way of knowing the time spent online, so we have to make do with estimates.
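
As a rough sketch of the estimate described above, experience per hour is the experience needed for a level divided by a fast (5th percentile) wall-clock leveling time. The experience value and the leveling times below are made-up numbers for illustration, and the nearest-rank percentile is an assumption.

```python
import math


def fast_time_minutes(minutes, q=5):
    """Nearest-rank q-th percentile of the leveling times (in minutes)."""
    ordered = sorted(minutes)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def estimated_eph(exp_to_level, minutes, q=5):
    """Experience for the level divided by the fast leveling time in hours."""
    return exp_to_level / (fast_time_minutes(minutes, q) / 60)


HYPOTHETICAL_EXP = 40_000_000  # not a value from the real experience table

# At least one character levels within a single one-hour session.
print(estimated_eph(HYPOTHETICAL_EXP, [60, 90, 120, 2880]))

# Every character needs multiple days of wall-clock time, so the estimate drops
# even if the in-game rate is unchanged.
print(estimated_eph(HYPOTHETICAL_EXP, [2880, 4320]))
```

The second call shows why the estimate dips at high levels: wall-clock time includes time spent offline, which cannot be separated out from the rankings data.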

    I thought one way to solve this is to take the minimum time between levels, with the idea that there's probably one person who just leeched those levels away. I got this plot:

    upload_2021-5-23_16-49-10.png

    Is that 7b eph? Here's another view with the y axis in log scale:

    upload_2021-5-23_16-49-48.png

    I think that those weird cases are caused by people leveling during daylight saving time changes:

    Code:
    +----------+-----+--------------+-------+-------------------+-------------------+
    |      name|level|specialization|   diff|          timestamp|     prev_timestamp|
    +----------+-----+--------------+-------+-------------------+-------------------+
    |  blehmain|   16|           all|-522187|2017-03-08 13:46:22|2017-03-14 15:49:29|
    |Valdebebas|   21|           all|-420905|2017-04-02 00:11:19|2017-04-06 21:06:24|
    | SacerBill|   52|        cleric|-273243|2017-03-08 14:17:24|2017-03-11 18:11:27|
    |      Aram|   17|           all|-218495|2017-03-07 22:32:13|2017-03-10 11:13:48|
    |lordswag69|   52|           ice|-200862|2017-03-07 15:48:08|2017-03-09 23:35:50|
    |     Foxed|    8|           all|-149236|2017-04-01 22:14:13|2017-04-03 15:41:29|
    | ChillDown|    9|           all|-128753|2017-03-08 19:40:39|2017-03-10 07:26:32|
    |  BetaCruz|   12|           all| -94815|2017-03-09 16:38:06|2017-03-10 18:58:21|
    | Corgeeees|   31|           ice| -79362|2017-04-02 06:48:02|2017-04-03 04:50:44|
    |    Mattyy|    6|           all|  -3321|2018-03-11 03:01:04|2018-03-11 02:56:25|
    |MaxMighty1|   10|        cleric|  -3252|2019-03-10 03:00:45|2019-03-10 02:54:57|
    | Fledgling|   12|           ice|  -3205|2020-03-08 03:05:37|2020-03-08 02:59:02|
    |    Credit|    9|          fire|  -3072|2019-03-10 03:06:21|2019-03-10 02:57:33|
    |Misclicked|    7|           all|  -3012|2021-03-14 03:04:55|2021-03-14 02:55:07|
    |  Bluberri|    7|        cleric|  -2958|2021-03-14 03:06:26|2021-03-14 02:55:44|
    |      Guap|   22|           ice|  -2907|2020-03-08 03:01:38|2020-03-08 02:50:05|
    | eddybell4|    7|          fire|  -2901|2021-03-14 03:03:37|2021-03-14 02:51:58|
    |     ChisT|    9|        cleric|  -2734|2018-03-11 03:02:29|2018-03-11 02:48:03|
    |    PaiBao|  104|          fire|  -2711|2020-03-08 03:10:27|2020-03-08 02:55:38|
    |  Coconaza|   23|        cleric|  -2590|2021-03-14 03:13:16|2021-03-14 02:56:26|
    +----------+-----+--------------+-------+-------------------+-------------------+
    only showing top 20 rows
    
    Many of these are daylight saving time dates. So if it took 1:05 to level up, it would show up as a 5-minute difference because of the fall back in time.
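
The short negative diffs in the table (around -3000 seconds) fall on US daylight saving start dates in March, when local clocks skip from 02:00 to 03:00. One possible mechanism, sketched here as an assumption rather than a confirmed account of how Spark parsed these timestamps: a wall-clock time inside the skipped hour is read with the standard offset, while the next timestamp is read with the daylight offset. The US Pacific offsets are also an assumption, since the thread does not state the server's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets (US Pacific); any zone with a one-hour shift behaves the same.
STANDARD = timezone(timedelta(hours=-8))  # before the spring-forward change
DAYLIGHT = timezone(timedelta(hours=-7))  # after the spring-forward change

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(prev_wall: str, curr_wall: str) -> int:
    """Elapsed seconds when prev is read as standard time and curr as daylight time."""
    prev = datetime.strptime(prev_wall, FMT).replace(tzinfo=STANDARD)
    curr = datetime.strptime(curr_wall, FMT).replace(tzinfo=DAYLIGHT)
    return int((curr - prev).total_seconds())


# Wall-clock values taken from the Mattyy row in the table.
diff = seconds_between("2018-03-11 02:56:25", "2018-03-11 03:01:04")
print(diff)
```

Under these assumptions the two timestamps are 279 seconds apart on the wall clock but one hour further apart in the opposite direction once converted, which reproduces the diff shown for that row.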

    There are a couple of extremely unusual cases here. For example, there is one character that leveled up in the past:

    upload_2021-5-23_16-54-47.png

    I think this one was probably due to a server rollback.

    I spent more time than I should have trying to figure out a good solution that relied on statistics to filter out bad values. I'm not sure it was worth the time. The query optimizer also came up with some very gnarly plans:

    upload_2021-5-23_16-44-14.png

    The idea I want to play with next is trying to estimate time to next level using matrix factorization methods. I'm not even sure if the idea will work, but it's worth setting up the problem and seeing if it works at all. I'll be treating the level times (or the eph) as a matrix of size n x k, where n is the number of players and k is the number of levels. I'll get a 50k x 200 matrix that has a lot of empty values. I can use this a couple of different ways, but I think the most interesting will be this:

    Create a matrix with all the values up to a certain cutoff date. Factorize the matrix and use it to estimate missing values (like values in the future). Create a new matrix that is 1 month after the cutoff date. Then compare the values from the estimate with the new matrix. Will this work reasonably? What does the factorized matrix end up looking like?
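
To make the idea concrete, here is a toy sketch of filling in missing entries of a players x levels matrix with a rank-1 factorization fitted by alternating least squares. This is only one simple form of matrix factorization, chosen for illustration; the data is synthetic and exactly rank 1, so it says nothing about how well the approach would work on real leveling times.

```python
def rank1_als(matrix, iterations=200):
    """Fit matrix[i][j] ~ u[i] * v[j] using only the entries that are not None."""
    n, k = len(matrix), len(matrix[0])
    u, v = [1.0] * n, [1.0] * k
    for _ in range(iterations):
        # Update each row factor with the column factors held fixed.
        for i in range(n):
            cols = [j for j in range(k) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in cols)
            if den:
                u[i] = sum(matrix[i][j] * v[j] for j in cols) / den
        # Update each column factor with the row factors held fixed.
        for j in range(k):
            rows = [i for i in range(n) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in rows)
            if den:
                v[j] = sum(matrix[i][j] * u[i] for i in rows) / den
    return u, v


def predict(u, v, i, j):
    """Estimated value for player i at level j."""
    return u[i] * v[j]


# Synthetic matrix: rows are players, columns are levels, None is "not reached yet".
observed = [
    [2.0, 4.0, 6.0, None],
    [4.0, 8.0, 12.0, 16.0],
    [None, 12.0, 18.0, 24.0],
]
u, v = rank1_als(observed)
print(round(predict(u, v, 0, 3), 3), round(predict(u, v, 2, 0), 3))
```

The cutoff-date experiment described above would amount to hiding the entries after the cutoff, fitting on the rest, and comparing the predictions with the values observed a month later.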
     
    • Great Work Great Work x 3
  9. whatdatoast
    Offline

    whatdatoast Windraider

    469
    122
    301
    Apr 9, 2020
    7:30 AM
    whatdatoast
    Bowman
    I feel like the 4th job peaks could be people who cap off at gen1 / gen20
     
  10. IHealForYou
    Offline

    IHealForYou King Slime

    28
    23
    21
    Jan 16, 2021
    Male
    3:30 PM
    LegendKnight, IHealForYou
    Dark Knight
    128
    Losers
  11. Slime
    Offline

    Slime Pixel Artist Retired Staff

    641
    1,184
    381
    Apr 8, 2015
    Male
    Israel
    4:30 PM
    Slime / OmokTeacher
    Beginner
    102
    Flow
    To me it just seems like people are least likely to stop playing when they're close to job advancing, which is why the graph goes low before job advancements, and bounces back up after the job advancement.
     
  12. Nise
    Offline

    Nise Supervisor Staff Member Supervisor Game Moderator

    2,059
    693
    500
    Jul 5, 2017
    Male
    Korea
    11:30 PM
    NoraONE
    Corsair
    189
    Sweetdreams
    Some questions I had after seeing these amazing graphs:
    • What does the cumulative hours per level look like for mages 120 - 200?
    • What would the charts look like if the x-axis wasn't a linear 1 - 200?
      • Half of Level 200 in terms of EXP is Level 187, so seeing an x-axis formatted around that would be interesting to see
     
    • Like Like x 1
  13. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    The cumulative hours in the 4th job range look like this:

    upload_2021-5-25_23-13-47.png

    Looks like fire/poison mages are still on top in terms of leveling.

    Here's the plot using the amount of experience it takes to level on the x axis (from 1-200):

    upload_2021-5-25_23-14-38.png

    It's much flatter than any of the other plots. Ideally this would be a straight line if time to level were proportional to experience. The plot has a downward trajectory at the end; I'm not totally clear on the interpretation (it's easier to make end levels than you would expect, perhaps).
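
A small sketch of the axis change discussed above: replacing the level number with the cumulative experience needed to reach that level. The experience table here is a made-up doubling sequence, not the game's real table, so the output only illustrates how the halfway point in experience can sit close to the top level.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_exp(exp_to_next):
    """exp_to_next[i] is the experience needed to go from level i + 1 to level i + 2.
    Returns the total experience spent on reaching levels 2, 3, ..."""
    return list(accumulate(exp_to_next))


def halfway_level(exp_to_next):
    """Highest level already reached once half of the total experience is earned."""
    totals = cumulative_exp(exp_to_next)
    return 1 + bisect_right(totals, totals[-1] / 2)


# Hypothetical table for levels 1 through 6 (each level costs twice the previous one).
toy_table = [10, 20, 40, 80, 160]
print(cumulative_exp(toy_table))
print(halfway_level(toy_table))
```

Plotting cumulative leveling time against `cumulative_exp` instead of the level number gives the experience-based x axis; with the real table the same function could be used to check the level 187 figure quoted earlier in the thread.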


    While I was at it, I tried making a few other plots to see the representation of the population at various stages of the server's history.

    upload_2021-5-25_23-23-6.png

    This breaks down the population by the job advancements that they've had. Looking at this proportionally:

    upload_2021-5-25_23-24-3.png

    I realized this query might be a bit skewed, because it only includes people who leveled in a particular month. I would have to include every player in each time period after they start for this to accurately represent the population. I'll leave this for another day.
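
One way to address the skew described above is to carry each character's most recent job forward into every later month, instead of counting it only in months where it leveled. A minimal sketch, assuming hypothetical `(name, month, job)` events that are already in chronological order:

```python
from collections import Counter


def population_by_month(events, months):
    """Count characters per job for every month, using each character's
    latest known job from its first appearance onward.

    events: (name, month, job) tuples in chronological order.
    months: ordered list of month labels to report on.
    """
    by_month = {}
    for name, month, job in events:
        by_month.setdefault(month, []).append((name, job))

    latest = {}  # name -> most recent job seen so far
    result = {}
    for month in months:
        for name, job in by_month.get(month, []):
            latest[name] = job
        result[month] = Counter(latest.values())
    return result


# Hypothetical events: "a" does not level in February but is still counted.
events = [
    ("a", "2021-01", "magician"),
    ("b", "2021-02", "magician"),
    ("a", "2021-03", "cleric"),
]
months = ["2021-01", "2021-02", "2021-03"]
population = population_by_month(events, months)
print(population["2021-02"], population["2021-03"])
```

Characters are never dropped in this sketch, so it describes everyone who has ever played rather than the currently active population.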
     
    • Like Like x 2
    • Great Work Great Work x 1
  14. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    I am starting a full scrape of the ranking tables today using the methods here. I'm hoping to finish the dump by the end of the weekend, but there's no guarantee everything goes well.

    I'm going to be building an interactive web application with the dump (statistics only, but broken down by class and level). So far I'm thinking of capturing the following information:
    • number of players for each class/level over time
    • average/cumulative time to level for each class
    • survival (kaplan-meier) estimates for each class/level at time of dump
    I'm going to be building this using sql.js so everything can be on static hosting. If things go well, I may try out sql.js-httpvfs to include the row-level data, too.
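
For readers unfamiliar with the Kaplan-Meier estimate mentioned in the list above, here is a minimal pure-Python sketch. How the event is defined for a character (for example, treating a character as "observed" once it stops leveling and as censored if it is still active at the time of the dump) is an assumption made for illustration, as are the sample durations.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with an observed event.

    durations: time until the event or until censoring, one entry per character.
    observed: True if the event was seen, False if the record is censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Multiply by the fraction of at-risk characters that survive time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored records both leave the risk set
    return curve


# Hypothetical data: durations in days, with two censored characters.
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
print(curve)
```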
     
    • Like Like x 2
    • Great Work Great Work x 2
  15. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    The scrape of the rankings is complete, with a few minor hiccups. There are 201,514 rows after removing some duplicates. Counting the characters by jobs results in the following table:

    Code:
    +--------+-----+----+
    |job     |n    |pct |
    +--------+-----+----+
    |magician|53214|26.4|
    |beginner|45021|22.3|
    |warrior |39162|19.4|
    |thief   |34375|17.1|
    |bowman  |16185|8.0 |
    |pirate  |13557|6.7 |
    +--------+-----+----+
    
    If we take a look at only the 4th jobs (I call it mastery), then we get the following:

    Code:
    +------------------------+----+----+
    |mastery                 |n   |pct |
    +------------------------+----+----+
    |bishop                  |4117|29.8|
    |night lord              |1451|10.5|
    |archmage (ice/lightning)|1388|10.1|
    |dark knight             |1149|8.3 |
    |bowmaster               |1005|7.3 |
    |buccaneer               |944 |6.8 |
    |archmage (fire/poison)  |928 |6.7 |
    |hero                    |885 |6.4 |
    |shadower                |822 |6.0 |
    |paladin                 |414 |3.0 |
    |corsair                 |351 |2.5 |
    |marksman                |339 |2.5 |
    +------------------------+----+----+
    
    I also counted the number of players by level:

    upload_2021-8-1_22-24-42.png

    From 1-45, there are a few notable spikes. For easier viewing:

    upload_2021-8-1_22-25-1.png

    There's also a notable spike around 4th job and level 200:

    upload_2021-8-1_22-25-28.png

    It's going to take another few days to get the leveling data, but that should provide much richer information than what I have to play with here.
     
    • Informative Informative x 5
    • Great Work Great Work x 1
  16. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    The scrape is done; I have about 450 MB of leveling data from about 200k characters. Here's the plot of new characters a day based on the full dataset:

    upload_2021-8-2_23-3-53.png

    This shows a much sharper image than the sample of ~1000 characters all the way in the beginning:

    [​IMG]

    The sample was surprisingly good though.

    Daily joins look a bit different:

    upload_2021-8-2_23-5-50.png

    Finally, I took a look at the cumulative time to level vs experience to level:

    upload_2021-8-2_23-6-45.png

    There's a slightly more crowded version of this chart with second job specializations:

    upload_2021-8-2_23-9-36.png

    Thinning down on the class, it's interesting to see the training gap:

    upload_2021-8-2_23-11-1.png

    Now that I have all of the data, I need a bit of time to chew on the best way to plot the data in a web app. Might be something fun to play with on the weekend.
     


    • Great Work Great Work x 6
    • Like Like x 1
    • Informative Informative x 1
  17. Rohan Varma
    Offline

    Rohan Varma Mano

    10
    9
    25
    May 20, 2020
    Male
    7:30 AM
    cwispy
    Bandit, Shadower
    162
    Konoha
    Hey geospiza, this is super interesting! It would be fun to be able to explore this data.

    If you are interested I could probably help make it super easy to spin up a web app with a tool called Explo! It makes it super simple to create user interfaces to explore data.

    https://www.explo.co/

    We would just need to get the data into a Postgres DB or BigQuery (I could help very easily with this) and then we can hook it up and quickly spin up either a standalone data viz app or something you could embed into a static website.
     
    • Friendly Friendly x 1
  18. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    Very cool, looks like a better Data Studio. I'm game, though I'm still planning on building a web app for the survival analysis out of curiosity (to learn the underlying stats and expose the statistical tests for different cohorts). I'll stick the dumps into a bucket and make a BigQuery dataset publicly available, and see how it goes from there.
     
  19. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    I've created a few CSV dumps and made the data available via BigQuery.

    Links to the csv dumps:
    The data is available publicly under geospiza:mlrank. This colab notebook shows how to access the tables:
    Using this data, I had the idea of trying to count, over time, the number of players who have leveled in the last month. It took 2.3 hours to execute this query in Spark:

    Code:
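    -- every character name that appears in the rankings
    with names as (
        select distinct name
        from ranking
    ),
    -- level-up timestamps per character; the right join keeps characters
    -- with no level history (their timestamp is null)
    activity as (
        select
            name,
            timestamp
        from levels
        right join names
        using (name)
    ),
    -- one row per calendar day between the first and last level-up
    date_range as (
        select sequence(to_date(min(timestamp)), to_date(max(timestamp)), interval 1 day) as dates
        from activity
    ),
    dates as (
        select date
        from date_range
        lateral view explode (dates) t as date
    ),
    -- days since each character's most recent level-up, for every day
    -- on or after its first recorded level-up
    characters_active as (
        select
            name,
            date,
            datediff(date, max(timestamp)) as last_active_days
        from activity
        cross join dates
        where timestamp <= date
        group by 1, 2
        order by 1, 2
    )
    -- a character counts as active if it leveled within the previous 28 days
    select
        date,
        count(distinct if(last_active_days < 28, name, null)) as n_active,
        count(distinct name) as n_total
    from characters_active
    group by 1
    order by 1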
    

    Running this in BigQuery takes 30 seconds.
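
For small inputs, the logic of the query above can be sketched in plain Python. This is an illustration of the counting rule only, not a replacement for the Spark or BigQuery job: it works on dates rather than timestamps, and the `(name, date)` input layout and sample rows are hypothetical.

```python
from bisect import bisect_right
from datetime import date, timedelta


def active_counts(levels, window_days=28):
    """For each day in the data's range, return (day, n_active, n_total).

    levels: (name, level_up_date) tuples.
    n_total counts characters with any level-up on or before the day;
    n_active counts those whose latest level-up is fewer than window_days ago.
    """
    by_name = {}
    for name, day in levels:
        by_name.setdefault(name, []).append(day)
    for days in by_name.values():
        days.sort()

    start = min(days[0] for days in by_name.values())
    end = max(days[-1] for days in by_name.values())

    out = []
    day = start
    while day <= end:
        n_active = n_total = 0
        for days in by_name.values():
            i = bisect_right(days, day)  # number of level-ups on or before this day
            if i == 0:
                continue  # character has not appeared yet
            n_total += 1
            if (day - days[i - 1]).days < window_days:
                n_active += 1
        out.append((day, n_active, n_total))
        day += timedelta(days=1)
    return out


# Hypothetical level-ups, with a 2-day window to keep the example short.
levels = [
    ("a", date(2021, 1, 1)),
    ("a", date(2021, 1, 2)),
    ("b", date(2021, 1, 3)),
    ("b", date(2021, 1, 5)),
]
counts = active_counts(levels, window_days=2)
for row in counts:
    print(row)
```

Looking up the latest level-up per character with a binary search avoids materializing the full character x day cross join, which is the expensive part of the SQL version.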

    The results are kind of neat:

    upload_2021-8-6_21-33-51.png

    There's the obvious spike from mid-2017, but we can also see the insane increase in user population in the last year. For a subset of 2020-2021, we can see the number of characters wax and wane with events:

    upload_2021-8-6_21-34-12.png

    Compare this to the online count scrape (separate from this ranking/levels scrape):

    [​IMG]

    For fun, I also made a plot of just the islanders:

    upload_2021-8-6_21-42-37.png
     
    • Great Work Great Work x 3
  20. OP
    OP
    geospiza
    Offline

    geospiza Web Developer Staff Member Web Developer

    212
    449
    215
    Apr 16, 2020
    7:30 AM
    geospiza
    Dark Knight
    146
    Funk
    I trained 195 different survival models and wrote a little cloud function that spits out a plot.

    https://us-west2-geospiza.cloudfunctions.net/maplelegends-survival?name=geospiza

    Replace name=geospiza with any character name to get a plot like below.

    [​IMG]

    It uses a Cox proportional hazards model, regressing on the last 10 time-to-levels and the age of a character, to approximate a survival function. If the character line is above the baseline, then your progress is worse than average. If the character line is below the baseline, you're doing better than average. The table below can be used to see how long it takes for a % of the population to reach the next level. For example, the 75th percentile of characters (or 25% of the population) reaches level 146 in 26 hours.
     
    • Great Work Great Work x 2
