Flashcard Scoring / SRS

mikelove · Jul 5, 2010

Wanted to start a new discussion specifically on this since it's an area we're hoping to improve considerably in 2.2.

We've already done two major overhauls of our flashcard selection / scoring algorithms since the original 2.0 came out on Palm/WM - "limit unlearned" in 2.0.1 and the new priority option / more tweakable Automated algorithm in 2.0.4 - but we're running into some limitations now because our focus has essentially been too broad. Continuing to support flashcard "scores" not only for SRS (repetition-spaced) mode but for "frequency adjusted" mode as well, and for other, manually controlled test modes (card filters etc.), has left us with a system that, while capable, can't do as good a job at the SRS stuff as a program designed specifically around it.

While the interval calculation algorithm now is pretty solid - tweak a few constants and you can get it to spit out almost the exact same intervals as Anki / Mnemosyne / Skritter / etc - there are some other ways in which those scores ought to be handled differently to reflect their primary use as repetition intervals.

Specifically, we seem to need:

- A more flexible definition of a "session" that allows new cards to appear mid-testing
- Better reporting - graphs etc. designed specifically around spaced repetition
- Better history tracking, in order to facilitate synchronizing test history between multiple devices and sharing data with other flashcard programs (producing whatever statistics they expect)
- Some sort of option to automatically bury / stop testing on cards that are sucking up a lot of time
- A better system for presenting / handling newly introduced cards
- A system for calculating when to add new cards based not on the size of the pool but on the desired retention rate
- Better indication / control of cards' "pool" status in general
- Better review of recently incorrect cards, again intra-session

Does anyone have any specific feedback / suggestions on any of these areas, or can you think of any other major way in which our software needs to change to better suit learners using SRS / better match the feature set of other SRS products?

mongrel · Jul 6, 2010

If you want to coordinate with another program you might want to poll on which one it should be.

mikelove · Jul 7, 2010

mongrel said:
If you want to coordinate with another program you might want to poll on which one it should be.

Good idea - I was assuming Anki since that's the one most people seem to talk about but I suppose it would be better to actually ask.

sych · Jul 9, 2010

Hi Mike,

Glad you're continuing to work on SRS in Pleco.

Of the things you've mentioned, the ones that would affect me most are the ability to synchronise flashcards between devices, adjustments to the introduction of new items, and re-testing cards within a session.

Synchronise between Devices
I bought an iPad a week ago, and Pleco would be much more useful on it if my flashcard lists, scores, sessions and profiles synced between it and my iPhone. On the same topic, syncing Documents would be great too. Syncing with Anki etc. isn't an issue for me because I only use Pleco.

New Items
The current "limit unlearned cards to xx" option generally doesn't suit me because it doesn't match either my goals or my time restrictions. I often have a goal like "learn all these cards within the next 3 months", which translates into "add xx new cards a day", so I do that manually. The current "limit unlearned cards" option is perhaps gentler, but it's hard to predict how fast new cards are going to be introduced. Also, if I start by setting it to 20 and on the first day I get most of my cards right, then I find I have nothing to study on the 2nd day. Then, as a few more days go on, a "wave" of new cards comes along, and I have a disproportionate amount of learning to do on that day. To avoid this I can first set a lower number for "limit unlearned cards" and then ramp it up slowly over a few days. This is fiddly.

When I don't have a particular goal in mind, I'm generally thinking about how many cards I'm able to do in one day. For example, I might think I have time to study 80 cards per day for the next few months, so I would want Pleco to introduce new cards in such a way that it keeps my daily study to around 80 cards. The current method of introducing new cards doesn't work well in this scenario either.
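sych's two pacing rules (a deadline goal and a daily card budget) reduce to simple arithmetic. The Python sketch below is a hypothetical illustration, not how Pleco or any other program actually introduces cards; the function names and example numbers are made up.

```python
import math


def new_cards_for_goal(unlearned: int, days_left: int) -> int:
    """Goal-based pacing: spread the remaining unlearned cards evenly over the days left."""
    if days_left <= 0:
        return unlearned  # deadline reached: everything left is due now
    return math.ceil(unlearned / days_left)


def new_cards_for_budget(daily_budget: int, reviews_due_today: int) -> int:
    """Capacity-based pacing: introduce only as many new cards as the due reviews leave room for."""
    return max(0, daily_budget - reviews_due_today)


# 900 cards over roughly 3 months (90 days):
print(new_cards_for_goal(900, 90))   # 10
# An 80-card day with 65 reviews already due leaves room for 15 new cards:
print(new_cards_for_budget(80, 65))  # 15
```

Because the budget-based rule counts due reviews against the daily total, a day with many reviews automatically introduces fewer new cards, which would tend to damp the "wave" described above rather than amplify it.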

Re-testing within a session
For some cards, it seems like I need a frequency of less than one day. These are the really, really hard ones that I just seem to keep getting wrong day in and day out. My first contact with spaced repetition was through Pimsleur, which uses a schedule (for new items) of something like 1 minute, 3 minutes, 12 minutes, 24 minutes (the schedule was once on Wikipedia somewhere; I'm not sure how accurate it was or whether it's still there). My flashcard sessions are usually at least 15 minutes per profile (and I do several profiles per day), so there is room for some re-testing mid-session. Of course I can do something manually using a separate profile and some filters, but that requires me to set it up by hand and remember to stop myself mid-session every now and again.

sych · Jul 9, 2010

BTW, I find the algorithm in 2.1 much better than the one in 2.0, so thanks for that.

mikelove · Jul 11, 2010

Thanks for the detailed feedback. Glad you like the new algorithm.

Very good point on "limit unlearned" - we implemented it the way we did basically because it was the easiest way to retrofit that capability onto our existing system, but going forward we definitely need to do a better job of introducing them in a way that people are likely to find useful. I'm actually not sure about going by "retention rate" the more I think about it - seems like it's something users would have a tough time grasping / translating to a real-world study pattern - so something day-based makes sense.

As far as re-testing within a session, I'm still trying to figure out a good way to implement that without completely screwing up other modes of study - we'd probably just want to make repetition-spaced sessions "continuous" / automatically have them add new cards as those cards come due over the course of a day, and combine that with finer-grained spacing intervals. Tricky business anyway.

taijidan · Jul 20, 2010

Thanks for starting the thread. Actually I am surprised it has not received more attention. First a couple of questions:

1)

"more tweakable Automated algorithm in 2.0.4"

I am still using the old automatic because I never saw anything explaining the new automatic algorithm. Is it written up anywhere?

2)

"tweak a few constants and you can get it to spit out almost the exact same intervals as Anki / Mnemosyne / Skritter"

What settings do I need to set to match ANKI?

In response to your questions:

>A more flexible definition of a "session" that allows new cards to appear mid-testing
Doesn't seem very useful to me.

>Better reporting - graphs etc designed specifically around spaced repetition
Yes! Something I've been asking for a long time. Specifically:
1) A Chart or Report predicting cards due for each day in the next week/month/year
2) Something like the hanzistats plugin for Anki - shows number of individual characters learned and % learned from frequency list and hsk lists (could link it to pleco categories).
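A due-card forecast like the one in request 1) is essentially a histogram of next-review dates. A minimal Python sketch, assuming each card's next due date is already known (i.e. the score is treated as an interval from the last review); the function name is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def due_forecast(due_dates: list[date], today: date, days: int) -> list[tuple[date, int]]:
    """Count the cards falling due on each of the next `days` days.

    Overdue cards (due before today) are folded into today's count.
    """
    counts = Counter(max(d, today) for d in due_dates)
    return [(today + timedelta(days=i), counts[today + timedelta(days=i)]) for i in range(days)]


today = date(2010, 7, 20)
dues = [date(2010, 7, 18), date(2010, 7, 20), date(2010, 7, 21), date(2010, 7, 21), date(2010, 8, 30)]
for day, n in due_forecast(dues, today, 3):
    print(day, n)
# 2010-07-20 2
# 2010-07-21 2
# 2010-07-22 0
```

Weekly, monthly or yearly views are then just sums over consecutive slices of the same list.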

>Better history tracking, in order to facilitate synchronizing test history between multiple devices and sharing data with other flashcard programs (producing whatever statistics they expect)
Not a priority

> Some sort of option to automatically bury / stop testing on cards that are sucking up a lot of time
Could be useful. I did read about this leech concept on the SuperMemo site. I think Pleco can already handle it quite well.
For example I created a category called leech and then used batch organising to add to leech all the cards that had been marked incorrect 10 times in a row.
Then for my flashcard settings I exclude all cards in leech.

I think having something automatic would be good; it's a feature I would probably use if you had it, but it's not something I miss.
Having something like this would make it a more complete SRS system.

>Better indication / control of cards' "pool" status in general
>Better review of recently incorrect cards, again intra-session
Can you elaborate on this?

One more request: if there were a single-click way to add the example sentences from a dictionary entry to flashcards, that would be cool.

mikelove · Jul 20, 2010

taijidan said:
I am still using the old automatic because I never saw anything explaining the new automatic algorithm. Is it written up anywhere?

It's essentially the same as the old algorithm but with more tweakable constants - the instructions for the configuration screen for it should give you a pretty good idea how it works, though I suppose a separate / dedicated description might be better. But even just by default it's a bit more forgiving / friendly - less dramatic drops for failure and a "cushioning" option for new cards - so you may find yourself spending less time forced to review and re-review cards you happened to get wrong once.

taijidan said:
What settings do I need to set to match ANKI?

That depends on how you've got Anki configured - most of the scaling factors / default intervals / etc that go into their system can be changed. The defaults in both programs are fairly similar, though - ours were a bit harsher with the old system (somewhere between Anki and the extremely unforgiving SuperMemo), but we mellowed them after the vast majority of email on the subject seemed to suggest that cards were getting demoted too aggressively.

taijidan said:
>A more flexible definition of a "session" that allows new cards to appear mid-testing
Doesn't seem very useful to me.

So you prefer to start a session with your fixed set of cards to work on for the day and not see it change until it's finished? Or do you start several different sessions over the course of the day?

taijidan said:
>Better reporting - graphs etc designed specifically around spaced repetition
Yes! Something I've been asking for a long time. Specifically:
1) A Chart or Report predicting cards due for each day in the next week/month/year
2) Something like the hanzistats plugin for Anki - shows number of individual characters learned and % learned from frequency list and hsk lists (could link it to pleco categories).

I'm a little iffy on a count of characters learned because it doesn't differentiate between learning characters in different ways - you might be able to read a character perfectly but not reproduce it or pick it out of a list of similar characters (or even know its definition - you could just know the Pinyin), and you might only know a character as part of a word and not be able to recognize it on its own. An HSK vocabulary report makes sense, though it remains to be seen whether the old HSK or new HSK lists will be the standard going forward.

taijidan said:
>Better history tracking, in order to facilitate synchronizing test history between multiple devices and sharing data with other flashcard programs (producing whatever statistics they expect)
Not a priority

That's interesting - so you like Anki but you don't use its sync features? (my understanding was that AnkiOnline was the biggest reason people used Anki nowadays)

taijidan said:
> Some sort of option to automatically bury / stop testing on cards that are sucking up a lot of time
Could be useful. I did read about this leech concept on the SuperMemo site. I think Pleco can already handle it quite well.
For example I created a category called leech and then used batch organising to add to leech all the cards that had been marked incorrect 10 times in a row.
Then for my flashcard settings I exclude all cards in leech.

I think having something automatic would be good; it's a feature I would probably use if you had it, but it's not something I miss.
Having something like this would make it a more complete SRS system.

Well it's nice because everybody can benefit from it without having to think about it - the main question is whether we do it as a filter (check for / exclude leech cards in every session) or whether we manually "tag" leeches and don't un-tag them later even if your definition of a leech changes.
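The two options above differ only in when the leech rule is evaluated. A Python sketch, assuming (as in taijidan's manual setup) that a leech is a card whose most recent N answers were all incorrect; all names here are hypothetical and this is not Pleco's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    headword: str
    history: list[bool] = field(default_factory=list)  # True = correct, oldest answer first
    leech_tag: bool = False


def trailing_incorrect(history: list[bool]) -> int:
    """Length of the current run of incorrect answers at the end of the history."""
    run = 0
    for correct in reversed(history):
        if correct:
            break
        run += 1
    return run


def is_leech(card: Card, threshold: int) -> bool:
    """Filter approach: re-evaluated every session, so a new threshold applies at once."""
    return trailing_incorrect(card.history) >= threshold


def tag_leeches(cards: list[Card], threshold: int) -> None:
    """Tag approach: once tagged, a card stays tagged even if the threshold is raised later."""
    for card in cards:
        if is_leech(card, threshold):
            card.leech_tag = True


def session_cards(cards: list[Card], threshold: int) -> list[Card]:
    """Exclude leeches from a session using the filter approach."""
    return [c for c in cards if not is_leech(c, threshold)]
```

The filter version costs a history scan per card per session but never goes stale; the tag version is cheap to query but, as noted above, keeps old tags when the definition of a leech changes.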

taijidan said:
>Better indication / control of cards' "pool" status in general
>Better review of recently incorrect cards, again intra-session
Can you elaborate on this?

The "pool" status relates to the "limit unlearned" feature - cards in your "pool" will come up during sessions, while cards not in it will only come up once you run low on "unlearned" cards. We added a "force add to pool" option in 2.0.4 that would at least let you manually add a new card to that "pool" instead of waiting for "limit unlearned" to find it (or awkwardly creating a category / putting the card in it / starting a new session with only that category, thus forcing "limit unlearned" to add it), and another option to manually remove cards from that pool, but those two statuses should probably be simplified / consolidated.
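As a rough mental model of the "limit unlearned" pool and the two manual overrides, here is a hypothetical Python sketch; it is not Pleco's code, and the field and function names are invented. It assumes sessions draw only from pooled cards and that the pool is topped up with new cards whenever the number of unlearned cards in it falls below the limit.

```python
from dataclasses import dataclass


@dataclass
class PoolCard:
    card_id: int
    learned: bool = False
    in_pool: bool = False
    forced_out: bool = False  # hypothetical "manually removed from pool" flag


def refill_pool(cards: list[PoolCard], limit_unlearned: int) -> None:
    """Top up the pool so it holds at most `limit_unlearned` unlearned cards."""
    unlearned_in_pool = sum(1 for c in cards if c.in_pool and not c.learned)
    for c in cards:
        if unlearned_in_pool >= limit_unlearned:
            break
        if not c.in_pool and not c.learned and not c.forced_out:
            c.in_pool = True
            unlearned_in_pool += 1


def force_add(card: PoolCard) -> None:
    """Manual override: put a card in the pool regardless of the limit."""
    card.in_pool = True
    card.forced_out = False


def force_remove(card: PoolCard) -> None:
    """Manual override: take a card out and keep refills from re-adding it."""
    card.in_pool = False
    card.forced_out = True
```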

"Better review of recently incorrect" -> don't just show incorrect cards at the very end of the session, since that may come several hours later and people may have forgotten them; the "loop" option helps somewhat, but true SRS nirvana would probably require us to show those incorrect cards again just a few minutes later.
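A Python sketch of what showing incorrect cards again "just a few minutes later" could look like: a hypothetical in-session relearning queue, not a Pleco feature. The default step lengths loosely follow the Pimsleur-style schedule sych recalls (1, 3, 12, 24 minutes), which is itself only from memory; times are minutes since the session started.

```python
import heapq
from itertools import count


class RelearnQueue:
    """Re-show failed cards a few minutes later, on a growing ladder of short steps."""

    def __init__(self, steps_minutes=(1, 3, 12, 24)):
        self.steps = steps_minutes
        self._heap = []      # entries: (due_minute, tie_breaker, card_id, step_index)
        self._tie = count()  # keeps heap ordering stable for equal due times

    def _schedule(self, card_id, step, now):
        heapq.heappush(self._heap, (now + self.steps[step], next(self._tie), card_id, step))

    def fail(self, card_id, now):
        """A card was answered incorrectly in the main session: start it at the first step."""
        self._schedule(card_id, 0, now)

    def next_due(self, now):
        """Pop the earliest card whose retest time has arrived, or return None."""
        if self._heap and self._heap[0][0] <= now:
            _, _, card_id, step = heapq.heappop(self._heap)
            return card_id, step
        return None

    def answer(self, card_id, step, correct, now):
        """Record a retest result: climb one step if correct, restart the ladder if not."""
        if not correct:
            self._schedule(card_id, 0, now)
        elif step + 1 < len(self.steps):
            self._schedule(card_id, step + 1, now)
        # else: the card has cleared every step and returns to the normal day-based schedule
```

A session loop would check `next_due` before drawing each regular card, so retests interleave with new material instead of waiting for the end of the session.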

taijidan said:
One more request: if there were a single-click way to add the example sentences from a dictionary entry to flashcards, that would be cool.

I've been talking for years about redesigning the flashcard system to better accommodate sentences, it's just a matter of actually sitting down and doing it - it's a considerably bigger new feature than an undo button or even a new scoring algorithm.

numble · Jul 21, 2010

mikelove said:
I'm a little iffy on a count of characters learned because it doesn't differentiate between learning characters in different ways - you might be able to read a character perfectly but not reproduce it or pick it out of a list of similar characters (or even know its definition - you could just know the Pinyin), and you might only know a character as part of a word and not be able to recognize it on its own. An HSK vocabulary report makes sense, though it remains to be seen whether the old HSK or new HSK lists will be the standard going forward.

I think most learners know that recognizing a character once won't make you an expert on all of its uses and etymology, but I still think that it is a useful, if rough, rubric.

mikelove · Jul 23, 2010

numble said:
I think most learners know that recognizing a character once won't make you an expert on all of its uses and etymology, but I still think that it is a useful, if rough, rubric.

I suppose that's fair - would we be better off counting only single-character cards, or counting characters that were included in multi-char words? Single-character at least guarantees some level of character-specific knowledge, but multi-char is the only form in which people ever see a lot of characters.

numble · Jul 25, 2010

mikelove said:
numble said:

I think most learners know that recognizing a character once won't make you an expert on all of its uses and etymology, but I still think that it is a useful, if rough, rubric.


I suppose that's fair - would we be better off counting only single-character cards, or counting characters that were included in multi-char words? Single-character at least guarantees some level of character-specific knowledge, but multi-char is the only form in which people ever see a lot of characters.

I would go with counting characters in multi-character words; I don't test myself on many characters by themselves. I usually just use it as a rough rubric to know how many characters I've seen before.

taijidan · Jul 26, 2010

I want to know how many unique characters are in my 'learned cards' pool - including multi-character words, cheng-yu and sentences.
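Counting unique characters across a learned-card list is a set operation. A minimal Python sketch, assuming the card headwords are available as plain strings; the function name is hypothetical, and the `single_only` switch corresponds to mikelove's alternative of counting single-character cards only.

```python
def unique_hanzi(texts, single_only=False):
    """Distinct Chinese characters across card headwords (words, chengyu, sentences).

    Only the CJK Unified Ideographs block U+4E00..U+9FFF is counted, so punctuation,
    Latin letters and rarer extension-block characters are ignored.
    """
    chars = set()
    for text in texts:
        if single_only and len(text) != 1:
            continue  # stricter option: count single-character cards only
        chars.update(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    return chars


learned = ["学习", "中国", "我学中文。"]
print(len(unique_hanzi(learned)))  # 6
```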

taijidan · Jul 26, 2010

>though I suppose a separate / dedicated description might be better.

Yes, something like the description there was for the old automated scoring, which laid out the algorithm nicely, would be great. I think at some point it changed from "difficulty factor" to "easiness factor", and after that it wasn't clear to me how it worked.

For example, if I am using old Automatic with aggressiveness 5 and then switch to new Automatic, should I set the initial correct score to 600 to be consistent?

>So you prefer to start a session with your fixed set of cards to work on for the day and not see it change until it's finished? Or do you start several different sessions over the course of the day?

I have the show correct/incorrect cards option on, and I usually run a session until I have about 5 incorrect cards; then I stop the session and move on to the loop incorrect cards part. After that is complete, I'll carry on with a new session. I find breaking the incorrect cards down into smaller batches helps me focus on them more. These sessions might be spread out over the day, i.e. some in the morning, some in the evening.

>That's interesting - so you like Anki but you don't use its sync features? (my understanding was that AnkiOnline was the biggest reason people used Anki nowadays)

I don't regularly use Anki; it's just something I have played around with from time to time. I used to periodically export my cards from Pleco and then import them into Anki to view the statistics in the Hanzistats report. There was also a time when I started doing sentence flashcards in Anki, but then I realised Pleco could handle sentence flashcards just as well - I just needed to create them in a spreadsheet and then import them.

>"Better review of recently incorrect" -> don't just show incorrect cards at the very end of the session, since that may come several hours later and people may have forgotten them; the "loop" option helps somewhat, but true SRS nirvana would probably require us to show those incorrect cards again just a few minutes later.

Ok - sounds like it could be a cool feature.

mikelove · Jul 26, 2010

numble said:
I would go with counting characters in multi-character words; I don't test myself on many characters by themselves. I usually just use it as a rough rubric to know how many characters I've seen before.

Makes sense, it is an oft-cited measure of Chinese proficiency, though one that feels a bit outdated with modern, less character-centric teaching methods.

taijidan said:
I want to know how many unique characters are in my 'learned cards' pool - including multi-character words, cheng-yu and sentences.

Sentences too? So you generally only create / review sentences where you know all the words (practicing the grammar structure / translation)? Do you find that the vocabulary in most of the online sentence lists / sentence databases is simple enough that you already know most of it, or do you have to spend a lot of time learning new words whenever you add a batch of new sentences?

taijidan said:
Yes, something like the description there was for the old automated scoring, which laid out the algorithm nicely, would be great. I think at some point it changed from "difficulty factor" to "easiness factor", and after that it wasn't clear to me how it worked.

For example, if I am using old Automatic with aggressiveness 5 and then switch to new Automatic, should I set the initial correct score to 600 to be consistent?

Yes. "Easiness factor" and "Difficulty factor" are the same thing; we changed the name on iPhone to fit its actual meaning better - the whatever-it-is factor goes up when you remember a card well and goes down when you remember it incorrectly.

taijidan said:
I have the show correct/incorrect cards option on, and I usually run a session until I have about 5 incorrect cards; then I stop the session and move on to the loop incorrect cards part. After that is complete, I'll carry on with a new session. I find breaking the incorrect cards down into smaller batches helps me focus on them more. These sessions might be spread out over the day, i.e. some in the morning, some in the evening.

But would it help if the system did this for you automatically - let you set a certain number of cards to study (or a certain number to get incorrect) after which it would take a break and let you review incorrect cards - or do you prefer to control it manually?

taijidan said:
I don't regularly use Anki; it's just something I have played around with from time to time. I used to periodically export my cards from Pleco and then import them into Anki to view the statistics in the Hanzistats report. There was also a time when I started doing sentence flashcards in Anki, but then I realised Pleco could handle sentence flashcards just as well - I just needed to create them in a spreadsheet and then import them.

Heh, well that's definitely an argument for adding character/HSK statistics reports ourselves.

mikelove · Jul 26, 2010

Another question to raise here, though only partially scoring-related - are there any other batch functions or search fields you'd like to see? Some particular transformation you'd like to have automated, maybe on the score / easiness factor or maybe even on the card text? (remove something, add something, change the score a certain way if certain conditions are met, attempt to auto-convert characters to Pinyin, bring up a list of any cards where the Pinyin doesn't appear to match the characters, etc)

HW60 · Jul 27, 2010

I think this forum might also be interesting for WM users - I just stumbled onto it and found some interesting points to respond to. But I couldn't find any information about Automatic Scoring. I vaguely remember there being formulas in the User Manual showing how the score changes depending on different parameters. Now I can customize several scoring rules, but I cannot find the basics, only this (in the Flashcard Reference):
Automatic - flashcard scores are automatically managed by Pleco, factoring in your past history with the card (how often you've remembered it correctly/incorrectly) in order to determine how much to increase/decrease the score by after each answer

The Pleco 2.0 Instruction Manual (for WM) still has a link to "Automatic Scoring Algorithm", but the link doesn't lead anywhere.

mikelove · Jul 27, 2010

HW60 said:
I think this forum might also be interesting for WM users - I just stumbled onto it and found some interesting points to respond to. But I couldn't find any information about Automatic Scoring. I vaguely remember there being formulas in the User Manual showing how the score changes depending on different parameters. Now I can customize several scoring rules, but I cannot find the basics, only this (in the Flashcard Reference):
Automatic - flashcard scores are automatically managed by Pleco, factoring in your past history with the card (how often you've remembered it correctly/incorrectly) in order to determine how much to increase/decrease the score by after each answer

The Pleco 2.0 Instruction Manual (for WM) still has a link to "Automatic Scoring Algorithm", but the link doesn't lead anywhere.

Actually you raise an interesting point - should we maybe have a "Pleco General" (or even specifically a "Pleco Flashcards General") forum that's cross-platform and covers features like SRS that are platform-neutral? My worry with that, however, would be that a lot of threads tend to incorporate both platform-specific and non-platform-specific ideas, and some UI-intensive ideas may not end up getting developed on every platform, or may not be necessary on every platform (e.g., WM users don't really care about built-in file sharing / file management / file backup capabilities since they have unlocked file systems and can use whatever utilities they want for that).

Sorry about the broken link - that section of the manual was removed but wasn't yet replaced by a description of the new algorithm. The descriptions of what the tweak options do should actually give you a pretty good idea of how the algorithm works, though, at least if you already understand the basic concept (scores increase by a certain % after a correct answer that's determined by a combination of the difficulty / easiness factor and whether the answer itself was a 4/5/6).

Charles · Jul 28, 2010

The only thing harder for me than trying to learn Chinese is trying to understand how the Pleco flashcard scoring systems work!

mikelove · Jul 28, 2010

Very well, here's a quickie outline for the newer system:

Initialize each card with "score" and "easiness factor" values of 100.
If a score change is permitted (the score hasn't been changed too recently), then:
- Correct answer: Divide the card's current "easiness factor" by the "easiness divisor" configured in Tweak Parameters, and multiply the result by the card's current score.
  If a "correct scale increase %" is configured in Tweak Parameters, multiply this new score by the appropriate percentage for the answer quality.
  If the card is being reviewed early or late in a repetition-spaced test, transform the score as configured in "If review early" / "If review late" in the main flashcard Scoring settings screen.
  Check to see that the card's score is at least as high as the "Initial correct score" configured in Tweak Parameters, and raise it to that score if not.
- Incorrect answer: Multiply the card's score by the "Incorrect score decrease %" configured in Tweak Parameters.
Ensure that the new score is between the "Minimum score" and "Maximum score" configured in "Tweak Parameters."
Finally, adjust the "easiness factor" based on the answer quality, according to the settings in the "Easiness Change" section of Tweak Parameters. Ensure that the new easiness factor is within the range between the "Minimum easiness" and "Maximum easiness" configured in that screen. Skip this step if the card was answered correctly ahead of schedule in a repetition-spaced test.
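The outline translates fairly directly into code. The Python sketch below is one reading of it, not Pleco's source. Assumptions, all hypothetical: every numeric default is a placeholder (not Pleco's defaults); an incorrect answer is encoded as quality 0; the "If review early / late" transform is a caller-supplied hook because the outline doesn't say what those settings do; the easiness change is additive; and the whole update sits inside the "score change permitted" check.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Answer qualities: 4/5/6 are correct answers; 0 stands in for "incorrect"
# (a hypothetical encoding chosen for this sketch).
INCORRECT = 0


@dataclass
class TweakParams:
    # Placeholder numbers for illustration only, NOT Pleco's defaults.
    easiness_divisor: float = 50.0
    correct_scale_pct: dict = field(default_factory=lambda: {4: 75.0, 5: 100.0, 6: 125.0})
    initial_correct_score: float = 200.0
    incorrect_decrease_pct: float = 50.0
    min_score: float = 100.0
    max_score: float = 100_000.0
    easiness_change: dict = field(default_factory=lambda: {INCORRECT: -20.0, 4: -5.0, 5: 0.0, 6: 10.0})
    min_easiness: float = 50.0
    max_easiness: float = 400.0


@dataclass
class Card:
    score: float = 100.0     # both values start at 100
    easiness: float = 100.0


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def update(card: Card, quality: int, p: TweakParams, *,
           change_permitted: bool = True,
           timing: Optional[str] = None,
           timing_adjust: Optional[Callable[[float, str], float]] = None) -> Card:
    """Return the card's new state; `timing` is None, "early" or "late"."""
    if not change_permitted:  # score was changed too recently: leave everything alone
        return card
    correct = quality != INCORRECT
    if correct:
        score = card.score * (card.easiness / p.easiness_divisor)
        score *= p.correct_scale_pct.get(quality, 100.0) / 100.0  # 100% if unset
        if timing is not None and timing_adjust is not None:
            score = timing_adjust(score, timing)  # stands in for "If review early / late"
        score = max(score, p.initial_correct_score)
    else:
        score = card.score * p.incorrect_decrease_pct / 100.0
    score = clamp(score, p.min_score, p.max_score)

    easiness = card.easiness
    if not (correct and timing == "early"):  # skipped for correct-but-early reviews
        easiness = clamp(easiness + p.easiness_change.get(quality, 0.0),
                         p.min_easiness, p.max_easiness)
    return Card(score, easiness)


p = TweakParams()
card = Card()
for q in (5, 6, INCORRECT):
    card = update(card, q, p)
    print(q, card.score, card.easiness)
# 5 200.0 100.0
# 6 500.0 110.0
# 0 250.0 90.0
```

Note that the new easiness factor is applied after the score change, so it only affects the card's next correct answer, matching the order of the steps in the outline.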

ben_gb · Jul 28, 2010

mikelove said:
Another question to raise here, though only partially scoring-related - are there any other batch functions or search fields you'd like to see? Some particular transformation you'd like to have automated, maybe on the score / easiness factor or maybe even on the card text? (remove something, add something, change the score a certain way if certain conditions are met, attempt to auto-convert characters to Pinyin, bring up a list of any cards where the Pinyin doesn't appear to match the characters, etc)

Hi Mike,

I'd like to see ways of searching for the following sorts of cards:

- cards which have been 'forced excluded' out of the pool
- (may as well search for 'forced included' too, if those are specifically marked in some way)
- cards which are not yet learned (according to whatever definition of 'learned' the user has set) but have been tested at least once
- cards which were marked wrong for the most recent X tests (user can specify how many)

Ben
