Flashcards Statistics Competition

Shun

Great, if you do that, you will surely get a feeling for how much you still have to learn to reach that point.
 

HW60

Here is mine
[Attachment 2180: statistics screenshot]

About 230k total repetitions, 10k cards learned.
I use SRS flashcards every day, with 200 to 250 cards. I mark a card as "forgotten" when I forget it, or even just its tone, which may explain the high number of repetitions.
This is the first really impressive statistic. I don't like learned thresholds that are too low, like 350 or 400, because such a low score only shows that you remembered the card after 2 or 3 days. But @FrancoisTaipei has only 104 unlearned cards, and even with "cards learned: score >= 1000", more than 10,000 of 10,715 cards count as learned. This is an excellent result! The 83% average correct is certainly above average too.

I have one question, @Mike: Total Repetitions / Per Day is 230348/191 = 1206 days of repetition, while Cards Learned / Per Day is 10611/8 = 1326 days of successful learning. So it took @FrancoisTaipei 1326 - 1206 = 120 more days of learning than of repeating. What do these figures mean?
 

Shun

Hi HW60,

I agree, @FrancoisTaipei has repeated the same cards over and over until he really knew them. As for your question to Mike, I'd say that's mostly due to rounding error. 10611/1206 would be 8.8 cards learned per day, so perhaps that was rounded off.
 

lovepleco

How about adding 'characters learned' to the statistics screen instead of just 'cards'? Cards can also be multi-character combinations (words), which is probably why I see 10,000+ cards in some of the stats in the posts above. I read that the Chinese government puts literacy at 2,000 characters, primary school students are expected to know 2,500, and well-educated Chinese may know about 8,000. Comparing against those figures could provide an additional, optional indicator of progress and motivation.

How would it work? Could it be as simple as breaking any learned multi-character card (word) down into its component characters? Would this be reasonably easy to implement?

Also, how can we make sure, or at least make it less likely, that we don't learn duplicates or count them in the statistics? I know about the 'Duplicate' search option under the magnifying glass in 'Organize Cards', but I don't know how to see their impact on the statistics, or subtract it.
 

Shun

Good point. Of my 12,705 learned cards, 4,500 were duplicates, leaving 8,205. Even more interesting, my 12,705 learned cards contained 2,925 unique characters. My Learned threshold was >= 400 points, so I'd guess I can recognize roughly 3,500 characters at this point.

I did this with Export, Excel, and TextWrangler hard wrapping, but it would surely be nice (and very relevant/motivating) if Pleco did it.
 

HW60

Hi HW60,

I agree, @FrancoisTaipei has repeated the same cards over and over until he really knew them. As for your question to Mike, I'd say that's mostly due to rounding error. 10611/1206 would be 8.8 cards learned per day, so perhaps that was rounded off.
That is why I asked @Mike - rounding off 8.8 to 8.0 is not standard rounding.

Of course, whether you value effort or results is a matter of taste. I think @FrancoisTaipei's great effort produced an outstanding result. As far as Total Repetitions are concerned, I can easily compete with him, since I am a long-time Pleco user.

There is one more drawback of the current statistics: overdue cards. I have many cards that I haven't reviewed in a long time, but some of them still count as learned, and I'm sure I wouldn't know all of them after so long. So my statistics are actually better than my knowledge.
 

Shun

That is why I asked @Mike - rounding off 8.8 to 8.0 is not standard rounding.

In programming, this kind of rounding (truncation) happens a lot with integer division. He will surely tell us!
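To illustrate the truncation guess (the thread does not confirm how Pleco actually computes its per-day averages): if the figure comes from integer division, about 8.8 cards learned per day becomes 8 instead of rounding to 9, and dividing back by 8 reproduces HW60's 1326 days. The numbers are the ones quoted in the thread; the variable names are just illustrative.

```python
# Figures from @FrancoisTaipei's statistics screen, as quoted in the thread
total_repetitions = 230348
repetitions_per_day = 191
cards_learned = 10611

# Number of study days implied by the repetition figures
days = round(total_repetitions / repetitions_per_day)  # 1206

exact_rate = cards_learned / days        # about 8.8 cards per day
truncated_rate = cards_learned // days   # integer division drops the fraction
rounded_rate = round(exact_rate)         # standard rounding instead

print(f"exact {exact_rate:.2f}, truncated {truncated_rate}, rounded {rounded_rate}")

# Dividing back by the truncated rate inflates the day count,
# which matches the 1326 days HW60 computed
print(cards_learned / truncated_rate)
```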

There is one more drawback of the current statistics: overdue cards. I have many cards that I haven't reviewed in a long time, but some of them still count as learned, and I'm sure I wouldn't know all of them after so long. So my statistics are actually better than my knowledge.

True, but Pleco can never know exactly what is in the learner's head except by quizzing them on all their cards at once, which is impossible. But you're right, some sort of half-life for points, which could also be configurable, may be the way to go.
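A half-life for points might look roughly like this. It is a hypothetical sketch, not how Pleco scores cards: a card's score halves for every `half_life_days` it goes unreviewed, and it only counts as learned while the decayed score stays at or above the threshold. The function names, the 90-day half-life, and the 1000-point threshold are assumptions for illustration.

```python
def decayed_score(score, days_since_review, half_life_days=90.0):
    """Exponentially decay a flashcard score so it halves every half_life_days.

    Hypothetical model for illustration; Pleco does not necessarily work this way.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return score * 0.5 ** (days_since_review / half_life_days)


def counts_as_learned(score, days_since_review, threshold=1000, half_life_days=90.0):
    """A card counts as learned only while its decayed score meets the threshold."""
    return decayed_score(score, days_since_review, half_life_days) >= threshold


# A card at 2000 points falls to 1000 after one half-life (90 days)
# and to 500 after two, at which point it no longer counts as learned.
print(decayed_score(2000, 90))
print(counts_as_learned(2000, 180))
```

Making `half_life_days` a parameter is one way to get the configurability suggested above.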
 

HW60

How about adding 'characters learned' to the statistics screen
I have separate categories for characters and words, so I can easily see the number of characters and words and their statistics. Single-character words belong to both categories, so the statistics for words and characters cannot simply be "added".

In ancient times, there was a discussion about letting Pleco automatically add character flashcards whenever a word with new characters is added to the flashcards.
 

HW60

Also, how can we make sure, or at least make it less likely, that we don't learn duplicates or count them in the statistics? I know about the 'Duplicate' search option under the magnifying glass in 'Organize Cards', but I don't know how to see their impact on the statistics, or subtract it.
What about removing the duplicate flashcards, and using different categories if you need the category information?
 

mikelove

Staff member
In ancient times, there was a discussion about letting Pleco automatically add character flashcards whenever a word with new characters is added to the flashcards.

A batch command for that has now been written and is currently active in development versions of 4.0.
 

lovepleco

Good point. Of my 12,705 learned cards, 4,500 were duplicates, leaving 8,205. Even more interesting, my 12,705 learned cards contained 2,925 unique characters. My Learned threshold was >= 400 points, so I'd guess I can recognize roughly 3,500 characters at this point.

Nice! Related, I just found this:
https://www.quora.com/How-many-Chin...rn-in-one-two-etc-years/answer/Ashwin-Purohit

To be able to recognize 99% of characters (to just read the abstract characters themselves), you should learn about 1572 characters.
To be able to recognize 99% of full words (to know what is actually meant), you should learn about 12,054 words.
The above numbers are by frequency. ...
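Numbers "by frequency" like these usually come from a cumulative-coverage calculation: sort items by how often they occur in a corpus and count how many of the most frequent ones it takes to reach a target share of all occurrences. A minimal sketch of that calculation, with made-up counts; the linked answer's actual corpus and method are not given in the thread.

```python
def items_needed_for_coverage(frequencies, target=0.99):
    """Return how many of the most frequent items cover `target` of all occurrences.

    `frequencies` maps each item (character or word) to its corpus count.
    """
    counts = sorted(frequencies.values(), reverse=True)
    total = sum(counts)
    covered = 0
    for n, count in enumerate(counts, start=1):
        covered += count
        if covered / total >= target:
            return n
    return len(counts)


# Made-up counts for illustration only (100 occurrences in total)
freqs = {"的": 50, "一": 30, "是": 15, "了": 4, "馆": 1}
print(items_needed_for_coverage(freqs, 0.95))
```

With these counts, the top 3 items cover 95 of 100 occurrences, so the call returns 3.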

I did this with Export, Excel, and TextWrangler hard wrapping, but it would surely be nice (and very relevant/motivating) if Pleco did it.

I'm on Windows and don't use Excel. My exported Pleco flashcards XML file also crashed OpenOffice Spreadsheet. If you have more info on how you did it (e.g. export settings), let me know. ;)
 

lovepleco

I have separate categories for characters and words, so I can easily see the number of characters and words and their statistics. Single-character words belong to both categories, so the statistics for words and characters cannot simply be "added".

If each character belongs to two categories (the one for characters and the one for words), does that mean you have lots of duplicates overall? Does that make you learn more cards than needed? Maybe it's what you want: learning each component of a word in isolation plus learning the composite word in isolation. Does your 'Words' category never include single characters?

Have you already applied any automation to break words down into characters? I use some existing HSK lists plus custom lists I've built up over time in everyday life.
 

lovepleco

A batch command for that has now been written and is currently active in development versions of 4.0.

Great to hear! Would this command also break the words in existing categories, such as the HSK categories or any custom categories, into individual characters? I know you probably won't say, but on the off chance: roughly when could we get our hands on a v4 development version... maybe as beta users? ;)
 

Shun

Hi lovepleco!

Thanks, you can try the following:

1. In Organize Cards, Search for Flashcards whose score is equal to or higher than your Learned score
2. Select all cards in the results list, tap Batch
3. At the very bottom, choose "Export Cards"
4. Set "Text" as your Export file format, choose "Simplified" and disable exporting of any definitions
5. Open the text file in OpenOffice Spreadsheet with 'tab' as the delimiter
6. Remove the column containing the pinyin
7. Copy the column containing the Chinese characters
8. In a text editor, paste the Chinese characters in, then hard wrap the text file to a width of one character
9. Copy the wrapped text back into a new column in OO Spreadsheet
10. Select the pasted column and find a Remove Duplicates command in OO Spreadsheet. The text editor or another tool may also be able to do this.
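Steps 5 to 10 can also be done with a short script instead of a spreadsheet and text editor. This is a sketch under assumptions: it expects the text export from step 4 to be tab-separated with the simplified headword in the first column, and it skips lines starting with `//` in case the export contains category header lines. Check your own file, since the exact export layout is assumed here. The function names and the file name `flashcards.txt` are placeholders.

```python
def characters_from_export(lines):
    """Return distinct Chinese characters from tab-separated export lines.

    Assumptions: the first column holds the (simplified) headword, and lines
    starting with "//" are category headers to skip.
    """
    chars = []    # first-seen order, like the hard-wrapped column in step 8
    seen = set()  # membership test that plays the role of Remove Duplicates
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        headword = line.split("\t")[0]  # drop the pinyin and any other columns
        for ch in headword:
            # Only the basic CJK Unified Ideographs block counts (an assumption)
            if "\u4e00" <= ch <= "\u9fff" and ch not in seen:
                seen.add(ch)
                chars.append(ch)
    return chars


def characters_from_export_file(path):
    """Read an exported text file; "utf-8-sig" also strips a leading BOM if present."""
    with open(path, encoding="utf-8-sig") as f:
        return characters_from_export(f)


sample = ["图书馆\ttu2shu1guan3", "博物馆\tbo2wu4guan3", "// Lesson 2", "高\tgao1"]
print(len(characters_from_export(sample)), "unique characters")
```

For a real export, `chars = characters_from_export_file("flashcards.txt")` gives the list; `len(chars)` is the character count, and `"\n".join(chars)` is the one-character-per-line column from step 8.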

If there are any difficulties, just ask.
 

HW60

If each character belongs to two categories (the one for characters and the one for words), does that mean you have lots of duplicates overall?
A duplicate means two or more flashcards with the same headword and pronunciation. I have no duplicates in my flashcards, neither in my character categories nor in my word categories. My character flashcards usually have one or two categories. For example, 馆 has only one (character) category, because I have no single-character word 馆 (yet), only several flashcards containing it, like 图书馆 and 博物馆. 高, on the other hand, has two categories, one for the character and one for the word, but it is still only one flashcard, not a duplicate.
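Under this definition, a duplicate is any card whose (headword, pronunciation) pair already appears on another card, whatever its category. A minimal sketch of finding and dropping duplicates by that rule; the cards are modeled as hypothetical `(headword, pinyin, category)` tuples, not any real Pleco data structure.

```python
from collections import Counter


def find_duplicates(cards):
    """Return the (headword, pinyin) keys that occur on more than one card."""
    counts = Counter((headword, pinyin) for headword, pinyin, _category in cards)
    return {key for key, n in counts.items() if n > 1}


def deduplicate(cards):
    """Keep the first card for each (headword, pinyin) key and drop the rest."""
    seen = set()
    unique = []
    for card in cards:
        key = card[:2]  # (headword, pinyin); the category is ignored
        if key not in seen:
            seen.add(key)
            unique.append(card)
    return unique


cards = [
    ("高", "gao1", "Characters"),
    ("高", "gao1", "Words"),  # same headword + pronunciation: a duplicate card
    ("图书馆", "tu2shu1guan3", "Words"),
]
print(find_duplicates(cards))
print(len(deduplicate(cards)))
```

HW60's approach avoids the second 高 card entirely by giving the single card two categories.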
Does that make you learn more cards than needed? Maybe it's what you want: learning each component of a word in isolation plus learning the composite word in isolation.
Yes, I try to learn characters and words in isolation. From the very beginning, I kept them in separate character and word categories, organized as pairs of categories. When I first learned 图书馆, I added its 3 characters to the character category paired with that word's category.
Does your 'Words' category never include single characters?
It does include single characters like 高 above, but 高 is also in the paired character category.
Have you already applied any automation to break words down into characters? I use some existing HSK lists plus custom lists I've built up over time in everyday life.
About once a week, I make a user dictionary from all my flashcards and place it above all my other dictionaries, so its results come first. When I enter a word in the Pleco dictionary, I can see at once whether I have a flashcard for it, because then it appears in the user dictionary. When I enter 图书, my dictionary displays 图书馆, 图书室, and 图书馆员. And if I enter guan, I find 馆 and the other guan characters, so I know whether I already have a flashcard. I could do the same in Pleco's flashcard search screen, but that requires switching from the dictionary to the flashcards and back again.
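The user-dictionary trick relies on prefix matching: typing 图书 lists every entry that begins with 图书. A rough sketch of that kind of lookup over flashcard headwords, using Python's standard `bisect` module on a sorted list; it only illustrates the idea and is not how Pleco implements its search.

```python
from bisect import bisect_left


def prefix_matches(sorted_headwords, prefix):
    """Return all headwords starting with prefix from an already sorted list."""
    start = bisect_left(sorted_headwords, prefix)  # first position >= prefix
    matches = []
    for word in sorted_headwords[start:]:
        if not word.startswith(prefix):
            break  # in sorted order, all matches for a prefix are contiguous
        matches.append(word)
    return matches


headwords = sorted(["图书馆", "图书室", "图书馆员", "博物馆", "高"])
print(prefix_matches(headwords, "图书"))
```

Sorting is by code point here, so 图书室 comes before 图书馆; a real dictionary would apply its own ordering.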
From time to time, I made a big Excel table in which every word was split into its component characters, with the category name shown for each character. When a character was missing, I could see it in that table. But that takes some time and effort.
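The check this Excel table performs, finding characters used in word cards that don't yet have a character card of their own, comes down to a set difference. A small sketch with hypothetical data, where word and character headwords are plain strings; the function name is illustrative.

```python
def missing_character_cards(word_headwords, character_headwords):
    """Map each character that has no card of its own to the words using it."""
    have = set(character_headwords)
    missing = {}
    for word in word_headwords:
        for ch in word:
            # Skip non-Chinese characters (basic CJK block only, an assumption)
            if not ("\u4e00" <= ch <= "\u9fff") or ch in have:
                continue
            words_using = missing.setdefault(ch, [])
            if word not in words_using:  # e.g. 谢谢 should be listed only once
                words_using.append(word)
    return missing


words = ["图书馆", "博物馆", "谢谢"]
characters = ["图", "书", "馆"]
print(missing_character_cards(words, characters))
```

Here 博, 物 and 谢 are reported as missing, each with the words that use it.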

I think things may get better with the long-awaited Pleco 4.0, but that was already promised for the long-awaited 3.0...
 

mikelove

Staff member
Already done for 4.0, actually: flashcards are now treated as a user dictionary and automatically included in searches. Though you can also see whether you have a flashcard for a particular word just by checking for the [+] icon in its search result.
 

捞什子

I came looking for 4.0 news, then saw this thread in the new posts section. It's a bit unfair, since I have two years of reps on everyone, but I bet Shun will still be interested in another set of statistics to chew on.

[Attachment: Screenshot_2017-01-05-23-38-51.png]

A few notes: no duplicate cards; cards graduate through a series of tests (multiple choice --> tone --> fill in the blanks [plus a self-graded written test for words too long to appear in tone/fitb tests]), with a maximum score set below the learned score so that cards are never marked learned and, in theory, I'll see every card at least every year and a half or so. In practice, I'm too lazy to cut my pile of due cards down, so that interval is much longer. Finally, I opted for a custom repetition spacing regime (only for the fitb and self-graded tests; the mc and tone tests that feed into them use manual scoring) with a score increase of 100% for correct answers, which lets me play with spacing more easily in Excel.
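A simplified model of the scoring setup described above: each correct answer raises the score by 100% (doubling it), and a maximum score below the learned threshold keeps cards from ever being marked learned. This is a hypothetical sketch. The starting score of 100, the cap of 6400, the threshold of 10000, and resetting to the minimum on a wrong answer are made-up assumptions, and Pleco's actual scoring formula and score-to-interval mapping may differ.

```python
LEARNED_THRESHOLD = 10000  # assumed; set above max_score so cards never "graduate"


def next_score(score, correct, increase_pct=100, max_score=6400, min_score=100):
    """Update a card score: +increase_pct% when correct, reset when wrong.

    Hypothetical parameters; the cap keeps the score below LEARNED_THRESHOLD.
    """
    if correct:
        score = score + score * increase_pct // 100  # +100% doubles the score
    else:
        score = min_score  # assumed penalty: start over from the minimum
    return min(score, max_score)


score = 100
history = []
for _ in range(8):  # eight correct answers in a row
    score = next_score(score, correct=True)
    history.append(score)
print(history)
print(any(s >= LEARNED_THRESHOLD for s in history))
```

With a 100% increase, consecutive correct answers form a simple doubling series until they hit the cap, which is presumably what makes the spacing easy to model in a spreadsheet.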

I started using the flashcard module in 2011, and mileage has really varied by year: the average repetitions per day isn't representative of the years when I took the subway to work. ;)
 

Shun

Hi Laoshizi,

Wow, thanks, steady work! So, as in @FrancoisTaipei's example, the cards you added to Flashcards were also the ones you later actually tested yourself on. By contrast, I tend to add anything that catches my interest to an "Encountered" Flashcards category, just so I have a record of it. That's why I have over 70,000 cards. I also think studying Flashcards is ideal for the daily commute. But if there's no commute, perhaps one needs to find another time slot? :)

I only now realize that the "Average repetitions per day" must be based on the recent past. Same for me, currently at 7 per day.

For reviewing older cards, I simply select old categories and perhaps add a card filter to exclude recently studied cards and those I've already studied plenty. But I think it's great to have such an elaborate system.
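Selecting old categories and then filtering out recently studied or well-drilled cards amounts to a simple filter. A sketch with a hypothetical card record (`category`, `last_reviewed`, `repetitions`); these field names, the 30-day and 20-repetition cutoffs, and the sample dates are assumptions for illustration, not Pleco's actual filter settings.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    headword: str
    category: str
    last_reviewed: date
    repetitions: int


def cards_to_review(cards, categories, today, min_days_since=30, max_repetitions=20):
    """Pick cards from the chosen categories not studied recently or too often."""
    cutoff = today - timedelta(days=min_days_since)
    return [
        c for c in cards
        if c.category in categories
        and c.last_reviewed <= cutoff        # skip recently studied cards
        and c.repetitions < max_repetitions  # skip cards already drilled plenty
    ]


today = date(2017, 1, 6)  # sample date
cards = [
    Card("图书馆", "Lesson 1", date(2016, 6, 1), 5),
    Card("博物馆", "Lesson 1", date(2017, 1, 2), 4),  # studied recently
    Card("高", "Lesson 1", date(2016, 3, 1), 40),     # studied plenty
    Card("馆", "Lesson 9", date(2016, 1, 1), 2),      # category not selected
]
print([c.headword for c in cards_to_review(cards, {"Lesson 1"}, today)])
```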

Cheers, Shun
 

捞什子

It has certainly evolved a lot over the years. I'm hoping 4.0 will accommodate it. :p

As for finding another time slot, throwing this laptop out the window would help more than anything. Books and flashcards always take a back seat to clicking around and regret.
 

Shun

Definitely will. It's interesting to see the different ways people are doing things. Most participants seem to be rational planners so far. I prefer to stay flexible.

What you say is true; it might help to just put the laptop to sleep whenever the work on it is done. That forces you to find new, more useful things to do, and if you need the laptop for that, you can still wake it up again.
 