Bigrams sorted by frequency with pinyin & English?

DavidMars

进士
I'm searching for a list of Mandarin Bigrams sorted by frequency. This is just for general study, so I'm not too concerned about what corpus the frequency is derived from, as long as it is relatively modern. News, popular culture, etc. are all fine.

Ideally, I'd like to have pinyin and English in the list. I'm OK with tab-delimited or comma-delimited or Pleco flashcards.

Am I looking in all the wrong places, or is this hard to find?

Thanks.
 

Shun

状元
Hi DavidMars,

@John. has uploaded excellent, modern frequency lists here:


You could do the following:
  1. Filter the lists to two-character word frequency lists (using regex or Python), sort them in descending order using the frequency number (using a spreadsheet app or Python), then discard the number field
  2. Import the lists of Hanzi words into Pleco, telling it to Fill in missing fields. If you only allow free dictionaries (and some paid ones, like the ABC dictionary) as a source for your card definitions, you could export the lists from Pleco to text later, including the pinyin and definitions. If you only need the lists inside Pleco, you could also use all of your dictionaries as a source for your card definitions.
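Step 1 could be sketched in Python roughly as follows. This is only an illustration under assumptions, not a tested recipe for the actual lists: it assumes a tab-delimited file with the word in the first column and the frequency count in the second, and it treats "Hanzi" as the basic CJK Unified Ideographs range only. The function name and the sample data are made up for the example.

```python
import csv
import io


def two_char_words_by_frequency(rows):
    """Return two-character Hanzi entries, most frequent first.

    `rows` is any iterable of [word, count, ...] lists, e.g. a csv.reader.
    Assumption: column 1 is the word, column 2 is its frequency count.
    """
    bigrams = []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        word = row[0].strip()
        try:
            count = int(row[1].strip())
        except ValueError:
            continue  # skip header lines or non-numeric counts
        # Keep only entries made of exactly two basic CJK ideographs.
        if len(word) == 2 and all("\u4e00" <= ch <= "\u9fff" for ch in word):
            bigrams.append((word, count))
    # Sort in descending order of frequency, then discard the number field.
    bigrams.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in bigrams]


# Made-up sample data standing in for a real frequency list.
sample = "的\t100\n我们\t50\n中国\t80\n图书馆\t10\nOK\t5\n"
rows = csv.reader(io.StringIO(sample), delimiter="\t")
print(two_char_words_by_frequency(rows))  # ['中国', '我们']
```

For a real file, the `io.StringIO(sample)` part would be replaced by something like `open(path, encoding="utf-8", newline="")`, where `path` and the encoding depend on the list you downloaded.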
Enjoy,

Shun
 

DavidMars

进士
Shun;

Thanks very much for your reply. I have downloaded the file and saved the spreadsheet column with the bigrams sorted by frequency as a file named Bigrams_For_Pleco.txt. I saved it in CSV format, which produced the file name Bigrams_For_Pleco.txt.csv, then removed the second extension (.csv), so I am working with the file name Bigrams_For_Pleco.txt.

The import format looked perfect, as per the screenshot below.

The import process seemed to work perfectly, as per the second screenshot below.

I'm expecting to see a set of Flashcards named "Bigrams_For_Pleco" in Pleco's "Organize Cards" but I don't see it. I waited some time and did a Pleco restart. No joy.

Any thoughts?

Thanks.

Encoding Check.PNG
Successful Import.PNG
 

Shun

状元
Hi DavidMars,

You're welcome! The imported cards have probably ended up in the "Uncategorized" category. To avoid this, you can add a line

// <destination category path>

at the beginning of the text file to be imported. Now that you've already imported the cards, you can also move them from Uncategorized to a category of your choice.

Best, Shun
 

DavidMars

进士
Shun, I have hundreds of entries in Uncategorized, so it is very time-consuming to separate the new Bigram entries from the old ones there. So I'd like to import the file into a new Flashcard folder named:

Bigram

by using this as the first line of my text file:

// <Bigram>
Flashcard Folder Name.jpg


It doesn't seem to do the job. Is the destination category path more complex?

Thanks again.

David
Flashcard Path Name.jpg

 

Shun

状元
Hi David,

Oh, I'm sorry about having misled you a little.

For the category path, just write the category name itself after the two slashes, without the angle brackets, as the first line, then import:

// Bigrams

After the words have been imported into the Bigrams category, you can split it sequentially into groups of 100.
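If you prefer to generate the import file with a script, the header line can be added like this. It is a minimal sketch: the function name is made up for the example, and the only Pleco behaviour it relies on is the one described above, namely that a first line of the form "// Bigrams" names the destination category.

```python
def pleco_import_text(words, category="Bigrams"):
    """Build the text of an import file: one category header line,
    then one Hanzi word per line.

    The header is "// " plus the plain category name, with no angle
    brackets around it.
    """
    lines = [f"// {category}"]
    lines.extend(words)
    return "\n".join(lines) + "\n"


# Made-up two-word list standing in for the real column of bigrams.
text = pleco_import_text(["中国", "我们"])
print(text)

# To save it as a plain UTF-8 text file (the file name is just an example):
# with open("Bigrams_For_Pleco.txt", "w", encoding="utf-8") as f:
#     f.write(text)
```

Writing the file from a script also avoids going through a spreadsheet's CSV export, so the file contains nothing but the header and the words.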

Would you like to share your Hanzi list in this thread?

You're welcome,

Shun
 

DavidMars

进士
Shun;

Still no luck!

I am attaching the file I am working with. I am actually working with Column B only, but I have included the other columns for those who may want to look at the frequency count, etc. I am using OpenOffice on my Macintosh; I expect anyone could change the extension to .xls for Microsoft Excel, etc. I could not upload the file here with the .ods extension, so I changed it to .txt.

Cheers, David
 

Attachments

  • 13253 Bigrams By Frequency.txt
    507.1 KB · Views: 198

Shun

状元
Hi David,

Thanks! I copied column B into a text file using a text editor. Does the file you are importing into Pleco look like this? (see attachment)

Cheers, Shun
 

Attachments

  • 13253 Bigrams By Frequency - Hanzi.txt
    90.6 KB · Views: 211

DavidMars

进士
Shun, brilliant, thanks. I now have my Bigram flashcards in my Bigram flashcards folder. With your file, the // Bigrams line appeared when Pleco asked me to review the format after I pressed the Import button. It was not appearing with my file; no idea why. You're a lifesaver, many thanks.
 

Shun

状元
Hi David,

You're welcome! As a next step, for another category named "Bisyllabic Words", you could do the following:

The BCC frequency lists include a lot of bigrams which are just bigrams (= frequently co-occurring Hanzi) but not lexical words. So you could import only those bigrams which are actual words and have a dictionary definition, by choosing this setting:

Import only words.PNG

(Set "Missing entries" to "Skip", and change the first line in the txt file to "// Bisyllabic Words".)

Enjoy, Shun
 