Python Data Visualization in PyCharm: Build Pyecharts Word Clouds, Pie Charts, and Chinese Word Frequency Analysis Fast

This article explains how to quickly generate pie charts, rose (Nightingale) charts, and Chinese word clouds with Python in PyCharm. It addresses common issues such as Chinese word segmentation, garbled fonts, background masks, and word frequency statistics. It is well suited for coursework, text analysis, and visualization demos. Keywords: PyCharm, Pyecharts, word cloud.

Technical specifications are summarized below:

Programming Language: Python
Runtime Environment: PyCharm + Anaconda
Visualization Output: HTML charts, PNG/JPG images
Core Dependencies: pyecharts, pyecharts-snapshot, jieba, wordcloud, Pillow, matplotlib, imageio, numpy

This approach delivers multiple visualization outputs with beginner-friendly code

The original material is essentially a collection of visualization scripts for beginners. It is not a complete project. Instead, it organizes three capability layers around text-based word frequency analysis: Pyecharts chart output, Chinese word cloud generation, and word frequency counting with color mapping.

For most users, the real pain point is not the chart API itself. The challenge is whether the Chinese text processing pipeline actually works end to end: Can the text be segmented correctly? Can the font render Chinese characters? Can the background image be used as a mask? Can the output file be saved directly? This article restructures that workflow into a reusable technical guide.

Install the required dependencies first

# Install dependencies for charts and word cloud generation
pip install pyecharts pyecharts-snapshot jieba wordcloud pillow matplotlib imageio numpy

This step installs the foundational packages required for chart rendering, Chinese word segmentation, image processing, and word cloud generation.

Pyecharts works well for quickly generating interactive statistical charts

If your input is already structured as word plus count data, Pyecharts is the fastest option. It outputs HTML directly, which makes it ideal for classroom demos, report attachments, and browser-based previews.

At the code level, pie charts, rose charts, and Nightingale charts differ only slightly. The main changes are in rosetype, radius, and label formatting. The original data uses terms such as “沙具”, “沙子”, and “移动” as categories, which is sufficient for building basic charts.

Generate pie charts and rose charts with Pyecharts

from pyecharts.charts import Pie
import pyecharts.options as opts

# Prepare word frequency data
num = [4989, 3235, 2715, 1009, 865, 467, 428, 240, 230, 190]
lab = ['沙具', '沙子', '移动', '旋转', '创建', '椰子树', '草丛', '缩放', '石头', '篱笆']
data_pair = [(i, j) for i, j in zip(lab, num)]  # Combine categories and values

# Generate a standard pie chart
(
    Pie()
    .add(series_name='词频统计', data_pair=data_pair)  # Pass chart data
    .set_global_opts(title_opts=opts.TitleOpts(title='词频统计'))  # Set the title
    .set_series_opts(label_opts=opts.LabelOpts(formatter='{b}: {d}%'))  # Show percentages
    .render('D:/饼图-词频统计.html')  # Output the HTML file
)

# Generate a rose chart
(
    Pie()
    .add(
        series_name='词频统计',
        data_pair=data_pair,
        rosetype='radius',  # Central angle shows the share, radius shows the value (rose mode)
        radius=[100, 150]
    )
    .set_global_opts(title_opts=opts.TitleOpts(title='词频统计'))
    .set_series_opts(label_opts=opts.LabelOpts(formatter='{b}: {c}'))  # Show raw values
    .render('D:/玫瑰图-词频统计.html')
)

This code uses the same word frequency dataset to generate both a standard pie chart and a rose chart.

Chinese word cloud generation depends on segmentation, fonts, and masks

When a Chinese word cloud fails, the problem is usually not wordcloud itself. In most cases, the input text has not been segmented, or the environment is missing a font that supports Chinese characters. The original example uses jieba for tokenization and explicitly sets msyh.ttf, which is a reliable choice on Windows.

If you want the word cloud to follow the outline of an image, simply load that image into the mask parameter. This produces a visualization shaped by the background image instead of a default rectangular layout.

Generate a Chinese color word cloud with jieba and wordcloud

import jieba
import wordcloud
from imageio.v2 import imread

# Read Chinese text
with open(r'D:\shapan.txt', encoding='gbk') as f:
    text = f.read()

# Segment Chinese text; without segmentation, the word cloud will not render well
words = jieba.lcut(text)
txt = ' '.join(words)  # WordCloud requires spaces between words

# Read the background image as the word cloud mask
pic = imread(r'D:\meinv.jpg')

# Create the word cloud object
wc = wordcloud.WordCloud(
    font_path=r'C:\WINDOWS\Fonts\msyh.ttf',  # Specify a Chinese font to avoid garbled text
    background_color='white',
    mask=pic
)

# Generate and save the word cloud image
wc.generate(txt)
wc.to_file(r'D:\meinvkeshihua.jpg')

This code converts a Chinese TXT file into a word cloud image shaped by a background mask.

The word frequency pipeline determines whether the result has analytical value

If your only goal is a visually appealing word cloud, the previous script is enough. But if you want to analyze high-frequency words, you should clean the text first and then compute counts with Counter. The original script already covers regex-based cleanup, stopword filtering, Top-N extraction, and recoloring, so its structure is relatively complete.

This approach works well for extracting themes from interview transcripts, comment datasets, and classroom writing samples. Compared with directly generating a word cloud from segmented text, it places greater emphasis on frequency interpretability.

Build an analysis-ready Chinese word cloud with Counter

import re
import collections
import numpy as np
import jieba
import wordcloud
from PIL import Image
import matplotlib.pyplot as plt

# Read the raw text
with open('D:/shapan.txt', 'rt', encoding='utf-8') as fn:
    string_data = fn.read()

# Remove invalid characters
pattern = re.compile(r'\t|\n|\.|-|:|;|\)|\(|\?|"')
string_data = re.sub(pattern, '', string_data)  # Remove noisy characters

# Use accurate mode for segmentation
seg_list = jieba.cut(string_data, cut_all=False)
object_list = []
remove_words = [',']  # Extend this stopword list as needed

for word in seg_list:
    if word not in remove_words:
        object_list.append(word)  # Keep valid words

# Count word frequency
word_counts = collections.Counter(object_list)
print(word_counts.most_common(10))  # Output the top 10 high-frequency words

# Read the background image and generate the word cloud
mask = np.array(Image.open('D:/meinv.jpg'))
wc = wordcloud.WordCloud(
    font_path='C:/Windows/Fonts/msyh.ttf',
    mask=mask,
    max_words=2000,
    max_font_size=100
)
wc.generate_from_frequencies(word_counts)  # Generate the word cloud directly from frequencies

# Display the word cloud
plt.imshow(wc)
plt.axis('off')  # Hide axes
plt.show()

This code completes the full analysis pipeline: clean the text, segment it, count frequencies, and generate a word cloud.


Environment issues should be addressed first when running these scripts in PyCharm

First, make sure the encoding argument matches the actual file encoding. The first word cloud example uses encoding='gbk'. If your TXT file is UTF-8, change it to encoding='utf-8', or you will get a UnicodeDecodeError.

Second, confirm that the font path actually exists. On Windows, C:\Windows\Fonts\msyh.ttf is usually a safe first option. Third, use project-relative output paths whenever possible to avoid save failures caused by missing drive letters or invalid absolute paths.
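Encoding mismatches can also be absorbed with a small fallback reader instead of hard-coding one encoding. A sketch (read_text is a helper name introduced here, not part of the original scripts):

```python
from pathlib import Path

def read_text(path, encodings=('utf-8', 'gbk')):
    """Decode a text file by trying likely encodings in order."""
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError(f'{path}: none of {encodings} could decode the file')
```

Call it as text = read_text(r'D:\shapan.txt') and feed the result into jieba exactly as before.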

A recommended project structure is easier to reuse

project/
├── data/
│   ├── shapan.txt
│   └── mask.jpg
├── output/
│   ├── pie.html
│   ├── rose.html
│   └── wordcloud.jpg
└── main.py

This structure reduces path coupling and makes the project easier to run on different machines.
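With that layout, the scripts can resolve every path relative to main.py instead of hard-coding drive letters. A sketch (the variable names are illustrative; it falls back to the working directory when __file__ is unavailable, e.g. in a REPL):

```python
from pathlib import Path

# Resolve everything relative to this script so the project moves as a unit
BASE = Path(__file__).resolve().parent if '__file__' in globals() else Path.cwd()
DATA = BASE / 'data'
OUTPUT = BASE / 'output'
OUTPUT.mkdir(exist_ok=True)  # create output/ on first run

text_path = DATA / 'shapan.txt'        # input text
mask_path = DATA / 'mask.jpg'          # word cloud mask image
pie_path = OUTPUT / 'pie.html'         # Pyecharts output
cloud_path = OUTPUT / 'wordcloud.jpg'  # rendered word cloud
```

Pass these Path objects (or str(pie_path)) wherever the earlier examples used absolute D:/ paths.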

FAQ provides clear answers to common issues

1. Why does my Chinese word cloud show garbled text or square boxes?

You must explicitly specify a font file that supports Chinese characters, such as msyh.ttf. If you do not set font_path, the default font used by wordcloud usually does not support Chinese.
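A quick way to fail fast is to probe a few common font locations before building the cloud. A sketch: the candidate list is an assumption covering typical Windows, macOS, and Linux paths, not an exhaustive inventory (note that newer Windows versions ship Microsoft YaHei as msyh.ttc rather than msyh.ttf):

```python
import os

# Common Chinese-capable fonts on different systems (assumed candidates)
CANDIDATES = [
    r'C:\Windows\Fonts\msyh.ttf',    # Microsoft YaHei (older Windows)
    r'C:\Windows\Fonts\msyh.ttc',    # Microsoft YaHei (Windows 8.1+)
    r'C:\Windows\Fonts\simhei.ttf',  # SimHei
    '/System/Library/Fonts/PingFang.ttc',                       # macOS
    '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc',   # Linux (Noto)
]

def find_chinese_font():
    """Return the first existing candidate font path, or None."""
    for path in CANDIDATES:
        if os.path.exists(path):
            return path
    return None
```

Pass the result to font_path, and raise a clear error when it is None instead of letting the cloud render as squares.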

2. Why does the generated word cloud not match the shape of the background image?

You need to load the image into the mask parameter, and the image outline should be clear. If the mask image is too small, contains too much solid color, or fails to load, the word cloud may fall back to a standard rectangular layout.
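wordcloud treats pure white (#FFFFFF) mask pixels as "masked out" (no words are drawn there) and draws on everything else, so a quick sanity check on the mask array catches most fallback cases. A sketch (check_mask and its thresholds are illustrative heuristics, not part of the library):

```python
import numpy as np

def check_mask(mask):
    """Flag common reasons a mask yields a plain rectangle or an empty cloud."""
    gray = mask[..., :3].mean(axis=2) if mask.ndim == 3 else mask.astype(float)
    # Treat near-white as background (wordcloud masks out pure white, 255)
    white_ratio = (gray >= 250).mean()
    issues = []
    if white_ratio < 0.05:
        issues.append('almost no white background: the cloud will fill the whole rectangle')
    if white_ratio > 0.95:
        issues.append('almost entirely white: nearly no area left to draw words')
    if min(mask.shape[:2]) < 300:
        issues.append('image is small: large words may not fit the shape')
    return issues

# Example with a synthetic mask: white canvas with a black square to draw in
demo = np.full((400, 400, 3), 255, dtype=np.uint8)
demo[100:300, 100:300] = 0
```

Running check_mask on np.array(Image.open('D:/meinv.jpg')) before generating the cloud makes shape failures much easier to diagnose.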

3. How should I choose between Pyecharts and wordcloud?

Choose Pyecharts if you need interactive HTML charts. Choose wordcloud if you want to emphasize text themes and visual expression. If you need both statistics and presentation, combine the two tools.

In summary, this article rebuilds a PyCharm-based Python data visualization workflow covering Pyecharts pie and rose (Nightingale) charts and Chinese word cloud generation. It focuses on solving common issues such as Chinese word segmentation, font rendering, background masks, and word frequency statistics, and is a practical fit for coursework, text analysis, and fast chart-based presentations.