Python 2.7: Converting Tweet Sentiment Polarity Results from JSON to CSV for R

阿华AIGC实验室

2026-5-22

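Hey, let me help you sort out exporting tweet sentiment results from Python 2.7 into a CSV that R can read! If your pandas attempt failed, you most likely hit one of Python 2.7's encoding pitfalls or a small data-structure detail. Here's an approach that works end to end: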

Complete solution, step by step

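1. First, organize the sentiment results into structured data

Make sure the tweet_text you read from the JSON is a list of (tweet ID, text) pairs. Then put each tweet's ID, text, and sentiment scores (compound/neg/neu/pos) into one uniform row so the export step is straightforward: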

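```python
# Assumes you have already created the analyser, e.g. VADER's SentimentIntensityAnalyzer:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   analyser = SentimentIntensityAnalyzer()
tweet_polarity = []
for tweet_id, tweet_content in tweet_text:
    # Get the sentiment scores (a dict with 'neg', 'neu', 'pos' and 'compound')
    polarity_scores = analyser.polarity_scores(tweet_content)
    # Pack every field we need into one row
    tweet_polarity.append([
        tweet_id,
        tweet_content,
        polarity_scores['compound'],
        polarity_scores['neg'],
        polarity_scores['neu'],
        polarity_scores['pos'],
    ])
```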
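If tweet_text is not already in that (ID, text) pair shape, the sketch below shows one way to build it. It assumes the JSON file holds one tweet object per line and that each object has Twitter API v1.1-style `id_str` and `text` fields. `load_tweet_pairs` is a hypothetical helper name, so rename the keys to match whatever your JSON actually contains. Reading `id_str` instead of the numeric `id` keeps the ID as a string from the start.

```python
import json


def load_tweet_pairs(lines):
    """Turn JSON-lines text into a list of (tweet_id, text) pairs.

    Hypothetical layout: one tweet object per line with 'id_str' and 'text'
    fields; adjust the key names to your own data.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # 'id_str' holds the 64-bit ID as text, so it never passes through a float
        pairs.append((tweet['id_str'], tweet['text']))
    return pairs
```

For a real file you could call it as `load_tweet_pairs(io.open('tweets.json', encoding='utf-8'))`, where the filename is a placeholder. `io.open` yields unicode lines on both Python 2.7 and 3. If your file is instead one big JSON array, load it with `json.load` and loop over the resulting list.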

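2. Export with pandas (recommended; handles the encoding issue)

On Python 2.7, pandas' to_csv writes ASCII by default. Any non-ASCII unicode text, such as Chinese characters, emoji, or accented letters, therefore raises a UnicodeEncodeError. To avoid this, set the encoding explicitly when you export. Also turn off the index column so R doesn't read in an extra, useless column: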

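```python
import pandas as pd

# Column names, so R takes the field names straight from the header row
columns = ['tweet_id', 'tweet_content', 'compound_polarity', 'negative', 'neutral', 'positive']
# Build the DataFrame
df = pd.DataFrame(tweet_polarity, columns=columns)

# encoding='utf-8' overrides Python 2's ASCII default; index=False drops the row index.
# Don't pass encoding_errors: to_csv has no such argument, and the pandas versions
# that still run on Python 2.7 reject unknown keywords with a TypeError.
df.to_csv('tweet_sentiment_results.csv',
          encoding='utf-8',
          index=False)
```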

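3. Built-in modules only (no pandas needed)

If pandas won't install, Python's built-in csv module also works. On Python 2.7, though, it only handles byte strings. Open the file in binary mode ('wb') and encode unicode fields to UTF-8 yourself. Don't wrap the file in codecs.open: combined with already-encoded bytes, it fails with a UnicodeDecodeError on the first non-ASCII tweet.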

```python
import csv

header = ['tweet_id', 'tweet_content', 'compound_polarity', 'negative', 'neutral', 'positive']

# Python 2.7's csv module only handles byte strings: open the file in
# binary mode ('wb'), not with codecs.open, and encode text yourself
with open('tweet_sentiment_results.csv', 'wb') as f:
    writer = csv.writer(f)
    # Write the header row first
    writer.writerow(header)
    # Then write the data row by row
    for row in tweet_polarity:
        # Encode only unicode fields to UTF-8 bytes; numbers (int, long, float)
        # and byte strings pass through, and csv.writer formats numbers itself
        encoded_row = [item.encode('utf-8') if isinstance(item, unicode) else item
                       for item in row]
        writer.writerow(encoded_row)
```
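The same approach can go into a small function that also runs unchanged if you later move to Python 3. There, the csv module expects a text file opened with `newline=''` and an explicit encoding, not binary mode. This is only a sketch: `write_sentiment_csv` is a hypothetical name, and branching on `sys.version_info` is just one way to handle both versions.

```python
import csv
import sys


def write_sentiment_csv(path, header, rows):
    """Write header + rows as a UTF-8 CSV on Python 2.7 or Python 3."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode; newline='' lets the csv module control line endings
        f = open(path, 'w', newline='', encoding='utf-8')

        def prepare(value):
            return value  # str is already Unicode text
    else:
        # Python 2.7: binary mode, because the csv module only handles byte strings
        f = open(path, 'wb')

        def prepare(value):
            # Encode unicode to UTF-8 bytes; leave numbers and byte strings alone
            return value.encode('utf-8') if isinstance(value, unicode) else value  # noqa: F821
    with f:
        writer = csv.writer(f)
        writer.writerow([prepare(v) for v in header])
        for row in rows:
            writer.writerow([prepare(v) for v in row])
```

The `unicode` name is only looked up inside the Python 2 branch, so the function does not hit a NameError on Python 3.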

4. Check the result in R

After exporting, read the file in R with the code below, making sure the encoding matches:

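```r
# Read the CSV as UTF-8 to avoid garbled text;
# keep tweet_id as text so long IDs aren't turned into (imprecise) doubles,
# and keep strings as character rather than factor (the default before R 4.0)
tweet_data <- read.csv("tweet_sentiment_results.csv",
                       fileEncoding = "UTF-8",
                       colClasses = c(tweet_id = "character"),
                       stringsAsFactors = FALSE)
# Look at the first rows to verify
head(tweet_data)
```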

Pitfalls

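Garbled text or encoding errors: Python 2.7 has two string types, str (bytes) and unicode. Before writing with the csv module, encode unicode tweet text to a UTF-8 str. Otherwise non-ASCII characters raise a UnicodeEncodeError or come out garbled in the CSV. Only call .encode('utf-8') on unicode objects: on a str that already holds UTF-8 bytes, Python 2 first decodes it implicitly as ASCII, and that fails.
Tweet IDs turning into scientific notation: R reads long numeric tweet IDs as doubles. Doubles display in scientific notation and, above 2^53, also lose their last digits. Writing tweet_id as a string on export is not enough by itself, because read.csv converts quoted numbers too. Specify the column type in R instead: read.csv(..., colClasses = c(tweet_id = "character")).
Mismatched row lengths: make sure every row in tweet_polarity has exactly as many fields as the header. Otherwise the export may fail, or the columns may end up misaligned when R reads the file.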
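You can check the last two pitfalls in Python before the file ever reaches R. The sketch below uses a hypothetical helper, `check_rows`, that rejects any row whose length differs from the header and converts every tweet_id (assumed to be column 0) to a string. The final lines show why the IDs need protecting at all. A long tweet ID like the 19-digit example is larger than 2^53, so it cannot be stored exactly as a double, which is the type R's read.csv assigns to such a column.

```python
def check_rows(header, rows):
    """Return rows with tweet_id (column 0) as a string; fail on ragged rows."""
    checked = []
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError('row %d has %d fields, header has %d'
                             % (i, len(row), len(header)))
        # str() of an int or long gives the exact digits, e.g. '1234567890123456789'
        checked.append([str(row[0])] + list(row[1:]))
    return checked


# Why the ID must not become a float: doubles hold integers exactly only up to 2**53
tweet_id = 1234567890123456789
print(tweet_id > 2 ** 53)                # True
print(int(float(tweet_id)) == tweet_id)  # False: the last digits are lost
```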