Java Map Function Requirement: Count the Customers Who Bought Both Item ID 21 and Item ID 27

阿华AIGC实验室

2026-5-26

Map Function Implementation for Retail Dataset

Got it, let's build that Map function you need for your retail dataset. This will check each input line for both tokens "21" and "27", then emit a fixed key-value pair only when both are present.

Step-by-Step Implementation

1. Core Logic Overview

Initialize two boolean flags item_21 and item_27 to false
Split the input text line into individual tokens using StringTokenizer
Iterate through each token: set item_21 to true if the token matches "21", set item_27 to true if it matches "27"
After processing all tokens, if both flags are true, emit the key Both_21_27 with the value 1
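The flag logic above can be tried on its own, without a Hadoop cluster. The following is a minimal sketch in plain Java; the class name ItemPairCheck and the method containsBoth are hypothetical names introduced for illustration and are not part of the mapper itself. It assumes whitespace-separated tokens, as in the steps above.

```java
import java.util.StringTokenizer;

// Hypothetical helper for illustration only; not part of the Hadoop mapper.
final class ItemPairCheck {

    // Returns true only when the line contains both "21" and "27"
    // as whole whitespace-separated tokens.
    static boolean containsBoth(String line) {
        boolean item21 = false;
        boolean item27 = false;
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.equals("21")) {
                item21 = true;
            } else if (token.equals("27")) {
                item27 = true;
            }
            // Stop scanning as soon as both items have been seen.
            if (item21 && item27) {
                return true;
            }
        }
        return false;
    }
}
```

A line such as "5 21 9 27" passes the check, while "21 210 270" does not, because "270" is a different token from "27".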

2. Full Java Code Example

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class RetailItemMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Define reusable constant objects to optimize performance
    private static final Text OUTPUT_KEY = new Text("Both_21_27");
    private static final IntWritable OUTPUT_VALUE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        
        // Initialize flags to track presence of target items
        boolean item_21 = false;
        boolean item_27 = false;

        // Convert input text to string and split into tokens
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);

        // Check each token for matches
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.equals("21")) {
                item_21 = true;
            } else if (token.equals("27")) {
                item_27 = true;
            }
            // Optional early exit to save processing once both items are found
            if (item_21 && item_27) {
                break;
            }
        }

        // Emit output only if both items are present in the line
        if (item_21 && item_27) {
            context.write(OUTPUT_KEY, OUTPUT_VALUE);
        }
    }
}

3. Key Details Explained

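Reusable Constants: We define OUTPUT_KEY and OUTPUT_VALUE as class-level constants so the mapper does not allocate a new Text and a new IntWritable for every input line. On large inputs this reduces object allocation and garbage-collection pressure.
Exact Token Matching: Using equals() ensures we only match the exact tokens "21" and "27", so values like "210" or "27a" won't be incorrectly flagged. If your dataset uses a non-whitespace delimiter (like commas for CSV), update the StringTokenizer to use that delimiter (e.g., new StringTokenizer(line, ",")). Note that with a comma-only delimiter, tokens keep any surrounding spaces, so " 21" would not equal "21". Either trim each token before comparing or add the space to the delimiter set (e.g., new StringTokenizer(line, ", ")).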
Early Exit Optimization: Breaking out of the token loop once both flags are true reduces unnecessary processing, which is especially helpful for large retail datasets with long lines.
Efficient Output Types: We use IntWritable for the value instead of Text because it serializes as a fixed 4-byte integer, and the values can be summed downstream without parsing strings.
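For the delimiter case mentioned under Exact Token Matching, here is a hedged sketch of the same check for delimited lines, written in plain Java so it can be run without Hadoop. The class DelimitedItemPairCheck and its containsBoth method are hypothetical names, and the sketch assumes a single literal delimiter string with optional spaces around each field. It swaps StringTokenizer for String.split plus trim(), which is one way to handle the padding, not the only one.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; names are assumptions.
final class DelimitedItemPairCheck {

    // Splits the line on a literal delimiter, trims each field, and reports
    // whether both "21" and "27" appear as whole fields.
    static boolean containsBoth(String line, String delimiter) {
        boolean item21 = false;
        boolean item27 = false;
        // Pattern.quote makes split treat the delimiter literally,
        // so characters such as "|" or "." are not read as regex syntax.
        for (String raw : line.split(Pattern.quote(delimiter))) {
            String token = raw.trim();
            if (token.equals("21")) {
                item21 = true;
            } else if (token.equals("27")) {
                item27 = true;
            }
            if (item21 && item27) {
                return true;
            }
        }
        return false;
    }
}
```

Inside the mapper, the same loop body would replace the StringTokenizer loop, with the delimiter chosen to match the actual file format.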