
Technical question: implementing a branchless DotProduct function, and fixing the related MIPS assembly code

Solution: Branchless DotProduct Implementation in MIPS

I get it, implementing a branchless DotProduct function in MIPS can feel tricky at first, especially when you can't use any conditional branches or jumps to external labels. Let's break down how to pull this off, and fix some issues in your existing main program along the way.

Key Idea for Branchless DotProduct

The core trick here is using masking to control whether each iteration's operations affect the result. We repeat a fixed number of operation blocks (32 here, which caps the supported vector length at 32 elements; add more blocks if you need longer vectors), and use the remaining count to generate a mask that disables operations once we've processed all elements.

Here's how the masking works:

  1. For each iteration, check if the remaining count is greater than 0.
  2. Generate a mask: -1 (all bits set to 1) if we still have elements to process, 0 otherwise.
  3. Apply this mask to the product of the current elements; this zeroes out the product once we're done.
  4. Use the mask to only update the vector addresses and decrement the count when we still have elements left.
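
To make the four steps above concrete, here is a minimal standalone sketch that builds the mask and the pointer stride for two sample counts (3 and 0) and prints them. It assumes the MARS/SPIM syscall conventions (1 = print_int, 4 = print_string, 10 = exit); the label nl is just a name picked for this example.

```mips
# Sketch: mask and stride for count = 3, then for count = 0.
# Assumes MARS/SPIM syscalls; "nl" is a hypothetical label.
.data
nl:     .asciiz "\n"

.text
.globl main
main:
    # Case 1: three elements remaining
    li   $t0, 3
    slti $t3, $t0, 1       # 0, because 3 < 1 is false
    addi $t3, $t3, -1      # 0 - 1 = -1 (all bits set): mask active
    andi $t6, $t3, 4       # 0xFFFFFFFF & 4 = 4: pointer stride

    li   $v0, 1            # print the mask
    move $a0, $t3
    syscall
    li   $v0, 4
    la   $a0, nl
    syscall
    li   $v0, 1            # print the stride
    move $a0, $t6
    syscall
    li   $v0, 4
    la   $a0, nl
    syscall

    # Case 2: nothing remaining
    li   $t0, 0
    slti $t3, $t0, 1       # 1, because 0 < 1 is true
    addi $t3, $t3, -1      # 1 - 1 = 0: mask inactive
    andi $t6, $t3, 4       # 0 & 4 = 0: pointers stay put

    li   $v0, 1            # print the mask
    move $a0, $t3
    syscall
    li   $v0, 4
    la   $a0, nl
    syscall
    li   $v0, 1            # print the stride
    move $a0, $t6
    syscall
    li   $v0, 4
    la   $a0, nl
    syscall

    li   $v0, 10           # exit
    syscall
```

Under those assumptions the program should print -1, 4, 0, 0 on separate lines: an active mask with a 4-byte stride, then an inactive mask with no stride.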

Corrected Branchless DotProduct Function

DotProduct:
    # Initialize registers:
    # $v0 = dot product result (starts at 0)
    # $t0 = remaining element count
    # $t1 = current address of vector 1
    # $t2 = current address of vector 2
    move $v0, $zero
    move $t0, $a2
    move $t1, $a0
    move $t2, $a1

    # --------------------------
    # Iteration 1: Process element 0
    # --------------------------
    slti $t3, $t0, 1       # $t3 = 1 if count <= 0, else 0
    addi $t3, $t3, -1      # Convert to mask: -1 (all ones) if count > 0, else 0
    lw $t4, ($t1)          # Load vector1 element
    lw $t5, ($t2)          # Load vector2 element
    mult $t4, $t5          # Multiply elements
    mflo $t4               # Get product from LO register
    and $t4, $t4, $t3      # Mask product: zero if count is 0
    add $v0, $v0, $t4      # Accumulate to result
    andi $t6, $t3, 4       # Get 4 if mask is active (count>0), else 0
    add $t1, $t1, $t6      # Update vector1 address only if needed
    add $t2, $t2, $t6      # Update vector2 address only if needed
    add $t0, $t0, $t3      # Decrement count only if needed (mask is -1)

    # --------------------------
    # Iteration 2: Process element 1
    # --------------------------
    slti $t3, $t0, 1
    addi $t3, $t3, -1
    lw $t4, ($t1)
    lw $t5, ($t2)
    mult $t4, $t5
    mflo $t4
    and $t4, $t4, $t3
    add $v0, $v0, $t4
    andi $t6, $t3, 4
    add $t1, $t1, $t6
    add $t2, $t2, $t6
    add $t0, $t0, $t3

    # --------------------------
    # Iterations 3 to 31: repeat the same block 29 more times
    # --------------------------
    # (Together with iterations 1, 2, and 32 this gives 32 blocks, enough for up to 32 elements.)
    # They are omitted here for brevity, but you need to include them.

    # --------------------------
    # Iteration 32: Process element 31
    # --------------------------
    slti $t3, $t0, 1
    addi $t3, $t3, -1
    lw $t4, ($t1)
    lw $t5, ($t2)
    mult $t4, $t5
    mflo $t4
    and $t4, $t4, $t3
    add $v0, $v0, $t4
    andi $t6, $t3, 4
    add $t1, $t1, $t6
    add $t2, $t2, $t6
    add $t0, $t0, $t3

    # Return to caller
    jr $ra

Notes on This Implementation:

  • No branch/jump instructions: The only jump is jr $ra for returning to the caller, which is allowed.
  • Handles variable-length vectors: Up to 32 elements (easily extendable by adding more iteration blocks).
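  • Masking ensures correctness: Once the count hits 0, the remaining blocks leave the result, addresses, and count unchanged. The two lw instructions still execute in every block, though, so each vector must be followed by at least one readable word (the end-marker words in the full program below serve that purpose).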
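
If you assemble with MARS, its .macro / .end_macro directives offer one way to avoid pasting the block 32 times by hand. The sketch below is only an illustration under that assumption: the macro name dot_step and the labels vecA, vecB, and pad are made up for this example, and it uses four steps to stay short where a full version would invoke the macro 32 times. As far as I know SPIM has no macro support, so there the blocks still have to be written out.

```mips
# Sketch for MARS only: .macro/.end_macro are MARS assembler directives.
# dot_step, vecA, vecB and pad are hypothetical names used for this example.
.macro dot_step
    slti $t3, $t0, 1       # 1 if count <= 0, else 0
    addi $t3, $t3, -1      # mask: -1 if count > 0, else 0
    lw   $t4, ($t1)        # load element of the first vector
    lw   $t5, ($t2)        # load element of the second vector
    mult $t4, $t5
    mflo $t4               # low 32 bits of the product
    and  $t4, $t4, $t3     # zero the product when the mask is 0
    add  $v0, $v0, $t4     # accumulate
    andi $t6, $t3, 4       # stride: 4 if the mask is active, else 0
    add  $t1, $t1, $t6
    add  $t2, $t2, $t6
    add  $t0, $t0, $t3     # count - 1 if the mask is active
.end_macro

.data
vecA:   .word 2, 6, 2
vecB:   .word 5, 15, 5
pad:    .word 0            # readable word after the last vector

.text
.globl main
main:
    la   $a0, vecA
    la   $a1, vecB
    li   $a2, 3            # element count
    jal  DotProduct
    move $a0, $v0
    li   $v0, 1            # print_int
    syscall
    li   $v0, 10           # exit
    syscall

DotProduct:
    move $v0, $zero
    move $t0, $a2
    move $t1, $a0
    move $t2, $a1
    dot_step               # each use expands inline to one iteration block
    dot_step
    dot_step
    dot_step               # four steps here; a full version repeats this 32 times
    jr   $ra
```

Because the macro expands inline, the assembled function is still straight-line code with no branches. With the sample data it should print 110 (10 + 90 + 10); the fourth step runs with the mask at 0 and changes nothing.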

Fixes to Your Main Program

Your main program had several issues that needed fixing to correctly calculate vector length and display results:

  1. Vector length calculation: Your original code used arrayEnd which is after vector3, leading to incorrect element counts. We'll add end labels for each vector to get accurate lengths.
  2. Display function bugs: Your display loop had incorrect syscall ordering and infinite loops. We'll rewrite it to properly print messages and return to the main program.
  3. Incorrect vector selection: You accidentally reused vector2 for the second dot product calculation; we'll fix that to use vector3.

Corrected Full Code

.data
vector1:    .word 2, 6, 2
vector1_end:.word 0  # Marker for end of vector1
vector2:    .word 4, -3, 5
vector2_end:.word 0  # Marker for end of vector2
vector3:    .word 5, 15, 5
vector3_end:.word 0  # Marker for end of vector3
per:        .asciiz "Two vectors are perpendicular"
n_per:      .asciiz "Two vectors are not perpendicular"
newLine:    .asciiz "\n"
.align 3

.text
main: 
    # Calculate dot product of vector1 and vector2
    la $a0, vector1
    la $a1, vector2
    # Get vector length: (end address - start address) / 4 bytes per element
    la $t0, vector1_end
    sub $a2, $t0, $a0
    srl $a2, $a2, 2       # Divide by 4 (shift right by 2)
    jal DotProduct
    move $s0, $v0
    jal display           # Print result

    # Reset registers for next calculation
    li $a0, 0
    li $a1, 0
    li $a2, 0
    li $s0, 0
    li $v0, 0

    # Calculate dot product of vector1 and vector3
    la $a0, vector1
    la $a1, vector3
    la $t0, vector1_end
    sub $a2, $t0, $a0
    srl $a2, $a2, 2
    jal DotProduct
    move $s0, $v0
    jal display           # Print result

    # Exit program
    li $v0, 10
    syscall

# Branchless DotProduct function (as written above, with all 32 iterations)
DotProduct:
    move $v0, $zero
    move $t0, $a2
    move $t1, $a0
    move $t2, $a1

    # Iteration 1
    slti $t3, $t0, 1       # 1 if count <= 0, else 0
    addi $t3, $t3, -1      # mask: -1 if count > 0, else 0
    lw $t4, ($t1)
    lw $t5, ($t2)
    mult $t4, $t5
    mflo $t4
    and $t4, $t4, $t3
    add $v0, $v0, $t4
    andi $t6, $t3, 4
    add $t1, $t1, $t6
    add $t2, $t2, $t6
    add $t0, $t0, $t3

    # Iteration 2
    slti $t3, $t0, 1
    addi $t3, $t3, -1
    lw $t4, ($t1)
    lw $t5, ($t2)
    mult $t4, $t5
    mflo $t4
    and $t4, $t4, $t3
    add $v0, $v0, $t4
    andi $t6, $t3, 4
    add $t1, $t1, $t6
    add $t2, $t2, $t6
    add $t0, $t0, $t3

    # Iteration 3 to 31 (copy-paste the iteration block here)
    # ...

    # Iteration 32
    slti $t3, $t0, 1
    addi $t3, $t3, -1
    lw $t4, ($t1)
    lw $t5, ($t2)
    mult $t4, $t5
    mflo $t4
    and $t4, $t4, $t3
    add $v0, $v0, $t4
    andi $t6, $t3, 4
    add $t1, $t1, $t6
    add $t2, $t2, $t6
    add $t0, $t0, $t3

    jr $ra

# Display function to print perpendicularity status
display: 
    bne $s0, $zero, not_perpendicular   # Nonzero dot product: not perpendicular
    # Print perpendicular message
    li $v0, 4
    la $a0, per
    syscall
    j print_newline

not_perpendicular: 
    # Print non-perpendicular message
    li $v0, 4
    la $a0, n_per
    syscall

print_newline:
    # Print newline
    li $v0, 4
    la $a0, newLine
    syscall
    # Return to main
    jr $ra

How It Works

  • DotProduct: Each iteration block uses masking to only process elements while the count is positive. Once the count reaches 0, all subsequent blocks do nothing to the result.
  • Main Program: Correctly calculates vector lengths using end markers, calls DotProduct for both vector pairs, and uses a fixed display function to print results without infinite loops.
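
As a sanity check on the sample data, the two dot products can be worked out by hand (writing v1, v2, v3 for vector1, vector2, vector3):

```latex
\begin{align*}
\mathbf{v}_1 \cdot \mathbf{v}_2 &= 2\cdot 4 + 6\cdot(-3) + 2\cdot 5 = 8 - 18 + 10 = 0 \\
\mathbf{v}_1 \cdot \mathbf{v}_3 &= 2\cdot 5 + 6\cdot 15 + 2\cdot 5 = 10 + 90 + 10 = 110
\end{align*}
```

A dot product of zero means the vectors are perpendicular, so the first pair should be reported as perpendicular and the second as not perpendicular.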

This question originally comes from Stack Exchange; asked by Aiden.
