Implementing a Branchless DotProduct Function and Fixing MIPS Assembly Code Issues
Implementing a branchless DotProduct function in MIPS can feel tricky at first, especially when you can't use any conditional branches or jumps to external labels. Let's break down how to pull this off, and fix some issues in your existing main program along the way.
Key Idea for Branchless DotProduct
The core trick here is using masking to control whether each iteration's operations affect the result. We unroll the loop into a fixed number of identical operation blocks (32 here, an arbitrary cap that limits the vector length to 32 elements), and use the remaining count to generate a mask that turns each block into a no-op once all elements have been processed.
Here's how the masking works:
- For each iteration, check if the remaining count is greater than 0.
- Generate a mask: -1 (all bits set to 1) if we still have elements to process, 0 otherwise.
- Apply this mask to the product of the current elements. This zeroes out the product once we're done.
- Use the mask to update the vector addresses and decrement the count only while elements remain.
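To make the mask arithmetic concrete, here is a minimal standalone sketch that computes the mask for one example count and prints it. The count value of 2 is an assumption chosen for illustration, and the syscall numbers assume a MARS or SPIM style simulator.

```mips
        .text
        .globl main
main:
        li   $t0, 2             # example remaining count (assumed value)

        slti $t3, $t0, 1        # $t3 = 1 if count < 1, else 0
        xori $t3, $t3, 1        # flip the flag: $t3 = 1 if count > 0, else 0
        sub  $t3, $zero, $t3    # negate: $t3 = -1 if count > 0, else 0

        andi $t6, $t3, 4        # $t6 = 4 while the mask is active, else 0

        move $a0, $t3           # print the mask as a signed integer
        li   $v0, 1             # syscall 1: print_int
        syscall                 # prints -1 for a count of 2

        li   $v0, 10            # syscall 10: exit
        syscall
```

Changing the initial count to 0 makes the program print 0 instead, which is the value that disables an iteration block.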
Corrected Branchless DotProduct Function
    DotProduct:
        # Initialize registers:
        #   $v0 = dot product result (starts at 0)
        #   $t0 = remaining element count
        #   $t1 = current address of vector 1
        #   $t2 = current address of vector 2
        # Note: the loads below still execute after the count reaches 0,
        # so the word following each vector must be readable memory.
        move $v0, $zero
        move $t0, $a2
        move $t1, $a0
        move $t2, $a1

        # --------------------------
        # Iteration 1: Process element 0
        # --------------------------
        slti $t3, $t0, 1        # $t3 = 1 if count < 1 (nothing left), else 0
        xori $t3, $t3, 1        # Flip the flag: $t3 = 1 if count > 0, else 0
        sub  $t3, $zero, $t3    # Negate to get the mask: -1 (all ones) if count > 0, else 0
        lw   $t4, ($t1)         # Load vector1 element
        lw   $t5, ($t2)         # Load vector2 element
        mult $t4, $t5           # Multiply elements
        mflo $t4                # Get product from LO register
        and  $t4, $t4, $t3      # Mask product: zero if count is 0
        add  $v0, $v0, $t4      # Accumulate to result
        andi $t6, $t3, 4        # Get 4 if mask is active (count > 0), else 0
        add  $t1, $t1, $t6      # Update vector1 address only if needed
        add  $t2, $t2, $t6      # Update vector2 address only if needed
        add  $t0, $t0, $t3      # Decrement count only if needed (mask is -1)

        # --------------------------
        # Iteration 2: Process element 1
        # --------------------------
        slti $t3, $t0, 1
        xori $t3, $t3, 1
        sub  $t3, $zero, $t3
        lw   $t4, ($t1)
        lw   $t5, ($t2)
        mult $t4, $t5
        mflo $t4
        and  $t4, $t4, $t3
        add  $v0, $v0, $t4
        andi $t6, $t3, 4
        add  $t1, $t1, $t6
        add  $t2, $t2, $t6
        add  $t0, $t0, $t3

        # --------------------------
        # Repeat this block 30 more times (total 32 iterations)
        # --------------------------
        # (Copy-paste the iteration block above 30 times to handle up to 32 elements)
        # For brevity, we'll skip writing all 32 here, but you need to include them.

        # --------------------------
        # Iteration 32: Process element 31
        # --------------------------
        slti $t3, $t0, 1
        xori $t3, $t3, 1
        sub  $t3, $zero, $t3
        lw   $t4, ($t1)
        lw   $t5, ($t2)
        mult $t4, $t5
        mflo $t4
        and  $t4, $t4, $t3
        add  $v0, $v0, $t4
        andi $t6, $t3, 4
        add  $t1, $t1, $t6
        add  $t2, $t2, $t6
        add  $t0, $t0, $t3

        # Return to caller
        jr   $ra
Notes on This Implementation:
- No branch/jump instructions: The only jump is jr $ra for returning to the caller, which is allowed.
- Handles variable-length vectors: Up to 32 elements (easily extendable by adding more iteration blocks).
- Masking ensures correctness: Once the count hits 0, all subsequent operations won't modify the result, addresses, or count.
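If your assembler supports macros and your assignment permits them, the repeated iteration block can be written once and expanded inline, which still produces straight-line code with no branches. The sketch below assumes the MARS simulator, whose .macro and .end_macro directives may be missing or spelled differently in other assemblers. The macro name dot_step, the labels vecA, vecB, and pad, and the use of only four expansions are all hypothetical choices for illustration.

```mips
        .data
vecA:   .word 2, 6, 2           # hypothetical test data
vecB:   .word 5, 15, 5
pad:    .word 0                 # readable word after vecB for the idle loads

        .text
        # One masked iteration, written once (MARS macro syntax, assumed)
        .macro dot_step
        slti $t3, $t0, 1        # 1 if count < 1, else 0
        xori $t3, $t3, 1        # 1 if count > 0, else 0
        sub  $t3, $zero, $t3    # mask: -1 if count > 0, else 0
        lw   $t4, ($t1)         # load element of first vector
        lw   $t5, ($t2)         # load element of second vector
        mult $t4, $t5
        mflo $t4
        and  $t4, $t4, $t3      # zero the product when the mask is 0
        add  $v0, $v0, $t4      # accumulate
        andi $t6, $t3, 4        # 4 while active, else 0
        add  $t1, $t1, $t6      # advance addresses only while active
        add  $t2, $t2, $t6
        add  $t0, $t0, $t3      # decrement count only while active
        .end_macro

        .globl main
main:
        la   $a0, vecA
        la   $a1, vecB
        li   $a2, 3             # element count
        jal  DotProduct
        move $a0, $v0
        li   $v0, 1             # syscall 1: print_int
        syscall                 # prints 110 (2*5 + 6*15 + 2*5)
        li   $v0, 10            # syscall 10: exit
        syscall

DotProduct:
        move $v0, $zero
        move $t0, $a2
        move $t1, $a0
        move $t2, $a1
        dot_step                # four expansions here; use 32 for the full cap
        dot_step
        dot_step
        dot_step
        jr   $ra
```

The fourth expansion runs with a count of 0, so it loads from the words just past each vector and contributes nothing to the sum.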
Fixes to Your Main Program
Your main program had several issues that needed fixing to correctly calculate vector length and display results:
- Vector length calculation: Your original code used arrayEnd, which comes after vector3, leading to incorrect element counts. We'll add end labels for each vector to get accurate lengths.
- Display function bugs: Your display loop had incorrect syscall ordering and infinite loops. We'll rewrite it to properly print messages and return to the main program.
- Incorrect vector selection: You accidentally reused vector2 for the second dot product calculation; we'll fix that to use vector3.
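The end-label length calculation can be checked in isolation. This is a small sketch, assuming a MARS or SPIM style simulator; the labels vec and vec_end are hypothetical stand-ins for one vector and its end marker.

```mips
        .data
vec:     .word 2, 6, 2          # three 4-byte elements
vec_end: .word 0                # marker placed directly after the vector

        .text
        .globl main
main:
        la   $t0, vec           # start address
        la   $t1, vec_end       # address just past the last element
        sub  $t2, $t1, $t0      # byte distance: 12
        srl  $a0, $t2, 2        # divide by 4 bytes per word: 3

        li   $v0, 1             # syscall 1: print_int
        syscall                 # prints 3

        li   $v0, 10            # syscall 10: exit
        syscall
```

Because each vector gets its own end label, the computed length does not depend on what is declared after it in the data segment.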
Corrected Full Code
    .data
    vector1:     .word 2, 6, 2
    vector1_end: .word 0        # Marker for end of vector1
    vector2:     .word 4, -3, 5
    vector2_end: .word 0        # Marker for end of vector2
    vector3:     .word 5, 15, 5
    vector3_end: .word 0        # Marker for end of vector3
    per:         .asciiz "Two vectors are perpendicular"
    n_per:       .asciiz "Two vectors are not perpendicular"
    newLine:     .asciiz "\n"
    .align 3

    .text
    main:
        # Calculate dot product of vector1 and vector2
        la   $a0, vector1
        la   $a1, vector2
        # Get vector length: (end address - start address) / 4 bytes per element
        la   $t0, vector1_end
        sub  $a2, $t0, $a0
        srl  $a2, $a2, 2        # Divide by 4 (shift right by 2)
        jal  DotProduct
        move $s0, $v0
        jal  display            # Print result

        # Reset registers for next calculation
        li   $a0, 0
        li   $a1, 0
        li   $a2, 0
        li   $s0, 0
        li   $v0, 0

        # Calculate dot product of vector1 and vector3
        la   $a0, vector1
        la   $a1, vector3
        la   $t0, vector1_end
        sub  $a2, $t0, $a0
        srl  $a2, $a2, 2
        jal  DotProduct
        move $s0, $v0
        jal  display            # Print result

        # Exit program
        li   $v0, 10
        syscall

    # Branchless DotProduct function (as written above, with all 32 iterations)
    DotProduct:
        move $v0, $zero
        move $t0, $a2
        move $t1, $a0
        move $t2, $a1

        # Iteration 1
        slti $t3, $t0, 1        # 1 if count < 1, else 0
        xori $t3, $t3, 1        # 1 if count > 0, else 0
        sub  $t3, $zero, $t3    # Mask: -1 if count > 0, else 0
        lw   $t4, ($t1)
        lw   $t5, ($t2)
        mult $t4, $t5
        mflo $t4
        and  $t4, $t4, $t3
        add  $v0, $v0, $t4
        andi $t6, $t3, 4
        add  $t1, $t1, $t6
        add  $t2, $t2, $t6
        add  $t0, $t0, $t3

        # Iteration 2
        slti $t3, $t0, 1
        xori $t3, $t3, 1
        sub  $t3, $zero, $t3
        lw   $t4, ($t1)
        lw   $t5, ($t2)
        mult $t4, $t5
        mflo $t4
        and  $t4, $t4, $t3
        add  $v0, $v0, $t4
        andi $t6, $t3, 4
        add  $t1, $t1, $t6
        add  $t2, $t2, $t6
        add  $t0, $t0, $t3

        # Iteration 3 to 31 (copy-paste the iteration block here)
        # ...

        # Iteration 32
        slti $t3, $t0, 1
        xori $t3, $t3, 1
        sub  $t3, $zero, $t3
        lw   $t4, ($t1)
        lw   $t5, ($t2)
        mult $t4, $t5
        mflo $t4
        and  $t4, $t4, $t3
        add  $v0, $v0, $t4
        andi $t6, $t3, 4
        add  $t1, $t1, $t6
        add  $t2, $t2, $t6
        add  $t0, $t0, $t3

        jr   $ra

    # Display function to print perpendicularity status
    display:
        # A dot product of 0 means the vectors are perpendicular
        bne  $s0, $zero, not_perpendicular
        # Print perpendicular message
        li   $v0, 4
        la   $a0, per
        syscall
        j    print_newline
    not_perpendicular:
        # Print non-perpendicular message
        li   $v0, 4
        la   $a0, n_per
        syscall
    print_newline:
        # Print newline
        li   $v0, 4
        la   $a0, newLine
        syscall
        # Return to main
        jr   $ra
How It Works
- DotProduct: Each iteration block uses masking to only process elements while the count is positive. Once the count reaches 0, all subsequent blocks do nothing to the result.
- Main Program: Correctly calculates vector lengths using end markers, calls DotProduct for both vector pairs, and uses a corrected display function to print results without infinite loops.
This question originates from Stack Exchange; asked by Aiden.




