2
votes

I am quite new to C and assembler (all programming, really), and this has really been bothering me for a day or two.

This is my assignment (I've already done 4. It's the extra credit I'm having a problem with.):

For each of these questions, you must write a C (not C++) program that contains three parts:

A. Some C code to read inputs (using scanf).

B. A segment of inline-assembler to do the computation.

C. Some C code to write the output (using printf).

  1. maxi.c : reads a count n, then reads a list of of n integers into an array (which you allocate with malloc), then uses assembler to nd the biggest of all the integers, then outputs it. You will need conditional-jump opcodes like jg. To use such an opcode, you usually first use an opcode like dec, or sub, which sets bits in the flags register. Then you will, for example use jg to jump somewhere (eg the top of the loop) if the result of the previous operation was greater than zero. You can also set the flags register using a cmp opcode. You will also need to access within an array, using base-displacement mode: mov eax,[ebx+ecx*4].

Extra credit: 5. maxf.c : same as above, but uses floats instead of integers.

This is what I have right now. When I input the list, it outputs the first number in the list, regardless of whether or not it is the biggest.

// maxi.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include "malloc.h"

int n;     // length of list
float *nums; // the list
int max;   // the result

int main()
{
    int i;
    float arr[10];
    printf("How many integers?  ");
    scanf_s("%d", &n);
    nums = (float*)malloc(n * sizeof(float));
    for (i = 0; i<n; i++)
    {
        printf("Enter next number:  ");
        scanf_s("%d", &arr[i]);
    }

    __asm
    {
        mov eax, n;    // A is the count
        dec eax;       // A becomes the end of the list
        mov ebx, nums; // B is the beginning of the list
        mov ecx, arr;    // C is the current largest integer

    top:
        cmp eax, 0;
        jl done;

        mov edx, [ebx + eax * 4];
        cmp edx, ecx;
        jg Change;
        jmp Increment;

    Increment:
        //mov eax,[ebx+ecx*4];
        dec eax;
        jmp top;

    Change:
        mov ecx, edx;
        jmp Increment;
    done:
        mov max, ecx;
    }
    printf("The largest integer in your list is: %d\n", max);
    return 0;
}
3
Use a debugger. They also can step assembler.too honest for this site
@Olaf I am really new to this, so I don't really know what you mean by that. Do you mean a different debugger other than the one I use by pressing F5?user3313728
you are a beginner in programming and you start with assembly... hmm... I don't want to be in your shoes. I did 4 years of programming in high school and the first assembler Course I had was in the 2nd year of college. Who is torturing you? :))bolov
@user3313728 the same debugger. Depends on what IDE you have, but most will support stepping through assembly, though might have different shortcut to advance to next assembler instruction.bolov
cmp edx, ecx compares integers, not floats. As bolov pointed out, there are different registers and instructions for floating point.pcarter

3 Answers

3
votes

I think you're confusing the two variables arr and nums. You allocate nums to the correct size, but you read in the numbers into arr which is pre-allocated to hold 10 numbers. Fix this first, then see about the assembly stuff.

3
votes

Are you typing actual floating point numbers?

If you do, the first scanf_s("%d", &arr[i]); will parse the integral part of the first number and each subsequent call will fail because the '.' is not part of an integer. Since you do not test the return value of scanf_s, you would just ignore this failure and the array will contain undetermined values beyond the first entry.

The assembly initializes ecx from arr, the first value, and the loop uses nums instead of arr to compare to the other values, again indeterminate values because malloc does not initialize the memory it returns... there is a good chance it is all zeroes, but no certainty.

The assembly loop looks inefficient, but not incorrect to me, but since the array contains indetermined values, the value for max can be the first if by chance the rest of the array contains zeroes.

Note that you treat the contents of arr as integers in both scanf_s and the assembly loop, so you get the same result as you would if you declared it as int.

2
votes

The structure of your jumps in asm is pretty ugly and could be a lot more efficient. jmp Increment is a complete waste, because there are no instructions between the jump and the label. Remember that execution proceeds to the next instruction on its own, even if you leave a line of white-space in the source!

int n;
// scanf..
int *nums = malloc(n * sizeof(int));
// loop scanf to set nums[0..n-1]
__asm
{
    // MSVC inline asm sucks, and forces you to write load instructions instead of letting you ask for things to already be in registers.
    //  Nothing you can do about that, other than write the whole function in asm yourself.
    mov   ecx, [n];    // ECX is the count (and array index)
    mov   edx, [nums]; // load the pointer.  ESI is the conventional choice for source pointers, but we haven't yet run out of registers that can be used without saving
                // If you wanted to get the address of a local array, IDK if you'd use OFFSET in MSVC, or what.
    mov   eax, [edx + ecx*4 - 4];  // Dereference the pointer to get the last list element (which we used to check first).  The other option is to start with the most-negative possible integer, like you'd initialize a sum=0 instead of loading the first element.

    sub   ecx, 2;  // start from the 2nd-last list element.
    jle  .done;    // or leave this out if you can assume n >= 2

.top:    // use local labels so they don't show up in the symbol table.
    cmp   eax, [edx + ecx*4];  // it is actually more efficient to compare with memory directly here.  It saves an instruction, and unless the list is mostly increasing, the nochange branch will usually be taken.
    jng .nochange;
    mov   eax, [edx + ecx*4];  // skipped if the current isn't > max
.nochange:
    dec   ecx; // put the loop condition at the bottom to save branch insns
    jge .top;
    // jnz is a common choice, but this avoids a long loop if you accidentally call with n = 0, even if we didn't check for this at the top.

.done:
    mov   [max], eax;
}

Using cmp with a memory operand might not actually help on Intel SnB-family CPUs, though. In that case, loading to a scratch register is actually better.

To speed up the common case, where nums[ecx] <= max, you could structure things so that's the not-taken case. In the taken case, you'd jump to

.change:
    mov   eax, [edx + ecx*4];
    jmp .back_into_loop;    // to the same place as .nochange is in the above version

Normally you'd place this block after the ret instruction in a function, so you didn't have to jump over it later. MSVC inline asm prevents that, but you could put it outside the loop.


For float, it's really easy and more efficient to do branchless: Use the MAXSS instruction. Or, vectorize and use MAXPS. Even better, use 3 or 4 accumulator registers to overlap the latencies of MAXPS, since with 3c latency but one per 1c throughput, three can be in flight at once on Intel Haswell.

I didn't use cmov to do the equivalent in your integer loop, because on pre-Broadwell, it's a 2uop instruction with 2 cycle latency, and the loop-carried data dependency would limit the loop to running at one iteration per 2 clocks, instead of one per clock.

SSE4.1 PMAXSD does a max of signed dword integers, with 1c latency (and one per 0.5c throughput, so two accumulators could saturate p1 and p5, and the two load ports.) Again, numbers for Haswell from http://agner.org/optimize/.

Obviously for tiny arrays like your test-case of 10, there's barely any scope for vectorizing, and most of the work would be doing the horizontal max at the end.