Message from discussion sleazy intel compiler trick (SOURCE ATTACHED)
Newsgroups: comp.arch
From: iccout2...@yahoo.com (iccOut)
Date: 9 Feb 2004 14:38:39 -0800
Local: Mon, Feb 9 2004 5:38 pm
Subject: sleazy intel compiler trick (SOURCE ATTACHED)
As part of my study of Operating Systems and embedded systems, one of
the things I've been looking at is compilers. I'm interested in
analyzing how different compilers optimize code for different
platforms. As part of this comparison, I was looking at the Intel
Compiler and how it optimizes code. The Intel Compilers have a free
evaluation download from here:
http://www.intel.com/products/software/index.htm?iid=Corporate+Header...
 

One of the things that version 8.0 of the Intel compiler added was an
"Intel-specific" flag. According to the documentation, binaries
compiled with this flag will only run on Intel processors and will
include Intel-specific optimizations to make them run faster. The
documentation unfortunately does not explain what these optimizations
are, so I decided to do some investigating.

First I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2G Extreme Edition with 1 GB
of RAM running Windows XP Pro. First I compiled and ran SPEC with the
"generic x86 flag" (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
SPEC with the "Intel-specific flag" (-QxN) to see what kind of
difference that would make. For most benchmarks there was not much
change, but for 181.mcf there was a win of almost 22%!

Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 GB of RAM, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful. Was it true that only
Intel processors could take advantage of this performance boost?

I started mucking around with a disassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2, and it checks the CPUID to
ensure that it's an Intel processor. I wrote a quick utility, which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
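
For readers who haven't looked at CPUID before, a check of this kind can be sketched roughly as follows. This is an illustration only and not a reconstruction of proc_init_N: the function names (vendorFromRegs, hasSseAndSse2, passesIntelOnlyCheck, passesFeatureOnlyCheck) are made up, and the register values are passed in as plain integers so the logic can be followed without executing the CPUID instruction. It assumes the standard CPUID layout: leaf 0 returns the 12-character vendor string in EBX, EDX, ECX (in that order), and leaf 1 reports SSE and SSE2 in bits 25 and 26 of EDX.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: rebuild the vendor string from the CPUID leaf 0
// registers. The 12 characters are packed low byte first into EBX, EDX, ECX.
std::string vendorFromRegs(uint32_t ebx, uint32_t edx, uint32_t ecx) {
    const uint32_t regs[3] = { ebx, edx, ecx };
    std::string vendor;
    for (uint32_t reg : regs) {
        for (int shift = 0; shift < 32; shift += 8) {
            vendor.push_back(static_cast<char>((reg >> shift) & 0xFF));
        }
    }
    assert(vendor.size() == 12);  // three registers, four bytes each
    return vendor;
}

// CPUID leaf 1, EDX: bit 25 is SSE, bit 26 is SSE2.
bool hasSseAndSse2(uint32_t leaf1Edx) {
    const uint32_t mask = (1u << 25) | (1u << 26);
    return (leaf1Edx & mask) == mask;
}

// Hypothetical stand-in for a gate that tests the vendor as well as the
// features, which is what the check described above appears to do.
bool passesIntelOnlyCheck(const std::string& vendor, uint32_t leaf1Edx) {
    return vendor == "GenuineIntel" && hasSseAndSse2(leaf1Edx);
}

// The same gate if it only tested what the generated code actually needs.
bool passesFeatureOnlyCheck(uint32_t leaf1Edx) {
    return hasSseAndSse2(leaf1Edx);
}
```

Fed the vendor registers of an AMD part (0x68747541, 0x69746e65, 0x444d4163, which decode to "AuthenticAMD") and an EDX value with both feature bits set, the first gate fails and the second passes. That is the distinction this post is about.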

Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium 4, an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable of taking advantage of these same
optimizations. If I'm missing something, I'd love for someone to point
it out for me. From the way it looks right now, it appears that Intel
is simply "cheating" to make their processors look better against
competitors' processors.

Links:
Intel Compiler:http://www.intel.com/products/software/index.htm?iid=Corporate+Header...
 

Here is the text:

/*
 * iccOut 1.0
 *
 * This program enables programs compiled with the intel compiler using the
 * -xN flag to run on non-intel processors. This can sometimes result in
 * large performance increases, depending on the application. Note that even
 * though the check will be removed, the CPU running the application *MUST*
 * support both SSE and SSE2 or the program will crash.
 *
 */

#include <stdio.h>
#include <string.h>

// x86 codes

#define X86_CALL 232  // E8 in hex
#define PUSH_EAX 80       // 50 in hex
#define X86_NOP 144       // 90 in hex

bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE* fixedBinary );

//conveniently, the check always seems to be one of the first calls in
//the file. this makes it easier to find.
void printUsage() {
        printf("Usage:\n");
        printf("iccOut filename\n\n");
        printf("Filename is the name of the file to fix.\n\n");

}

//returns whether code was replaced
bool processNextCall( FILE* inputBinary, FILE* fixedBinary ) {

        int lenRead;
        int startIndex, bytesNeeded;
        unsigned char addressBuffer[4];
        unsigned char checkBuffer[2];
        unsigned char fullBuffer[7];
        unsigned char tempChar;
        bool codeReplaced;
        bool otherReplaced;

        otherReplaced = false;

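        //a short read means we hit end of file in the middle of a would-be
        //call; copy the leftover bytes through unchanged and stop
        lenRead = (int) fread( addressBuffer, 1, 4, inputBinary );
        int checkRead = (int) fread( checkBuffer, 1, 2, inputBinary );
        if ( lenRead + checkRead < 6 ) {
                tempChar = X86_CALL;
                fwrite( &tempChar, 1, 1, fixedBinary );
                fwrite( addressBuffer, 1, lenRead, fixedBinary );
                fwrite( checkBuffer, 1, checkRead, fixedBinary );
                return false;
        }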

        fullBuffer[0] = X86_CALL;
        for( int i=1; i<5;i++ ) {
                fullBuffer[i] = addressBuffer[i-1];
        }
        fullBuffer[5] = checkBuffer[0];
        fullBuffer[6] = checkBuffer[1];

        codeReplaced = handleCall( fullBuffer, inputBinary, fixedBinary );

        if ( ! codeReplaced ) {

                //if either of the last 2 bytes were a call, we need to keep doing
                //this until we run out of calls
                while ( ( fullBuffer[5] == X86_CALL ) || ( fullBuffer[6] == X86_CALL ) ) {

                        if ( fullBuffer[5] != X86_CALL ) {      //write it and ignore it
                                tempChar = fullBuffer[5];
                                fwrite( &tempChar, 1, 1, fixedBinary );
                                fullBuffer[0] = fullBuffer[6];
                                bytesNeeded = 6;
                                startIndex = 1;
                        } else {
                                fullBuffer[0] = fullBuffer[5];
                                fullBuffer[1] = fullBuffer[6];
                                bytesNeeded = 5;
                                startIndex = 2;
                        }

                        for( int i=0; i < bytesNeeded; i++ ) {
                                fread( &tempChar, 1, 1, inputBinary );
                                fullBuffer[startIndex+i] = tempChar;
                        }

                        otherReplaced = otherReplaced || handleCall( fullBuffer, inputBinary, fixedBinary );
                }
        }
        return ( codeReplaced || otherReplaced );

}

//returns whether code was replaced
bool handleCall( unsigned char theBuffer[7], FILE* inputBinary, FILE* fixedBinary ) {

        bool replacedCode;
        unsigned char tempChar;

        replacedCode = false;

        //check if it's what we're looking for (one of the first calls,
        //followed by 2 push eax's)
        if ( ( theBuffer[5] == PUSH_EAX ) && ( theBuffer[6] == PUSH_EAX ) ){
                printf("Located call to subroutine to check intel support!\n");
                printf("Substituting code ...\n");

                //replace the call with nops
                replacedCode = true;
                for ( int i=0; i<5;i++ ) {
                        theBuffer[i] = X86_NOP;
                }
        }

        if ( replacedCode || ( ( theBuffer[5] != X86_CALL ) && ( theBuffer[6] != X86_CALL ) )) {
                //no call opcode in the last 2 bytes: write out all 7 bytes
                for ( int j=0; j<7;j++ ) {
                        tempChar = theBuffer[j];
                        fwrite( &tempChar, 1, 1, fixedBinary );
                }
        } else {
                //don't write last 2 bytes; the caller re-examines them as
                //the possible start of another call
                for( int i=0; i < 5; i++ ) {
                        tempChar = theBuffer[i];
                        fwrite( &tempChar, 1, 1, fixedBinary );
                }
        }
        return replacedCode;

}

void fixIntelBinary( char *filename ) {

        FILE *inputBinary;
        FILE *fixedBinary;
        unsigned char theChar;
        bool editedCall;
        bool skipWrite;
        int lenRead;

        printf("iccOut is currently fixing binary: %s\n\n", filename );

        editedCall = false;
        skipWrite = false;

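        //open files for reading and writing; build the output name in its own
        //buffer, since appending to argv[1] in place would overrun it
        char fixedName[1024];
        if ( strlen( filename ) + sizeof( ".fixed" ) > sizeof( fixedName ) ) {
                printf("Filename is too long.\n");
                return;
        }
        strcpy( fixedName, filename );
        strcat( fixedName, ".fixed" );

        inputBinary = fopen( filename, "rb" );
        if ( ! inputBinary ) {
                printf("Error opening input binary.\n");
                return;
        }

        fixedBinary = fopen( fixedName, "wb" );
        if ( ! fixedBinary ) {
                printf("Error opening output file.\n");
                fclose( inputBinary );
                return;
        }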

        //start reading until we find what we want
        fread( &theChar, 1, 1, inputBinary );
        while (1) {
                if ( !skipWrite ) {
                        //write last values
                        fwrite( &theChar, 1, 1, fixedBinary );
                }
                skipWrite = false;

                //read next
                lenRead = fread( &theChar, 1, 1, inputBinary );
                if ( lenRead == 0) {  //at end of file
                        break;
                }

                if ( ! editedCall ) {
                        //check if its the call XXX
                        if ( theChar == X86_CALL ) {
                                editedCall = processNextCall( inputBinary, fixedBinary );
                                skipWrite = true;

                        }
                }
        }

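        if ( editedCall ) {
                printf("iccOut has saved the day!\n");
        } else {
                printf("No call to the intel check was found; nothing was patched.\n");
        }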

        //close files when finished
        fclose( inputBinary );
        fclose( fixedBinary );

}

bool fileExists( char *filename ) {

        FILE *temp;
        bool ret = false;

        temp = fopen( filename, "r" );

        if ( temp != 0 ) {
                ret = true;
                fclose( temp );
        }      
        return ret;

}

int main( int argc, char **argv ) {

        printf("\nWelcome to iccOut!\n\n");
        printf("This will enable binaries compiled with -xN to run on non-intel machines\n\n");

        //verify parameters
        if ( argc < 2 ) {
                printUsage();
                return 0;
        }

        //make sure file exists
        if ( ! fileExists( argv[1] ) ) {
                printf("File does not exist or is not accessible: %s\n", argv[1] );
                return 0;
        }

        fixIntelBinary( argv[1] );
        return 0;

}
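
The core of the patch can also be shown on an in-memory buffer, which may be easier to follow than the streaming version above. This is a simplified sketch and not a drop-in replacement: patchFirstCheck and the k-prefixed constants are hypothetical names, and it keeps the same heuristic as iccOut (the first E8 call that is immediately followed by two 50 bytes, i.e. two push eax instructions), which could just as easily match the wrong call in some other binary. Unlike the streaming code, it tests every byte offset instead of skipping past the operand bytes of a call that didn't match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint8_t kCall = 0xE8;      // call rel32 opcode
const uint8_t kPushEax = 0x50;   // push eax
const uint8_t kNop = 0x90;       // nop
const std::size_t kCallLen = 5;  // opcode plus 4-byte relative offset

// Overwrites the first "call rel32; push eax; push eax" sequence's call with
// NOPs. Returns the offset of the patched call, or image.size() if the
// pattern does not occur.
std::size_t patchFirstCheck(std::vector<uint8_t>& image) {
    const std::size_t patternLen = kCallLen + 2;
    for (std::size_t i = 0; i + patternLen <= image.size(); ++i) {
        if (image[i] == kCall &&
            image[i + kCallLen] == kPushEax &&
            image[i + kCallLen + 1] == kPushEax) {
            assert(i + patternLen <= image.size());  // guaranteed by the loop bound
            for (std::size_t j = 0; j < kCallLen; ++j) {
                image[i + j] = kNop;  // blank out the call, keep the pushes
            }
            return i;
        }
    }
    return image.size();
}
```

To use it, read the whole binary into a std::vector<uint8_t>, call patchFirstCheck on it, and write the vector back out only if the returned offset is smaller than the vector's size.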

