Sunday, May 27, 2012

Assembly - strings

I want to find out a little about how strings work in assembly and, along the way, call some functions that I haven't defined myself. So I will try to encode the following rather contrived example:

  char* hello = "hello";
  char* world = "world";
  char* helloWorld = (char*)malloc(64);
  strcpy(helloWorld, hello);
  helloWorld[5] = ' ';
  strcpy(helloWorld+6, world);
  printf("%s!\n", helloWorld);

The interesting bits are stack allocation (I'm going to build the format string on the stack), heap allocation (done by malloc), copying strings, a little character manipulation, and calling a function that takes a variable number of arguments. I will make the hello and world strings global variables, and copy them with hand-written byte loops rather than calling strcpy. Here is the assembly:

  // allocate on the heap for helloWorld
  push 64
  call malloc
  add esp, 4
  // save address of helloWorld on the stack
  push eax
  // copy hello to helloWorld
  mov ebx, 0
  mov ecx, hello
copyh:
  mov dl, BYTE PTR [ebx+ecx]
  mov BYTE PTR [eax], dl
  inc eax
  inc ebx
  cmp ebx, 5
  jl copyh
  // add a space
  mov BYTE PTR [eax], ' '
  inc eax
  // copy world to helloWorld
  mov ebx, 0
  mov ecx, world
copyw:
  mov dl, BYTE PTR [ecx+ebx]
  mov BYTE PTR [eax], dl
  inc eax
  inc ebx
  cmp ebx, 5
  jl copyw
  // null terminate helloWorld
  mov BYTE PTR [eax], 0
  // temporarily move address of helloWorld to eax
  pop eax
  // allocate format on the stack and save address in ebx
  sub esp, 8
  mov ebx, esp
  // create format
  mov byte ptr [ebx], '%'
  mov byte ptr [ebx+1], 's'
  mov byte ptr [ebx+2], '!'
  mov byte ptr [ebx+3], '\n'
  mov byte ptr [ebx+4], 0
  // call printf
  push eax
  push ebx
  call printf
  // tidy up printf call and format
  add esp, 16
  // should really call free (the memory came from malloc), but meh
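For comparison, here is roughly what the assembly does, written back out as C. It is only a sketch: the function name build_hello_world is made up for illustration, and the NULL check and the assert are additions that the assembly does not have. The comments map each step to the matching instructions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Globals, as in the post. */
static const char *hello = "hello";
static const char *world = "world";

/* Hypothetical helper (not from the post): builds "hello world" the way
   the assembly does, with explicit byte loops instead of strcpy.
   The caller is responsible for calling free() on the result. */
char *build_hello_world(void)
{
    /* The assembly hard-codes a length of 5 for both words. */
    assert(strlen(hello) == 5 && strlen(world) == 5);

    char *start = malloc(64);        /* push 64 / call malloc */
    if (start == NULL)
        return NULL;                 /* the assembly skips this check */

    char *p = start;                 /* eax is the write cursor; start is the pushed copy */
    for (int i = 0; i < 5; i++)      /* copyh loop, with ebx as i */
        *p++ = hello[i];
    *p++ = ' ';                      /* add a space */
    for (int i = 0; i < 5; i++)      /* copyw loop */
        *p++ = world[i];
    *p = '\0';                       /* null terminate */

    return start;                    /* pop eax */
}
```

Calling printf("%s!\n", build_hello_world()) and then freeing the pointer would complete the original example.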

One thing that surprised me is that you don't need to pass the number of vararg arguments: the callee just has to work it out (printf does so from the format string)! Getting the call to malloc working was a pain. I had to link the standard library statically rather than as a DLL, and I changed a whole bunch of other compilation options without knowing whether they made any difference (compile as C rather than C++, turn off randomised addressing, etc.).
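To make the vararg point concrete, here is a toy formatter in C that finds out how many arguments it was given purely by scanning its format string, which is how printf does it too. This is a sketch under assumptions: mini_format is a made-up name, it understands only %s, and it is nothing like the real printf implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy formatter (not a real library function).
   Understands only "%s". Writes at most cap-1 characters plus a
   terminating NUL into out, and returns how many %s arguments it read. */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list args;
    size_t n = 0;        /* characters written so far */
    int consumed = 0;    /* number of variadic arguments fetched */

    assert(out != NULL && fmt != NULL);
    if (cap == 0)
        return 0;

    va_start(args, fmt);
    for (const char *f = fmt; *f != '\0'; f++) {
        if (f[0] == '%' && f[1] == 's') {
            /* Only the format string tells us another argument exists. */
            const char *s = va_arg(args, const char *);
            consumed++;
            while (*s != '\0' && n + 1 < cap)
                out[n++] = *s++;
            f++;         /* skip the 's' */
        } else if (n + 1 < cap) {
            out[n++] = *f;
        }
    }
    va_end(args);

    out[n] = '\0';
    return consumed;
}
```

If the format string asks for more arguments than were actually pushed, the function simply reads whatever happens to be on the stack, which is undefined behaviour. It is also why, under the cdecl convention, the caller cleans up the stack after the call (the add esp, 16 above): only the caller knows how much it pushed.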
