computer structure

tr4ce 2023. 9. 10. 15:35


While studying CS, I ran into something in a given piece of code that puzzled me, so I looked into it.

Checking data type sizes (sizeof)

code

#include <stdio.h>

int main() {
        printf("Size of char: %zu bytes\n", sizeof(char));
        printf("Size of short: %zu bytes\n", sizeof(short));
        printf("Size of int: %zu bytes\n", sizeof(int));
        printf("Size of long: %zu bytes\n", sizeof(long));
        printf("Size of long long: %zu bytes\n", sizeof(long long));
        printf("Size of float: %zu bytes\n", sizeof(float));
        printf("Size of double: %zu bytes\n", sizeof(double));
        printf("Size of long double: %zu bytes\n", sizeof(long double));
        printf("Size of pointer: %zu bytes\n", sizeof(void*));
        return 0;
}

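Why is %zu used here instead of %u?

A: First, sizeof is an operator that yields the size of a type or object, so its result is never negative. In other words, the result has an unsigned type, namely size_t.

The conversion specifier for unsigned int is %u, while the one for size_t is %zu.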

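So why %zu... Googling turned up examples that use %d and others that use %u.

My guess was that it is for portability...! At first I also assumed size_t was the same thing as unsigned int and therefore interchangeable with it, but that only holds on platforms where size_t happens to be defined as unsigned int (many 32-bit targets). On typical 64-bit Linux, size_t is an 8-byte unsigned long.

So whether %u happens to work depends on whether the target is 32-bit or 64-bit: %u always expects an unsigned int, and passing a wider size_t to it is undefined behavior.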

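[+] I went through the official references. My guess seems to have been roughly right, though..

First, %zu is not used because of sizeof as such; rather, C99 defines the z length modifier for printf() so that size_t values can be printed. (I could not find the exact clause T_T)

Also, there used to be a split between %zu and %Iu: older MSVC versions used %Iu while gcc used %zu, but that is said to be an old story now.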

https://stackoverflow.com/questions/2524611/how-can-one-print-a-size-t-variable-portably-using-the-printf-family

https://stackoverflow.com/questions/15610053/correct-printf-format-specifier-for-size-t-zu-or-iu

[+] The reason to use %zu instead of %u is indeed portability. The official documentation on size_t and the blog post below should make this much easier to understand.

https://en.cppreference.com/w/c/types/size_t

https://en.cppreference.com/w/cpp/language/sizeof

https://blog.naver.com/oxcow119/220550770300

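This write-up is a bit disorganized.. but I hope the links and the notes above, taken together, help with understanding.

Thanks to this, I also learned that size_t is the type used for memory sizes and string lengths, and for the loop counters that walk over them.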

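Also, unsigned long does have its own conversion specifier, %lu, but the point of this exercise is to print the size_t produced by sizeof, so %zu is the recommended specifier. This, too, comes down to portability!

In the end, the things that keep coming up whenever you study computers are compatibility and convenience! These really are what matter.

And when you keep asking "why is this so portable? why is this convenient?" and still don't get it no matter how deep you dig... the answer seems to be to go ask Alan Turing-sensei, go ask a C contributor, or just accept it T_T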

====================

The following are some simple exercises.

Reproducing overflow

code

#include <stdio.h>
#include <limits.h>

int main(){
        char value = CHAR_MAX;
        printf("Original value: %d\n", value);

        value = value+1;
        printf("Value after adding 1: %d\n", value);

        return 0;
}

result

Now let's try underflow! (For integers, strictly speaking, going below the minimum is also called overflow.)

code

#include <stdio.h>
#include <limits.h>

int main(){
        char value = CHAR_MIN;
        printf("Original value: %d\n", value);

        value = value-1;
        printf("Value after subtracting 1: %d\n", value);

        return 0;
}

result

====================

cf.

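gcc -o main main.c -g // include debugging symbols

In general, debugging symbols are left out in order to hinder analysis.

Once the debugging symbols are stripped, the output of decompilers such as IDA becomes hard to read..!!!

(gdb) b main // set a breakpoint at main
(gdb) r // run the program (it stops at the breakpoint)
(gdb) n // execute the next line; function calls are stepped over, not entered
(gdb) p value // p [variable]: print the variable's value
(gdb) p/t value // p/[format] [variable]: print the value in the given format (t = binary)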

==================

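Exercise: write a C program that turns off the bit at a given position, then verify it

1. Shift 1 left by the bit position (1 << position).

2. Invert every bit of that mask with NOT (~).

3. AND the value with the inverted mask to turn off just that bit.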

Answer

#include <stdio.h>

int is_bit_set(unsigned char value, int position) {
        return (value&(1 << position)) != 0;
}

unsigned char set_bit(unsigned char value, int position) {
        return value | ( 1 << position);
}

unsigned char clear_bit(unsigned char value, int position) {
        return  value & ~( 1 << position);
}

int main() {
        unsigned char value = 0b00001000;

        if(is_bit_set(value, 3)) {
                printf("3rd bit is set!\n");
        }
        else {
                printf("3rd bit is not set!\n");
        }

        value = set_bit(value, 2);
        printf("Value after setting 2nd bit: %d\n", value);

        value = clear_bit(value, 2);
        printf("Value after clearing 2nd bit: %d\n", value);


        return 0;
}

Code that takes the position as input from the user

#include <stdio.h>

int is_bit_set(unsigned char value, int position) {
        return (value&(1 << position)) != 0;
}

unsigned char set_bit(unsigned char value, int position) {
        return value | ( 1 << position);
}

unsigned char clear_bit(unsigned char value, int position) {
        return  value & ~( 1 << position);
}

int main() {
        unsigned char value = 0b00001000;
        int position = 0;

        if(is_bit_set(value, 3)) {
                printf("3rd bit is set!\n");
        }
        else {
                printf("3rd bit is not set!\n");
        }

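        printf("input the position: ");
        /* Reject non-numeric input and positions outside an 8-bit value. */
        if (scanf("%d", &position) != 1 || position < 0 || position > 7) {
                printf("position must be between 0 and 7\n");
                return 1;
        }

        value = set_bit(value, position);
        printf("Value after setting bit %d: %d\n", position, value);

        value = clear_bit(value, position);
        printf("Value after clearing bit %d: %d\n", position, value);

        return 0;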
}

=======================

How C code becomes machine code

header files (*.h) + source code (*.c) --(preprocessing)--> preprocessed source file (*.i) --(compile)--> assembly file (*.s)

--(assemble)--> object files (*.o) + libraries (*.a, *.so) --(linking)--> executable

Hands-on with a basic program

#include <stdio.h>

int main() {
        printf("Hello world!");
        return 0;
}

gcc -E hello.c -o hello.i

gcc -S hello.c -o hello.s (lets you inspect the assembly)

gcc -c hello.c -o hello.o (inspect with readelf)

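object file: the intermediate result produced by compiling source code, i.e. the form it has before the linker turns it into an executable binary or a library

Characteristics

1. It is in binary form, but it cannot be executed on its own; it must be linked with other object files or libraries to become runnable.

2. It is relocatable: it is linked with other object files or libraries to form a complete program.

3. It contains a symbol table.

4. Its contents can differ depending on the architecture, OS, and compiler.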

It can be inspected with readelf, objdump, 010 Editor, and similar tools.

The screen below shows the result as viewed in HxD

gcc hello.o -o main

cf. Comparing the files from every stage, the size grows and then shrinks as the steps proceed.
