`

关于size_t 和 ptrdiff_t 【转】

 
阅读更多

Abstract

The article will help the readers understand what size_t and ptrdiff_t types are, what they are used for and when they must be used. The article will be interesting for those developers who begin creation of 64-bit applications where use of size_t and ptrdiff_t types provides high performance, possibility to operate large data sizes and portability between different platforms.

Introduction

Before we begin I would like to notice that the definitions and recommendations given in the article refer to the most popular architectures for the moment (IA-32, Intel 64, IA-64) and may not fully apply to some exotic architectures.

The types size_t and ptrdiff_t were created to perform correct address arithmetic. It had been assumed for a long time that the size of int coincides with the size of a computer word (microprocessor's capacity) and it can be used as indexes to store sizes of objects or pointers. Correspondingly, address arithmetic was built with the use of int and unsigned types as well. int type is used in most training materials on programming in C and C++ in the loops' bodies and as indexes. The following example is nearly a canon:

for (int i = 0; i < n; i++)
  a[i] = 0;

As microprocessors developed over time and their capacity increased, it became irrational to further increase int type's sizes. There are a lot of reasons for that: economy of memory used, maximum portability etc. As a result, several data model appeared declaring the relations of C/C++ base types. Table N1 shows the main data models and lists the most popular systems using them.

Table N1. Data models

Table N1. Data models

As you can see from the table, it is not so easy to choose a variable's type to store a pointer or an object's size. To find the smartest solution of this problem size _t and ptrdiff_t types were created. They are guaranteed to be used for address arithmetic. And now the following code must become a canon:

for(ptrdiff_t i =0; i < n; i++)
  a[i]=0;

 It is this code that can provide safety, portability and good performance. The rest of the article explains why.

size_t type

size_t type is a base unsigned integer type of C/C++ language. It is the type of the result returned by sizeof operator. The type's size is chosen so that it could store the maximum size of a theoretically possible array of any type. On a 32-bit system size_t will take 32 bits, on a 64-bit one 64 bits. In other words, a variable of size_t type can safely store a pointer. The exception is pointers to class functions but this is a special case. Although size_t can store a pointer, it is better to use another unsinged integer type uintptr_t for that purpose (its name reflects its capability). The types size_t and uintptr_t are synonyms. size_t type is usually used for loop counters, array indexing and address arithmetic.

The maximum possible value of size_t type is constant SIZE_MAX.

ptrdiff_t type

ptrdiff_t type is a base signed integer type of C/C++ language. The type's size is chosen so that it could store the maximum size of a theoretically possible array of any type. On a 32-bit system ptrdiff_t will take 32 bits, on a 64-bit one 64 bits. Like in size_t, ptrdiff_t can safely store a pointer except for a pointer to a class function. Also, ptrdiff_t is the type of the result of an expression where one pointer is subtracted from the other (ptr1-ptr2). ptrdiff_t type is usually used for loop counters, array indexing, size storage and address arithmetic. ptrdiff_t type has its synonym intptr_t whose name indicates more clearly that it can store a pointer.

Portability of size_t and ptrdiff_t

The types size_t and ptrdiff_t enable you to write well-portable code. The code created with the use of size_t and ptrdiff_t types is easy-portable. The size of size_t and ptrdiff_t always coincide with the pointer's size. Because of this, it is these types that should be used as indexes for large arrays, for storage of pointers and pointer arithmetic.

Linux-application developers often use long type for these purposes. Within the framework of 32-bit and 64-bit data models accepted in Linux, this really works. long type's size coincides with the pointer's size. But this code is incompatible with Windows data model and, consequently, you cannot consider it easy-portable. A more correct solution is to use types size_t and ptrdiff_t.

As an alternative to size_t and ptrdiff_t, Windows-developers can use types DWORD_PTR, SIZE_T, SSIZE_T etc. But still it is desirable to confine to size_t and ptrdiff_t types.

Safety of ptrdiff_t and size_t types in address arithmetic

Address arithmetic issues have been occurring very frequently since the beginning of adaptation of 64-bit systems. Most problems of porting 32-bit applications to 64-bit systems relate to the use of such types as int and long which are unsuitable for working with pointers and type arrays. The problems of porting applications to 64-bit systems are not limited by this, but most errors relate to address arithmetic and operation with indexes.

Here is a simplest example:

size_t n =...;
for(unsigned i =0; i < n; i++)
  a[i]=0;

If we deal with the array consisting of more than UINT_MAX items, this code is incorrect. It is not easy to detect an error and predict the behavior of this code. The debug-version will hung but hardly will anyone process gigabytes of data in the debug-version. And the release-version, depending on the optimization settings and code's peculiarities, can either hung or suddenly fill all the array cells correctly producing thus an illusion of correct operation. As a result, there appear floating errors in the program occurring and vanishing with a subtlest change of the code. To learn more about such phantom errors and their dangerous consequences see the article "A 64-bit horse that can count" [1].

Another example of one more "sleeping" error which occurs at a particular combination of the input data (values of A and B variable):

int A =-2;
unsigned B =1;
int array[5]={1,2,3,4,5};
int*ptr = array +3;
ptr = ptr +(A + B);//Error
printf("%i\n",*ptr);

 This code will be correctly performed in the 32-bit version and print number "3". After compilation in 64-bit mode there will be a fail when executing the code. Let's examine the sequence of code execution and the cause of the error:

  • A variable of int type is cast into unsigned type;
  • A and B are summed. As a result, we get 0xFFFFFFFF value of unsigned type;
  • "ptr + 0xFFFFFFFFu" expression is calculated. The result depends on the pointer's size on the current platform. In the 32-bit program, the expression will be equal to "ptr - 1" and we will successfully print number 3. In the 64-bit program, 0xFFFFFFFFu value will be added to the pointer and as a result, the pointer will be far beyond the array's limits.

注: 哪个能表示更大的数就转为哪个类型。例如short+long,就要转为long.
在32位机上, unsigned int 最大可表示2^32 - 1; int最大可表示2^31-1, 这样int就转为了unsigned int

 

Such errors can be easily avoided by using size_t or ptrdiff_t types. In the first case, if the type of "i" variable is size_t, there will be no infinite loop. In the second case, if we use size_t or ptrdiff_t types for "A" and "B" variable, we will correctly print number "3".

Let's formulate a guideline: wherever you deal with pointers or arrays you should use size_t and ptrdiff_t types.

To learn more about the errors you can avoid by using size_t and ptrdiff_t types, see the following articles:

Performance of code using ptrdiff_t and size_t

Besides code safety, the use of ptrdiff_t and size_t types in address arithmetic can give you an additional gain of performance. For example, using int type as an index, the former's capacity being different from that of the pointer, will lead to that the binary code will contain additional data conversion commands. We speak about 64-bit code where pointers' size is 64 bits and int type's size remains 32 bits.

It is a difficult task to give a brief example of size_t type's advantage over unsigned type. To be objective we should use the compiler's optimizing abilities. And the two variants of the optimized code frequently become too different to show this very difference. We managed to create something like a simple example only with a sixth try. And still the example is not ideal because it demonstrates not those unnecessary data type conversions we spoke above, but that the compiler can build a more efficient code when using size_t type. Let's consider a program code arranging an array's items in the inverse order:

unsigned arraySize;
...
for(unsigned i =0; i < arraySize /2; i++){
  float value = array[i];
  array[i]= array[arraySize - i -1];
  array[arraySize - i -1]= value;
}

In the example, "arraySize" and "i" variables have unsigned type. This type can be easily replaced with size_t type, and now compare a small fragment of assembler code shown on Figure 1.

Figure N1.Comparison of 64-bit assembler code when using unsigned and size_t types

Figure N1.Comparison of 64-bit assembler code when using unsigned and size_t types

The compiler managed to build a more laconic code when using 64-bit registers. I am not affirming that the code created with the use of unsigned type will operate slower than the code using size_t. It is a very difficult task to compare speeds of code execution on modern processors. But from the example you can see that when the compiler operates arrays using 64-bit types it can build a shorter and faster code.

Proceeding from my own experience I can say that reasonable replacement of int and unsigned types with ptrdiff_t and size_t can give you an additional performance gain up to 10% on a 64-bit system. You can see an example of speed increase when using ptrdiff_t and size_t types in the fourth section of the article "Development of Resource-intensive Applications in Visual C++" [5].

Code refactoring with the purpose of moving to ptrdiff_t and size_t

As the reader can see, using ptrdiff_t and size_t types gives some advantages for 64-bit programs. However, it is not a good way out to replace all unsigned types with size_t ones. Firstly, it does not guarantee correct operation of a program on a 64-bit system. Secondly, it is most likely that due to this replacement, new errors will appear data format compatibility will be violated and so on. You should not forget that after this replacement the memory size needed for the program will greatly increase as well. And increase of the necessary memory size will slow down the application's work for cache will store fewer objects being dealt with.

Consequently, introduction of ptrdiff_t and size_t types into old code is a task of gradual refactoring demanding a great amount of time. In fact, you should look through the whole code and make the necessary alterations. Actually, this approach is too expensive and inefficient. There are two possible variants:

  • To use specialized tools like Viva64 included into PVS-Studio. Viva64 is a static code analyzer detecting sections where it is reasonable to replace data types for the program to become correct and work efficiently on 64-bit systems. To learn more, see "PVS-Studio Tutorial" [6].
  • If you do not plan to adapt a 32-bit program for 64-bit systems, there is no sense in data types' refactoring. A 32-bit program will not benefit in any way from using ptrdiff_t and size_t types.

References

本文转自:

http://www.viva64.com/en/a/0050/ 

分享到:
评论

相关推荐

    QAC工具介绍和使用说明(供一种可量化措施的代码度量值属性:33基于功能 32基于文件和4个项目级别)

    如上所述:QAC随提供一套标准库的头文件,如果想改变这些类型定义,必须先明白QAC内部的定义类型,因为那些头文件包含一些声明ptrdiff_t, size_t 和wchar_t,还有3种宏指令定义PRQA_PTRDIFF_T, PRQA_SIZE_T,和PRQA_...

    stl_interfaces:用于定义迭代器的C ++ 14和更高版本的CRTP模板

    stl_interfaces Boost.Iterator的iterator_facade和iterator_adaptor部分(现在称为iterator_interface )的更新的C ++ 20友好版本;... using difference_type = std:: ptrdiff_t ; using pointe

    Google C++ Style Guide(Google C++编程规范)高清PDF

    link ▶Don't use an #include when a forward declaration would suffice. When you include a header file you introduce a dependency that will cause your code to be recompiled whenever the header file ...

    span-lite:span lite-单文件标头库中的C ++ 98,C ++ 11和更高版本的类似于C ++ 20的跨度

    include &lt; iostream&gt;std:: ptrdiff_t size ( nonstd::span&lt; const&gt; spn ){ return spn. size ();}int main (){ int arr[] = { 1 , }; std::cout &lt;&lt; " C-array: " &lt;&lt; size ( arr ) &lt;&lt; " array: " &...

    node-v10.9.0-x86.msi

    Node.js,简称Node,是一个开源且跨平台的JavaScript运行时环境,它允许在浏览器外运行JavaScript代码。Node.js于2009年由Ryan Dahl创立,旨在创建高性能的Web服务器和网络应用程序。它基于Google Chrome的V8 JavaScript引擎,可以在Windows、Linux、Unix、Mac OS X等操作系统上运行。 Node.js的特点之一是事件驱动和非阻塞I/O模型,这使得它非常适合处理大量并发连接,从而在构建实时应用程序如在线游戏、聊天应用以及实时通讯服务时表现卓越。此外,Node.js使用了模块化的架构,通过npm(Node package manager,Node包管理器),社区成员可以共享和复用代码,极大地促进了Node.js生态系统的发展和扩张。 Node.js不仅用于服务器端开发。随着技术的发展,它也被用于构建工具链、开发桌面应用程序、物联网设备等。Node.js能够处理文件系统、操作数据库、处理网络请求等,因此,开发者可以用JavaScript编写全栈应用程序,这一点大大提高了开发效率和便捷性。 在实践中,许多大型企业和组织已经采用Node.js作为其Web应用程序的开发平台,如Netflix、PayPal和Walmart等。它们利用Node.js提高了应用性能,简化了开发流程,并且能更快地响应市场需求。

    塞北村镇旅游网站设计与实现

    城市旅游产业的日新月异影响着村镇旅游产业的发展变化。网络、电子科技的迅猛前进同样牵动着旅游产业的快速成长。随着人们消费理念的不断发展变化,越来越多的人开始注意精神文明的追求,而不仅仅只是在意物质消费的提高。塞北村镇旅游网站的设计就是帮助村镇发展旅游产业,达到宣传效果,带动一方经济发展。而在线消费与查询正在以高效,方便,时尚等的特点成为广大互联网用户的首选。塞北村镇旅游网站设计与开发以方便、快捷、费用低的优点正慢慢地进入人们的生活。人们从传统的旅游方式转变为在线预览,减轻了劳动者的工作量。使得旅游从业人员有更多时间来获取、了解、掌握信息。 塞北村镇旅游网站根据当地旅游风景和特色的实际情况,设计出一套适合当地旅游信息网站,通过网络,实现该网站的推广从而达到宣传的效果。 本系统在设计方面采用JSP和Java语言以及html脚本语言,同时采用B/S模式,进行各个界面和每个功能的设计与实现,后台管理与设计选用了SQL Server 2005数据库,前台设计与后台管理相结合,共同完成各功能模块的功能。

    其他类别Jsp考试系统-jspks.rar

    JSP考试系统_jspks.rar是一个为计算机专业学生和教师设计的JSP源码资料包,它提供了一个全面的、易于使用的在线考试平台。这个系统是基于Java Server Pages (JSP)技术构建的,这是一种用于创建动态网页的服务器端技术。通过这个系统,用户可以创建、管理和参加在线考试。这个系统的主要功能包括:用户注册和登录,试题管理(包括添加、修改和删除试题),试卷管理(包括创建、编辑和删除试卷),考试管理(包括开始、暂停和结束考试),成绩管理(包括查看和统计成绩)等。此外,系统还提供了丰富的试题类型,如选择题、填空题、判断题和简答题等,以满足不同的考试需求。JSP考试系统的界面设计简洁明了,操作方便,无论是教师还是学生都可以轻松上手。对于教师来说,他们可以通过这个系统轻松地管理试题和试卷,节省了大量的时间和精力。对于学生来说,他们可以随时随地参加在线考试,方便快捷。总的来说,JSP考试系统_jspks.rar是一个非常实用的JSP源码资料包,它不仅可以帮助学生更好地学习和掌握JSP技术,也可以帮助教师更有效地管理在线考试。无论是对于学生还是教师,这个系统都是一个不可或缺的工具。重

    TypeScript-2.4.1.tar.gz

    Node.js,简称Node,是一个开源且跨平台的JavaScript运行时环境,它允许在浏览器外运行JavaScript代码。Node.js于2009年由Ryan Dahl创立,旨在创建高性能的Web服务器和网络应用程序。它基于Google Chrome的V8 JavaScript引擎,可以在Windows、Linux、Unix、Mac OS X等操作系统上运行。 Node.js的特点之一是事件驱动和非阻塞I/O模型,这使得它非常适合处理大量并发连接,从而在构建实时应用程序如在线游戏、聊天应用以及实时通讯服务时表现卓越。此外,Node.js使用了模块化的架构,通过npm(Node package manager,Node包管理器),社区成员可以共享和复用代码,极大地促进了Node.js生态系统的发展和扩张。 Node.js不仅用于服务器端开发。随着技术的发展,它也被用于构建工具链、开发桌面应用程序、物联网设备等。Node.js能够处理文件系统、操作数据库、处理网络请求等,因此,开发者可以用JavaScript编写全栈应用程序,这一点大大提高了开发效率和便捷性。 在实践中,许多大型企业和组织已经采用Node.js作为其Web应用程序的开发平台,如Netflix、PayPal和Walmart等。它们利用Node.js提高了应用性能,简化了开发流程,并且能更快地响应市场需求。

    Data-Structure-词向量

    词向量

    node-v10.2.0-x86.msi

    Node.js,简称Node,是一个开源且跨平台的JavaScript运行时环境,它允许在浏览器外运行JavaScript代码。Node.js于2009年由Ryan Dahl创立,旨在创建高性能的Web服务器和网络应用程序。它基于Google Chrome的V8 JavaScript引擎,可以在Windows、Linux、Unix、Mac OS X等操作系统上运行。 Node.js的特点之一是事件驱动和非阻塞I/O模型,这使得它非常适合处理大量并发连接,从而在构建实时应用程序如在线游戏、聊天应用以及实时通讯服务时表现卓越。此外,Node.js使用了模块化的架构,通过npm(Node package manager,Node包管理器),社区成员可以共享和复用代码,极大地促进了Node.js生态系统的发展和扩张。 Node.js不仅用于服务器端开发。随着技术的发展,它也被用于构建工具链、开发桌面应用程序、物联网设备等。Node.js能够处理文件系统、操作数据库、处理网络请求等,因此,开发者可以用JavaScript编写全栈应用程序,这一点大大提高了开发效率和便捷性。 在实践中,许多大型企业和组织已经采用Node.js作为其Web应用程序的开发平台,如Netflix、PayPal和Walmart等。它们利用Node.js提高了应用性能,简化了开发流程,并且能更快地响应市场需求。

    基于matlab开发的光谱数据预处理程序,包括MSC,SNV,归一化,中心化,导数等等.rar

    基于matlab开发的光谱数据预处理程序,包括MSC,SNV,归一化,中心化,导数等等.rar

    实训作业基于javaweb的订单管理系统源码+数据库+实训报告.zip

    优秀源码设计,详情请查看资源源码内容

    基于matlab开发的TD-LTE随机接入过程前导序列检测算法Matlab信道仿真.rar

    基于matlab开发的TD-LTE随机接入过程前导序列检测算法Matlab信道仿真.rar

    词语语义和语法信息数学模型词向量词语语义和语法信息数学模型词向量

    词向量 词向量(Word Vectors)是一种用来表示词语语义和语法信息的数学模型。它将词语转换为固定长度的实数向量,使得词向量之间的距离(通常使用余弦相似度)可以反映出词语之间的语义关系。词向量在自然语言处理和机器学习领域有广泛的应用,例如文本分类、情感分析、句子相似度计算等。 词向量的发展经历了从传统的one-hot表示到分布式表示的转变。传统的one-hot表示将每个词语表示为一个高维稀疏的向量,向量中只有一个元素为1,其余元素都为0,表示词语在词汇表中的位置。然而,这种表示方式无法准确捕捉词语之间的语义关系。 为了解决这个问题,分布式表示方法被提出。分布式表示将每个词语表示为一个低维稠密的实数向量,其中每个元素都包含了词语的语义和语法信息。这种表示方式的关键思想是,具有相似语义和上下文的词语在向量空间中更加接近。 现在广泛应用的词向量模型有许多种,其中最著名的是Word2Vec模型和GloVe模型。Word2Vec是一种基于神经网络的模型,它通过一种称为连续词袋(CBOW)和另一种称为跳字(Skip-gram)的训练方法来学习词向量。GloVe模型是一种基于全局词频的词

    node-v10.0.0-x86.msi

    Node.js,简称Node,是一个开源且跨平台的JavaScript运行时环境,它允许在浏览器外运行JavaScript代码。Node.js于2009年由Ryan Dahl创立,旨在创建高性能的Web服务器和网络应用程序。它基于Google Chrome的V8 JavaScript引擎,可以在Windows、Linux、Unix、Mac OS X等操作系统上运行。 Node.js的特点之一是事件驱动和非阻塞I/O模型,这使得它非常适合处理大量并发连接,从而在构建实时应用程序如在线游戏、聊天应用以及实时通讯服务时表现卓越。此外,Node.js使用了模块化的架构,通过npm(Node package manager,Node包管理器),社区成员可以共享和复用代码,极大地促进了Node.js生态系统的发展和扩张。 Node.js不仅用于服务器端开发。随着技术的发展,它也被用于构建工具链、开发桌面应用程序、物联网设备等。Node.js能够处理文件系统、操作数据库、处理网络请求等,因此,开发者可以用JavaScript编写全栈应用程序,这一点大大提高了开发效率和便捷性。 在实践中,许多大型企业和组织已经采用Node.js作为其Web应用程序的开发平台,如Netflix、PayPal和Walmart等。它们利用Node.js提高了应用性能,简化了开发流程,并且能更快地响应市场需求。

    超市会员积分管理系统主要用于实现了企业管理数据统计等

    超市会员积分管理系统主要用于实现了企业管理数据统计等。本系统结构如下: (1)网络会员管理中心界面: 会员修改密码信息模块:实现会员密码功能; 会员登陆模块:实现会员登陆功能; 会员注册模块:实现会员注册功能; 留言板模块:实现留言板留言功能 (2)后台管理界面: 系统用户管理模块:实现管理员的增加、查看功能; 会员信息管理模块:实现会员信息的增加、修改、查看功能; 注册用户管理模块:实现注册用户的增加、修改、查看功能; 会员卡管理模块:实现会员卡信息的增加、查看功能; 商品销售管理模块:实现商品信息的增加、查看功能; 会员积分管理模块:实现合作公司信息的增加、查看功能; 信息统计模块:实现数据统计报表功能; 留言板模块:实现留言板信息的增加、修改、查看功能;

    信息办公XML考试系统-xmlks.rar

    信息办公XML考试系统_xmlks.rar是一款专为计算机专业设计的JSP源码资料包。这个资料包包含了丰富的功能和特性,旨在为教育机构、培训中心和企业提供一个完整的在线考试解决方案。通过使用这个资料包,用户可以轻松地创建和管理自己的在线考试平台,实现对学员的考试、评分和成绩管理。首先,这个资料包采用了流行的JSP技术,结合了XML数据存储和处理,使得系统具有很高的可扩展性和灵活性。用户可以方便地根据自己的需求定制和修改系统的功能和界面。同时,系统还支持多种题型,如选择题、填空题、判断题等,满足不同类型考试的需求。其次,信息办公XML考试系统_xmlks.rar具有良好的用户体验和易用性。系统的界面设计简洁明了,操作流程清晰易懂,使得考生可以快速上手并完成考试。同时,系统还提供了丰富的帮助文档和教程,方便用户在使用过程中随时查阅和学习。此外,这个资料包还具有强大的后台管理功能。管理员可以轻松地添加、删除和修改试题,设置考试参数,查看考生的成绩和答题情况。同时,系统还支持多种权限管理,确保数据的安全性和保密性。总之,信息办公XML考试系统_xmlks.rar是一款功能强大、易用性高、可扩展

    博客系统网站(JSP+SERVLET+MYSQL)130222.rar

    该资源包“130222.rar”是一个针对计算机专业学生或开发者设计的基于Java服务器页面(JSP)、Servlet以及MySQL数据库的博客系统网站的源码资料。这个压缩文件包含了构建一个功能齐全的动态网站所需的全部源代码和相关文档,它允许用户通过互联网发布文章、分享观点,并与他人进行互动。在内容上,它可能包含了多个JSP页面文件,用于展示博客首页、文章列表、文章内容页、写文章的表单等界面;包含了Servlet类文件,用于处理用户的请求、与数据库交互以及业务逻辑的处理;还可能包含配置文件如web.xml,用于配置Servlet映射等。至于数据库部分,则包括了MySQL的数据库文件,其中存储了博客系统的数据结构、初始数据以及存储过程等。此资料包是一套学习和实践Web开发的好材料,尤其适合那些想要深入学习JSP、Servlet和数据库交互技术的学习者。通过分析和运行这些源码,学习者可以了解Web应用的开发流程,掌握如何在Java Web环境中使用MVC设计模式,以及如何实现用户身份验证、会话管理、数据持久化等关键功能。由于是基于JSP的传统Web开发技术,虽然现代Web开发领域已逐渐向全

    Python可视化图库绘制动态图表源码

    可视化图库Pandas_Alive实现动态图表绘制,使用时减少数据会使生成GIF的时间变短。通过对CSV文件分析,实现动态条形图、动态曲线图、气泡图、饼状图、地理空间图等多个动态图表的可视化分析。

    基于SSM的“停车场管理系统”的设计与实现.zip

    基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现基于SSM的“停车场管理系统”的设计与实现

Global site tag (gtag.js) - Google Analytics