
Releasing TensorFlow resources promptly with the multiprocessing module

When using modules such as tf.data, TensorFlow can leak memory. Once a leak occurs, we want to save a checkpoint promptly, return the current state, and then restart TensorFlow to continue training incrementally.

If you run TensorFlow in a child process via subprocess.call(), you have to serialize and deserialize the arguments and results yourself, which is tedious. A rough sketch of what that entails is shown below.
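(This sketch is only for comparison; worker_script.py, its command-line interface, and the JSON round-trip are hypothetical stand-ins, not code from this post.)

import json
import subprocess
import tempfile

args = {'epoch': 0}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(args, f)  # serialize the arguments by hand
    args_path = f.name
result_path = args_path + '.out'

# worker_script.py (hypothetical) must parse argv, deserialize the args,
# run TensorFlow, and serialize its result back to result_path.
subprocess.call(['python', 'worker_script.py', args_path, result_path])

with open(result_path) as f:
    result = json.load(f)  # deserialize the result by hand
print('result: %s' % result)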
This post instead gives an implementation that runs TensorFlow in a child process via the multiprocessing module, which makes passing arguments and results trivial.
Without further ado, the code:
# coding=utf-8
'''
Created on Sep 18, 2018

@author: colinliang
'''
from __future__ import absolute_import, division, print_function


def run_tf(args, queue=None):
    print('\n\n------- beginning of tf process')
    print('args for tf: %s' % args)
    # Import TensorFlow inside the function so it is loaded in the child
    # process; all of its memory is reclaimed when that process exits.
    import tensorflow as tf
    import psutil
    sess = tf.Session()

    mem_start = psutil.virtual_memory().available
    batch = 2000000
    # Allocate a variable whose size grows with the epoch to simulate a leak.
    n = (args['epoch'] + 1) * batch
    with tf.device('/cpu:0'):
        v = tf.get_variable(name='tf_var', shape=[n], dtype=tf.float32,
                            initializer=tf.random_uniform_initializer(-1, 1, 0, dtype=tf.float32))
    sess.run(tf.global_variables_initializer())

    # Memory check: exit promptly once memory consumption has grown past
    # the threshold, i.e. a "leak" has been detected.
    if mem_start - psutil.virtual_memory().available > batch * 12:
        result = {'exit code': -1}
        if queue is not None:
            queue.put(result)
        return result

    import time
    time.sleep(10)

    r = sess.run(v[0])
    print('sess: %s' % sess)
    sess.close()
    result = {'first elem of tf var': r}
    if queue is not None:
        queue.put(result)
    print('------- end of tf process')
    return result

#####################################################
from multiprocessing import Process, Queue

# Adapted from https://stackoverflow.com/questions/39758094/clearing-tensorflow-gpu-memory-after-model-execution
# Process usage: https://docs.python.org/2/library/multiprocessing.html
if __name__ == '__main__':
    for i in range(5):
        q = Queue()
        args = {'epoch': i}
        p = Process(target=run_tf, args=(args, q))
        p.start()
        result = q.get()  # drain the queue before join() to avoid a potential deadlock
        p.join()
        print("result: %s" % result)
