A Strange Connection Pool
Object pool pattern
The project needs Redis, and with it something like a Redis connection pool. I found one on GitHub (https://github.com/luca3m/redis3m); it follows the ordinary connection-pool design.
The Pool keeps all connections in a set member variable. When the getConnection member method is called, it searches the pool; if an available connection exists, it removes that connection from the set and returns it. When the client is done, it calls the put member method to hand the connection back to the pool. The only thing I felt was missing is a cap on the maximum number of connections. I have already submitted a pull request for that, and it is the first pull request of mine ever accepted by a stranger...
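The design described above, plus the max-size cap from the pull request, can be sketched in a few lines of Python. This is an illustrative sketch, not the redis3m API; the class and method names here are made up:

```python
import threading

class PoolSketch:
    """Minimal object-pool sketch: available connections live in a set,
    get() removes one, put() returns it, and max_size caps the total
    number of connections ever created."""

    def __init__(self, factory, max_size=10):
        self._factory = factory        # callable that creates a new connection
        self._available = set()
        self._in_use = 0
        self._max_size = max_size
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self._available:
                conn = self._available.pop()   # reuse an idle connection
            elif self._in_use + len(self._available) < self._max_size:
                conn = self._factory()         # room left: create a new one
            else:
                raise RuntimeError("pool exhausted")
            self._in_use += 1
            return conn

    def put(self, conn):
        with self._lock:
            self._in_use -= 1
            self._available.add(conn)          # back into the idle set
```

Without the `max_size` check, a burst of clients would create unbounded connections, which is exactly the gap the pull request closed.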
But while reading our team's database connection-pool code, I found that it does not follow this usual connection-pool approach.
MapReduce part 1
Let's look at MapReduce through the most basic Hadoop example from a Coursera course: counting how many times each word appears.
We put two files on HDFS, testfile1 and testfile2:
testfile1: A long time ago in a galaxy far far away
testfile2: Another episode of Star Wars
MapReduce defines the following two abstract programming interfaces, Map and Reduce, which we implement ourselves:
map: (k1, v1) → [(k2, v2)]
Input to map: the raw data, presented as key/value pairs.
Output of map: a list of intermediate key/value pairs produced from the raw input.
In the word-count example, map's input is lines of text and its output is pairs of the form <word, 1>.
#!/usr/bin/env python
# ---------------------------------------------------------------
# Mapper: reads lines of text from standard input and
# emits one <word, 1> pair per word.
# ---------------------------------------------------------------
import sys   # gives us access to standard input

for line in sys.stdin:
    # strip() removes the trailing newline (and surrounding whitespace)
    line = line.strip()
    # split() breaks the line at whitespace (by default) and
    # returns a list of words
    keys = line.split()
    for key in keys:
        value = 1
        # Hadoop streaming's default is a tab between key and value
        print('{0}\t{1}'.format(key, value))
reduce: (k2, [v2]) → [(k3, v3)]
Input: the key/value pairs [(k2, v2)] emitted by map are merged by the framework, with all values sharing the same key collected into a single list [v2]; reduce therefore receives (k2, [v2]).
Processing: reduce aggregates or otherwise processes each list of intermediate values and produces some final form of output.
Output: the final result [(k3, v3)].
In the word-count example, reduce's output is <word, count>.
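Between map and reduce, the framework sorts the mapper output and groups values by key (the shuffle). A minimal in-memory simulation of just that grouping step (illustrative only, not Hadoop code):

```python
from collections import defaultdict

def shuffle(mapped_pairs):
    """Group a list of (key, value) pairs into {key: [values]},
    mimicking what Hadoop's sort/shuffle phase hands to reduce."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return dict(grouped)

pairs = [('far', 1), ('away', 1), ('far', 1)]
grouped = shuffle(pairs)   # {'far': [1, 1], 'away': [1]}
# reduce then collapses each list, e.g. sum([1, 1]) -> far 2
```

In real Hadoop streaming this grouping is done by sorting the mapper output, which is why the reducer below only needs to detect when the key changes from one line to the next.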
#!/usr/bin/env python
# ---------------------------------------------------------------
# Reducer: reads tab-separated <word, count> lines, sorted by key,
# from standard input and emits <word, total-count>.
# ---------------------------------------------------------------
import sys

last_key = None      # the key seen on the previous line
running_total = 0    # running count for last_key

for input_line in sys.stdin:
    input_line = input_line.strip()
    # Hadoop streaming's default separator between key and value is a tab;
    # split returns a list of strings, unpacked here into two variables
    this_key, value = input_line.split('\t', 1)
    value = int(value)   # no error checking: assumes the value is numeric

    # Key check: if the current key is the same as the last one,
    # consolidate; otherwise emit the finished key.
    if last_key == this_key:
        running_total += value   # add value to the running total
    else:
        # the key just read is different; if the previous key is not
        # empty, output the previous <key, running-count> pair
        if last_key:
            print('{0}\t{1}'.format(last_key, running_total))
        running_total = value    # reset for the new key
        last_key = this_key

# emit the final key once the input is exhausted
if last_key is not None:
    print('{0}\t{1}'.format(last_key, running_total))
-rw-r--r--   1 cloudera supergroup    0 2015-11-14 01:57 /user/cloudera/output_word_0/_SUCCESS
-rw-r--r--   1 cloudera supergroup   61 2015-11-14 01:57 /user/cloudera/output_word_0/part-00000
-rw-r--r--   1 cloudera supergroup   39 2015-11-14 01:57 /user/cloudera/output_word_0/part-00001
A	1
long	1
time	1
ago	1
in	1
a	1
galaxy	1
far	1
far	1
away	1
Another	1
episode	1
of	1
Star	1
Wars	1
-rw-r--r--   1 cloudera supergroup    0 2015-11-14 02:05 /user/cloudera/output_word_1/_SUCCESS
-rw-r--r--   1 cloudera supergroup   94 2015-11-14 02:05 /user/cloudera/output_word_1/part-00000
A	1
Another	1
Star	1
Wars	1
a	1
ago	1
away	1
episode	1
far	2
galaxy	1
in	1
long	1
of	1
time	1
-rw-r--r--   1 cloudera supergroup    0 2015-11-14 02:14 /user/cloudera/output_word_2/_SUCCESS
-rw-r--r--   1 cloudera supergroup   64 2015-11-14 02:14 /user/cloudera/output_word_2/part-00000
-rw-r--r--   1 cloudera supergroup   30 2015-11-14 02:14 /user/cloudera/output_word_2/part-00001
A	1
Another	1
Wars	1
a	1
ago	1
episode	1
far	2
in	1
of	1
time	1
Star	1
away	1
galaxy	1
long	1
From these runs we can see: with zero reduce tasks, the output is exactly the mapper's output. With one reduce task, every key/value pair flows into that single reducer, so all counts land in one sorted file. With two reduce tasks, the pairs are first merged by key and then partitioned across the two reduce tasks, producing two output files.
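The split across reducers is decided by a partitioner; Hadoop's default hashes the key modulo the number of reduce tasks. A rough Python analogue (the real HashPartitioner uses Java's hashCode, so the exact assignments here are illustrative, not what Hadoop would produce):

```python
def partition(key, num_reducers):
    """Assign a key to one of num_reducers reduce tasks by hashing,
    analogous in spirit to Hadoop's default HashPartitioner."""
    return hash(key) % num_reducers

words = ['far', 'away', 'galaxy', 'Star']
buckets = {w: partition(w, 2) for w in words}
# the same word always hashes to the same bucket, which is why
# all counts for one key end up in exactly one part file
```

This also explains why each part file above is internally sorted but the files do not together form one globally sorted list.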
Random Notes on Qt
Converting between QString and std::string is a common source of bugs. QString stores text as UTF-16, but a std::string may hold bytes in many other encodings. std::string itself has no notion of encoding; it is just bytes, and those bytes may have come from many different encodings. The QString functions we use most often are toUtf8, fromUtf8, toLocal8Bit, fromLocal8Bit, toStdString, and fromStdString.
First, on encodings: Unicode is not one specific encoding but a standard; in different contexts "Unicode" may map to different concrete encodings, typically UTF-16. The "UTF-8 vs. Unicode" distinction people commonly draw is really the competition between UTF-16 and UTF-8.
Your machine may juggle ASCII, UTF-8, local8Bit (your system's locale encoding, also 8 bits per unit but not a standardized Unicode format), UTF-16, and more. The best way to keep QString working correctly is to make sure every source and sink of text goes through the correctly matched to/from method; toStdString and fromStdString both assume an ASCII interpretation. If you do not know what encoding your std::string bytes came in, there is no way to guarantee your QString will behave correctly.
Qt internationalization: QString exists to serve Qt's cross-platform, cross-locale needs, which is why it uses a UTF encoding. Whether your bytes are read from a file or obtained some other way, prefer Qt's own facilities (Qt's file streams, Qt's HTTP classes) so the bytes are not mysteriously mangled somewhere along the way.
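The underlying pitfall is language-agnostic. A quick Python sketch of the same mistake, decoding bytes with the wrong codec (GBK stands in here for "local8Bit"):

```python
# Bytes encoded with a local 8-bit codec cannot safely be decoded as
# UTF-8: you either get an error or, worse, silently wrong characters.
text = '中文'
local_bytes = text.encode('gbk')    # bytes in a local 8-bit encoding
utf8_bytes = text.encode('utf-8')   # the same text as UTF-8 bytes

assert utf8_bytes.decode('utf-8') == text   # matching codec round-trips
try:
    local_bytes.decode('utf-8')             # mismatched codec
except UnicodeDecodeError:
    print('decoding local-8-bit bytes as UTF-8 fails')
```

This is exactly the QString situation: fromUtf8 on local8Bit bytes (or vice versa) pairs the wrong codec with the bytes, and no amount of fixing downstream can recover the original text.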
SimpleHTTPTest
import sched, time
import httplib, sys
#from tester.models import Tasks   # Django model, only needed when recording results

class planrequest:
    planner = sched.scheduler(time.time, time.sleep)
    plannum = 1            # how many requests remain to be sent
    planinterval = 10      # seconds between requests
    httpaddress = "127.0.0.1"
    httppath = "/"         # request path (an assumed default)
    httpbody = ""
    httpmethod = "POST"
    httpport = 8081
    taskresult = dict()    # maps HTTP status code -> occurrence count
    taskid = ""

    def __init__(self, _plannum, _planinterval, _httpaddress, _httpbody, request):
        self.plannum = _plannum
        self.planinterval = _planinterval
        self.httpaddress = _httpaddress
        self.httpbody = _httpbody
        self.taskid = self.getclientip(request) + str(time.time())
        #newtask = Tasks(task_text = self.taskid, task_body = self.httpbody, task_type = "P")
        #newtask.save()    # requires the Django Tasks model imported above

    def dowork(self, worker):
        print "Doing work..."
        #your header
        headers = {"Content-type": "application/x-www-form-urlencoded",
                   "Accept": "text/plain"}
        conn = httplib.HTTPConnection(self.httpaddress, self.httpport)
        conn.request(self.httpmethod, self.httppath, self.httpbody, headers)
        res = conn.getresponse()
        #tally the response status
        if res.status not in self.taskresult:
            self.taskresult[res.status] = 1
        else:
            self.taskresult[res.status] += 1
        #check if all done; if not, schedule the next request
        if self.plannum > 0:
            self.plannum -= 1
            worker.enter(float(self.planinterval), 1, self.dowork, (worker, ))
        else:
            #all done: record the result
            #doingtask = Tasks.objects.get(task_text = self.taskid)
            #doingtask.task_result = self.taskresult
            #doingtask.save()
            pass

    def start(self):
        self.planner.enter(float(self.planinterval), 1, self.dowork, (self.planner, ))
        print time.time()
        self.planner.run()

    def getclientip(self, request):
        try:
            real_ip = request.META['HTTP_X_FORWARDED_FOR']
            reqip = real_ip.split(",")[0]
        except KeyError:
            try:
                reqip = request.META['REMOTE_ADDR']
            except KeyError:
                reqip = ""
        return reqip
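The heart of this class is the sched loop in which dowork re-enters itself until the planned request count runs out. That scheduling pattern can be isolated in a self-contained Python 3 sketch (the stand-in `work` callable replaces the real HTTP request, and the counting mirrors how plannum is decremented above):

```python
import sched, time

def repeat(scheduler, interval, remaining, work, results):
    """Run work(), then re-schedule this function until remaining hits 0,
    mirroring how dowork() re-enters itself via worker.enter()."""
    results.append(work())
    if remaining > 0:
        scheduler.enter(interval, 1, repeat,
                        (scheduler, interval, remaining - 1, work, results))

s = sched.scheduler(time.time, time.sleep)
results = []
work = lambda: 200          # stand-in for the HTTP call; fake status code
s.enter(0.01, 1, repeat, (s, 0.01, 2, work, results))
s.run()                     # blocks until no scheduled events remain
```

Note that, as in the original, a starting count of 2 produces three runs: the work executes once, then the counter is checked and decremented, so the last run happens when the counter reaches 0.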