Proxying Traffic with mitmproxy

Preface

  Collecting traffic through a proxy is a very common approach to vulnerability scanning. @猪猪侠 had this idea long ago with https://github.com/ring04h/wyproxy. It is an excellent way to supplement your traffic: you can reach endpoints a crawler will never find.
  I originally used mitmproxy 1.8 as the proxy, doing de-duplication and database insertion inside the proxy itself. The problems were obvious:
  1. The proxy was extremely slow.
  2. The old mitmproxy version kept running into dropped connections and all sorts of certificate-error bugs.

Solution

Upgrading mitmproxy

  Upgrading mitmproxy to 2.0 or later solved the second problem: mitmproxy abandoned the old codebase and rewrote the current version directly in Python 3.
  Not knowing Python 3, I had resisted the upgrade for a long time... After upgrading it is very stable; once the certificate is set up, the problems never come back. In hindsight, you have to keep up with new technology: old, unmaintained tools inevitably get left behind.
  Install Python 3 and the mitmproxy module:

wget https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tgz
tar xvf Python-3.6.3.tgz
cd Python-3.6.3/
./configure --enable-optimizations
make -j8
sudo make altinstall
sudo python3.6 -m pip install mitmproxy

  That completes the mitmproxy installation.
  A note on installing the HTTPS certificate.
  Start a proxy with --insecure added; otherwise you will hit https://github.com/mitmproxy/mitmproxy/issues/1608.

mitmdump -p 8088 --insecure

  A quick way to set up the certificate: point your client at the proxy, then visit http://mitm.it/ and follow the prompts to install the HTTPS certificate.

Separating the Proxy from Data Processing

  For the remaining problem, proxy speed, I borrowed the design of https://github.com/5up3rc/NagaScan/tree/master/proxy, which speeds up scanning by separating the proxy from data processing.

The mitmproxy addon does exactly one thing: write every request straight to a log file.
A separate processing program tails the newest log and handles URL whitelisting, de-duplication, and other data processing.

The mitmproxy proxy code:

# -*- coding: utf-8 -*-

""" Capture mitmproxy Request Logs into a file

This module is used to capture mitmproxy Requests and write to a file.

Usage:
  mitmdump -p 443 -s "proxy_mitmproxy.py logs.txt"  --insecure

"""

import sys


def get_request_info(flow, item):
    """Safely extract one field from the request; return "" when absent."""
    extractors = {
        "port": lambda r: r.port,
        "protocol": lambda r: r.scheme,
        "path": lambda r: r.path,
        "host": lambda r: r.host,
        "method": lambda r: r.method,
        "post_data": lambda r: r.content.decode(),
        "cookie": lambda r: r.headers['cookie'],
        "referer": lambda r: r.headers['referer'],
        "auth": lambda r: r.headers['Authorization'],
    }
    try:
        value = extractors[item](flow.request)
        return value if value else ""
    except Exception:
        return ""


def request(flow):
    request = {}
    for item in ("protocol", "host", "path", "port", "method",
                 "cookie", "auth", "referer"):
        request[item] = get_request_info(flow, item)

    if request['method'] == "GET":
        request['post_data'] = ""
    else:
        request['post_data'] = get_request_info(flow, 'post_data')

    if int(request['port']) not in [80, 443]:
        url = "{}://{}:{}{}".format(request['protocol'], request['host'],
                                    request['port'], request['path'])
    else:
        url = "{}://{}{}".format(request['protocol'], request['host'],
                                 request['path'])

    # One chr(1)-delimited record per request, appended to the log file.
    content = chr(1).join([url, request['method'], request['post_data'],
                           request['cookie'], request['auth'], request['referer']])
    log_file_name = sys.argv[1]
    with open(log_file_name, 'a') as log_file:
        log_file.write("%s\n" % content)

To start it:

    mitmdump -p 443 -s "proxy_mitmproxy.py logs.txt"  --insecure

This writes every request as one record to logs.txt.
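For reference, each record is a single chr(1)-delimited line. A minimal sketch of encoding and parsing that format (field order matching the proxy script above; the helper names are mine, not part of the original code):

```python
# Sketch of the chr(1)-delimited record format written to logs.txt.
SEP = chr(1)  # \x01 is vanishingly unlikely to appear in HTTP header data

FIELDS = ["url", "method", "post_data", "cookie", "auth", "referer"]

def encode_record(fields):
    """fields: dict with the keys in FIELDS; returns one log line."""
    return SEP.join(str(fields.get(k, "")) for k in FIELDS) + "\n"

def decode_record(line):
    """Inverse of encode_record: turn one log line back into a dict."""
    return dict(zip(FIELDS, line.rstrip("\n").split(SEP)))

rec = {"url": "https://example.com/a", "method": "GET", "post_data": "",
       "cookie": "sid=1", "auth": "", "referer": ""}
assert decode_record(encode_record(rec)) == rec
```

Because the separator never occurs in the payload fields, a plain split is enough to recover the record downstream.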

The data-processing code:
For data processing I use Python to do a tail-like follow on the newest log, applying URL whitelisting, de-duplication, and other processing.
Code:

import sys
import time


class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        self.callback = callback

    def follow(self):
        try:
            with open(self.file_name) as f:
                f.seek(0, 2)  # start at the end of the file, like `tail -f`
                while True:
                    line = f.readline()
                    if line:
                        self.callback(line)
                    time.sleep(0.1)
        except Exception as e:
            print('Failed to open the file; check that it exists and is readable')
            print(e)


if __name__ == '__main__':
    log = Log("./logs/proxy.log")  # Log is the author's own class (not shown)

    # Define your own handler for each new log line
    def test_tail(line):
        line = line.replace("\n", "")
        part = line.split('\x01')
        url = part[0]
        method = part[1]
        post_data = part[2]
        cookie = part[3]
        auth = part[4]
        referer = part[5]
        print(url, method, post_data, cookie, auth, referer)
        # todo: URL whitelisting, de-duplication, etc. (too much code to paste here)

    py_tail1 = Tail('logs.txt', test_tail)
    py_tail1.follow()
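The whitelist and de-duplication step that the todo comment leaves out could look roughly like this. This is only a sketch: ALLOWED_HOSTS and the normalization rule are my own illustrative assumptions, not the author's unpublished code.

```python
# Sketch: whitelist + de-duplication for tailed proxy URLs.
# ALLOWED_HOSTS and normalize() are illustrative assumptions.
from urllib.parse import urlsplit, parse_qsl

ALLOWED_HOSTS = {"example.com", "test.example.com"}  # hypothetical whitelist
seen = set()

def normalize(url):
    # Treat URLs that differ only in query *values* as duplicates:
    # /item?id=1 and /item?id=2 map to the same key.
    parts = urlsplit(url)
    param_names = tuple(sorted(k for k, _ in
                               parse_qsl(parts.query, keep_blank_values=True)))
    return (parts.scheme, parts.netloc, parts.path, param_names)

def should_scan(url):
    """True the first time a whitelisted, not-yet-seen URL shape appears."""
    if (urlsplit(url).hostname or "") not in ALLOWED_HOSTS:
        return False
    key = normalize(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Keying on parameter names rather than values is what keeps the scan queue from filling up with thousands of near-identical URLs.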

Conclusion

  With these two optimizations the proxy flies; give it a try.


4 comments

  1. testt

    Where does this Log() come from? log = Log("./logs/proxy.log")

    1. wilson

      That Log is a class I defined myself..

  2. Funhity

    Are you the wilson who used to be at 蓝盾?
