1 Background

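I've written several projects in Django lately and find its built-in ORM a real power tool. One day I had a sudden idea: since Django can turn the classes defined in models.py into database tables, is there a library that can draw an ER diagram from those same classes? That would partly solve the problem of programmers not wanting to write documentation, and it would be a great way to show off. A quick search online only turned up a library called pony, which lets you draw an ER diagram and then generates Python code from it, exactly the opposite of what I wanted. After a bit of thought I figured that if I could parse each table's fields out of the classes, work out the relationships between tables from the foreign keys, and feed the result to plantuml or graphviz, I should in theory be able to draw an ER diagram. So I decided to build my own wheel; even if the result turned out mediocre, it would be good practice.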

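My first idea was to have the user supply a models.py file, import it in my program, and use dir and type to find all classes that inherit from models.Model (the classes that will eventually become database tables). That step was easy. Next, I assumed I could use the same approach to find the attributes of each class that are instances of Field subclasses (the attributes that will become table columns). But it turned out that on the resulting object the attribute is not a Field subclass instance at all. For a class like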

class Student(models.Model):
    name = models.CharField(max_length=65)

the name attribute of a Student instance is simply a string, and the same goes for the other field types: an IntegerField attribute, for example, is just an int.
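For reference, the first step (finding the Model subclasses with dir and type) might look roughly like this. It is a minimal sketch, not my actual program: find_subclasses is a made-up helper name, and a stand-in Model base class plus a module built in memory replace a real models.py so the snippet runs without Django. With Django, the assumption is that you would pass django.db.models.Model as the base and the imported models module as the module.

```python
import types


def find_subclasses(module, base):
    """Return the classes in `module` that inherit from `base` (excluding `base` itself)."""
    found = []
    for attr_name in dir(module):
        obj = getattr(module, attr_name)
        # isinstance(obj, type) filters out functions, constants, submodules, etc.
        if isinstance(obj, type) and issubclass(obj, base) and obj is not base:
            found.append(obj)
    return found


# Stand-ins so the sketch runs without Django (assumption: with Django,
# `base` would be django.db.models.Model and `module` the imported models.py).
class Model(object):
    pass


class Student(Model):
    pass


class Helper(object):  # not a model, so it must be ignored
    pass


fake_models = types.ModuleType("fake_models")
fake_models.Model = Model
fake_models.Student = Student
fake_models.Helper = Helper
fake_models.VERSION = "1.0"

model_names = [cls.__name__ for cls in find_subclasses(fake_models, Model)]
print(model_names)
```

The same trick does not carry over to fields, which is exactly the problem described above.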

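So how does Django work out the fields of a class that inherits from models.Model? Time to read the Django source. A moment's thought suggests that the makemigrations command is where Django starts turning the classes in models.py into database tables (it generates the migrations that are later applied), so the field detection must happen somewhere along that call chain.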

2 Django source code

makemigrations is an argument to manage.py, so the first thing to look at is of course manage.py:

#!/usr/bin/env python
import os
import sys

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "AM.settings")

    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)

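The code is simple: only execute_from_command_line(sys.argv) does real work, and as the name suggests, it performs an action based on the command-line arguments. Following the import path leads to the file that defines it. On a Mac the file is /Library/Python/2.7/site-packages/django/core/management/__init__.py, and the function reads: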

def execute_from_command_line(argv=None):
    """
    A simple method that runs a ManagementUtility.
    """
    utility = ManagementUtility(argv)
    utility.execute()

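Next is ManagementUtility.execute, in the same file. Most of the function is checks of various kinds, plus special handling for the version and help subcommands. If everything passes, it reaches the final statement, which calls fetch_command and then run_from_argv on the result:

self.fetch_command(subcommand).run_from_argv(self.argv)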

Now for fetch_command:

def fetch_command(self, subcommand):
    """
    Tries to fetch the given subcommand, printing a message with the
    appropriate command called from the command line (usually
    "django-admin.py" or "manage.py") if it can't be found.
    """
    # Get commands outside of try block to prevent swallowing exceptions
    commands = get_commands()
    try:
        app_name = commands[subcommand]
    except KeyError:
        sys.stderr.write("Unknown command: %r\nType '%s help' for usage.\n" %
            (subcommand, self.prog_name))
        sys.exit(1)
    if isinstance(app_name, BaseCommand):
        # If the command is already loaded, use it directly.
        klass = app_name
    else:
        klass = load_command_class(app_name, subcommand)
    return klass

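So this function gets all supported commands from get_commands, then uses the subcommand the user typed, via commands[subcommand], to obtain a command object whose class inherits from BaseCommand. That object has a run_from_argv method, which is what actually runs the command. It looks like the body of the makemigrations command I'm after lives in that BaseCommand subclass. On to get_commands: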
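The lookup-then-run pattern described here can be illustrated with a toy dispatcher. This is only a sketch under my own assumptions: COMMANDS, Hello and run are made-up names, and the real fetch_command loads the command class from a module rather than from a prebuilt dict.

```python
class BaseCommand(object):
    """Toy stand-in for Django's BaseCommand."""

    def run_from_argv(self, argv):
        raise NotImplementedError


class Hello(BaseCommand):
    def run_from_argv(self, argv):
        # argv[0] is the program name, argv[1] the subcommand, the rest its arguments
        return "hello " + " ".join(argv[2:])


# Hypothetical registry: subcommand name -> command object
COMMANDS = {"hello": Hello()}


def fetch_command(subcommand):
    try:
        return COMMANDS[subcommand]
    except KeyError:
        raise SystemExit("Unknown command: %r" % subcommand)


def run(argv):
    # Same shape as the last statement of ManagementUtility.execute
    return fetch_command(argv[1]).run_from_argv(argv)


result = run(["manage.py", "hello", "world"])
print(result)
```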

def get_commands():
    commands = {name: 'django.core' for name in find_commands(__path__[0])}

    if not settings.configured:
        return commands

    for app_config in reversed(list(apps.get_app_configs())):
        path = os.path.join(app_config.path, 'management')
        commands.update({name: app_config.name for name in find_commands(path)})

    return commands

get_commands relies on find_commands to discover the supported commands, so on to find_commands:

def find_commands(management_dir):
    command_dir = os.path.join(management_dir, 'commands')
    try:
        return [f[:-3] for f in os.listdir(command_dir)
                if not f.startswith('_') and f.endswith('.py')]
    except OSError:
        return []

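This function lists the command_dir directory and returns the names of the .py files in it that do not start with '_' (with the .py suffix stripped). In other words, those file names are the commands manage.py supports. Opening command_dir, that is django/core/management/commands/, I did indeed find a makemigrations.py file. But the Command class defined in that file has no run_from_argv method, so it must live in the base class BaseCommand. Following the import path to the file that defines the base class, I found that run_from_argv calls execute, and execute calls handle, which the Command class overrides. So after going all the way around, the function that really does the work is handle.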
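Both halves of this paragraph are easy to try out in isolation. The sketch below first runs the find_commands logic quoted above against a throwaway directory, then mimics the run_from_argv -> execute -> handle chain with toy classes. The temporary directory, the file names in it, and the toy BaseCommand and Command are all my own stand-ins; the real BaseCommand also parses options, runs system checks, and more.

```python
import os
import tempfile


def find_commands(management_dir):
    # Same logic as the Django function quoted above
    command_dir = os.path.join(management_dir, 'commands')
    try:
        return [f[:-3] for f in os.listdir(command_dir)
                if not f.startswith('_') and f.endswith('.py')]
    except OSError:
        return []


# Build a throwaway management/commands directory to scan
management_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(management_dir, 'commands'))
for filename in ('__init__.py', 'makemigrations.py', 'migrate.py', 'notes.txt'):
    open(os.path.join(management_dir, 'commands', filename), 'w').close()

found = sorted(find_commands(management_dir))
print(found)


class BaseCommand(object):
    """Toy base class mirroring the run_from_argv -> execute -> handle chain."""

    def run_from_argv(self, argv):
        return self.execute(*argv[2:])

    def execute(self, *args):
        return self.handle(*args)

    def handle(self, *args):
        raise NotImplementedError("subclasses must implement handle()")


class Command(BaseCommand):
    # Only handle() is overridden; the entry point is inherited
    def handle(self, *args):
        return "handling %s" % (args,)


output = Command().run_from_argv(["manage.py", "makemigrations", "myapp"])
print(output)
```

The subclass never defines run_from_argv itself, which is why searching makemigrations.py for it comes up empty.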

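handle is long. It first checks the makemigrations arguments, then calls into the django.db module, and along the way it uses terms such as loader, conflict, autodetector and changes, which in turn rest on the conceptual model behind Django's migrations. Partly by guesswork, I figured autodetector was most likely what I wanted, because it is an instance of MigrationAutodetector, whose docstring and __init__ look like this: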

class MigrationAutodetector(object):
    """
    Takes a pair of ProjectStates, and compares them to see what the
    first would need doing to make it match the second (the second
    usually being the project's current state).

    Note that this naturally operates on entire projects at a time,
    as it's likely that changes interact (for example, you can't
    add a ForeignKey without having a migration to add the table it
    depends on first). A user interface may offer single-app usage
    if it wishes, with the caveat that it may not always be possible.
    """

    def __init__(self, from_state, to_state, questioner=None):
        self.from_state = from_state
        self.to_state = to_state
        self.questioner = questioner or MigrationQuestioner()

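According to the docstring, the constructor takes two inputs, an old state and a new state. Judging by the docstring's second paragraph, this "state" most likely represents the state of every class in models.py. The autodetector presumably computes a set of changes from the differences between the old and new states, and the Command class then hands those changes to other modules to produce the concrete database operations.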
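To make the old-state/new-state idea concrete, here is a toy diff. It only illustrates the concept under my own assumptions: states are plain dicts mapping a model name to a set of field names, and detect_changes is a made-up function. The real MigrationAutodetector is far more involved (dependencies, renames, options and so on).

```python
def detect_changes(from_state, to_state):
    """Return (operation, model, field) tuples that turn from_state into to_state."""
    changes = []
    for model in sorted(set(to_state) - set(from_state)):
        changes.append(("create_model", model, None))
    for model in sorted(set(from_state) - set(to_state)):
        changes.append(("delete_model", model, None))
    for model in sorted(set(from_state) & set(to_state)):
        for field in sorted(to_state[model] - from_state[model]):
            changes.append(("add_field", model, field))
        for field in sorted(from_state[model] - to_state[model]):
            changes.append(("remove_field", model, field))
    return changes


# Toy states: model name -> set of field names
old = {"student": {"id", "name"}}
new = {"student": {"id", "name", "age"}, "course": {"id", "title"}}

changes = detect_changes(old, new)
for change in changes:
    print(change)
```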

Now look at the arguments that Command.handle passes when it creates the MigrationAutodetector instance:

# Set up autodetector
autodetector = MigrationAutodetector(
    loader.project_state(),
    ProjectState.from_apps(apps),
    InteractiveMigrationQuestioner(specified_apps=app_labels, dry_run=self.dry_run),
)

loader.project_state() returns the old state of the system, and ProjectState.from_apps(apps) presumably returns the new state, so let's look at that function:

@classmethod
def from_apps(cls, apps):
    "Takes in an Apps and returns a ProjectState matching it"
    app_models = {}
    for model in apps.get_models(include_swapped=True):
        model_state = ModelState.from_model(model)
        app_models[(model_state.app_label, model_state.name.lower())] = model_state
    return cls(app_models)

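Here apps is the registry of the apps configured in settings.py, that is, all installed apps. apps.get_models() presumably returns all the models of those installed apps, and ModelState.from_model(model) presumably computes a state from each model. Now for ModelState: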

class ModelState(object):
    """
    Represents a Django Model. We don't use the actual Model class
    as it's not designed to have its options changed - instead, we
    mutate this one and then render it into a Model as required.

    Note that while you are allowed to mutate .fields, you are not allowed
    to mutate the Field instances inside there themselves - you must instead
    assign new ones, as these are not detached during a clone.
    """

The docstring shows I've finally found the real thing. Here is its from_model method:

@classmethod
def from_model(cls, model, exclude_rels=False):
    """
    Feed me a model, get a ModelState representing it out.
    """

According to the docstring, this method takes a model and returns a ModelState. Its source is fairly long, so let's take it in pieces:

fields = []
for field in model._meta.local_fields:
    if getattr(field, "rel", None) and exclude_rels:
        continue
    if isinstance(field, OrderWrt):
        continue
    name, path, args, kwargs = field.deconstruct()
    field_class = import_string(path)
    try:
        fields.append((name, field_class(*args, **kwargs)))
    except TypeError as e:
        raise TypeError("Couldn't reconstruct field %s on %s.%s: %s" % (
            name,
            model._meta.app_label,
            model._meta.object_name,
            e,
        ))

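The second line of this excerpt solves the problem I ran into in the Background section: _meta.local_fields gives you all the database-related fields of a model class, and field.deconstruct() then gives you the details of each field.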
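Putting those two calls together, the ER-diagram idea from the Background section could be sketched as below. Treat it as a rough sketch under stated assumptions, not a finished tool: describe_model and to_dot are made-up names, relation fields are assumed to report their target under kwargs["to"] as a dotted string, and fake_field and fake_model are stand-ins that only imitate _meta.local_fields and deconstruct() so the snippet runs without Django (it targets Python 3).

```python
from types import SimpleNamespace


def describe_model(model):
    """Return (model_name, [(field_name, field_type, target_or_None), ...])."""
    rows = []
    for field in model._meta.local_fields:
        name, path, args, kwargs = field.deconstruct()
        field_type = path.rsplit(".", 1)[-1]   # e.g. "CharField"
        target = kwargs.get("to")              # assumed to be set only on relation fields
        rows.append((name, field_type, target))
    return model._meta.object_name, rows


def to_dot(models):
    """Render Graphviz DOT text: one record node per table, one edge per relation."""
    lines = ["digraph er {", "  node [shape=record];"]
    edges = []
    for model in models:
        model_name, rows = describe_model(model)
        node_id = model_name.lower()
        columns = "\\l".join("%s: %s" % (n, t) for n, t, _ in rows)
        lines.append('  %s [label="{%s|%s\\l}"];' % (node_id, model_name, columns))
        for _, _, target in rows:
            if target:
                # "app.Model" -> "model"; lowercased so it matches the node ids
                edges.append("  %s -> %s;" % (node_id, str(target).split(".")[-1].lower()))
    return "\n".join(lines + edges + ["}"])


def fake_field(name, path, **kwargs):
    # Stand-in for a Django Field: only deconstruct() is needed here
    return SimpleNamespace(deconstruct=lambda: (name, path, [], kwargs))


def fake_model(object_name, fields):
    # Stand-in for a model class: only _meta.object_name and _meta.local_fields are used
    return SimpleNamespace(_meta=SimpleNamespace(object_name=object_name, local_fields=fields))


student = fake_model("Student", [
    fake_field("id", "django.db.models.AutoField", primary_key=True),
    fake_field("name", "django.db.models.CharField", max_length=65),
])
enrollment = fake_model("Enrollment", [
    fake_field("id", "django.db.models.AutoField", primary_key=True),
    fake_field("student", "django.db.models.ForeignKey", to="school.Student"),
])

dot = to_dot([student, enrollment])
print(dot)
```

With real models the idea would be to pass the model classes themselves (for example the result of apps.get_models() in a configured project) and feed the printed text to Graphviz's dot command.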

3 Follow-up

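Later I searched more carefully and found that django-extensions already ships a model-to-ER-diagram feature (its graph_models command). It's easy to use and the results are pretty good.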
