Word Frequency Statistics for Open edX Translations

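When translating Open edX into Chinese, we often run into inconsistent translations. Because Transifex is crowdsourced, different translators interpret the same word in different ways. For example, "Learner" can be rendered as "学习者" (learner), "学员" (enrolled participant), or "学生" (student), and none of these is clearly better than the others. But if all three appear in the interface, users get confused and assume they refer to three different things.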

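@liuxing3169 of Guangzhou Yingli (广州英荔) wrote a small program that counts word frequencies in the Open edX language files. The output shows which words occur most often, so those high-frequency words can be standardized first and collected into a glossary for translators. His source code is here: https://github.com/liuxing3169/Simple-python-programming-exercises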
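The post does not describe how the program works internally. A minimal Python sketch of the same idea might look like the following, assuming the input is a gettext .po file downloaded from Transifex (such as django.po) and that only the English msgid source strings are counted. Tokens are lowercased and split on whitespace only; that assumption would be consistent with "course." and "{platform_name}" appearing as separate entries in the results, but the original repository may tokenize differently. All function names here are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches a "msgid" or "msgid_plural" line and captures its quoted text.
MSGID_RE = re.compile(r'^(msgid|msgid_plural)\s+"(.*)"$')


def extract_msgids(po_text):
    """Collect the English source strings from gettext .po text.

    Multi-line strings (a msgid "" line followed by quoted continuation
    lines) are joined; msgstr lines and comments are ignored.
    """
    strings = []
    current = None  # source string being assembled, or None
    for raw in po_text.splitlines():
        line = raw.strip()
        match = MSGID_RE.match(line)
        if match:
            if current is not None:
                strings.append(current)
            current = match.group(2)
        elif current is not None and len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            current += line[1:-1]  # continuation of the current msgid
        else:
            if current is not None:
                strings.append(current)
            current = None
    if current is not None:
        strings.append(current)
    # Drop empty strings (e.g. the header entry) and undo common escapes.
    return [s.replace('\\"', '"').replace("\\n", " ") for s in strings if s]


def word_frequencies(strings):
    """Count lowercase, whitespace-separated tokens across all strings.

    Punctuation is deliberately not stripped, so "course." and "course"
    are counted separately.
    """
    counts = Counter()
    for s in strings:
        counts.update(s.lower().split())
    return counts


def main(path, top=100):
    with open(path, encoding="utf-8") as f:
        counts = word_frequencies(extract_msgids(f.read()))
    # Print in the same "word,count," layout as the published results.
    for word, n in counts.most_common(top):
        print(f"{word},{n},")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It would be run as `python word_freq.py django.po`, where the script name and input path are placeholders.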

Part of the current output is listed below, one word per line followed by its count:

the,2313,
to,1674,
you,1066,
your,951,
a,947,
this,884,
for,856,
course,785,
and,754,
in,728,
of,700,
is,599,
or,536,
not,529,
be,434,
are,359,
with,341,
that,341,
have,296,
an,278,
can,278,
on,271,
if,259,
will,256,
please,242,
by,215,
must,211,
has,206,
content,192,
{platform_name},190,
as,188,
from,187,
email,185,
any,185,
all,182,
use,182,
access,180,
file,172,
certificate,171,
enter,169,
name,158,
no,156,
error,153,
new,148,
been,144,
at,142,
student,141,
enrollment,136,
add,135,
course.,132,
page,129,
account,129,
learners,125,
problem,124,
students,121,
information,118,
user,116,
date,114,
was,112,
team,110,
try,110,
id,109,
we,106,
cannot,105,
it,105,
create,105,
when,104,
image,102,
see,102,
video,102,
do,100,
group,99,
view,97,
verified,95,
only,95,
number,94,
courses,93,
more,92,
select,91,
there,91,
library,89,
about,88,
enrolled,85,
want,84,
api,84,
address,83,
upload,83,
code,82,
click,80,
could,79,
plural,79,
download,78,
does,77,
contact,77,
request,77,
sure,77,
list,77,
transcript,76,
verification,75,

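The list includes several frequent words whose translations are prone to inconsistency, such as enrollment, learners, and library. Based on this word list, the Open edX Chinese community will publish a standard set of term definitions to make translators' work easier.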
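Once such a glossary exists, translations could be checked against it automatically. The sketch below is one possible approach, not something the post describes. The GLOSSARY entry mapping "learner" to 学习者 is a hypothetical choice made purely for illustration (the community has not yet published its terms), and the optional "s" suffix in the pattern is a crude assumption for catching simple English plurals such as "learners".

```python
import re

# Hypothetical glossary: English term -> approved Chinese rendering.
# 学习者 is an illustrative pick, not the community's actual decision.
GLOSSARY = {"learner": "学习者"}


def find_inconsistent(pairs, glossary):
    """Return (msgid, msgstr, term) for translations missing the approved term.

    pairs is an iterable of (English source, Chinese translation) tuples.
    """
    # Whole-word, case-insensitive match; "s?" also catches simple plurals.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
        for term in glossary
    }
    problems = []
    for msgid, msgstr in pairs:
        if not msgstr:  # skip untranslated entries
            continue
        for term, pattern in patterns.items():
            if pattern.search(msgid) and glossary[term] not in msgstr:
                problems.append((msgid, msgstr, term))
    return problems


if __name__ == "__main__":
    sample = [
        ("Learners can view the course.", "学员可以查看课程。"),
        ("Message the learner", "给学习者发消息"),
        ("Learner ID", ""),
        ("Course ID", "课程ID"),
    ]
    for msgid, msgstr, term in find_inconsistent(sample, GLOSSARY):
        print(f"{term}: {msgid!r} -> {msgstr!r}")
```

With the sample data, only the first pair is flagged, because it renders "Learners" as 学员 instead of the glossary term; the untranslated entry is skipped.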

Posted in Open edX.

edustack

edustack webmaster
