# Make a Video Summary from Subtitles

This is a tool for making an abbreviated video based on its subtitles. See the example: Teacher Li Yongle explains the Platonic solids (review version).

The example is a remix video cut from the course on Platonic solids, with the derivation and digression parts removed. The original video is about 12 minutes; the remix is only 4 minutes, which makes it convenient for a quick review of the concepts.

The process is as follows:

• First, download the video's subtitles from YouTube.
• Then edit the subtitles: delete the unimportant parts, leaving only the words that need to be retained. Note that you must keep the original line breaks.
• The program then automatically finds the retained text in the subtitles and looks up the corresponding timestamps.
• Finally, the program downloads the video clips corresponding to those timestamps and joins them together.

I recommend running this program on Google Colab. The source code is on GitHub and can be uploaded to Google Colab to run.

This program requires webvtt-py, youtube-dl, ffmpeg and, of course, pandas.

Note that on Google Colab, both webvtt-py and youtube-dl need to be installed in each new session.

In [4]:
try:
    import webvtt
except ImportError:
    !pip install webvtt-py
    import webvtt
try:
    import youtube_dl
except ImportError:
    !pip install youtube-dl
    import youtube_dl
import pandas as pd
import os, subprocess, difflib


The first step is to download the video's subtitles from YouTube. There are two types of subtitles on YouTube: one is uploaded by the author, the other is generated automatically by speech recognition. The automatic subtitles can also be machine-translated into various languages.

To download the automatic subtitles, use --write-auto-sub; to download subtitles uploaded by the author, use --write-sub. When downloading subtitles I don't need to download the video itself, so I add --skip-download. The subtitle language is chosen by its language tag, e.g. Simplified Chinese = zh-Hans, English = en.

In [5]:
def download_youtube_sub(youtube_url, out_filename, lang="zh-Hans"):
    vtt_command = ['youtube-dl',
                   '--write-auto-sub',  # use --write-sub instead for author-uploaded subtitles
                   '--sub-lang', lang,  # choose the subtitle language
                   '--skip-download',   # subtitles only, do not download the video
                   '--no-continue',     # force overwrite of already-downloaded files
                   '--output', out_filename,  # output filename template
                   youtube_url]
    # final output filename = out_filename + '.' + lang + '.vtt'
    p = subprocess.run(vtt_command, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    if p.returncode != 0:
        print(p.stdout.decode("utf-8"))
    else:
        return True



## Clean up vtt subtitles

A VTT subtitle is a rich (read: complex) subtitle format: it contains not only timestamps and text, but the text itself can carry rich effects. In particular, in English auto-generated VTT subtitles, the current line and the next line are displayed together to give a continuous effect, so each sentence is actually recorded twice (Chinese subtitles do not seem to do this). So some cleanup is needed:

• delete the duplicated parts,
• keep only the current sentence, with its start and end timestamps,
• save the result to a pandas DataFrame.

After all, pandas makes later processing easy.
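As a sketch of the duplication pattern (hypothetical lines mimicking English auto-generated subtitles, not taken from the actual video): each cue holds two display lines, and the second line of one cue reappears as the first line of the next, so after flattening, consecutive duplicates must be dropped:

```python
# hypothetical cue texts mimicking English auto-generated subtitles
cue_texts = [
    "hello everyone\nI am teacher Li",
    "I am teacher Li\ntoday: Platonic solids",
]

# flatten every cue into single display lines
flat = [line for text in cue_texts for line in text.splitlines()]

# drop consecutive duplicates, keeping only the first occurrence
deduped = [l for i, l in enumerate(flat) if i == 0 or l != flat[i - 1]]
print(deduped)  # ['hello everyone', 'I am teacher Li', 'today: Platonic solids']
```

This consecutive-duplicate rule is exactly what the function below applies, cue by cue, while carrying the timestamps along.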

In [6]:
def clean_vtt(vtt):
    lines = []
    starts = []
    ends = []
    # flatten every caption into single display lines,
    # repeating the timestamps for each line
    for line in vtt:
        extend_text = line.text.strip().splitlines()
        repeat = len(extend_text)
        lines.extend(extend_text)
        starts.extend([line.start] * repeat)
        ends.extend([line.end] * repeat)

    # drop consecutive duplicates, keeping the first occurrence
    previous = None
    new_lines = []
    new_starts = []
    new_ends = []

    for l, s, e in zip(lines, starts, ends):
        if l == previous:
            continue
        new_lines.append(l)
        new_starts.append(s)
        new_ends.append(e)
        previous = l

    df = pd.DataFrame({"start": new_starts, "end": new_ends, "text": new_lines})
    return df


## Merge vtt subtitles into a transcript

One sentence per line. Keeping the newlines (\n) makes the transcript easier to read and edit.

In [7]:
def vtt_to_transcript(vtt):
    df = clean_vtt(vtt)
    transcript = "\n".join(df["text"])
    return transcript



## Find the corresponding sentence in the subtitles

Given a sentence, we need to find its timestamp in the subtitles so that the video can be sliced accordingly. Python's standard library provides difflib for sequence comparison; I use get_close_matches to find the closest subtitle text. This way, even if only part of a sentence is chosen, the whole sentence can still be found.

The given sentences are assumed to be extracted from the subtitles themselves, so there is no need to worry about a sentence not being found.
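For instance (a toy example with made-up lines, not the real subtitles), a fragment of a sentence is enough to recover the full subtitle line:

```python
import difflib

subtitle_lines = [
    "what exactly is a Platonic solid",
    "there are only five regular polyhedra",
]
# even a partial sentence matches the closest full line
match = difflib.get_close_matches("a Platonic solid", subtitle_lines, n=1)
print(match)  # ['what exactly is a Platonic solid']
```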

The advantage of using pandas shows up here: you only need to merge two DataFrames, which intersects them on the shared column with the same content, and the timestamps come out for free.
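A minimal sketch of that merge (toy data; DataFrame.merge defaults to an inner join on the shared "text" column):

```python
import pandas as pd

subs = pd.DataFrame({
    "start": ["00:00:01", "00:00:05", "00:00:09"],
    "end":   ["00:00:04", "00:00:08", "00:00:12"],
    "text":  ["line a", "line b", "line c"],
})
chosen = pd.DataFrame({"text": ["line b", "line c"]})

# inner join on "text": only the chosen lines survive, with timestamps attached
merged = chosen.merge(subs)
print(merged[["text", "start"]])
```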

(The url is recorded here so that, in the future, the tool can be extended to find sentences across multiple subtitle files; then it will be necessary to record where each subtitle came from. That is a future plan and has not been started yet.)

In [8]:
def find_text_in_vtt(text, vtt, youtube_url):
    df = clean_vtt(vtt)
    chosen_text = []
    for t in text.splitlines():
        # fuzzy-match each kept line against the full subtitle lines
        sentence = difflib.get_close_matches(t, df["text"], n=1)
        if sentence:
            chosen_text.extend(sentence)
    df_chosen = pd.DataFrame(chosen_text, columns=["text"])
    df_chosen = df_chosen.merge(df)  # inner join on "text" recovers the timestamps
    df_chosen["url"] = youtube_url   # record where each subtitle line came from
    return df_chosen


The above is the part that handles subtitles.

Let's take the video of Li Yongle's lecture as an example, using the subtitles of that lesson.

In [11]:
if __name__ == "__main__":
    youtube_url = 'https://www.youtube.com/watch?v=m9AE_G_9c7Y'
    vtt_pre = "test"
    lang = 'zh-Hans'
    vtt_filename = '.'.join([vtt_pre, lang, "vtt"])
    download_youtube_sub(youtube_url, vtt_pre, lang=lang)
    vtt = webvtt.read(vtt_filename)

    print(vtt_to_transcript(vtt)[:100])  # only the first 100 characters are shown in the blog

各位同学大家好 我是李永乐老师



## Manually select the key parts

Yep, this program can't do artificial intelligence, only manual intelligence: how would I know which parts you consider important? So please copy the transcript printed above, paste it below, and delete the unwanted parts, keeping only the key ones. Be careful not to change the original line breaks.
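To illustrate the editing rule with made-up transcript lines (not the real subtitles): you delete whole lines you don't want, and every kept line must remain on its own original line:

```python
transcript = ("hello everyone\n"
              "I am teacher Li\n"
              "a short digression about history\n"
              "today we introduce the Platonic solids")

# keep only the key lines; the original line breaks are preserved
keynote = ("I am teacher Li\n"
           "today we introduce the Platonic solids")

# every keynote line still matches one transcript line exactly
assert all(line in transcript.splitlines() for line in keynote.splitlines())
```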

In [13]:
if __name__ == "__main__":
    # paste the edited transcript between the quotes, keeping the line breaks
    keynote = '''

'''


Then we can get the start and end timestamps of each sentence:

In [14]:
if __name__ == "__main__":
    df_chosen = find_text_in_vtt(keynote, vtt, youtube_url)
    print(df_chosen.head())

                 text           end         start  \
0           什么叫柏拉图立体呢  00:00:15.539  00:00:13.859
1         他提出正多面体只有五种  00:00:24.339  00:00:21.719
2   所以我们就把正多面体称为柏拉图立体  00:00:27.019  00:00:24.519
3  首先它必须每一个面… 它是个正多面体  00:00:32.520  00:00:29.160
4        每个面都是同样的正多边形  00:00:38.100  00:00:32.679

url



• Get the real YouTube video file address. The real video file is not what you see as www.youtube.com/watch?v=m9AE_G_9c7Y but a very long address with a signature; I guess it might change on a different computer or at a different time. youtube-dl is able to obtain this address. For convenience, I directly grab a pre-merged video+audio format. There is still no need to download the whole video at this point.
• Download the video clips via ffmpeg. ffmpeg's -ss and -to set the start and end times, but when using the original start and end times directly, you should add -copyts to force the absolute timestamps of the original video. Whether -ss goes before or after -i also matters quite a bit; for details see https://trac.ffmpeg.org/wiki/Seeking

Here x264 and mp3 are used to re-encode the video and audio respectively, at the cost of speed. In theory, "copy" could be used instead, but I found that because each fragment is short, the keyframe is very likely to be lost and the clip cannot be cut cleanly; if that happens, the picture will probably freeze.

ffmpeg also has a Python wrapper, but that library just generates ffmpeg commands anyway, and I would need to learn a bunch of custom syntax, so I prefer subprocess.run.

In [15]:
def download_part_youtube(video_url, start, end, output_filename):
    # let youtube-dl resolve the real (signed) media address;
    # 'best' picks a pre-merged video+audio format, nothing is downloaded here
    with youtube_dl.YoutubeDL({'format': 'best'}) as ydl:
        info_dict = ydl.extract_info(video_url, download=False)
    real_video_url = info_dict.get("url", None)

    ffmpeg_command = ['ffmpeg',
                      '-ss', start,
                      '-i', real_video_url,
                      '-to', end,
                      '-c:v', 'libx264', '-c:a', 'libmp3lame',  # re-encode video with x264, audio with mp3
                      '-copyts',  # force the absolute timestamps of the original video
                      '-y',       # force overwrite
                      output_filename]
    p = subprocess.run(ffmpeg_command, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return p.stdout.decode("utf-8")


## Capture video clips based on text and merge

This merges all the previous preparation. Following the url, start and end columns of the extracted DataFrame, the program downloads the clip from start to end for each url, recording the clip filenames in a txt file. It then uses ffmpeg's concat feature to merge the files listed in that txt file into one video. Finally, the temporary files are cleaned up and you're done.
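For reference, the intermediate txt file handed to ffmpeg's concat demuxer simply lists one clip per line:

```python
# build the concat list exactly as the merge function writes it
temp_file_list = ["tmp_0.mp4", "tmp_1.mp4", "tmp_2.mp4"]
listing = "".join("file '{}'\n".format(f) for f in temp_file_list)
print(listing, end="")
# file 'tmp_0.mp4'
# file 'tmp_1.mp4'
# file 'tmp_2.mp4'
```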

In [16]:
def get_youtube_by_keynote(df, final_output):
    # name the temporary clip files and record them in a list file for concat
    temp_file_list = ["tmp_{}.mp4".format(index) for index in range(len(df))]
    temp_input = 'tmp_input_files.txt'
    with open(temp_input, 'w') as f:
        for index in range(len(df)):
            f.write("file '{}'\n".format(temp_file_list[index]))

    # walk the DataFrame and download every clip
    for index, row in df.iterrows():
        download_part_youtube(row["url"], row["start"], row["end"], temp_file_list[index])

    # merge the temporary clips
    ff_concat_command = ["ffmpeg",
                         '-f', 'concat',
                         '-safe', '0',
                         '-i', temp_input,
                         '-c:v', 'copy', '-c:a', 'copy', '-copyts',  # concatenation does not seem to need re-encoding
                         '-y',
                         final_output]
    subprocess.run(ff_concat_command, shell=False)

    # clean up the temporary files
    os.remove(temp_input)
    for f in temp_file_list:
        os.remove(f)


Put it all together. If it runs locally, just look for the result in the local directory. On Colab, download the file by calling files.download from the google.colab library.

In [ ]:
if __name__ == "__main__":
    final_output = "final.mp4"
    get_youtube_by_keynote(df_chosen, final_output)

    try:
        # on Colab, trigger a browser download of the result
        from google.colab import files
        files.download(final_output)
    except ImportError:
        pass  # running locally: final.mp4 is already in the working directory

Another possible, weird application: I could also write some common sentences that are NOT in the video subtitles. As long as difflib.get_close_matches finds a close enough line, it can be extracted. Given a set of videos, for example videos of a president's speeches, you could then make a video that takes quotes "out of context". Of course, every sentence would be choppy and the background might change constantly, but it could be interesting. The matching part would need some modification; I might do it later.