These days I'm looking for ways to do quick prototyping for some projects I want to work on. I'd heard about Streamlit before, but never really tried it until now. It turned out to be super easy and straightforward. Let me give a quick walkthrough of this app, which scrapes YouTube comments and even summarizes them for you using either an OpenAI or Perplexity API key.
I mostly just started from a Streamlit template and edited two files: requirements.txt and streamlit_app.py. requirements.txt just contains all of the packages you'd normally install to run the app locally, so you'll typically have streamlit in there. streamlit_app.py is the file that contains basically all of your code, and in this case it holds a couple of functions.
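Before getting to those functions, here's roughly what requirements.txt needs for this app. The package list below is inferred from the code that follows, so treat it as a sketch rather than the exact file:
streamlit
pandas
youtube-comment-downloader
openai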
Comment Downloader
Downloading the comments from Youtube comes from this package called youtube-comment-downloader.
from youtube_comment_downloader import YoutubeCommentDownloader, SORT_BY_POPULAR
import pandas as pd

def youtube_url_to_df(Youtube_URL):
    downloader = YoutubeCommentDownloader()
    comments = downloader.get_comments_from_url(Youtube_URL, sort_by=SORT_BY_POPULAR)
    # One list per field that the downloader returns for each comment
    all_comments_dict = {
        'cid': [],
        'text': [],
        'time': [],
        'author': [],
        'channel': [],
        'votes': [],
        'replies': [],
        'photo': [],
        'heart': [],
        'reply': [],
        'time_parsed': []
    }
    # Each comment is a dict with the same keys; append every field to its list
    for comment in comments:
        for key in all_comments_dict.keys():
            all_comments_dict[key].append(comment[key])
    comments_df = pd.DataFrame(all_comments_dict)
    return comments_df
We essentially just get the comments from the URL and put them into a pandas dataframe. The downloader gives us a lot of information per comment, including the text, time, author, and more, and we collect all of it into a dictionary before building the dataframe that we'll use later for summarization.
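A side note: get_comments_from_url returns a generator, so comments are fetched lazily as you iterate. The app above consumes all of them, but for videos with huge comment sections you could cap the download. This isn't in the original app, but a sketch using itertools.islice would look like:
from itertools import islice

# Hypothetical cap: only collect the first 500 comments instead of all of them
for comment in islice(comments, 500):
    for key in all_comments_dict.keys():
        all_comments_dict[key].append(comment[key])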
Summarization
For summarization, we use an OpenAI chat completion call to summarize the comments. As a student, I have access to a free pro plan for Perplexity, which gives me a $5 credit every month, so in this implementation I use a Perplexity API key and their sonar-pro model.
from openai import OpenAI

def summarize_text(input_text, api_key, max_tokens=1024):  # default cap is arbitrary
    # System prompt sets the assistant's role; the user prompt asks for the summary
    messages = [
        {"role": "system", "content": "You are an AI assistant focused on helping content creators to improve and grow their channels."},
        {"role": "user", "content": f"Summarize this text: {input_text} into five headers: 'why viewers watch', 'how to improve', 'how to improve content production', 'how to improve channel growth', 'how to improve engagement'."}
    ]
    # Use the OpenAI client, but point it at Perplexity's OpenAI-compatible endpoint
    client = OpenAI(api_key=api_key, base_url="https://api.perplexity.ai")
    response = client.chat.completions.create(
        model="sonar-pro",
        messages=messages,
        max_tokens=max_tokens,
        temperature=0.7
    )
    return response.choices[0].message.content
In this snippet, I first define the messages that go into the chat completions call. The system message tells the model that it is an AI assistant, and the user message asks it to summarize input_text. Then I create the client by calling OpenAI, but set the base_url to Perplexity's endpoint, since that's the API key I'm going to use. Finally, I call the chat completions create method with the sonar-pro model, passing in the same messages from above.
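One thing the snippet doesn't handle is a bad or expired key. This isn't in the original app, but since the openai client raises typed exceptions, a minimal guard could look like:
import openai

# Hypothetical wrapper: return None instead of crashing on an invalid key
def safe_summarize(input_text, api_key):
    try:
        return summarize_text(input_text, api_key)
    except openai.AuthenticationError:
        return None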
Streamlit App
So far, I've mostly just written simple Python code that you might run in a Jupyter notebook or a short script. Now we can put all of this into a Streamlit app. In our main function, we use st (from import streamlit as st) to refer to Streamlit's methods, and start with things like
st.header("YouTube Comments Downloader and Summarizer")
st.write("Download and summarize comments from YouTube videos effortlessly.")
st.divider()
url_text = st.text_input("Enter YouTube URL")
api_key = st.text_input("Enter Perplexity API Key", type="password")
Here we just start with a header, then a short description of the page. We add a divider and then collect both url_text and api_key as text inputs: the user has to enter a YouTube URL and their own API key for summarization.
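One caveat worth mentioning: Streamlit re-runs the entire script on every widget interaction, so without caching the comments would be re-scraped each time anything changes. The original snippet doesn't do this, but wrapping the downloader function in st.cache_data would avoid repeat scrapes:
import streamlit as st

@st.cache_data  # memoize results by URL so reruns don't re-scrape the video
def youtube_url_to_df(Youtube_URL):
    ...  # same body as before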
Now, we call the functions we defined above.
raw_df = youtube_url_to_df(url_text)
if raw_df is None or raw_df.empty:
    st.info('Please enter a valid YouTube link.')
else:
    download_df(raw_df, "Raw")
    st.subheader("Raw Comments")
    st.dataframe(raw_df[['text']].head(10))  # Show top 10 comments
    combined_comments = " ".join(raw_df['text'].tolist())
    if api_key:
        with st.spinner("Summarizing comments..."):
            summary = summarize_text(combined_comments, api_key)
            st.subheader("Summary of Comments")
            st.write(summary)
    else:
        st.warning("Please provide your API key to generate summaries.")
We first call the youtube_url_to_df function and then check whether the dataframe is empty. If it is, we show a message asking the user to enter a valid YouTube link. Otherwise, we make the raw comments available for download, show the top 10 comments, combine all the comments into one string, and call the summarize_text function on it.
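The download_df helper isn't shown in the snippets above. Assuming it just offers the dataframe as a CSV file, a minimal version built on st.download_button might look like this (the app's actual implementation may differ):
def download_df(df, label):
    # Serialize the dataframe to CSV and offer it as a file download
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    st.download_button(
        label=f"Download {label} Comments",
        data=csv_bytes,
        file_name=f"{label.lower()}_comments.csv",
        mime="text/csv"
    )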
Finally, we can run the app with streamlit run streamlit_app.py in the terminal, or run it online by hosting it on Streamlit directly.
The code for this app can be found here, and the Streamlit app can be found here.