I’m so grateful that you’re contributing a PR to the repo. You’re improving the course for everyone by sharing your work and your insights. You’ll be recognized in GitHub as a repo collaborator. It’s a powerful way to demonstrate your expertise and help others, and it’s massively appreciated.
How does a PR work?
On a typical work project, you’d push your code to the team repo with git push and then raise a PR to merge your changes with the main branch.
That’s not usually how it works with Open-Source repos. With 300,000 students across my courses, it would be chaotic if anyone could push to the repo. Rather, we follow a common process:
- Fork the repo, giving you a copy of the repo in your GitHub account, named like this: https://github.com/your_username/repo_name
- Create a branch for your changes with git checkout -b branch_name and then use git add my_filename then git commit -m "my message" to explicitly add the files you’ve changed, then git push your change to your fork
- In the GitHub UI, raise a PR to merge your branch in your fork with the main repo. Confirm everything follows the guidelines below, then submit the PR.
With that in mind, here are the detailed instructions: https://chatgpt.com/share/6873c22b-2a1c-8012-bc9a-debdcf7c835b
For a PR that’s good to merge, please check:
- Ensure only changes to community contributions are in the PR
- Ensure all outputs are cleared
- Check that your PR involves only a manageable number of changes, and remove LLM-generated extras like a verbose README
How to check a PR is in good shape
After you’ve submitted your PR, it will have a link like this:
https://github.com/ed-donner/repo_name/pull/pr_number
Open your PR, and click on the “Files Changed” tab to see all the changes that would be carried out if I merged your PR. The green number on the top right is how many new lines would be added to the repo. The red number is how many lines would be deleted (hint: there should be NO lines deleted!)
Below are the key reasons that I might not be able to merge a PR:
Issue 1: Add, change or delete ANYTHING outside community contributions
If any of the changes are to files outside community contributions, then I’m unlikely to be able to merge the PR.
You can see the files you’ve changed clearly on the “Files Changed” tab. Every file changed needs to be within Community Contributions directories.
If there are any changes outside Community Contributions, I need to test very thoroughly across many platforms. Even harmless looking changes can have unexpected side-effects. I make exceptions for important bug fixes, in which case please check with me first (and thank you!!)
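If you'd like to check this before opening the PR, here is a minimal sketch in Python. It takes a list of changed file paths and returns any that are not inside a community contributions directory. The function name and the folder names in `COMMUNITY_DIR_NAMES` are assumptions made for illustration; adjust them to match how the folders are actually named in the repo you are contributing to.

```python
from pathlib import PurePosixPath

# Assumption: contribution folders use one of these names. Adjust to match the repo.
COMMUNITY_DIR_NAMES = {"community-contributions", "community_contributions"}


def files_outside_community(changed_paths, dir_names=COMMUNITY_DIR_NAMES):
    """Return the changed paths that do not sit inside a community contributions directory."""
    outside = []
    for path in changed_paths:
        # parts[:-1] are the directories; the last part is the file name itself.
        directories = PurePosixPath(path).parts[:-1]
        if not any(part in dir_names for part in directories):
            outside.append(path)
    return outside


# Hypothetical example paths, for illustration only.
changed = [
    "week1/community-contributions/my_project/demo.ipynb",
    "week1/day1.ipynb",
    "README.md",
]
print(files_outside_community(changed))  # ['week1/day1.ipynb', 'README.md']
```

The list of changed paths could come from running git diff --name-only against the main repo's main branch. An empty result means every change is inside a community contributions directory.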
Issue 2: Outputs not cleared
It’s important to clear the outputs of all your Notebooks before doing a PR, to avoid the repo growing too large.
If any of the files have more than 1,500 lines of code, or show as ‘Differences are too large to preview’ in GitHub, then it’s likely that you didn’t clear the outputs of your Notebook before pushing. Please clear the outputs and then resubmit.
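Most notebook editors have a "Clear All Outputs" command, and jupyter nbconvert --clear-output --inplace my_notebook.ipynb does the same from the command line. As an illustration of what clearing actually does, here is a small sketch using only Python's standard library. It relies on the fact that a .ipynb file is JSON in which each code cell carries an `outputs` list and an `execution_count`. The function names are my own, not part of any library.

```python
import json


def clear_outputs(notebook):
    """Remove outputs and execution counts from every code cell of a notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_notebook_file(path):
    """Rewrite the .ipynb file at `path` in place with its outputs cleared."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Calling clear_notebook_file("my_notebook.ipynb") on each notebook before committing should bring the diff back down to just your code and Markdown cells.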
Issue 3: PR is too large – an entire project of 10+ files or 3,000+ loc
If your PR involves changing more than 10 files or 3,000 lines of code in aggregate (aside from CrewAI projects which often need to be this large!):
I’m super grateful that you would work on such a substantive project and choose to share it with us. But I need to keep an eye on the overall size of the repo. Major projects are probably better handled differently:
Rather than embedding your entire project within Community Contributions – create a single Markdown file or Notebook (.md or .ipynb) in Community Contributions that describes your project, and contains a link to the URL in your repo where you have all your code. That way you still get to describe your findings, you still share your code with all students, without needing to replicate all the code.
If you’d be ok to update your PR like that, I’d be most grateful. As an added bonus, it should drive more traffic to your GitHub repo.
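To get a rough idea of whether a PR crosses these thresholds before submitting, the sketch below totals the output of git diff --numstat, which prints one row per changed file in the form added, deleted, path, separated by tabs. The function names are hypothetical, and the defaults simply mirror the 10 files and 3,000 lines mentioned above. Treat it as an approximate self-check rather than the exact rule applied during review.

```python
def pr_size(numstat_text):
    """Count changed files and total added + deleted lines from `git diff --numstat` output."""
    files = 0
    lines = 0
    for row in numstat_text.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        files += 1
        # Binary files are reported as "-" for both counts; count them as 0 lines.
        if added != "-":
            lines += int(added)
        if deleted != "-":
            lines += int(deleted)
    return files, lines


def is_too_large(numstat_text, max_files=10, max_lines=3000):
    """True if the change touches more than max_files files or max_lines lines."""
    files, lines = pr_size(numstat_text)
    return files > max_files or lines > max_lines


# Hypothetical numstat output, for illustration only.
sample = (
    "120\t0\tweek1/community-contributions/demo/app.py\n"
    "45\t0\tweek1/community-contributions/demo/notes.md\n"
)
print(pr_size(sample))       # (2, 165)
print(is_too_large(sample))  # False
```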
Issue 4: Additional information that might be LLM-generated
It’s completely fine to use Agents / LLMs to assist you in making your projects – in fact, I encourage it!
However, the key thing is: you need to be the boss. You are at the helm of the process; the ideas and the design should come from you. The LLM is working for you, not the other way round.
One of the recent trends is to see overwhelmingly large PRs with tons of lower-value stuff:
- An extremely verbose README
- Masses of test classes, not important to illustrate your idea, but time consuming to review
- Heavily commented and overly-defensive code
- Unnecessary extras, like .env.example, requirements.txt, etc
- And the tell-tale emojis everywhere!
There comes a point when a PR is so heavyweight that it’s actually more work to review it (both for me and for other students).
If you use an LLM to generate code, then you are responsible for the final product. Avoid overly lengthy READMEs, unnecessary test files and other signs of LLM slop. An ideal PR is only a few files, and under 1,000 lines of code, except for advanced projects. Less is more!
Issue 5: Files that don’t belong in a codebase
If your PR contains files like a database file, or personal files, or source data that should be downloaded – instead of including these in your PR, please add instructions for how the user can create these files themselves.
Thank you so much
If I’ve sent you to this page, then I think I’m seeing one of these issues in your PR. I don’t mean to sound ungrateful – on the contrary, I’m super grateful for your contribution! If you’re able to make these changes, then I’ll gladly merge it to be shared with everyone.
