Monday, June 11, 2018

Design development of RTEMS release notes generator

Design development of the RTEMS release notes generator:

1. First, we need to figure out what our goal is. The goal can be divided into several tasks: for example, including all of the needed data in the release notes, fixing formatting issues, and so on.

2. Once the goal is divided into tasks, we need to decide which one is the most essential. At this point, getting the needed data is the most important task, because some of that data is missing from the release notes.


3. Third, we need to figure out the best solution to the problem. In general, there are two ways to get the needed data: one is to parse the HTML page, and the other is to parse the XML in the RSS feed. In the end, we decided to use an XML parser on the RSS feed. For the detailed reasons, please read my other post on this blog:

parse HTML page VS parse XML page

How do I feel about working in an open source project?

This is my first time working on an open source project, so everything is new and exciting for me. It has been almost a month since I started working on the RTEMS project. Definitely 5 stars to my mentors, because they are very helpful. Other people are also always willing to help, which is really impressive. I now understand why helping each other is one of the most essential parts of the open source community spirit.
For a developer in an open source community, all work is preferably done in public:
1. Email is public. Normally, email is preferred to be public. As a developer, I subscribe to two important mailing lists, user@rtems.org and devel@rtems.org. user@rtems.org is for communication between users and developers. For example, if users have questions, suggestions, or comments, they send email to user@rtems.org, so as a developer I get their feedback immediately. devel@rtems.org is used by developers; there we can discuss technical questions, such as how to fix a bug.

2. All code is public. Since the code is pushed to GitHub, it is easy for my mentors and other people to review it. Also, if users with a technical background are interested in how the project works, it is easy for them to get access to the source code. Note: copyright belongs to a specific developer or organization.
3. Code is supposed to be consistent. As part of the developer team, my code is supposed to be consistent with that of the other developers. For example, I need to pay attention to details such as whitespace and column limits. This makes the code not only more consistent and professional, but also more readable for later developers.

This is just the beginning of my work on an open source project, and I will keep going! 😊

Why do we need a release notes tool in the RTEMS project?

During my participation in Google Summer of Code this year, the first project I am working on is the RTEMS release notes generator. Why do we need a release notes tool in the RTEMS project?

1. Missing data in the release notes. The current release notes are usable, but some essential data is missing from them. Therefore, we need to extract all of the needed data from each ticket’s page and put it in the release notes.

2. Formatting issues. Some data is hard to read because of formatting issues, so one of my goals is to format the ticket information better. The date formatting is also a problem: for example, the date depends on a local setting and is not a full date.

The release notes tool fits into the RTEMS project quite well, because the release notes can now be generated automatically from the Trac data with all of the needed data included. As a result, the release notes are not only more readable but also contain more essential data.

Sunday, June 10, 2018

parse HTML page VS parse XML page


If a webpage also provides an RSS feed, there are two ways to get text data from it: one is to parse the HTML page, and the other is to parse the XML in the RSS feed.

At first, I used an external Python package called BeautifulSoup to extract data from the HTML page. Thanks to a reminder from my mentor, I realized that parsing HTML gives fragile results, so I decided to parse the XML in the RSS feed instead. For that, it is also easier to use the XML parser in the Python standard library.


In general, I have two reasons for parsing the XML in the RSS feed with an XML parser:


1. It is more stable to parse XML than to parse HTML. The webpage we are scraping might change frequently, and if we extract data from the HTML, our code might stop working once the web template changes. When I used BeautifulSoup (BS), I located the data by looking at which tag it sits in. For example, I extracted a comment from <comment> blahblahblah </comment>. However, once the web template changes, the comment "blahblahblah" might be stored in a new tag, such as <others> blahblahblah </others>, and then my code no longer works. We do not need to worry about this problem when parsing the XML in the RSS feed, since RSS is a stable file format.

2. It is more efficient. Having decided to parse XML instead of HTML, I needed a tool to help me, and the XML parser in the Python standard library is the best choice. It is efficient and user friendly. For example, I need to parse the yellow section of the #2988 ticket information on:

https://devel.rtems.org/ticket/2988

Code is short:
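The following is a minimal sketch rather than the exact code used for the generator. It assumes the feed has the usual RSS layout of <item> elements with <title>, <pubDate> and <description> children, and it parses a made-up sample string instead of downloading the real feed for the ticket. The names SAMPLE_RSS and parse_items are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like an RSS 2.0 feed (an assumption);
# the real feed for a ticket may use different fields.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample ticket</title>
    <item>
      <title>comment added</title>
      <pubDate>Mon, 11 Jun 2018 10:00:00 GMT</pubDate>
      <description>blahblahblah</description>
    </item>
    <item>
      <title>status changed</title>
      <pubDate>Tue, 12 Jun 2018 09:30:00 GMT</pubDate>
      <description>closed as fixed</description>
    </item>
  </channel>
</rss>
"""


def parse_items(rss_text):
    """Return one dict per <item> element found in the RSS text."""
    root = ET.fromstring(rss_text)
    items = []
    # iter() walks the whole tree, so the nesting depth of <item> does not matter.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "date": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


ticket_items = parse_items(SAMPLE_RSS)
for entry in ticket_items:
    print(entry["date"], "-", entry["title"], "-", entry["description"])
```

Because the fields are looked up by element name, the code does not depend on how the HTML page is laid out.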




Wednesday, June 6, 2018

Use logging instead of print in python

Quick! Think of a way to output a message in Python! If print came to mind immediately, you are not the only one. It is true that print is the most popular way to output a message in Python, but using logging is actually better.

Advantages of using logging instead of print:


1. User friendly. A user can see when and where a log message comes from.


2. Easier to manage. Log messages are easier to format.


3. Easier to differentiate. Log messages can be differentiated by severity.
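
The three points above can be seen in a small sketch using the standard logging module. The logger name "release_notes" and the messages are made up for illustration, and the records are collected in memory here only so that they are easy to inspect; a StreamHandler() created without an argument would write to stderr instead.

```python
import io
import logging

# Collect the formatted records in memory so they can be inspected afterwards.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)

# One format string controls every message: time, logger name, line number, severity.
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s:%(lineno)d %(levelname)s %(message)s")
)

# "release_notes" is a hypothetical logger name.
logger = logging.getLogger("release_notes")
logger.setLevel(logging.INFO)   # records below INFO are dropped
logger.addHandler(handler)
logger.propagate = False        # keep records away from the root logger's handlers

logger.debug("raw feed downloaded")          # DEBUG: filtered out
logger.info("parsed 2 items")                # INFO: kept
logger.warning("ticket has no description")  # WARNING: kept

output = log_stream.getvalue()
print(output, end="")
```

Each line of the output starts with a timestamp and the place the message came from, followed by its severity. Changing the level passed to setLevel changes which messages appear without touching the individual calls, which is hard to do with print.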