Question

I am redesigning a Linux C++ server application. The application acts as a file relayer: it receives file packets (including control packets and data packets) from client A, writes these packets into a local data file, then creates and updates an index file, and finally starts many threads that read the file data and forward the data packets to its receivers, such as client B, client C, client D, etc. The flow looks like: Client A -> Server -> Client B, Client C, Client D, ...

Currently the application uses the Producer-Consumer pattern to pass data packets among modules. Some of the modules we have: SocketModule (contains a thread that listens on and reads incoming data from all sockets), SessionModule (manages user TCP sessions), and FileRelayModule (contains many threads that process incoming packets and write them into the local data file, read the local data files and update the index files, and forward the data packets to receivers via TCP sockets).

The problem is that its performance is very poor. For example, it takes about 15 minutes for Client A to send a 200 MB file to Client B, whereas a P2P transfer of the same file between different machines takes only about 20 seconds.

We think the main cause of the bad performance is that we store socket information in a global std::map; every thread that wants to send data out (to clients) has to wait for the lock on that std::map, so the application effectively behaves like a single-threaded application.

Since the original design of this application doesn't follow any design-pattern principles, and this has caused many issues, we want to redesign it using the MVC pattern, adding thread-manager and file-manager roles. The first thing I think of is to use a ThreadPool and dispatch every accepted socket to the thread that currently has the fewest clients (sockets). Is this possible or reasonable for this application? What would you do if you were the designer of this project?

Explanation / Answer

If I were the designer of this project, I would focus on performance first, and only then on design patterns.

Performance improvement starts with knowing where the bottlenecks are, not where you think they are. You could spend many hours (or days, weeks) optimizing a part of the program that does not influence the overall performance at all. There are tools for profiling concurrent programs, but I usually just let the threads write some log info about where they spend their time.
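
For example, a tiny scoped timer (an illustrative helper, not part of your code base) that each thread can wrap around a suspect stage to log how long it takes:

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <thread>
#include <utility>

// Logs how long a named stage took when the timer goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start_).count();
        std::fprintf(stderr, "[timing] %s took %lld ms\n",
                     name_.c_str(), static_cast<long long>(ms));
    }
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

int main() {
    {
        ScopedTimer t("simulated disk write");  // wrap any suspect stage like this
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    return 0;
}
```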

First of all, the benchmark time. The P2P situation you compare against is not exactly comparable to your own application. In the P2P case, the sending client already had the file, whereas in your application's case that file needs to be uploaded first. From what I gather, this upload stage is a single-threaded process too, and it is worthwhile to check how long it takes. A comparable situation would be using FTP to upload the file, and then P2P to distribute it. Also, timings over a local network will be vastly different from timings over the internet.

With that out of the way, though, I'd say that closing most of that gap (15 minutes versus 20 seconds is roughly a factor of 45) is a nice goal.

At first glance, I see the following potential bottlenecks:

The file is read in its entirety, and then further processed, before distribution starts. Perhaps this process can also be made concurrent, and perhaps sending file blocks to clients can start as soon as you have them, adding the extra information later. If need be, the client can do its own reconstruction of the complete file (just as with P2P).
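
As a rough illustration of forwarding blocks as soon as they arrive (the block size is an assumption, and FILE* streams stand in for your real socket reads and sends):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

constexpr std::size_t kBlockSize = 64 * 1024;  // assumption: 64 KiB blocks

// Forward each block to every receiver as soon as it is available,
// instead of waiting until the whole file has been received and indexed.
void relay_stream(std::FILE* in, const std::vector<std::FILE*>& receivers) {
    std::vector<char> block(kBlockSize);
    std::size_t n;
    while ((n = std::fread(block.data(), 1, block.size(), in)) > 0) {
        for (std::FILE* out : receivers) {
            std::fwrite(block.data(), 1, n, out);  // socket send() in the real server
        }
    }
}

int main() {
    // Demo: relay stdin to a single "receiver" (stdout), block by block.
    std::vector<std::FILE*> receivers = { stdout };
    relay_stream(stdin, receivers);
    return 0;
}
```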

Disk I/O. It looks like the file is written to disk, and each thread is performing disk I/O to read it. This may be sped up with caching or memory mapping. You may also want to check that your disk hardware can handle the I/O.
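
For instance, on Linux you could memory-map the data file once and let every forwarding thread read from the same mapping instead of issuing its own read() calls. A minimal sketch, with a hypothetical file name and no real error recovery:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    const char* path = "relay.dat";  // hypothetical local data file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { std::perror("fstat"); close(fd); return 1; }

    // Map the whole file read-only; all forwarding threads can share this mapping.
    void* data = mmap(nullptr, static_cast<size_t>(st.st_size),
                      PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }

    // Hint that access will be mostly sequential (blocks are forwarded in order).
    madvise(data, static_cast<size_t>(st.st_size), MADV_SEQUENTIAL);

    // ... hand (static_cast<const char*>(data), st.st_size) to the forwarding threads ...

    munmap(data, static_cast<size_t>(st.st_size));
    close(fd);
    return 0;
}
```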

The std::map is a horrible data structure for this purpose on its own, let alone when it sits behind a single global lock. It is typically implemented as a balanced red-black tree, so each insert or erase may trigger a rebalance, each lookup is O(log n), and every insert allocates a node, which can seriously impact performance. What you want instead is to treat each socket as a Task and keep unfinished tasks in a simple TaskQueue. A (doubly) linked list (possibly with a memory pool) or a std::vector used as a stack is enough. That reduces the time spent under the lock to O(1) with a very low constant.
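
As a minimal sketch of such a TaskQueue, assuming a made-up Task layout (your real tasks would carry your packet and offset information), a mutex-protected std::deque with a condition variable already gives O(1) push and pop with a very short critical section:

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical task: which socket to serve and what to do with it.
struct Task {
    int fd = -1;  // socket descriptor
    enum class Kind { ReadPacket, WriteBlock, ForwardBlock } kind = Kind::ReadPacket;
    std::size_t offset = 0;  // position in the local data file (assumption)
    std::size_t length = 0;
};

class TaskQueue {
public:
    void push(Task t) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(t));
        }
        cv_.notify_one();
    }

    // Blocks until a task is available or close() has been called.
    std::optional<Task> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return std::nullopt;  // closed and drained
        Task t = std::move(tasks_.front());
        tasks_.pop_front();
        return t;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Task> tasks_;
    bool closed_ = false;
};
```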

Socket I/O. Your threads may be waiting a long time for an I/O call to a client to finish. By making these calls asynchronous (non-blocking), a thread can start a write to a socket, put the socket back on the TaskQueue, and let another thread handle the completion.
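
On Linux, one way to get this is non-blocking sockets plus epoll: a thread waits for "writable" events and turns each one into a task instead of blocking in send(). A rough sketch, with a placeholder descriptor and the TaskQueue push left as a comment:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

// Put a socket into non-blocking mode so send() never parks a worker thread.
static bool make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

int main() {
    int epfd = epoll_create1(0);
    if (epfd < 0) { std::perror("epoll_create1"); return 1; }

    // For each receiver socket (here only a placeholder fd), register interest
    // in "writable" events; edge-triggered, so we are woken once per change.
    int client_fd = /* accepted receiver socket */ -1;
    if (client_fd >= 0 && make_nonblocking(client_fd)) {
        epoll_event ev{};
        ev.events = EPOLLOUT | EPOLLET;
        ev.data.fd = client_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
    }

    // Event loop: one thread waits here; when a socket becomes writable, it
    // enqueues a "forward next block to this fd" task and goes back to waiting.
    epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, 1000 /* ms */);
    for (int i = 0; i < n; ++i) {
        int fd = events[i].data.fd;
        (void)fd;  // task_queue.push(Task{fd, Task::Kind::ForwardBlock, ...});
    }

    close(epfd);
    return 0;
}
```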

A ThreadPool is a good idea, but I wouldn't burden each thread with multiple clients. Let them read a simple task from the TaskQueue, perform it quickly, and then insert the updated task back into the TaskQueue. Each thread just does a different thing based on some "command" in the task.
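
A minimal sketch of such a pool (the class and its methods are illustrative, not your real modules): a fixed number of workers pull small jobs from a shared queue, run them, and go back for more. In your server, each job would amount to "pop a Task from the TaskQueue and act on its command"; plain std::function jobs are used here only to keep the sketch self-contained:

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // workers drain the queue, then exit
    }
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // e.g. "read a packet", "write a block", "forward a block"
        }
    }

    std::vector<std::thread> workers_;
    std::deque<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};

int main() {
    ThreadPool pool(4);
    for (int i = 0; i < 8; ++i)
        pool.submit([i] { std::printf("task %d handled by some worker\n", i); });
    return 0;  // destructor drains remaining jobs and joins the workers
}
```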

Last but not least, is the file binary or compressed? Reducing the size of data to transfer is also a valid optimization.

I think that what you are looking for in MVC is the responsiveness and concurrency, the "feel" that it acts fast, as found in a lot of MVC applications. That is however due, in part, to the messaging system that MVC uses, and its implementation. (The other parts are that different Views and Controllers can behave concurrently, and that the Model is non-blocking.) MVC in itself guarantees no performance gain.

So I would focus on the underlying patterns, which may even be used by MVC itself: ThreadPools, a TaskQueue, and asynchronous I/O combined with caching.

With that in place, I don't see a real use for MVC. MVC is an abstraction for different Views and different Controllers that show and act upon the Model in different ways. What you have, however, is a lot of the same. So instead of, say, the Model telling each thread or client that it has a file ready, it can just put that many tasks on the queue and not care about which worker handles which task exactly. As long as the queue is empty at the end of the process, you're okay.

Anyway, as I said, before you do anything, make sure that you know what the actual bottlenecks are.